Has anyone done any local LLM inference on the rk3588, what's the performance like?

I’ve been considering what role this could have in my workflow. I’ll probably keep using it on my desktop, maybe over SSH, but I just wanted to know what the inference performance is like. Responses take a few seconds to render even on my fairly overpowered gaming PC.

I haven’t personally tried it, but apparently it is possible. Libre drivers for the NPU were submitted to mainline last year, although I don’t know their current status. As with everything LLM-related, much of the ecosystem is in flux, and there’s no telling what currently works without dredging through forums and probably Discord chats.

Resources that may help:

Edit: I see now that you were asking more about performance. The important thing to note is that you want to leverage the NPU to get maximum performance, which involves converting models to a special format for the NPU. Performance appears to be adequate within certain context limits, as described above. It should be better than many average machines, but worse than either a desktop with a beefy GPU or an M-series Mac with its unified memory architecture.
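For anyone curious what that conversion step involves: it is done offline on a host machine with Rockchip’s rkllm-toolkit, and the resulting model file is then copied to the board. Here is a rough sketch going from memory of the toolkit’s published examples; treat the exact function names, parameters, and quantization options as approximate, since they may differ between toolkit releases:

```python
# Offline conversion sketch: HuggingFace model -> NPU-ready .rkllm file.
# Based on rkllm-toolkit examples from memory; API details may vary by version.
from rkllm.api import RKLLM

llm = RKLLM()

# Load a small chat model from a local HuggingFace checkout (path is illustrative).
ret = llm.load_huggingface(model="./Qwen2-0.5B-Instruct")
assert ret == 0, "model load failed"

# Quantize and target the RK3588 NPU (w8a8 is a commonly used dtype here).
ret = llm.build(do_quantization=True,
                quantized_dtype="w8a8",
                target_platform="rk3588")
assert ret == 0, "build/quantization failed"

# Write out the converted model, which then gets copied to the board.
ret = llm.export_rkllm("./qwen2-0.5b-w8a8.rkllm")
assert ret == 0, "export failed"
```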

Thanks a lot; this is really useful info. It seems the SDK for compiling models for the NPU is not open source, but it’s still potentially usable.

Update for anyone wanting to do this: Radxa has released a guide for running a DeepSeek R1 1.5B distilled model on the RK3588, with reported performance of about 15 tokens/s: DeepSeek shown to run on Rockchip RK3588 with AI acceleration at about 15 tokens/s - CNX Software
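If you want to sanity-check a number like that yourself, tokens/s is just generated tokens divided by wall-clock generation time. A quick way to measure it around whatever runtime you end up using; note that `generate()` here is a hypothetical stand-in for your actual inference call, not a real API:

```python
# Rough tokens/s measurement; wrap this around whatever inference call you use.
# generate() is a hypothetical placeholder returning the generated token ids.
import time

def measure_tps(generate, prompt, max_new_tokens=128):
    start = time.perf_counter()
    output_tokens = generate(prompt, max_new_tokens)
    elapsed = time.perf_counter() - start
    tps = len(output_tokens) / elapsed
    print(f"{len(output_tokens)} tokens in {elapsed:.2f}s -> {tps:.1f} tokens/s")
    return tps
```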


I’m still unable to get a suitably licensed version of RKLLM2, which is used to compile models into the correct format for the NPU.

But very cool nonetheless!