𝗖𝘂𝘀𝘁𝗼𝗺 𝗩𝘂𝗹𝗸𝗮𝗻 𝗞𝗲𝗿𝗻𝗲𝗹𝘀 𝗳𝗼𝗿 𝗔𝗻𝗱𝗿𝗼𝗶𝗱 𝗟𝗟𝗠𝘀

Stop using NNAPI and TFLite for LLMs on Android. These frameworks add too much overhead. Custom Vulkan compute kernels can double your token speed.

Here is the data from Snapdragon 8 Gen 4:

Follow these steps for better performance:

Profile your dispatch overhead first. The bottleneck is often the cost of dispatching work to the GPU, not the math inside the kernel.

Source: https://dev.to/software_mvp-factory/custom-vulkan-compute-kernels-for-on-device-llm-inference-on-android-566f