I Ran an LLM Locally on my ASUS ROG Ally

AI-assisted draft.

I ran a local AI model on my ASUS ROG Ally for a few weeks. I thought it would be a fun project. Instead, it became a lesson in hardware limits.

I did not use it as a cloud replacement. I used it as a specialized tool for small tasks. Here is what I learned about running AI on handheld hardware.

The Memory Barrier

Handhelds use Unified Memory Architecture. This means the CPU and GPU share the same RAM. By default, the GPU gets a tiny slice of memory.

If your model does not fit in that slice, the system uses the CPU. This makes generation painfully slow.

The Fix:

Go into your BIOS.
Manually increase the UMA frame buffer.
I pushed mine to 4 GB. This change helped more than any other tweak.

What Doesn't Work

I tried using zRAM to squeeze more out of my memory. It failed. Most AI models use GGUF files which are already compressed. You cannot compress them further to gain space.

I also tried using disk swap to help. Swap does not make things faster. It makes them unusable. If your model relies on disk swap, you will see only one word every few seconds.

The only reason to keep swap enabled is to prevent the system from killing your process when you run out of RAM.

Tips for Smooth Runs

If your AI output feels choppy or jumpy, check your Linux kernel settings.

Lower your vm.swappiness value.
This stops the system from moving memory to swap too early.
It makes the generation feel steady instead of stuttering.

Model Choice is about Use-Case

Most people look for the fastest model. I chose a slower, sharper model instead.

If you chat in real time, you need speed.
If you run a background agent, you need quality.

I use my setup for background tasks. I send a request and check the result later. Because I am not watching the screen, I do not care if a response takes 40 seconds instead of 8. I want the best answer, not the fastest one.

Avoid reasoning models on handhelds. The step-by-step thinking process takes too much time on weak hardware. The quality gain is often not worth the wait.

What this is Good For

A 16 GB device is great for:

Drafting short emails.
Reviewing small code snippets.
Rough daily planning.
Private tasks that should not leave your network.

It is bad for:

Long documents.
Deep research.
Complex coding projects.

Local AI is a tool, not a miracle. It is perfect for routine, lightweight work.

Source: https://dev.to/frankydzoro/i-ran-an-llm-locally-on-my-asus-rog-ally-and-heres-what-i-actually-learned-3o6j

Optional learning community: https://t.me/GyaanSetuAi

I Ran an LLM Locally on my ASUS ROG Ally

Continue reading

The Right Way To Build AN AI Architecture

How I Cut Our AI API Bill in Half While Hitting p99 SLAs

Giving AgentGateway a Semantic Brain

Your AI feels slow? Maybe it's not dumb.

Local AI: How to Run Open Source Models Locally