Rock, Paper, Silicon: How I Ran A 235B AI Model on a MacBook
Most people say you cannot run frontier AI models on consumer hardware.
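To hold a model like Qwen3-235B in memory at 16-bit precision, you need about 470 GB of RAM for the weights alone (235 billion parameters at 2 bytes each). An M2 Ultra Mac Studio tops out at 192 GB. The industry tells you to rent a cloud GPU instead.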
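As a rough check on that number, here is a minimal C++ sketch of the arithmetic. The function name weight_gb is mine, for illustration only, and it counts weights alone, ignoring the KV cache and activations.

```cpp
#include <cassert>

// Approximate weight footprint in decimal gigabytes.
// params_billion: parameter count in billions; bits_per_param: storage width.
// Assumption: weights only. KV cache and activations are not counted.
double weight_gb(double params_billion, double bits_per_param) {
    assert(params_billion >= 0.0 && bits_per_param > 0.0);
    return params_billion * bits_per_param / 8.0;  // 8 bits per byte
}
```

weight_gb(235, 16) gives 470. Even at 4 bits per weight the result is 117.5 GB, still far more than a 16 GB laptop can hold at once.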
I am a web developer, not a systems engineer. I do not work with GPU kernels or low-level memory. But I had a question: What if you only loaded the parts of the model that actually fire?
In a Mixture of Experts (MoE) model, a router picks a small subset of experts for each token, so most parameters stay silent most of the time. I decided to build a system that loads weights just before they are needed.
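To make "the parts that actually fire" concrete, here is a minimal sketch of top-k routing, the selection step MoE layers commonly use. The function name top_k_experts and the idea of passing raw scores as a vector are my assumptions for illustration. This is not taken from Qwen3 or from the S-MoE code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Return the indices of the k highest router scores, best first.
// Only these experts run for the current token; the rest stay idle.
std::vector<std::size_t> top_k_experts(const std::vector<float>& scores,
                                       std::size_t k) {
    assert(k <= scores.size());
    std::vector<std::size_t> idx(scores.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, 2, ...
    // Sort only the first k positions by descending score.
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(k),
                      idx.end(),
                      [&scores](std::size_t a, std::size_t b) {
                          return scores[a] > scores[b];
                      });
    idx.resize(k);
    return idx;
}
```

With scores {0.1, 0.7, 0.05, 0.9, 0.2} and k = 2, the function returns experts 3 and 1. The weights of the other three experts are never touched for that token, which is what makes loading on demand possible.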
I used an AI agent to help me write the C++ code. I brought the curiosity, and the agent brought the implementation depth.
My inspiration came from a satellite radar paper by Filippo Biondi. He used radar to see inside the Great Pyramid of Giza. Radar cannot penetrate rock, but it can measure the vibrations the rock makes when hit. He measured those vibrations to map the interior.
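I applied this logic to AI memory: instead of holding the whole model in RAM, watch a small signal that tells you which parts are about to be used.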
I call this S-MoE (Seismic Mixture of Experts). It works using three streams:
• The Scout: A lightweight part of the model that runs in RAM. It predicts which experts will activate next.
• The Streamer: An I/O thread that loads those specific expert blocks from your SSD into memory.
• The GPU: Executes the math using the weights that just arrived.
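The three streams can be sketched as a prefetch-one-step-ahead loop. This is a simplified illustration, not the S-MoE source. The Pipeline struct, its member names, and the one-expert-per-step simplification are all hypothetical. It also runs on a single thread for clarity, so the load only appears where a real I/O thread would overlap with compute.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using ExpertId = std::size_t;
using Block = std::vector<float>;  // stand-in for one expert's weights

struct Pipeline {
    std::function<ExpertId(std::size_t)> scout;  // step -> predicted expert
    std::function<Block(ExpertId)> streamer;     // expert -> weights from storage
    std::function<float(const Block&)> compute;  // stand-in for the GPU work

    std::vector<float> run(std::size_t steps) {
        assert(scout && streamer && compute);
        std::vector<float> out;
        if (steps == 0) return out;
        Block current = streamer(scout(0));  // warm-up load for step 0
        for (std::size_t t = 0; t < steps; ++t) {
            Block next;
            if (t + 1 < steps) {
                // In a threaded version this load would run on the I/O
                // thread while compute(current) runs on the GPU.
                next = streamer(scout(t + 1));
            }
            out.push_back(compute(current));
            current = std::move(next);
        }
        return out;
    }
};
```

The point of the schedule is that the weights for step t+1 are requested before step t finishes, so a correct prediction hides the SSD latency behind compute. A wrong prediction would need a fallback load, which this sketch does not model.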
This system uses Direct I/O to bypass the OS page cache. It makes no heap allocations at runtime. It avoids all OS mutexes.
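One common way to hand data between an I/O thread and a compute thread without heap allocations or mutexes is a fixed-size single-producer, single-consumer ring buffer. The sketch below shows that general technique. I am not claiming it is how S-MoE does it, and the class name SpscRing is hypothetical. Direct I/O is platform specific and is not shown.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

// Fixed-capacity queue for exactly one producer and one consumer.
// Slots live inside the object, so the queue itself never allocates
// (assuming copying T does not allocate), and the two sides coordinate
// through atomics instead of a mutex. One slot is kept empty to tell
// "full" from "empty", so usable capacity is N - 1.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least two slots");
public:
    // Producer side only (for example, the I/O thread).
    bool try_push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        slots_[head] = item;
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }
    // Consumer side only (for example, the compute thread).
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = slots_[tail];
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return true;
    }
private:
    std::array<T, N> slots_{};
    std::atomic<std::size_t> head_{0};  // next slot to write
    std::atomic<std::size_t> tail_{0};  // next slot to read
};
```

In a streaming setup, T would typically be a small descriptor, such as an index into a preallocated weight buffer, rather than the weights themselves.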
The result? A 16 GB Mac and a 512 GB Mac will produce the same output from the same 235B model. One is just faster than the other.
The memory wall around AI is a software assumption, not a law of nature. You can run frontier models on the hardware you already own.
S-MoE is open source.
Optional learning community: https://t.me/GyaanSetuAi
