𝟯𝟮𝗕 𝗟𝗟𝗠 𝗼𝗻 𝗮 𝟮𝟬𝟬𝟴 𝗫𝗲𝗼𝗻: 𝗥𝗔𝗠 𝗠𝗮𝘁𝘁𝗲𝗿𝘀 𝗠𝗼𝗿𝗲 𝗧𝗵𝗮𝗻 𝗩𝗥𝗔𝗠
I tried to run a 20 GB model on my work laptop. The laptop has an RTX 4070 and 16 GB of RAM. It failed. The system froze completely.
I decided to test an old 2008 server instead. The server has two Intel Xeon E5440 CPUs and 64 GB of RAM. It has no GPU.
The question was simple: can old hardware with enough memory run a large model that my laptop cannot?
Here is how the hardware compares:
Laptop:
- CPU: Modern notebook processor
- RAM: 16 GB
- GPU: RTX 4070, 8 GB VRAM
- Result: System freeze
Server:
- CPU: 2x Xeon E5440
- RAM: 64 GB
- GPU: None
- Result: It runs
The server is slow. It generates about 0.01 tokens per second, or one token every 100 seconds. I started the test at midnight and checked it in the morning.
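To put that speed in perspective, here is a rough back-of-the-envelope sketch. The 0.01 tokens per second figure is from the test above; the 8-hour run and the 500-token answer are assumed values for illustration, not measurements.

```python
# Back-of-the-envelope timing at a fixed generation rate.
# RATE_TOKENS_PER_SECOND comes from the test above; other numbers are assumptions.

RATE_TOKENS_PER_SECOND = 0.01


def tokens_generated(hours: float, rate: float = RATE_TOKENS_PER_SECOND) -> float:
    """Tokens produced in `hours` of uninterrupted generation."""
    return hours * 3600 * rate


def hours_needed(tokens: int, rate: float = RATE_TOKENS_PER_SECOND) -> float:
    """Hours required to generate `tokens` tokens."""
    return tokens / rate / 3600


if __name__ == "__main__":
    overnight = tokens_generated(8)  # assumed 8-hour overnight run
    answer = hours_needed(500)       # assumed 500-token answer
    print(f"8 hours -> about {overnight:.0f} tokens")
    print(f"500 tokens -> about {answer:.1f} hours")
```

Under those assumptions, an overnight run yields a few hundred tokens, enough for a short program and little else.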
The model tried to write code in Forth. It produced two different versions after several hours. Both versions failed to run.
I learned two things from this:
RAM capacity matters. With 64 GB of system RAM you can run models that a machine with 24 GB of combined RAM and VRAM (16 GB + 8 GB) cannot. However, 0.01 tokens per second is not practical for work.
Large models are not magic. A large model cannot write working code in a niche language like Forth if it was not trained on it. To get working code, you need a better process: algorithms, deterministic transpilers, and better tools.
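The first lesson is mostly arithmetic, and a minimal sketch makes it concrete. The 32B parameter count and the 20 GB model size come from the post. The 5 bits per weight (chosen because 32B parameters at 5 bits is 20 GB) and the 6 GB reserved for the operating system, context cache, and runtime buffers are assumptions, not measured values. Treating the laptop's RAM and VRAM as one 24 GB pool is also optimistic, since a real runtime has to split the model between the two.

```python
# Rough check: do the model weights fit in a machine's memory?
# bits_per_weight and RESERVED_GB are illustrative assumptions.

RESERVED_GB = 6.0  # assumed: OS, context cache, runtime buffers


def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 10**9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(model_gb: float, memory_gb: float, reserved_gb: float = RESERVED_GB) -> bool:
    """True if the weights fit in memory after subtracting the reserve."""
    return model_gb <= memory_gb - reserved_gb


if __name__ == "__main__":
    size = model_size_gb(32, 5)  # 32B parameters, assumed 5 bits each
    machines = {"laptop (16 GB RAM + 8 GB VRAM)": 24, "server (64 GB RAM)": 64}
    for name, memory_gb in machines.items():
        verdict = "fits" if fits(size, memory_gb) else "does not fit"
        print(f"{name}: {size:.0f} GB model {verdict}")
```

The reserve is the number to adjust first: it depends on the operating system, the context length, and the inference runtime, none of which are specified here.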
Do not buy expensive hardware to test an idea. Run your experiments on what you have first. Slow inference is still inference. It gave me the answer I needed without a massive bill.
Source: https://dev.to/ua3mqj/32b-llm-on-a-2008-xeon-when-ram-matters-more-than-vram-28e2