𝟯𝟮𝗕 𝗟𝗟𝗠 𝗼𝗻 𝗮 𝟮𝟬𝟬𝟴 𝗫𝗲𝗼𝗻: 𝗥𝗔𝗠 𝗠𝗮𝘁𝘁𝗲𝗿𝘀 𝗠𝗼𝗿𝗲 𝗧𝗵𝗮𝗻 𝗩𝗥𝗔𝗠

📅3 hours ago⏱1 min read

I tried to run a 20 GB model on my work laptop. The laptop has an RTX 4070 and 16 GB of RAM. It failed. The system froze completely.

I decided to try an old server from 2008 instead. It has two Intel Xeon E5440 CPUs, 64 GB of RAM, and no GPU.

The question was simple: can old hardware with enough memory run a large model that my laptop cannot?

Here is how the hardware compares:

Laptop:

CPU: Modern notebook
RAM: 16 GB
GPU: RTX 4070, 8 GB VRAM
Result: System freeze

Server:

CPU: 2x Xeon E5440
RAM: 64 GB
GPU: None
Result: It runs

The server is slow. It generates about 0.01 tokens per second. I started the test at midnight and checked it in the morning.
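To put that speed in perspective, here is a rough back-of-the-envelope sketch. The 0.01 tokens per second figure is the one reported above; the 8-hour "overnight" window and the 500-token answer length are assumptions picked only for illustration.

```python
def tokens_generated(tokens_per_second: float, hours: float) -> float:
    """Tokens produced at a steady rate over the given number of hours."""
    return tokens_per_second * hours * 3600


def hours_needed(num_tokens: int, tokens_per_second: float) -> float:
    """Hours required to produce num_tokens at a steady rate."""
    return num_tokens / tokens_per_second / 3600


RATE = 0.01  # tokens per second, as measured on the server

# Assumed values, not from the original test: 8 hours overnight, 500-token answer.
print(f"{tokens_generated(RATE, hours=8):.0f} tokens in 8 hours")
print(f"{hours_needed(500, RATE):.1f} hours for a 500-token answer")
```

At this rate a full night yields roughly 288 tokens, and a 500-token answer would take close to 14 hours.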

The model tried to write code in Forth. It produced two different versions after several hours. Both versions failed to run.

I learned two things from this:

RAM capacity matters. 64 GB of system RAM lets you run models that 24 GB of combined VRAM and RAM (8 GB + 16 GB) cannot. However, 0.01 tokens per second is not practical for real work.
Large models are not magic. A large model cannot program in a niche language like Forth if it was not trained on it. To get working code, you need a better process: algorithms, deterministic transpilers, and better tools.
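The memory point can be sketched as a simple budget check. This is only an illustration under stated assumptions: the post does not say which quantization was used, so 5 bits per weight is a guess that happens to match 32B parameters at about 20 GB, and the 6 GB reserved for the operating system, other programs, and inference buffers is a hypothetical figure. Treating VRAM and RAM as one pool is also a simplification.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(model_gb: float, memory_gb: float, reserved_gb: float) -> bool:
    """True if the model fits after reserving memory for everything else."""
    return model_gb <= memory_gb - reserved_gb


# Assumption: 5 bits per weight (the quantization is not stated in the post).
model_gb = model_size_gb(params_billion=32, bits_per_weight=5)

# Assumption: 6 GB reserved for the OS, other programs, and inference buffers.
RESERVED_GB = 6

laptop_gb = 16 + 8  # system RAM + VRAM, treated as one pool for simplicity
server_gb = 64      # system RAM only, no GPU

print(f"model: {model_gb:.0f} GB")
print(f"laptop fits: {fits(model_gb, laptop_gb, RESERVED_GB)}")
print(f"server fits: {fits(model_gb, server_gb, RESERVED_GB)}")
```

Under these assumptions the laptop falls a couple of gigabytes short while the server has plenty of room. That is one possible reading of the freeze, not a diagnosis of it.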

Do not buy expensive hardware to test an idea. Run your experiments on what you have first. Slow inference is still inference. It gave me the answer I needed without a massive bill.

Source: https://dev.to/ua3mqj/32b-llm-on-a-2008-xeon-when-ram-matters-more-than-vram-28e2

