𝗜 𝗙𝗶𝗻𝗲-𝗧𝘂𝗻𝗲𝗱 𝗮 𝟮𝟳𝟬𝗠 𝗠𝗼𝗱𝗲𝗹 𝗼𝗻 𝗠𝘆 𝗟𝗮𝗽𝘁𝗼𝗽

AI-assisted draft.

I am testing three ways to fine-tune models. I use the same task for all three. I scale from the smallest model to the largest.

The series follows this path:

Full Fine-Tuning (270M parameters)
LoRA (1.5B parameters)
QLoRA (7B parameters)

I want to understand the mechanics. I do not want to follow a tutorial blindly.

In this first step, I used full fine-tuning. This method updates every weight in the model. It is the most expensive of the three methods.

I used the Banking77 dataset. It contains about 13,000 customer support messages (13,083 in total). The goal is to classify each message into one of 77 intents, such as lost cards or exchange rates.

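I chose Gemma 3 (270M). This model is small enough to train on a laptop using Apple Silicon. Full fine-tuning needs roughly four times the memory of the weights alone, because training also stores a gradient for every weight plus the optimizer states (two extra values per weight, assuming an Adam-style optimizer). Activations come on top of that.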
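
A rough sketch of that memory arithmetic. The fp32 precision and the Adam-style optimizer are assumptions on my part, not details stated above, and activation memory is left out because it depends on batch size and sequence length.

```python
# Back-of-envelope memory estimate for full fine-tuning.
# Assumptions (not from the post): fp32 values (4 bytes each) and an
# Adam-style optimizer that keeps two moment estimates per weight.
PARAMS = 270_000_000
BYTES_PER_VALUE = 4  # fp32


def training_memory_gb(n_params: int, bytes_per_value: int = BYTES_PER_VALUE) -> dict:
    """Return weight, gradient, optimizer, and total memory in GB (1e9 bytes)."""
    weights = n_params * bytes_per_value
    gradients = weights             # one gradient value per weight
    optimizer_states = 2 * weights  # two moment estimates per weight
    total = weights + gradients + optimizer_states
    return {
        "weights_gb": weights / 1e9,
        "gradients_gb": gradients / 1e9,
        "optimizer_gb": optimizer_states / 1e9,
        "total_gb": total / 1e9,
    }


estimate = training_memory_gb(PARAMS)
for name, value in estimate.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the total is four times the weight memory, which is where the "four times" rule of thumb comes from.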

Instead of adding a classification head, I made the model generate the intent as text. This makes the process identical to instruction tuning. It prepares the project for the next steps.
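
A minimal sketch of what "generate the intent as text" can look like. The prompt template, the helper names, and the four-intent subset are hypothetical illustrations, not the exact format used in this project.

```python
# Hypothetical prompt format: the model reads the query and is trained to
# continue the text with the intent name. Template and helpers are
# illustrative assumptions, not the project's actual code.
INTENTS = [  # a small subset of the 77 intents, for illustration
    "card_arrival",
    "card_delivery_estimate",
    "exchange_rate",
    "lost_or_stolen_card",
]

PROMPT_TEMPLATE = "Classify the banking query into an intent.\nQuery: {query}\nIntent: "


def build_example(query: str, intent: str) -> dict:
    """Turn one labeled message into a prompt/completion pair."""
    return {"prompt": PROMPT_TEMPLATE.format(query=query.strip()), "completion": intent}


def parse_prediction(generated: str, known=INTENTS):
    """Map generated text back to an intent, or None if it is not a known label."""
    lines = generated.strip().splitlines()
    candidate = lines[0].strip().lower() if lines else ""
    return candidate if candidate in known else None


example = build_example("  Where is my card? ", "card_arrival")
print(example["prompt"] + example["completion"])
print(parse_prediction(" Card_Arrival\nextra text"))
```

Because the output is free text, the parser has to handle generations that are not valid labels. Here they map to None and would count as errors.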

A critical step is masking the loss. You must tell the model to ignore the prompt and only grade itself on the label. If you skip this, the model wastes effort learning to repeat your prompt.
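
The masking idea can be sketched as follows. The whitespace tokenizer is a toy stand-in so the example runs without dependencies. The value -100 is the default ignore index of PyTorch's cross-entropy loss, so positions carrying it contribute nothing to the loss.

```python
# Loss masking sketch: prompt positions get IGNORE_INDEX so only the
# label tokens are graded. The tokenizer below is a toy assumption; a
# real run would use the model's own tokenizer.
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss


def toy_tokenize(text: str, vocab: dict) -> list:
    """Split on whitespace and assign each new token the next free id."""
    return [vocab.setdefault(tok, len(vocab)) for tok in text.split()]


def build_masked_example(prompt: str, label: str, vocab: dict):
    """Return (input_ids, labels) with the prompt part of labels masked out."""
    prompt_ids = toy_tokenize(prompt, vocab)
    label_ids = toy_tokenize(label, vocab)
    input_ids = prompt_ids + label_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + label_ids
    return input_ids, labels


vocab = {}
input_ids, labels = build_masked_example(
    "Query: where is my card ? Intent:", "card_arrival", vocab
)
print(input_ids)
print(labels)
```

Many training libraries shift the labels by one position internally for next-token prediction. Check how yours does it before shifting them yourself.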

I used a low learning rate of 5e-5. High learning rates destroy pretrained knowledge during full fine-tuning. A rate of 2e-4 caused the model to fail.

The results:

96% accuracy on common intents.
The model works well on a laptop.
It still confuses card arrival with delivery estimates.
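
One way to surface confusions like this is to count per-intent accuracy and the most frequent wrong pairs. The sketch below uses a tiny made-up set of predictions purely to show the bookkeeping. These are not the results from this run.

```python
from collections import Counter


def per_intent_accuracy(gold: list, pred: list) -> dict:
    """Fraction of correct predictions for each gold intent."""
    total = Counter(gold)
    correct = Counter(g for g, p in zip(gold, pred) if g == p)
    return {intent: correct[intent] / total[intent] for intent in total}


def top_confusions(gold: list, pred: list, k: int = 3) -> list:
    """Most common (gold, predicted) pairs among the errors."""
    errors = Counter((g, p) for g, p in zip(gold, pred) if g != p)
    return errors.most_common(k)


# Made-up toy data for illustration only.
gold = ["card_arrival", "card_arrival", "card_delivery_estimate", "exchange_rate"]
pred = ["card_arrival", "card_delivery_estimate", "card_arrival", "exchange_rate"]

accuracy = per_intent_accuracy(gold, pred)
confusions = top_confusions(gold, pred)
print(accuracy)
print(confusions)
```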

In Part 2, I will use a 1.5B model, roughly five times larger. I will train less than 1% of its weights using LoRA. I will see if I can match this accuracy.

Source: https://dev.to/sumanpro/i-fine-tuned-a-270m-model-on-my-laptop-full-fine-tuning-from-scratch-3p4l

Optional learning community: https://t.me/GyaanSetuAi
