Three Rounds of Training Make a Chatbot

Building a Transformer is not enough. You can pour the whole internet into it and spend millions on compute. You will still end up with a machine that often fails to answer a simple question.

A raw model is just a text mimic. It predicts the next word based on patterns. If you ask it "How do I reset my router?", it might respond with more questions like "How do I change my password?". It does not know you want help. It only knows how the internet continues a sentence.
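Here is a toy sketch of that behavior. It is not how a real model works inside. It assumes a made-up FAQ-style corpus and a simple word-pair counter, and the names corpus, train_bigram, and continue_text are invented for illustration. The point is only that a predictor trained on pages full of questions will continue a question with another question.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for "the internet": a FAQ-style page
# where one question tends to follow another.
corpus = (
    "how do i reset my router ? how do i change my password ? "
    "how do i update my firmware ? how do i reset my password ?"
).split()


def train_bigram(tokens):
    """Count, for each word, which words follow it in the corpus."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def continue_text(counts, prompt, max_new_tokens=6):
    """Greedily append the most frequent next word, one word at a time."""
    out = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word was never seen with a follower
        out.append(followers.most_common(1)[0][0])
    return " ".join(out)


model = train_bigram(corpus)
print(continue_text(model, "how do i reset my router ?"))
# prints: how do i reset my router ? how do i reset my password
```

The counter never tries to help. It only reports what usually comes next in the text it saw.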

To turn this predictor into a chatbot, you need three rounds of training.

  1. Pretraining (The Engine): You show the model trillions of words. At each position you hide the next word and make it guess. This builds the knowledge. It picks up facts, grammar, and reasoning patterns. This works because the data labels itself: the word that actually comes next is the correct answer. Scale makes this predictable. More data and more compute lead to better results.

  2. Instruction Tuning (The Script): The base model knows a great deal but has no goal. In this round, you show it a comparatively small set of examples, often a few thousand, each a prompt paired with a good human-written response. This adds little new knowledge. It teaches the model a new behavior. You are handing the actor a script. It learns to act like a helpful assistant instead of just a text completer.

  3. Preference Tuning (The Manners): Scripts are limited. You cannot write a rule for every situation. In this round, you take two different answers to the same prompt and let a human pick the better one. The model learns to chase a high score that stands in for human taste. This gives the model its tone, its politeness, and its safety limits.
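The training signal in each of the three rounds can be sketched in a few lines. This is a minimal illustration, not a real training loop. The function names are invented. Counting the loss only on response tokens in round 2 and using a pairwise logistic (Bradley-Terry style) loss in round 3 are common choices assumed here, not details stated above.

```python
import math


# Round 1, pretraining: the text labels itself. Every prefix is an input
# and the token that follows it is the target.
def next_token_pairs(tokens):
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Round 2, instruction tuning: the same next-token objective, applied to
# prompt/response pairs. Assumption: only response tokens count toward
# the loss, so the model is graded on the answer, not on the prompt.
def masked_mean_loss(token_losses, is_response):
    kept = [loss for loss, keep in zip(token_losses, is_response) if keep]
    return sum(kept) / len(kept)


# Round 3, preference tuning: a human picked one answer over another.
# Assumption: a pairwise logistic loss on two scalar scores, which is
# small when the chosen answer scores higher than the rejected one.
def preference_loss(score_chosen, score_rejected):
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


print(next_token_pairs(["the", "cat", "sat"]))
print(masked_mean_loss([2.0, 1.0, 0.5, 0.5], [False, False, True, True]))
print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))
```

Rounds 1 and 2 share one objective and differ only in the data and in which tokens are graded. Round 3 is the only one that compares two answers against each other.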

The summary is simple:

  • Pretraining builds the knowledge.
  • Instruction tuning picks the assistant out of the crowd.
  • Preference tuning adds the judgment and manners.

The personality you see in a chat window is just a thin layer on top of a raw word predictor. We did not need a theory of intelligence to build this. We needed a simple goal, scale, and two rounds of coaching.

Source: https://dev.to/karthi_raman_02ec8161bda0/three-rounds-of-training-turn-a-word-predictor-into-a-chatbot-none-of-them-are-magic-395i
