Stop Guessing LLM Sampling Parameters


You pick temperature 0.7 because a blog says so. Your bot starts talking nonsense. You try top-p 0.9. Then top-k 50. You are guessing.

Most teams lack a test set. They use defaults tuned for general chat. Your use case is different.

Learn these four knobs:

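Temperature: Controls randomness. Lower values make the output more predictable. Use 0 for facts. Use 0.7 for chat. Use 1.0 for ideas.
Top-p: Keeps the smallest set of most likely tokens whose cumulative probability reaches p. The set shrinks when the model is confident and grows when it is not.
Top-k: Keeps only the k most likely tokens, no matter how confident the model is. It is too rigid for most work.
Min-p: Keeps tokens whose probability is at least a set fraction of the top choice's probability. It is a great safety net for production.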
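
To make the four knobs concrete, here is a minimal NumPy sketch of each filter applied to a toy probability vector. The function names (softmax, top_k_filter, top_p_filter, min_p_filter) are made up for illustration and are not any library's API. Real inference engines may order, combine, or tie-break these steps differently.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities. Lower temperature sharpens the distribution.
    Temperature must be > 0 here; temperature 0 (greedy) is just argmax."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def _renormalize(probs, keep_mask):
    """Zero out dropped tokens and rescale the rest to sum to 1."""
    out = np.where(keep_mask, probs, 0.0)
    return out / out.sum()

def top_k_filter(probs, k):
    """Keep the k most likely tokens (fixed count)."""
    probs = np.asarray(probs, dtype=float)
    mask = np.zeros(probs.shape, dtype=bool)
    mask[np.argsort(probs)[::-1][:k]] = True
    return _renormalize(probs, mask)

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    probs = np.asarray(probs, dtype=float)
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    # Index of the first token at which the running total reaches p, inclusive.
    cutoff = int(np.searchsorted(cumulative, p)) + 1
    mask = np.zeros(probs.shape, dtype=bool)
    mask[order[:cutoff]] = True
    return _renormalize(probs, mask)

def min_p_filter(probs, min_p):
    """Keep tokens with probability >= min_p times the top token's probability."""
    probs = np.asarray(probs, dtype=float)
    return _renormalize(probs, probs >= min_p * probs.max())

# Toy distributions (invented numbers): one uncertain, one confident.
uncertain = np.array([0.5, 0.3, 0.15, 0.05])
confident = np.array([0.97, 0.01, 0.01, 0.01])

for name, dist in [("uncertain", uncertain), ("confident", confident)]:
    kept_k = np.count_nonzero(top_k_filter(dist, 3))
    kept_p = np.count_nonzero(top_p_filter(dist, 0.9))
    kept_m = np.count_nonzero(min_p_filter(dist, 0.1))
    print(name, "top-k:", kept_k, "top-p:", kept_p, "min-p:", kept_m)
```

On the confident distribution, top-k with k=3 still keeps three tokens, while top-p and min-p cut down to the single dominant token. That is the rigidity described above.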

Production Recipes:

Avoid these mistakes:

If greedy decoding (temperature 0) fails, the prompt or the model is the problem. Sampling parameters will not fix either one.
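
One way to act on this is to score a small test set under greedy decoding before touching any other knob. The sketch below assumes a hypothetical generate(prompt, temperature) callable that wraps whatever model client you use. Both greedy_pass_rate and fake_generate are illustrative names, and the stub stands in for a real model call.

```python
from typing import Callable, List, Tuple

# A test case is a prompt plus a check that returns True when the output is acceptable.
TestCase = Tuple[str, Callable[[str], bool]]

def greedy_pass_rate(generate: Callable[[str, float], str],
                     cases: List[TestCase]) -> float:
    """Fraction of test cases that pass at temperature 0 (greedy decoding)."""
    if not cases:
        return 0.0
    passed = sum(1 for prompt, check in cases if check(generate(prompt, 0.0)))
    return passed / len(cases)

def fake_generate(prompt: str, temperature: float) -> str:
    """Stand-in for a real model call (hypothetical). Replace with your own client."""
    canned = {"Capital of France?": "Paris", "2 + 2 = ?": "5"}
    return canned.get(prompt, "")

cases: List[TestCase] = [
    ("Capital of France?", lambda out: "Paris" in out),
    ("2 + 2 = ?", lambda out: "4" in out),
]

rate = greedy_pass_rate(fake_generate, cases)
print(f"greedy pass rate: {rate:.0%}")
```

If the greedy pass rate is low, fix the prompt or change the model first. Tune the sampling parameters only once that baseline is solid.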
