5 Prompt Frameworks for Medical AI
Long prompts can cause AI mistakes in medical testing.
I tested how different prompt styles affect AI accuracy in classifying genetic variants. I used 27 tests to ensure the results were reliable.
Here is what I found.
The Testing Styles
- Verbose: Detailed instructions. Tells the AI to act as an expert and list every clinical rule.
- Concise: Short and direct. Tells the AI to classify the variant and stop.
- Structured: Uses a JSON-like format with specific fields like Gene and Variant.
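As a rough illustration, the three styles could be generated like this. The wording of each template, the function name, and the placeholder gene and variant are assumptions for the sketch, not the exact prompts used in the tests.

```python
import json


def build_prompts(gene: str, variant: str) -> dict[str, str]:
    """Return one illustrative prompt per style (hypothetical wording)."""
    verbose = (
        "You are an expert clinical geneticist. Review every applicable "
        "clinical rule, list each one you consider, and then classify "
        f"the variant {variant} in the gene {gene}."
    )
    concise = f"Classify the variant {variant} in {gene}. Give one label and stop."
    # JSON-like layout with explicit fields such as Gene and Variant.
    structured = json.dumps(
        {"Gene": gene, "Variant": variant, "Task": "Classify"}, indent=2
    )
    return {"verbose": verbose, "concise": concise, "structured": structured}


# Placeholder inputs, not a real test case.
prompts = build_prompts("GENE1", "c.123A>G")
for style, text in prompts.items():
    print(f"--- {style} ---\n{text}\n")
```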
The Results
The Verbose style had the lowest accuracy at 48.1%. The Concise style had the highest accuracy at 81.5%.
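A quick sanity check on those figures: with 27 tests, 48.1% and 81.5% match 13 and 22 correct classifications. The counts are inferred from the rounded percentages, not reported numbers.

```python
TOTAL_TESTS = 27  # number of tests stated in the post


def accuracy_pct(correct: int, total: int = TOTAL_TESTS) -> float:
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Assumed counts, back-calculated from the reported percentages.
verbose_pct = accuracy_pct(13)
concise_pct = accuracy_pct(22)
print(verbose_pct, concise_pct)  # 48.1 81.5
```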
Why did Verbose fail?
When you tell an AI to look for specific disease markers, you bias it. In one test, the AI saw a common benign variant. Because the prompt forced it to look for disease rules, the AI ignored the frequency data. It tried too hard to find a problem that was not there.
The Concise style worked better because it did not force a bias. It allowed the AI to evaluate all data equally.
The Thinking Token Tax
Adding more words does not make the AI think harder.
In my tests, moving from a medium task to a complex task made the prompt 5 times longer. However, the AI's actual reasoning tokens grew only 1.6 times.
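The arithmetic behind that gap, as a small sketch. It assumes both growth factors are measured in comparable units (for example, tokens), which the numbers above do not specify.

```python
prompt_growth = 5.0     # prompt became 5x longer
reasoning_growth = 1.6  # reasoning tokens became 1.6x larger

# Reasoning produced per unit of prompt, relative to the medium task.
relative_effort = reasoning_growth / prompt_growth
print(f"Reasoning per unit of prompt: {relative_effort:.0%} of the medium-task level")
```

Under that assumption, each extra unit of prompt buys roughly a third as much reasoning as before.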
If you want better reasoning, do not just write more. Instead, ask for "Step-by-step evaluation" within a structured format.
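One way that request might look in practice is sketched here. The field names other than Gene and Variant, the helper name, and the placeholder values are all hypothetical.

```python
import json


def structured_prompt(gene: str, variant: str, evidence: dict) -> str:
    """Build a JSON-like prompt that asks for step-by-step evaluation."""
    payload = {
        "Gene": gene,
        "Variant": variant,
        "Evidence": evidence,
        # The reasoning request lives in a short field instead of extra prose.
        "Instruction": "Step-by-step evaluation",
        "OutputFields": ["Steps", "Classification"],
    }
    return json.dumps(payload, indent=2)


# Placeholder inputs for illustration only.
prompt = structured_prompt(
    "GENE1", "c.123A>G", {"population_frequency": "placeholder"}
)
print(prompt)
```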
Key Takeaways
- Do not lead with disease rules. This causes the AI to miss benign results.
- Short prompts often beat long prompts for quality.
- Use structured formats for stability in large data batches.
- Always run tests multiple times. A single test might just be a lucky cache hit.
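The last point can be sketched as a small helper that repeats a call and reports how often the runs agree. The `classify` callable stands in for whatever model call you use, and the canned outputs below are made up for the demo, not real results.

```python
from collections import Counter
from typing import Callable


def repeated_classification(
    classify: Callable[[str], str], prompt: str, runs: int = 5
) -> tuple[str, float]:
    """Run the same prompt several times; return majority label and agreement."""
    labels = [classify(prompt) for _ in range(runs)]
    label, count = Counter(labels).most_common(1)[0]
    return label, count / runs


# Stand-in for a real model call: replays canned labels (illustrative only).
_canned = iter(["benign", "benign", "pathogenic", "benign", "benign"])


def fake_classify(prompt: str) -> str:
    return next(_canned)


majority, agreement = repeated_classification(
    fake_classify, "Classify the variant.", runs=5
)
print(majority, agreement)  # benign 0.8
```

A low agreement rate is a sign that a single run would not have been trustworthy.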
Optional learning community: https://t.me/GyaanSetuAi