Effort Levels in Practice: My Benchmark Results

AI-assisted draft.

GyaanSetu Editorial | 21 hours ago | 2 min read

Claude models offer five effort levels: low, medium, high, xhigh, and max.

Most people assume higher effort always costs more. I tested this. I ran three real tasks across all five levels. I measured tokens, latency, and quality.

The results changed how I use these models.

What effort actually does:

It controls token spend by changing how much the model thinks and acts. Low effort means fewer tool calls and shorter answers. High effort means more exploration before the model answers.

The tasks I tested:

Classification: Labeling contract findings.
Code generation: Writing TypeScript functions.
Multi-step audit: Analyzing a contract for vulnerabilities.

Here is what I found:

Classification: Quality stayed the same at every level. Max effort used 8x more tokens than low effort for the same answer. My rule: Use low effort for simple, scoped tasks. High effort is a waste here.

Code generation: Quality improved from low to high. After high, quality plateaued. Xhigh and max produced the same code as high but cost more. My rule: Use high effort for single-shot code or content.

Multi-step audits: This result surprised me. Higher effort did not always mean higher cost. For this task, xhigh used fewer tokens than medium.

At medium effort, the model explored less per step. It took more turns and hit dead ends. At xhigh, the model planned better. It finished in fewer turns. Better planning led to lower total cost and higher quality.
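
A minimal sketch of the arithmetic behind that result, assuming total cost is roughly turns multiplied by tokens per turn. The numbers are invented for illustration and are not my measurements; `LoopProfile` and `totalTokens` are hypothetical names.

```typescript
// Hypothetical profile of one agentic run. The values below are made up
// to illustrate the trade-off; they are not benchmark data.
interface LoopProfile {
  turns: number;         // how many model turns the loop needed
  tokensPerTurn: number; // average tokens spent per turn
}

// Rough total cost: every turn pays its own token bill.
function totalTokens(p: LoopProfile): number {
  return p.turns * p.tokensPerTurn;
}

// Cheaper per turn, but more turns and dead ends.
const mediumLike: LoopProfile = { turns: 12, tokensPerTurn: 1500 };
// More expensive per turn, but better planning and fewer turns.
const xhighLike: LoopProfile = { turns: 5, tokensPerTurn: 3000 };

console.log(totalTokens(mediumLike)); // 18000
console.log(totalTokens(xhighLike));  // 15000
```

In this made-up case the higher-effort run costs twice as much per turn and still comes out cheaper overall, because it needs fewer than half the turns.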

My new strategy:

Classification, routing, extraction: Use low effort.
Single-shot code or content: Use high effort.
Agentic loops, multi-step audits: Use xhigh effort.
Maximum correctness required: Use max effort.
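
The strategy above could be encoded as a small lookup so the choice is made once and not per call. This is a sketch, not a library API: `TaskKind`, `DEFAULT_EFFORT`, and `pickEffort` are hypothetical names, and the task categories are just the ones listed above.

```typescript
type Effort = "low" | "medium" | "high" | "xhigh" | "max";

// Hypothetical task categories, taken from the list above.
type TaskKind =
  | "classification"
  | "routing"
  | "extraction"
  | "single-shot-code"
  | "single-shot-content"
  | "agentic-loop"
  | "multi-step-audit";

// Default effort per task kind, mirroring the strategy list.
const DEFAULT_EFFORT: Record<TaskKind, Effort> = {
  classification: "low",
  routing: "low",
  extraction: "low",
  "single-shot-code": "high",
  "single-shot-content": "high",
  "agentic-loop": "xhigh",
  "multi-step-audit": "xhigh",
};

// When maximum correctness is required, override the default with "max".
function pickEffort(kind: TaskKind, maxCorrectness = false): Effort {
  return maxCorrectness ? "max" : DEFAULT_EFFORT[kind];
}

console.log(pickEffort("classification"));         // low
console.log(pickEffort("multi-step-audit"));       // xhigh
console.log(pickEffort("single-shot-code", true)); // max
```

Treat the table as a starting point and adjust it after measuring your own tasks.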

Stop guessing your settings. Pick three tasks you do often. Run them through all five levels. Measure the tokens and the quality.
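
A rough sketch of such a measurement loop. It deliberately does not call any real API: `Runner` and `Scorer` are hypothetical hooks you would implement yourself (one sends the prompt at a given effort level, one grades the output), and `fakeRun` is a stand-in so the sketch runs on its own.

```typescript
const EFFORTS = ["low", "medium", "high", "xhigh", "max"] as const;
type Effort = (typeof EFFORTS)[number];

// Hypothetical result shape: reduce whatever your client returns to this.
interface RunResult {
  output: string;
  totalTokens: number;
}

// Hypothetical hooks you supply: one calls the model, one grades the answer.
type Runner = (prompt: string, effort: Effort) => Promise<RunResult>;
type Scorer = (task: string, output: string) => number;

interface Row {
  task: string;
  effort: Effort;
  tokens: number;
  latencyMs: number;
  quality: number;
}

// Run every task at every effort level and record tokens, latency, quality.
async function benchmark(
  tasks: Record<string, string>,
  run: Runner,
  score: Scorer,
): Promise<Row[]> {
  const rows: Row[] = [];
  for (const [task, prompt] of Object.entries(tasks)) {
    for (const effort of EFFORTS) {
      const started = Date.now();
      const result = await run(prompt, effort);
      rows.push({
        task,
        effort,
        tokens: result.totalTokens,
        latencyMs: Date.now() - started,
        quality: score(task, result.output),
      });
    }
  }
  return rows;
}

// Stand-in runner with fabricated token counts; replace with a real model call.
const fakeRun: Runner = async (prompt, effort) => ({
  output: `answer to "${prompt}" at ${effort}`,
  totalTokens: prompt.length * (EFFORTS.indexOf(effort) + 1),
});

benchmark({ classify: "Label this finding" }, fakeRun, () => 1).then((rows) =>
  console.table(rows),
);
```

Keeping the runner and scorer as injected functions means the same loop works for any client library and any grading method, whether that is an exact-match check or a manual rubric.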

Testing takes one afternoon. Using the wrong setting costs you money every day.

Source: https://dev.to/pavelespitia/effort-levels-in-practice-i-benchmarked-low-through-max-on-real-tasks-7lf

