Microsoft's SkillOpt Boosts GPT-5.5 Performance via Markdown Optimization
Microsoft and researchers from three Chinese universities have unveiled SkillOpt, a method that treats instructional Markdown files as trainable parameters. By optimizing these "skill" documents, the researchers report an average gain of roughly 23 points for GPT-5.5 on procedural tasks.
Treating Text as Trainable Weights
In the current AI landscape, "skills" (modular instructions that guide agents through specific procedures, tool-use rules, and output formats) are becoming an industry standard. Companies like Anthropic use them to enhance Claude, but such documents are traditionally written by humans or generated in a single pass by an LLM. Neither approach iterates on measured feedback, so neither functions as a true optimizer.
SkillOpt instead treats a Markdown file as external, trainable state for a frozen target model. Rather than updating the model's weights, a second "optimizer" language model analyzes execution logs to identify recurring errors and successes, then proposes targeted edits to the Markdown document: adding, deleting, or replacing specific passages. Crucially, an edit is accepted only if it yields a measurable improvement on a held-out validation set.
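The accept-or-reject rule described above can be illustrated with a small sketch. This is not the researchers' implementation: the `Edit` schema, `apply_edit`, `gated_step`, and the toy scoring function are all hypothetical names chosen for illustration. In a real setup the validator would presumably run the frozen target model on held-out tasks; here it only checks for two phrases so the example stays self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Edit:
    """One text-level edit to a skill document (hypothetical schema)."""
    op: str           # "add", "delete", or "replace"
    target: str = ""  # existing passage to delete or replace
    text: str = ""    # new passage for "add" or "replace"


def apply_edit(skill: str, edit: Edit) -> str:
    """Return a new skill string; a missing target leaves the skill unchanged."""
    if edit.op == "add":
        return skill.rstrip("\n") + "\n" + edit.text + "\n"
    if edit.op not in ("delete", "replace"):
        raise ValueError(f"unknown op: {edit.op}")
    if edit.target not in skill:
        return skill
    replacement = edit.text if edit.op == "replace" else ""
    return skill.replace(edit.target, replacement, 1)


def gated_step(
    skill: str,
    edits: List[Edit],
    validate: Callable[[str], float],
) -> Tuple[str, List[Edit], List[Edit]]:
    """Keep an edit only if it strictly improves the validation score."""
    best = validate(skill)
    accepted: List[Edit] = []
    rejected: List[Edit] = []
    for edit in edits:
        candidate = apply_edit(skill, edit)
        score = validate(candidate)
        if score > best:
            skill, best = candidate, score
            accepted.append(edit)
        else:
            rejected.append(edit)
    return skill, accepted, rejected


def toy_validate(skill: str) -> float:
    """Stand-in for held-out evaluation (assumption): fraction of key phrases present."""
    required = ["check worksheet structure", "write evaluated values"]
    return sum(phrase in skill for phrase in required) / len(required)


skill = "# Spreadsheet skill\n- Use formulas everywhere.\n"
proposals = [
    Edit("add", text="- First, check worksheet structure."),
    Edit("replace", target="- Use formulas everywhere.",
         text="- Prefer to write evaluated values."),
    Edit("add", text="- Be polite."),  # does not raise the score, so it is rejected
]
new_skill, accepted, rejected = gated_step(skill, proposals, toy_validate)
print(new_skill)
print(len(accepted), "accepted;", len(rejected), "rejected")
```

The strict `>` comparison is a design choice in this sketch: edits that leave the score unchanged are discarded, which keeps the document from growing without evidence of benefit.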
Deep Learning Concepts Applied to Prose
SkillOpt maps familiar deep learning mechanics onto text-level optimization. The researchers implemented several control mechanisms to keep training stable:
- Learning Rate and Schedulers: A learning rate caps the number of edits allowed per step, while a scheduler shrinks the edit size across training epochs to prevent volatility.
- Negative Feedback Buffers: Rejected edits are stored in a buffer, serving as negative examples that prevent the optimizer from repeating the same mistakes.
- Gradient Smoothing: A "slow update" mechanism at the end of each epoch preserves stable edit directions, mimicking how gradient smoothing stabilizes traditional neural network training.
Because optimization is separated from execution, the heavy lifting happens during training. At inference time, the target model is unchanged and simply receives a compact Markdown file of 300 to 2,000 tokens as context.
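Two of the control mechanisms listed above, the decaying edit cap and the buffer of rejected edits, can be sketched in a few lines. Everything here is an assumption made for illustration: the exponential decay schedule, the buffer capacity, and the names `edit_budget`, `RejectedBuffer`, and `select_edits` do not come from the SkillOpt paper, and edits are reduced to plain strings. The slow-update mechanism is left out because the article does not describe it in enough detail to sketch faithfully.

```python
from collections import deque
from typing import List


def edit_budget(base: int, epoch: int, decay: float = 0.5, floor: int = 1) -> int:
    """Maximum edits per step at a given epoch (assumed exponential decay)."""
    return max(floor, int(base * decay ** epoch))


class RejectedBuffer:
    """Bounded memory of rejected edits, used as negative examples."""

    def __init__(self, capacity: int = 32) -> None:
        # The oldest entries fall out once capacity is reached.
        self._items: deque = deque(maxlen=capacity)

    def add(self, edit: str) -> None:
        if edit not in self._items:
            self._items.append(edit)

    def __contains__(self, edit: str) -> bool:
        return edit in self._items

    def as_prompt_block(self) -> str:
        """Render the buffer as a bullet list that could be shown to the optimizer model."""
        return "\n".join(f"- {item}" for item in self._items)


def select_edits(proposals: List[str], buffer: RejectedBuffer, budget: int) -> List[str]:
    """Drop previously rejected proposals, then truncate to the current budget."""
    fresh = [p for p in proposals if p not in buffer]
    return fresh[:budget]


buffer = RejectedBuffer()
buffer.add("Always use formulas")

proposals = [
    "Always use formulas",      # already rejected once, so it is filtered out
    "Check sheet names first",
    "Write values directly",
    "Log visited cells",
]
budgets = [edit_budget(base=4, epoch=e) for e in range(4)]
chosen = select_edits(proposals, buffer, budget=budgets[1])
print(budgets)
print(chosen)
```

With these assumed settings the budget shrinks from four edits per step to one, so later epochs can only make small adjustments to the document.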
Benchmark Results and Cross-Model Transferability
Across six benchmarks, covering areas such as search, math, spreadsheets, and embodied action, SkillOpt consistently outperformed handwritten skills and specialized methods like TextGrad and EvoSkill. On GPT-5.5 in direct chat, the method yielded an average performance increase of approximately 23 points.
A notable finding is the method's transferability. A skill optimized for a large model like GPT-5.5 can be applied to much smaller models, such as Qwen3.5-4B, supplying procedural knowledge they lack in their own weights. Skills also carry across agent environments: a spreadsheet skill trained in a Codex loop works in Claude Code without retraining.
For example, in spreadsheet tasks, the optimized skill learns to check worksheet structures first and write evaluated values directly rather than relying on formulas. In embodied AI tasks like ALFWorld, the skill learns to maintain a log of visited locations to ensure objectives are met in the correct order.
Key Takeaways
- Text-Based Optimization: SkillOpt treats Markdown instruction files as trainable states, using a second LLM to optimize them much like model weights.
- Large Performance Gains: The method boosted GPT-5.5 by an average of about 23 points on procedural benchmarks, with particular strength in tool-use and strict formatting tasks.
- Efficient and Transferable: Optimized skills are compact (300 to 2,000 tokens) and can be transferred from large models to smaller ones or between different agent environments.