Microsoft's SkillOpt Boosts GPT 5.5 Performance via Markdown Optimization

📅2 hours ago⏱3 min read

In this article

Microsoft's SkillOpt Boosts GPT-5.5 Performance via Markdown Optimization

Microsoft and researchers from three Chinese universities have unveiled SkillOpt, a groundbreaking method that treats instructional Markdown files as trainable parameters. By optimizing these "skill" documents, the researchers achieved a massive 23-point performance jump for GPT-5.5 on procedural tasks.

Treating Text as Trainable Weights

In the current AI landscape, "skills"—modular instructions that guide agents through specific procedures, tool-use rules, and output formats—are becoming industry standards. While companies like Anthropic use these to enhance Claude, these documents are traditionally written by humans or generated in a single pass by an LLM. Neither method functions as a true optimizer.

SkillOpt changes this paradigm by treating a Markdown file as an external, trainable state for a frozen target model. Instead of updating the model's weights, a second "optimizer" language model analyzes execution logs to identify recurring errors and successes. This optimizer proposes surgical edits—adding, deleting, or replacing specific passages—within a Markdown document. Crucially, these changes are only accepted if they yield measurable improvements on a held-out validation set.

Deep Learning Concepts Applied to Prose

The brilliance of SkillOpt lies in how it maps traditional deep learning mechanics onto text-level optimization. The researchers implemented several sophisticated control mechanisms to ensure stability:

Learning Rate and Schedulers: A learning rate caps the number of edits allowed per step, while a scheduler shrinks the edit size across training epochs to prevent volatility.
Negative Feedback Buffers: Rejected edits are stored in a buffer, serving as negative examples that prevent the optimizer from repeating the same mistakes.
Gradient Smoothing: A "slow update" mechanism at the end of each epoch preserves stable edit directions, mimicking how gradient smoothing stabilizes traditional neural network training.

This separation of concerns means the heavy lifting happens during training. At inference time, the target model remains lightweight, simply receiving a compact Markdown file of 300 to 2,000 tokens as context.

Benchmark Dominance and Cross-Model Transferability

The empirical results are significant. Testing across six benchmarks—including search, math, spreadsheets, and embodied action—SkillOpt consistently outperformed handwritten skills and specialized methods like TextGrad and EvoSkill. On GPT-5.5 in direct chat, the method yielded an average performance increase of approximately 23 points.

One of the most impactful findings is the method's transferability. A skill optimized for a large model like GPT-5.5 can be applied to much smaller models, such as Qwen3.5-4B, effectively providing them with procedural knowledge they lack in their native weights. Furthermore, skills are environment-agnostic; a spreadsheet skill trained in a Codex loop works seamlessly in Claude Code without retraining.

For example, in spreadsheet tasks, the optimized skill learns to check worksheet structures first and write evaluated values directly rather than relying on formulas. In embodied AI tasks like ALFWorld, the skill learns to maintain a log of visited locations to ensure objectives are met in the correct order.

Key Takeaways

Text-Based Optimization: SkillOpt treats Markdown instruction files as trainable states, using a second LLM to optimize them much like model weights.
Massive Performance Gains: The method boosted GPT-5.5 by an average of 23 points on procedural benchmarks, specifically excelling in tool-use and strict formatting tasks.
Efficient and Transferable: Optimized skills are compact (under 2,000 tokens) and can be transferred from large models to smaller ones or between different agent environments.

Microsoft's SkillOpt Boosts GPT 5.5 Performance via Markdown Optimization

Microsoft's SkillOpt Boosts GPT-5.5 Performance via Markdown Optimization

Treating Text as Trainable Weights

Deep Learning Concepts Applied to Prose

Benchmark Dominance and Cross-Model Transferability

Key Takeaways

Continue reading

𝗧𝗮𝗺𝗶𝗻𝗴 𝗟𝗼𝗻𝗴 𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗟𝗟𝗠𝘀

𝗦𝘁𝗼𝗽 𝗕𝗹𝗮𝗺𝗶𝗻𝗴 𝘁𝗵𝗲 𝗔𝗜 𝗠𝗼𝗱𝗲𝗹

𝗦𝘁𝗼𝗽 𝗨𝘀𝗶𝗻𝗴 𝗠𝗮𝗿𝗸𝗱𝗼𝘄𝗻 𝗙𝗼𝗿 𝗔𝗜 𝗦𝗽𝗲𝗰𝘀

𝗖𝗵𝘂𝗻𝗸𝗶𝗻𝗴 𝗦𝘁𝗿𝗮𝘁𝗲𝗴𝗶𝗲𝘀 𝗳𝗼𝗿 𝗥𝗔𝗚

𝟯 𝗪𝗮𝘆𝘀 𝘁𝗼 𝗦𝗵𝗮𝗿𝗲 𝗔𝗜 𝗢𝘂𝘁𝗽𝘂𝘁 𝗮𝘀 𝗮 𝗪𝗲𝗯 𝗣𝗮𝗴𝗲