The OpenAI API everyone copied isn't the one they recommend

Translated for your language. Read the original.

AI-assisted draft.

Most local model tools like Ollama, vLLM, and LM Studio use an "OpenAI-compatible" badge.

But there is a problem. Most people think this means one specific thing. In reality, there are two different formats. One is the industry standard. The other is what OpenAI actually wants you to use.

Here is the breakdown.

The Old Standard: Chat Completions API This is the format everyone copied. It uses a list of messages with roles like developer, user, and assistant.

It has two main issues:

It is stateless. You must resend the entire conversation history every single time.
It is heavy. For complex agents, sending huge transcripts becomes slow and expensive.

The New Standard: Responses API OpenAI introduced this in March 2025. It is designed for agents, not just simple chatbots.

Why it is better:

It is stateful. The server remembers the conversation. You do not need to resend everything.
It handles reasoning better. It keeps the model's "chain of thought" on the server.
It uses a cleaner structure. It separates instructions from the actual user input.

The Confusion When a tool says it is "OpenAI-compatible," it almost always means it supports the old Chat Completions format.

The industry built a massive ecosystem around this old format. Because it was everywhere, it became the default. This created a risk where everyone was building clones of a single company's private API.

The Solution: Open Responses To fix this, OpenAI and partners like Hugging Face and Vercel launched the Open Responses specification.

Instead of guessing how an API works, developers now have a documented, testable standard. This allows you to switch between OpenAI and local models with minimal code changes.

What you should do:

If you are building a new project, use the Responses API.
If you are maintaining old apps, Chat Completions will stay supported for a long time.
Always check if your tool supports the new stateful format to save on costs and latency.

Knowing the difference prevents errors in token counting and message structures.

Source: https://dev.to/rlnorthcutt/the-openai-api-everyone-copied-isnt-the-one-openai-recommends-28o8

Optional learning community: https://t.me/GyaanSetuAi

The OpenAI API everyone copied isn't the one they recommend

Continue reading

איך OpenAI ו-Anthropic מעצבות מערכות AI

מודלי שפה קטנים בשנת 2026: מתי כדאי לוותר על ה-API הגדול