𝗙𝗶𝘅𝗶𝗻𝗴 𝗝𝗦𝗢𝗡 𝗢𝘂𝘁𝗽𝘂𝘁 𝗳𝗿𝗼𝗺 𝗚𝗣𝗧


I spent three days fixing broken JSON from GPT-4. Prompting did not work. I asked for valid JSON. It still broke in production.

I built a tool for meeting notes. I wanted action items and dates. The model gave me "next Thursday" instead of a real date. It added extra fields. It was chaos.

I tried better prompts. I tried Python parsers. I tried try-except blocks. These were patches on a broken pipe.

The solution is constrained decoding. This forces the model to follow a schema during generation. It does not repair the output after the fact. It restricts which tokens the model can emit at each step.

I used a Python library called Outlines. It takes Pydantic models as the schema definition.
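A minimal sketch of what such a schema could look like for the meeting-notes case, assuming Pydantic v2. The model name ActionItem and its fields are hypothetical, not the original tool's schema. How the model is handed to Outlines depends on the library version, so that call is left out here; the sketch only shows the schema rejecting the two failures described above.

```python
from datetime import date

from pydantic import BaseModel, ConfigDict, ValidationError


class ActionItem(BaseModel):
    """Hypothetical flat schema for one meeting action item."""

    # Reject any field the schema does not declare.
    model_config = ConfigDict(extra="forbid")

    task: str
    owner: str
    due: date  # must parse as a calendar date, e.g. "2024-05-16"


def is_accepted(raw_json: str) -> bool:
    """Return True if raw_json validates against ActionItem."""
    try:
        ActionItem.model_validate_json(raw_json)
        return True
    except ValidationError:
        return False


# A well-formed item passes.
print(is_accepted('{"task": "Send recap", "owner": "Ana", "due": "2024-05-16"}'))
# A relative date such as "next Thursday" is rejected.
print(is_accepted('{"task": "Send recap", "owner": "Ana", "due": "next Thursday"}'))
# An undeclared extra field is rejected.
print(is_accepted('{"task": "Send recap", "owner": "Ana", "due": "2024-05-16", "mood": "ok"}'))

# The JSON Schema a constrained decoder would be given.
print(ActionItem.model_json_schema()["required"])
```

Typing the due field as a date, and forbidding extra fields, moves both failure modes into the schema itself instead of into cleanup code.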

Outlines works by using a finite state machine. At each generation step, it masks the tokens that would break the schema.
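The masking idea can be shown with a toy example. This is a sketch under heavy assumptions, not how Outlines is implemented: the nine-token vocabulary, the hand-written transition table, and the fake_model_scores function are all invented for illustration, and the "model" is rigged to prefer tokens that break the schema.

```python
import json

# Toy vocabulary. Real tokenizers have tens of thousands of entries.
VOCAB = ["{", "}", '"done"', '"extra"', ":", ",", "true", "false", "maybe"]

# Finite state machine for the one-field schema {"done": <bool>}.
# Maps state -> {allowed token: next state}.
TRANSITIONS = {
    0: {"{": 1},
    1: {'"done"': 2},
    2: {":": 3},
    3: {"true": 4, "false": 4},
    4: {"}": 5},
}
ACCEPT_STATE = 5


def fake_model_scores(prefix):
    """Stand-in for an LLM: one score per vocabulary token.

    It ignores the prefix and deliberately favors schema-breaking tokens.
    """
    scores = {tok: 1.0 for tok in VOCAB}
    scores["maybe"] = 5.0     # an invalid value
    scores['"extra"'] = 4.0   # an unrequested field
    scores["false"] = 2.0     # a valid value, preferred over "true"
    return scores


def constrained_decode():
    """Greedy decoding where only FSM-allowed tokens can be chosen."""
    state, output = 0, []
    while state != ACCEPT_STATE:
        scores = fake_model_scores(output)
        allowed = TRANSITIONS[state]
        # Mask: tokens without a transition from this state are never considered.
        best = max(allowed, key=lambda tok: scores[tok])
        output.append(best)
        state = allowed[best]
    return "".join(output)


def unconstrained_decode(steps=5):
    """Greedy decoding over the full vocabulary, for comparison."""
    output = []
    for _ in range(steps):
        scores = fake_model_scores(output)
        output.append(max(scores, key=scores.get))
    return "".join(output)


print(unconstrained_decode())  # maybemaybemaybemaybemaybe
result = constrained_decode()
print(result)                  # {"done":false}
print(json.loads(result))      # {'done': False}
```

The model's preferences still matter inside the constraint: it picks false over true because that score is higher. The mask only removes choices that cannot lead to valid output.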

The results:

Errors dropped from 25% to 0.1%.
Generation time increased by 5% to 10%.

My advice:

Stop relying on post-generation parsing to repair output.
Use a structural constraint.
Keep your schemas flat.
Test with real user data.
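One way to test with real data is to measure the schema-failure rate over a batch of saved model outputs. The sketch below is a test harness, not a production fix. The three expected keys and the sample strings are hypothetical; in practice the samples would be outputs collected from real user inputs.

```python
import json
from datetime import date

# Hypothetical flat schema: exactly these three string fields.
EXPECTED_KEYS = {"task", "owner", "due"}


def is_valid(raw: str) -> bool:
    """True if raw is a JSON object with exactly the expected keys and an ISO date."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return False
    if not all(isinstance(data[key], str) for key in EXPECTED_KEYS):
        return False
    try:
        date.fromisoformat(data["due"])
    except ValueError:
        return False
    return True


def failure_rate(outputs):
    """Fraction of outputs that fail the schema check."""
    if not outputs:
        raise ValueError("need at least one output to measure")
    failures = sum(not is_valid(raw) for raw in outputs)
    return failures / len(outputs)


samples = [
    '{"task": "Send recap", "owner": "Ana", "due": "2024-05-16"}',
    '{"task": "Book room", "owner": "Li", "due": "next Thursday"}',
    '{"task": "Ship fix", "owner": "Sam", "due": "2024-05-20", "priority": "high"}',
    'Sure! Here is the JSON: {"task": "Call vendor"}',
]
print(failure_rate(samples))  # 0.75
```

Running the same harness before and after adding the constraint gives a like-for-like error rate on the inputs that matter.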

Prompting has limits. If you need guaranteed structure, you need a structural constraint.

How do you get structured data from LLMs?

Source: https://dev.to/__c1b9e06dc90a7e0a676b/fixing-json-output-from-gpt-a-pattern-that-actually-works-284g
