Taming Long Documents with LLMs
I built a system to answer questions from 100-page technical PDFs.
Simple scripts failed. I fought token limits and high costs for weeks.
My first attempt sent the full text to GPT-4. That worked for 10 pages. At 100 pages, I hit the token limit, the model lost details from the middle of the document, and costs were too high.
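A back-of-the-envelope estimate shows why 10 pages fit and 100 did not. This sketch rests on a few assumptions:

- Roughly 500 words per page.
- OpenAI's rule of thumb of about 0.75 words per token.
- An answer reserve of 1,000 tokens, which is an arbitrary illustrative choice.

The 8,192- and 32,768-token figures are the context windows GPT-4 launched with. The helper names are hypothetical.

```python
# Assumptions: ~500 words per page, ~0.75 words per token (rough English average).
WORDS_PER_TOKEN = 0.75


def estimate_tokens(pages: int, words_per_page: int = 500) -> int:
    """Rough prompt size in tokens for a document of `pages` pages."""
    return round(pages * words_per_page / WORDS_PER_TOKEN)


def fits(pages: int, context_window: int = 8192, reserve_for_answer: int = 1000) -> bool:
    """True if the whole document plus room for the answer fits in the window."""
    return estimate_tokens(pages) + reserve_for_answer <= context_window


for pages in (10, 100):
    # Columns: pages, estimated tokens, fits in 8K, fits in 32K
    print(pages, estimate_tokens(pages), fits(pages), fits(pages, context_window=32768))
# 10 6667 True True
# 100 66667 False False
```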
I tried these methods:
- Basic chunking: The model picked the wrong sections. It missed context.
- Map-reduce: I lost specific details.
- Sliding windows: This was too slow and expensive.
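Basic chunking and sliding windows differ only in overlap, as this minimal sketch shows. It splits on words for readability; real pipelines usually count tokens. `sliding_window_chunks` is an illustrative helper, not a library function.

With an overlap, text near every boundary is sent to the model more than once. The total text processed therefore grows by roughly `size / (size - overlap)`, which is one reason sliding windows get slow and expensive.

```python
def sliding_window_chunks(words: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Split `words` into windows of `size` words; neighbours share `overlap` words.

    overlap=0 gives plain fixed-size chunking.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):  # this window already reaches the end
            break
    return chunks


words = [f"w{i}" for i in range(10)]
basic = sliding_window_chunks(words, size=4)
sliding = sliding_window_chunks(words, size=4, overlap=2)
print(len(basic), sum(map(len, basic)))      # 3 10  (3 chunks, 10 words sent)
print(len(sliding), sum(map(len, sliding)))  # 4 16  (4 chunks, 16 words sent)
```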
So I switched to mimicking how humans read: we skim the table of contents, then read only the sections we need.
Here is the new workflow:
- Create a hierarchy: split the document into chunks and use an LLM to write a short summary of each one.
- Store summaries and full text in a vector database.
- Use hybrid search. Combine keywords and semantic search.
- Retrieve the top 3 summaries first.
- Fetch the full text for those summaries.
- Feed this context to the LLM.
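The workflow above can be sketched end to end in plain Python. This is a toy under several stated assumptions:

- `summarize` keeps the first sentence instead of calling an LLM.
- `embed` uses bag-of-words counts instead of a real embedding model.
- `keyword_score` is crude term overlap instead of BM25.
- An in-memory list stands in for the vector database.
- The 50/50 blend (`alpha=0.5`) is arbitrary.

What the sketch does show is the shape of the workflow. Search runs over the summaries only. The full text behind the top 3 is then fetched by id and placed in the prompt.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Record:
    chunk_id: int
    summary: str    # short summary: what the search runs over
    full_text: str  # original chunk: what the LLM finally reads


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(chunk: str) -> str:
    # Hypothetical stand-in for the LLM summary call: keep the first sentence.
    return chunk.split(". ")[0]


def embed(text: str) -> Counter:
    # Hypothetical stand-in for an embedding model: bag-of-words term counts.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    # Share of query terms found verbatim in the text (a crude stand-in for BM25).
    terms = set(tokenize(query))
    return len(terms & set(tokenize(text))) / len(terms) if terms else 0.0


def build_index(chunks: list[str]) -> list[Record]:
    # In-memory list in place of a vector database; one record per chunk.
    return [Record(i, summarize(c), c) for i, c in enumerate(chunks)]


def hybrid_search(index: list[Record], query: str, k: int = 3, alpha: float = 0.5) -> list[int]:
    # Blend semantic and keyword scores over the summaries; return the top-k chunk ids.
    q_vec = embed(query)
    scored = [
        (alpha * cosine(q_vec, embed(r.summary)) + (1 - alpha) * keyword_score(query, r.summary), r.chunk_id)
        for r in index
    ]
    scored.sort(reverse=True)
    return [chunk_id for _, chunk_id in scored[:k]]


def build_prompt(index: list[Record], query: str, k: int = 3) -> str:
    # Fetch the full text behind the top-k summaries and hand that to the answering model.
    by_id = {r.chunk_id: r for r in index}
    context = "\n\n".join(by_id[i].full_text for i in hybrid_search(index, query, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"


chunks = [
    "The pump operates at 3 bar. Maintenance is required every 500 hours.",
    "Safety valves release pressure above 6 bar. Inspect them monthly.",
    "The control board uses a 24 V supply. Firmware updates are via USB.",
    "Warranty covers parts for two years. Labor is not included.",
]
index = build_index(chunks)
question = "What pressure do the safety valves release at?"
print(hybrid_search(index, question))  # [1, 0, 2]
print(build_prompt(index, question))
```

In this example the safety-valve chunk ranks first. Its summary dropped "Inspect them monthly.", but that sentence still reaches the prompt because the full text is fetched after the search.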
The results:
- Costs dropped by 70%.
- Answers now get technical terms right.
- Overall answer accuracy improved.
Tips for your setup:
- Use GPT-3.5 for summaries.
- Use GPT-4 for the final answer.
- Build a test dataset early.
- For documents under 20 pages, skip retrieval and put the full text straight into the prompt.
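The tips can be combined into a small routing rule plus a retrieval check against a hand-built test set. The 20-page threshold and the split between a cheaper summary model and a stronger answer model come from the tips above. Everything else in this hedged sketch is an illustrative assumption: the `plan_for` and `retrieval_hit_rate` helpers, the model id strings, and the hit-rate metric itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional

STUFF_MAX_PAGES = 20  # "stuff the prompt" threshold from the tips


@dataclass
class Plan:
    strategy: str                 # "stuff" or "hierarchical"
    summary_model: Optional[str]  # model that writes chunk summaries, if any
    answer_model: str             # model that writes the final answer


def plan_for(pages: int) -> Plan:
    # Hypothetical model ids; swap in whatever models you actually use.
    if pages < STUFF_MAX_PAGES:
        return Plan("stuff", None, "gpt-4")
    return Plan("hierarchical", "gpt-3.5-turbo", "gpt-4")


def retrieval_hit_rate(test_set: list[tuple[str, int]], retrieve: Callable[[str], list[int]]) -> float:
    # test_set pairs each question with the id of the chunk that holds its answer;
    # a "hit" means the retriever returned that chunk among its results.
    if not test_set:
        return 0.0
    hits = sum(1 for question, gold_id in test_set if gold_id in retrieve(question))
    return hits / len(test_set)


print(plan_for(12).strategy, plan_for(100).summary_model)  # stuff gpt-3.5-turbo

fake_results = {"q1": [3, 1, 0], "q2": [2, 5, 4]}  # made-up retriever output
print(retrieval_hit_rate([("q1", 1), ("q2", 0)], lambda q: fake_results[q]))  # 0.5
```

A test set like this is cheap to build early. It lets you compare chunk sizes, the keyword/semantic blend, or the top-k value with a number instead of by feel.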