Taming Long Document Analysis with LLMs
I needed to answer questions from 100-page PDFs. A simple script failed. I fought token limits and high costs for weeks.
First, I tried the full text. The model forgot details in the middle. Costs hit 50 cents per call.
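To see how a full-text call can reach that price range, here is a back-of-the-envelope sketch. The tokens-per-page figure and the per-token price are assumptions picked for illustration, not numbers from the original post or from any price list.

```python
def estimate_call_cost(pages: int, tokens_per_page: int,
                       usd_per_1k_input_tokens: float) -> float:
    """Estimate the input cost in USD of sending a whole document in one call."""
    total_tokens = pages * tokens_per_page
    return total_tokens / 1000 * usd_per_1k_input_tokens


# Assumed figures: ~500 tokens per page and a hypothetical $0.01 per 1K input tokens.
cost = estimate_call_cost(pages=100, tokens_per_page=500,
                          usd_per_1k_input_tokens=0.01)
print(f"${cost:.2f} per call")
```

Every question re-sends the same tokens, so the cost scales with the number of questions, not just the number of documents.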
Then I tried these methods:
- Fixed chunks: The model picked the wrong parts.
- Map-reduce: Summaries lost the details.
- Sliding window: It was too slow.
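For reference, two of these baselines can be sketched in a few lines. This is a minimal illustration that splits on words rather than model tokens (an assumption made to keep it dependency-free); map-reduce is left out because it needs a model call.

```python
def fixed_chunks(words: list[str], size: int) -> list[list[str]]:
    """Split into non-overlapping chunks of at most `size` words."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [words[i:i + size] for i in range(0, len(words), size)]


def sliding_windows(words: list[str], size: int, stride: int) -> list[list[str]]:
    """Split into overlapping windows; a stride below `size` creates overlap."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    for start in range(0, len(words), stride):
        windows.append(words[start:start + size])
        if start + size >= len(words):  # this window already reaches the end
            break
    return windows


words = [f"w{i}" for i in range(10)]
fixed = fixed_chunks(words, size=4)
windows = sliding_windows(words, size=4, stride=2)
print(len(fixed), len(windows))
```

The overlap is what makes the sliding window slower: the same words are processed more than once, so there are more chunks to embed or send to the model.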
I decided to mimic how humans read. Humans skim first. Then they read.
Here is my process:
- Create a hierarchy of chunks.
- Write a short summary for each chunk.
- Store both summaries and raw text in a vector database.
- Use hybrid search to find the best summaries.
- Fetch the raw text behind those summaries.
- Use a strict prompt to stop hallucinations.
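The steps above could look roughly like this. It is a sketch under heavy assumptions, not the original implementation: an in-memory list stands in for the vector database, `summarize` is a placeholder that takes the first sentence instead of calling a model, and character-trigram overlap stands in for embedding similarity in the hybrid score. All names here are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    section: str  # parent node in the hierarchy
    text: str     # raw text, returned to the model at answer time
    summary: str  # short summary, used only for retrieval


def summarize(text: str) -> str:
    """Placeholder summary: the first sentence. In practice, a cheap model call."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def build_index(sections: dict[str, list[str]]) -> list[Chunk]:
    """Two-level hierarchy (section -> paragraph); stores summary and raw text together."""
    return [
        Chunk(f"{section}/{n}", section, paragraph, summarize(paragraph))
        for section, paragraphs in sections.items()
        for n, paragraph in enumerate(paragraphs)
    ]


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def _trigrams(text: str) -> set[str]:
    s = re.sub(r"\s+", " ", text.lower())
    return {s[i:i + 3] for i in range(len(s) - 2)}


def hybrid_score(query: str, summary: str, alpha: float = 0.5) -> float:
    """Blend keyword overlap with a fuzzy score (stand-in for vector similarity)."""
    q = _words(query)
    keyword = len(q & _words(summary)) / len(q) if q else 0.0
    a, b = _trigrams(query), _trigrams(summary)
    fuzzy = len(a & b) / len(a | b) if a | b else 0.0
    return alpha * keyword + (1 - alpha) * fuzzy


def hybrid_search(query: str, index: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Rank chunks by their summaries, then return the chunks (with raw text)."""
    ranked = sorted(index, key=lambda c: hybrid_score(query, c.summary), reverse=True)
    return ranked[:top_k]


def build_prompt(query: str, hits: list[Chunk]) -> str:
    """Strict prompt: answer only from the retrieved raw text."""
    context = "\n\n".join(f"[{c.chunk_id}] {c.text}" for c in hits)
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        "context, reply 'Not found in the document.'\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


sections = {
    "Installation": [
        "The installer requires Python 3.10. Older versions are rejected at startup.",
        "Configuration lives in config.yaml. Each key is documented inline.",
    ],
    "Billing": ["Invoices are issued monthly. Late payments incur a fee after 30 days."],
}
index = build_index(sections)
query = "Which Python version does the installer require?"
hits = hybrid_search(query, index, top_k=1)
prompt = build_prompt(query, hits)
print(prompt)
```

Searching over summaries keeps the retrieval step cheap and focused, while the prompt still carries the full raw text of the matching chunk, so details that a summary drops are available for the final answer.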
This changed the results:
- Costs dropped by 70 percent.
- Accuracy went up.
- Technical terms stayed intact.
My tips for you:
- Use cheap models for summaries.
- Use GPT-4 for the final answer.
- Build a test dataset in the first week.
- Skip this for docs under 20 pages.
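These tips could be wired up along the following lines. This is a hedged sketch: the 20-page threshold and GPT-4 come from the list above, while the summary model name, the strategy labels, and the phrase-matching accuracy check are placeholders invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

MIN_PAGES_FOR_PIPELINE = 20  # below this, the extra machinery is not worth it

# "cheap-summary-model" is a placeholder, not a real model name.
MODELS = {"summary": "cheap-summary-model", "answer": "gpt-4"}


def choose_strategy(page_count: int) -> str:
    """Short documents go straight to the model; long ones use the pipeline."""
    if page_count < MIN_PAGES_FOR_PIPELINE:
        return "full_text"
    return "hierarchical_retrieval"


@dataclass
class EvalCase:
    question: str
    expected_phrase: str  # a phrase a correct answer must contain


def accuracy(answer_fn: Callable[[str], str], cases: list[EvalCase]) -> float:
    """Fraction of cases whose answer contains the expected phrase."""
    if not cases:
        return 0.0
    correct = sum(
        case.expected_phrase.lower() in answer_fn(case.question).lower()
        for case in cases
    )
    return correct / len(cases)


# A fake answer function stands in for the real pipeline.
canned = {
    "Which Python version is required?": "It requires Python 3.10.",
    "When are invoices issued?": "Not found in the document.",
}
cases = [
    EvalCase("Which Python version is required?", "3.10"),
    EvalCase("When are invoices issued?", "monthly"),
]
score = accuracy(lambda q: canned[q], cases)
print(choose_strategy(12), choose_strategy(100), score)
```

Even a small set of question and expected-phrase pairs like this makes it possible to compare chunking or prompt changes against each other instead of judging by feel.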
What is your setup for long docs?
Source: https://dev.to/__c1b9e06dc90a7e0a676b/how-i-finally-tamed-long-document-analysis-with-llms-it-wasnt-simple-chunking-5ed3