𝗧𝗮𝗺𝗶𝗻𝗴 𝗟𝗼𝗻𝗴 𝗗𝗼𝗰𝘂𝗺𝗲𝗻𝘁𝘀 𝘄𝗶𝘁𝗵 𝗟𝗟𝗠𝘀

I built a system to answer questions from 100-page technical PDFs.

Simple scripts failed. I fought token limits and high costs for weeks.

My first try sent the full text to GPT-4. This worked for 10 pages. At 100 pages, I hit the token limit. The model forgot details from the middle of the document. Costs were too high.

I tried these methods:

  • Basic chunking: The model picked the wrong sections. It missed context.
  • Map-reduce: I lost specific details.
  • Sliding windows: This was too slow and expensive.
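
To make the sliding-window cost concrete, here is a minimal sketch. The window size, the stride, and the helper names `sliding_windows` and `overhead` are illustrative assumptions, not values from the original system. With overlapping windows, every token in an overlap is sent to the model more than once.

```python
def sliding_windows(tokens, size, stride):
    """Return overlapping windows of `size` tokens, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + size])
        # Stop once the current window reaches the end of the document.
        if start + size >= len(tokens):
            break
        start += stride
    return windows


def overhead(tokens, size, stride):
    """Ratio of tokens sent to the model versus tokens in the document."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    sent = sum(len(w) for w in sliding_windows(tokens, size, stride))
    return sent / len(tokens)


# Hypothetical document of 1000 tokens, 200-token windows, 50% overlap.
doc = [f"tok{i}" for i in range(1000)]
windows = sliding_windows(doc, size=200, stride=100)
ratio = overhead(doc, size=200, stride=100)
print(len(windows), ratio)
```

A smaller stride means more overlap, more model calls, and a higher ratio of tokens paid for to tokens in the document.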

So I mimicked how humans read. We skim the table of contents. Then we read specific sections.

Here is the new workflow:

  • Create a hierarchy. Split the document into chunks and use an LLM to write a short summary of each chunk.
  • Store summaries and full text in a vector database.
  • Use hybrid search. Combine keywords and semantic search.
  • Retrieve the top 3 summaries first.
  • Fetch the full text for those summaries.
  • Feed this context to the LLM.
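
The steps above can be sketched in plain Python. This is a toy under stated assumptions, not the original implementation: summaries are precomputed strings instead of LLM output, an in-memory list stands in for the vector database, and a bag-of-words cosine stands in for real embedding similarity. The names `Chunk`, `hybrid_search`, `build_context`, and the weight `alpha` are hypothetical, and the sample chunks are made up.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: int
    summary: str    # short summary (would be written by an LLM)
    full_text: str  # original chunk text


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def keyword_score(query_tokens, doc_tokens):
    """Fraction of distinct query terms that appear in the text."""
    q = set(query_tokens)
    return len(q & set(doc_tokens)) / len(q) if q else 0.0


def hybrid_search(query, chunks, top_k=3, alpha=0.5):
    """Rank chunks by a weighted mix of both scores, computed on summaries."""
    q_tokens = tokenize(query)
    q_vec = Counter(q_tokens)
    scored = []
    for chunk in chunks:
        s_tokens = tokenize(chunk.summary)
        semantic = cosine(q_vec, Counter(s_tokens))  # stand-in for embeddings
        keyword = keyword_score(q_tokens, s_tokens)
        scored.append((alpha * semantic + (1 - alpha) * keyword, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:top_k]]


def build_context(query, chunks, top_k=3):
    """Fetch the full text behind the best summaries and assemble a prompt."""
    hits = hybrid_search(query, chunks, top_k=top_k)
    sections = "\n\n".join(c.full_text for c in hits)
    return f"Context:\n{sections}\n\nQuestion: {query}"


# Made-up sample data for illustration only.
chunks = [
    Chunk(0, "Installation steps and system requirements", "Full text of the installation section."),
    Chunk(1, "Torque limits for the motor controller", "Full text of the torque limits section."),
    Chunk(2, "Warranty and support contacts", "Full text of the warranty section."),
    Chunk(3, "Firmware update procedure", "Full text of the firmware section."),
]
query = "What is the torque limit of the motor controller?"
hits = hybrid_search(query, chunks)
prompt = build_context(query, chunks)
print([c.chunk_id for c in hits])
```

Searching over summaries keeps the ranking step cheap. Only the full text of the few selected chunks reaches the final prompt, which is where the token savings would come from in a setup like this.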

The results:

  • Costs dropped by 70%.
  • The system now handles technical terms accurately.
  • Answer accuracy improved.

Tips for your setup:

  • Use GPT-3.5 for summaries.
  • Use GPT-4 for the final answer.
  • Build a test dataset early.
  • For docs under 20 pages, skip the pipeline and put the full text in the prompt.
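
These tips can be read as a simple routing rule. A minimal sketch follows, assuming a hypothetical helper `choose_plan` and the model identifier strings `gpt-3.5-turbo` and `gpt-4`. Using GPT-4 for the short-document path is also an assumption, since the tips only name it for the final answer.

```python
def choose_plan(num_pages, short_doc_pages=20):
    """Pick a processing plan from the page count."""
    if num_pages < 0:
        raise ValueError("num_pages must not be negative")
    if num_pages < short_doc_pages:
        # Short document: send the full text in one prompt, no summaries.
        return {
            "strategy": "stuff_prompt",
            "summary_model": None,
            "answer_model": "gpt-4",
        }
    # Long document: cheaper model for summaries, stronger model for the answer.
    return {
        "strategy": "hierarchical_retrieval",
        "summary_model": "gpt-3.5-turbo",
        "answer_model": "gpt-4",
    }


short_plan = choose_plan(10)
long_plan = choose_plan(100)
print(short_plan["strategy"], long_plan["strategy"])
```

The 20-page threshold is a starting point. A test dataset built early is what lets you check whether a different cutoff works better for your documents.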

Source: https://dev.to/__c1b9e06dc90a7e0a676b/how-i-finally-tamed-long-document-analysis-with-llms-it-wasnt-simple-chunking-5ed3