Handling Long Documents with LLMs


AI-assisted draft.

GyaanSetu Editorial, 3 weeks ago, 1 min read

Taming Long Documents with LLMs

I built a system to answer questions from 100-page technical PDFs.

Simple scripts failed. I fought token limits and high costs for weeks.

My first try used GPT-4 with the full text. This worked for 10 pages. At 100 pages, I hit the token cap. The model forgot details in the middle. Costs were too high.

I tried these methods:

Basic chunking: The model picked the wrong sections. It missed context.
Map-reduce: I lost specific details.
Sliding windows: This was too slow and expensive.
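Here is a minimal sketch of why sliding windows get expensive. Overlapping windows send the same tokens to the model more than once. The function names, window size, and overlap are hypothetical; the post does not say which values were used.

```python
def sliding_windows(tokens, size, overlap):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        # Stop once a window has reached the end of the document.
        if start + size >= len(tokens):
            break
    return windows


def processed_token_count(windows):
    """Total tokens sent to the model across all windows (overlap counted twice)."""
    return sum(len(window) for window in windows)


if __name__ == "__main__":
    # Hypothetical sizes: a 50,000-token document, 4,000-token windows.
    document = list(range(50_000))
    plain = sliding_windows(document, size=4_000, overlap=0)
    overlapped = sliding_windows(document, size=4_000, overlap=2_000)
    print(len(plain), processed_token_count(plain))
    print(len(overlapped), processed_token_count(overlapped))
```

With half-window overlap, the model processes roughly twice as many tokens as the document contains. That matches the "slow and expensive" experience described above.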

So I copied how humans read. We skim the table of contents first. Then we read only the sections we need.

Here is the new workflow:

Create a hierarchy. Use an LLM to make a short summary for each chunk.
Store summaries and full text in a vector database.
Use hybrid search. Combine keywords and semantic search.
Retrieve the top 3 summaries first.
Fetch the full text for those summaries.
Feed this context to the LLM.
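The steps above can be sketched end to end in plain Python. This is an illustration under loud assumptions, not the author's code. Summaries are hard-coded where an LLM would write them. A Python list stands in for the vector database. Bag-of-words cosine similarity stands in for embedding similarity. The names `Chunk`, `retrieve`, `build_context`, and the `alpha` weight are all hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    summary: str    # in a real pipeline an LLM writes this (step 1)
    full_text: str  # stored next to the summary (step 2)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_score(query, text):
    """Fraction of distinct query terms that appear in the text."""
    query_terms = set(tokenize(query))
    if not query_terms:
        return 0.0
    return len(query_terms & set(tokenize(text))) / len(query_terms)


def cosine_score(query, text):
    """Bag-of-words cosine similarity; a stand-in for embedding similarity."""
    q, t = Counter(tokenize(query)), Counter(tokenize(text))
    dot = sum(q[word] * t[word] for word in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in t.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, chunks, top_k=3, alpha=0.5):
    """Steps 3 and 4: hybrid-score the summaries only and keep the top_k."""
    def hybrid(chunk):
        return (alpha * keyword_score(query, chunk.summary)
                + (1 - alpha) * cosine_score(query, chunk.summary))
    return sorted(chunks, key=hybrid, reverse=True)[:top_k]


def build_context(query, chunks, top_k=3):
    """Steps 5 and 6: fetch full text for the winning summaries."""
    hits = retrieve(query, chunks, top_k=top_k)
    return "\n\n".join(f"[{c.chunk_id}]\n{c.full_text}" for c in hits)


if __name__ == "__main__":
    # Made-up chunks for illustration only.
    chunks = [
        Chunk("s1", "Installation and power requirements", "Mount the unit on a rail."),
        Chunk("s2", "Torque limits for the motor controller", "The torque limit is 12 Nm."),
        Chunk("s3", "Warranty terms", "The warranty lasts two years."),
        Chunk("s4", "Firmware update procedure", "Update the firmware over USB."),
    ]
    print(build_context("motor controller torque limit", chunks))
```

Ranking happens on the short summaries, and only the winning chunks contribute full text to the prompt. That is where the token savings would come from in a design like this.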

The results:

Costs dropped by 70%.
Technical terms are now accurate.
Accuracy improved.

Tips for your setup:

Use GPT-3.5 for summaries.
Use GPT-4 for the final answer.
Build a test dataset early.
For docs under 20 pages, skip retrieval and put the full text in the prompt.
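These tips can be written down as a small routing table plus a test harness. This is a sketch with assumptions. The model strings are labels taken from the tips, not exact API identifiers. `EvalCase`, `evaluate`, and the "required terms" check are hypothetical; the post does not describe how its test dataset was scored.

```python
from dataclasses import dataclass

STUFFING_PAGE_LIMIT = 20  # from the tip above: stuff the prompt under 20 pages

# Cheaper model for summaries, stronger model for the final answer.
MODEL_FOR_TASK = {"summarize": "gpt-3.5", "answer": "gpt-4"}


def choose_strategy(page_count):
    """Pick prompt stuffing for short docs, hierarchical retrieval otherwise."""
    if page_count < STUFFING_PAGE_LIMIT:
        return "stuff_prompt"
    return "hierarchical_retrieval"


@dataclass
class EvalCase:
    question: str
    required_terms: list  # terms a correct answer must mention


def evaluate(answer_fn, cases):
    """Return the fraction of cases whose answer contains every required term."""
    if not cases:
        return 0.0
    passed = 0
    for case in cases:
        answer = answer_fn(case.question).lower()
        if all(term.lower() in answer for term in case.required_terms):
            passed += 1
    return passed / len(cases)


if __name__ == "__main__":
    # A stub stands in for the real question-answering pipeline.
    def stub_pipeline(question):
        return "The torque limit is 12 Nm." if "torque" in question else "Unknown."

    cases = [
        EvalCase("What is the torque limit?", ["12 Nm"]),
        EvalCase("How long is the warranty?", ["two years"]),
    ]
    print(choose_strategy(12), choose_strategy(100))
    print(evaluate(stub_pipeline, cases))
```

Even a crude term check like this lets you compare two pipeline versions on the same questions before you invest in a more careful scoring method.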

Source: https://dev.to/__c1b9e06dc90a7e0a676b/how-i-finally-tamed-long-document-analysis-with-llms-it-wasnt-simple-chunking-5ed3
