𝗥𝗔𝗚 𝗔𝗿𝗰𝗵𝗶𝘁𝗲𝗰𝘁𝘂𝗿𝗲 𝗳𝗼𝗿 𝗦𝗮𝗮𝗦 𝗔𝗽𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻𝘀 𝗨𝘀𝗶𝗻𝗴 𝗔𝗺𝗮𝘇𝗼𝗻 𝗕𝗲𝗱𝗿𝗼𝗰𝗸

My first production batch on Autowired.ai cost 3x my budget.

I sent full OCR text from 200 documents to a frontier model for every single field. This was a mistake. I was paying for data the model did not need.

I redesigned the architecture and cut costs by 40%. Here is how you can do the same.

  1. Stop using LLMs for everything

Textract is excellent at extracting structured fields like dates and totals. I was using Bedrock to redo work Textract already finished.

The new flow uses three stages: • Use Textract for the bulk of the work. • Send only missing fields to Bedrock for a gap-fill call. • Use Bedrock for a final verification call.

If Textract is confident, Bedrock does less work. This drops your token count immediately.

  1. Use Prompt Caching

System prompts for field definitions and schemas are static. They do not change between documents.

Amazon Bedrock allows you to cache these prompts. The first call in a batch pays a small premium. Every subsequent call in that window hits the cache at 10% of the usual price. This reduced my input costs by 20%.

  1. Filter your context

Do not send the full OCR response to Bedrock.

• For gap-fill: Send only the specific OCR blocks related to the missing fields. • For verification: Send the extracted values, not the raw OCR.

I also cleaned my prompts. Removing redundant instructions reduced my prompt size from 2,400 tokens to 1,100 tokens with zero loss in accuracy.

  1. Match the model to the task

Do not use Claude Sonnet for every task. Sonnet is 5x more expensive than Haiku.

I tested them on specific tasks: • Structured form gap-fill: Haiku was 2% as accurate as Sonnet. I switched to Haiku. • Unstructured contracts: Haiku was less accurate. I kept Sonnet. • Verification: Haiku performed well. I switched to Haiku.

Choose your model based on the task complexity, not the whole system.

  1. Implement application-layer caching

I added a cache in DynamoDB using a hash of the schema and the Textract output. If you run the same test set multiple times while tuning your code, this eliminates 80% to 90% of your Bedrock calls.

우승한 아키텍처 요약: • 반복 요청 시 Bedrock을 건너뛰기 위한 애플리케이션 캐시. • 정적 시스템 지침을 위한 Bedrock 프롬프트 캐시. • 가능한 경우 Haiku를 사용하기 위한 모델 티어링. • 필요한 데이터만 전송하기 위한 컨텍스트 필터링.

최적화하기 전에 토큰 사용량을 측정하세요. 데이터를 통해 어디에서 비용이 낭비되고 있는지 확인할 수 있습니다.

출처: https://dev.to/yogieee/rag-architecture-for-saas-applications-using-amazon-bedrock-10df

선택 사항 학습 커뮤니티: https://t.me/GyaanSetuAi