Context Windows Are Getting Huge

People use the word "agent" for everything.

A function that calls a tool is called an agent. So is a chatbot with memory. So is a script with a loop.

This loose usage leads to bad engineering. Teams over-engineer simple tasks and under-engineer complex ones. I see teams spend weeks on agent orchestration for workflows that only need one good prompt.

Here is my definition of a real agent.

An agent has an objective. It does not just follow instructions. It decides what to do next. It handles failure. It knows when to stop.
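
The definition above can be sketched as a loop. This is a minimal illustration, not a reference implementation: `AgentState`, `run_agent`, `choose_action`, and the `tools` dictionary are hypothetical names, and the decision policy (normally a model call) is passed in as a plain function.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    objective: str
    history: list = field(default_factory=list)  # (tool name, outcome) pairs
    done: bool = False


def run_agent(state, choose_action, tools, max_steps=10, max_failures=3):
    """Loop until the objective is met, the step budget runs out, or failures pile up."""
    failures = 0
    for _ in range(max_steps):
        action = choose_action(state)      # the agent decides what to do next
        if action is None:                 # the policy signals the objective is met
            state.done = True
            break
        name, arg = action
        try:
            outcome = tools[name](arg)
            failures = 0                   # a success resets the failure streak
        except Exception as exc:           # handle failure instead of crashing
            outcome = f"error: {exc}"
            failures += 1
        state.history.append((name, outcome))
        if failures >= max_failures:       # know when to stop
            break
    return state
```

The two exit conditions matter as much as the loop body: one for success, one for giving up after repeated consecutive failures.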

Use these rules of thumb:

  • If a human must guide every step, it is a chat interface.
  • If the system recovers from a failed tool call, it is moving toward an agent.
  • If the system breaks a goal into tasks and delegates them, it is a real agent.

Most successful agents are narrow. They do one job well. They handle customer support triage or document extraction. They are not general reasoning engines.

Successful teams focus on these three things:

  • Tool design: How clean is the interface?
  • Failure handling: What happens when a tool returns nothing?
  • Observability: Can you trace why the agent made a decision?
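
Here is one way the three concerns can meet in a single wrapper. It is a sketch under assumptions: `ToolResult` and `call_tool` are hypothetical names, the trace is a plain list, and "returns nothing" is interpreted as `None` or an empty container.

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    ok: bool
    value: Any = None
    error: Optional[str] = None


def call_tool(name, fn, args, trace):
    """Run one tool call, normalise failures, and append a trace entry."""
    start = time.perf_counter()
    try:
        value = fn(**args)
        empty = value is None or (hasattr(value, "__len__") and len(value) == 0)
        if empty:
            # An empty result is treated as an explicit failure, not silent success.
            result = ToolResult(ok=False, error="empty result")
        else:
            result = ToolResult(ok=True, value=value)
    except Exception as exc:
        result = ToolResult(ok=False, error=f"{type(exc).__name__}: {exc}")
    trace.append({
        "tool": name,
        "args": args,
        "ok": result.ok,
        "error": result.error,
        "elapsed_ms": round((time.perf_counter() - start) * 1000, 2),
    })
    return result
```

Every call leaves a record, so you can later reconstruct which tool was called, with what arguments, and why the agent saw a failure.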

Unsuccessful teams just swap one model for a newer one and expect better results. They ignore the system design.

Frameworks like LangChain or CrewAI change every month. The framework matters less than the pattern.

Use these patterns:

  • Plan then execute: Separate the reasoning step from the execution step.
  • Separate retrieval from reasoning: Fetching context is a different job than using it.
  • Explicit handoffs: Use structured logs when one agent passes work to another.
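
A small sketch of the first and third patterns together, independent of any framework. The names `Handoff`, `plan`, and `execute` are hypothetical, and both steps are stubbed: in a real system the planner would be a model call and the worker would do actual work.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Handoff:
    from_agent: str
    to_agent: str
    task: str
    context: dict


def plan(goal):
    """Reasoning step: return an ordered list of tasks (stubbed here)."""
    return [f"research: {goal}", f"summarise: {goal}"]


def execute(tasks, log):
    """Execution step: hand each task to a worker and log the handoff as JSON."""
    results = []
    for i, task in enumerate(tasks):
        handoff = Handoff("planner", "worker", task, {"step": i})
        log.append(json.dumps(asdict(handoff)))  # structured, searchable record
        results.append(f"done({task})")          # stand-in for real work
    return results
```

Because the plan is produced before anything runs, you can inspect or reject it without side effects, and the JSON log shows exactly what each worker was asked to do.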

The framework is just scaffolding. The architecture is the building.

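RAG is standard, but chunking is often broken. If you split a document in the wrong places, for example mid-sentence or between a table and its header, each retrieved chunk arrives without the context needed to interpret it. The model then fills the gaps itself, which leads to hallucinations.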

If your RAG results are useless, check your chunking and metadata. The model is rarely the problem.
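
As a starting point for that check, here is a minimal chunker that keeps overlap between neighbouring chunks and attaches metadata to each one. It is an illustration only: `chunk_text` is a hypothetical helper, it splits on whitespace rather than on real tokens or sentence boundaries, and the default sizes are arbitrary.

```python
def chunk_text(text, source, size=200, overlap=50):
    """Split text into overlapping word windows, each tagged with metadata."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(words), 1), step):
        window = words[start:start + size]
        if not window:
            break
        chunks.append({
            "text": " ".join(window),
            "source": source,        # metadata: which document this came from
            "start_word": start,     # metadata: where in the document it starts
        })
        if start + size >= len(words):  # the last window reached the end
            break
    return chunks
```

The overlap means a sentence cut at one chunk boundary still appears whole in the neighbouring chunk, and the metadata lets you trace a bad answer back to the chunk that caused it.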

Models will get better. Context windows will grow. Token costs will drop.

None of that solves the real engineering challenge. You must build systems that behave correctly when you are not watching.

Focus on governance, observability, and reliable tool use. The best engineers will not be model researchers. They will be systems designers who build reliable AI.

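Context windows are getting huge, and here is why that changes everything

Context windows are getting huge. In what feels like the blink of an eye, we have gone from a few thousand tokens to millions. This is more than a technical milestone; it is changing the way we interact with AI.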

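The evolution of the context window

Back in the GPT-3 era, the context window was only 2,048 tokens. Getting the model to process a whole book was practically impossible. With GPT-4, Claude, and Gemini, that limit is disappearing fast. Gemini 1.5 Pro can handle 1 million and even 2 million tokens.

This means you can feed an entire codebase, hours of video, or hundreds of documents directly to the model.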

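RAG vs. long context: two different paths

Before long context windows became common, retrieval-augmented generation (RAG) was the standard way to handle large amounts of data.

RAG (retrieval-augmented generation)

RAG works like looking something up in an index. When you ask a question, the system first retrieves the most relevant "chunks" from a large database, then sends those chunks to the model along with your question.

  • Pros: Low cost, fast, and able to handle an effectively unlimited amount of data.
  • Cons: Retrieval can lose context, and the model cannot see the global connections between documents.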
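
The retrieve-then-prompt flow described above can be sketched in a few lines. This is a toy version under stated assumptions: `score`, `retrieve`, and `build_prompt` are hypothetical names, and relevance is measured by shared words, where a real pipeline would typically use embeddings and a vector index.

```python
def score(query, chunk):
    """Toy relevance score: number of distinct query words found in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c)


def retrieve(query, chunks, k=2):
    """Return the k highest-scoring chunks (ties keep their original order)."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return ranked[:k]


def build_prompt(query, chunks, k=2):
    """Send only the retrieved chunks to the model, together with the question."""
    context = "\n---\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"
```

The model only ever sees the `k` chunks that `retrieve` returns, which is where both the low cost and the lost context come from.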

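Long context

Long context is a different approach entirely. Instead of retrieving chunks, it puts all the information directly into the model's "working memory."

  • Pros: Enables global reasoning, captures complex cross-document connections, and reduces retrieval errors.
  • Cons: High compute cost, higher latency, and a possible "needle in a haystack" problem.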

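The "needle in a haystack" problem

A bigger window does not mean the model handles everything in it perfectly. This is where the "needle in a haystack" test comes in. The test checks whether a model can accurately find one small, specific fact buried in a very large amount of text.

Many models run into the "lost in the middle" problem on long inputs: they recall information at the beginning and the end, but tend to overlook what sits in the middle.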
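
A rough sketch of how such a test can be set up. The names `build_haystack` and `run_sweep` are hypothetical, `ask_model` stands in for whatever client you use to call a model, and repeating a single filler sentence is a simplification of the long real documents used in practice.

```python
def build_haystack(filler, needle, depth, n_sentences=100):
    """Place `needle` at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0 and 1")
    sentences = [filler] * n_sentences
    position = round(depth * n_sentences)
    sentences.insert(position, needle)
    return " ".join(sentences), position


def run_sweep(ask_model, filler, needle, question, expected, depths):
    """Return {depth: bool}: whether the answer at each depth contains `expected`."""
    results = {}
    for depth in depths:
        haystack, _ = build_haystack(filler, needle, depth)
        answer = ask_model(f"{haystack}\n\n{question}")
        results[depth] = expected.lower() in answer.lower()
    return results
```

Sweeping `depths` across values such as 0.0, 0.25, 0.5, 0.75, and 1.0 is what exposes a "lost in the middle" pattern: failures cluster around the middle depths.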

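Why does this change everything?

  1. From "retrieval" to "reasoning": We are moving from "finding the relevant chunks" to "reasoning deeply over the whole dataset."
  2. The evolution of AI agents: An AI agent with a huge context window can keep a longer-lived, richer "working memory," so it can take on more complex tasks.
  3. A simpler development workflow: Developers no longer need to build a complex RAG pipeline for medium-sized datasets. They can put the data straight into the prompt.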
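
Whether a dataset counts as "medium-sized" comes down to a token budget. The check below is a back-of-the-envelope sketch: `estimate_tokens` and `choose_strategy` are hypothetical helpers, the 4-characters-per-token ratio is a rough assumption for English text rather than a property of any specific tokenizer, and the reserve for the answer is arbitrary.

```python
def estimate_tokens(text, chars_per_token=4.0):
    """Very rough token estimate; use the model's real tokenizer when accuracy matters."""
    return int(len(text) / chars_per_token) + 1


def choose_strategy(documents, context_limit, reserve_for_answer=4_000):
    """Pick 'long_context' if everything fits in the window, otherwise 'rag'."""
    total = sum(estimate_tokens(doc) for doc in documents)
    budget = context_limit - reserve_for_answer  # leave room for the reply
    strategy = "long_context" if total <= budget else "rag"
    return strategy, total
```

For example, ten documents of 4,000 characters each come to roughly 10,000 estimated tokens, which fits comfortably in a 128,000-token window but not in an 8,000-token one.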

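Conclusion

The expansion of context windows is redefining the boundaries of AI. RAG will still have a role for massive and real-time data, but long context opens up new possibilities for deep understanding and complex reasoning.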