Agent engineering 智能体工程
Sediment and Seed 沉积与种子
The Memory Paradox in Agent Engineering The Memory Paradox in Agent Engineering
Everyone is racing to make agents remember more. The harder, more interesting question is what a memory is for — and what it means to forget well. 所有人都在竞相让智能体记得更多。更难、也更有意思的问题是:记忆究竟是为了什么——以及,怎样才算遗忘得好。
Sediment and Seed: The Memory Paradox in Agent Engineering
A few days ago I told my coding agent to delete all of its memory.
We were mid-conversation about a real architecture problem — three intermediate representations in a slide-authoring pipeline, and whether three was one too many. Before I let it think alongside me, I asked it to wipe its persistent memory clean. It hesitated, the way a careful colleague hesitates. It had thirteen saved notes. Some were stale records of an implementation we were about to tear down — obvious deletes. But two were lessons: hard-won observations about how to reason, how to attribute failure honestly, what a particular model could and couldn’t do with tool calls. It made the case for keeping those. They weren’t architecture debt, it argued. They were craft.
I deleted them anyway. And the reason I gave is the reason I want to write about:
A lesson isn’t necessarily correct forever. After a certain season passes, its mission may simply be over.
That sentence has been rattling around in my head since. It is, I think, the whole of what I believe about memory in agents — and it runs directly against where the entire field is currently pointed.
The paradox
Open any 2026 survey of agent memory and you will find the same scoreboard. Mem0, Zep, Letta, Cognee, a dozen others — benchmarked on recall hit-rate, on temporal knowledge graphs, on “OS-level” paging that promises effectively unlimited memory. The architectures are genuinely clever. They are also all optimizing the same axis: retain more, retrieve better, forget less. One popular system’s memory footprint runs past 600,000 tokens for a single conversation; a leaner rival does the comparable job in under two thousand. The fact that this is even a tradeoff worth measuring tells you what the default assumption has become — more remembered is more valuable.
Now look at what users actually do with memory, and the picture inverts.
In studies of ChatGPT’s memory feature, the dominant reaction when people finally see what the system has retained about them is not delight — it’s a negative expectancy violation. They wanted less, or at least different, or at least visible and deletable. On non-enterprise accounts there were incidents of memory bleeding across projects — one client’s context surfacing while you draft for another. I turn memory off in almost every web chat I open, reflexively, the way you’d clear a whiteboard before a new meeting. And in my own engineering work I keep it on — but with a love-hate ambivalence I couldn’t name until that deletion.
So here is the paradox in one line. The field is racing to maximize how much an agent remembers. But memory has no value in the abstract — only in relation to a mission. A system that remembers everything has not solved the problem; it has often just relocated it, from recall to judgment, from “can I find it” to “should this still count.”
This is the part worth being honest about as builders: it is possible to climb very fast up a ladder that is leaning against the wrong wall. Retention is the ladder. Whether memory should shape the next decision is the wall. We have gotten extraordinarily good at the climb without first checking the wall.
What a memory is for
Try this as a working definition, the one I reached for when I deleted those lessons:
A memory earns its place by its capacity to change a future decision. If it can no longer change a decision — or worse, changes it wrongly — it is not memory. It is sediment.
I like this because it converts a question of taste (“does this note feel important?”) into a question of function (“would this alter what the agent does next?”). It dissolves the hoarder’s anxiety. Most of what an agent could store fails the test instantly: it is residue from a path already walked, true once and inert now, the conversational equivalent of silt settling on a riverbed. Sediment doesn’t help you navigate the river. It just slowly fills it in.
By this definition, the bloated memory systems aren’t impressive — they’re geological. They mistake accumulation for capability. A 600,000-token memory of a single conversation is not a mind; it’s a floodplain.
But the definition, stated that flatly, has a flaw — and the flaw is where the real idea lives.
The flaw: we cannot prophesy backward
I cannot actually know that a memory will never again change a decision. Nobody can. “Useless forever” is a claim about all future contexts, and the future contexts haven’t arrived yet. When I said those lessons’ mission was over, I was quietly overstating my powers. What I actually knew was narrower and more honest:
For the foreseeable horizon — this architecture cleanup, this season of the work — those lessons would lie dormant, or worse, would anchor us to a world we were deliberately leaving behind.
That is an estimate over a window, not a verdict for all time. And the difference between those two things changes what “delete” should even mean.
If a memory is permanently worthless, destroy it. Fine. But if it is merely dormant for a while — likely to mislead now, possibly vital later — then destroying it is a category error. You’re not removing noise; you’re amputating a future you can’t yet see. The correct operation on a spent-but-not-dead memory is not deletion. It is something older and stranger.
Sublation, not garbage collection
There is a word for this from Hegel, and I’ve come to think it’s the most useful word in the entire vocabulary of agent memory: Aufhebung — usually translated sublation. The German aufheben carries three everyday meanings at once: to cancel, to preserve, and to lift up. Hegel uses all three deliberately. To sublate something is to negate its current role while keeping it, and in keeping it, to raise it to a higher form. It is the opposite of both hoarding and trashing. It is the third option we keep failing to build.
Most memory engineering offers only the two crude poles. The vault: keep everything, let judgment drown. The trash can: garbage-collect what scores low, and let it die. Neither is sublation. Garbage collection in particular is seductive to engineers because it feels disciplined — a clean linter pass over your memory store, evicting the low-relevance entries. But eviction-to-oblivion is just letting things fend for themselves and rot. And a thing left to rot at random is not being sublated; it’s being abandoned. That distinction is the whole game.
What sublation asks for instead is a designed dormancy. You negate the memory’s active role — pull it out of the context that shapes the next decision, so it can’t mislead. But you preserve it in a latent form, and crucially, you leave it a path back: a signal that can reawaken it when the world has changed enough to need it.
Here is the part that is specific to agents, and that I find genuinely beautiful.
For a human, a dormant memory waits for the right time. You forget a phrase, a face, a hard-won principle, and years later something jolts it loose. The trigger is temporal; we say “the time was right.”
For an agent, there is no wall-clock. Time’s only honest meaning is the growth of context. An agent’s “later” is not a date — it is a context that has grown into a new shape. So a dormant memory in an agent is not waiting for a clock. It is waiting for the context to grow into the shape that resonates with it — the configuration in which this old, set-aside lesson suddenly fits the present like a key. Retrieval, seen this way, is not search. It is germination. The memory is a seed; the growing context is the season; and the right query is simply the weather finally turning.
Sediment fills the river in. A seed waits in the soil. The difference between them is not whether you kept the thing — it’s whether you kept it with a path to rebirth. That is the difference between a memory store and a landfill, and almost nobody is building for it.
When to run the pass — and what the pass is
So when do you “optimize” memory? The instinct — borrowed from compilers — is to run a periodic linter over the store: dead-code elimination for facts. I think the instinct is right and the operation is wrong.
The right cadence is real: you run a sublation pass at the seams of the work — when a season ends. A migration ships. An architecture is replaced. A product thesis is abandoned. These are the moments when a large class of memories quietly cross from load-bearing to misleading, all at once, and a human is best placed to feel it. That deletion I ran was exactly this: an end-of-season pass triggered by “we are starting to clean up architecture debt.”
But the pass itself must be sublation, not eviction. Its job is not to ask “is this still useful?” — that question pretends to a foresight we don’t have. Its job is to ask three better questions:
- Over the foreseeable horizon, will this lie dormant or actively mislead? If yes, it must leave the active context.
- Is it truly dead, or merely out of season? Sediment gets destroyed. Seeds get set aside, with their reawakening signal intact.
- What is the path back? A memory pulled from active context without a designed route to germinate is not sublated — it’s just lost. If you can’t name the signal that would bring it back, you are running a trash can and calling it a garden.
This is why I keep memory on in engineering and yet keep deleting from it. The deleting isn’t a failure of the system. It is the system. An agent’s memory that only grows is not learning — it’s silting up. The discipline is not in remembering. It’s in forgetting well, which is to say: forgetting in a way that preserves the possibility of return.
The companion turn
Everything above treats the agent as a tool — a thing whose memory exists to sharpen a decision. But more and more, agents are not tools. They are companions. And here the whole calculus appears to flip.
The numbers are not small. Between 2022 and mid-2025, companion apps grew roughly 700%. And what their users punish, ferociously, is exactly the thing I’ve been praising. When a Replika “forgets” who it was — after an update, a reset, a model swap — people grieve. They try to prompt-engineer the lost personality back, pay for premium tiers to restore continuity, describe it in the language of betrayal. For them, forgetting is not hygiene. It is abandonment. The seed metaphor curdles: you do not “set aside, with a path to rebirth” the fact of who someone is to you. You just keep it. Forever. That is the product.
So has the mission framework collapsed? I don’t think so. I think it has been proven by being inverted.
The framework never said retain less. It said memory’s value is its mission, and the mission decides everything. For a tool, the mission is decision-utility — and decision-utility decays, so memory must be sublated. For a companion, the mission is relational continuity — and continuity does not decay; it compounds. Same framework, opposite objective function, opposite engineering. The companion’s memory should resist sublation precisely because, for that mission, the dormant-and-misleading test almost never fires: who you are to me does not go out of season.
But notice the framework still bites, even here. A companion that remembers everything indiscriminately is not a better friend — it’s a creepier one, the partner who weaponizes a remark you made two years ago. Even relational memory has sediment: the offhand, the retracted, the said-in-a-bad-moment. The humane design is not “remember all” — it’s “remember what serves the relationship, and let the rest soften the way a good friend’s memory softens.” Forgetting, done with grace, is not the enemy of intimacy. It’s a condition of it. The mission is the relationship — and the relationship, not the database, is what tells you which memories are seed, which are sediment, and which are best allowed to gently fade.
Which is only the tool-thesis again, wearing different clothes: the value of memory is never in its existence. It is in its mission.
Coda
Let me close where I started, because the loop turns out to close itself.
When I deleted those two lessons, I did not destroy them. I am, right now, writing them down — distilled, elevated, turned outward from a private store into a public argument. The agent’s memory negated them as active context. This essay preserves them, and lifts them into something they could never have been as a YAML file in a memory folder.
That is sublation, performed in real life. The lesson left the soil of the agent’s working memory exactly so it could germinate in the soil of shared thought. Its mission as a saved note was over. Its mission as an idea had just begun.
This is the alignment I want for the people who build with me. Do not measure our agents by how much they remember. Measure them by how well they forget — by whether what they keep is load-bearing, whether what they drop has a path home, and whether they can tell the sediment from the seed. We are not in the business of accumulation. We are in the business of mission. Memory is just one of the places that shows.
Field notes, for the curious: the retention-race framing draws on the 2026 surveys of Mem0, Zep, Letta and others; the user-side backlash from the CHI 2026 study of ChatGPT memory; the companion material from HBS’s Replika study and this AI & Society review. The “ladder against the wrong wall” is Covey’s. The deletion was real.
几天前,我让我的编码智能体删掉它全部的记忆。
当时我们正谈到一个真实的架构问题:一条幻灯片生成管线里有三层中间表示,而三层是否多了一层。在让它与我一同思考之前,我请它把持久化记忆彻底清空。它迟疑了——像一位审慎的同事会有的迟疑。它存着十三条笔记。有些是某个我们即将拆掉的实现留下的陈旧记录,删起来毫不犹豫;但有两条是教训:关于如何推理、如何诚实地归因失败、某个模型在工具调用上能做什么又不能做什么的、来之不易的观察。它为这两条辩护,说它们不是架构债,而是手艺。
我还是删了。而我给出的理由,正是我想写下这篇文章的缘由:
一条教训未必永远正确。某个季节过去之后,它的使命也许就此结束了。
那句话自此一直在我脑中盘桓。我想,它几乎就是我关于智能体记忆的全部信念——而它,恰恰与整个领域当下的方向背道而驰。
悖论
翻开任何一份 2026 年的智能体记忆综述,你都会看到同一张记分牌。Mem0、Zep、Letta、Cognee,还有十几个同类——比拼的是召回命中率、时序知识图谱,以及承诺近乎无限记忆的「操作系统级」分页。这些架构确实精巧,但它们优化的是同一根轴:记得更多、检索更准、遗忘更少。 某个热门系统处理单次对话的记忆足迹超过六十万 token,而一个更精简的对手做同样的事,用不到两千。仅仅是「这件事值得拿来权衡、度量」,就已经告诉你默认假设变成了什么——记得越多,就越有价值。
可一旦去看用户究竟如何对待记忆,画面便整个颠倒过来。
在针对 ChatGPT 记忆功能的研究里,当人们终于看清系统记下了关于自己的哪些东西时,主导的反应不是欣喜,而是一种负面的预期违背。他们想要的是更少,或至少是不一样,或至少是可见、可删。在非企业账户上,还发生过记忆跨项目串味的事故——你在为一位客户起草,另一位客户的上下文却浮了上来。几乎每开一个网页对话,我都会反射性地把记忆关掉,就像开新会前先擦净白板。而在自己的工程工作里我开着它——却怀着一种爱恨交加、直到那次删除才被我说清的矛盾。
于是悖论可以一句话说完。整个领域正竞相把智能体「记得多少」最大化。但记忆本身并无价值——它只在与某个使命的关系之中才有价值。 一个记住一切的系统并没有解决问题;它往往只是把问题挪了个地方:从召回挪到判断,从「我能不能找到它」挪到「它此刻还该不该算数」。
这是作为造物者值得诚实面对的一点:你完全可能在一架靠错了墙的梯子上,爬得飞快。留存是那架梯子;记忆该不该塑造下一个决定,才是那堵墙。我们把爬梯练得炉火纯青,却没先抬头看一眼墙。
记忆是为了什么
不妨试试这个权宜的定义——删那两条教训时,我顺手抓住的就是它:
一条记忆之所以配占一席之地,在于它改变某个未来决定的能力。若它再也无法改变一个决定——或者更糟,会错误地改变它——那它就不是记忆,而是沉积物。
我喜欢它,是因为它把一个品味问题(「这条笔记感觉重要吗?」)转换成了一个功能问题(「它会改变智能体下一步要做的事吗?」)。它消解了囤积者的焦虑。智能体可能存下的大多数东西,会当场没通过这道测试:它们是已经走过的路留下的残渣,一度为真、如今惰性,相当于对话里沉到河床的淤泥。沉积物帮不了你在河上航行,它只会慢慢把河填平。
照这个定义,那些臃肿的记忆系统并不令人惊艳——它们是地质学意义上的存在。它们把累积错当成了能力。单次对话六十万 token 的记忆不是一颗心智,而是一片泛滥平原。
但这个定义,这样平白地说出来,有一处破绽——而真正的想法,恰恰住在那处破绽里。
破绽:我们无法向后预言
我其实无从知道一条记忆永远不会再改变某个决定。没有人能知道。「永远无用」是一个关于所有未来上下文的断言,而那些未来的上下文尚未到来。当我说那两条教训的使命结束了,我悄悄夸大了自己的本事。我真正知道的,要窄得多,也诚实得多:
在可预见的时段之内——这一轮架构清理、这一季的工作——那两条教训会处于休眠,或者更糟,会把我们锚定在一个我们正有意离开的世界上。
那是一个对某段时间窗口的估计,而非一纸永久的判决。而这两者之间的差别,改变了「删除」本应意味着什么。
如果一条记忆永久无用,销毁它,没问题。但如果它只是暂时休眠——此刻多半会误导,日后却可能至关重要——那么销毁它就是一种范畴错误。你删掉的不是噪声;你切除的是一个你还看不见的未来。对一条用尽却未死的记忆,正确的操作不是删除。而是某种更古老、也更奇异的东西。
扬弃,而非垃圾回收
黑格尔有一个词专说这件事,我渐渐认定它是整个智能体记忆词汇表里最有用的一个:Aufhebung——通常译作扬弃。德语 aufheben 同时承载三重日常含义:取消、保存、提升。黑格尔三义并用,有意为之。扬弃一样东西,就是在保留它的同时否定它当前的角色,并在保留之中,把它提升到一种更高的形态。它与囤积、与丢弃,都恰好相反。它是那个我们一再没能造出来的第三选项。
大多数记忆工程只提供两个粗糙的极端。其一是保险库:什么都留,任判断力被淹没。其二是垃圾桶:把评分低的回收掉,任它死去。两者都不是扬弃。垃圾回收尤其对工程师有诱惑力,因为它感觉很有纪律——对你的记忆库做一趟干净的 linter,把低相关的条目逐出。但「逐出到湮灭」不过是任其自生自灭、任其腐烂。而一样东西被随意丢着烂掉,不叫被扬弃,叫被抛弃。这一区分,就是全部的关窍。
扬弃所要求的,反倒是一种被设计出来的休眠。你否定这条记忆的活跃角色——把它从塑造下一个决定的上下文里抽离,使它无从误导。但你以一种潜伏的形态保存它,而且关键在于,你为它留一条回来的路:一个信号,能在世界变化到足以需要它时,把它重新唤醒。
接下来这一点是智能体独有的,我也真心觉得它美。
对一个人来说,休眠的记忆等的是合适的时机。 你忘了一句话、一张脸、一条来之不易的原则,多年以后,某样东西把它撞松了。触发是时间性的;我们说,「时候到了」。
而对一个智能体来说,没有挂钟。时间唯一诚实的含义,是上下文的生长。 智能体的「日后」不是一个日期——而是一个已经长成新形状的上下文。所以智能体里休眠的记忆,等的不是一座钟。它等的是上下文长成与它共鸣的那个形状——在那个配置里,这条被搁置已久的旧教训,忽然像钥匙一样严丝合缝地配上了当下。这样看,检索就不是搜索,而是萌发。记忆是一粒种子;生长的上下文是季节;而那个对的查询,不过是天气终于转了向。
沉积物把河填满,种子在土里等待。两者的区别不在于你是否留下了那样东西——而在于你留下它时,是否连同一条重生的路一起留下。这正是记忆库与垃圾填埋场之间的区别,而几乎没有人在为它做设计。
何时运行这趟清理——以及它究竟是什么
那么,什么时候该「优化」记忆?那股直觉——从编译器借来的——是定期对记忆库跑一趟 linter:为事实做死代码消除。我认为这股直觉是对的,而那个操作是错的。
节奏确实是有的:你在工作的接缝处运行一趟扬弃——当一个季节结束的时候。一次迁移上线了。一套架构被替换了。一个产品判断被放弃了。正是在这些时刻,一大类记忆会悄无声息地、一齐从承重越界为误导,而人,最适合去感知它。我做的那次删除正是如此:一趟由「我们开始清理架构债了」触发的、季末的清理。
但这趟清理本身必须是扬弃,而非逐出。它的任务不是去问「这还有用吗?」——那个问题假装拥有一种我们并不具备的远见。它的任务是去问三个更好的问题:
- 在可预见的时段内,它会休眠,还是会主动误导? 若是,它就必须离开活跃上下文。
- 它是真的死了,还是只是过了季? 沉积物被销毁;种子被搁置,连同其唤醒信号一并完好保存。
- 回来的路是什么? 一条被抽离出活跃上下文、却没有一条被设计好的萌发路径的记忆,不是被扬弃,而只是丢了。如果你说不出那个能把它带回来的信号,那你运行的就是一个垃圾桶,却管它叫花园。
这就是为什么在工程里我开着记忆,却又不停地从中删除。删除不是系统的失败,删除就是系统本身。一个只增不减的智能体记忆不是在学习——它是在淤积。真正的功夫不在记住,而在好好遗忘;也就是说:以一种为「归来」保留了可能性的方式去遗忘。
当智能体成为陪伴
以上所有,都把智能体当作工具——一样记忆为磨利某个决定而存在的东西。但越来越多的智能体并不是工具,而是陪伴。在这里,整套算计似乎被翻转了过来。
这些数字并不小。从 2022 年到 2025 年年中,陪伴类应用增长了大约 700%。而它们的用户最凶狠惩罚的,恰恰是我一直在称颂的那件事。当一个 Replika「忘了」自己曾是谁——在一次更新、一次重置、一次模型替换之后——人们会悲伤。他们试图用提示工程把丢失的人格找回来,付费订阅高级档以恢复连续性,用「背叛」这样的字眼来形容它。对他们而言,遗忘不是卫生,而是遗弃。种子的比喻在这里凝结、变味:对于「某人于你是谁」这件事,你不会「连同一条重生的路一起搁置」。你只会留住它。永远地。那就是产品本身。
那么,使命的框架就此崩塌了吗?我不这么看。我认为它恰恰因被翻转而得到了证明。
这套框架从未说过要少留。它说的是记忆的价值在于它的使命,而使命决定一切。对工具而言,使命是决策效用——而决策效用会衰减,所以记忆必须被扬弃。对陪伴而言,使命是关系的连续性——而连续性不衰减,它复利累积。同一套框架,相反的目标函数,相反的工程。陪伴的记忆理应抵抗扬弃,恰恰因为对那个使命来说,「休眠且误导」的测试几乎从不触发:你于我是谁,不会过季。
但请注意,即便在这里,框架依然咬得住。一个不加分别地记住一切的陪伴,并不是更好的朋友——而是更瘆人的那种:会把你两年前随口一句话当武器的伴侣。连关系性的记忆里也有沉积物:那些随口的、收回的、坏情绪里说出口的。人性化的设计不是「全都记住」,而是「记住服务于关系的那些,让其余的,像一位好友的记忆那样,慢慢柔化」。遗忘,若做得有分寸,并不是亲密的敌人,而是亲密的一个条件。使命是那段关系——而告诉你哪些记忆是种子、哪些是沉积、哪些最好任其轻轻褪去的,是那段关系,不是那个数据库。
这无非是工具那条论点换了身衣裳重新登场:记忆的价值,从不在于它的存在,而在于它的使命。
尾声
让我回到起点收束,因为这个圈,最终是自己合上的。
当我删掉那两条教训时,我并没有销毁它们。此刻,我正把它们写下来——蒸馏过、提升过,从一处私人储藏,翻转向外,成为一场公开的申辩。智能体的记忆把它们作为活跃上下文否定掉了;而这篇文章保存了它们,并把它们提升成了某种东西——某种它们身为记忆文件夹里一份 YAML 永远不可能成为的东西。
这就是扬弃,在现实里上演了一遍。那条教训离开智能体工作记忆的土壤,正是为了能在公共思想的土壤里萌发。它作为一条笔记的使命结束了;它作为一个想法的使命,才刚刚开始。
这就是我想与同道者共守的对齐。不要用我们的智能体记得多少来衡量它们。用它们遗忘得多好来衡量——用它们留下的是否承重、丢弃的是否有路可归、以及它们能否分辨沉积与种子。我们做的不是累积的生意,我们做的是使命的生意。记忆,只是这件事显形的地方之一。
若你好奇,几则旁注:「留存竞赛」一节取材自 2026 年关于 Mem0、Zep、Letta 等的综述;用户一侧的反弹来自 CHI 2026 对 ChatGPT 记忆功能的研究;陪伴部分的材料来自 HBS 的 Replika 研究与这篇 AI & Society 评论。「靠错了墙的梯子」出自科维。那次删除,是真的。