Thousand Token Wood: Running a Multi-Agent Economy on a 3B Model
Key summary
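Team article by Lester Leong (AdmiralTaco, build-small-hackathon), published June 5, 2026. A field report from the Build Small Hackathon on what a trading council of 3-billion-parameter agents can and cannot do. Try it first: the Space, and the open agent traces. I built Thousand Token Wood for the Build Small Hackathon. It is a miniature economy: five woodland creatures, each running on Qwen2.5-3B, trade five goods for pebbles, gossip, hoard, and even panic. Poke the wood and you can watch bubbles, crashes, and a widening wealth gap play out on their own. The model is served with vLLM on Modal, and a Gradio app is the window onto the wood. This is an engineering field report, written for people who build with small models.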
Team Article. Published June 5, 2026. Lester Leong (AdmiralTaco), build-small-hackathon.

A Build Small Hackathon field report on what a 3-billion-parameter council of traders can and cannot do.

Try it first: the Space, and the open agent traces.

I built Thousand Token Wood for the Build Small Hackathon. It is a tiny economy: five woodland creatures, each its own agent on Qwen2.5-3B, trade five goods for pebbles, gossip, hoard, and panic. You poke the wood and watch bubbles, crashes, and a widening wealth gap appear on their own. The model is served with vLLM on Modal; a Gradio app is the window onto the wood.

This is a field report on the engineering, written for people who build with small models. The short version: a 3B model is a reliable format generator and an unreliable reasoner, emergent systems need designed scarcity, and the best demos sit where a technical constraint meets something you already understand deeply.

Why small is the design, not the limit

A living economy needs many agents thinking many times per run. That is exactly where a frontier model is the wrong tool: too slow and too costly to run a council of traders every tick. A small model is what makes a real-time multi-agent simulation feasible. Every creature decides in a single batched GPU call per turn.

The first economy was dead on arrival

The naive version did nothing. Production outran consumption, so every creature was self-sufficient and never had a reason to trade. The market cleared once and went silent.

The fix was to engineer scarcity:

- Diet variety: a creature can eat only one unit of any single food per meal, so surviving means buying foods it does not grow.
- Spoilage: perishable food rots if hoarded, forcing surplus to be sold while it still has value.
- A winter fuel crisis: every creature must burn firewood each turn, the need rises over time, and only one creature makes firewood.
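The three scarcity rules above are described only in prose. A minimal sketch of how they might look in code follows, under loud assumptions: the goods list beyond acorns, honey, and firewood, the `Creature` class, and every parameter (`foods_per_meal`, `keep_limit`, `ramp_every`) are hypothetical and not taken from the project.

```python
from dataclasses import dataclass, field

# Hypothetical names and numbers throughout; the article names only acorns,
# honey, and firewood, and gives no exact parameters.
FOODS = ("acorns", "berries", "honey", "mushrooms")
PERISHABLE = ("berries", "mushrooms")


@dataclass
class Creature:
    name: str
    inventory: dict = field(default_factory=dict)


def eat_meal(creature, foods_per_meal=2):
    """Diet variety: eat at most one unit of any single food per meal."""
    eaten = []
    for food in FOODS:
        if len(eaten) == foods_per_meal:
            break
        if creature.inventory.get(food, 0) > 0:
            creature.inventory[food] -= 1
            eaten.append(food)
    return eaten  # fewer than foods_per_meal items means the meal came up short


def spoil(creature, keep_limit=2):
    """Spoilage: perishable stock above keep_limit rots at the end of a turn."""
    for food in PERISHABLE:
        if food in creature.inventory:
            creature.inventory[food] = min(creature.inventory[food], keep_limit)


def firewood_needed(turn, base=1, ramp_every=5):
    """Winter fuel: the per-turn firewood requirement rises over the run."""
    return base + turn // ramp_every


squirrel = Creature("squirrel", {"acorns": 6})
print(eat_meal(squirrel))  # only acorns on hand, so the meal is one food short
hoarder = Creature("hoarder", {"berries": 5})
spoil(hoarder)
print(hoarder.inventory, [firewood_needed(t) for t in (0, 5, 14)])
```

In this sketch a creature holding six acorns still cannot complete a two-food meal, which is the pressure to trade that the diet rule is meant to create; the cap in `spoil` and the ramp in `firewood_needed` play the same role for hoarding and for fuel.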
The winter fuel crisis drives the drama. One supplier cannot meet rising demand, so the woodcutter gets rich and everyone else competes for warmth.

Valid JSON, weak judgment

With scarcity in place, the honest small-model lesson surfaced. The 3B emitted valid JSON on 100% of calls, but its economic judgment was poor: a creature that produced acorns would post an order to buy acorns, the one thing it had in surplus.

The fix was not a bigger model; it was a sharper prompt. I told each agent what it produced and must never buy, computed the exact list of goods it was short on, and gave it one worked example. Decision quality jumped and the creatures began trading to their roles. The whole loop is wrapped in a tolerant JSON parse-and-repair layer, so a malformed response degrades to a no-op instead of crashing the simulation.

A second lesson came from wellbeing. I first modeled it as an accumulator, and any chronic shortfall ground every creature to zero over a run, a death spiral that was no fun to watch and that punished the agents' imperfect optimization. I reframed it as a mean-reverting mood that recovers when a creature is fed and warm and never hits zero. Stakes belong in pebbles, prices, and status, not starvation.

Then it started telling stories

The feature I am most pleased with ties the project to market history. The player can draw a Wood Legend: a famous episode reskinned as woodland folklore. Tulip Mania becomes the Great Acorn Mania. The South Sea Bubble becomes the Hollow Log Trading Company. The 1929 bank runs become the Run on Oona's Hoard.

These are not flavor text. Each legend fires real shocks, and the agents react. In one run I drew the Run on Oona's Hoard, the rumor that the owl's vault was empty. Oona began liquidating her honey to raise pebbles, and the flood of supply crashed the honey price from 10 to 3 over the next turns. A reskinned bank run made an agent dump assets and moved a market price. None of it was scripted.
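Unscripted runs like this only stay watchable if a bad reply cannot crash the loop. The tolerant parse-and-repair layer mentioned under "Valid JSON, weak judgment" is not shown in the article; the sketch below is one plausible shape for it, assuming a hypothetical reply schema of `{"actions": [...]}` and repairing only one common slip (trailing commas). The function names are illustrative, and `raw` is assumed to be a string.

```python
import json
import re


def no_op():
    """A fresh do-nothing decision (the action schema here is hypothetical)."""
    return {"actions": []}


def _repair(text):
    """Fix one common slip: trailing commas before a closing brace or bracket."""
    return re.sub(r",\s*([}\]])", r"\1", text)


def parse_actions(raw):
    """Parse a model reply into {"actions": [...]}; fall back to a no-op."""
    # Keep only the outermost {...}, dropping code fences or chatter around it.
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        return no_op()
    candidate = raw[start:end + 1]
    # Try the text as-is first, then the repaired version.
    for attempt in (candidate, _repair(candidate)):
        try:
            parsed = json.loads(attempt)
        except ValueError:  # json.JSONDecodeError is a ValueError subclass
            continue
        if isinstance(parsed, dict) and isinstance(parsed.get("actions"), list):
            return parsed
    return no_op()


reply = 'Sure! {"actions": [{"type": "sell", "good": "honey", "qty": 2,}],} Hope that helps.'
print(parse_actions(reply))
print(parse_actions("I think I will wait this turn."))
```

The regex repair is deliberately naive: it would also rewrite a comma followed by a brace inside a string value, which is acceptable for a sketch but worth tightening in real use. The design point is the fallback: any reply that cannot be coerced into the expected shape becomes an empty action list, so one creature skips a turn and the simulation carries on.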
For a crash like the honey collapse to be visible, prices had to move. They were frozen because the agents quoted back the reference price I showed them. The fix was to let the market reference drift with residual supply and demand after each round: heavy unfilled buying pushes a price up, a glut pushes it down. Prices now trend during scarcity and stay calm in balanced trade.

What actually happened

A representative fifteen-turn run, with a drought and a winter rumor injected partway:

| Metric | Result |
| --- | --- |
| Valid JSON actions | 100% (75 of 75 calls) |
| Trades per turn | sustained 3 to 9, never silent |
| Honey price | crashed 10 to 3 during the bank-run legend |
| Firewood price | rose 4 to 7 as winter scarcity bit |
| Wealth gap (Gini) | widened 0.14 to 0.38 |
| Outcome | the woodcutter ended richest, the hoarder broke |

The reasoning behind every one of those moves is in the open traces dataset: each row is a creature's full prompt, raw response, parsed actions, and private thought.

Takeaways for building with small models

Most of the engineering is closing the gap between a small model's reliable formatting and its unreliable reasoning, with structure and prompting rather than scale. Emergent systems need designed scarcity; abundance is boring. And the most compelling small-model demos do not need invented drama. Three centuries of market history had it ready, and a council of 3B agents was enough to play it out.

Small models, big adventures. Try the Space.

Originally published on Medium.
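For readers who want to experiment with the price-drift fix described earlier (the reference price moving with residual supply and demand), here is a minimal sketch. The update rule, the function name `drift_reference`, and the parameters `sensitivity`, `max_step`, and `floor` are all assumptions chosen for illustration; the article states only the direction of the effect, not a formula.

```python
def drift_reference(price, unfilled_buy, unfilled_sell,
                    sensitivity=0.05, max_step=0.25, floor=1.0):
    """Return next round's reference price from the orders left unfilled.

    Hypothetical rule: each unit of net unfilled demand moves the price by
    `sensitivity` (as a fraction), capped at `max_step` per round, and the
    price never drops below `floor`.
    """
    excess = unfilled_buy - unfilled_sell  # > 0: unmet demand, < 0: glut
    step = max(-max_step, min(max_step, sensitivity * excess))
    return max(floor, price * (1.0 + step))


# A glut of unsold honey, repeated for a few rounds, walks the price down.
honey = 10.0
for turn in range(3):
    honey = drift_reference(honey, unfilled_buy=0, unfilled_sell=6)
    print(turn, round(honey, 2))
```

When leftover buy and sell orders balance, `excess` is zero and the reference does not move, which matches the "calm in balanced trade" behavior described in the article. The per-round cap is a design choice in this sketch to keep a single lopsided round from producing an implausible jump.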