Hugging Face Blog · AI Agent

The crash that vanished: control and emergence in a five-model economy

June 8, 2026, 13:10

Key points

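Team article, published June 8, 2026. Field notes from the Build Small Hackathon, June 2026, third installment. In the first note, the author told a story he was proud of: he drew a Wood Legend called the Run on Oona's Hoard, a 1929 bank run repackaged as a forest fable, then watched the owl who keeps the honey read the panic and start selling. Over the next few turns, the oversupply crashed the honey price from 10 to 3. Nobody scripted it in advance. A repackaged bank run made an AI agent dump an asset, and that dump moved the price. This was the core thesis: give a small model a role and a budget, and emergent market behavior appears on its own.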

AI-compiled text from this site

The crash that vanished: control and emergence in a five-model economy

Team Article. Published June 8, 2026. By Lester Leong (AdmiralTaco), build-small-hackathon.

Field notes from the Build Small Hackathon, June 2026. Third installment.

In the first of these notes I told a story I was proud of. I drew a Wood Legend called the Run on Oona's Hoard, a 1929 bank run reskinned as woodland folklore, and watched the owl who keeps the honey read the panic and start liquidating. The flood of supply crashed the honey price from 10 down to 3 over the next few turns. Nobody scripted it. A reskinned bank run made an agent dump an asset, and the dump moved a price. That was the whole thesis: give a small model a role and a budget, and emergent market behavior falls out for free.

Then I rebuilt the wood, and the crash stopped happening. This installment is about why, because the failure taught me more about building on agents than the original success did.

Five labs, five minds

The rebuild swapped one model running five creatures for a council of five different labs' small models, each driving its own creature: an OpenAI model, an NVIDIA model, an OpenBMB model, and a half-billion-parameter model I fine-tuned myself running two of them. The point was honesty. If the claim is that small models can run a living economy, the strongest version of that claim is five distinct architectures making distinct choices in the same market, not one model wearing five hats.

That heterogeneity is exactly what broke the story I had already written up.

The price is whatever the agents decide to trade at

I rebuilt the operator side too. The player is now a financier who works from the shadows: short a good, whisper a true tip to set up its fall, spring the legend, and collect when the price craters. I made that loop legible on the screen, with an objective, a scoreboard, and a one-click first trade.
Making a promise visible is the fastest way to discover the promise is false. Because when I shorted honey and sprang the Run on Oona's Hoard, honey did not crash. It rose. The council models, reading a rumor that the vault was empty and a tip that the crop was doomed, did not dump honey the way the original single model had. They hoarded it. Scarcity, not a fire sale. The short lost money, and the headline the narrator wrote, with no irony, was that the honey gamble had soured.

This is the lesson, and it is not specific to a game. In an agent economy the reference price is not a dial you turn. It is the residue of what the agents actually choose to trade. The original crash was real, but it was contingent on one model's disposition, not a robust property of the system. Change the population, and the emergent behavior you documented can simply evaporate.

Three ways to fail

I spent three live runs trying to make the crash come back by pushing on the economy from the outside, the way you would shock a textbook supply and demand model.

First I left the legend as a pure rumor and trusted the agents to react. They did not sell.

Second I dumped a windfall of honey into every creature's stores, reasoning that a glut would collapse demand and pull the price down. That worked beautifully against my test policy, a rule-based stand-in I use for fast offline runs, because the test policy follows a mechanical wants-threshold: flood its inventory and it stops buying. The live models ignored the windfall and traded on their own read of the room. The gambit lost again.

Third I sized the short up, which only made the loss larger.

Three recordings, three losses: minus fifteen, minus twenty-six, minus twenty-seven pebbles, when the entire premise was that this was how you made money.

The pattern was the warning. Every lever I pulled was an input to the agents' decision, and the agents were free to decline.
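A minimal sketch of why the inventory glut worked against a wants-threshold stand-in, assuming the rule is simply "buy the shortfall against a fixed want". The class name `ThresholdPolicy`, the `wants_threshold` field, and all numbers are hypothetical illustrations, not the author's test policy.

```python
from dataclasses import dataclass


@dataclass
class ThresholdPolicy:
    """Hypothetical rule-based stand-in: buys only while below its want."""
    wants_threshold: int  # assumed: units of honey the creature wants on hand

    def units_to_buy(self, inventory: int) -> int:
        # Mechanical rule: demand is the shortfall against the threshold,
        # and it drops to zero once the stores are full.
        return max(0, self.wants_threshold - inventory)


def total_demand(policies, inventories):
    """Sum of buy orders across all creatures for one turn."""
    return sum(p.units_to_buy(inv) for p, inv in zip(policies, inventories))


policies = [ThresholdPolicy(wants_threshold=5) for _ in range(5)]

# Demand before the windfall, with every creature short of its want.
before_glut = total_demand(policies, inventories=[2, 1, 3, 0, 4])

# Demand after a windfall of 10 units lands in every creature's stores.
after_glut = total_demand(policies, inventories=[12, 11, 13, 10, 14])

print(before_glut, after_glut)
```

Under this rule the windfall drives demand to zero by construction. A live model has no such rule: the windfall is just one more fact in its context, and it can still decide to buy or hold.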
You cannot steer a heterogeneous population of models with a mechanical shock, because the shock only biases a choice they still get to make.

The trap inside the trap is worth naming on its own. The fix that worked against my fast test policy gave me false confidence and cost me a live run to disprove. When the cheap stand-in and the real agents disagree, the stand-in is the one lying, and any result that only reproduces under the stand-in is not a result.

Author the seam, do not push the inputs

The resolution was to stop trying to convince the agents and to make the panic true by construction. A bank run is, definitionally, a crash. So the legend now crashes its good at settlement, after the market has finished clearing for the turn, by overwriting the reference price directly. The agents trade all they like; then the run lands as a fact, the price halves, and the short that front-ran it settles into profit. The crash is no longer a behavior I hope for. It is an authored consequence I impose at the one seam where nothing downstream can argue with it.

That sounds like giving up on emergence, and it is the opposite. The emergent layer, five models trading, gossiping, hoarding, forming grudges, is still doing all the work that makes the wood feel alive. What I learned is that you do not get reliable outcomes by pushing harder on emergent inputs. You get them by choosing the precise seam at which to author a deterministic override, and leaving everything upstream free. Emergence for texture, authored control for the moments that have to happen. The craft is knowing which is which, and where the seam sits.
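The settlement override described above can be sketched as follows. This is an illustration under stated assumptions, not the game's code: `Market`, `ShortPosition`, `settle_turn`, the mean-price clearing rule, and every number are hypothetical. The only detail taken from the text is the ordering, in which the market clears first and the legend then overwrites the reference price by halving it.

```python
from dataclasses import dataclass


@dataclass
class Market:
    """Toy market for one good; every name here is hypothetical."""
    reference_price: float

    def clear(self, agent_prices: list[float]) -> None:
        # Emergent layer (assumed rule): the reference price is the mean
        # of the prices the agents actually chose to trade at this turn.
        if agent_prices:
            self.reference_price = sum(agent_prices) / len(agent_prices)


@dataclass
class ShortPosition:
    units: int
    entry_price: float

    def pnl(self, settle_price: float) -> float:
        # A short profits when the price settles below the entry price.
        return self.units * (self.entry_price - settle_price)


def settle_turn(market: Market, agent_prices: list[float],
                legend_active: bool, crash_factor: float = 0.5) -> float:
    """Clear the market first, then apply the authored override."""
    market.clear(agent_prices)
    if legend_active:
        # Authored seam: this runs after clearing, so no agent decision
        # in the same turn can undo it.
        market.reference_price *= crash_factor
    return market.reference_price


short = ShortPosition(units=5, entry_price=10.0)

# The agents hoard and bid honey up to 12 instead of dumping it.
hoarding_prices = [12.0, 12.0, 12.0, 12.0, 12.0]

without_override = settle_turn(Market(10.0), hoarding_prices, legend_active=False)
with_override = settle_turn(Market(10.0), hoarding_prices, legend_active=True)

print(short.pnl(without_override), short.pnl(with_override))
```

In this toy setup the same hoarding behavior produces a loss without the override and a profit with it, because the override acts on the price after every agent has already chosen. The specific figures are made up and are not meant to reproduce the recorded runs.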
| Attempt | Mechanism | Honey at settlement | Gambit P&L |
| --- | --- | --- | --- |
| Original, one model | that model chose to dump | 10 to 3 | the showcase win |
| Council, rumor only | five models chose to hold | rose on scarcity | minus 15 |
| Council, inventory glut | demand collapse, test policy only | barely moved | minus 26 to 27 |
| Council, settlement override | price crashed post-clearing, by fiat | halved reliably | plus 40 |

Table 1. The same gambit across four worlds. The crash was emergent and fragile under one model, absent under a heterogeneous council, and reliable only once it was authored at the settlement seam.

What I took away

Three things, and all three outlive the game.

First, emergence is contingent, not durable. Behavior you observe and write up from one population of agents can vanish when you change the population, even if nothing else changes. Treat a single impressive run as an anecdote, not a property, until it survives a different cast.

Second, you do not control a market of agents by shocking its inputs. Supply and demand levers only bias choices the agents are still free to make, and a heterogeneous council will frequently decline. Reliable outcomes come from authoring at a settlement seam, downstream of every decision, not from pushing harder upstream.

Third, the cheap simulator that lets you iterate fast is also the one most likely to flatter a wrong fix. When the stand-in and the real agents disagree, believe the agents.

I build agent-based market models for a living, and I have made every one of these mistakes at larger scale and higher stakes than a wood full of woodland creatures. It was useful to make them again somewhere the only thing at risk was a pile of pebbles and a story I had told too confidently the first time.

Small models, big adventures, and a crash you have to author yourself.

Try it: the Space. Open agent traces: the dataset.
