Sakana AI Launches Sakana Fugu: An Orchestration Model That Routes Tasks Across a Swappable Pool of Frontier LLMs

June 22, 2026, 18:42

Key Summary

AI-compiled summary produced by this site

Today, Sakana AI launched Sakana Fugu. It is a multi-agent orchestration system that behaves like one model. You send a request to a single endpoint. Fugu decides how to handle it internally. It solves a task directly when that is enough. It also assembles and coordinates a team of expert models when needed. The complexity of a multi-agent system never reaches your code.

TL;DR

- Fugu delivers a multi-agent system behind one OpenAI-compatible API.
- Fugu Ultra leads most published coding and reasoning benchmarks.
- The orchestrator beats the individual models it coordinates.
- Opt-out and provider routing target compliance and single-vendor risk.
- Routing is proprietary, so per-query model selection stays hidden.

What is Sakana Fugu

Fugu is itself a language model. It is trained to call other LLMs in an agent pool. That pool includes instances of itself, called recursively. Fugu manages model selection, delegation, verification, and synthesis internally.

Instead of hard-coded roles or workflows, Fugu learns how to coordinate. It decides when to delegate and how agents should communicate. It then combines their work into one answer. From the outside, you call a single model. Inside, a coordinated system of experts does the work.

Sakana AI frames this as a hedge against single-vendor dependency. If one provider restricts access, Fugu routes around the disruption. The research team cites recent export controls on Anthropic’s Fable and Mythos models as motivation. Over time, newer models can be folded into the pool.

Fugu and Fugu Ultra: Two Models, One API

Fugu ships in two variants, both behind one OpenAI-compatible API:

- Fugu balances strong performance with low latency. It is a default for everyday coding, code review, and chatbots. It also fits tools like Codex. You can opt specific agents out of its pool. That helps teams meet data, privacy, and compliance requirements.
- Fugu Ultra is tuned for maximum answer quality on hard, multi-step problems.
  It coordinates a deeper pool of expert agents. Its pool is fixed, so opt-out is not available. The current model ID is fugu-ultra-20260615.

The Research Behind the Orchestrator

Fugu builds on two ICLR 2026 papers on learned orchestration, TRINITY and Conductor. TRINITY uses a lightweight evolved coordinator across several turns. It assigns Thinker, Worker, or Verifier roles to delegate work adaptively. Conductor is trained with reinforcement learning. It discovers natural-language coordination strategies and focused prompts for diverse LLM pools. Together, they show systems can learn to assemble and route agents per task. That replaces hand-designed workflows.

Benchmark

Sakana AI compares Fugu against the foundation models it orchestrates. Baselines use provider-reported scores. SWE Bench Pro uses the mini-swe-agent as scaffolding.

| Benchmark | Fugu | Fugu Ultra | Opus 4.8 | Gemini 3.1 Pro | GPT 5.5 |
|---|---|---|---|---|---|
| SWE Bench Pro* | 59.0 | 73.7 | 69.2 | 54.2 | 58.6 |
| TerminalBench 2.1 | 80.2 | 82.1 | 74.6 | 70.3 | 78.2 |
| LiveCodeBench | 92.9 | 93.2 | 87.8 | 88.5 | 85.3 |
| LiveCodeBench Pro | 87.8 | 90.8 | 84.8 | 82.9 | 88.4 |
| Humanity’s Last Exam | 47.2 | 50.0 | 49.8 | 44.4 | 41.4 |
| CharXiv Reasoning | 85.1 | 86.6 | 84.2 | 83.3 | 84.1 |
| GPQA-D | 95.5 | 95.5 | 92.0 | 94.3 | 93.6 |
| SciCode | 60.1 | 58.7 | 53.5 | 58.9 | 56.1 |
| τ³ Banking | 21.7 | 20.6 | 20.6 | 8.4 | 20.6 |
| Long Context Reasoning | 74.7 | 73.3 | 67.7 | 72.7 | 74.3 |
| MRCRv2 | 86.6 | 93.6 | 87.9 | 84.9 | 94.8 |

A Fugu variant posts the top score on 10 of 11 rows. Fugu Ultra tops the four coding benchmarks, CharXiv Reasoning, and Humanity’s Last Exam. It ties regular Fugu on GPQA-D. Regular Fugu leads SciCode, τ³ Banking, and Long Context Reasoning. GPT 5.5 wins MRCRv2, the only baseline win here. Sakana AI says its Fugu models stand shoulder-to-shoulder with Anthropic’s Fable 5 and Mythos Preview. Those two are not in Fugu’s pool, since they are not publicly accessible.
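The "10 of 11" tally can be checked directly from the published numbers. The sketch below copies the scores from the benchmark table in this article and counts the rows where a Fugu variant holds (or shares) the top score. The helper names `leaders` and `fugu_led_rows` are illustrative only and are not part of any Sakana tooling.

```python
# Scores copied from the benchmark table in this article.
# Column order: Fugu, Fugu Ultra, Opus 4.8, Gemini 3.1 Pro, GPT 5.5.
MODELS = ["Fugu", "Fugu Ultra", "Opus 4.8", "Gemini 3.1 Pro", "GPT 5.5"]
SCORES = {
    "SWE Bench Pro": [59.0, 73.7, 69.2, 54.2, 58.6],
    "TerminalBench 2.1": [80.2, 82.1, 74.6, 70.3, 78.2],
    "LiveCodeBench": [92.9, 93.2, 87.8, 88.5, 85.3],
    "LiveCodeBench Pro": [87.8, 90.8, 84.8, 82.9, 88.4],
    "Humanity's Last Exam": [47.2, 50.0, 49.8, 44.4, 41.4],
    "CharXiv Reasoning": [85.1, 86.6, 84.2, 83.3, 84.1],
    "GPQA-D": [95.5, 95.5, 92.0, 94.3, 93.6],
    "SciCode": [60.1, 58.7, 53.5, 58.9, 56.1],
    "tau3 Banking": [21.7, 20.6, 20.6, 8.4, 20.6],
    "Long Context Reasoning": [74.7, 73.3, 67.7, 72.7, 74.3],
    "MRCRv2": [86.6, 93.6, 87.9, 84.9, 94.8],
}
FUGU_VARIANTS = {"Fugu", "Fugu Ultra"}


def leaders(row):
    """Return every model that holds the top score in a row (ties included)."""
    best = max(row)
    return [model for model, score in zip(MODELS, row) if score == best]


def fugu_led_rows(scores):
    """Return the benchmarks where at least one Fugu variant is a leader."""
    return [name for name, row in scores.items()
            if FUGU_VARIANTS & set(leaders(row))]


for name, row in SCORES.items():
    print(f"{name}: {', '.join(leaders(row))}")
print(f"Fugu variants lead {len(fugu_led_rows(SCORES))} of {len(SCORES)} rows")
```

Ties are counted for every model that shares the top score, which is why GPQA-D lists both Fugu variants.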
Use Cases

Sakana AI ran a beta with close to 500 early users. The published examples favor long, multi-step tasks.

- AutoResearch: An agent improved a small GPT’s training recipe autonomously. It ran 123 experiments over roughly 14 hours on one H100 GPU. Fugu Ultra reached the best mean validation BPB of 0.9774, with a best single run of 0.9748.
- Rubik’s Cube solver: Each model wrote a pure-Python solver, no libraries allowed. Fugu Ultra solved all 300 held-out cubes, averaging 19.72 moves. One baseline matched it closely at 19.76 moves. Two others crashed and solved none.
- Classical Japanese kana reading order: On a 1610 letter, Fugu Ultra scored NED 0.80. The nearest baseline reached only 0.24.
- Blindfold chess: Fugu played four games from memory, with no board shown. It beat three frontier models and a 2100-Elo Stockfish engine.
- Online trading: On one 50-week window, Fugu Ultra returned +19.43% on average across five runs. The other frontier models stayed below +15%. Sakana AI notes past performance does not guarantee future results.

A Minimal API Example

Fugu uses an OpenAI-compatible API, so no SDK migration is required. Point an existing client at your console-provided endpoint.

```python
from openai import OpenAI

# Endpoint and key come from your Sakana console (console.sakana.ai).
client = OpenAI(
    base_url="https://<your-fugu-endpoint>/v1",  # from console.sakana.ai
    api_key="YOUR_SAKANA_API_KEY",
)

resp = client.chat.completions.create(
    model="fugu-ultra-20260615",  # or "fugu"
    messages=[
        {"role": "user", "content": "Reproduce the method in this paper and report the gap."},
    ],
)
print(resp.choices[0].message.content)
```

Token usage and cost are reported per request, so you can monitor spend in real time.
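Because usage is reported per request, a small tracker can accumulate it across calls. The following is a minimal sketch, assuming Fugu responses carry the standard OpenAI-style `usage` object with `prompt_tokens` and `completion_tokens`. `UsageTracker` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real API responses so the example runs offline. The article does not name the cost field, so only tokens are tracked here.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class UsageTracker:
    """Hypothetical helper that sums token usage across responses."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    requests: int = 0

    def record(self, resp):
        """Add one response's usage; responses without usage still count as a request."""
        self.requests += 1
        usage = getattr(resp, "usage", None)
        if usage is None:
            return
        self.prompt_tokens += usage.prompt_tokens
        self.completion_tokens += usage.completion_tokens

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens


# Stand-in responses (assumed shape); in practice pass the object returned
# by client.chat.completions.create(...).
fake_responses = [
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=100, completion_tokens=50)),
    SimpleNamespace(usage=SimpleNamespace(prompt_tokens=20, completion_tokens=5)),
]

tracker = UsageTracker()
for r in fake_responses:
    tracker.record(r)

print(f"{tracker.requests} requests, {tracker.total_tokens} tokens total")
```

In a real client, calling `tracker.record(resp)` after each request would be enough; mapping tokens to cost depends on pricing details that the article does not give.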

Original source: MarkTechPost AI

