Meet 'North Mini Code': Cohere Releases a 30B Open-Weight Mixture-of-Experts Model With Only 3B Active Parameters, Built for Agentic Coding

June 11, 2026, 08:33

Key Summary

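This week, the Cohere AI team released 'North Mini Code', its first developer-facing coding model. The model is open-weight and aimed at software engineers. It is a mixture-of-experts (MoE) model with 30B total parameters, of which only 3B activate per token. The release is framed around "sovereign AI": letting teams run capable models on their own terms. Small, efficient coding models let teams self-host without large GPU clusters, and North Mini Code is built to fill that gap. It is a 30B-A3B model, where A3B denotes 3 billion active parameters per forward pass. Cohere optimized it for three tasks: code generation, agentic software engineering, and terminal tasks. The model is text-in, text-out and does not accept image or video input.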

AI-Compiled Draft (Site-Generated)

This week, the Cohere AI team shipped its first developer-facing coding model, named ‘North Mini Code’. It is open-weight and aimed at software engineers. It is a mixture-of-experts (MoE) model with 30B total parameters, of which only 3B activate per token.

The release is positioned around “sovereign” AI. The idea is simple: run capable models on your own terms. Small, efficient coding models let teams self-host without large GPU clusters. North Mini Code targets that gap directly.

North Mini Code

North Mini Code is a 30B-A3B parameter model. The A3B stands for three billion active parameters per forward pass. Cohere optimized it for three jobs: code generation, agentic software engineering, and terminal tasks. The model is text-in, text-out. There is no image or video input.

The context window is 256K tokens, and the maximum output length is 64K tokens. Cohere lists a minimum hardware bar of one H100 at FP8. Weights ship under Apache 2.0 on Hugging Face. You can also reach the model through the Cohere API, Model Vault, and OpenRouter.

Field | North-Mini-Code-1.0
License | Apache 2.0
Model size | 30B total; 3B active
Context length | 256K total; 64K max generation
Optimized for | Code generation, agentic software engineering, terminal tasks
Availability | Hugging Face, Cohere API, Cohere Model Vault, OpenRouter
Hardware (minimum) | 1× H100 @ FP8

The Architecture

North Mini Code is a decoder-only Transformer with sparse MoE layers. Its attention layers interleave two types, sliding-window and global, in a 3:1 ratio. Sliding-window attention uses RoPE for positions. Global attention uses no positional embeddings at all.

The feed-forward block holds 128 experts, of which eight activate per token. Each expert is an FFN with SwiGLU activation. The router applies a sigmoid before top-k selection. A single dense layer sits before the sparse layers. That mix keeps active compute small while widening total capacity.

Cohere released the weights in BF16. Post-training ran in two phases.
First came two-stage cascaded supervised fine-tuning (SFT), then reinforcement learning with verifiable rewards (RLVR). The post-training focused on agentic coding. The model also supports interleaved thinking and native tool use.

Benchmarks

Cohere reports a score of 33.4 on the Artificial Analysis Coding Index, which it describes as a competitive position among similarly sized models. The company evaluated on SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench v2, Terminal-Bench Hard, SciCode, and LiveCodeBench v6.

The methodology is specific. SWE-Bench used the SWE-agent harness v1.1.0. Terminal-Bench v2 used a simple ReAct harness with one terminal tool. Terminal-Bench Hard used the Terminus-2 harness. Each benchmark was run with three seeds and the results were averaged. Sampling used temperature 1.0 and top_p 0.95.

The Speed

In Cohere’s internal tests, North Mini Code reached up to 2.8x higher output throughput than Devstral Small 2 at identical concurrency and hardware. It also showed a 30% edge in inter-token latency. Time-to-first-token was closer between the two models, with Devstral Small 2 keeping a slight lead.

Metric | North Mini Code vs Devstral Small 2
Output throughput | Up to 2.8x higher (same concurrency and hardware)
Inter-token latency | 30% better for North Mini Code
Time-to-first-token | Slightly behind Devstral Small 2

Use Cases With Examples

Cohere built North Mini Code for agentic workflows. Three patterns stand out in its own framing:

Sub-agent orchestration: A main agent delegates subtasks to helpers. Example: one agent writes unit tests while another fixes failing code.
Systems architecture mapping: The model reads a repository and sketches its structure. Example: tracing how services call each other before a large refactor.
Code reviews: The model scans a diff for problems. Example: flagging an unguarded null dereference before a merge.

Terminal tasks fit the model as well. Example: listing files, running a build, then parsing the output for errors.
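The terminal example above (run a build, then parse the output for errors) can be sketched as a pair of small tool functions that an agent harness might expose. This is a minimal sketch, not Cohere's harness: the names `run_command` and `extract_errors` and the error pattern are assumptions made for illustration, and real build tools format their errors differently.

```python
import re
import subprocess

# Hypothetical pattern: matches lines containing the word "error",
# e.g. "src/app.c:12:5: error: ..." or "1 error generated.".
ERROR_PATTERN = re.compile(r"\berror\b", re.IGNORECASE)


def run_command(args, timeout=60):
    """Run a command without a shell and return (exit_code, combined_output)."""
    result = subprocess.run(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as a terminal would show it
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stdout


def extract_errors(output):
    """Return the lines of build output that look like errors."""
    return [line.strip() for line in output.splitlines() if ERROR_PATTERN.search(line)]


if __name__ == "__main__":
    # Canned build output, so the sketch runs without a real build system.
    sample = (
        "compiling src/app.c\n"
        "src/app.c:12:5: error: use of undeclared identifier 'count'\n"
        "warning: unused variable 'tmp'\n"
        "1 error generated.\n"
    )
    for line in extract_errors(sample):
        print(line)
```

In an agentic loop, the exit code and the filtered lines would be returned to the model as the tool result, which keeps long build logs out of the context window.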
Getting Started

The fastest path is Hugging Face Transformers. Install Transformers from source for this model. Recommended sampling is temperature 1.0 and top_p 0.95.

    # Install Transformers from source (required for this model):
    # pip install "git+https://github.com/huggingface/transformers.git"
    from transformers import AutoTokenizer, AutoModelForCausalLM

    model_id = "CohereLabs/North-Mini-Code-1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "Write a python program to check if a string is a palindrome or not."
    messages = [{"role": "user", "content": prompt}]

    # return_dict=True yields a dict (input_ids + attention_mask) so **inputs unpacks cleanly
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    gen_tokens = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,
        temperature=1.0,
        top_p=0.95,
    )

    # Decode only the newly generated tokens, not the prompt
    output = tokenizer.decode(gen_tokens[0][inputs["input_ids"].shape[-1]:])
    print(output)

For serving, vLLM works. You need vLLM main plus Cohere’s melody library, which accurate response parsing depends on.

    uv pip install "git+https://github.com/vllm-project/vllm.git"
    uv pip install "cohere_melody>=0.9.0"

    vllm serve CohereLabs/North-Mini-Code-1.0 \
      -tp 2 \
      --max-model-len 320000 \
      --tool-call-parser cohere_command4 \
      --reasoning-parser cohere_command4 \
      --enable-auto-tool-choice

Quantized builds exist for Ollama, LM Studio, and llama.cpp. You can also try the model before downloading: Cohere offers free access through OpenCode and a hosted Hugging Face Space.

Key Takeaways

Cohere’s first coding model, North Mini Code, is a 30B mixture-of-experts that activates just 3B parameters per token.
It runs on a single H100 at FP8, with 256K context and 64K max output.
Weights ship under Apache 2.0, though the Hugging Face card adds a non-commercial note.
Cohere’s official release reports 33.4 on the Artificial Analysis Coding Index and up to 2.8x the output throughput of Devstral Small 2.
It is built for agentic coding: sub-agent orchestration, architecture mapping, and code reviews, with native tool use.
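Once the `vllm serve` command from Getting Started is running, the model can be queried over vLLM's OpenAI-compatible HTTP API. The sketch below is an illustration under stated assumptions, not Cohere's documented client. It assumes the server is listening on vLLM's default address, `http://localhost:8000`, and the helper names `build_chat_request` and `send_chat_request` are hypothetical. The sampling values are the ones the article recommends (temperature 1.0, top_p 0.95).

```python
import json
import urllib.request

MODEL_ID = "CohereLabs/North-Mini-Code-1.0"
# Assumption: vLLM's default host and port; change this if the server runs elsewhere.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt, max_tokens=1024):
    """Build an OpenAI-style chat completion payload with the recommended sampling."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url=BASE_URL, timeout=120):
    """POST the payload to /chat/completions and return the assistant's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server up, `print(send_chat_request(build_chat_request("Write a palindrome checker in Python.")))` prints the model's reply. Building the payload separately from sending it lets the request shape be inspected or tested without a running server.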

Original source: MarkTechPost AI ↗
