MarkTechPost AI · Generative AI

Meet "North Mini Code": Cohere Releases a 30B Open-Weight Mixture-of-Experts Model with Only 3B Active Parameters, Built for Agentic Coding

June 11, 2026, 08:33

Summary

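This week, the Cohere AI team released North Mini Code, its first developer-facing coding model. The model is open-weight and aimed at software engineers. It is a mixture-of-experts (MoE) model with 30B total parameters, of which only 3B are active per token. The release is framed around "sovereign AI": letting teams run capable models on their own terms. Small, efficient coding models let teams self-host without large GPU clusters, and North Mini Code is built to fill that gap. Its size is listed as 30B-A3B, where A3B denotes 3 billion active parameters per forward pass. Cohere optimized it for three tasks: code generation, agentic software engineering, and terminal tasks. The model is text-in, text-out and does not accept image or video input.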

AI-compiled full text (on-site)

This week, the Cohere AI team shipped its first developer-facing coding model, North Mini Code. The model is open-weight and aimed at software engineers. It is a mixture-of-experts (MoE) model with 30B total parameters, of which only 3B activate per token.

The release is positioned around "sovereign" AI. The idea is simple: run capable models on your own terms. Small, efficient coding models let teams self-host without large GPU clusters, and North Mini Code targets that gap directly.

North Mini Code

North Mini Code is a 30B-A3B parameter model, where A3B stands for three billion active parameters per forward pass. Cohere optimized it for three jobs: code generation, agentic software engineering, and terminal tasks. The model is text-in, text-out, with no image or video input. The context window is 256K tokens, and the maximum output length is 64K tokens. Cohere lists a minimum hardware bar of one H100 at FP8. Weights ship under Apache 2.0 on Hugging Face, and the model is also available through the Cohere API, Cohere Model Vault, and OpenRouter.

Field | North-Mini-Code-1.0
--- | ---
License | Apache 2.0
Model size | 30B total; 3B active
Context length | 256K total; 64K max generation
Optimized for | Code generation, agentic software engineering, terminal tasks
Availability | Hugging Face, Cohere API, Cohere Model Vault, OpenRouter
Hardware (minimum) | 1× H100 @ FP8

The Architecture

North Mini Code is a decoder-only Transformer with sparse MoE layers. Its attention interleaves two types in a 3:1 ratio of sliding-window to global layers. Sliding-window attention uses RoPE for positions; global attention uses no positional embeddings at all. Each sparse feed-forward block holds 128 experts, of which eight activate per token. Each expert is an FFN with SwiGLU activation, and the router applies a sigmoid to its scores before top-k selection. A single dense layer sits before the sparse layers. That mix keeps active compute small while widening total capacity. Cohere released the weights in BF16.
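The routing described above can be made concrete with a small, dependency-free sketch. It is not Cohere's code: the toy hidden sizes, the random weights, the renormalization of the kept sigmoid scores into mixing weights, and the placement of the global layer as every fourth layer are assumptions for illustration, and the leading dense layer is omitted. Only the expert count (128), top-8 selection, the sigmoid-before-top-k router, SwiGLU experts, and the 3:1 sliding-window/global interleave come from the article.

```python
import math
import random

# Toy sizes: the article gives expert counts, not hidden sizes, so these are hypothetical.
D_MODEL = 8      # token hidden size (hypothetical)
D_FF = 16        # per-expert FFN width (hypothetical)
N_EXPERTS = 128  # experts per sparse FFN block (from the article)
TOP_K = 8        # experts activated per token (from the article)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def silu(x: float) -> float:
    return x * sigmoid(x)


def matvec(w, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class SwiGLUExpert:
    """One expert FFN: down(silu(gate(x)) * up(x))."""

    def __init__(self, rng):
        self.w_gate = rand_matrix(D_FF, D_MODEL, rng)
        self.w_up = rand_matrix(D_FF, D_MODEL, rng)
        self.w_down = rand_matrix(D_MODEL, D_FF, rng)

    def __call__(self, x):
        hidden = [silu(g) * u for g, u in zip(matvec(self.w_gate, x), matvec(self.w_up, x))]
        return matvec(self.w_down, hidden)


class SparseMoE:
    """Sigmoid router plus top-k expert selection over SwiGLU experts."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.router = rand_matrix(N_EXPERTS, D_MODEL, rng)
        self.experts = [SwiGLUExpert(rng) for _ in range(N_EXPERTS)]

    def route(self, x):
        # Sigmoid each expert's router logit independently, then keep the top-k.
        scores = [sigmoid(s) for s in matvec(self.router, x)]
        top = sorted(range(N_EXPERTS), key=lambda i: scores[i], reverse=True)[:TOP_K]
        # Assumption: renormalize the kept scores so the mixing weights sum to 1.
        total = sum(scores[i] for i in top)
        return [(i, scores[i] / total) for i in top]

    def __call__(self, x):
        out = [0.0] * D_MODEL
        for idx, weight in self.route(x):  # only TOP_K of N_EXPERTS experts run
            out = [o + weight * y for o, y in zip(out, self.experts[idx](x))]
        return out


def attention_layout(n_layers, ratio=3):
    """Assumed 3:1 interleave: `ratio` sliding-window (RoPE) layers, then one global (no PE)."""
    return ["global/NoPE" if (i + 1) % (ratio + 1) == 0 else "sliding/RoPE" for i in range(n_layers)]


if __name__ == "__main__":
    moe = SparseMoE(seed=0)
    token = [0.5] * D_MODEL
    print([i for i, _ in moe.route(token)])  # indices of the 8 selected experts
    print(len(moe(token)))  # output keeps the token's hidden size
    print(attention_layout(8))
```

The design point the sketch illustrates: weights for all 128 experts exist, but each token pays only for the 8 it is routed to, which is how a model with 30B total parameters can run with roughly 3B active per token.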
Cohere's two-phase post-training started with a two-stage cascaded supervised fine-tuning (SFT) and finished with reinforcement learning with verifiable rewards (RLVR), with a focus on agentic coding. The model also supports interleaved thinking and native tool use.

Benchmarks

Cohere reports a score of 33.4 on the Artificial Analysis Coding Index, which it describes as competitive among similarly sized models. The company evaluated on SWE-Bench Verified, SWE-Bench Pro, Terminal-Bench v2, Terminal-Bench Hard, SciCode, and LiveCodeBench v6.

The methodology is spelled out:
- SWE-Bench used the SWE-agent harness v1.1.0.
- Terminal-Bench v2 used a simple ReAct harness with a single terminal tool.
- Terminal-Bench Hard used the Terminus-2 harness.
- Each benchmark ran with three seeds, and the results were averaged.
- Sampling used temperature 1.0 and top_p 0.95.

The Speed

In Cohere's internal tests against Devstral Small 2, North Mini Code reached up to 2.8x higher output throughput at identical concurrency and on identical hardware. It also showed 30% better inter-token latency. Time-to-first-token (TTFT) was closer between the two, with Devstral Small 2 keeping a slight lead.

Metric | North Mini Code vs Devstral Small 2
--- | ---
Output throughput | Up to 2.8x higher (same concurrency and hardware)
Inter-token latency | 30% better for North Mini Code
Time-to-first-token | Slightly behind Devstral Small 2

Use Cases With Examples

Cohere built North Mini Code for agentic workflows. Three patterns stand out in its own framing:
- Sub-agent orchestration: a main agent delegates subtasks to helpers. Example: one agent writes unit tests while another fixes failing code.
- Systems architecture mapping: the model reads a repository and sketches its structure. Example: tracing how services call each other before a large refactor.
- Code reviews: the model scans a diff for problems. Example: flagging an unguarded null dereference before a merge.

Terminal tasks fit the model as well. Example: listing files, running a build, then parsing the output for errors.
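The terminal-task pattern above relies on the model's native tool use. As a hedged sketch, assuming the model is served behind an OpenAI-compatible chat-completions endpoint (vLLM's server exposes one at http://localhost:8000/v1/chat/completions by default), a request that offers the model a single terminal tool might be built as follows. The tool name `run_terminal_command`, its schema, and the prompt are hypothetical; the sampling values follow Cohere's recommended temperature 1.0 and top_p 0.95.

```python
import json
import urllib.request

# Assumption: an OpenAI-compatible server (for example vLLM) listening on localhost:8000.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "CohereLabs/North-Mini-Code-1.0"

# Hypothetical terminal tool; its arguments are described with a JSON Schema.
TERMINAL_TOOL = {
    "type": "function",
    "function": {
        "name": "run_terminal_command",
        "description": "Run a shell command and return its stdout and stderr.",
        "parameters": {
            "type": "object",
            "properties": {
                "command": {"type": "string", "description": "The command to run."}
            },
            "required": ["command"],
        },
    },
}


def build_request(task: str) -> dict:
    """Build a chat-completions payload that lets the model call the terminal tool."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": task}],
        "tools": [TERMINAL_TOOL],
        "tool_choice": "auto",  # the model decides whether to call the tool
        "temperature": 1.0,     # Cohere's recommended sampling settings
        "top_p": 0.95,
        "max_tokens": 1024,
    }


def send(payload: dict) -> dict:
    """POST the payload to the server and return the decoded JSON response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    payload = build_request("List the files in this repo, run the build, and report any errors.")
    print(json.dumps(payload, indent=2))
    # Uncomment once a server is running:
    # reply = send(payload)
    # print(reply["choices"][0]["message"].get("tool_calls"))
```

If the model chooses to call the tool, the returned message carries a `tool_calls` list instead of plain content; the client runs the command itself, appends the output as a `"role": "tool"` message with the matching `tool_call_id`, and sends the conversation back for the next turn.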
Getting Started

The fastest path is Hugging Face Transformers, which must be installed from source for this model. Cohere recommends sampling with temperature 1.0 and top_p 0.95.

```python
# Install Transformers from source (required for this model):
# pip install "git+https://github.com/huggingface/transformers.git"
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CohereLabs/North-Mini-Code-1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Write a python program to check if a string is a palindrome or not."
messages = [{"role": "user", "content": prompt}]

# return_dict=True yields a dict (input_ids + attention_mask) so **inputs unpacks cleanly
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

gen_tokens = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=1.0,
    top_p=0.95,
)

# Decode only the newly generated tokens, not the prompt
output = tokenizer.decode(gen_tokens[0][inputs["input_ids"].shape[-1]:])
print(output)
```

For serving, vLLM works, but you need the vLLM main branch plus Cohere's melody library, on which accurate response parsing depends.

```shell
uv pip install "git+https://github.com/vllm-project/vllm.git"
uv pip install "cohere_melody>=0.9.0"

vllm serve CohereLabs/North-Mini-Code-1.0 \
  -tp 2 \
  --max-model-len 320000 \
  --tool-call-parser cohere_command4 \
  --reasoning-parser cohere_command4 \
  --enable-auto-tool-choice
```

Quantized builds exist for Ollama, LM Studio, and llama.cpp. You can also try the model before downloading: Cohere offers free access through OpenCode and a hosted Hugging Face Space.

Key Takeaways

- Cohere's first coding model, North Mini Code, is a 30B mixture-of-experts model that activates just 3B parameters per token.
- It runs on a single H100 at FP8, with a 256K context window and 64K maximum output.
- Weights ship under Apache 2.0, though the Hugging Face card adds a non-commercial note.
- Cohere's official release reports a score of 33.4 on the Artificial Analysis Coding Index and up to 2.8x the output throughput of Devstral Small 2.
- It is built for agentic coding (sub-agent orchestration, architecture mapping, and code reviews) with native tool use.
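The takeaway that the model runs on a single H100 at FP8 can be sanity-checked with back-of-envelope arithmetic. This is a minimal sketch under stated assumptions: 1 byte per parameter at FP8, 2 bytes at BF16, an 80 GB H100, and weights only. It ignores the KV cache, activations, and runtime overhead, which grow with the 256K context, so the headroom figures are upper bounds on what is left for serving.

```python
# Back-of-envelope weight memory for a 30B-total / 3B-active MoE model.
# Assumptions (not from Cohere): every expert must stay resident because any token
# may be routed to it; 1 byte/param at FP8, 2 bytes/param at BF16; 80 GB per H100.
# KV cache, activations, and runtime overhead are deliberately ignored.

TOTAL_PARAMS = 30e9
ACTIVE_PARAMS = 3e9
H100_MEMORY_GB = 80
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2}


def weight_gb(dtype: str, params: float = TOTAL_PARAMS) -> float:
    """Weight storage in decimal gigabytes for the given precision."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


def headroom_gb(dtype: str) -> float:
    """Memory left on a single H100 after loading the weights."""
    return H100_MEMORY_GB - weight_gb(dtype)


if __name__ == "__main__":
    for dtype in ("fp8", "bf16"):
        print(f"{dtype}: {weight_gb(dtype):.0f} GB weights, {headroom_gb(dtype):.0f} GB headroom")
    # Share of parameters doing work for each token; all of them still occupy memory.
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.0%}")
```

Under these assumptions it prints 30 GB of FP8 weights with 50 GB of headroom, 60 GB at BF16 with 20 GB of headroom, and a 10% active fraction. The arithmetic makes one point explicit: MoE sparsity cuts compute per token, not memory, because all 30B parameters must be loaded even though only about 3B do work for any given token.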
