Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
Key summary
Published June 1, 2026, by Nikita Pavlichenko (JetBrains)

Mellum2 is a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code. The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference. Mellum2 can be used for routing, RAG, summarization, sub-agents, high-throughput coding features, and private deployments. It is released under the Apache 2.0 license. Compared with similar-sized models, Mellum2 delivers competitive benchmark performance while achieving more than 2x faster inference.

Download the model on Hugging Face: https://huggingface.co/collections/JetBrains/mellum-2

For architecture details, training setup, benchmarks, and evaluation methodology, read the full technical report: https://arxiv.org/pdf/2605.31268

Today we're releasing Mellum2, an open Mixture-of-Experts model optimized for low-latency text-and-code workloads. Mellum originally started as a code completion model. With Mellum2, we extend that foundation to a broader set of natural language and software engineering tasks while keeping the model focused on efficient inference and deployability.

Modern AI systems increasingly rely on multiple model calls: routing, retrieval, summarization, planning, validation, and tool use. Many of these operations are latency-sensitive and do not require the largest available model. Mellum2 targets these workloads.

Benchmark highlights

In our technical report, we evaluate Mellum2 across code generation, reasoning, science, and math benchmarks. Mellum2 is competitive with similarly sized open models while delivering more than 2x faster inference, making it suitable for high-throughput production workloads.
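To make the "12B total, 2.5B active" distinction concrete, here is a toy sketch of top-k expert routing, the general mechanism behind Mixture-of-Experts layers. It is not Mellum2's implementation: the expert count, the value of k, the router scores, and the scaling "experts" are all hypothetical values chosen for illustration, and the article does not state how Mellum2 configures its experts.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))

def moe_layer(token, experts, router_logits, k):
    """Run only the selected experts and mix their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    output = [0.0] * len(token)
    for index, weight in routes:
        expert_out = experts[index](token)
        output = [o + weight * v for o, v in zip(output, expert_out)]
    return output, routes

# Hypothetical toy setup: 8 experts, 2 active per token.
NUM_EXPERTS = 8
TOP_K = 2
# Each toy "expert" just scales its input by a different factor.
experts = [lambda x, s=s: [s * v for v in x] for s in range(1, NUM_EXPERTS + 1)]

token = [1.0, -2.0]
router_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 1.2]  # made-up scores

output, routes = moe_layer(token, experts, router_logits, TOP_K)
print("experts used:", [index for index, _ in routes])
print("mixed output:", output)
print("share of toy experts active:", TOP_K / NUM_EXPERTS)
# Figures from the article: 2.5B active out of 12B total parameters.
print("Mellum2 active parameter share:", round(2.5 / 12, 3))
```

Only the selected experts are evaluated for a given token, which is why compute per token tracks the active parameter count (roughly 21% of the total for the figures quoted in the article) rather than the full model size.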
Model architecture

Mellum2 is a Mixture-of-Experts model:

Model     Total parameters   Active parameters per token   Modality        License
Mellum2   12B                2.5B                          Text and code   Apache 2.0

The MoE architecture keeps total model capacity high while activating only a subset of parameters for each token. This makes inference more efficient and helps reduce serving cost for real-time workloads.

Mellum2 is intentionally focused on text and code rather than multimodal tasks. This specialization keeps the model compact and efficient for software engineering workloads.

Key use cases

Routing and orchestration
Mellum2 works well as a lightweight routing and orchestration model in multi-model systems, including prompt classification, tool selection, and intermediate control-flow steps.

RAG pipelines
The model is well suited for latency-sensitive retrieval pipelines, including context compression, summarization, and retrieval post-processing.

Sub-agents
Mellum2 can be used for agent subtasks such as planning, validation, transformation, and context preparation, reducing the need to invoke larger models for intermediate operations.

Private deployment
Because Mellum2 is open and efficient to serve, it can be deployed in self-hosted environments involving proprietary code or internal data.

Why well-scoped models matter

As AI systems mature, the most effective architectures are becoming less monolithic. A single frontier model can be powerful, but production systems often need several specialized components working together: retrievers, routers, code-aware models, validators, tool callers, and larger reasoning models.

We think of Mellum2 as a "focal" model: a fast, well-scoped model optimized for high-frequency tasks inside larger AI systems. The goal is not to replace every model in the stack. The goal is to make the stack faster, cheaper, and easier to control.
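The routing role described above can be sketched as a small dispatcher: a fast classifier labels each prompt, and only prompts it cannot place are escalated to a larger model. Everything here is an assumption made for illustration. `keyword_classifier` is a hypothetical stand-in for a call to a small model such as Mellum2, and the handler names and labels are invented; the article does not prescribe an API for this.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

def keyword_classifier(prompt: str) -> str:
    """Hypothetical stand-in for a fast model call that labels a prompt."""
    text = prompt.lower()
    if "summarize" in text or "summary" in text:
        return "summarize"
    if "refactor" in text or "bug" in text:
        return "code"
    return "escalate"

def make_router(classify: Callable[[str], str],
                handlers: Dict[str, Handler],
                fallback: Handler) -> Handler:
    """Build a router that sends each prompt to the handler for its label."""
    def route(prompt: str) -> str:
        label = classify(prompt)
        # Unknown labels fall through to the larger, slower model.
        handler = handlers.get(label, fallback)
        return handler(prompt)
    return route

# Placeholder handlers; real ones would call the respective models.
handlers: Dict[str, Handler] = {
    "summarize": lambda prompt: "small-model:summarize",
    "code": lambda prompt: "small-model:code",
}

def large_model_fallback(prompt: str) -> str:
    return "large-model:escalate"

route = make_router(keyword_classifier, handlers, large_model_fallback)

for prompt in ("Summarize this changelog",
               "Fix the bug in parse_config",
               "Design a migration plan for our billing system"):
    print(prompt, "->", route(prompt))
```

Keeping the classifier behind a plain function makes it easy to swap the keyword heuristic for a real model call without touching the dispatch logic.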
Getting started with Mellum2

If you are building AI systems for software engineering (inside an IDE, in a RAG pipeline, as part of an agent workflow, or on private infrastructure), Mellum2 is ready to try.
Related articles
NetEase Youdao pivots fully to AI and unveils its full-scenario Agent lineup at the Beijing International Book Fair
MosaicLeaks: Can your research agent keep a secret?
Published June 18, 2026, by Alexander Gurung and Rafael Pardinas (ServiceNow)

TL;DR: Deep research agents increasingly combine private local documents with external tools like web retrieval, creating a privacy risk: an agent's external queries may leak sensitive information. MosaicLeaks proposes a new deep-research task with multi-hop questions that interleave public and private information. Across the models we tested, agents frequently leaked private information, and training only for task performance made it worse. We propose a mosaic-leakage-aware RL training method, Privacy-Aware Deep Research (PA-DR), which raises strict chain success (the share of chains

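A Tencent veteran plus post-2000s newcomers from big tech: 碼上飛 wants to do more than AI Coding
This item focuses on "A Tencent veteran plus post-2000s newcomers from big tech: 碼上飛 wants to do more than AI Coding". The original lead notes that the product is already integrated with the Huawei HarmonyOS ecosystem. From an AI-intelligence perspective, items like this are worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.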

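Agents ignite a cloud-drive war: Tencent, Baidu, and Alibaba all join in, and this time the fight is no longer about download speed
This item focuses on "Agents ignite a cloud-drive war: Tencent, Baidu, and Alibaba all join in, and this time the fight is no longer about download speed". The original lead notes that cloud drives have become new infrastructure for Agents. From an AI-intelligence perspective, items like this are worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.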

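A 21-year-old enterprise-services company's AI experiment: let the Agent run the process first
This item focuses on "A 21-year-old enterprise-services company's AI experiment: let the Agent run the process first". The original lead notes that after integrating Tencent Cloud WorkBuddy, 司盟企服 handed high-frequency delivery workflows such as overseas email management, audit and bookkeeping, and order review to an Agent to run through first. From an AI-intelligence perspective, items like this are worth watching for the underlying technical progress, product rollout, industry competition, and subsequent market impact.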
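Caocao Mobility announces a full AI transformation, with an organizational upgrade toward becoming an AI-native company
At the 2026 International Automobile and Supply Chain Expo, Caocao Mobility announced the launch of a full AI transformation and released its RoboX strategy, which aims to build a world-leading physical-AI mobility technology platform. At the same time, the company formally began an organizational upgrade to accelerate its move toward becoming an AI-native company. To drive the transformation, in the first half of this year the company sharpened its strategic focus, continued to optimize its business structure, and proactively scaled back non-core businesses.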