Hugging Face Blog · Generative AI

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

June 4, 2026, 18:57

Key Takeaways

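Over the past two years, NVIDIA's content safety stack has grown from an English-focused classifier into a family of specialized models, progressively extending to new modalities, languages, and inference modes. Nemotron 3 Content Safety, released in March 2026, was the first to combine multimodal and multilingual capabilities in a single 4B-parameter model. Today we are releasing Nemotron 3.5 Content Safety, which supplies the final piece: a single model that handles multimodal input in a unified way.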


By Varun Singh, Isabel Hulseman, Anuj Doshi, and Shyamala Prayaga (NVIDIA). Published June 4, 2026.

The last two years have seen NVIDIA's content safety stack grow from a focused English text classifier into a family of specialized models, each extending coverage to new modalities, languages, and inference modes. Nemotron 3 Content Safety, released in March 2026, combined multimodal and multilingual capabilities for the first time in a single 4B-parameter model. Today, we are releasing Nemotron 3.5 Content Safety, which completes that arc: a single model that unifies multimodal input, multilingual reach, custom enterprise policy enforcement, and auditable reasoning into one inference call.

This post covers what changes in 3.5, the design decisions behind each new capability, and how to integrate the model into production safety pipelines.

What's New in Nemotron 3.5 Content Safety

1. Unified Multimodal Evaluation

Nemotron 3 introduced image understanding; Nemotron 3.5 deepens the multimodal integration. The model takes a user prompt, an optional image, and an optional assistant response as a single context window and produces a coherent safety verdict over the combined input. Evaluating all three together, rather than scoring each independently, closes a well-known gap in multimodal safety scenarios: policy violations that only emerge from the interaction between text and image, or between request and response, are now caught in a single pass.

2. Global Language Coverage

Nemotron 3.5 maintains the 12-language explicit training coverage of its predecessors (English, French, Spanish, German, Chinese, Japanese, Korean, Arabic, Hindi, Russian, Portuguese, and Italian) while also inheriting strong zero-shot generalization across approximately 140 languages from the Gemma 3 base model. This means deployments in markets where training data is sparse (e.g., Southeast Asian languages, Scandinavian languages, less-resourced African languages) benefit from base-model multilingual transfer without requiring separate fine-tuning.

3. Custom Policy Enforcement

This is the most significant architectural addition in 3.5 relative to Nemotron 3. Production deployments rarely operate under a single universal safety taxonomy. A healthcare platform has a different risk profile than a financial services chatbot, a developer tools IDE, or a children's education app. Nemotron 3.5 accepts a custom policy specification alongside the input. The model reasons over that policy when producing its verdict rather than deferring entirely to the built-in taxonomy. This extends the work first introduced in Nemotron Content Safety Reasoning 4B to the full multimodal, multilingual setting.

4. Reasoning Traces (THINK Mode)

Every safety verdict in Nemotron 3.5 can be accompanied by an auditable reasoning trace via an optional think mode. When enabled, the model outputs its step-by-step reasoning before delivering a final safe / unsafe label and, optionally, the violated categories.

    <think>
    The user prompt asks for guidance on acquiring a controlled substance without a prescription. The assistant response provides specific sourcing steps and references an online marketplace. This interaction violates the Criminal Planning/Confessions and Controlled Substances categories. The image (a pharmacy exterior) provides locational context but does not alter the verdict.
    </think>
    User Safety: unsafe
    Response Safety: unsafe
    Safety Categories: Criminal Planning/Confessions, Controlled Substances

When latency is the primary constraint, THINK mode can be disabled to return to the same low-latency binary verdict available in Nemotron 3.

5. Safety Dataset

With Nemotron 3.5, we are releasing our safety dataset. This is an important milestone, since most open-source safety models do not release their training or evaluation sets. The problem is worse in the multimodal space, where artifacts such as images or videos are often derived from resources with restrictive licensing terms. The Nemotron 3.5 Content Safety Dataset is multimodal, multilingual, and includes the safety reasoning traces that were used to train the model. These reasoning traces were generated in a two-step process that keeps them concise, as was done for the Nemotron Content Safety Reasoning 4B model.

Model Architecture

Nemotron 3.5 Content Safety is built on Google Gemma 3 4B IT (4B parameters), providing a 128K context window, strong vision-language reasoning, and broad multilingual coverage. NVIDIA fine-tunes this base with a LoRA adapter that installs targeted safety classification behavior while keeping the model compact enough for real-time deployment on GPUs with 8 GB or more of VRAM.

The inference interface supports three output modes:

Mode 1: Low-latency binary verdict

    User Safety: safe
    Response Safety: unsafe

Mode 2: Binary verdict with categories

    User Safety: safe
    Response Safety: unsafe
    Safety Categories: Violence, Criminal Planning/Confessions

Mode 3: THINK mode (reasoning + verdict)

    <think>
    [step-by-step reasoning trace]
    </think>
    User Safety: unsafe
    Response Safety: unsafe
    Safety Categories: [categories]

The safety taxonomy follows the Aegis 2.0 framework: 13 core categories aligned with the MLCommons safety taxonomy, plus 10 fine-grained subcategories. This alignment allows direct comparison with other open and closed guard systems benchmarked on Aegis-taxonomy datasets.
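
The three output modes share the same line-oriented layout, so a downstream pipeline can read all of them with one small parser. The following is a minimal sketch, assuming the model emits exactly the field labels shown in the examples above. The names `SafetyVerdict` and `parse_verdict` are hypothetical helpers written for illustration and are not part of any NVIDIA or Hugging Face API.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Field labels copied from the example outputs in this post (assumed stable).
_FIELDS = ("User Safety", "Response Safety", "Safety Categories")
_FIELD_RE = re.compile("(" + "|".join(re.escape(f) for f in _FIELDS) + "):")
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


@dataclass
class SafetyVerdict:
    """Hypothetical container for one parsed model output."""
    user_safety: Optional[str] = None
    response_safety: Optional[str] = None
    categories: List[str] = field(default_factory=list)
    reasoning: Optional[str] = None


def parse_verdict(raw: str) -> SafetyVerdict:
    """Parse Mode 1, 2, or 3 output into a SafetyVerdict."""
    verdict = SafetyVerdict()

    # Mode 3 only: pull out the reasoning trace and remove it from the text,
    # so that labels quoted inside the trace are not parsed as fields.
    match = _THINK_RE.search(raw)
    if match:
        verdict.reasoning = match.group(1).strip()
        raw = raw[:match.start()] + " " + raw[match.end():]

    # Splitting on the labels works whether fields are separated by
    # newlines or by spaces. Result: [prefix, label, value, label, value, ...]
    parts = _FIELD_RE.split(raw)
    for label, value in zip(parts[1::2], parts[2::2]):
        value = value.strip()
        if label == "User Safety":
            verdict.user_safety = value.lower()
        elif label == "Response Safety":
            verdict.response_safety = value.lower()
        else:  # "Safety Categories"
            verdict.categories = [c.strip() for c in value.split(",") if c.strip()]
    return verdict


if __name__ == "__main__":
    out = "User Safety: safe\nResponse Safety: unsafe\nSafety Categories: Violence"
    print(parse_verdict(out))
```

Fields that the model did not emit stay at their defaults (`None` or an empty list), which lets calling code distinguish "no response was evaluated" from "response judged safe".
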

Reasoning

Reasoning is a supercharger for content safety classification because it provides the context, customization, and accountability required for production AI systems, especially in enterprise and regulated environments.

Enables Custom and Contextual Policy Enforcement

Reasoning allows a content safety model to dynamically interpret and enforce custom, domain-specific policies defined in natural language at inference time. This is necessary because production deployments rarely operate under a single, universal safety taxonomy. A financial services chatbot has a different risk profile than a children's education app, which may have a lower tolerance for profanity. This capability supports:

- Category Suppression: Disabling irrelevant categories, such as preventing a "violence" category trigger when a DevOps tool handles the phrase "terminate a process".
- Custom Category Injection: Defining proprietary risk categories specific to an organization's regulatory or product policies.

Provides Auditable and Documented Justification

The reasoning traces show the model's step-by-step logic before it delivers a final safe or unsafe verdict. This documented justification serves several purposes:

- Compliance and Audit Logging: Regulated industries often require documented justifications for content moderation decisions.
- Human Review: Reviewers can audit why a verdict was reached to identify systematic model errors.
- Policy Iteration: The traces reveal how the model interprets edge cases, allowing teams to iteratively refine and improve custom policy language.

Latency

While reasoning can introduce latency, the Nemotron model addresses this by condensing reasoning chains into concise summaries to limit output tokens and increase efficiency. This is done in a two-step process similar to the one used for the predecessor model, Nemotron-Content-Safety-Reasoning-4B.
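
Returning briefly to the category suppression and custom category injection described earlier in this section: because policies are written in natural language, a deployment can assemble the policy text programmatically before each call. The sketch below is an assumption-laden illustration. The post does not specify the prompt layout the model expects, so the "Safety policy:" format, the `build_policy` helper, and the "Credential Disclosure" category are all hypothetical. The built-in names are only the three categories that appear in this post's examples, not the full taxonomy.

```python
from typing import Dict, Iterable, Optional

# Illustrative subset: only the category names that appear in this post.
BUILT_IN = ["Violence", "Criminal Planning/Confessions", "Controlled Substances"]


def build_policy(
    built_in: Iterable[str],
    suppress: Iterable[str] = (),
    custom: Optional[Dict[str, str]] = None,
) -> str:
    """Assemble a natural-language policy (hypothetical layout).

    suppress: built-in category names to leave out (category suppression).
    custom:   extra {name: description} entries (custom category injection).
    """
    built_in = list(built_in)
    suppressed = {name.lower() for name in suppress}

    # Fail loudly on typos rather than silently keeping a category enabled.
    unknown = suppressed - {name.lower() for name in built_in}
    if unknown:
        raise ValueError(f"cannot suppress unknown categories: {sorted(unknown)}")

    lines = ["Safety policy:"]
    lines += [f"- {name}" for name in built_in if name.lower() not in suppressed]
    lines += [f"- {name}: {desc}" for name, desc in (custom or {}).items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # A DevOps assistant: "terminate a process" should not trip Violence.
    print(build_policy(
        BUILT_IN,
        suppress=["Violence"],
        custom={"Credential Disclosure": "Sharing API keys, passwords, or tokens."},
    ))
```

The resulting string would be passed alongside the user prompt, image, and response in whatever template the model's documentation prescribes.
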

In the first step of that process, we use larger, more powerful models such as Qwen 397B to generate chain-of-thought reasoning traces from the provided prompts, images, and responses. We also provide the ground-truth labels of the samples so that misclassifications do not find their way into the reasoning traces. In the second step, we make these reasoning traces more concise using another large model such as Qwen 80B. We specifically instruct this model to rephrase the original traces (from step 1) so that they fit in no more than 3 sent
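
The two-step trace generation described above can be sketched as a small pipeline. This is a rough illustration under stated assumptions, not NVIDIA's actual tooling: `Sample`, `distill_trace`, and `count_sentences` are hypothetical names, and the two model calls are represented by plain callables supplied by the caller. Because the length limit in the text above is cut off, the sentence cap is left as a required parameter rather than given a default.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Sample:
    """One training example (hypothetical schema)."""
    prompt: str
    response: Optional[str]
    image_path: Optional[str]
    label: str  # ground-truth verdict, e.g. "safe" or "unsafe"


# Stand-ins for calls to large models; supplied by the caller.
TraceGenerator = Callable[[Sample], str]   # step 1: label-conditioned reasoning
TraceCondenser = Callable[[str, int], str]  # step 2: rewrite within a sentence cap


def count_sentences(text: str) -> int:
    """Rough count: split after '.', '!' or '?' followed by whitespace."""
    return len([s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s])


def distill_trace(
    sample: Sample,
    generate: TraceGenerator,
    condense: TraceCondenser,
    max_sentences: int,
) -> str:
    """Run both steps and check the condensed trace respects the cap."""
    # Step 1: the generator sees the sample including its ground-truth label,
    # so the reasoning cannot drift toward a wrong verdict.
    long_trace = generate(sample)

    # Step 2: a second model rewrites the trace into a concise form.
    short_trace = condense(long_trace, max_sentences)

    if count_sentences(short_trace) > max_sentences:
        raise ValueError("condensed trace exceeds the sentence cap")
    return short_trace


if __name__ == "__main__":
    demo = Sample(prompt="p", response="r", image_path=None, label="unsafe")
    fake_generate = lambda s: f"First point. Second point. Verdict is {s.label}."
    fake_condense = lambda t, n: " ".join(re.split(r"(?<=[.!?])\s+", t)[-n:])
    print(distill_trace(demo, fake_generate, fake_condense, max_sentences=2))
```

Validating the cap after step 2 means a condenser that ignores its instruction is caught before the trace reaches the training set.
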
