MarkTechPost AI Research and Frontiers

Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription

June 8, 2026, 08:56

Key Points

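Last week, Microsoft AI announced MAI-Transcribe-1.5, the second generation of the company's in-house speech-to-text model. The model is designed for accuracy across 43 languages, a range of accents, and noisy environments, and the Microsoft team positions it for production transcription workloads. MAI-Transcribe-1.5 is an automatic speech recognition (ASR) model: it takes audio as input and returns text. It was built entirely in-house at Microsoft rather than on a third-party model. A single system handles all 43 languages and is optimized for diverse accents, dialects, and real-world acoustic conditions. Microsoft is integrating it into Copilot, Teams, GitHub, and Dynamics 365 Contact Centre, and it is also available on Foundry, Microsoft's own model platform.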

Site AI-compiled article

Last week, Microsoft AI announced MAI-Transcribe-1.5, the second iteration of the company’s in-house speech-to-text family. The model targets accuracy across 43 languages, accents, and noisy environments. The Microsoft team positions it for production transcription workloads.

What is MAI-Transcribe-1.5

MAI-Transcribe-1.5 is an automatic speech recognition (ASR) model: it takes audio as input and returns text. Microsoft built it in-house, not on a third-party base. The model handles 43 languages with a single system and is optimized for diverse accents, dialects, and real-world acoustic conditions. Microsoft is integrating it into Copilot, Teams, GitHub, and Dynamics 365 Contact Centre. It is also available in Foundry, Microsoft’s model platform.

The Accuracy Case

Accuracy here is measured by word error rate (WER): the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. Lower WER means fewer mistakes per transcribed word. Microsoft reports best-in-class WER across 43 languages on FLEURS, a standard multilingual transcription benchmark. On the Artificial Analysis leaderboard, the model posts a WER of 2.4%, which places it third on a competitive open benchmark.

So the picture is split: the Microsoft team claims first place on FLEURS, while the model ranks third on Artificial Analysis.

The language expansion is the other accuracy story. Coverage grew from 25 languages to 43, and Microsoft says the 18 new languages were added without compromising accuracy. Ten of them are South Asian, including Bengali, Tamil, and Telugu. Eight are European, such as Ukrainian, Greek, and Catalan.

Speed

MAI-Transcribe-1.5 leads on accuracy-times-speed on the Artificial Analysis leaderboard. It runs up to 5x faster than models of comparable accuracy, and the effect is largest on long audio files: the model can transcribe an hour of audio in under 15 seconds. Microsoft cites up to 5x speedups over Gemini 3.1, Scribe v2, and GPT-4o-Transcribe on long audio. Against the prior MAI-Transcribe-1, the Azure card lists up to 5.7x faster long-form inference.
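
The two metrics discussed so far, WER and speed relative to real time, can be made concrete with a short sketch. This is an illustration only, not the evaluation code used by Microsoft or Artificial Analysis: `word_error_rate` and `speed_vs_real_time` are hypothetical helper names, and the WER function assumes plain whitespace tokenization with no text normalization, which real benchmarks typically apply before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, computed one row at a time.
    # At the start of iteration i, prev[j] is the edit distance between
    # the first i-1 reference words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def speed_vs_real_time(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of processing time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


if __name__ == "__main__":
    reference = "the meeting with aoife starts at ten"
    hypothesis = "the meeting with oif starts at ten"
    # One substituted word out of seven reference words.
    print(word_error_rate(reference, hypothesis))
    # One hour of audio in 15 seconds of processing.
    print(speed_vs_real_time(3600, 15))  # 240.0
```

Read this way, a 2.4% WER corresponds to roughly 24 word-level errors per 1,000 reference words, and transcribing an hour of audio in under 15 seconds means running at more than 240 times real time.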
For batch pipelines processing large archives, that speed gap compounds quickly.

Keyword (Entity) Biasing: The Feature Worth Understanding

Generic transcribers often fail on domain-specific words: people's names, product names, medical terms, and internal acronyms. Those words frequently matter most to enterprise users.

MAI-Transcribe-1.5 adds keyword biasing, also called entity biasing. You supply a list of domain-specific keywords (the Azure card supports up to 200), and the model biases its predictions toward that list. Critically, it does not blindly force matches. It uses shared context to decide when biasing should apply. Microsoft reports a 30% WER reduction on FLEURS when biasing is used.

A short example shows the effect. Without biasing, names render as “Sean,” “Oif,” and “Societal.” With a supplied name list, the model recovers “Shaun,” “Aoife,” and “Xochitl.” This is relevant for meetings, healthcare, and call centers with niche vocabulary.

Use Cases

The Azure model card lists concrete production scenarios. Each maps to a common engineering workload:

- Video captions for media and content platforms.
- Accessibility tools that depend on accurate captions.
- Meeting transcription for Teams-style collaboration tools.
- Call analysis for contact centers and support analytics.
- Content creation workflows that need fast draft transcripts.
- Voice agents that convert speech to text before reasoning.

Automatic language identification helps when the input language is unknown: the model detects the spoken language without a manual setting.

MAI-Transcribe-1.5 vs MAI-Transcribe-1

The table below compares the two generations using stated facts only.

| Attribute | MAI-Transcribe-1 | MAI-Transcribe-1.5 |
| --- | --- | --- |
| Languages covered | 25 | 43 |
| Keyword/entity biasing | Not listed | Up to 200 keywords |
| Long-form inference speed | Baseline | Up to 5.7x faster |
| Artificial Analysis WER | Not specified | 2.4% (ranked #3) |
| FLEURS position (per Microsoft) | State-of-the-art | Best-in-class across 43 languages |
| Automatic language identification | Not specified | Yes |
| Lifecycle | Prior release | Generally available (GA) |
| Input / Output | Audio / Text | Audio / Text |

Strengths and Limitations

Strengths:

- 43-language coverage from a single model, up from 25.
- Keyword/entity biasing yields up to 30% WER reduction on FLEURS.
- Sub-15-second transcription for an hour of audio.
- Generally available now through Azure AI Foundry.
- Robust on noisy, real-world audio, per Microsoft.

Limitations:

- No diarization yet, so speaker labels are unavailable.
- No native streaming API, so real-time use is limited.
- Several accuracy, speed, and cost claims are first-party.
- Ranked third on Artificial Analysis, behind two competitors.

Sources

- Introducing MAI-Transcribe-1.5 (Microsoft AI)
- MAI-Transcribe-1.5 model card (Azure AI Foundry)
- MAI-Transcribe-1.5 Foundry API documentation
- MAI-Transcribe-1.5 Cookbook
- MAI Playground

The post Microsoft AI Introduces MAI-Transcribe-1.5: 2.4% WER on Artificial Analysis, Best-in-Class FLEURS Accuracy, and Up to 5x Faster Long-Audio Transcription appeared first on MarkTechPost.
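
As a practical footnote to the keyword biasing section, the 200-keyword limit suggests cleaning a vocabulary list before sending it with a request. The sketch below is a hypothetical client-side helper, not part of any Microsoft SDK: the name `prepare_keywords`, the case-insensitive de-duplication, and the choice to raise an error instead of truncating are all assumptions. How the list is actually attached to a transcription request is not shown here; that belongs to the Foundry API documentation listed in the sources.

```python
MAX_KEYWORDS = 200  # limit on the Azure model card, as reported in the article


def prepare_keywords(candidates, limit=MAX_KEYWORDS):
    """Trim, de-duplicate case-insensitively, and cap a biasing keyword list.

    Hypothetical helper: the service's own validation rules may differ.
    """
    seen = set()
    cleaned = []
    for term in candidates:
        term = term.strip()
        key = term.casefold()
        if not term or key in seen:
            continue  # skip blanks and repeats, keeping the first spelling
        seen.add(key)
        cleaned.append(term)
    if len(cleaned) > limit:
        # Fail loudly so the caller decides which terms matter most,
        # rather than silently dropping entries.
        raise ValueError(f"{len(cleaned)} keywords exceed the limit of {limit}")
    return cleaned


if __name__ == "__main__":
    names = [" Shaun", "Aoife", "aoife", "", "Xochitl"]
    print(prepare_keywords(names))  # ['Shaun', 'Aoife', 'Xochitl']
```

Raising on overflow is a design choice for this sketch: a silently truncated list could drop exactly the rare names the biasing feature is meant to recover.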
