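PP-OCRv6 on Hugging Face: 50-Language OCR from 1.5M to 34.5M Parameters
Key takeaways
PP-OCRv6 is the latest generation of PaddleOCR’s universal OCR model family, designed for real-world text detection and recognition across documents, screenshots, multilingual images, digital displays, and similar scenarios. You can evaluate PP-OCRv6 online first, then integrate lightweight, production-ready OCR through the PaddlePaddle, Transformers, or ONNX Runtime backend.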
Team Article. Published June 22, 2026.
Authors (PaddlePaddle): AlexZhang (AlexTransformer), cuicheng (ChengCui), Jun Zhang (jzhang533), Manhui Lin (gggdddfff), Yue Zhang (xiaohei66), leo-q8, yubo (zhangyubo0722), Yi Liu (michaelowenliu)

Evaluate PP-OCRv6 online, then integrate lightweight, production-ready OCR with the PaddlePaddle, Transformers, or ONNX Runtime backend.

PP-OCRv6 is the latest generation of PaddleOCR’s universal OCR model family. It is designed for real-world text detection and recognition across documents, screenshots, multilingual images, digital displays, industrial labels, and scene text.

The model family scales from 1.5M to 34.5M parameters, with three tiers: tiny, small, and medium. The medium and small tiers support 50 languages, including Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages.

Try PP-OCRv6 online: PP-OCRv6 Online Demo.

On PaddleOCR’s official in-house multi-scenario OCR benchmarks, PP-OCRv6_medium reaches 86.2% detection Hmean and 83.2% recognition accuracy. Compared with PP-OCRv5_server, it improves text detection by 4.6 percentage points and text recognition by 5.1 percentage points.

PP-OCRv6 focuses on a practical OCR need: producing accurate, structured text outputs with small models and flexible deployment options. For a deeper discussion of why specialized OCR models remain useful in the VLM era, see our previous blog: PP-OCRv5 on Hugging Face: A Specialized Approach to OCR.

What’s new in PP-OCRv6

PP-OCRv6 introduces architecture, training, and data improvements across detection and recognition. The main design goal is to improve OCR accuracy while keeping model sizes suitable for different deployment settings.
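The benchmark figures quoted above can be unpacked with a little arithmetic. The sketch below assumes that "Hmean" follows the usual text-detection convention, the harmonic mean of precision and recall; the article does not report precision or recall separately, so the inputs passed to `hmean` are purely illustrative. The PP-OCRv5_server baselines are not quoted in the article either; they are only what the stated gains imply. All function and variable names here are hypothetical.

```python
def hmean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the usual detection Hmean)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures stated in the article for PP-OCRv6_medium (percent).
V6_MEDIUM = {"detection_hmean": 86.2, "recognition_accuracy": 83.2}
# Stated gains over PP-OCRv5_server (percentage points).
GAIN_OVER_V5_SERVER = {"detection_hmean": 4.6, "recognition_accuracy": 5.1}

# Implied baseline = v6 score minus the stated gain. These are derived
# values, not numbers quoted by the article.
implied_v5_server = {
    metric: round(V6_MEDIUM[metric] - GAIN_OVER_V5_SERVER[metric], 1)
    for metric in V6_MEDIUM
}

if __name__ == "__main__":
    # Illustrative precision/recall only; not taken from the benchmark.
    print(f"hmean(0.90, 0.80) = {hmean(0.90, 0.80):.3f}")
    print(implied_v5_server)
```

A gain stated in percentage points is an absolute difference between two scores, so it should not be read as a relative improvement of the same size.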
Three model tiers

PP-OCRv6 provides three model tiers, covering different model sizes and OCR accuracy levels.

| Model | Model size | Detection Hmean | Recognition accuracy | Typical application scenarios |
| --- | --- | --- | --- | --- |
| PP-OCRv6_tiny | 1.5M params | 80.6% | 73.5% | Edge devices, lightweight local OCR, latency-sensitive demos, constrained environments |
| PP-OCRv6_small | 7.7M params | 84.1% | 81.3% | Mobile, desktop, balanced OCR services, multilingual OCR with lower compute cost |
| PP-OCRv6_medium | 34.5M params | 86.2% | 83.2% | Accuracy-oriented OCR, server-side pipelines, industrial OCR, document ingestion, multilingual OCR |

PPLCNetV4 backbone

PP-OCRv6 uses PPLCNetV4 as a unified backbone for text detection and text recognition. For developers, the main benefit is consistency across the model family. The tiny, small, and medium tiers are not unrelated models; they are part of the same OCR family and share a common architectural direction.

RepLKFPN for text detection

Text detection is the first stage of the OCR pipeline. Detection quality affects the crops sent to the recognizer, and poor crops often lead to poorer recognition. PP-OCRv6 upgrades the detection module with RepLKFPN, a lightweight large-kernel feature pyramid network designed for multi-scale text detection while keeping inference efficient. This is relevant for real-world OCR inputs, where text may be small, dense, rotated, low-resolution, or embedded in complex backgrounds.

EncoderWithLightSVTR for recognition

For text recognition, PP-OCRv6 uses EncoderWithLightSVTR. It combines local context modeling with global attention to improve recognition quality on challenging text crops. The recognition improvements are especially relevant for multilingual text, screen text, industrial characters, special symbols, dense text, and noisy image regions.

Unified multilingual OCR

The medium and small tiers support 50 languages in one model family, covering Simplified Chinese, Traditional Chinese, English, Japanese, and 46 Latin-script languages.
This helps reduce the need for separate OCR models across common multilingual OCR scenarios.

Quick start with PaddleOCR

Install PaddleOCR:

    pip install paddleocr

Run OCR with Paddle Inference (the default backend):

    from paddleocr import PaddleOCR

    # Model: PP-OCRv6_medium (default)
    # Backend: Paddle Inference (default)
    ocr = PaddleOCR(
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_textline_orientation=False,
    )
    result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")
    for res in result:
        res.print()
        res.save_to_img("output")
        res.save_to_json("output")

The OCR result can be saved as visualization images and structured JSON output. The structured output can then be used by downstream systems such as document parsing, search, extraction, RAG, analytics, or agent workflows.

Available inference backends

PP-OCRv6 can be used with multiple inference backends through PaddleOCR. PaddleOCR 3.7 provides a unified inference-engine interface: the engine argument selects the underlying runtime, and related configuration can be passed through the pipeline or module API.

| Backend | Description |
| --- | --- |
| Transformers | Hugging Face / PyTorch-oriented inference path for supported PaddleOCR models |
| ONNX Runtime | Portable inference path for ONNX-based deployment environments |
| Paddle Inference | Native Paddle inference format |

For Hugging Face users, PaddleOCR supports running selected OCR and document parsing models with a Transformers backend.
This can be enabled with:

    engine="transformers"

For more details on how the Transformers backend works in PaddleOCR, see: PaddleOCR: Running OCR and Document Parsing Tasks with a Transformers Backend

Run the PP-OCRv6 example with the Transformers backend:

    from paddleocr import PaddleOCR

    # Model: PP-OCRv6_medium (default)
    # Backend: Transformers
    ocr = PaddleOCR(
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_textline_orientation=False,
        engine="transformers",
    )
    result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

ONNX variants are also available in the PP-OCRv6 Collection for environments that use ONNX Runtime through engine="onnxruntime":

    from paddleocr import PaddleOCR

    # Model: PP-OCRv6_medium (default)
    # Backend: ONNX Runtime
    ocr = PaddleOCR(
        use_doc_orientation_classify=False,
        use_doc_unwarping=False,
        use_textline_orientation=False,
        engine="onnxruntime",
    )
    result = ocr.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png")

Together, these backend options make PP-OCRv6 available across different runtime environments while keeping the same OCR model family on the Hugging Face Hub.

Conclusion

PP-OCRv6 extends PaddleOCR with a lightweight, multilingual OCR model family for real-world text detection and recognition. The release includes three model tiers from 1.5M to 34.5M parameters, up to 50-language OCR support, improved detection and recognition accuracy over PP-OCRv5_server, and multiple model formats on the Hugging Face Hub, including safetensors, Paddle inference models, and ONNX models.
Together with the hosted Hugging Face Space and the available PaddleOCR inference backends, PP-OCRv6 provides several entry points for evaluation and integration:

Online Demo: PP-OCRv6 Online Demo
Model Collection: PP-OCRv6 Collection
Transformers Backend Blog: PaddleOCR with Transformers Backend
PaddleOCR Documentation: PP-OCRv6 Documentation
PaddleOCR Official Website: https://www.paddleocr.com

You can evaluate PP-OCRv6 with the online demo, explore the available model assets in the Collection, and use the inference backend that matches your own OCR workflow.
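Because the three quick-start snippets differ only in the `engine` argument, backend selection can be folded into one helper. This is a minimal sketch, not part of PaddleOCR: `ocr_kwargs` and `build_ocr` are hypothetical names, and the only `engine` values used are the two the article shows (`"transformers"` and `"onnxruntime"`). Omitting `engine` is assumed to keep the default Paddle Inference backend, as in the first snippet.

```python
from typing import Any, Dict, Optional

# Engine strings shown in the article; other values are not assumed here.
ARTICLE_ENGINES = ("transformers", "onnxruntime")

# Constructor flags used unchanged in all three quick-start snippets.
COMMON_KWARGS: Dict[str, Any] = {
    "use_doc_orientation_classify": False,
    "use_doc_unwarping": False,
    "use_textline_orientation": False,
}


def ocr_kwargs(engine: Optional[str] = None) -> Dict[str, Any]:
    """Build PaddleOCR constructor kwargs for the requested backend.

    engine=None omits the argument, leaving PaddleOCR on its default
    backend (Paddle Inference, per the article).
    """
    kwargs = dict(COMMON_KWARGS)
    if engine is not None:
        if engine not in ARTICLE_ENGINES:
            raise ValueError(f"engine {engine!r} is not one shown in the article")
        kwargs["engine"] = engine
    return kwargs


def build_ocr(engine: Optional[str] = None):
    """Create a PaddleOCR pipeline; paddleocr is imported lazily."""
    from paddleocr import PaddleOCR  # requires `pip install paddleocr`

    return PaddleOCR(**ocr_kwargs(engine))


if __name__ == "__main__":
    # Inspect the kwargs without loading any model.
    print(ocr_kwargs("onnxruntime"))
```

Calling `build_ocr("transformers").predict(...)` with an image path or URL would then mirror the Transformers snippet above. Keeping the kwargs builder separate from the constructor call lets the configuration be checked without downloading a model.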
