
paddlex/PP-OCRv5_server

Model Information

paddlex/PP-OCRv5_server is a server-side OCR pipeline from the PP-OCRv5 series, developed by Baidu's PaddlePaddle team. It performs end-to-end OCR: text detection followed by text recognition. PP-OCRv5 is the latest generation of the series. A single model recognizes four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) and also handles handwritten text, vertical text, pinyin, and rare characters.

  • Model Developer: Baidu / PaddlePaddle
  • Framework: PaddleX 3.x
  • Task: Full OCR pipeline (text detection + recognition)
  • Input: Document page image
  • Output: Recognized text with bounding boxes and confidence scores

Model Architecture

The OCR pipeline combines two models:

  • Text Detection: PP-OCRv5_server_det (DBNet++, 84.3 MB)
  • Text Recognition: PP-OCRv5_server_rec (SVTR-based, 81 MB)
  • Combined Inference Time: ~100–800 ms per image on an NVIDIA H100 GPU, depending on text density
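
The two-stage flow described above (detect text regions, then recognize each region) can be sketched as follows. This is only an illustration of the control flow, not the PaddleX implementation: `run_pipeline`, `OcrLine`, `fake_detect`, and `fake_recognize` are hypothetical names, the stub stages stand in for the real models, and the axis-aligned box format is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumption: boxes are axis-aligned (x_min, y_min, x_max, y_max).
# Real detectors may return quadrilaterals or polygons instead.
Box = Tuple[int, int, int, int]


@dataclass
class OcrLine:
    box: Box
    text: str
    score: float


def run_pipeline(image, detect: Callable, recognize: Callable) -> List[OcrLine]:
    """Stage 1 finds text regions; stage 2 reads each region in turn."""
    lines = []
    for box in detect(image):
        text, score = recognize(image, box)
        lines.append(OcrLine(box, text, score))
    return lines


# Stub stages standing in for the detection and recognition models.
def fake_detect(image):
    return [(0, 0, 100, 20), (0, 30, 100, 50)]


def fake_recognize(image, box):
    return (f"line at y={box[1]}", 0.9)


lines = run_pipeline(None, fake_detect, fake_recognize)
for line in lines:
    print(line.box, line.text, line.score)
```

Because recognition runs once per detected region, a page with more text lines costs more recognition calls, which is consistent with inference time varying with text density.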

Supported Languages

Language              Code
English               en
Simplified Chinese    ch
Traditional Chinese   ch
Japanese              japan
Korean                korean
Latin                 latin
Arabic                arabic

PP-OCRv5 natively supports Simplified Chinese, Traditional Chinese, English, and Japanese in a single model. The remaining languages listed above (Korean, Latin, Arabic) are available through the broader PaddleOCR ecosystem.
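
As a convenience, the language table can be mirrored in a small lookup so that callers pass a readable name instead of remembering the code. This is a minimal sketch: `LANGUAGE_CODES` and `language_code` are hypothetical helpers, not part of any SDK, and the codes are copied from the table in this section.

```python
# Codes copied from the Supported Languages table; the helper itself is hypothetical.
LANGUAGE_CODES = {
    "English": "en",
    "Simplified Chinese": "ch",
    "Traditional Chinese": "ch",
    "Japanese": "japan",
    "Korean": "korean",
    "Latin": "latin",
    "Arabic": "arabic",
}


def language_code(name: str) -> str:
    """Return the code for a language name, ignoring case and outer whitespace."""
    key = name.strip().lower()
    for language, code in LANGUAGE_CODES.items():
        if language.lower() == key:
            return code
    raise ValueError(f"Unsupported language: {name!r}")


print(language_code("Japanese"))
print(language_code("traditional chinese"))
```

Note that Simplified and Traditional Chinese share the code `ch`, so the mapping is not reversible.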


Benchmark

Model                 Recognition Accuracy (%)   GPU Inference (ms, normal / high-performance mode)   Model Size (MB)
PP-OCRv5_server_rec   86.38                      8.46 / 2.36                                          81
PP-OCRv4_server_rec   85.19                      8.75 / 2.49                                          173
PP-OCRv5_mobile_rec   81.29                      5.43 / 1.46                                          16

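Compared to PP-OCRv4_server_rec, PP-OCRv5_server_rec improves recognition accuracy (86.38% vs. 85.19%) at less than half the model size (81 MB vs. 173 MB). PP-OCRv5_mobile_rec is smaller and faster still (16 MB) but less accurate (81.29%).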
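
The server-model comparison can be checked with a few lines of arithmetic. The figures are taken from the benchmark table; the variable names are illustrative only.

```python
# Figures copied from the benchmark table above.
benchmarks = {
    "PP-OCRv5_server_rec": {"accuracy": 86.38, "size_mb": 81},
    "PP-OCRv4_server_rec": {"accuracy": 85.19, "size_mb": 173},
}

v5 = benchmarks["PP-OCRv5_server_rec"]
v4 = benchmarks["PP-OCRv4_server_rec"]

# Absolute accuracy difference, in percentage points.
accuracy_gain = v5["accuracy"] - v4["accuracy"]
# Fraction of the v4 model size that v5 removes.
size_reduction = 1 - v5["size_mb"] / v4["size_mb"]

print(f"Accuracy gain: {accuracy_gain:.2f} percentage points")  # 1.19
print(f"Size reduction: {size_reduction:.1%}")                  # 53.2%
```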


Usage

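from air.document_analysis import DocumentAnalysisClient

# Create a client; replace "..." with your API key.
client = DocumentAnalysisClient(api_key="...")

# Run the detection + recognition pipeline on a single page image.
result = client.ocr(
    model="paddlex/PP-OCRv5_server",
    image_path="document_page.png",
    language="en",  # language code from the Supported Languages table
    threshold=0.3,
)

# Each result carries the recognized text and its confidence score.
for r in result.results:
    print(f"'{r.text}' (confidence: {r.score:.3f})")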
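
A common follow-up step is to keep only high-confidence lines before passing text downstream. The sketch below assumes only what the loop above shows, namely that each result exposes `text` and `score`. `OcrResult` is a hypothetical stand-in so the example runs without the client, and `confident_text` and its `min_score` parameter are illustrative names, not part of the API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class OcrResult:
    """Hypothetical stand-in for one OCR result (text plus confidence score)."""
    text: str
    score: float


def confident_text(results: Iterable[OcrResult], min_score: float = 0.5) -> List[str]:
    """Return the text of every result whose score is at least min_score."""
    return [r.text for r in results if r.score >= min_score]


# Made-up sample data for illustration only.
sample = [
    OcrResult("Invoice", 0.98),
    OcrResult("l0t4l", 0.41),
    OcrResult("Total: 42.00", 0.93),
]

print("\n".join(confident_text(sample, min_score=0.8)))
```

With real output, `confident_text(result.results, min_score=0.8)` would apply the same filter, assuming the result objects behave as shown in the usage example.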

API Reference

See the Document Analysis API for full endpoint documentation.

External References