paddlex/PP-OCRv5_server

Model Information

paddlex/PP-OCRv5_server is the server-side OCR pipeline of the PP-OCRv5 series, developed by Baidu's PaddlePaddle team. It performs end-to-end OCR in two stages: text detection followed by text recognition. PP-OCRv5 is the latest generation of PP-OCR. A single model covers four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) and handles handwritten text, vertical text, pinyin, and rare characters.

Model Developer: Baidu / PaddlePaddle
Framework: PaddleX 3.x
Task: Full OCR pipeline (text detection + recognition)
Input: Document page image
Output: Recognized text with bounding boxes and confidence scores

Model Architecture

The OCR pipeline combines two models:

Text Detection: PP-OCRv5_server_det (DBNet++, 84.3 MB)
Text Recognition: PP-OCRv5_server_rec (SVTR-based, 81 MB)
Combined Inference Time: approximately 100–800 ms per image on an NVIDIA H100 GPU, depending on text density
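
The two-stage flow described above (detect text regions, then recognize each one) can be sketched as follows. This is a minimal illustration, not the PaddleX implementation: `run_pipeline`, `OcrLine`, and the stub functions `fake_detect` and `fake_recognize` are hypothetical names that stand in for the real detection and recognition models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max) in pixels (an assumption;
# real detectors may return polygons instead).
Box = Tuple[int, int, int, int]


@dataclass
class OcrLine:
    box: Box
    text: str
    score: float


def run_pipeline(
    image: object,
    detect: Callable[[object], List[Box]],
    recognize: Callable[[object, Box], Tuple[str, float]],
) -> List[OcrLine]:
    """Stage 1 finds text regions; stage 2 reads each region."""
    lines = []
    for box in detect(image):
        text, score = recognize(image, box)
        lines.append(OcrLine(box=box, text=text, score=score))
    return lines


# Stub models so the sketch runs without any OCR dependency.
def fake_detect(image: object) -> List[Box]:
    return [(10, 10, 200, 40), (10, 60, 180, 90)]


def fake_recognize(image: object, box: Box) -> Tuple[str, float]:
    return (f"line at y={box[1]}", 0.9)


demo_lines = run_pipeline(image=None, detect=fake_detect, recognize=fake_recognize)
for line in demo_lines:
    print(line)
```

Because recognition runs once per detected region, total latency grows with the number of text regions, which is consistent with the inference time depending on text density.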

Supported Languages

Language	Code
English	`en`
Simplified Chinese	`ch`
Traditional Chinese	`ch`
Japanese	`japan`
Korean	`korean`
Latin	`latin`
Arabic	`arabic`

PP-OCRv5 natively supports Simplified Chinese, Traditional Chinese, English, and Japanese in a single model. Additional languages are available through the broader PaddleOCR ecosystem.
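
A small lookup helper can keep language names and codes consistent in client code. The sketch below copies the codes from the table above; the names `LANGUAGE_CODES`, `NATIVE_LANGUAGES`, `language_code`, and `is_native` are hypothetical and not part of any library.

```python
# Codes copied from the Supported Languages table.
LANGUAGE_CODES = {
    "English": "en",
    "Simplified Chinese": "ch",
    "Traditional Chinese": "ch",
    "Japanese": "japan",
    "Korean": "korean",
    "Latin": "latin",
    "Arabic": "arabic",
}

# Languages the text above describes as covered by the single PP-OCRv5 model.
NATIVE_LANGUAGES = {
    "Simplified Chinese",
    "Traditional Chinese",
    "English",
    "Japanese",
}


def language_code(name: str) -> str:
    """Return the code for a language name, or raise ValueError if unknown."""
    try:
        return LANGUAGE_CODES[name]
    except KeyError:
        raise ValueError(f"Unsupported language: {name!r}") from None


def is_native(name: str) -> bool:
    """True if the language is handled by the single PP-OCRv5 model."""
    return name in NATIVE_LANGUAGES


print(language_code("Traditional Chinese"), is_native("Traditional Chinese"))
print(language_code("Korean"), is_native("Korean"))
```

Note that Simplified and Traditional Chinese share the `ch` code, so the code alone does not distinguish the two scripts.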

Benchmark

Model	Recognition Accuracy (%)	GPU Inference Time (ms) [Normal Mode / High-Performance Mode]	Model Size (MB)
PP-OCRv5_server_rec	86.38	8.46 / 2.36	81
PP-OCRv4_server_rec	85.19	8.75 / 2.49	173
PP-OCRv5_mobile_rec	81.29	5.43 / 1.46	16

Among the server recognition models, PP-OCRv5 improves accuracy over PP-OCRv4 (86.38% vs. 85.19%) while cutting model size by more than half (81 MB vs. 173 MB).
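
The comparison between the two server recognition models can be checked directly from the table. The snippet below is plain arithmetic on the published figures; the variable names are illustrative.

```python
# Figures copied from the benchmark table above.
BENCHMARK = {
    "PP-OCRv5_server_rec": {"accuracy": 86.38, "size_mb": 81},
    "PP-OCRv4_server_rec": {"accuracy": 85.19, "size_mb": 173},
}

v5 = BENCHMARK["PP-OCRv5_server_rec"]
v4 = BENCHMARK["PP-OCRv4_server_rec"]

# Absolute accuracy gain in percentage points.
accuracy_gain = round(v5["accuracy"] - v4["accuracy"], 2)

# Relative model-size reduction in percent.
size_reduction_pct = round(100 * (1 - v5["size_mb"] / v4["size_mb"]), 1)

print(f"Accuracy gain: {accuracy_gain} points")      # 1.19 points
print(f"Size reduction: {size_reduction_pct}%")      # 53.2%
```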

Usage

from air.document_analysis import DocumentAnalysisClient

client = DocumentAnalysisClient(api_key="...")
result = client.ocr(
    model="paddlex/PP-OCRv5_server",
    image_path="document_page.png",
    language="en",
    threshold=0.3,
)
for r in result.results:
    print(f"'{r.text}' (confidence: {r.score:.3f})")
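
Downstream code often keeps only high-confidence lines. The sketch below assumes each result item exposes `text` and `score` attributes, as in the loop above; `OcrResult` is a hypothetical stand-in for those items, and `confident_text` and its 0.8 cutoff are illustrative choices rather than part of the API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class OcrResult:
    """Stand-in for one item of result.results (only the two fields used above)."""
    text: str
    score: float


def confident_text(results: Iterable[OcrResult], min_score: float = 0.8) -> List[str]:
    """Return the text of every result whose confidence is at least min_score."""
    return [r.text for r in results if r.score >= min_score]


# Made-up sample data for demonstration only.
sample = [
    OcrResult("Invoice", 0.98),
    OcrResult("T0tal", 0.41),
    OcrResult("2024-01-15", 0.93),
]

kept = confident_text(sample)
print("\n".join(kept))
```

Filtering on the client side like this is separate from the `threshold` argument passed in the request, whose exact semantics are defined by the API reference.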

API Reference

See the Document Analysis API for full endpoint documentation.