paddlex/PP-OCRv5_server
Model Information
paddlex/PP-OCRv5_server is a server-side OCR pipeline from the PP-OCRv5 series, developed by Baidu's PaddlePaddle team. It performs end-to-end OCR: text detection followed by text recognition. PP-OCRv5 is the fifth generation of PP-OCR. A single recognition model covers four major languages (Simplified Chinese, Traditional Chinese, English, and Japanese) and handles handwritten text, vertical text, pinyin, and rare characters.
- Model Developer: Baidu / PaddlePaddle
- Framework: PaddleX 3.x
- Task: Full OCR pipeline (text detection + recognition)
- Input: Document page image
- Output: Recognized text with bounding boxes and confidence scores
Model Architecture
The OCR pipeline combines two models:
- Text Detection: PP-OCRv5_server_det (DBNet++, 84.3 MB)
- Text Recognition: PP-OCRv5_server_rec (SVTR-based, 81 MB)
- Combined Inference Time: ~100–800 ms per image on an NVIDIA H100 GPU, depending on text density
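The two-stage data flow can be sketched in plain Python. The `detect` and `recognize` functions below are hypothetical stand-ins for PP-OCRv5_server_det and PP-OCRv5_server_rec: they return canned values and run no model. The box format `(x_min, y_min, x_max, y_max)` is likewise an assumption. The sketch only shows how detection proposes text regions, each region is cropped, and recognition turns every crop into text plus a confidence score.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed (x_min, y_min, x_max, y_max) layout


@dataclass
class OcrLine:
    text: str
    box: Box
    score: float


def detect(image: List[List[int]]) -> List[Box]:
    """Hypothetical stand-in for the detection model: returns text-region boxes."""
    return [(0, 0, 4, 1), (0, 2, 3, 3)]


def recognize(crop: List[List[int]]) -> Tuple[str, float]:
    """Hypothetical stand-in for the recognition model: returns (text, confidence)."""
    # A real recognizer decodes characters; here we fake one "x" per pixel column.
    width = len(crop[0]) if crop else 0
    return "x" * width, 0.9


def crop_region(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the boxed region out of a row-major image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def run_ocr(image: List[List[int]]) -> List[OcrLine]:
    """Stage 1 finds boxes; stage 2 reads each cropped box."""
    lines = []
    for box in detect(image):
        text, score = recognize(crop_region(image, box))
        lines.append(OcrLine(text=text, box=box, score=score))
    return lines


page = [[0] * 5 for _ in range(4)]  # a tiny 5x4 dummy "image"
for line in run_ocr(page):
    print(line)
```

Because recognition runs once per detected box, the total cost grows with the number of text regions, which is consistent with the text-density dependence of the combined inference time.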
Supported Languages
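| Language | Code | Coverage |
|---|---|---|
| English | en | PP-OCRv5 single model |
| Simplified Chinese | ch | PP-OCRv5 single model |
| Traditional Chinese | ch | PP-OCRv5 single model |
| Japanese | japan | PP-OCRv5 single model |
| Korean | korean | Broader PaddleOCR ecosystem |
| Latin | latin | Broader PaddleOCR ecosystem |
| Arabic | arabic | Broader PaddleOCR ecosystem |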
PP-OCRv5 natively supports Simplified Chinese, Traditional Chinese, English, and Japanese in a single model. The remaining languages in the table (Korean, Latin, Arabic), along with others, are available through the broader PaddleOCR ecosystem.
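A client-side check against the language table can catch an unsupported setting before a request is sent. The helper below, `resolve_language`, is hypothetical and not part of PaddleX or the Document Analysis API. It maps the language names listed above to their codes and reports whether each one is covered by the single PP-OCRv5 model or has to come from the broader PaddleOCR ecosystem.

```python
from typing import Tuple

# Codes copied from the Supported Languages table.
LANGUAGE_CODES = {
    "English": "en",
    "Simplified Chinese": "ch",
    "Traditional Chinese": "ch",  # both Chinese scripts share one code in the table
    "Japanese": "japan",
    "Korean": "korean",
    "Latin": "latin",
    "Arabic": "arabic",
}

# Languages the single PP-OCRv5 model covers natively.
PP_OCRV5_NATIVE = {"English", "Simplified Chinese", "Traditional Chinese", "Japanese"}


def resolve_language(name: str) -> Tuple[str, bool]:
    """Return (code, covered_by_pp_ocrv5_model); raise for unlisted languages."""
    if name not in LANGUAGE_CODES:
        raise ValueError(f"{name!r} is not in the supported-language table")
    return LANGUAGE_CODES[name], name in PP_OCRV5_NATIVE


for lang in ("Japanese", "Korean"):
    code, native = resolve_language(lang)
    source = "PP-OCRv5 single model" if native else "broader PaddleOCR ecosystem"
    print(f"{lang}: language={code!r} ({source})")
```

Raising on an unlisted name, rather than falling back to a default code, keeps a typo from silently running the wrong recognizer.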
Benchmark
| Model | Recognition Accuracy (%) | GPU Inference Time (ms), normal / high-performance mode | Model Size (MB) |
|---|---|---|---|
| PP-OCRv5_server_rec | 86.38 | 8.46 / 2.36 | 81 |
| PP-OCRv4_server_rec | 85.19 | 8.75 / 2.49 | 173 |
| PP-OCRv5_mobile_rec | 81.29 | 5.43 / 1.46 | 16 |
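PP-OCRv5_server_rec is more accurate than PP-OCRv4_server_rec (86.38% vs. 85.19%, a gain of 1.19 points) at less than half the model size (81 MB vs. 173 MB), and it is also slightly faster on GPU.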
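The comparison with PP-OCRv4 follows directly from the benchmark table. This short script only recomputes the deltas from the published figures; it measures nothing itself.

```python
# Figures copied from the benchmark table; gpu_ms holds the two GPU latencies as listed.
v5 = {"acc": 86.38, "gpu_ms": (8.46, 2.36), "size_mb": 81}
v4 = {"acc": 85.19, "gpu_ms": (8.75, 2.49), "size_mb": 173}


def pct_reduction(old: float, new: float) -> float:
    """Percentage by which `new` is smaller than `old`."""
    return (old - new) / old * 100


acc_gain = round(v5["acc"] - v4["acc"], 2)
size_cut = round(pct_reduction(v4["size_mb"], v5["size_mb"]), 1)
latency_cut = [round(pct_reduction(o, n), 1) for o, n in zip(v4["gpu_ms"], v5["gpu_ms"])]

print(f"accuracy: +{acc_gain} points")
print(f"model size: -{size_cut}%")
print(f"GPU latency: -{latency_cut[0]}% / -{latency_cut[1]}%")
```

Running it reports a 1.19-point accuracy gain, a 53.2% smaller model, and GPU latency reductions of 3.3% and 5.2% for the two listed figures.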
Usage
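```python
from air.document_analysis import DocumentAnalysisClient

client = DocumentAnalysisClient(api_key="...")

# Run the full PP-OCRv5 server pipeline (detection + recognition) on one page image.
result = client.ocr(
    model="paddlex/PP-OCRv5_server",
    image_path="document_page.png",
    language="en",  # code from the Supported Languages table
    threshold=0.3,
)

# Print each recognized text line with its confidence score.
for r in result.results:
    print(f"'{r.text}' (confidence: {r.score:.3f})")
```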
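A common next step is to drop low-confidence lines and join the rest into plain text. The sketch below runs on stand-in objects rather than a live response. It relies on the `text` and `score` attributes used in the usage example, but the `box` attribute and its `(x_min, y_min, x_max, y_max)` layout are hypothetical, so check the API reference for the actual bounding-box field before relying on the reading-order sort. The `to_plain_text` helper and its `min_score` default of 0.5 are likewise illustrative choices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Line:
    """Hypothetical stand-in for one entry of result.results."""
    text: str
    score: float
    box: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def to_plain_text(lines: List[Line], min_score: float = 0.5) -> str:
    """Keep lines scoring at least min_score, sort top-to-bottom then left-to-right, join."""
    kept = [ln for ln in lines if ln.score >= min_score]
    kept.sort(key=lambda ln: (ln.box[1], ln.box[0]))
    return "\n".join(ln.text for ln in kept)


sample = [
    Line("world", 0.97, (0, 40, 80, 60)),
    Line("Hello", 0.99, (0, 10, 80, 30)),
    Line("~~", 0.12, (90, 10, 110, 30)),  # low-confidence noise
]
print(to_plain_text(sample))
```

Sorting on the raw top coordinate is deliberately naive: two lines on the same visual row whose tops differ by a pixel or two can swap order, so multi-column or skewed pages usually need the y coordinates bucketed into rows first.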
API Reference
See the Document Analysis API for full endpoint documentation.