paddlex/PP-OCRv4_server_det

Model Information

paddlex/PP-OCRv4_server_det is a server-side text detection model from the PP-OCRv4 series, developed by Baidu's PaddlePaddle team. It locates text regions in document images and outputs quadrilateral bounding polygons that a downstream text recognition model can consume. The model is optimized for high accuracy on server-grade hardware.

Model Developer: Baidu / PaddlePaddle
Framework: PaddleX 3.x
Task: Text detection (locating text regions in images)
Input: Document page image
Output: Quadrilateral bounding polygons for each text region
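
Downstream recognizers often expect an axis-aligned crop rather than a four-point polygon. The following is a minimal sketch of that conversion, assuming each polygon arrives as four (x, y) corner points. The function name `quad_to_bbox` and the point layout are illustrative assumptions, not part of the model's documented output schema.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quad_to_bbox(quad: List[Point]) -> Tuple[float, float, float, float]:
    """Return the axis-aligned box (x_min, y_min, x_max, y_max) enclosing a quadrilateral."""
    if len(quad) != 4:
        raise ValueError("expected exactly four corner points")
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly rotated text region (hypothetical coordinates).
quad = [(10.0, 20.0), (110.0, 25.0), (108.0, 55.0), (8.0, 50.0)]
bbox = quad_to_bbox(quad)
print(bbox)
```

The enclosing box is a lossy simplification: for strongly rotated or skewed text, a perspective warp of the quadrilateral usually gives the recognizer a cleaner crop.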

Model Architecture

Type: DBNet++ (Differentiable Binarization) with ResNet backbone
Model Size: ~109 MB
Detection Accuracy: 69.2% (PaddleOCR 3.0 multilingual benchmark)
Inference Time: ~128 ms (GPU, normal mode) / ~99 ms (GPU, high-performance mode)
Languages: Language-agnostic (detects text regions regardless of script)
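
Differentiable Binarization replaces the hard threshold step with a steep sigmoid so that binarization can be trained end to end. In the original DB formulation, the approximate binary map is B = 1 / (1 + exp(-k (P - T))), where P is the probability map, T is the learned threshold map, and k is an amplification factor (50 in the DB paper). The sketch below illustrates that formula on plain Python lists. It is not the PaddleX implementation, and the function name `approx_binarize` is hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def approx_binarize(prob: Grid, thresh: Grid, k: float = 50.0) -> Grid:
    """Apply B = 1 / (1 + exp(-k * (P - T))) element-wise."""
    return [
        [1.0 / (1.0 + math.exp(-k * (p - t))) for p, t in zip(p_row, t_row)]
        for p_row, t_row in zip(prob, thresh)
    ]


# Toy 2x2 maps (hypothetical values in [0, 1]).
prob = [[0.9, 0.3], [0.5, 0.1]]
thresh = [[0.3, 0.3], [0.3, 0.3]]
binary = approx_binarize(prob, thresh)
for row in binary:
    print([round(v, 4) for v in row])
```

Pixels whose probability sits well above the threshold saturate toward 1, those well below saturate toward 0, and a pixel exactly at the threshold maps to 0.5.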

Benchmark

Model	Accuracy (%)	GPU Inference Time (ms, normal / high-performance mode)	Model Size (MB)
PP-OCRv4_server_det	69.2	127.82 / 98.87	109
PP-OCRv5_server_det	83.8	89.55 / 70.19	84.3
PP-OCRv4_mobile_det	63.8	9.87 / 4.17	4.7

All models were evaluated on the PaddleOCR 3.0 multilingual dataset (Chinese, Traditional Chinese, English, Japanese), which covers street scenes, web images, documents, handwriting, and distorted text.
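
To compare the trade-offs in the table, the sketch below computes each model's speedup from normal to high-performance mode and its accuracy gap relative to PP-OCRv4_server_det. The numbers are copied from the table above; the dictionary layout and key names are illustrative assumptions.

```python
# Values copied from the benchmark table: accuracy (%), GPU latency (ms), size (MB).
benchmark = {
    "PP-OCRv4_server_det": {"accuracy": 69.2, "normal_ms": 127.82, "hpi_ms": 98.87, "size_mb": 109.0},
    "PP-OCRv5_server_det": {"accuracy": 83.8, "normal_ms": 89.55, "hpi_ms": 70.19, "size_mb": 84.3},
    "PP-OCRv4_mobile_det": {"accuracy": 63.8, "normal_ms": 9.87, "hpi_ms": 4.17, "size_mb": 4.7},
}

baseline = benchmark["PP-OCRv4_server_det"]

# Ratio of normal-mode latency to high-performance-mode latency, per model.
speedup = {name: row["normal_ms"] / row["hpi_ms"] for name, row in benchmark.items()}

# Accuracy difference in percentage points relative to PP-OCRv4_server_det.
accuracy_gap = {name: row["accuracy"] - baseline["accuracy"] for name, row in benchmark.items()}

for name in benchmark:
    print(f"{name}: {speedup[name]:.2f}x speedup, {accuracy_gap[name]:+.1f} points vs. baseline")
```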

Usage

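from air.document_analysis import DocumentAnalysisClient

# Create a client; replace the placeholder with your own API key.
client = DocumentAnalysisClient(api_key="...")

# Run text detection on a single page image.
result = client.text_detection(
    model="paddlex/PP-OCRv4_server_det",
    image_path="document_page.png",
    threshold=0.3,
)

# Print the bounding polygon and confidence score of each detected region.
for region in result.regions:
    print(f"Region: bbox={region.bbox}, score={region.score:.3f}")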
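
Detected regions usually need light post-processing before recognition, such as dropping low-confidence hits and arranging the rest in reading order. The following is a minimal sketch of that step. The `Region` dataclass is a hypothetical stand-in for the objects in `result.regions`, and it assumes `bbox` holds four (x, y) corner points; check the actual response schema before relying on either assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    # Hypothetical stand-in for the objects in result.regions.
    bbox: List[Tuple[float, float]]  # assumed: four (x, y) corner points
    score: float


def filter_and_sort(regions: List[Region], min_score: float = 0.5) -> List[Region]:
    """Keep regions at or above min_score, ordered top-to-bottom, then left-to-right."""
    kept = [r for r in regions if r.score >= min_score]
    # Sort by the topmost y, then the leftmost x, of each polygon.
    return sorted(
        kept,
        key=lambda r: (min(y for _, y in r.bbox), min(x for x, _ in r.bbox)),
    )


# Hypothetical detections, deliberately out of order.
regions = [
    Region(bbox=[(200, 100), (300, 100), (300, 130), (200, 130)], score=0.92),
    Region(bbox=[(10, 10), (150, 10), (150, 40), (10, 40)], score=0.88),
    Region(bbox=[(10, 100), (180, 100), (180, 130), (10, 130)], score=0.35),
    Region(bbox=[(20, 100), (190, 100), (190, 130), (20, 130)], score=0.71),
]

ordered = filter_and_sort(regions, min_score=0.5)
for r in ordered:
    print(f"score={r.score:.2f}, top-left={r.bbox[0]}")
```

Sorting on the raw top coordinate is the simplest option; pages with skewed lines or multiple columns generally need line grouping with a vertical tolerance instead.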

API Reference

See the Document Analysis API for full endpoint documentation.