paddlex/RT-DETR-H_layout_17cls
Model Information
paddlex/RT-DETR-H_layout_17cls is a high-accuracy layout detection model based on Baidu's RT-DETR (Real-Time DEtection TRansformer) architecture. It detects and classifies 17 types of document layout elements, including text blocks, tables, figures, headers, and footers. The model is part of the PaddleX ecosystem.
- Model Developer: Baidu / PaddlePaddle
- Framework: PaddleX 3.x
- Task: Document layout detection
- Input: Document page image
- Output: Bounding boxes with element labels and confidence scores
Model Architecture
- Type: RT-DETR (Real-Time DEtection TRansformer), an end-to-end object detector that needs no non-maximum suppression (NMS) post-processing
- Backbone: HGNetv2 (High-Performance)
- Model Size: ~435 MB
- Inference Time: ~32 ms per image (NVIDIA H100 GPU)
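As a rough capacity-planning aid, the latency figure above can be turned into a throughput estimate. The sketch below assumes pages are processed one at a time at a constant ~32 ms each and ignores image loading, preprocessing, and batching; the function names are illustrative and not part of PaddleX.

```python
def pages_per_second(latency_ms: float) -> float:
    """Sequential throughput implied by a per-image latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def seconds_for_pages(num_pages: int, latency_ms: float) -> float:
    """Total inference time for num_pages processed one after another."""
    return num_pages * latency_ms / 1000.0


# Using the ~32 ms figure quoted for an NVIDIA H100:
print(pages_per_second(32.0))            # 31.25
print(seconds_for_pages(10_000, 32.0))   # 320.0
```

Real throughput will differ with hardware, image resolution, and batch size, so treat these numbers as an upper-level estimate only.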
Supported Layout Classes (17)
| Class | Description |
|---|---|
| text | Body text blocks |
| title | Document titles / headings |
| figure | Images, charts, diagrams |
| figure_caption | Captions for figures |
| table | Data tables |
| table_caption | Captions for tables |
| header | Page headers |
| footer | Page footers |
| reference | Bibliography / references |
| equation | Mathematical equations |
| list-item | Bullet/numbered list items |
| index | Table of contents / index |
| code | Code blocks |
| algorithm | Algorithm descriptions |
| abstract | Paper abstracts |
| author | Author information |
| stamp | Stamps and seals |
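When consuming predictions, it can help to keep the 17 labels in one place so that unexpected labels are caught early. The sketch below copies the label strings from the class list above; the `LAYOUT_CLASSES` constant and the `count_by_label` helper are illustrative names, not part of PaddleX or the API.

```python
from collections import Counter
from typing import Iterable

# Label strings copied from the class list above (17 classes).
LAYOUT_CLASSES = (
    "text", "title", "figure", "figure_caption", "table", "table_caption",
    "header", "footer", "reference", "equation", "list-item", "index",
    "code", "algorithm", "abstract", "author", "stamp",
)


def count_by_label(labels: Iterable[str]) -> Counter:
    """Count detections per class, rejecting labels outside the 17 known ones."""
    counts = Counter()
    for label in labels:
        if label not in LAYOUT_CLASSES:
            raise ValueError(f"unknown layout class: {label!r}")
        counts[label] += 1
    return counts


# Example with made-up labels for a single page.
print(count_by_label(["title", "text", "text", "table", "table_caption"]))
```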
Usage
```python
from air.document_analysis import DocumentAnalysisClient

client = DocumentAnalysisClient(api_key="...")

# Run layout detection on a single page image.
result = client.layout_detection(
    model="paddlex/RT-DETR-H_layout_17cls",
    image_path="document_page.png",
    threshold=0.5,
)

# Print the label, confidence score, and bounding box of each detected element.
for element in result.elements:
    print(f"{element.label}: score={element.score:.3f}, bbox={element.bbox}")
```
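A common next step is to drop low-confidence detections and put the rest in a rough reading order. The sketch below is a hedged illustration that uses a stand-in `Element` dataclass with the same `label`, `score`, and `bbox` attributes as the usage example above. It assumes `bbox` is `(x_min, y_min, x_max, y_max)` in pixels, which the source does not specify, and `filter_and_sort` is a hypothetical helper, not part of the client library.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    """Stand-in for one detection; mirrors the attributes used in the usage example."""
    label: str
    score: float
    bbox: Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


def filter_and_sort(elements: List[Element], min_score: float = 0.5) -> List[Element]:
    """Keep detections scoring at least min_score, ordered top-to-bottom, then left-to-right."""
    kept = [e for e in elements if e.score >= min_score]
    return sorted(kept, key=lambda e: (e.bbox[1], e.bbox[0]))


# Made-up detections for illustration only.
page = [
    Element("text", 0.91, (50, 200, 550, 400)),
    Element("title", 0.97, (50, 40, 550, 90)),
    Element("footer", 0.32, (50, 780, 550, 800)),
]
for e in filter_and_sort(page):
    print(e.label, e.score)
```

Sorting by the top-left corner only suits single-column pages; multi-column layouts would need a column-aware ordering.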
API Reference
See the Document Analysis API for full endpoint documentation.