
paddlex/RT-DETR-H_layout_17cls

Model Information

paddlex/RT-DETR-H_layout_17cls is a high-accuracy layout detection model based on Baidu's RT-DETR (Real-Time DEtection TRansformer) architecture. It detects and classifies 17 classes of document layout elements, including text blocks, tables, figures, headers, and footers. The model is part of the PaddleX ecosystem.

  • Model Developer: Baidu / PaddlePaddle
  • Framework: PaddleX 3.x
  • Task: Document layout detection
  • Input: Document page image
  • Output: Bounding boxes with element labels and confidence scores

Model Architecture

  • Type: RT-DETR (Real-Time DEtection TRansformer), an end-to-end object detector that does not need a non-maximum suppression (NMS) step
  • Backbone: HGNetv2 (High-Performance)
  • Model Size: ~435 MB
  • Inference Time: ~32 ms per image (NVIDIA H100 GPU)
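For context on the "without NMS" point, the following is a minimal sketch of the greedy non-maximum suppression step that conventional detectors run after prediction and that RT-DETR's end-to-end design avoids. It is illustrative only and is not part of this model or of PaddleX; the corner-format boxes `(x1, y1, x2, y2)`, the function names, and the 0.5 IoU threshold are assumptions.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: List[Box], scores: List[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return the indices of the boxes kept by greedy NMS, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

A detector that relies on NMS has to tune `iou_threshold` and pays for this extra pass on every image; an end-to-end detector emits its final set of boxes directly.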

Supported Layout Classes (17)

Class           Description
text            Body text blocks
title           Document titles / headings
figure          Images, charts, diagrams
figure_caption  Captions for figures
table           Data tables
table_caption   Captions for tables
header          Page headers
footer          Page footers
reference       Bibliography / references
equation        Mathematical equations
list-item       Bullet/numbered list items
index           Table of contents / index
code            Code blocks
algorithm       Algorithm descriptions
abstract        Paper abstracts
author          Author information
stamp           Stamps and seals
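When consuming predictions, it can help to keep the class names in one place so that unexpected labels are caught early. The sketch below copies the 17 class names from the table above; the constant and helper names (`LAYOUT_CLASSES`, `is_known_label`, `count_by_class`) are hypothetical and not part of any PaddleX API.

```python
from collections import Counter
from typing import Dict, Iterable

# The 17 class names, copied from the table above.
LAYOUT_CLASSES = (
    "text", "title", "figure", "figure_caption", "table", "table_caption",
    "header", "footer", "reference", "equation", "list-item", "index",
    "code", "algorithm", "abstract", "author", "stamp",
)
assert len(LAYOUT_CLASSES) == 17


def is_known_label(label: str) -> bool:
    """Return True if the label is one of the 17 documented classes."""
    return label in LAYOUT_CLASSES


def count_by_class(labels: Iterable[str]) -> Dict[str, int]:
    """Count detections per class, rejecting labels outside the documented set."""
    counts: Counter = Counter()
    for label in labels:
        if not is_known_label(label):
            raise ValueError(f"unexpected layout class: {label!r}")
        counts[label] += 1
    return dict(counts)
```

For example, `count_by_class(["text", "text", "table"])` yields a per-class tally for one page, which is a quick sanity check before further processing.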

Usage

from air.document_analysis import DocumentAnalysisClient

# Create a client with your API key.
client = DocumentAnalysisClient(api_key="...")

# Run layout detection on a single page image.
result = client.layout_detection(
    model="paddlex/RT-DETR-H_layout_17cls",
    image_path="document_page.png",
    threshold=0.5,
)

# Print the label, confidence score, and bounding box of each detected element.
for element in result.elements:
    print(f"{element.label}: score={element.score:.3f}, bbox={element.bbox}")
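Detections usually need some post-processing before use, such as keeping only certain classes or ordering them down the page. The sketch below shows one way to do that. The `Element` dataclass is a stand-in that mirrors only the three attributes used in the example above (`label`, `score`, `bbox`); the `(x1, y1, x2, y2)` bounding-box layout, the top-to-bottom ordering rule, and the `select_elements` helper are assumptions, not documented API behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Element:
    """Stand-in for a detected element; the bbox layout is an assumption."""
    label: str
    score: float
    bbox: Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2)


def select_elements(
    elements: Iterable[Element],
    min_score: float = 0.5,
    labels: Optional[Set[str]] = None,
) -> List[Element]:
    """Filter by score and (optionally) label, then sort top-to-bottom, left-to-right."""
    kept = [
        e for e in elements
        if e.score >= min_score and (labels is None or e.label in labels)
    ]
    # Sort by the top edge first, then by the left edge.
    return sorted(kept, key=lambda e: (e.bbox[1], e.bbox[0]))


# Hand-written sample data, not real model output.
page = [
    Element("footer", 0.91, (50.0, 900.0, 550.0, 940.0)),
    Element("table", 0.88, (50.0, 300.0, 550.0, 600.0)),
    Element("title", 0.97, (50.0, 40.0, 550.0, 90.0)),
    Element("text", 0.32, (50.0, 120.0, 550.0, 280.0)),
]

for e in select_elements(page, min_score=0.5, labels={"title", "table"}):
    print(f"{e.label}: score={e.score:.3f}, bbox={e.bbox}")
```

If the objects in `result.elements` expose the same three attributes, they could be passed to `select_elements` directly. Simple top-to-bottom sorting does not handle multi-column layouts, which need a dedicated reading-order step.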

API Reference

See the Document Analysis API for full endpoint documentation.

External References