MinerU · Parsing

MinerU
Multimodal parsing.

Parse PDFs, tables, and images into structured outputs—built as the ingestion foundation for ThinkDoc and ThinkExtract.

/api/v1/tasks/submit
curl -X POST "http://localhost:8000/api/v1/tasks/submit" \
  -F "file=@document.pdf" \
  -F "backend=pipeline" \
  -F "lang=ch"
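
The same submission can be sketched from Python. This is a minimal illustration using only the standard library, not an official client: the URL and the `file`, `backend`, and `lang` field names come from the curl call above, while the helper names `build_submit_request` and `submit` are hypothetical. The response format is not described here, so the sketch returns the raw response text.

```python
import uuid
import urllib.request
from pathlib import Path

# Endpoint taken from the curl example; adjust host/port for your deployment.
SUBMIT_URL = "http://localhost:8000/api/v1/tasks/submit"


def build_submit_request(pdf_path, backend="pipeline", lang="ch", url=SUBMIT_URL):
    """Build (but do not send) a multipart/form-data POST mirroring the curl call.

    Hypothetical helper: only the field names and URL come from the curl example.
    """
    boundary = uuid.uuid4().hex
    path = Path(pdf_path)
    parts = []

    # Plain text form fields, same names as the -F flags.
    for name, value in (("backend", backend), ("lang", lang)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # The file field, equivalent to -F "file=@document.pdf".
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    parts.append(file_header + path.read_bytes() + b"\r\n")

    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def submit(pdf_path, **fields):
    """Send the request and return the raw response body as text."""
    request = build_submit_request(pdf_path, **fields)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `submit("document.pdf")` against a running server sends the same three form fields as the curl command. Building the request separately from sending it keeps the payload inspectable without a live server.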

Structured by design

Layout-aware parsing for complex documents—so downstream knowledge and extraction steps start from clean structure, not noisy text.

99%
0.12s Latency Per Page

Multimodal inputs

Bring scans, photos, mixed layouts, and tables—MinerU is built for real-world document chaos.

EN · ZH · JP · +

Tables & layouts

Recover tables, headings, and reading order for reliable downstream schema mapping.


Downstream-ready

Outputs align to the ThinkDoc / ThinkExtract pipeline—so teams don’t rebuild parsing at every layer.

Parsing that preserves meaning

Multimodal document understanding: keep structure, recover semantics, and prepare evidence for knowledge systems.

AI Logic

Multimodal understanding

Handle PDFs, embedded images, and tricky scans while preserving the cues humans use to interpret a page.


Layout & reading order

Reconstruct titles, sections, tables, and sidebars so retrieval and extraction operate on the right units.


Structured outputs

Emit clean Markdown / JSON-friendly structures that slot into ThinkDoc knowledge graphs and ThinkExtract schemas.

Built for production pipelines.

Finance, manufacturing, science, policy, and internet-scale programs—where parsing quality determines everything downstream.

500M+ Pages / Month
99.9% Uptime SLA

Start from clean structure.

GitHub