MinerU
Multimodal parsing.
Parse PDFs, tables, and images into structured outputs—built as the ingestion foundation for ThinkDoc and ThinkExtract.
curl -X POST "http://localhost:8000/api/v1/tasks/submit" \
-F "file=@document.pdf" \
-F "backend=pipeline" \
-F "lang=ch"
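The request above can be wrapped in a small shell function so the file, backend, and language become parameters. This is a minimal sketch, not part of MinerU: the names `submit_task`, `MINERU_URL`, and `DRY_RUN` are hypothetical, and only the endpoint path and the three form fields come from the example above. The response format is not shown here, so the sketch does not attempt to parse it.

```shell
# Hypothetical helper: submit one document to the endpoint shown above.
# Usage: submit_task FILE [BACKEND] [LANG]
# Defaults mirror the example request (backend=pipeline, lang=ch).
submit_task() {
  file="$1"
  backend="${2:-pipeline}"
  lang_code="${3:-ch}"
  # MINERU_URL is an assumed override for the base URL; it defaults to localhost.
  url="${MINERU_URL:-http://localhost:8000}/api/v1/tasks/submit"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command that would be sent instead of sending it.
    printf '%s\n' "curl -X POST $url -F file=@$file -F backend=$backend -F lang=$lang_code"
  else
    curl -X POST "$url" \
      -F "file=@$file" \
      -F "backend=$backend" \
      -F "lang=$lang_code"
  fi
}
```

Setting `DRY_RUN=1` before calling `submit_task document.pdf` prints the request without contacting the server, which is a convenient way to check the parameters before a batch run.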
Structured by design
Layout-aware parsing for complex documents—so downstream knowledge and extraction steps start from clean structure, not noisy text.
Multimodal inputs
Bring scans, photos, mixed layouts, and tables. MinerU is built for messy, real-world documents.
Tables & layouts
Recover tables, headings, and reading order for reliable downstream schema mapping.
Downstream-ready
Outputs align with the ThinkDoc / ThinkExtract pipeline, so teams don’t rebuild parsing at every layer.
Parsing that preserves meaning
Multimodal document understanding: keep structure, recover semantics, and prepare evidence for knowledge systems.
Multimodal understanding
Handle PDFs, embedded images, and tricky scans while preserving the cues humans use to interpret a page.
Layout & reading order
Reconstruct titles, sections, tables, and sidebars so retrieval and extraction operate on the right units.
Structured outputs
Emit clean Markdown and JSON-friendly structures that slot into ThinkDoc knowledge graphs and ThinkExtract schemas.
Built for production pipelines.
For finance, manufacturing, science, policy, and internet-scale programs, where parsing quality limits the quality of every downstream step.