OCR integration for scanned PDFs with pluggable engine support
Extract text from email attachments (PDF + image OCR). PDF text via `pdf-extract` (pure Rust); OCR via the `tesseract` CLI subprocess (not linked as a C library). Two-stage fallback for scanned PDFs: try embedded text first, fall back to OCR on the raw bytes if the text is too short. Returns `ExtractionResult` with text + language + confidence + page count + JSON metadata.
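The two-stage fallback described above can be sketched roughly as follows. This is a minimal illustration, not the crate's actual API: `extract_with_fallback`, `Extracted`, `Source`, and the `min_chars` threshold are hypothetical names, and the two stages are passed in as closures so the sketch does not depend on `pdf-extract` or a `tesseract` installation. In a real integration the first closure would wrap the PDF text extractor and the second would wrap the OCR subprocess.

```rust
/// Which stage produced the text (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub enum Source {
    EmbeddedText,
    Ocr,
}

/// Minimal stand-in for a richer result type (hypothetical).
#[derive(Debug, PartialEq)]
pub struct Extracted {
    pub text: String,
    pub source: Source,
}

/// Try embedded text first; fall back to OCR on the raw bytes when the
/// embedded text is shorter than `min_chars` (an assumed threshold).
pub fn extract_with_fallback<T, O>(
    bytes: &[u8],
    min_chars: usize,
    embedded_text: T,
    ocr: O,
) -> Result<Extracted, String>
where
    T: Fn(&[u8]) -> Result<String, String>,
    O: Fn(&[u8]) -> Result<String, String>,
{
    // Stage 1: embedded text. A failure here is treated like empty text
    // so that OCR still gets a chance (a design assumption of this sketch).
    let text = embedded_text(bytes).unwrap_or_default();
    if text.trim().chars().count() >= min_chars {
        return Ok(Extracted { text, source: Source::EmbeddedText });
    }

    // Stage 2: OCR on the same raw bytes. An OCR failure is propagated.
    let text = ocr(bytes)?;
    Ok(Extracted { text, source: Source::Ocr })
}
```

Counting trimmed characters rather than bytes keeps the threshold meaningful for whitespace-only output and for multi-byte scripts; the right cutoff depends on the documents being processed.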
PdfEngine trait + PDFium-backed implementation.
High-quality PDF converter for scanned books with AI enhancement, deskew correction, and Japanese OCR
PDF processing pipeline for Belarusian financial reports with OCR, table extraction, and data normalization
Multimodal artifact ingestion and OCR pipeline for Engram
A pure Rust library for data frame operations, particularly useful for processing data extracted from PDF files or OCR recognition
A fast CLI tool to batch convert PDFs into Markdown using GLM-OCR.
Overlay searchable CJK text on PDFs, extract text, merge/split pages — pure Rust, zero C dependencies
High-performance document intelligence library for Rust. Extract text, metadata, and structured data from PDFs, Office documents, images, and 90+ other formats, plus tree-sitter code intelligence for 300+ programming languages, with async and sync APIs.
PDF to pixel buffer: pure Rust, zero Poppler. For the CLI tool: `cargo install rasterrocket-cli`
Special features for justpdf - OCR, barcode, ZUGFeRD, BiDi, deskew
No description provided.