A Rust data validation library providing Deequ-like capabilities without Spark dependencies.
spdf command-line interface.
Shells out to LibreOffice / ImageMagick for non-PDF inputs.
Orchestrator for the spdf pipeline.
OCR engine trait + HTTP and Tesseract implementations.
JSON and text output formatters.
PdfEngine trait + PDFium-backed implementation.
Text cleaning, bbox, markup, and search helpers.
Spatial grid projection — the algorithmic core of spdf.
Core types for the spdf workspace: TextItem, ParsedPage, ParseResult, ParseConfig.