Extract text from Wikipedia dumps (.bz2) and convert it to JSONLines format.
A markdown corpus indexer for LLMs to build and query their own per-repo wikis.
PDF to pixel buffer — pure Rust, zero Poppler. For the CLI tool: cargo install rasterrocket-cli
CLI — renders PDF pages to pixel files. Drop-in pdftoppm replacement. Pure Rust, zero Poppler.
Pixel types and colour math for the rasterrocket PDF renderer
PPM/PGM/PBM/PNG output for the rasterrocket PDF renderer
FreeType glyph cache and outline rendering for the rasterrocket PDF renderer
Native PDF content-stream interpreter: Poppler-free render path for rasterrocket
Lazy, zero-copy PDF file parser for the rasterrocket render pipeline
Software rasterizer — path fill, compositing, and AVX-512/AVX2/NEON SIMD for the rasterrocket PDF renderer