Rust port of jusText — paragraph-level boilerplate removal for HTML
Pure Rust jusText algorithm for paragraph classification
Web content extraction library inspired by Trafilatura. Extracts main text, metadata, and comments from HTML.
Core extraction cascade orchestrator for kawat
HTML main-content extraction (article body, title, metadata) — Rust ports of Mozilla Readability, Trafilatura, and htmldate.