JavaScript-only library to perform OCR on scanned PDFs to turn them into searchable PDFs
High-quality OCR and text extraction for images and PDFs.
The Adobe PDF Services Node.js SDK provides APIs for creating, combining, exporting and manipulating PDFs.
A simple wrapper around command-line utils to assist in PDF / Image OCR (Optical Character Recognition) processing using Tesseract.
Fast PDF classification and text extraction. Detect text-based vs scanned PDFs, extract text by region with quality checks. Native Rust performance via napi-rs.
Read text and parse tables from PDF files. Supports tabular data with automatic column detection, and rule-based parsing.
A Node.js wrapper for the opendataloader-pdf Java CLI.
A robust, strictly-typed Node.js and Browser library for parsing office files (.docx, .pptx, .xlsx, .odt, .odp, .ods, .pdf, .rtf, .csv, .md, .html) and generating high-fidelity outputs in Markdown, HTML, CSV, RTF, and RAG-focused chunks.
A Node.js wrapper for the Tesseract OCR API
Fast PDF classification, text extraction, and image extraction. Native Rust performance via napi-rs.
Node PDF is a set of tools that takes in PDF files and converts them to usable formats for data processing. The library supports both extracting text from searchable PDF files and performing OCR on PDFs that are just scanned images of text.
n8n community node to convert HTML and CSS to PDF using PdfMunk API - perfect for invoices, reports, certificates, and document generation
Super-simple async PDF reader, based on pdf.js, that extracts text with x,y page positions
OCR documents using gpt-4o-mini
Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run directly in your browser or in Node!
Display PDFs in your React app as easily as if they were images.
Create and modify PDF files with JavaScript
PDF extraction and rendering across all JavaScript runtimes
PDF to Markdown and DOCX conversion powered by Mistral OCR.
Fast, lightweight PDF and document parsing with spatial text extraction
Guten OCR is a highly accurate text detection (OCR) JavaScript/TypeScript library that runs on Node.js, in the browser, on React Native, and in C++. Based on PaddleOCR and ONNX Runtime.
Small, fast and advanced PNG / APNG encoder and decoder