A CLI tool for OCR processing of PDF files using the Mistral API, with optional LLM verification
Read text and parse tables from PDF files. Supports tabular data with automatic column detection and rule-based parsing.
High-quality OCR and text extraction for images and PDFs.
A high-performance, parallelized PDF OCR tool using Tesseract.js WASM
The Adobe PDF Services Node.js SDK provides APIs for creating, combining, exporting and manipulating PDFs.
Merge multiple PDF documents, or parts of them, into a new PDF document
A Node.js wrapper for the opendataloader-pdf Java CLI.
A robust, strictly-typed Node.js and Browser library for parsing office files (.docx, .pptx, .xlsx, .odt, .odp, .ods, .pdf, .rtf, .csv, .md, .html) and generating high-fidelity outputs in Markdown, HTML, CSV, RTF, and RAG-focused chunks.
Nuktaa helps AI teams turn public or private source material into usable knowledge for LLM applications.
Fast PDF classification and text extraction. Detect text-based vs scanned PDFs, extract text by region with quality checks. Native Rust performance via napi-rs.
PDF to Markdown and DOCX conversion powered by Mistral OCR.
A Node.js wrapper for the Tesseract OCR API
Node PDF is a set of tools that takes in PDF files and converts them to usable formats for data processing. The library supports both extracting text from searchable PDF files and performing OCR on PDFs that are just scanned images of text.
n8n community node to convert HTML and CSS to PDF using PdfMunk API - perfect for invoices, reports, certificates, and document generation
Lightweight, probably the fastest PaddleOCR SDK in TypeScript. Runs anywhere JavaScript runs: Node.js, Bun, Deno, web browsers, and browser extensions. Docker & CLI supported. The official SDK is browser-only. Accurate text detection and recognition for d
Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run directly in your browser or in Node!
Node.js utility to convert PDF file/buffer pages to PNG files/buffers. No build-time compilation required — pre-built native binaries included for all major platforms.
A simple wrapper around command-line utils to assist in PDF / Image OCR (Optical Character Recognition) processing using Tesseract.
Display PDFs in your React app as easily as if they were images.
CLI tool for converting Markdown files to PDF.
Create and modify PDF files with JavaScript
📃📸 Converts PDFs to images in Node.js
n8n community node for PDF processing - convert PDF to images, extract text and run OCR
No description provided.