Node PDF is a set of tools that takes in PDF files and converts them to usable formats for data processing. The library supports both extracting text from searchable PDF files and performing OCR on PDFs that are just scanned images of text.
Extract images from PDF files without binary dependencies
A fast, native Node.js module to extract and process text from PDF files using Rust and N-API. Built with [Tokio](https://tokio.rs/), [`pdf-extract`](https://docs.rs/pdf-extract), and [`text-splitter`](https://crates.io/crates/text-splitter), this package
TypeScript client for the PDF Extract API
PDF extraction and rendering across all JavaScript runtimes
TypeScript client for the PDF Extract API
Push new PDF extraction jobs out to workers
Extract text from PDFs that contain searchable text
Parse .eml and .msg files or convert them to PDF. Extract headers and attachments from .eml and .msg files. Written natively in TypeScript, with support for mjs and cjs!
Create and modify PDF files with JavaScript
Super-simple async PDF reader that extracts text with x,y page positions, based on pdf.js
Pure JavaScript cross-platform module to extract text from PDFs.
extracts CSS into separate files
This repository provides advanced support for data extraction from PDF documents
The Adobe PDF Services Node.js SDK provides APIs for creating, combining, exporting and manipulating PDFs.
A CSS Modules transform to extract local aliases for inline imports
SDK for the Document Processing API, with support for PDF extraction, templates, conversion to image, and more.
Unzip a zip file into a directory using 100% JavaScript
Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run directly in your browser or in Node!
PDF text extraction in TypeScript
Display PDFs in your React app as easily as if they were images.
MCP server for document processing - PDF extract/merge/split, DOCX to Markdown, image resize/compress
Extract pages from a PDF into canvas elements on the client side
Create and modify PDF files with JavaScript
A library to extract content from PDFs
A Rust toolkit for detecting and extracting metadata, text, and content from various file formats
Extract text from email attachments (PDF + image OCR). PDF text via `pdf-extract` (pure Rust); OCR via the `tesseract` CLI subprocess (not linked as a C library). Two-stage fallback for scanned PDFs: try embedded text first, fall back to OCR on the raw bytes if the text is too short. Returns `ExtractionResult` with text + language + confidence + page count + JSON metadata.
Self-contained web search MCP server. 9 backends with automatic fallback. Works from any IP.
Local-first MCP server bridging Claude to your Zotero library — search, read, cite, enrich, write — over stdio or streamable-HTTP with OAuth 2.1.
High-performance PDF text extraction library for vectorization pipelines
Build LLM applications in Rust with type safety: chains, agents, RAG, LangGraph, embeddings, vector stores, and 20+ document loaders. A LangChain port supporting OpenAI, Claude, Gemini, Mistral, Bedrock, Ollama, and more. Includes streaming, structured output, and multi-agent (Deep Agent) workflows.
High-performance document conversion engine for AI/LLM embeddings - 27 formats supported
A high-performance, reasoning-based RAG indexer in Rust following the PageIndex pattern.
Fast pure-Rust PDF extraction library and CLI — ~10-50x faster than pdfplumber for text, word, table, layout, image, and metadata extraction from PDFs. By Clark Labs Inc.
TUI for webpage summarisation
A flexible rule-based file and folder comparison tool and crate with nice HTML reporting. Compares CSV, JSON, and text files, PDF text, and images.
PDF content extraction tool and library.
A command line utility for extracting annotation and field metadata from a PDF in JSON format.
Extract all images, with format conversions, based on the Pdf::Reader library
Grim is a simple gem for extracting a page from a PDF and converting it to an image, as well as extracting the text from the page as a string. It gives you an easy-to-use API to Ghostscript, ImageMagick, and pdftotext specific to this use case.
Extract citations from PDFs.
Extract tables from PDFs as structured data. Uses Ghostscript to render the PDF to an image, then recognizes table separators optically. No OpenCV or other heavy dependencies.
This gem lets you extract plain text from PDF documents. It is a JRuby wrapper for the Apache PDFBox library.
This is a ChupaText decomposer plugin to extract text and metadata from PDFs. You can use the `pdf` decomposer.
FillablePDF is an extremely simple and lightweight utility that bridges iText and Ruby in order to fill out fillable PDF forms or extract field values from previously filled out PDF forms.
simple wrapper around CLI for extracting text from PDF and Word documents