A fast, native Node.js module to extract and process text from PDF files using Rust and N-API. Built with [Tokio](https://tokio.rs/), [`pdf-extract`](https://docs.rs/pdf-extract), and [`text-splitter`](https://crates.io/crates/text-splitter), this package
PDF extraction and rendering across all JavaScript runtimes
Extract text from PDFs that contain searchable text
Create and modify PDF files with JavaScript
Extracts CSS into separate files
The Adobe PDF Services Node.js SDK provides APIs for creating, combining, exporting and manipulating PDFs.
Super-simple async PDF reader, based on pdf.js, that extracts text with x,y page positions
PDF text extraction in TypeScript
Create and modify PDF files with JavaScript
Display PDFs in your React app as easily as if they were images.
This repository provides advanced support for data extraction from PDF documents
Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run directly in your browser or in Node!
Define uninitialized elements
Yet another library to extract text from MS Office and PDF files
An advanced text layout framework
A Webpack plugin to optimize / minimize CSS assets.
Pure JavaScript cross-platform module to extract text from PDFs.
High-quality OCR and text extraction for images and PDFs.
n8n community node to convert HTML and CSS to PDF using PdfMunk API - perfect for invoices, reports, certificates, and document generation
Node PDF is a set of tools that takes in PDF files and converts them to usable formats for data processing. The library supports both extracting text from searchable PDF files and performing OCR on PDFs that are just scanned images of text.
A PDF generation library for Node.js
Create PDF files on the browser and server
Native PDF text extraction for React Native and Expo. Extract text content from PDF files using platform-native APIs (PDFKit on iOS, PDFBox on Android). Works with Expo development builds.
PDF to HTML or Text conversion using Apache Tika. Also generate PDF thumbnail using Apache PDFBox.
Grim is a simple gem for extracting a page from a PDF and converting it to an image, as well as extracting the text from the page as a string. It basically gives you an easy-to-use API to Ghostscript, ImageMagick, and pdftotext specific to this use case.
This gem lets you extract plain text from PDF documents. It is a JRuby wrapper for the Apache PDFBox library.
This is a ChupaText decomposer plugin to extract text and metadata from PDF files. You can use the `pdf` decomposer.
simple wrapper around CLI for extracting text from PDF and Word documents
Kreuzberg is a high-performance document intelligence library with a Rust core and native Ruby bindings via Magnus. Extract text, metadata, and structured data from 75+ file formats including PDF, DOCX, PPTX, XLSX, HTML, RTF, images (with OCR), email, archives, and more. Features async/sync APIs, text chunking, language detection, and keyword extraction.
HexaPDF is a pure Ruby library with an accompanying application for working with PDF files. In short, it allows creating new PDF files, manipulating existing PDF files, merging multiple PDF files into one, extracting meta information, text, images and files from PDF files, securing PDF files by encrypting them and optimizing PDF files for smaller file size or other criteria. HexaPDF was designed with ease of use and performance in mind. It uses lazy loading and lazy computing when possible and tries to produce small PDF files by default.
Provides methods to extract text from various file formats, such as Microsoft Office (<= 2002 as well as >= 2007), PDF, and HTML.
Library (Docsplit wrapper) for text extraction from pdf, doc/x, txt files with OpenOffice
Extracts text from PDF files using Tesseract; the text is added to the PDF as a background layer.
Provides a very simple extraction resource for extracting text from slices of a PDF.
Extracts tables from PDF text using spacing and position heuristics.
This is a ChupaText decomposer plugin to extract text and metadata from office files such as Microsoft Word, Microsoft Excel, and OpenDocument Format files. It uses [LibreOffice](https://www.libreoffice.org/). You can use the `libreoffice` decomposer. It depends on the `pdf` decomposer because it converts an office file to a PDF file and extracts text and metadata with the `pdf` decomposer.
No description provided.