Extract text from PDFs that contain searchable text
Native PDF text extraction for React Native and Expo. Extract text content from PDF files using platform-native APIs (PDFKit on iOS, PDFBox on Android). Works with Expo development builds.
PDF text extract
PDF extraction and rendering across all JavaScript runtimes
Create and modify PDF files with JavaScript
Extracts CSS into separate files
Node module that extracts metadata, text content, and styling from readable PDF files
The Adobe PDF Services Node.js SDK provides APIs for creating, combining, exporting and manipulating PDFs.
Super-simple async PDF reader, based on pdf.js, that extracts text with x,y page positions
PDF text extraction in TypeScript
Display PDFs in your React app as easily as if they were images.
This repository provides advanced support for data extraction from PDF documents
Pure TypeScript, cross-platform module for extracting text, images, and tabular data from PDFs. Run directly in your browser or in Node!
Grim is a simple gem for extracting a page from a PDF and converting it to an image, as well as extracting the text from the page as a string. It basically gives you an easy-to-use API to Ghostscript, ImageMagick, and pdftotext specific to this use case.
This gem lets you extract plain text from PDF documents. It is a JRuby wrapper for the Apache PDFBox library.
This is a ChupaText decomposer plugin for extracting text and metadata from PDF files. You can use the `pdf` decomposer.
Simple wrapper around CLI for extracting text from PDF and Word documents
Kreuzberg is a high-performance document intelligence library with a Rust core and native Ruby bindings via Magnus. Extract text, metadata, and structured data from 75+ file formats including PDF, DOCX, PPTX, XLSX, HTML, RTF, images (with OCR), email, archives, and more. Features async/sync APIs, text chunking, language detection, and keyword extraction.
HexaPDF is a pure Ruby library with an accompanying application for working with PDF files. In short, it allows creating new PDF files, manipulating existing PDF files, merging multiple PDF files into one, extracting meta information, text, images and files from PDF files, securing PDF files by encrypting them and optimizing PDF files for smaller file size or other criteria. HexaPDF was designed with ease of use and performance in mind. It uses lazy loading and lazy computing when possible and tries to produce small PDF files by default.
Provides methods to extract text from various file formats such as Microsoft Office (<= 2002 as well as >= 2007), PDF, and HTML.
Library (Docsplit wrapper) for text extraction from PDF, doc/x, and txt files with OpenOffice
Extracts text from PDF files using Tesseract; the text is added to the PDF as a background layer.
Provides a very simple extraction resource for extracting text from slices of a PDF.
Extracts tables from PDF text using spacing and position heuristics.
This is a ChupaText decomposer plugin for extracting text and metadata from office files such as Microsoft Word, Microsoft Excel, and OpenDocument Format files. It uses [LibreOffice](https://www.libreoffice.org/). You can use the `libreoffice` decomposer. It depends on the `pdf` decomposer because it converts an office file to a PDF file and then extracts text and metadata with the `pdf` decomposer.