A robust, strictly-typed Node.js and Browser library for parsing office files (.docx, .pptx, .xlsx, .odt, .odp, .ods, .pdf, .rtf, .csv, .md, .html) and generating high-fidelity outputs in Markdown, HTML, CSV, RTF, and RAG-focused chunks.
**Lightning-fast text extraction for Office documents — built with pure native JavaScript.**
## Introduction
Yet another library to extract text from MS Office and PDF files
A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx, odt, odp, ods, pdf files.
Full-text PDF, DOCX, PPTX, XLSX search for static sites — Apache Solr for client-side apps, without Solr.
A robust, strictly-typed Node.js and Browser library for parsing office files (.docx, .pptx, .xlsx, .xls, .csv, .odt, .odp, .ods, .pdf, .rtf) into structured AST with rich metadata, formatting, and attachment support.
Enhanced n8n document converter with flexible sheet processing. Converts DOCX, XML, YML, XLS, XLSX, CSV, PDF, TXT, PPT, PPTX, HTML, JSON, ODT, ODP, ODS to JSON/text. Features individual sheet workflow items, toggleable metadata, Excel row/column preservat
HireSquire CLI - AI-powered candidate screening from the command line
Astro integration for @icjia/pdf-search-index — adds linked PDFs as first-class search rows.
Nuxt 4 module for @icjia/pdf-search-index — extract PDFs from mixed CMS + @nuxt/content sources.
Fork of office-text-extractor with unreleased changes that include browser support
Converts most common file types into clean text or Markdown
Lightweight, promise-based parsers for common document types. The package detects the file type by extension, extracts UTF-8-safe text, and returns structured metadata so downstream pipelines can reason about the content (row counts, sheet names, page cou
MCP server for Infomaniak kDrive — search, list, and read files (text, Excel, Word, PDF, PowerPoint)
OpenClaw plugin for multimodal RAG - semantic indexing and time-aware search for images and audio using local AI models
Optional Office format renderers (XLSX, PPTX/ODP slide decks, legacy Office, cloud viewer).
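
Several of the descriptions above mention detecting a file's type from its extension before choosing a parser. The following is a minimal sketch of that idea only, not the implementation of any package listed here. The names `DocKind`, `detectKind`, and `describeFile` are hypothetical, and the set of extensions is an assumption drawn from the formats named in the descriptions.

```typescript
// Hypothetical names throughout: DocKind, detectKind, and describeFile are
// illustrative and do not come from any package listed above.
type DocKind =
  | "docx" | "pptx" | "xlsx"
  | "odt" | "odp" | "ods"
  | "pdf" | "rtf" | "csv"
  | "unknown";

// Extensions treated as supported in this sketch (an assumption).
const KNOWN_KINDS: ReadonlySet<string> = new Set([
  "docx", "pptx", "xlsx", "odt", "odp", "ods", "pdf", "rtf", "csv",
]);

// Map a file name to a document kind using only its final extension.
function detectKind(filename: string): DocKind {
  const dot = filename.lastIndexOf(".");
  // No dot, a leading dot only (".pdf"), or a trailing dot: no usable extension.
  if (dot <= 0 || dot === filename.length - 1) return "unknown";
  const ext = filename.slice(dot + 1).toLowerCase();
  return KNOWN_KINDS.has(ext) ? (ext as DocKind) : "unknown";
}

// Small metadata record a caller could use to route the file to a parser.
interface FileInfo {
  filename: string;
  kind: DocKind;
  supported: boolean;
}

function describeFile(filename: string): FileInfo {
  const kind = detectKind(filename);
  return { filename, kind, supported: kind !== "unknown" };
}
```

Extension checks are cheap but easy to fool, since a renamed file keeps whatever extension it was given. A real pipeline would likely also inspect the file's leading bytes before trusting the result.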