Core engine that parses HTML into an intermediate DocumentElement tree and exposes a plugin registry so external adapters can convert that tree into DOCX, PDF, XLSX, Markdown and more.
PDF adapter for html-to-document-core — converts a DocumentElement tree into .pdf using the html2pdf.js library.
DOCX adapter for html-to-document-core — converts a DocumentElement tree into a .docx Buffer using the docx library.
CSS parser plugin for html-to-document-core that harvests <style> tags and appends parsed statements to the per-parse stylesheet.
PDF deconverter for html-to-document-core — converts PDF files to DocumentElement[] using pdf-parse.
Core utilities for working with the DOM and ProseMirror within remirror
Convert Word documents from docx to simple HTML and Markdown
Language service for HTML
HTML template literals in JavaScript
Tooltip and Popover Positioning Engine
A robust Punycode converter that fully complies with RFC 3492 and RFC 5891, and works on nearly all JavaScript platforms.
ProseMirror's view component
Runtime type checking for React props and similar objects.
ProseMirror document transformations
ProseMirror editor state
Accessibility engine for automated Web UI testing
FullCalendar core package for rendering a calendar
HTML language support for the CodeMirror code editor
ProseMirror plugin for cursors at normally impossible-to-reach positions
ProseMirror's document model
Advanced HTML to plain text converter
A JavaScript implementation of many web standards
hast utility to parse from HTML
Inlines img, script and link tags into the same file.
The core parser for the Bayeux document markup language, optimised for long-form documents. Generators are also provided for HTML, LaTeX and Pandoc.
Kreuzberg is a high-performance document intelligence library with a Rust core and native Ruby bindings via Magnus. Extract text, metadata, and structured data from 75+ file formats including PDF, DOCX, PPTX, XLSX, HTML, RTF, images (with OCR), email, archives, and more. Features async/sync APIs, text chunking, language detection, and keyword extraction.
pikuri-vectordb gives a pikuri-core agent a +vectordb_search+ tool over a local document corpus — agentic search, the agent decides when to retrieve. Ships a swappable backend (a pure-Ruby +Backend::InMemory+ for teaching, plus thin +Backend::Qdrant+ / +Backend::Chroma+ HTTP clients for persistence — Qdrant recommended), a chunker, an embedder wrapper over +RubyLLM.embed+, and an optional +Reranker::LlamaServer+ that speaks +/v1/rerank+ against a cross-encoder model. Text extraction goes through +Pikuri::FileType.read_as_text+ in pikuri-core, which handles plain text / Markdown / PDF; HTML extraction is a deferred follow-up. Hosts wire the feature via +c.add_extension Pikuri::VectorDb::Extension.new(...)+ inside the +Agent.new+ block — same opt-in shape as +pikuri-tasks+ / +pikuri-skills+. The bundled +Pikuri::VectorDb::LIBRARIAN+ persona is the privilege-separated sub-agent counterpart for hosts that want recall to flow through a child rather than the parent's context. Three model endpoints in the full setup — chat (via ruby_llm), an embedder (via +RubyLLM.embed+), and an optional reranker (HTTP +/v1/rerank+). A single +llama-server+ in router mode serves all three by default, loading each cached GGUF on demand; see the gem's README for details.
Contentful API wrapper library exposing an ActiveRecord-like interface
No description provided.