Streaming Rabin chunker
Rabin chunker for IPFS implementation in Rust
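A Rabin chunker cuts a byte stream wherever a rolling fingerprint of the last few bytes matches a bit pattern, so chunk boundaries follow the content rather than fixed offsets. The Python sketch below is simplified. It swaps true Rabin fingerprinting (polynomial arithmetic over GF(2)) for a Rabin-Karp-style polynomial rolling hash modulo a prime. Every parameter (`window`, `mask`, `min_size`, `max_size`) is a hypothetical illustration value, not the one IPFS uses.

```python
def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x3F,
               min_size: int = 32, max_size: int = 256,
               base: int = 257, mod: int = (1 << 61) - 1) -> list[bytes]:
    """Content-defined chunking: cut where the rolling hash of the last
    `window` bytes has all `mask` bits zero, within [min_size, max_size]."""
    out_factor = pow(base, window - 1, mod)  # weight of the byte leaving the window
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        if i >= window:
            # drop the byte leaving the window; after the update below,
            # h covers exactly data[i - window + 1 : i + 1]
            h = (h - data[i - window] * out_factor) % mod
        h = (h * base + byte) % mod
        size = i - start + 1
        # cut on a hash match (once past min_size) or when forced by max_size
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk, possibly shorter than min_size
    return chunks
```

The hash window is not reset at a cut. Apart from the size limits, a boundary therefore depends only on the preceding `window` bytes, which is why an insertion early in a file disturbs only nearby chunks.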
Chunk/split your stream without eating the splitter char.
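One way to split a stream without eating the splitter is to buffer incoming text and emit each piece with its separator still attached. The sketch below works over an iterable of string chunks. The function name `split_keep` is hypothetical and is not that package's API.

```python
from typing import Iterable, Iterator


def split_keep(chunks: Iterable[str], sep: str = "\n") -> Iterator[str]:
    """Re-split a stream of string chunks on `sep`, keeping `sep` at the end of each piece."""
    if not sep:
        raise ValueError("sep must be non-empty")
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # emit every complete piece now in the buffer, separator included;
        # buffering also catches a multi-character sep split across chunks
        while (idx := buffer.find(sep)) != -1:
            end = idx + len(sep)
            yield buffer[:end]
            buffer = buffer[end:]
    if buffer:
        yield buffer  # trailing data with no separator after it
```

Joining the output pieces reproduces the input exactly, because nothing is dropped.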
A transform stream which chunks incoming data into chunkSize-byte chunks
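Regrouping a stream into fixed-size chunks requires carrying leftover bytes from one incoming buffer to the next. The Python generator below sketches that logic. The name `rechunk` is hypothetical. Flushing a short final chunk is an assumption, because a real stream might instead drop or pad the remainder.

```python
from typing import Iterable, Iterator


def rechunk(buffers: Iterable[bytes], chunk_size: int) -> Iterator[bytes]:
    """Regroup arbitrary-sized byte buffers into chunk_size-byte chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    pending = bytearray()
    for buf in buffers:
        pending += buf
        # emit as many full chunks as the pending data allows
        while len(pending) >= chunk_size:
            yield bytes(pending[:chunk_size])
            del pending[:chunk_size]
    if pending:
        yield bytes(pending)  # remainder shorter than chunk_size
```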
A parts-of-speech chunker.
Chunk object-mode streams
Heading-aware, token-budgeted semantic chunker for Markdown — for RAG and embedding pipelines.
Mistral OCR + deterministic AST chunker for RAG pipelines
A simple chunker for breaking up structured data into chunks, suitable for RAG
Split large texts into chunks with a maximum number of tokens. Split by fixed size or by sentence.
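Sentence-based splitting under a token cap can be sketched as greedy packing: add whole sentences to the current chunk until the next one would exceed the budget. This sketch makes two simplifying assumptions. Tokens are approximated as whitespace-separated words, whereas a real splitter would likely use a model tokenizer. Sentence boundaries come from a naive regex. A single sentence longer than `max_tokens` becomes its own oversized chunk.

```python
import re


def chunk_by_sentence(text: str, max_tokens: int) -> list[str]:
    """Pack whole sentences into chunks of at most max_tokens whitespace tokens."""
    # naive sentence split: break after ., ! or ? followed by whitespace
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # close the current chunk if this sentence would overflow the budget
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```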
IELTS2GO Video Chunker - Professional video splitting tool for educational content
AskDB RAG layer: deterministic chunker over Schema v2, BYO embedder + vector store (in-memory, file-backed, pgvector), and an optional retriever wired into @askdb/core ask().
Array chunker for JavaScript.
zooid core: SessionRunner, Chunker, hooks, config parsing, and the Runtime/Adapter/Transport interfaces.
PHI-aware medical text chunker for RAG applications
Heading-aware Markdown chunking for documentation embeddings
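Heading-aware chunking splits a Markdown document at its headings and tags each chunk with the chain of headings above it, so an embedding keeps its section context. The sketch below handles only ATX (`#`-style) headings. It skips backtick-fenced code so that `#` comments inside code are not mistaken for headings. The output shape (`headings`, `text`) is hypothetical and is not the schema of any package listed here.

```python
import re

# ATX heading: 1-6 '#', whitespace, title, optional closing '#' run
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")


def chunk_markdown(md: str) -> list[dict]:
    """Split Markdown at ATX headings; each chunk records its enclosing headings."""
    chunks: list[dict] = []
    path: list[str] = []   # titles of enclosing headings, outermost first
    lines: list[str] = []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            chunks.append({"headings": list(path), "text": body})
        lines.clear()

    in_fence = False
    for line in md.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
        m = None if in_fence else HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            # pop headings at this level or deeper, then push the new one
            del path[level - 1:]
            path.append(m.group(2))
        else:
            lines.append(line)
    flush()
    return chunks
```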
Sivru search engine — gitignore-aware walker, code-aware chunker, BM25, cosine top-k, ranking signals, on-disk cache.
Tiny function to split an array into chunks, returned as an array of arrays
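Splitting an array into an array of arrays comes down to slicing at every multiple of the chunk size. A Python sketch follows; the name `chunk` is hypothetical.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk(items: Sequence[T], size: int) -> list[list[T]]:
    """Split items into consecutive lists of length `size`; the last may be shorter."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]
```

On Python 3.12 and later, `itertools.batched` does the same for any iterable but yields tuples.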
Scout Text Chunker provides text chunking strategies for RAG pipelines.
Chunk buffers using an arbitrary chunker
A NodeJS transform stream for chunking raw data into constant-size chunks. Useful for consuming raw media streams where chunk size = 1 frame.
Easy-to-understand arbitrary data chunker
Thai legal document processing — chunking, paragraph extraction, varak segmentation
n8n node that splits Markdown into retrieval-ready chunks with heading-aware metadata for RAG and vector stores
Minimalistic parallel executor
Fast text chunking for Rust
Chunking utilities for VectraDB in Rust
AST-aware code chunking and late chunking for RAG
A high-performance, deterministic, flexible and portable zero-copy streaming Content-Defined Chunking (CDC) and hashing infrastructure library. Bytes in → Chunks & hashes out
Tree-sitter AST-aware code chunking for Synwire semantic search
A collection of Content Defined Chunking algorithms
Goxoy File Chunker splits files into equal chunks
Retrieval-Augmented Generation for Rust Agent Development Kit (ADK-Rust) agents
Serialize and deserialize messages in datagrams
Iterate over the data in a `Read` type in a regular-expression-delimited way.
VIL Advanced Semantic Chunker — SIMD-optimized, zero-alloc text chunking with sentence-boundary, sliding-window, code-aware, and table strategies
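The sliding-window strategy can be sketched as overlapping fixed-size windows over a token list. This is plain Python, not the SIMD-optimized, zero-alloc implementation the description advertises. Tokens are assumed to be pre-split strings.

```python
def sliding_windows(tokens: list[str], size: int, overlap: int) -> list[list[str]]:
    """Fixed-size windows over tokens; consecutive windows share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows: list[list[str]] = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        # stop once a window reaches the end so the tail is not repeated
        if start + size >= len(tokens):
            break
    return windows
```

The overlap lets text near a window edge appear with context on both sides, at the cost of embedding some tokens twice.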
Embed arbitrary data and multiple, distinct documents within ruby files.
Salesforce client and extractor designed for handling large amounts of data
Middleware for chunking the body of a response
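This sketch assumes that chunking the body of a response means HTTP/1.1 chunked transfer coding; the description does not say so. Under that coding, each chunk is framed as its size in hexadecimal, CRLF, the data, and CRLF. The body ends with a zero-size chunk followed by an empty line. The minimal encoder below omits chunk extensions and trailers.

```python
from typing import Iterable


def encode_chunked(parts: Iterable[bytes]) -> bytes:
    """Frame body parts using HTTP/1.1 chunked transfer coding (no extensions, no trailers)."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-size chunk would signal end of body, so skip empties
        out += f"{len(part):X}\r\n".encode("ascii") + part + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk, then the blank line closing the empty trailer section
    return bytes(out)
```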
Multiple chunking strategies to split documents into optimal pieces for embedding and vector search. Supports character, recursive, sentence, markdown, HTML, code, token, and semantic splitting.
A powerful tool for RAG (Retrieval-Augmented Generation) that splits text into chunks based on semantic meaning rather than just character counts. Supports sliding windows, adaptive buffering, and dynamic percentile-based thresholding.
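Percentile-based thresholding can be illustrated as follows. Compute the cosine distance between each pair of adjacent sentence embeddings, then start a new chunk wherever that distance exceeds a chosen percentile of all the distances. The sketch assumes that embeddings are already computed by some embedder and are non-zero vectors. It leaves out sliding windows and adaptive buffering. The 90th-percentile default is an illustrative choice, not the package's.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def semantic_chunks(sentences: list[str], embeddings: list[list[float]],
                    q: float = 90.0) -> list[str]:
    """Break between adjacent sentences whose cosine distance exceeds the q-th percentile."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []
    distances = [1 - cosine(embeddings[i], embeddings[i + 1])
                 for i in range(len(sentences) - 1)]
    threshold = percentile(distances, q)
    chunks: list[str] = []
    current = [sentences[0]]
    for i, d in enumerate(distances):
        if d > threshold:  # topic shift: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentences[i + 1])
    chunks.append(" ".join(current))
    return chunks
```

Because the threshold is a percentile of the document's own distances, it adapts to each text instead of using a fixed cutoff.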
Detects topic boundaries using embedding similarity to produce semantically coherent chunks from books, articles, and documents. Supports Cohere, OpenAI, and OpenRouter embedders.
pikuri-vectordb gives a pikuri-core agent a +vectordb_search+ tool over a local document corpus: agentic search, in which the agent decides when to retrieve. It ships a swappable backend (a pure-Ruby +Backend::InMemory+ for teaching and a thin +Backend::Chroma+ HTTP client for persistence), a chunker, an embedder wrapper over +RubyLLM.embed+, and an optional +Reranker::LlamaServer+ that speaks +/v1/rerank+ against a cross-encoder model. Text extraction goes through +Pikuri::FileType.read_as_text+ in pikuri-core, which handles plain text, Markdown, and PDF; HTML extraction is a deferred follow-up. Hosts wire the feature via +c.add_extension Pikuri::VectorDb::Extension.new(...)+ inside the +Agent.new+ block, the same opt-in shape as +pikuri-tasks+ and +pikuri-skills+. The bundled +Pikuri::VectorDb::LIBRARIAN+ persona is the privilege-separated sub-agent counterpart for hosts that want recall to flow through a child rather than the parent's context. The full setup uses three model endpoints: chat (via ruby_llm), an embedder (via +RubyLLM.embed+), and an optional reranker (HTTP +/v1/rerank+). By default, a single +llama-server+ in router mode serves all three, loading each cached GGUF on demand; see the gem's README for details.