Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
A Rust library for splitting text into overlapping chunks, designed to handle large amounts of text efficiently. The implementation is identical to LangChain's CharacterTextSplitter.
Text splitters: Character, Recursive, Markdown, HTML, Language, Token
Text splitters for LLM context windows: character, token, markdown, and recursive strategies
Text splitters/parsers for Recoco, an all-Rust fork of CocoIndex with greater flexibility.
Tree-sitter AST-aware code chunking for Synwire semantic search
The best possible text chunker and text splitter and other text tools
Rust library for parsing and segmenting source code
Document loaders, text splitters, and CachedEmbedder for atomr-agents.
RAG primitives for Cognis: embeddings, vector stores (in-memory, FAISS, Chroma, Qdrant, Pinecone, Weaviate), retrievers, text splitters, document loaders, and incremental indexing pipelines.
AST-aware code chunking and late chunking for RAG