Unified inference runtime for high-performance LLM execution
High-performance key-value cache for LLM inference
Advanced quantization engine for efficient LLM inference
High-performance LLM inference engine with advanced quantization and salience-based optimization
Salience analysis engine for intelligent token prioritization in LLM inference
Shared utilities and types for Zeta Reticula components
High-performance LLM inference server with llama.cpp FFI bindings
A fast, keyboard-driven TUI for running local models via llama.cpp
An append-only key-value store written in Rust. Uses SHA-256 checksums for data integrity and cross-compiles to wasm32 for smart contract applications.
High-performance embedding vector service written in Rust
Token-optimized HTTP client for LLMs — fetches any URL as clean markdown
storey integration for CosmWasm