LLM inference in Rust
High-performance key-value cache for LLM inference
Trait interface for compressed KV-cache implementations in mistral.rs
A3S Power — Privacy-preserving LLM inference for TEE environments
LLM serving runtime with Ruvector integration: paged attention, KV cache, and SONA learning
BitPolar: near-optimal vector quantization with zero training overhead — 3-bit precision, provably unbiased inner products (ICLR 2026)
Unified inference runtime for high-performance LLM execution
High-performance LLM inference engine with advanced quantization and salience-based optimization
TurboQuant KV-Cache Quantization — 3-bit compression with zero accuracy loss (Zandieh et al., ICLR 2026)
Advanced quantization engine for efficient LLM inference
Salience analysis engine for intelligent token prioritization in LLM inference
Shared utilities and types for Zeta Reticula components
No description provided.