Multi-layer LLM response cache with exact, semantic, and embedding lookup
A3S Power — Privacy-preserving LLM inference for TEE environments
Shared codec/profile/shape/eval traits and typed IDs for governed compression experiments
Download, inspect, and compare HuggingFace models from Rust. Supports multi-connection parallel downloads and safetensors header inspection via HTTP Range requests, so inspection downloads no weight data.
Cache-efficient tensor permutation / transpose (HPTT-inspired).
Execution and autodiff traits for TensorLogic inference engines
Core BitNet implementation with fundamental data structures and algorithms
Lightweight ONNX model parser for extracting tensor shapes, operations, and data
Custom model implementations for candle-pipelines (patches for candle-transformers)
Derive macros for the Soma computational graph runtime
Computational graph runtime for research pipelines, agent orchestration, and data virtualization
Autonomous research agent for the Soma runtime