Pure Rust LLM inference engine — the sovereign alternative to llama.cpp (meta crate)
Pure Rust LLM inference engine CLI — the sovereign alternative to llama.cpp
Benchmark suite for the OxiLLaMa inference engine
OpenAI-compatible HTTP API server for OxiLLaMa
WebAssembly bindings for OxiLLaMa GGUF parsing and quantization
Optional wgpu GPU compute backend for OxiLLaMa
GGUF v3 parser and tensor loader for OxiLLaMa
Python bindings for the OxiLLaMa LLM inference engine
Inference engine — KV cache, sampling, tokenizer bridge
Model architecture implementations — LLaMA, Qwen3, Mistral, Gemma, Phi
Quantization kernels for all GGUF quantization types
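
As an illustration of what the GGUF v3 parser crate listed above has to handle first, here is a minimal sketch of reading the fixed 24-byte GGUF header: the ASCII magic `GGUF`, a little-endian `u32` version, then a `u64` tensor count and a `u64` metadata key/value count. The names `GgufHeader` and `parse_gguf_header` are hypothetical and are not taken from OxiLLaMa's API. Rejecting every version other than 3 is also an assumption made for this sketch.

```rust
use std::convert::TryInto;

/// Fixed-size fields at the start of a GGUF file (all little-endian on disk).
/// Hypothetical type for illustration; not OxiLLaMa's actual API.
#[derive(Debug, PartialEq)]
pub struct GgufHeader {
    pub version: u32,
    pub tensor_count: u64,
    pub metadata_kv_count: u64,
}

/// Parse the 24-byte header: magic "GGUF", u32 version,
/// u64 tensor count, u64 metadata key/value count.
pub fn parse_gguf_header(bytes: &[u8]) -> Result<GgufHeader, String> {
    if bytes.len() < 24 {
        return Err(format!("header needs 24 bytes, got {}", bytes.len()));
    }
    if &bytes[0..4] != b"GGUF" {
        return Err("bad magic: expected \"GGUF\"".to_string());
    }
    let version = u32::from_le_bytes(bytes[4..8].try_into().unwrap());
    // Assumption for this sketch: accept only GGUF v3.
    if version != 3 {
        return Err(format!("unsupported GGUF version {}", version));
    }
    let tensor_count = u64::from_le_bytes(bytes[8..16].try_into().unwrap());
    let metadata_kv_count = u64::from_le_bytes(bytes[16..24].try_into().unwrap());
    Ok(GgufHeader {
        version,
        tensor_count,
        metadata_kv_count,
    })
}
```

The metadata key/value pairs and tensor descriptors follow the header in the file; a full loader would continue reading from byte 24 onward.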