Gemma inference engine in Rust
Senior SysAdmin, Network Admin, Data Analyst, and Software Engineer living in your terminal. A high-precision local AI agent harness for LM Studio, Ollama, and other local OpenAI-compatible runtimes that runs 100% on your own silicon. Reads repos, edits files, runs builds, inspects full network state and workstation telemetry, and runs real Python/JS for data analysis.
Efficient, Flexible and Portable Structured Generation for Rust - Rust bindings for XGrammar
LLM memory management for edge AI with small models
A simple, async Rust client for the Groq API (OpenAI-compatible)
Gemma Self-Evolving Agent — local LLM agent with persistent MemoryBrain
LLM serving runtime with Ruvector integration - Paged attention, KV cache, and SONA learning
Embedding engine for Erio
Blazingly fast tokenizer - 50x faster tokenization, 10x smaller model files, 100% accurate drop-in replacement for HuggingFace
Multi-AI Providers Library for Rust. (Ollama, OpenAI, Anthropic, Groq, Gemini, ...)
Model loading for RLX — config parsing, safetensors weights, graph builders
Embedding and reranking infrastructure with rate limiting and retry logic