CLI for Ferrum, a Rust-native LLM inference engine
Model orchestration engine for Ferrum LLM inference
Core trait contracts for the Ferrum LLM inference engine
Unified compute kernels (CUDA/Metal/CPU) and model runner for Ferrum inference
KV cache management with PagedAttention for Ferrum inference
Model architectures (LLaMA, Qwen, BERT) for Ferrum inference
Weight-format abstraction (Dense / GPTQ / AWQ / GGUF) for Ferrum models
Sampling strategies for the Ferrum LLM inference engine
Request scheduling for the Ferrum LLM inference engine
OpenAI-compatible HTTP API server for Ferrum inference
Testing utilities for the Ferrum LLM inference engine
Tokenization wrapper for the Ferrum inference engine