Advanced CUDA kernel builder for Rust with incremental builds, auto-detection, and external dependency support.
Build-time CUDA kernel compiler for the baracuda ecosystem: nvcc-driven incremental builds, parallel compilation, GPU auto-detection, and CUTLASS / custom git dependency support.
Idiomatic Rust wrappers for the NVIDIA CUDA stack (Driver API, Runtime API, NVRTC, cuBLAS, cuDNN, NCCL, NVML, ...). Umbrella crate.
Header acquisition for NVIDIA CUTLASS as a baracuda workspace dependency. Sparse-checkout fetch with file-locked caching; emits cargo:include for downstream build.rs consumers.
Compiled CUTLASS template instantiations for the baracuda ecosystem. Hosts curated .cu kernel sources, builds them via baracuda-forge, and exposes extern "C" entry points for the safe baracuda-cutlass crate.
TurboQuant KV-Cache Quantization — 3-bit compression with zero accuracy loss (Zandieh et al., ICLR 2026)