Idiomatic Rust wrappers for the NVIDIA CUDA stack (Driver API, Runtime API, NVRTC, cuBLAS, cuDNN, NCCL, NVML, ...). Umbrella crate.
Simple BLAS [sd]gemm benchmark
OxiCUDA BLAS - GPU-accelerated BLAS operations (cuBLAS equivalent)
High-performance tropical matrix multiplication with SIMD and CUDA backends
CUDA backend for tropical matrix multiplication
c32 (complex f32) matrix multiplication for qlora-gemm - maintained fork using qlora-paste
c64 (complex f64) matrix multiplication for qlora-gemm - maintained fork using qlora-paste
Common utilities for qlora-gemm matrix multiplication - maintained fork using qlora-paste
f16 matrix multiplication for qlora-gemm - maintained fork using qlora-paste
f32 matrix multiplication for qlora-gemm - maintained fork using qlora-paste
f64 matrix multiplication for qlora-gemm - maintained fork using qlora-paste
Compile-time GEMM microkernel code generation for trueno (sovereign, no external BLAS)
Quickly generate a Gemfile entry for a gem.
Simple rake tasks for gems
Ignis is the foundation of a CUDA-backed deep-learning ecosystem for Ruby that actually targets native Windows. It provides a GPU n-dimensional array (Ignis::NDArray), CUDA memory/device management, a runtime kernel compiler (NVRTC) with a batteries-included kernel library, fp16/bf16 conversion, and cuBLAS GEMM. Kernels are compiled at runtime and libraries are bound via FFI; there are NO C extensions, so installation needs no compiler or devkit (the usual Windows native-gem killer). Requires an NVIDIA GPU and the CUDA toolkit/runtime.