baracuda-transformer-engine
v0.0.1-alpha.63 · crates.io · Rust
Safe Rust wrapper for baracuda's port of NVIDIA TransformerEngine's FP8 cast/transpose and delayed-scaling recipe primitives. Provides `Fp8Recipe` (delayed-scaling state with amax history), `Fp8CastPlan` for {f32, f16, bf16} → FP8 with running amax, and `Fp8DequantPlan` for FP8 → {f32, f16, bf16}.
Cast/recipe subset only: `normalization` / `fused_rope` / `fused_attn` / `fused_softmax` / `activation` / `gemm` are skipped because they overlap existing baracuda phases. No cuDNN dependency, no pybind11.
On Ada (sm_89) the FP8 wins are bandwidth-saving only (KV cache, weights); FP8 tensor-core math throughput equals BF16. Forward-compatible with Hopper / Blackwell, where the compute wins also materialize.
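The delayed-scaling idea named in the description can be illustrated in plain Rust, without a GPU. The sketch below is not this crate's API: `DelayedScaling`, `amax`, and `scale_and_saturate` are hypothetical names, and it assumes the E4M3 format (largest finite value 448), a max-over-window amax policy, and no margin term. It only shows how a scale derived from past amax values is applied to the next cast; the actual rounding to an 8-bit encoding is left out.

```rust
/// Largest finite magnitude representable in FP8 E4M3.
const FP8_E4M3_MAX: f32 = 448.0;

/// Hypothetical delayed-scaling state (not the crate's `Fp8Recipe`):
/// a ring buffer of recent amax values plus the scale derived from them.
pub struct DelayedScaling {
    amax_history: Vec<f32>,
    next: usize,
    scale: f32,
}

impl DelayedScaling {
    /// Start with an empty (all-zero) history and a neutral scale of 1.0.
    pub fn new(history_len: usize) -> Self {
        assert!(history_len > 0, "history must hold at least one entry");
        Self { amax_history: vec![0.0; history_len], next: 0, scale: 1.0 }
    }

    /// Scale to use for the next cast.
    pub fn scale(&self) -> f32 {
        self.scale
    }

    /// Record the amax observed in the current step, overwriting the oldest
    /// entry, then refresh the scale from the maximum over the whole window.
    pub fn update(&mut self, amax: f32) {
        self.amax_history[self.next] = amax;
        self.next = (self.next + 1) % self.amax_history.len();
        let window_max = self.amax_history.iter().copied().fold(0.0_f32, f32::max);
        // Keep the previous scale if the window holds no usable value.
        if window_max > 0.0 && window_max.is_finite() {
            self.scale = FP8_E4M3_MAX / window_max;
        }
    }
}

/// Largest absolute value in a tensor, the "amax" tracked by the recipe.
pub fn amax(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0_f32, |m, x| m.max(x.abs()))
}

/// Multiply by the scale and saturate to the E4M3 range. A real cast would
/// additionally round each value to the nearest representable FP8 number.
pub fn scale_and_saturate(xs: &[f32], scale: f32) -> Vec<f32> {
    xs.iter()
        .map(|x| (x * scale).clamp(-FP8_E4M3_MAX, FP8_E4M3_MAX))
        .collect()
}
```

In a training or inference loop, each step would call `scale_and_saturate` with the current `scale()` and then `update` with that step's `amax`, so the scale always lags the data by one step. Dequantization is the inverse: divide the stored value by the scale that was used for the cast.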
The verdict
Niche but actively maintained.
Live from the crates.io registry · derived rules, not AI
How it scores
Maintenance: Healthy
Popularity: Niche
Security: Clean
License: Permissive
Deps: Zero deps
Maintenance
Last published this month.
Popularity
7 downloads / week
Security
No known advisories for this version (OSV).
License
Apache-2.0 OR MIT
Dependencies
No runtime dependencies
Recent releases
- 0.0.1-alpha.63 (this month)
- 0.0.1-alpha.62 (this month)
- 0.0.1-alpha.61 (this month)
- 0.0.1-alpha.60 (this month)
- 0.0.1-alpha.59 (this month)
- 0.0.1-alpha.58 (this month)