baracuda-flashinfer

v0.0.1-alpha.65 · crates.io · Rust

Safe, typed Rust wrappers for NVIDIA FlashInfer's inference-serving kernels: batched paged-KV attention decode, decode-time KV-cache append, cascade / prefix-cache attention-state merge, and sort-free top-K / top-P / min-P sampling. The canonical vLLM-style serving surface for the baracuda CUDA stack. Apache-2.0 (FlashInfer upstream).
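To illustrate what "attention-state merge" means in the description above, here is a minimal CPU sketch in plain Rust. It is not this crate's API: the names `AttnState`, `attend`, and `merge` are hypothetical, and the sketch assumes each partial state carries a softmax-weighted output vector plus a natural-log sum-of-exponentials over its chunk of the KV cache. Under that assumption, two partial states (for example a shared prefix and a per-request suffix) can be combined into the result that attention over the full sequence would have produced.

```rust
/// Partial attention result over one chunk of the KV cache (hypothetical type).
/// `out` is the softmax-weighted value vector for the chunk; `lse` is the
/// natural-log sum of exp(logit) over the chunk.
#[derive(Debug, Clone, PartialEq)]
pub struct AttnState {
    pub out: Vec<f32>,
    pub lse: f32,
}

/// Reference single-query attention over one chunk (CPU, unoptimized).
pub fn attend(logits: &[f32], values: &[Vec<f32>]) -> AttnState {
    assert_eq!(logits.len(), values.len());
    assert!(!logits.is_empty());
    // Subtract the max logit before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = logits.iter().map(|l| (l - max).exp()).collect();
    let denom: f32 = weights.iter().sum();
    let dim = values[0].len();
    let mut out = vec![0.0f32; dim];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w / denom * x;
        }
    }
    AttnState { out, lse: max + denom.ln() }
}

/// Merge two partial states computed over disjoint KV chunks.
/// Each output is re-weighted by its chunk's share of the total softmax mass.
pub fn merge(a: &AttnState, b: &AttnState) -> AttnState {
    assert_eq!(a.out.len(), b.out.len());
    let m = a.lse.max(b.lse);
    let wa = (a.lse - m).exp();
    let wb = (b.lse - m).exp();
    let total = wa + wb;
    let out = a
        .out
        .iter()
        .zip(&b.out)
        .map(|(x, y)| (wa * x + wb * y) / total)
        .collect();
    AttnState { out, lse: m + total.ln() }
}
```

Calling `merge(&attend(prefix_logits, prefix_values), &attend(suffix_logits, suffix_values))` should agree, up to floating-point error, with `attend` over the concatenated sequence. That property is what lets a cached prefix result be reused across requests.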

The verdict

Maintained. Niche, but actively maintained.

Live from the crates.io registry · derived rules, not AI

How it scores

Maintenance: Healthy

Popularity: Niche

Security: Clean

License: Permissive

Deps: Zero deps

Maintenance

Last published this month.

Popularity

2 downloads / week

Security

No known advisories for this version (OSV).

License

Apache-2.0 OR MIT

Dependencies

No runtime dependencies

Recent releases

0.0.1-alpha.65 · this month
0.0.1-alpha.64 · this month