Fusion passes and fused-op decomposition for RLX MIR
Tensor IR for the RLX ML compiler — standalone, serializable, optimizable
MLX backend for RLX — Apple's array framework via a hand-rolled C++ shim; eager and lazy execution
AMD ROCm/HIP backend — same .cu kernel sources as rlx-cuda, dispatched via HIP. Mac-iterable scaffold; real HIP runtime bindings are follow-up work.
Google TPU backend — drives libtpu's PJRT plugin from Rust.