First of all, thank you for the amazing work on DeepGEMM — it's been extremely helpful.
While integrating DeepGEMM into a backward pass implementation, I encountered a reproducible crash when running the k-grouped FP8 GEMM with N = 768.
❗ Error
Running DeepGEMM with the shape (groups=128, M=2048, N=768, K=4096) causes a CUDA illegal instruction error:
RuntimeError: CUDA driver error (csrc/apis/../jit_kernels/impls/sm90_fp8_gemm_1d1d.hpp:65): 715
(CUDA_ERROR_ILLEGAL_INSTRUCTION, an illegal instruction was encountered)
🔁 Reproduction
The issue reproduces consistently after adding the shape (groups=128, M=2048, N=768, K=4096) to enumerate_k_grouped_contiguous.
This triggers the following call path:
k_grouped_fp8_gemm_nt_contiguous
→ FP8 kernel selection
→ SM90 kernel dispatch
→ crash with CUDA illegal instruction
Notably, the same configuration works correctly when N = 1536, so the issue appears to be specific to N = 768.
🧩 Expected Behavior
The kernel should run successfully for (groups=128, M=2048, N=768, K=4096) without causing an illegal instruction.
🧪 Environment (if helpful)
- GPU: H100 (SM90)
- CUDA Toolkit: 12.9
- Driver Version: 535.161.08
- PyTorch version: 2.8.0
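In case it helps with triage, here is a minimal sketch for collecting the details above in one go. The helper name `collect_env` is purely illustrative and not part of DeepGEMM. It uses only standard PyTorch attributes and falls back gracefully if PyTorch or a GPU is unavailable. The driver version is not exposed here and would need to come from a tool such as `nvidia-smi`.

```python
import platform


def collect_env():
    """Gather version details that are useful in a GPU bug report.

    Illustrative helper only; not part of DeepGEMM.
    """
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; report what we can.
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    info["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        major, minor = torch.cuda.get_device_capability(0)
        info["compute_capability"] = f"{major}.{minor}"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

On an H100 the reported compute capability should be 9.0, which matches the SM90 kernel path shown in the error.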
🙏 Additional Notes
If you need further logs or want me to test a patch, I’m happy to help.
Thanks again for the excellent work on DeepGEMM!