
feat: support bf16 output and plain TMA writes in k_grouped_gemm on SM90 #298

Open

fedorovgv wants to merge 1 commit into deepseek-ai:main from fedorovgv:feat/k_grouped_sm90_bfp16
Conversation

@fedorovgv

This PR adds two features to the SM90 FP8 1D1D k-grouped GEMM kernel:

  • Support a plain TMA store as an alternative to atomic accumulation, selected by the presence of the c tensor
  • Support a BF16 output dtype, casting the WGMMA FP32 accumulators to BF16 before the TMA store
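The two paths above can be modeled with a short Python sketch. This is a hypothetical host-side model of the epilogue semantics, not the kernel's actual code: the function names are illustrative, and the FP32-to-BF16 cast is assumed to use round-to-nearest-even truncation to the top 16 bits.

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    # Truncate FP32 to BF16 (keep the top 16 bits) with
    # round-to-nearest-even; returns the raw 16-bit pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding = 0x7FFF + ((bits >> 16) & 1)  # nearest-even bias
    return ((bits + rounding) >> 16) & 0xFFFF

def bf16_bits_to_fp32(b: int) -> float:
    # Expanding BF16 back to FP32 is exact (zero-fill low 16 bits).
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]

def epilogue_store(accum, c=None, bf16_out=False):
    # Hypothetical model of the dispatch described in the PR:
    # - c present: accumulation path (d = c + accum), FP32 output
    # - c absent:  plain store, optionally casting accumulators to BF16
    if c is not None:
        return [ci + ai for ci, ai in zip(c, accum)]
    if bf16_out:
        return [bf16_bits_to_fp32(fp32_to_bf16_bits(a)) for a in accum]
    return list(accum)
```

For example, `epilogue_store([1/3], bf16_out=True)` returns the BF16-rounded value 0.333984375, while passing `c` keeps full FP32 accumulation.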
