
Add n shape bias#240

Open
xueweilnvidia wants to merge 6 commits into deepseek-ai:nv_dev from xueweilnvidia:add_n_shape_bias
Open

Add n shape bias#240
xueweilnvidia wants to merge 6 commits intodeepseek-ai:nv_devfrom
xueweilnvidia:add_n_shape_bias

Conversation


@xueweilnvidia xueweilnvidia commented Dec 17, 2025

Add support for a bias with shape [n] to the FP8 GEMM on SM100. Only the BF16 output datatype is supported.

Tested with:
test_attention.py, test_bf16.py, test_fp8.py

Performance results are in the attached file:
performance.txt
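For reference, the semantics of an [n]-shaped bias in a GEMM epilogue can be sketched in NumPy. This is a minimal illustration, not the kernel itself: the function name is hypothetical, and float32 stands in for the BF16 output type (NumPy has no bfloat16), while the FP32 accumulation mirrors what an FP8 GEMM epilogue typically does before the cast.

```python
import numpy as np

def gemm_with_n_shape_bias(a, b, bias):
    """Reference semantics for an [n]-shaped bias (hypothetical helper).

    a: [m, k], b: [k, n], bias: [n]. The bias vector is broadcast across
    all m rows of the output, i.e. added once per output column. The real
    kernel accumulates in FP32 and emits BF16; float32 stands in here.
    """
    acc = a.astype(np.float32) @ b.astype(np.float32)  # FP32 accumulation
    return acc + bias.astype(np.float32)[None, :]      # broadcast add over rows

m, k, n = 4, 8, 16
rng = np.random.default_rng(0)
a = rng.standard_normal((m, k)).astype(np.float32)
b = rng.standard_normal((k, n)).astype(np.float32)
bias = rng.standard_normal(n).astype(np.float32)

out = gemm_with_n_shape_bias(a, b, bias)
assert out.shape == (m, n)
# Every row receives the same [n]-shaped bias vector.
assert np.allclose(out - a @ b, np.broadcast_to(bias, (m, n)), atol=1e-5)
```

The key point is that an [n]-shaped bias is column-wise: subtracting the plain matmul from the biased result recovers the same bias vector in every row.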

LyricZhao pushed a commit that referenced this pull request Apr 16, 2026
* Remove unnecessary cute::min

* Use add.rn.f32.bf16 for mixed-precision addition

* Code tidy-up

* Use __fdividef for kFastMath

* Revert buggy optimization
