
Add n shape bias#240

Open
xueweilnvidia wants to merge 6 commits into deepseek-ai:nv_dev from xueweilnvidia:add_n_shape_bias
Open

Add n shape bias#240
xueweilnvidia wants to merge 6 commits intodeepseek-ai:nv_devfrom
xueweilnvidia:add_n_shape_bias

Conversation


@xueweilnvidia xueweilnvidia commented Dec 17, 2025

Add support for a bias with shape [n] to the FP8 GEMM on SM100. Only the BF16 output datatype is supported.

Tested with:
test_attention.py, test_bf16.py, test_fp8.py

Performance results are in the attached file:
performance.txt
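For reference, the semantics of an [n]-shaped bias in a GEMM epilogue can be sketched in NumPy. This is a minimal illustration, not the kernel itself: the function name is hypothetical, and float32 stands in for the BF16 output type (NumPy has no bfloat16), while the FP32 accumulation mirrors what an FP8 GEMM epilogue typically does before the cast.

```python
import numpy as np

def gemm_with_n_shape_bias(a, b, bias):
    """Reference semantics for an [n]-shaped bias (hypothetical helper).

    a: [m, k], b: [k, n], bias: [n]. The bias vector is broadcast across
    all m rows of the output, i.e. added once per output column. The real
    kernel accumulates in FP32 and emits BF16; float32 stands in here.
    """
    acc = a.astype(np.float32) @ b.astype(np.float32)  # FP32 accumulation
    return acc + bias.astype(np.float32)[None, :]      # broadcast add over rows

m, k, n = 4, 8, 16
rng = np.random.default_rng(0)
a = rng.standard_normal((m, k)).astype(np.float32)
b = rng.standard_normal((k, n)).astype(np.float32)
bias = rng.standard_normal(n).astype(np.float32)

out = gemm_with_n_shape_bias(a, b, bias)
assert out.shape == (m, n)
# Every row receives the same [n]-shaped bias vector.
assert np.allclose(out - a @ b, np.broadcast_to(bias, (m, n)), atol=1e-5)
```

The key point is that an [n]-shaped bias is column-wise: subtracting the plain matmul from the biased result recovers the same bias vector in every row.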

LyricZhao pushed a commit that referenced this pull request Apr 16, 2026
* Remove unnecessary cute::min

* Use add.rn.f32.bf16 for mixed-precision addition

* Code tidy-up

* Use __fdividef for kFastMath

* Revert buggy optimization
