[QUESTION] Why does fp8_mqa_logits use four MATH WGs for computation? #290

@pengwubj

Description

Why does fp8_mqa_logits use four MATH WGs (warpgroups) for computation? Typical Hopper implementations use only two MATH WGs, and so do the DeepGEMM 1d1d and 1d2d implementations.

Is this design choice related to the KV block dimension being 256? If we change the KV block size to 128, would it impact performance?
