
Wh/dcp dev 2 #16

Merged

staugust merged 2 commits into antgroup:yjh/dcp-dev from Rythsman:wh/dcp-dev-2
Nov 20, 2025

Conversation


@Rythsman Rythsman commented Nov 18, 2025

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Summary by Sourcery

Introduce DCP-awareness by adjusting memory leak detection logic and adding DCP status logs in model parallel initialization, and update attention kernels to use base-2 exp/log operations

Enhancements:

  • Adjust memory leak detection to account for DCP world size
  • Add logging of DCP enablement status and sizes during model parallel setup
  • Switch attention utility kernels from natural exp/log to base-2 exp2/log2 operations
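The exp2/log2 switch is a pure change of base, not a change of semantics. A minimal sketch (plain Python, not the actual Triton kernel) of why the two formulations are interchangeable, assuming only the standard identity exp(x) = 2^(x·log2 e):

```python
import math

# Many GPUs evaluate base-2 exp/log in fewer instructions than the natural-base
# versions; converting between bases only costs one multiply by log2(e).
LOG2_E = math.log2(math.e)  # ~1.442695

def exp_via_exp2(x: float) -> float:
    # exp(x) == 2 ** (x * log2(e))
    return 2.0 ** (x * LOG2_E)

def log_via_log2(x: float) -> float:
    # ln(x) == log2(x) / log2(e)
    return math.log2(x) / LOG2_E
```

As long as every exp/log in the kernel is converted consistently (or the scale factor is folded into a pre-multiplied input, as flash-attention-style kernels commonly do), results match the natural-base version up to floating-point rounding.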


sourcery-ai bot commented Nov 18, 2025

Reviewer's Guide

This PR enhances the scheduler’s memory leak check to account for distributed decode context parallelism (DCP), adds conditional DCP enable/disable logging in the parallel state initializer, and updates the attention kernel to use base-2 exponential and logarithm functions.

Sequence diagram for DCP-aware memory leak check in scheduler

sequenceDiagram
    participant Scheduler
    participant DCP
    Scheduler->>DCP: get_dcp_world_size()
    alt DCP world size > 1
        Scheduler->>Scheduler: real_available_size < real_total_num_tokens?
    else DCP world size == 1
        Scheduler->>Scheduler: real_available_size != real_total_num_tokens?
    end
    Scheduler->>Scheduler: If memory leak, log warning

Class diagram for updated attention kernel math functions

classDiagram
    class _correct_attn_cp_out_kernel {
        -lse
        -lse_max
        -lse_exp (now uses tl.exp2)
        -lse_acc
        -lse (now uses tl.log2)
        -factor (now uses tl.exp2)
        +output
    }

Class diagram for DCP logging in parallel state initializer

classDiagram
    class ParallelState {
        +initialize_model_parallel()
        +get_dcp_size_from_env()
        +get_tensor_model_parallel_rank()
        +logger.info() // logs DCP enabled/disabled based on dcp_size
    }

File-Level Changes

Change Details Files
Refactor memory-leak logic to support DCP world size
  • Introduce real_available_size and real_total_num_tokens variables
  • Retrieve dcp_world_size via get_dcp_world_size()
  • Use conditional comparison for memory_leak when dcp_world_size > 1
  • Preserve legacy check when dcp_world_size ≤ 1
python/sglang/srt/managers/scheduler.py
Add DCP enable/disable logging in model parallel initialization
  • Log “DCP enabled” when decode_context_model_parallel_size > 1 on rank 0
  • Log “DCP disabled” otherwise on rank 0
python/sglang/srt/distributed/parallel_state.py
Switch attention kernel to base-2 exp/log operations
  • Replace tl.exp with tl.exp2
  • Replace tl.log with tl.log2
python/sglang/srt/layers/attention/utils.py
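For context on the utils.py change: a context-parallel output-correction kernel like _correct_attn_cp_out_kernel combines each rank's partial attention output using its log-sum-exp (lse). A hypothetical NumPy reference of that reduction, written entirely in base 2 to mirror the tl.exp2/tl.log2 switch (function name and shapes are assumptions, not the kernel's actual signature):

```python
import numpy as np

def merge_cp_outputs(outs: np.ndarray, lses: np.ndarray) -> np.ndarray:
    """Merge per-rank partial attention outputs via a stable log-sum-exp.

    outs: (cp, tokens, dim) partial outputs, one slice per CP rank.
    lses: (cp, tokens) log-sum-exp values, stored in base 2.
    """
    lse_max = lses.max(axis=0)                     # (tokens,) for stability
    w = np.exp2(lses - lse_max)                    # base-2 weights, (cp, tokens)
    lse_acc = w.sum(axis=0)                        # (tokens,)
    factor = w / lse_acc                           # normalized softmax weights
    return (outs * factor[..., None]).sum(axis=0)  # (tokens, dim)
```

The normalized weights are identical whether the lse values are kept in base e or base 2, since the change of base cancels in the ratio; only consistency across the kernel matters.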


@gemini-code-assist

Summary of Changes

Hello @Rythsman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on refining the distributed computing aspects, particularly concerning Decode Context Parallel (DCP). It introduces clearer logging for DCP's operational status, adjusts core numerical computations within an attention mechanism, and updates memory leak detection to be more robust and accurate when DCP is in use. The overall aim is to improve the stability, observability, and correctness of the system under parallel execution configurations.

Highlights

  • Enhanced Logging for Decode Context Parallel (DCP): Improved logging during the initialization of model parallelism to clearly indicate whether Decode Context Parallel (DCP) is enabled or disabled, along with its size and the tensor model parallel size. This provides better visibility into the distributed setup.
  • Numerical Function Base Change in Attention Kernel: Modified the _correct_attn_cp_out_kernel function to use base-2 exponential (tl.exp2) and logarithm (tl.log2) functions instead of their natural counterparts (tl.exp, tl.log). This change likely impacts log-sum-exp calculations, potentially for numerical stability or performance optimization.
  • Refined Memory Leak Detection for DCP: Updated the check_memory function to incorporate the get_dcp_world_size and adjust the memory leak detection logic. When Decode Context Parallel (DCP) is active (dcp_world_size > 1), the memory leak condition is now real_available_size < real_total_num_tokens, accounting for potential differences in memory management with DCP enabled.
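The relaxed check can be sketched as a standalone function (names taken from the diff; the surrounding scheduler plumbing is omitted):

```python
def detect_memory_leak(available_size: int, evictable_size: int,
                       max_total_num_tokens: int, protected_size: int,
                       dcp_world_size: int) -> bool:
    # With DCP active, the token pool may legitimately report *more* free
    # tokens than expected (per the PR's TODO), so only a shortfall counts
    # as a leak; without DCP, any mismatch is a leak.
    real_available_size = available_size + evictable_size
    real_total_num_tokens = max_total_num_tokens - protected_size
    if dcp_world_size > 1:
        return real_available_size < real_total_num_tokens
    return real_available_size != real_total_num_tokens
```

Note the asymmetry: a surplus under DCP is silently tolerated, which is exactly the behavior the TODO in the diff flags for later investigation.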


@sourcery-ai sourcery-ai bot left a comment

Hey there - I've reviewed your changes and they look great!


Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces changes related to Decode Context Parallelism (DCP), including logging for its status and adjustments to memory leak detection logic. It also updates attention kernels to use base-2 logarithm functions, likely for performance. My review focuses on improving code clarity and maintainability in the new DCP-related logic.

Comment on lines +1564 to +1569
    if decode_context_model_parallel_size > 1:
        if get_tensor_model_parallel_rank() == 0:
            logger.info(f"DCP enabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
    else:
        if get_tensor_model_parallel_rank() == 0:
            logger.info(f"DCP disabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")


Severity: medium

The logging logic for DCP status can be simplified to avoid code duplication and improve readability. The check for get_tensor_model_parallel_rank() == 0 can be performed once, and a conditional expression can be used to determine the status string.

Suggested change — replace:

    if decode_context_model_parallel_size > 1:
        if get_tensor_model_parallel_rank() == 0:
            logger.info(f"DCP enabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
    else:
        if get_tensor_model_parallel_rank() == 0:
            logger.info(f"DCP disabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")

with:

    if get_tensor_model_parallel_rank() == 0:
        status = "enabled" if decode_context_model_parallel_size > 1 else "disabled"
        logger.info(f"DCP {status}, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")

Comment on lines +1663 to +1671
    real_available_size = available_size + evictable_size
    real_total_num_tokens = self.max_total_num_tokens - protected_size
    dcp_world_size = get_dcp_world_size()
    # TODO(wh): currently, enable_dcp with get more avalibale_size, check later
    token_msg = f"{self.max_total_num_tokens=}, {available_size=}, {evictable_size=}, {protected_size=}\n"
    if dcp_world_size > 1:
        memory_leak = real_available_size < real_total_num_tokens
    else:
        memory_leak = real_available_size != real_total_num_tokens


Severity: medium

The logic for memory leak detection can be refactored for better clarity. It's good practice to state the general rule first and then handle the exception. This makes the intention of the code more explicit.

Also, there's a typo in the TODO comment: avalibale_size should be available_size.

Suggested change — replace:

    real_available_size = available_size + evictable_size
    real_total_num_tokens = self.max_total_num_tokens - protected_size
    dcp_world_size = get_dcp_world_size()
    # TODO(wh): currently, enable_dcp with get more avalibale_size, check later
    token_msg = f"{self.max_total_num_tokens=}, {available_size=}, {evictable_size=}, {protected_size=}\n"
    if dcp_world_size > 1:
        memory_leak = real_available_size < real_total_num_tokens
    else:
        memory_leak = real_available_size != real_total_num_tokens

with:

    real_available_size = available_size + evictable_size
    real_total_num_tokens = self.max_total_num_tokens - protected_size
    dcp_world_size = get_dcp_world_size()
    # TODO(wh): currently, enable_dcp gets more available_size, check later
    token_msg = f"{self.max_total_num_tokens=}, {available_size=}, {evictable_size=}, {protected_size=}\n"
    memory_leak = real_available_size != real_total_num_tokens
    if memory_leak and dcp_world_size > 1 and real_available_size > real_total_num_tokens:
        # This is a known issue with DCP, not a leak.
        memory_leak = False

@staugust staugust merged commit 411e1f6 into antgroup:yjh/dcp-dev Nov 20, 2025
2 of 3 checks passed
