Reviewer's Guide

This PR enhances the scheduler's memory leak check to account for decode context parallelism (DCP), adds conditional DCP enable/disable logging in the parallel state initializer, and updates the attention kernel to use base-2 exponential and logarithm functions.

Sequence diagram for the DCP-aware memory leak check in the scheduler:

```mermaid
sequenceDiagram
    participant Scheduler
    participant DCP
    Scheduler->>DCP: get_dcp_world_size()
    alt DCP world size > 1
        Scheduler->>Scheduler: real_available_size < real_total_num_tokens?
    else DCP world size == 1
        Scheduler->>Scheduler: real_available_size != real_total_num_tokens?
    end
    Scheduler->>Scheduler: If memory leak, log warning
```
Class diagram for the updated attention kernel math functions:

```mermaid
classDiagram
    class _correct_attn_cp_out_kernel {
        -lse
        -lse_max
        -lse_exp (now uses tl.exp2)
        -lse_acc
        -lse (now uses tl.log2)
        -factor (now uses tl.exp2)
        +output
    }
```
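The switch from `tl.exp`/`tl.log` to `tl.exp2`/`tl.log2` in the kernel rests on a standard change-of-base identity; the base-2 intrinsics typically map to faster hardware instructions on GPUs. A minimal plain-Python sketch of the equivalence (the kernel itself would apply the same identity via Triton's `tl.exp2`/`tl.log2`; this is an illustration, not the kernel code):

```python
import math

# Change-of-base identities behind the exp -> exp2 / log -> log2 rewrite:
#   exp(x) == 2 ** (x * log2(e))
#   log(x) == log2(x) / log2(e)
LOG2_E = math.log2(math.e)

def exp_via_exp2(x: float) -> float:
    # Base-e exponential expressed through a base-2 exponential.
    return 2.0 ** (x * LOG2_E)

def log_via_log2(x: float) -> float:
    # Natural logarithm expressed through a base-2 logarithm.
    return math.log2(x) / LOG2_E
```

Note that correctness of such a rewrite depends on multiplying by `log2(e)` (and dividing for the logarithm); omitting the scale factor silently changes the attention output.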
Class diagram for DCP logging in the parallel state initializer:

```mermaid
classDiagram
    class ParallelState {
        +initialize_model_parallel()
        +get_dcp_size_from_env()
        +get_tensor_model_parallel_rank()
        +logger.info() // logs DCP enabled/disabled based on dcp_size
    }
```
Summary of Changes

Hello @Rythsman, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request focuses on refining the distributed computing aspects, particularly concerning Decode Context Parallel (DCP). It introduces clearer logging for DCP's operational status, adjusts core numerical computations within an attention mechanism, and updates memory leak detection to be more robust and accurate when DCP is in use. The overall aim is to improve the stability, observability, and correctness of the system under parallel execution configurations.
Code Review
This pull request introduces changes related to Decode Context Parallelism (DCP), including logging for its status and adjustments to memory leak detection logic. It also updates attention kernels to use base-2 logarithm functions, likely for performance. My review focuses on improving code clarity and maintainability in the new DCP-related logic.
```python
if decode_context_model_parallel_size > 1:
    if get_tensor_model_parallel_rank() == 0:
        logger.info(f"DCP enabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
else:
    if get_tensor_model_parallel_rank() == 0:
        logger.info(f"DCP disabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
```
The logging logic for DCP status can be simplified to avoid code duplication and improve readability. The check for `get_tensor_model_parallel_rank() == 0` can be performed once, and a conditional expression can be used to determine the status string.
Suggested change:

```diff
-if decode_context_model_parallel_size > 1:
-    if get_tensor_model_parallel_rank() == 0:
-        logger.info(f"DCP enabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
-else:
-    if get_tensor_model_parallel_rank() == 0:
-        logger.info(f"DCP disabled, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
+if get_tensor_model_parallel_rank() == 0:
+    status = "enabled" if decode_context_model_parallel_size > 1 else "disabled"
+    logger.info(f"DCP {status}, dcp_size={decode_context_model_parallel_size}, tp_size={tensor_model_parallel_size}")
```
```python
real_available_size = available_size + evictable_size
real_total_num_tokens = self.max_total_num_tokens - protected_size
dcp_world_size = get_dcp_world_size()
# TODO(wh): currently, enable_dcp with get more avalibale_size, check later
token_msg = f"{self.max_total_num_tokens=}, {available_size=}, {evictable_size=}, {protected_size=}\n"
if dcp_world_size > 1:
    memory_leak = real_available_size < real_total_num_tokens
else:
    memory_leak = real_available_size != real_total_num_tokens
```
The logic for memory leak detection can be refactored for better clarity. It's good practice to state the general rule first and then handle the exception. This makes the intention of the code more explicit.
Also, there's a typo in the TODO comment: `avalibale_size` should be `available_size`.
Suggested change:

```diff
 real_available_size = available_size + evictable_size
 real_total_num_tokens = self.max_total_num_tokens - protected_size
 dcp_world_size = get_dcp_world_size()
-# TODO(wh): currently, enable_dcp with get more avalibale_size, check later
+# TODO(wh): currently, enable_dcp gets more available_size, check later
 token_msg = f"{self.max_total_num_tokens=}, {available_size=}, {evictable_size=}, {protected_size=}\n"
-if dcp_world_size > 1:
-    memory_leak = real_available_size < real_total_num_tokens
-else:
-    memory_leak = real_available_size != real_total_num_tokens
+memory_leak = real_available_size != real_total_num_tokens
+if memory_leak and dcp_world_size > 1 and real_available_size > real_total_num_tokens:
+    # This is a known issue with DCP, not a leak.
+    memory_leak = False
```
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist
Summary by Sourcery
Introduce DCP-awareness by adjusting memory leak detection logic and adding DCP status logs in model parallel initialization, and update attention kernels to use base-2 exp/log operations
Enhancements: