[Proxy mirror] Add Claude 4.6 prompt optimization A/B test configurations by bhavyaus · Pull Request #4316 · microsoft/vscode-copilot-chat

bhavyaus · 2026-03-10T04:46:37Z

Summary

Implements the Claude 4.6 Prompt Optimization Test Plan with three A/B test configurations:

Control: Existing Claude46DefaultPrompt (no change)
Combined: Single optimized prompt for both Opus and Sonnet with moderate exploration guidance
Split: Separate Opus-specific (bounded exploration) and Sonnet-specific (full persistence) prompts

Background

Two problems observed in benchmarks:

Opus over-exploration: +78.6% read_file calls, +130.5% manage_todo_list calls vs expected
Sonnet token usage: Higher than expected

Changes

Add github.copilot.chat.anthropic.promptOptimization experiment-based setting (control/combined/split)
Add Claude46OptimizedBasePrompt base class with condensed shared sections
Add tier-specific subclasses: Claude46CombinedPrompt, Claude46OpusPrompt, Claude46SonnetPrompt
Add ToolSearchToolPromptOptimized (flattened, no custom search variant)
Add FileLinkificationInstructionsOptimized (condensed formatting rules)
Add AnthropicReminderInstructionsOptimized (inlined editing reminder, removed tool_search block)
Update AnthropicPromptResolver with isOpus() detection and optimization routing
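
The routing described in the last item can be pictured roughly as follows. This is a minimal sketch, not the code from the PR: the three setting values and the four prompt class names come from the description above, while `resolveClaude46Prompt`, the `modelFamily` parameter, and the substring check inside `isOpus` are assumptions made for illustration.

```typescript
// Hypothetical sketch of the experiment routing. Only the setting values and
// the prompt class names are taken from the PR description.
type PromptOptimization = 'control' | 'combined' | 'split';

type PromptName =
	| 'Claude46DefaultPrompt'
	| 'Claude46CombinedPrompt'
	| 'Claude46OpusPrompt'
	| 'Claude46SonnetPrompt';

// Assumed detection rule: the model family string contains "opus".
function isOpus(modelFamily: string): boolean {
	return modelFamily.toLowerCase().indexOf('opus') !== -1;
}

// Picks a prompt name for a Claude 4.6 model given the experiment arm.
function resolveClaude46Prompt(mode: PromptOptimization, modelFamily: string): PromptName {
	switch (mode) {
		case 'combined':
			// One optimized prompt shared by Opus and Sonnet.
			return 'Claude46CombinedPrompt';
		case 'split':
			// Tier-specific prompts.
			return isOpus(modelFamily) ? 'Claude46OpusPrompt' : 'Claude46SonnetPrompt';
		case 'control':
		default:
			// Existing behavior, unchanged.
			return 'Claude46DefaultPrompt';
	}
}
```

Under these assumptions, only the split arm depends on the model tier; control and combined return the same prompt for Opus and Sonnet.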

vs-code-engineering · 2026-03-10T04:47:03Z

📬 CODENOTIFY

The following users are being notified based on files changed in this PR:

@bryanchen-d

Matched files:

src/extension/prompts/node/agent/anthropicPrompts.tsx

Copilot

Pull request overview

The PR implements a Claude 4.6 Prompt Optimization A/B test with three experimental configurations: control (existing behavior), combined (single condensed prompt for all Claude 4.6 models), and split (separate Opus-specific and Sonnet-specific prompts). The optimization addresses two benchmarked regressions: Opus over-exploration (+78.6% read calls, +130.5% todo calls) and higher-than-expected Sonnet token usage.

Changes:

Adds AnthropicPromptOptimization experiment-based configuration setting (control/combined/split)
Introduces Claude46OptimizedBasePrompt base class with three subclasses (Claude46CombinedPrompt, Claude46OpusPrompt, Claude46SonnetPrompt) providing tier-specific exploration guidance
Adds condensed variants: ToolSearchToolPromptOptimized, FileLinkificationInstructionsOptimized, and AnthropicReminderInstructionsOptimized
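
The base-class-plus-subclasses shape can be illustrated with plain TypeScript classes. This is only a sketch of the pattern: the real prompts are TSX components in `anthropicPrompts.tsx`, and every class name, method name, and placeholder string below is hypothetical. Only the three guidance labels (moderate exploration, bounded exploration, full persistence) come from the PR summary.

```typescript
// Hypothetical sketch: a shared base with one overridable guidance section.
abstract class OptimizedBasePromptSketch {
	// Condensed sections shared by every tier (placeholder content).
	protected sharedSections(): string[] {
		return ['<shared condensed instructions>'];
	}

	// Each tier supplies its own exploration guidance.
	protected abstract explorationGuidance(): string;

	render(): string {
		return this.sharedSections().concat(this.explorationGuidance()).join('\n');
	}
}

class CombinedPromptSketch extends OptimizedBasePromptSketch {
	protected explorationGuidance(): string {
		return '<moderate exploration guidance>';
	}
}

class OpusPromptSketch extends OptimizedBasePromptSketch {
	protected explorationGuidance(): string {
		return '<bounded exploration guidance>';
	}
}

class SonnetPromptSketch extends OptimizedBasePromptSketch {
	protected explorationGuidance(): string {
		return '<full persistence guidance>';
	}
}
```

With this layout, the three variants differ only in the guidance section, so a change to the shared sections reaches all experiment arms at once.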

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `src/platform/configuration/common/configurationService.ts` | Adds `AnthropicPromptOptimization` experiment-based config key |
| `src/extension/prompts/node/agent/fileLinkificationInstructions.tsx` | Adds condensed `FileLinkificationInstructionsOptimized` variant |
| `src/extension/prompts/node/agent/anthropicPrompts.tsx` | Core changes: adds `ToolSearchToolPromptOptimized`, `Claude46OptimizedBasePrompt` hierarchy, `AnthropicReminderInstructionsOptimized`, and routing logic in `AnthropicPromptResolver` |
| `package.nls.json` | Adds NLS string for the new setting |
| `package.json` | Registers the new `github.copilot.chat.anthropic.promptOptimization` VS Code setting |

Comments suppressed due to low confidence (1)

src/extension/prompts/node/agent/anthropicPrompts.tsx:525

ToolSearchToolPromptOptimized always instructs the model to call TOOL_SEARCH_TOOL_NAME (the server-side regex tool tool_search_tool_regex), but when AnthropicToolSearchMode is set to 'client', the server-side tool is not added to the tool list (see messagesApi.ts line 110), and only CUSTOM_TOOL_SEARCH_NAME (tool_search) is available instead. In that configuration, the optimized prompt would tell the model to call tool_search_tool_regex, which won't be found, causing tool search to fail silently. This issue only manifests when both AnthropicPromptOptimization ≠ 'control' and AnthropicToolSearchMode = 'client' are active simultaneously. If these two settings are always mutually exclusive in experiments, this is safe, but it should be documented or guarded.

		return <Tag name='toolSearchInstructions'>
			You MUST use {TOOL_SEARCH_TOOL_NAME} to load deferred tools BEFORE calling them. Calling a deferred tool without loading it first will fail.<br />
			<br />
			Construct regex patterns using Python re.search() syntax:<br />
			- `^mcp_github_` matches tools starting with "mcp_github_"<br />
			- `issue|pull_request` matches tools containing "issue" OR "pull_request"<br />
			- `create.*branch` matches tools with "create" followed by "branch"<br />
			<br />
			The pattern matches case-insensitively against tool names, descriptions, argument names, and argument descriptions.<br />
			<br />
			Do NOT call {TOOL_SEARCH_TOOL_NAME} again for a tool already returned by a previous search. If a search returns no matching tools, the tool is not available. Do not retry with different patterns.<br />
			<br />
			Available deferred tools (must be loaded before use):<br />
			{deferredTools.join('\n')}
		</Tag>;
	}
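
One way to guard against the mismatch described in this comment is to choose the tool name from the active search mode and check that it is actually in the tool list before rendering the instructions. The sketch below assumes this approach and is not the fix adopted in the PR. The two tool name values are taken from the comment, while the `'server'` mode value, `toolSearchNameFor`, and its parameters are hypothetical.

```typescript
// Hypothetical guard. Tool name values come from the review comment above.
const TOOL_SEARCH_TOOL_NAME = 'tool_search_tool_regex'; // server-side regex tool
const CUSTOM_TOOL_SEARCH_NAME = 'tool_search'; // client-side tool

// 'client' appears in the comment; 'server' is an assumed counterpart.
type ToolSearchMode = 'server' | 'client';

// Returns the search tool the prompt should mention, or undefined when that
// tool is not in the list sent to the model (in which case the instructions
// should be skipped rather than point at a missing tool).
function toolSearchNameFor(mode: ToolSearchMode, availableTools: readonly string[]): string | undefined {
	const name = mode === 'client' ? CUSTOM_TOOL_SEARCH_NAME : TOOL_SEARCH_TOOL_NAME;
	return availableTools.indexOf(name) !== -1 ? name : undefined;
}
```

A caller would render the tool search instructions only when the result is defined, which avoids telling the model to call a tool it cannot find.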

Implement three-way prompt optimization experiment for Claude 4.6 models:
- Control: existing Claude46DefaultPrompt (no change)
- Combined: single optimized prompt for both Opus and Sonnet with moderate exploration guidance
- Split: separate Opus-specific (bounded exploration) and Sonnet-specific (full persistence) prompts

Copilot AI review requested due to automatic review settings March 10, 2026 04:46

bhavyaus enabled auto-merge March 10, 2026 04:46

Copilot started reviewing on behalf of bhavyaus March 10, 2026 04:48

vs-code-engineering bot assigned bhavyaus Mar 10, 2026

vs-code-engineering bot added this to the 1.112.0 milestone Mar 10, 2026

Copilot AI reviewed Mar 10, 2026

Comment thread src/extension/prompts/node/agent/anthropicPrompts.tsx

Comment thread src/extension/prompts/node/agent/anthropicPrompts.tsx

Comment thread src/extension/prompts/node/agent/anthropicPrompts.tsx Outdated

DonJayamanne previously approved these changes Mar 10, 2026

bhavyaus added 2 commits March 10, 2026 08:48

Optimize Claude 4.6 prompt configurations with type adjustments and c…

ac97070

…onditional rendering for tool instructions

bhavyaus dismissed DonJayamanne’s stale review via ac97070 March 10, 2026 15:49

bhavyaus force-pushed the dev/bhavyau/claude46-prompt-optimization branch from ab3018b to ac97070 March 10, 2026 15:49

bhavyaus requested a review from DonJayamanne March 10, 2026 15:49

anthonykim1 approved these changes Mar 10, 2026

bhavyaus added this pull request to the merge queue Mar 10, 2026

Merged via the queue into main with commit 8b7c6ec Mar 10, 2026
19 checks passed

bhavyaus deleted the dev/bhavyau/claude46-prompt-optimization branch March 10, 2026 16:32

bhavyaus merged 2 commits into main from dev/bhavyau/claude46-prompt-optimization


4 participants
