Add Claude 4.6 prompt optimization A/B test configurations #4316
Merged
Contributor
📬 CODENOTIFY: The following users are being notified based on files changed in this PR: @bryanchen-d
Contributor
Pull request overview
The PR implements a Claude 4.6 Prompt Optimization A/B test with three experimental configurations: control (existing behavior), combined (single condensed prompt for all Claude 4.6 models), and split (separate Opus-specific and Sonnet-specific prompts). The optimization addresses two benchmarked regressions: Opus over-exploration (+78.6% read calls, +130.5% todo calls) and higher-than-expected Sonnet token usage.
Changes:
- Adds `AnthropicPromptOptimization` experiment-based configuration setting (control/combined/split)
- Introduces `Claude46OptimizedBasePrompt` base class with three subclasses (`Claude46CombinedPrompt`, `Claude46OpusPrompt`, `Claude46SonnetPrompt`) providing tier-specific exploration guidance
- Adds condensed variants: `ToolSearchToolPromptOptimized`, `FileLinkificationInstructionsOptimized`, and `AnthropicReminderInstructionsOptimized`
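The base-class/subclass split described in the list above can be pictured with a minimal sketch. This is not the PR's implementation: the real classes are TSX prompt components, while the plain classes, the `sharedSections` and `explorationGuidance` methods, and all placeholder strings below are assumptions made for illustration only.

```typescript
// Hypothetical sketch: plain classes standing in for the TSX prompt components.
abstract class Claude46OptimizedBasePrompt {
  // Condensed sections shared by every optimized variant (placeholder text).
  protected sharedSections(): string[] {
    return ['<condensed shared sections>'];
  }

  // Each subclass supplies its own tier-specific exploration guidance.
  protected abstract explorationGuidance(): string;

  render(): string {
    return this.sharedSections().concat(this.explorationGuidance()).join('\n');
  }
}

class Claude46CombinedPrompt extends Claude46OptimizedBasePrompt {
  protected explorationGuidance(): string {
    return '<moderate exploration guidance for all Claude 4.6 models>';
  }
}

class Claude46OpusPrompt extends Claude46OptimizedBasePrompt {
  protected explorationGuidance(): string {
    return '<bounded exploration guidance for Opus>';
  }
}

class Claude46SonnetPrompt extends Claude46OptimizedBasePrompt {
  protected explorationGuidance(): string {
    return '<full persistence guidance for Sonnet>';
  }
}
```

Keeping the shared sections in the base class means the three variants differ only in the exploration guidance, which is the variable the experiment is meant to isolate.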
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/platform/configuration/common/configurationService.ts` | Adds `AnthropicPromptOptimization` experiment-based config key |
| `src/extension/prompts/node/agent/fileLinkificationInstructions.tsx` | Adds condensed `FileLinkificationInstructionsOptimized` variant |
| `src/extension/prompts/node/agent/anthropicPrompts.tsx` | Core changes: adds `ToolSearchToolPromptOptimized`, `Claude46OptimizedBasePrompt` hierarchy, `AnthropicReminderInstructionsOptimized`, and routing logic in `AnthropicPromptResolver` |
| `package.nls.json` | Adds NLS string for the new setting |
| `package.json` | Registers the new `github.copilot.chat.anthropic.promptOptimization` VS Code setting |
Comments suppressed due to low confidence (1)
src/extension/prompts/node/agent/anthropicPrompts.tsx:525
`ToolSearchToolPromptOptimized` always instructs the model to call `TOOL_SEARCH_TOOL_NAME` (the server-side regex tool `tool_search_tool_regex`), but when `AnthropicToolSearchMode` is set to `'client'`, the server-side tool is not added to the tool list (see `messagesApi.ts` line 110), and only `CUSTOM_TOOL_SEARCH_NAME` (`tool_search`) is available instead. In that configuration, the optimized prompt would tell the model to call `tool_search_tool_regex`, which won't be found, causing tool search to fail silently. This issue only manifests when both `AnthropicPromptOptimization ≠ 'control'` and `AnthropicToolSearchMode = 'client'` are active simultaneously. If these two settings are always mutually exclusive in experiments, this is safe, but it should be documented or guarded.
return <Tag name='toolSearchInstructions'>
You MUST use {TOOL_SEARCH_TOOL_NAME} to load deferred tools BEFORE calling them. Calling a deferred tool without loading it first will fail.<br />
<br />
Construct regex patterns using Python re.search() syntax:<br />
- `^mcp_github_` matches tools starting with "mcp_github_"<br />
- `issue|pull_request` matches tools containing "issue" OR "pull_request"<br />
- `create.*branch` matches tools with "create" followed by "branch"<br />
<br />
The pattern matches case-insensitively against tool names, descriptions, argument names, and argument descriptions.<br />
<br />
Do NOT call {TOOL_SEARCH_TOOL_NAME} again for a tool already returned by a previous search. If a search returns no matching tools, the tool is not available. Do not retry with different patterns.<br />
<br />
Available deferred tools (must be loaded before use):<br />
{deferredTools.join('\n')}
</Tag>;
}
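One possible guard for the mismatch raised in this comment is to derive the tool name from the active search mode instead of hard-coding the server-side name. The sketch below is an assumption, not code from the PR: the `ToolSearchMode` type, its `'server'` literal, and both helper functions are hypothetical, and only the two tool-name values are taken from the review comment.

```typescript
// Hypothetical mode type; the comment only confirms that a 'client' mode exists.
type ToolSearchMode = 'server' | 'client';

// Tool names as quoted in the review comment.
const TOOL_SEARCH_TOOL_NAME = 'tool_search_tool_regex'; // server-side regex tool
const CUSTOM_TOOL_SEARCH_NAME = 'tool_search'; // client-side tool

// Pick the tool that is actually present in the tool list for this mode.
function toolSearchNameForMode(mode: ToolSearchMode): string {
  return mode === 'client' ? CUSTOM_TOOL_SEARCH_NAME : TOOL_SEARCH_TOOL_NAME;
}

// Build the first instruction line with the mode-appropriate tool name.
function toolSearchInstruction(mode: ToolSearchMode): string {
  const name = toolSearchNameForMode(mode);
  return `You MUST use ${name} to load deferred tools BEFORE calling them.`;
}
```

The regex-pattern guidance in the snippet above describes the server-side tool, so a real guard would probably also need to skip or replace that section in client mode.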
DonJayamanne
previously approved these changes
Mar 10, 2026
Implement three-way prompt optimization experiment for Claude 4.6 models:
- Control: existing Claude46DefaultPrompt (no change)
- Combined: single optimized prompt for both Opus and Sonnet with moderate exploration guidance
- Split: separate Opus-specific (bounded exploration) and Sonnet-specific (full persistence) prompts
…onditional rendering for tool instructions
ab3018b to ac97070
anthonykim1
approved these changes
Mar 10, 2026
Summary
Implements the Claude 4.6 Prompt Optimization Test Plan with three A/B test configurations:
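- Control: existing `Claude46DefaultPrompt` (no change)
- Combined: single optimized prompt for both Opus and Sonnet with moderate exploration guidance
- Split: separate Opus-specific (bounded exploration) and Sonnet-specific (full persistence) prompts
Background
Two problems observed in benchmarks:
- Opus over-exploration (+78.6% read calls, +130.5% todo calls)
- Higher-than-expected Sonnet token usage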
Changes
- `github.copilot.chat.anthropic.promptOptimization` experiment-based setting (control/combined/split)
- `Claude46OptimizedBasePrompt` base class with condensed shared sections
- `Claude46CombinedPrompt`, `Claude46OpusPrompt`, `Claude46SonnetPrompt`
- `ToolSearchToolPromptOptimized` (flattened, no custom search variant)
- `FileLinkificationInstructionsOptimized` (condensed formatting rules)
- `AnthropicReminderInstructionsOptimized` (inlined editing reminder, removed tool_search block)
- `AnthropicPromptResolver` with `isOpus()` detection and optimization routing
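The routing in the last item might look roughly like the sketch below. It is an illustration under stated assumptions: the `resolveClaude46Prompt` function, the string-based return type, the substring check inside `isOpus`, and the model ids in the usage lines are all hypothetical, and the actual `AnthropicPromptResolver` may detect Opus and select prompts differently.

```typescript
type PromptOptimization = 'control' | 'combined' | 'split';

type Claude46PromptName =
  | 'Claude46DefaultPrompt'
  | 'Claude46CombinedPrompt'
  | 'Claude46OpusPrompt'
  | 'Claude46SonnetPrompt';

// Hypothetical detection: assumes the model id contains "opus".
function isOpus(modelId: string): boolean {
  return modelId.toLowerCase().indexOf('opus') !== -1;
}

// Map the experiment arm (and, for 'split', the model tier) to a prompt class name.
function resolveClaude46Prompt(mode: PromptOptimization, modelId: string): Claude46PromptName {
  switch (mode) {
    case 'combined':
      return 'Claude46CombinedPrompt';
    case 'split':
      return isOpus(modelId) ? 'Claude46OpusPrompt' : 'Claude46SonnetPrompt';
    default:
      // 'control' keeps the existing behavior.
      return 'Claude46DefaultPrompt';
  }
}

// Usage with made-up model ids:
const splitOpus = resolveClaude46Prompt('split', 'claude-opus-4.6');
const splitSonnet = resolveClaude46Prompt('split', 'claude-sonnet-4.6');
```

Only the `split` arm consults the model tier; `control` and `combined` return the same prompt for both Opus and Sonnet.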