Add Claude 4.6 prompt optimization A/B test configurations #4316

Merged: bhavyaus merged 2 commits into main from dev/bhavyau/claude46-prompt-optimization on Mar 10, 2026

@bhavyaus
Contributor

Summary

Implements the Claude 4.6 Prompt Optimization Test Plan with three A/B test configurations:

  • Control: Existing Claude46DefaultPrompt (no change)
  • Combined: Single optimized prompt for both Opus and Sonnet with moderate exploration guidance
  • Split: Separate Opus-specific (bounded exploration) and Sonnet-specific (full persistence) prompts

Background

Two problems observed in benchmarks:

  • Opus over-exploration: +78.6% read_file calls, +130.5% manage_todo_list calls vs expected
  • Sonnet token usage: Higher than expected

Changes

  • Add github.copilot.chat.anthropic.promptOptimization experiment-based setting (control/combined/split)
  • Add Claude46OptimizedBasePrompt base class with condensed shared sections
  • Add tier-specific subclasses: Claude46CombinedPrompt, Claude46OpusPrompt, Claude46SonnetPrompt
  • Add ToolSearchToolPromptOptimized (flattened, no custom search variant)
  • Add FileLinkificationInstructionsOptimized (condensed formatting rules)
  • Add AnthropicReminderInstructionsOptimized (inlined editing reminder, removed tool_search block)
  • Update AnthropicPromptResolver with isOpus() detection and optimization routing
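
The routing described in the last bullet can be sketched as follows. This is a minimal illustration and not the code from the PR: the function name `resolveClaude46Prompt`, the substring-based `isOpus` check, and the model ids used in the usage note are assumptions. Only the three setting values and the four prompt class names come from the description above.

```typescript
// Sketch only: the real AnthropicPromptResolver is not shown in this PR page.
type OptimizationMode = 'control' | 'combined' | 'split';

type Claude46PromptName =
	| 'Claude46DefaultPrompt'
	| 'Claude46CombinedPrompt'
	| 'Claude46OpusPrompt'
	| 'Claude46SonnetPrompt';

// Assumption: an Opus model can be recognized from its model id.
function isOpus(modelId: string): boolean {
	return modelId.toLowerCase().includes('opus');
}

// Maps the experiment arm and the model to the prompt class that would be used.
function resolveClaude46Prompt(mode: OptimizationMode, modelId: string): Claude46PromptName {
	switch (mode) {
		case 'combined':
			// One optimized prompt for both Opus and Sonnet.
			return 'Claude46CombinedPrompt';
		case 'split':
			// Tier-specific prompt chosen by model.
			return isOpus(modelId) ? 'Claude46OpusPrompt' : 'Claude46SonnetPrompt';
		case 'control':
		default:
			// Existing behavior, unchanged.
			return 'Claude46DefaultPrompt';
	}
}
```

For example, with a hypothetical model id, `resolveClaude46Prompt('split', 'claude-opus-4.6')` selects the Opus prompt, while any model in the `control` arm keeps `Claude46DefaultPrompt`.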

Copilot AI review requested due to automatic review settings March 10, 2026 04:46
@bhavyaus bhavyaus enabled auto-merge March 10, 2026 04:46
@vs-code-engineering
Contributor

vs-code-engineering bot commented Mar 10, 2026

📬 CODENOTIFY

The following users are being notified based on files changed in this PR:

@bryanchen-d

Matched files:

  • src/extension/prompts/node/agent/anthropicPrompts.tsx

Contributor

Copilot AI left a comment


Pull request overview

The PR implements a Claude 4.6 Prompt Optimization A/B test with three experimental configurations: control (existing behavior), combined (single condensed prompt for all Claude 4.6 models), and split (separate Opus-specific and Sonnet-specific prompts). The optimization addresses two benchmarked regressions: Opus over-exploration (+78.6% read calls, +130.5% todo calls) and higher-than-expected Sonnet token usage.

Changes:

  • Adds AnthropicPromptOptimization experiment-based configuration setting (control/combined/split)
  • Introduces Claude46OptimizedBasePrompt base class with three subclasses (Claude46CombinedPrompt, Claude46OpusPrompt, Claude46SonnetPrompt) providing tier-specific exploration guidance
  • Adds condensed variants: ToolSearchToolPromptOptimized, FileLinkificationInstructionsOptimized, and AnthropicReminderInstructionsOptimized
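
The base-class-plus-subclasses shape summarized above might look roughly like the sketch below. It uses plain classes that return strings rather than the TSX prompt elements in anthropicPrompts.tsx, and every class name, method name, and string here is a placeholder. Only the three guidance levels (moderate exploration, bounded exploration, full persistence) are taken from the PR description.

```typescript
// Sketch only: placeholder classes, not the TSX components added by the PR.
abstract class OptimizedBasePromptSketch {
	// Condensed sections shared by every optimized variant (placeholder text).
	protected sharedSections(): string[] {
		return ['<condensed shared instructions>'];
	}

	// Each tier supplies its own exploration guidance.
	protected abstract explorationGuidance(): string;

	render(): string {
		return [...this.sharedSections(), this.explorationGuidance()].join('\n');
	}
}

class CombinedPromptSketch extends OptimizedBasePromptSketch {
	protected explorationGuidance(): string {
		return 'moderate exploration';
	}
}

class OpusPromptSketch extends OptimizedBasePromptSketch {
	protected explorationGuidance(): string {
		return 'bounded exploration';
	}
}

class SonnetPromptSketch extends OptimizedBasePromptSketch {
	protected explorationGuidance(): string {
		return 'full persistence';
	}
}
```

Keeping the shared sections in the base class means the three variants differ only in the guidance they override, which is what lets the experiment isolate the effect of exploration guidance.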

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/platform/configuration/common/configurationService.ts | Adds AnthropicPromptOptimization experiment-based config key |
| src/extension/prompts/node/agent/fileLinkificationInstructions.tsx | Adds condensed FileLinkificationInstructionsOptimized variant |
| src/extension/prompts/node/agent/anthropicPrompts.tsx | Core changes: adds ToolSearchToolPromptOptimized, Claude46OptimizedBasePrompt hierarchy, AnthropicReminderInstructionsOptimized, and routing logic in AnthropicPromptResolver |
| package.nls.json | Adds NLS string for the new setting |
| package.json | Registers the new github.copilot.chat.anthropic.promptOptimization VS Code setting |
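
Because the new setting accepts exactly three values, code that reads it may want to validate the raw value before routing on it. The helper below is a hypothetical sketch: `parsePromptOptimization` is not a function from this PR, and falling back to `control` for unknown values is an assumption chosen so that a malformed setting preserves existing behavior.

```typescript
// Sketch only: hypothetical validation helper for the setting's three values.
const PROMPT_OPTIMIZATION_VALUES = ['control', 'combined', 'split'] as const;
type PromptOptimizationValue = typeof PROMPT_OPTIMIZATION_VALUES[number];

// Returns the value if it is one of the known arms, otherwise 'control'.
// Assumption: unknown or missing values should behave like the control arm.
function parsePromptOptimization(raw: unknown): PromptOptimizationValue {
	const match = PROMPT_OPTIMIZATION_VALUES.find(value => value === raw);
	return match === undefined ? 'control' : match;
}
```
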
Comments suppressed due to low confidence (1)

src/extension/prompts/node/agent/anthropicPrompts.tsx:525

  • ToolSearchToolPromptOptimized always instructs the model to call TOOL_SEARCH_TOOL_NAME (the server-side regex tool tool_search_tool_regex), but when AnthropicToolSearchMode is set to 'client', the server-side tool is not added to the tool list (see messagesApi.ts line 110), and only CUSTOM_TOOL_SEARCH_NAME (tool_search) is available instead. In that configuration, the optimized prompt would tell the model to call tool_search_tool_regex, which won't be found, causing tool search to fail silently. This issue only manifests when both AnthropicPromptOptimization ≠ 'control' and AnthropicToolSearchMode = 'client' are active simultaneously. If these two settings are always mutually exclusive in experiments, this is safe, but it should be documented or guarded.
		return <Tag name='toolSearchInstructions'>
			You MUST use {TOOL_SEARCH_TOOL_NAME} to load deferred tools BEFORE calling them. Calling a deferred tool without loading it first will fail.<br />
			<br />
			Construct regex patterns using Python re.search() syntax:<br />
			- `^mcp_github_` matches tools starting with "mcp_github_"<br />
			- `issue|pull_request` matches tools containing "issue" OR "pull_request"<br />
			- `create.*branch` matches tools with "create" followed by "branch"<br />
			<br />
			The pattern matches case-insensitively against tool names, descriptions, argument names, and argument descriptions.<br />
			<br />
			Do NOT call {TOOL_SEARCH_TOOL_NAME} again for a tool already returned by a previous search. If a search returns no matching tools, the tool is not available. Do not retry with different patterns.<br />
			<br />
			Available deferred tools (must be loaded before use):<br />
			{deferredTools.join('\n')}
		</Tag>;
	}
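
One way to add the guard suggested in the suppressed comment is to choose the tool name from the active search mode instead of hard-coding the server-side name. The sketch below is an assumption-laden illustration and not the fix that was merged: the `'server'` mode value and both function names are hypothetical. The two tool names and the `'client'` mode are the ones quoted in the comment.

```typescript
// Tool names as quoted in the review comment.
const TOOL_SEARCH_TOOL_NAME = 'tool_search_tool_regex';
const CUSTOM_TOOL_SEARCH_NAME = 'tool_search';

// Assumption: the mode has a 'server' counterpart to the 'client' value named in the review.
type ToolSearchMode = 'server' | 'client';

// Picks the tool name that is actually present in the tool list for this mode.
function toolSearchNameFor(mode: ToolSearchMode): string {
	return mode === 'client' ? CUSTOM_TOOL_SEARCH_NAME : TOOL_SEARCH_TOOL_NAME;
}

// Builds the first instruction line with the mode-appropriate tool name.
function buildLoadInstruction(mode: ToolSearchMode, deferredTools: string[]): string {
	const toolName = toolSearchNameFor(mode);
	return [
		`You MUST use ${toolName} to load deferred tools BEFORE calling them.`,
		'Available deferred tools (must be loaded before use):',
		...deferredTools,
	].join('\n');
}
```

With this shape, the optimized prompt never names a tool that the current mode leaves out of the tool list, so the two settings no longer need to be kept mutually exclusive.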

Comment thread src/extension/prompts/node/agent/anthropicPrompts.tsx
Comment thread src/extension/prompts/node/agent/anthropicPrompts.tsx
Comment thread src/extension/prompts/node/agent/anthropicPrompts.tsx Outdated
DonJayamanne previously approved these changes Mar 10, 2026
Implement three-way prompt optimization experiment for Claude 4.6 models:
- Control: existing Claude46DefaultPrompt (no change)
- Combined: single optimized prompt for both Opus and Sonnet with moderate exploration guidance
- Split: separate Opus-specific (bounded exploration) and Sonnet-specific (full persistence) prompts
@bhavyaus bhavyaus force-pushed the dev/bhavyau/claude46-prompt-optimization branch from ab3018b to ac97070 Compare March 10, 2026 15:49
@bhavyaus bhavyaus requested a review from DonJayamanne March 10, 2026 15:49
@bhavyaus bhavyaus added this pull request to the merge queue Mar 10, 2026
Merged via the queue into main with commit 8b7c6ec Mar 10, 2026
19 checks passed
@bhavyaus bhavyaus deleted the dev/bhavyau/claude46-prompt-optimization branch March 10, 2026 16:32

4 participants