
Commit b3e2aa3

feat: add OTel events and metrics for agentic edit quality signals (#4794)
* docs: add OTel backfill plan for agentic change metrics
* docs: add Claude Code OTel parity analysis with feasibility + line estimates
* docs: expand plan to cover all agentic surfaces (inline chat, CLI, cloud, NES)
* docs: remove NES and Claude Code comparison, keep plan lean
* docs: add 3-pillar signal type mapping (metrics/events/traces)
* docs: re-audit signals — counters when easy, events only with useful attrs
* feat: add OTel event emitters for agentic edit quality metrics
* feat: add OTel counters and histograms for agentic edit quality metrics
* feat: wire OTel events/metrics into userActions.ts for all agentic user actions
* feat: wire OTel survival events into apply_patch, replace_string, and code_mapper tools
* feat: wire OTel counters for agent summarization and edit response metrics
* fix: resolve TypeScript errors — thread IOTelService through intent class hierarchy
* docs: update sprint plan with completion notes
* style: fix import ordering from editor auto-sort
* feat: wire OTel counters for cloud session invoke, PR ready, and CLI PR creation
* docs: consolidate OTel edit quality metrics into agent_monitoring.md
* docs: rename Edit Quality to Agent Activity & Outcome
* docs: align Edit Quality references to Agent Activity naming
* refactor: adopt Harald's type-safe metrics API (EditSource/EditOutcome, 2 survival histograms)
1 parent 74475c3 commit b3e2aa3

15 files changed: +387 -33 lines changed

docs/monitoring/agent_monitoring.md

Lines changed: 117 additions & 2 deletions
@@ -224,6 +224,43 @@ invoke_agent copilot [~15s]

 **`copilot_chat.time_to_first_token` attributes:** `gen_ai.request.model`

+#### Agent Activity & Outcome Metrics
+
+These metrics track the activity and outcomes of agentic code changes across all surfaces (agent mode, inline chat, background CLI, cloud sessions).
+
+| Metric | Type | Unit | Description |
+|---|---|---|---|
+| `copilot_chat.edit.accept.count` | Counter | edits | File-level and inline edit accept/reject decisions |
+| `copilot_chat.edit.hunk.count` | Counter | hunks | Hunk-level accept/reject decisions |
+| `copilot_chat.lines_of_code.count` | Counter | lines | Lines of code added/removed by accepted agent edits |
+| `copilot_chat.edit.survival_rate` | Histogram | ratio (0-1) | How much AI-edited code survives over time (4-gram similarity) |
+| `copilot_chat.user.action.count` | Counter | actions | User engagement: copy, insert, apply, followup |
+| `copilot_chat.user.feedback.count` | Counter | votes | Thumbs up/down on chat responses |
+| `copilot_chat.agent.edit_response.count` | Counter | responses | Agent edit responses by success/error |
+| `copilot_chat.agent.summarization.count` | Counter | events | Context summarization outcomes (applied/failed) |
+| `copilot_chat.pull_request.count` | Counter | PRs | Pull requests created via CLI agent |
+| `copilot_chat.commit.count` | Counter | commits | Git commits created via agent |
+| `copilot_chat.cloud.session.count` | Counter | sessions | Cloud/remote agent sessions by partner |
+| `copilot_chat.cloud.pr_ready.count` | Counter | events | Remote agent job PR ready notifications |
+
+**`copilot_chat.edit.accept.count` attributes:** `outcome` (`accepted`/`rejected`), `edit_surface` (`agent`/`inline_chat`)
+
+**`copilot_chat.edit.hunk.count` attributes:** `outcome` (`accepted`/`rejected`)
+
+**`copilot_chat.lines_of_code.count` attributes:** `type` (`added`/`removed`), `language_id`
+
+**`copilot_chat.edit.survival_rate` attributes:** `edit_source` (`apply_patch`/`replace_string`/`code_mapper`/`inline_chat`), `time_delay_ms`
+
+**`copilot_chat.user.action.count` attributes:** `action` (`copy`/`insert`/`apply`/`followup`)
+
+**`copilot_chat.user.feedback.count` attributes:** `rating` (`positive`/`negative`)
+
+**`copilot_chat.agent.edit_response.count` attributes:** `outcome` (`success`/`error`)
+
+**`copilot_chat.agent.summarization.count` attributes:** `outcome` (`applied`/`failed`)
+
+**`copilot_chat.cloud.session.count` attributes:** `partner_agent` (`copilot`/`claude`/`codex`)
+
 ### Events

 #### `gen_ai.client.inference.operation.details`
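
As a rough illustration of how the `outcome` and `edit_surface` attributes listed for `copilot_chat.edit.accept.count` in the hunk above might feed an accept-rate panel, the sketch below uses a hypothetical in-memory counter. `AttributeCounter`, `MetricAttributes`, and `acceptRate` are names invented for this example; they are not part of the extension or of the OpenTelemetry SDK, and a real dashboard would compute the same ratio in its query language.

```typescript
// Hypothetical in-memory stand-in for an attribute-keyed counter (illustration only).
type MetricAttributes = Record<string, string>;

class AttributeCounter {
    private readonly totals = new Map<string, number>();

    // Serialize attributes with sorted keys so that key order does not matter.
    private static keyOf(attrs: MetricAttributes): string {
        return Object.keys(attrs).sort().map(k => `${k}=${attrs[k]}`).join(',');
    }

    add(value: number, attrs: MetricAttributes): void {
        const key = AttributeCounter.keyOf(attrs);
        this.totals.set(key, (this.totals.get(key) ?? 0) + value);
    }

    get(attrs: MetricAttributes): number {
        return this.totals.get(AttributeCounter.keyOf(attrs)) ?? 0;
    }
}

// Accept rate = accepted / (accepted + rejected) for one edit surface.
// Returns undefined when no decisions were recorded for that surface.
function acceptRate(counter: AttributeCounter, editSurface: string): number | undefined {
    const accepted = counter.get({ outcome: 'accepted', edit_surface: editSurface });
    const rejected = counter.get({ outcome: 'rejected', edit_surface: editSurface });
    const total = accepted + rejected;
    return total === 0 ? undefined : accepted / total;
}

const editAcceptCount = new AttributeCounter();
editAcceptCount.add(1, { outcome: 'accepted', edit_surface: 'agent' });
editAcceptCount.add(1, { outcome: 'accepted', edit_surface: 'agent' });
editAcceptCount.add(1, { outcome: 'rejected', edit_surface: 'agent' });
editAcceptCount.add(1, { edit_surface: 'inline_chat', outcome: 'rejected' });

console.log(acceptRate(editAcceptCount, 'agent'));       // 2 of 3 decisions accepted
console.log(acceptRate(editAcceptCount, 'inline_chat')); // 0
console.log(acceptRate(editAcceptCount, 'cloud'));       // undefined (no data)
```

Returning `undefined` rather than `0` for a surface with no decisions keeps "no data" distinguishable from "everything rejected".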
@@ -278,6 +315,84 @@ Emitted for each LLM round-trip within an agent invocation.
 | `gen_ai.usage.output_tokens` | Output tokens this turn |
 | `tool_call_count` | Number of tool calls this turn |

+#### Agent Activity & Outcome Events
+
+These events provide drill-down detail for the agent activity metrics above. They are emitted as OTel log records.
+
+##### `copilot_chat.edit.feedback`
+
+Emitted when a user accepts or rejects a file-level edit from the agent.
+
+| Attribute | Description |
+|---|---|
+| `outcome` | `accepted` or `rejected` |
+| `language_id` | Language of the edited file |
+| `participant` | Chat participant that proposed the edit |
+| `request_id` | Chat request identifier |
+| `edit_surface` | `agent` or `inline_chat` |
+| `has_remaining_edits` | Whether unreviewed edits remain |
+| `is_notebook` | Whether the file is a notebook |
+
+##### `copilot_chat.edit.hunk.action`
+
+Emitted when a user accepts or rejects an individual hunk.
+
+| Attribute | Description |
+|---|---|
+| `outcome` | `accepted` or `rejected` |
+| `language_id` | Language of the edited file |
+| `request_id` | Chat request identifier |
+| `line_count` | Total lines in the hunk |
+| `lines_added` | Lines added |
+| `lines_removed` | Lines removed |
+
+##### `copilot_chat.inline.done`
+
+Emitted when an inline chat edit is accepted or rejected.
+
+| Attribute | Description |
+|---|---|
+| `accepted` | `true` or `false` |
+| `language_id` | Language of the edited file |
+| `edit_count` | Number of edits suggested |
+| `edit_line_count` | Total lines across all edits |
+| `reply_type` | How the response was shown |
+| `is_notebook` | Whether the document is a notebook |
+
+##### `copilot_chat.edit.survival`
+
+Emitted at intervals (5s, 30s, 2min, 5min, 10min, 15min) after an edit is accepted, measuring how much of the AI-generated code survives.
+
+| Attribute | Description |
+|---|---|
+| `edit_source` | `apply_patch`, `replace_string`, `code_mapper`, or `inline_chat` |
+| `survival_rate_four_gram` | 0-1 ratio of AI edit still present (4-gram similarity) |
+| `survival_rate_no_revert` | 0-1 ratio of edit ranges not reverted |
+| `time_delay_ms` | Milliseconds since edit acceptance |
+| `did_branch_change` | Whether git branch changed (ignore if `true`) |
+| `request_id` | Chat request identifier |
+
+##### `copilot_chat.user.feedback`
+
+Emitted when a user votes on a chat response (thumbs up/down).
+
+| Attribute | Description |
+|---|---|
+| `rating` | `positive` or `negative` |
+| `participant` | Chat participant name |
+| `conversation_id` | Conversation session ID |
+| `request_id` | Chat request identifier |
+
+##### `copilot_chat.cloud.session.invoke`
+
+Emitted when a cloud/remote agent session is started.
+
+| Attribute | Description |
+|---|---|
+| `partner_agent` | `copilot`, `claude`, or `codex` |
+| `model` | Model identifier |
+| `request_id` | Chat request identifier |
+
 ### Resource Attributes

 All signals carry:
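
The `survival_rate_four_gram` attribute in the hunk above is described only as a "4-gram similarity" between 0 and 1. The sketch below shows one plausible reading of that description: the fraction of character 4-grams from the accepted edit that still occur in the current text. This is an assumption made for illustration; the function names are hypothetical, and the extension's actual computation may tokenize or weight the text differently.

```typescript
// Count character n-grams of a string, with multiplicity.
function nGramCounts(text: string, n: number): Map<string, number> {
    const counts = new Map<string, number>();
    for (let i = 0; i + n <= text.length; i++) {
        const gram = text.slice(i, i + n);
        counts.set(gram, (counts.get(gram) ?? 0) + 1);
    }
    return counts;
}

// Fraction (0-1) of the original text's 4-grams that are still present in the current text.
// Assumption: this only approximates what the documented attribute measures.
function fourGramSurvival(original: string, current: string): number {
    const before = nGramCounts(original, 4);
    const after = nGramCounts(current, 4);
    let total = 0;
    let surviving = 0;
    before.forEach((count, gram) => {
        total += count;
        surviving += Math.min(count, after.get(gram) ?? 0);
    });
    // Text too short to form any 4-gram counts as surviving only if it is unchanged.
    if (total === 0) {
        return original === current ? 1 : 0;
    }
    return surviving / total;
}

const aiEdit = 'const total = items.length;';
console.log(fourGramSurvival(aiEdit, aiEdit));           // 1 (nothing changed)
console.log(fourGramSurvival(aiEdit, ''));               // 0 (edit deleted)
console.log(fourGramSurvival('abcdefgh', 'abcdXXXX'));   // 0.2 (1 of 5 four-grams survives)
```

Counting with multiplicity keeps repeated fragments (for example, runs of indentation) from inflating the ratio when only one copy remains.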
@@ -449,9 +564,9 @@ In your trace viewer, filter by `service.name` to see traces from specific agent

 **Traces** — Visualize the full agent execution in Jaeger or Grafana Tempo. Each `invoke_agent` span contains child `chat` and `execute_tool` spans, making it easy to identify bottlenecks and debug failures. Subagent invocations appear as nested `invoke_agent` spans under `execute_tool runSubagent`.

-**Metrics** — Track token usage trends by model and provider, monitor tool success rates via `copilot_chat.tool.call.count`, and watch perceived latency with `copilot_chat.time_to_first_token`. All metrics carry the same resource attributes (`service.name`, `service.version`, `session.id`) for consistent filtering.
+**Metrics** — Track token usage trends by model and provider, monitor tool success rates via `copilot_chat.tool.call.count`, and watch perceived latency with `copilot_chat.time_to_first_token`. Agent activity metrics (`copilot_chat.edit.accept.count`, `copilot_chat.edit.survival_rate`, `copilot_chat.lines_of_code.count`) power accept rate and commit survival dashboards. All metrics carry the same resource attributes (`service.name`, `service.version`, `session.id`) for consistent filtering.

-**Events** — `copilot_chat.session.start` tracks session creation. `copilot_chat.tool.call` events provide per-invocation timing and error details. `gen_ai.client.inference.operation.details` gives the full LLM call record including token usage and, when content capture is enabled, the complete prompt/response messages. Use `gen_ai.conversation.id` to correlate all signals belonging to the same session.
+**Events** — `copilot_chat.session.start` tracks session creation. `copilot_chat.tool.call` events provide per-invocation timing and error details. `copilot_chat.edit.feedback` and `copilot_chat.edit.survival` events enable drill-down into which edits were accepted/rejected and how code survival varies by edit source. `copilot_chat.user.feedback` links thumbs-up/down votes to specific conversations for quality investigation. `gen_ai.client.inference.operation.details` gives the full LLM call record including token usage and, when content capture is enabled, the complete prompt/response messages. Use `gen_ai.conversation.id` to correlate all signals belonging to the same session.

 ---

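The documented `copilot_chat.user.feedback` and `copilot_chat.edit.survival` events both carry a `request_id` attribute, so a drill-down could join them on that key. The sketch below groups exported log records by an attribute; `OTelLogRecord` and `groupByAttribute` are hypothetical names for this example, and the record shape is a simplification of whatever a given backend actually exports.

```typescript
// Simplified, hypothetical shape of an exported log record (illustration only).
interface OTelLogRecord {
    eventName: string;
    attributes: Record<string, string | number | boolean>;
}

// Group records by the string value of one attribute; records lacking it are skipped.
function groupByAttribute(records: OTelLogRecord[], key: string): Map<string, OTelLogRecord[]> {
    const groups = new Map<string, OTelLogRecord[]>();
    for (const record of records) {
        const value = record.attributes[key];
        if (value === undefined) {
            continue;
        }
        const id = String(value);
        const group = groups.get(id) ?? [];
        group.push(record);
        groups.set(id, group);
    }
    return groups;
}

const exported: OTelLogRecord[] = [
    { eventName: 'copilot_chat.user.feedback', attributes: { rating: 'negative', request_id: 'req-1' } },
    { eventName: 'copilot_chat.edit.survival', attributes: { edit_source: 'apply_patch', survival_rate_four_gram: 0.4, request_id: 'req-1' } },
    { eventName: 'copilot_chat.user.feedback', attributes: { rating: 'positive', request_id: 'req-2' } },
    { eventName: 'copilot_chat.session.start', attributes: {} },
];

const byRequest = groupByAttribute(exported, 'request_id');
byRequest.forEach((group, requestId) => {
    console.log(requestId, group.map(r => r.eventName));
});
```

Here the negative vote on `req-1` lands next to that request's survival measurement, which is the kind of correlation the monitoring text describes.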
src/extension/chatSessions/copilotcli/node/copilotcliSession.ts

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,7 @@ import type * as vscode from 'vscode';
 import type { ChatParticipantToolToken } from 'vscode';
 import { ConfigKey, IConfigurationService } from '../../../../platform/configuration/common/configurationService';
 import { ILogService } from '../../../../platform/log/common/logService';
+import { GenAiMetrics } from '../../../../platform/otel/common/genAiMetrics';
 import { CopilotChatAttr, GenAiAttr, GenAiOperationName, IOTelService, ISpanHandle, SpanKind, SpanStatusCode, truncateForOTel } from '../../../../platform/otel/common/index';
 import type { ParsedPromptFile } from '../../../../platform/promptFiles/common/promptsService';
 import { CapturingToken } from '../../../../platform/requestLogger/common/capturingToken';
@@ -580,6 +581,7 @@ export class CopilotCLISession extends DisposableStore implements ICopilotCLISes
     if (pullRequestUrl) {
         this._createdPullRequestUrl = pullRequestUrl;
         this.logService.trace(`[CopilotCLISession] Captured pull request URL: ${pullRequestUrl}`);
+        GenAiMetrics.incrementPullRequestCount(this._otelService);
     }
 }
 // Log tool call to request logger

src/extension/chatSessions/vscode-node/copilotCloudSessionsProvider.ts

Lines changed: 5 additions & 0 deletions
@@ -15,6 +15,8 @@ import { GithubRepoId, IGitService } from '../../../platform/git/common/gitServi
 import { derivePullRequestState, PullRequestSearchItem, SessionInfo } from '../../../platform/github/common/githubAPI';
 import { AuthOptions, CCAEnabledResult, IGithubRepositoryService, IOctoKitService, JobInfo, RemoteAgentJobResponse } from '../../../platform/github/common/githubService';
 import { ILogService } from '../../../platform/log/common/logService';
+import { GenAiMetrics } from '../../../platform/otel/common/genAiMetrics';
+import { IOTelService } from '../../../platform/otel/common/otelService';
 import { IExperimentationService } from '../../../platform/telemetry/common/nullExperimentationService';
 import { ITelemetryService } from '../../../platform/telemetry/common/telemetry';
 import { DeferredPromise, retry, RunOnceScheduler } from '../../../util/vs/base/common/async';
@@ -307,6 +309,7 @@ export class CopilotCloudSessionsProvider extends Disposable implements vscode.C
     @IChatDelegationSummaryService private readonly _chatDelegationSummaryService: IChatDelegationSummaryService,
     @IExperimentationService private readonly _experimentationService: IExperimentationService,
     @IDomainService private readonly _domainService: IDomainService,
+    @IOTelService private readonly _otelService: IOTelService,
 ) {
     super();
     this.registerCommands();
@@ -1962,6 +1965,7 @@ export class CopilotCloudSessionsProvider extends Disposable implements vscode.C
     partnerAgent: partnerAgent?.name ?? 'unknown',
     model: modelId ?? 'unknown'
 });
+GenAiMetrics.incrementCloudSessionCount(this._otelService, partnerAgent?.name ?? 'unknown');

 // Follow up
 if (context.chatSessionContext && !context.chatSessionContext.isUntitled && request.sessionResource.scheme === CopilotCloudSessionsProvider.TYPE) {
@@ -2480,6 +2484,7 @@ export class CopilotCloudSessionsProvider extends Disposable implements vscode.C
     }
 */
 this.telemetry.sendMSFTTelemetryEvent('copilotcloud.chat.remoteAgentJobPullRequestReady');
+GenAiMetrics.incrementCloudPrReadyCount(this._otelService);
 this.logService.trace(`Job ${jobId} now has pull request #${jobInfo.pull_request.number}`);
 this.refresh();
 return jobInfo;

src/extension/conversation/vscode-node/userActions.ts

Lines changed: 34 additions & 16 deletions
@@ -10,8 +10,9 @@ import { EditSurvivalResult } from '../../../platform/editSurvivalTracking/commo
 import { ILanguageDiagnosticsService } from '../../../platform/languages/common/languageDiagnosticsService';
 import { IMultiFileEditInternalTelemetryService } from '../../../platform/multiFileEdit/common/multiFileEditQualityTelemetry';
 import { INotebookService } from '../../../platform/notebook/common/notebookService';
-import { GenAiMetrics } from '../../../platform/otel/common/genAiMetrics';
 import type { EditOutcome } from '../../../platform/otel/common/genAiAttributes';
+import { emitEditFeedbackEvent, emitEditHunkActionEvent, emitEditSurvivalEvent, emitInlineDoneEvent, emitUserFeedbackEvent } from '../../../platform/otel/common/genAiEvents';
+import { GenAiMetrics } from '../../../platform/otel/common/genAiMetrics';
 import { IOTelService } from '../../../platform/otel/common/otelService';
 import { ISurveyService } from '../../../platform/survey/common/surveyService';
 import { ITelemetryService, TelemetryEventMeasurements, TelemetryEventProperties } from '../../../platform/telemetry/common/telemetry';
@@ -91,6 +92,7 @@ export class UserFeedbackService implements IUserFeedbackService {
         characterCount: e.action.copiedCharacters,
         lineCount: e.action.copiedText.split('\n').length,
     });
+    GenAiMetrics.incrementUserActionCount(this.otelService, 'copy');
     break;
 case 'insert':
     /* __GDPR__
@@ -116,6 +118,7 @@ export class UserFeedbackService implements IUserFeedbackService {
         characterCount: e.action.totalCharacters,
         newFile: e.action.newFile ? 1 : 0
     });
+    GenAiMetrics.incrementUserActionCount(this.otelService, 'insert');
     break;
 case 'followUp':
     /* __GDPR__
@@ -134,6 +137,7 @@ export class UserFeedbackService implements IUserFeedbackService {
         participant: agentId,
         command: result.metadata?.command,
     });
+    GenAiMetrics.incrementUserActionCount(this.otelService, 'followup');
     break;
 case 'bug':
     if (conversation) {
@@ -204,7 +208,11 @@ export class UserFeedbackService implements IUserFeedbackService {
     isNotebookCell: e.action.uri.scheme === Schemas.vscodeNotebookCell ? 1 : 0
 });

-GenAiMetrics.recordChatEditOutcome(this.otelService, 'chat_editing', outcomes.get(e.action.outcome) ?? 'unknown', document?.languageId, e.action.hasRemainingEdits);
+{
+    const otelOutcome = outcomes.get(e.action.outcome) ?? 'unknown';
+    emitEditFeedbackEvent(this.otelService, otelOutcome, document?.languageId ?? '', agentId, result.metadata?.responseId ?? '', 'agent', e.action.hasRemainingEdits, this.notebookService.hasSupportedNotebooks(e.action.uri));
+    GenAiMetrics.recordEditAcceptance(this.otelService, 'chat_editing', otelOutcome, document?.languageId);
+}

 if (result.metadata?.responseId
     && (e.action.outcome === vscode.ChatEditingSessionActionOutcome.Accepted
@@ -240,7 +248,13 @@ export class UserFeedbackService implements IUserFeedbackService {
             measurements,
             'edit.hunk.action'
         );
-        GenAiMetrics.recordEditAcceptance(this.otelService, 'chat_editing_hunk', outcome, document?.languageId);
+
+        emitEditHunkActionEvent(this.otelService, outcome, document?.languageId ?? '', result.metadata?.responseId ?? '', e.action.lineCount, e.action.linesAdded, e.action.linesRemoved);
+        GenAiMetrics.recordEditAcceptance(this.otelService, 'chat_editing_hunk', outcome, document?.languageId ?? '');
+        if (outcome === 'accepted') {
+            GenAiMetrics.incrementLinesOfCode(this.otelService, 'added', document?.languageId ?? '', e.action.linesAdded);
+            GenAiMetrics.incrementLinesOfCode(this.otelService, 'removed', document?.languageId ?? '', e.action.linesRemoved);
+        }
     }
     break;
 }
@@ -319,6 +333,7 @@ export class UserFeedbackService implements IUserFeedbackService {
         },
         'conversation.appliedCodeblock'
     );
+    GenAiMetrics.incrementUserActionCount(this.otelService, 'apply');
 }

 handleFeedback(e: vscode.ChatResultFeedback, agentId: string): void {
@@ -359,6 +374,10 @@ export class UserFeedbackService implements IUserFeedbackService {
         {},
         'conversation.messageRating'
     );
+
+    const otelRating = e.kind === vscode.ChatResultFeedbackKind.Helpful ? 'positive' : 'negative';
+    emitUserFeedbackEvent(this.otelService, otelRating, agentId, result.metadata?.sessionId ?? '', result.metadata?.responseId ?? '');
+    GenAiMetrics.incrementUserFeedbackCount(this.otelService, otelRating);
 }

 // --- inline
@@ -474,10 +493,9 @@ export class UserFeedbackService implements IUserFeedbackService {
 this.telemetryService.sendMSFTTelemetryEvent('inline.done', sharedProps, {
     ...sharedMeasures, accepted
 });
-this.telemetryService.sendGHTelemetryEvent('inline.done', sharedProps, {
-    ...sharedMeasures, accepted
-});
-GenAiMetrics.recordEditAcceptance(this.otelService, 'inline_chat', accepted ? 'accepted' : 'rejected', languageId);
+
+emitInlineDoneEvent(this.otelService, accepted === 1, languageId, editCount, editLineCount, interactionOutcome.kind, isNotebookDocument === 1);
+GenAiMetrics.recordEditAcceptance(this.otelService, 'inline_chat', accepted === 1 ? 'accepted' : 'rejected', languageId);

 this.telemetryService.sendInternalMSFTTelemetryEvent('interactiveSessionDone', {
     language: languageId,
@@ -513,13 +531,6 @@ export class UserFeedbackService implements IUserFeedbackService {
 }

 function reportInlineEditSurvivalEvent(res: EditSurvivalResult, sharedProps: TelemetryEventProperties | undefined, sharedMeasures: TelemetryEventMeasurements | undefined, otelService: IOTelService) {
-    const survivalMeasures = {
-        ...sharedMeasures,
-        survivalRateFourGram: res.fourGram,
-        survivalRateNoRevert: res.noRevert,
-        timeDelayMs: res.timeDelayMs,
-        didBranchChange: res.didBranchChange ? 1 : 0,
-    };
     /* __GDPR__
         "inline.trackEditSurvival" : {
             "owner": "hediet",
@@ -544,8 +555,15 @@ function reportInlineEditSurvivalEvent(res: EditSurvivalResult, sharedProps: Tel
             "isNotebook": { "classification": "SystemMetaData", "purpose": "FeatureInsight", "isMeasurement": true, "comment": "Whether the document is a notebook" }
         }
     */
-    res.telemetryService.sendMSFTTelemetryEvent('inline.trackEditSurvival', sharedProps, survivalMeasures);
-    res.telemetryService.sendGHTelemetryEvent('inline.trackEditSurvival', sharedProps, survivalMeasures);
+    res.telemetryService.sendMSFTTelemetryEvent('inline.trackEditSurvival', sharedProps, {
+        ...sharedMeasures,
+        survivalRateFourGram: res.fourGram,
+        survivalRateNoRevert: res.noRevert,
+        timeDelayMs: res.timeDelayMs,
+        didBranchChange: res.didBranchChange ? 1 : 0,
+    });
+
+    emitEditSurvivalEvent(otelService, 'inline_chat', res.fourGram, res.noRevert, res.timeDelayMs, res.didBranchChange, String(sharedProps?.requestId ?? ''));
     GenAiMetrics.recordEditSurvivalFourGram(otelService, 'inline_chat', res.fourGram, res.timeDelayMs);
     GenAiMetrics.recordEditSurvivalNoRevert(otelService, 'inline_chat', res.noRevert, res.timeDelayMs);
 }
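
The survival reporting above passes `res.didBranchChange` and `res.timeDelayMs` through to the OTel event, and the monitoring docs in this commit say to ignore measurements where the branch changed. A consumer-side sketch of that rule follows: it averages the 4-gram survival rate per delay checkpoint and skips branch-change records. `EditSurvivalRecord` and `meanSurvivalByDelay` are hypothetical names; the field names mirror the documented event attributes, and the checkpoint list restates the documented intervals in milliseconds.

```typescript
// Field names mirror the attributes documented for `copilot_chat.edit.survival`;
// the interface itself is hypothetical.
interface EditSurvivalRecord {
    edit_source: 'apply_patch' | 'replace_string' | 'code_mapper' | 'inline_chat';
    survival_rate_four_gram: number;
    survival_rate_no_revert: number;
    time_delay_ms: number;
    did_branch_change: boolean;
    request_id: string;
}

// Documented checkpoints (5s, 30s, 2min, 5min, 10min, 15min) expressed in milliseconds.
const CHECKPOINTS_MS: number[] = [5, 30, 120, 300, 600, 900].map(seconds => seconds * 1000);

// Mean 4-gram survival per delay, skipping records where the git branch changed.
function meanSurvivalByDelay(records: EditSurvivalRecord[]): Map<number, number> {
    const sums = new Map<number, { sum: number; n: number }>();
    for (const r of records) {
        if (r.did_branch_change) {
            continue; // survival is not meaningful across a branch switch
        }
        const bucket = sums.get(r.time_delay_ms) ?? { sum: 0, n: 0 };
        bucket.sum += r.survival_rate_four_gram;
        bucket.n += 1;
        sums.set(r.time_delay_ms, bucket);
    }
    const means = new Map<number, number>();
    sums.forEach((bucket, delay) => means.set(delay, bucket.sum / bucket.n));
    return means;
}

const sampleRecords: EditSurvivalRecord[] = [
    { edit_source: 'inline_chat', survival_rate_four_gram: 1.0, survival_rate_no_revert: 1.0, time_delay_ms: 5000, did_branch_change: false, request_id: 'req-1' },
    { edit_source: 'inline_chat', survival_rate_four_gram: 0.5, survival_rate_no_revert: 1.0, time_delay_ms: 5000, did_branch_change: false, request_id: 'req-2' },
    { edit_source: 'apply_patch', survival_rate_four_gram: 0.0, survival_rate_no_revert: 0.0, time_delay_ms: 5000, did_branch_change: true, request_id: 'req-3' },
    { edit_source: 'inline_chat', survival_rate_four_gram: 0.25, survival_rate_no_revert: 0.5, time_delay_ms: 30000, did_branch_change: false, request_id: 'req-1' },
];

const meanSurvival = meanSurvivalByDelay(sampleRecords);
for (const checkpoint of CHECKPOINTS_MS) {
    const mean = meanSurvival.get(checkpoint);
    if (mean !== undefined) {
        console.log(`${checkpoint} ms: ${mean}`);
    }
}
```

In this sample the `req-3` record is excluded, so the 5-second checkpoint averages 1.0 and 0.5 rather than being dragged down by a branch switch.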
