feat: add OTel events and metrics for agentic edit quality signals#4794
Pull request overview
This PR backfills OpenTelemetry (OTel) metrics and log-record events for agentic edit activity/outcomes (acceptance, survival, user engagement, summarization, cloud/CLI actions), aligning OTel observability with existing product telemetry signals across inline chat, panel chat editing, tools, and cloud sessions.
Changes:
- Add new OTel metric helpers and event emitters for edit feedback/hunk actions/inline done/edit survival/user feedback and various agent/cloud counters.
- Wire OTel emission into edit tools (apply patch, replace string, code mapper), intents (agent/edit/notebook/ask agent), user action handling, cloud sessions, and CLI PR creation.
- Document the new signals in docs/monitoring/agent_monitoring.md.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/platform/otel/common/genAiMetrics.ts | Adds additional metric instruments/helpers for agent/user/cloud outcome signals. |
| src/platform/otel/common/genAiEvents.ts | Adds new OTel log-record events for edit feedback, hunk actions, inline done, survival, user feedback, cloud invoke. |
| src/platform/otel/common/genAiAttributes.ts | Extends EditSource union to cover tool/edit surfaces (apply_patch, replace_string, code_mapper). |
| src/extension/tools/node/applyPatchTool.tsx | Emits OTel edit survival event + metrics for applyPatch survival tracking. |
| src/extension/tools/node/abstractReplaceStringTool.tsx | Emits OTel edit survival event + metrics for replaceString survival tracking. |
| src/extension/prompts/node/codeMapper/codeMapperService.ts | Emits OTel edit survival event + metrics for code mapper survival tracking. |
| src/extension/intents/node/notebookEditorIntent.ts | Threads IOTelService through invocation base constructor. |
| src/extension/intents/node/editCodeIntent2.ts | Threads IOTelService through invocation base constructor. |
| src/extension/intents/node/editCodeIntent.ts | Adds OTel agent edit response metric emission. |
| src/extension/intents/node/askAgentIntent.ts | Threads IOTelService through invocation base constructor. |
| src/extension/intents/node/agentIntent.ts | Adds OTel summarization outcome metric emission + threads IOTelService. |
| src/extension/conversation/vscode-node/userActions.ts | Emits OTel edit/user events & metrics for panel and inline actions; adjusts telemetry wiring. |
| src/extension/chatSessions/vscode-node/copilotCloudSessionsProvider.ts | Records OTel cloud session and PR-ready counters. |
| src/extension/chatSessions/copilotcli/node/copilotcliSession.ts | Records OTel PR creation counter when a PR URL is captured. |
| docs/monitoring/agent_monitoring.md | Documents new agent activity/outcome metrics and events. |
Comments suppressed due to low confidence (1)
docs/monitoring/agent_monitoring.md:253
- The documented metric attribute keys here (e.g. `outcome`, `edit_surface`, `edit_source`, `time_delay_ms`) don’t match the attributes used by the OTel metrics in code. For example, `GenAiMetrics.recordEditAcceptance` uses `copilot_chat.edit.source` / `copilot_chat.edit.outcome` (and optionally `copilot_chat.language_id`), and the survival metrics use `copilot_chat.time_delay_ms`. Please update the attribute lists to match the actual attribute keys/values being emitted.
**`copilot_chat.edit.accept.count` attributes:** `outcome` (`accepted`/`rejected`), `edit_surface` (`agent`/`inline_chat`)
**`copilot_chat.edit.hunk.count` attributes:** `outcome` (`accepted`/`rejected`)
**`copilot_chat.lines_of_code.count` attributes:** `type` (`added`/`removed`), `language_id`
**`copilot_chat.edit.survival_rate` attributes:** `edit_source` (`apply_patch`/`replace_string`/`code_mapper`/`inline_chat`), `time_delay_ms`
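The mismatch described above can be made concrete with a minimal sketch of an acceptance counter that uses the attribute keys the review names. The `CounterSink` interface, `RecordingSink` class, and `recordEditAcceptanceSketch` function are hypothetical stand-ins written for illustration; only the attribute keys, the `copilot_chat.edit.acceptance.count` instrument name, and the `incrementCounter(name, value, attributes)` call shape come from this PR's excerpts.

```typescript
type Attributes = Record<string, string | number | boolean>;

// Hypothetical stand-in for the service; only the call shape seen in the diff is modeled.
interface CounterSink {
	incrementCounter(name: string, value?: number, attributes?: Attributes): void;
}

// Attribute keys as named in the review comment (not the short names used in the docs).
const EDIT_SOURCE = 'copilot_chat.edit.source';
const EDIT_OUTCOME = 'copilot_chat.edit.outcome';
const LANGUAGE_ID = 'copilot_chat.language_id';

function recordEditAcceptanceSketch(
	sink: CounterSink,
	source: string,
	outcome: 'accepted' | 'rejected',
	languageId?: string,
): void {
	const attributes: Attributes = {
		[EDIT_SOURCE]: source,
		[EDIT_OUTCOME]: outcome,
	};
	// The language attribute is optional and is left out when unknown.
	if (languageId) {
		attributes[LANGUAGE_ID] = languageId;
	}
	sink.incrementCounter('copilot_chat.edit.acceptance.count', 1, attributes);
}

// In-memory sink that records what would have been emitted.
class RecordingSink implements CounterSink {
	readonly calls: Array<{ name: string; value: number; attributes: Attributes }> = [];
	incrementCounter(name: string, value: number = 1, attributes: Attributes = {}): void {
		this.calls.push({ name, value, attributes });
	}
}

const demoSink = new RecordingSink();
recordEditAcceptanceSketch(demoSink, 'inline_chat', 'accepted', 'typescript');
console.log(Object.keys(demoSink.calls[0]?.attributes ?? {}));
```

A dashboard query written against the documented keys (`outcome`, `edit_surface`) would find nothing on data shaped like this, which is why the attribute lists in the docs need to use the namespaced keys.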
| this.telemetryService.sendMSFTTelemetryEvent('inline.done', sharedProps, { | ||
| ...sharedMeasures, accepted | ||
| }); | ||
| this.telemetryService.sendGHTelemetryEvent('inline.done', sharedProps, { | ||
| ...sharedMeasures, accepted | ||
| }); | ||
| GenAiMetrics.recordEditAcceptance(this.otelService, 'inline_chat', accepted ? 'accepted' : 'rejected', languageId); | ||
|
|
||
| emitInlineDoneEvent(this.otelService, accepted === 1, languageId, editCount, editLineCount, interactionOutcome.kind, isNotebookDocument === 1); | ||
| GenAiMetrics.recordEditAcceptance(this.otelService, 'inline_chat', accepted === 1 ? 'accepted' : 'rejected', languageId); | ||
inline.done is still sent to MSFT + internal telemetry, but the GitHub telemetry forwarding (sendGHTelemetryEvent('inline.done', ...)) was removed. This looks like a regression from the previous behavior where GH telemetry parity existed for inline chat completion signals; please re-add GH telemetry (or document/justify the intentional removal).
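One way to keep the two sinks from drifting apart is to route both calls through a single helper. The sketch below is an assumption for illustration: `TelemetrySinks` models only the two method names that appear in this PR, and `sendToMsftAndGh` is a hypothetical helper, not something in the repository.

```typescript
type TelemetryProps = Record<string, string | undefined>;
type TelemetryMeasures = Record<string, number | undefined>;

// Only the two methods named in the review are modeled; the real service is wider.
interface TelemetrySinks {
	sendMSFTTelemetryEvent(name: string, props?: TelemetryProps, measures?: TelemetryMeasures): void;
	sendGHTelemetryEvent(name: string, props?: TelemetryProps, measures?: TelemetryMeasures): void;
}

// Hypothetical helper: one call site forwards the same payload to both sinks,
// so removing one of them requires an explicit, reviewable change.
function sendToMsftAndGh(
	telemetry: TelemetrySinks,
	name: string,
	props: TelemetryProps,
	measures: TelemetryMeasures,
): void {
	telemetry.sendMSFTTelemetryEvent(name, props, measures);
	telemetry.sendGHTelemetryEvent(name, props, measures);
}

// In-memory fake that records which sink received which event.
const sent: string[] = [];
const fakeTelemetry: TelemetrySinks = {
	sendMSFTTelemetryEvent: (name) => { sent.push(`msft:${name}`); },
	sendGHTelemetryEvent: (name) => { sent.push(`gh:${name}`); },
};

sendToMsftAndGh(fakeTelemetry, 'inline.done', { languageId: 'typescript' }, { accepted: 1 });
console.log(sent);
```

Whether GH forwarding should be restored is still the author's call; the helper only shows how parity could be enforced structurally if it is.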
| res.telemetryService.sendMSFTTelemetryEvent('inline.trackEditSurvival', sharedProps, { | ||
| ...sharedMeasures, | ||
| survivalRateFourGram: res.fourGram, | ||
| survivalRateNoRevert: res.noRevert, | ||
| timeDelayMs: res.timeDelayMs, | ||
| didBranchChange: res.didBranchChange ? 1 : 0, | ||
| }); | ||
|
|
||
| emitEditSurvivalEvent(otelService, 'inline_chat', res.fourGram, res.noRevert, res.timeDelayMs, res.didBranchChange, String(sharedProps?.requestId ?? '')); |
inline.trackEditSurvival is now only sent to MSFT telemetry (plus OTel), but the GitHub telemetry forwarding call was removed. If GH telemetry is still expected for edit survival (as it is for other edit-survival signals like applyPatch/replaceString), please restore the GH event here to avoid losing that signal.
| @@ -1962,6 +1965,7 @@ export class CopilotCloudSessionsProvider extends Disposable implements vscode.C | |||
| partnerAgent: partnerAgent?.name ?? 'unknown', | |||
| model: modelId ?? 'unknown' | |||
| }); | |||
This adds the copilot_chat.cloud.session.count metric, but the PR also introduces emitCloudSessionInvokeEvent and the docs describe a copilot_chat.cloud.session.invoke event; currently that event helper is never called anywhere. Consider emitting the OTel log record here alongside the MSFT telemetry (using request.id / modelId / partnerAgent).
| }); | |
| }); | |
| GenAiMetrics.emitCloudSessionInvokeEvent(this._otelService, request.id, modelId ?? 'unknown', partnerAgent?.name ?? 'unknown'); |
| static incrementLinesOfCode(otel: IOTelService, type: 'added' | 'removed', languageId: string, count: number): void { | ||
| otel.incrementCounter('copilot_chat.lines_of_code.count', count, { | ||
| 'type': type, | ||
| [CopilotChatAttr.LANGUAGE_ID]: languageId, |
incrementLinesOfCode always attaches copilot_chat.language_id, but call sites in this PR pass document?.languageId ?? '', which will emit an empty-string label value when languageId is unavailable. Consider making languageId optional here (and omitting the attribute when undefined/empty), consistent with recordEditAcceptance/recordChatEditOutcome, or guard at call sites to avoid emitting empty label values.
| static incrementLinesOfCode(otel: IOTelService, type: 'added' | 'removed', languageId: string, count: number): void { | |
| otel.incrementCounter('copilot_chat.lines_of_code.count', count, { | |
| 'type': type, | |
| [CopilotChatAttr.LANGUAGE_ID]: languageId, | |
| static incrementLinesOfCode(otel: IOTelService, type: 'added' | 'removed', languageId: string | undefined, count: number): void { | |
| otel.incrementCounter('copilot_chat.lines_of_code.count', count, { | |
| 'type': type, | |
| ...(languageId ? { [CopilotChatAttr.LANGUAGE_ID]: languageId } : {}), |
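The effect of the suggested change can be checked in isolation with a small sketch. `LinesCounter` and `incrementLinesOfCodeSketch` are hypothetical names; the literal key `copilot_chat.language_id` is assumed to be the value of `CopilotChatAttr.LANGUAGE_ID`, as the review comment implies. The attribute object is built imperatively here, which should behave the same as the conditional spread in the suggestion.

```typescript
type Labels = Record<string, string>;

// Minimal stand-in for the counter API used in the diff.
interface LinesCounter {
	incrementCounter(name: string, value: number, attributes: Labels): void;
}

function incrementLinesOfCodeSketch(
	otel: LinesCounter,
	type: 'added' | 'removed',
	languageId: string | undefined,
	count: number,
): void {
	const attributes: Labels = { type };
	// Both undefined and '' are falsy, so neither produces a label.
	if (languageId) {
		attributes['copilot_chat.language_id'] = languageId;
	}
	otel.incrementCounter('copilot_chat.lines_of_code.count', count, attributes);
}

const emittedLabels: Labels[] = [];
const fakeOtel: LinesCounter = {
	incrementCounter: (_name, _value, attributes) => { emittedLabels.push(attributes); },
};

// '' is what `document?.languageId ?? ''` yields when no document is available.
incrementLinesOfCodeSketch(fakeOtel, 'added', '', 3);
incrementLinesOfCodeSketch(fakeOtel, 'removed', 'python', 2);
console.log(emittedLabels);
```

With the guard in the helper, call sites that still pass `''` stop producing an empty label value, so they do not all need to change at once.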
| static incrementCommitCount(otel: IOTelService): void { | ||
| otel.incrementCounter('copilot_chat.commit.count'); | ||
| } | ||
incrementCommitCount() is introduced here, but there are currently no call sites for it in the repo (so copilot_chat.commit.count will never be emitted). Either add the missing instrumentation at the point where the agent creates a commit, or remove this metric (and the related docs entry) until it’s wired up.
| static incrementCommitCount(otel: IOTelService): void { | |
| otel.incrementCounter('copilot_chat.commit.count'); | |
| } |
| | `copilot_chat.edit.accept.count` | Counter | edits | File-level and inline edit accept/reject decisions | | ||
| | `copilot_chat.edit.hunk.count` | Counter | hunks | Hunk-level accept/reject decisions | | ||
| | `copilot_chat.lines_of_code.count` | Counter | lines | Lines of code added/removed by accepted agent edits | | ||
| | `copilot_chat.edit.survival_rate` | Histogram | ratio (0-1) | How much AI-edited code survives over time (4-gram similarity) | | ||
| | `copilot_chat.user.action.count` | Counter | actions | User engagement: copy, insert, apply, followup | |
The new “Agent Activity & Outcome Metrics” section documents metric names (copilot_chat.edit.accept.count, copilot_chat.edit.hunk.count, copilot_chat.edit.survival_rate) that don’t exist in the implementation. The code emits copilot_chat.edit.acceptance.count (with copilot_chat.edit.source distinguishing file vs hunk) and survival histograms copilot_chat.edit.survival.four_gram / copilot_chat.edit.survival.no_revert. Please align the doc table with the actual instrument names (or rename the instruments to match the doc).
This issue also appears on line 246 of the same file.
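Drift of this kind between the docs table and the instruments can be caught mechanically. The following is a rough sketch of such a check, not part of the PR: `documentedMetricNames` and `staleDocEntries` are hypothetical helpers, and the set of emitted names is limited to the instruments that this review explicitly mentions.

```typescript
// Instrument names that the review and diff excerpts say the code emits.
const emittedInstruments = new Set<string>([
	'copilot_chat.edit.acceptance.count',
	'copilot_chat.edit.survival.four_gram',
	'copilot_chat.edit.survival.no_revert',
	'copilot_chat.lines_of_code.count',
]);

// Extracts the backticked name from the first cell of each markdown table row.
function documentedMetricNames(markdown: string): string[] {
	const names: string[] = [];
	for (const line of markdown.split('\n')) {
		const match = /^\|\s*`([^`]+)`\s*\|/.exec(line.trim());
		const name = match?.[1];
		if (name) {
			names.push(name);
		}
	}
	return names;
}

// Returns documented names that no instrument in the code matches.
function staleDocEntries(markdown: string, emitted: ReadonlySet<string>): string[] {
	return documentedMetricNames(markdown).filter((name) => !emitted.has(name));
}

const docTable = [
	'| `copilot_chat.edit.accept.count` | Counter | edits | File-level and inline edit accept/reject decisions |',
	'| `copilot_chat.edit.hunk.count` | Counter | hunks | Hunk-level accept/reject decisions |',
	'| `copilot_chat.lines_of_code.count` | Counter | lines | Lines of code added/removed by accepted agent edits |',
	'| `copilot_chat.edit.survival_rate` | Histogram | ratio (0-1) | How much AI-edited code survives over time |',
].join('\n');

console.log(staleDocEntries(docTable, emittedInstruments));
```

In a real test the emitted set would have to be collected from the metrics module itself rather than hard-coded, otherwise the check only moves the duplication to a third place.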
| { | ||
| const otelOutcome = outcomes.get(e.action.outcome) ?? 'unknown'; | ||
| emitEditFeedbackEvent(this.otelService, otelOutcome, document?.languageId ?? '', agentId, result.metadata?.responseId ?? '', 'agent', e.action.hasRemainingEdits, this.notebookService.hasSupportedNotebooks(e.action.uri)); | ||
| GenAiMetrics.recordEditAcceptance(this.otelService, 'chat_editing', otelOutcome, document?.languageId); |
This block stopped emitting the existing copilot_chat.chat_edit.outcome.count metric (GenAiMetrics.recordChatEditOutcome), so file-level chat editing session outcomes (accepted/rejected/saved) will no longer be counted anywhere. Please restore recordChatEditOutcome here (in addition to the new event), or remove the metric/instrument entirely if it’s intentionally deprecated.
| GenAiMetrics.recordEditAcceptance(this.otelService, 'chat_editing', otelOutcome, document?.languageId); | |
| GenAiMetrics.recordEditAcceptance(this.otelService, 'chat_editing', otelOutcome, document?.languageId); | |
| GenAiMetrics.recordChatEditOutcome(this.otelService, 'chat_editing', otelOutcome as EditOutcome, document?.languageId); |
Backfill OTel events/metrics for agentic activity and outcome signals already tracked in MSFT telemetry.
| OTel metric / event | MSFT telemetry source |
|---|---|
| `edit.accept.count` / `edit.feedback` | `panel.edit.feedback` |
| `edit.hunk.count` / `edit.hunk.action` | `edit.hunk.action` |
| `lines_of_code.count` | `edit.hunk.action` (on accept) |
| `edit.survival_rate` / `edit.survival` | `*.trackEditSurvival` |
| `edit.accept.count` / `inline.done` | `inline.done` |
| `user.action.count` | `panel.action.*` |
| `user.feedback.count` / `user.feedback` | `panel.action.vote` |
| `agent.edit_response.count` | `panel.edit.codeblocks` |
| `agent.summarization.count` | `triggerSummarizeFailed` / `backgroundSummarizationApplied` |
| `cloud.session.count` | `copilotcloud.chat.invoke` |
| `cloud.pr_ready.count` | `remoteAgentJobPullRequestReady` |
| `pull_request.count` | `create_pull_request` tool success |

Files changed: genAiEvents.ts, genAiMetrics.ts, userActions.ts, applyPatchTool.tsx, abstractReplaceStringTool.tsx, codeMapperService.ts, editCodeIntent.ts, agentIntent.ts, copilotCloudSessionsProvider.ts, copilotcliSession.ts + 4 intent subclass constructor threading.

Docs: All new signals documented in docs/monitoring/agent_monitoring.md (Agent Activity & Outcome Metrics/Events sections).