Please read this first
- Have you read the docs? Yes — Agents SDK docs
- Have you searched for related issues? Yes — no existing issues cover this.
Describe the bug
OpenAIServerConversationTracker.hydrate_from_state stores id() of temporary dict objects in self.sent_items. When original_input is a string, ItemHelpers.input_to_new_input_list wraps it in a temporary dict. The id() of that dict is added to sent_items, but no strong reference is kept, so the dict is immediately eligible for garbage collection.
When CPython reuses the same memory address for a later allocation (e.g. a function_call_output dict created during HITL rejection), that new object's id() collides with the stale entry in sent_items. prepare_input then considers it "already sent" and drops it from the API payload, producing:
400 - "No tool output found for function call <call_id>."
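The address-reuse mechanism can be demonstrated in isolation. This is a minimal sketch, independent of the SDK: it stores only the `id()` of a temporary dict, lets the dict be collected, and then allocates a fresh dict that may land at the same address.

```python
# Minimal, SDK-independent demonstration of the mechanism: storing only
# id() of a temporary dict lets a later allocation collide with it.
sent = set()

def remember_id():
    temp = {"type": "message", "content": "hi"}  # temporary dict
    sent.add(id(temp))  # only the address is stored; no reference survives
    # temp's refcount drops to zero here, freeing its address for reuse

remember_id()
later = {"type": "function_call_output", "call_id": "call_X"}
# CPython's allocator frequently hands `later` the just-freed address,
# so the stale entry yields a false positive (common, not guaranteed):
print(id(later) in sent)
```

Because `id()` is only guaranteed unique among *live* objects, any identity-based set that outlives the objects it indexes is vulnerable to exactly this collision.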
The bug is in oai_conversation.py, inside hydrate_from_state:
```python
for item in ItemHelpers.input_to_new_input_list(normalized_input):
    if item is None:
        continue
    self.sent_items.add(id(item))  # ← id() of ephemeral dict; no reference kept
```
sent_initial_input / remaining_initial_input already control whether the original user input is replayed, so these transient IDs don't appear necessary for correctness. The cleanest fix would be to not add them to sent_items in the first place.
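If the ids did turn out to be needed, a hedged alternative is to pin a strong reference next to each stored id, so no tracked address can ever be recycled while its entry is live. This is a hypothetical sketch, not SDK code; the names only mirror the issue:

```python
# Hypothetical hardening sketch (not SDK code): if id()-based tracking is
# kept, pin a strong reference alongside each stored id so the address
# cannot be reused while the entry is live.
class TrackerSketch:
    def __init__(self) -> None:
        self.sent_items: set[int] = set()
        self._pinned: list[object] = []  # keeps every tracked object alive

    def mark_sent(self, item: object) -> None:
        self.sent_items.add(id(item))
        self._pinned.append(item)  # strong ref: id() stays unambiguous

tracker = TrackerSketch()
tracker.mark_sent({"type": "message", "content": "hi"})
fresh = {"type": "function_call_output", "call_id": "call_ABC"}
print(id(fresh) in tracker.sent_items)  # False: the pinned dict still owns its address
```

The pin makes the `False` result deterministic, at the cost of holding the tracked objects for the tracker's lifetime; simply not recording the transient ids, as proposed above, avoids both the bug and the extra memory.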
Debug information
- Agents SDK version: v0.13.1
- Python version: Python 3.12.8
Repro steps
Self-contained script — no API calls or network required, runs in < 1 second:
"""
Deterministic repro: OpenAIServerConversationTracker.hydrate_from_state stores
id() of ephemeral objects in sent_items, causing false-positive dedup after GC.
No API calls or network required. Runs in < 1 second.
Bug: When original_input is a string, hydrate_from_state converts it via
ItemHelpers.input_to_new_input_list into a temporary dict, stores id(dict) in
sent_items, then lets the dict be garbage collected. Any later dict allocated at
the same address (e.g. the rejection function_call_output) is wrongly considered
"already sent" and dropped from the API payload.
SDK version: openai-agents 0.13.1
File: agents/run_internal/oai_conversation.py, hydrate_from_state
"""
import gc
import sys
from openai.types.responses import ResponseFunctionToolCall, ResponseReasoningItem
from openai.types.responses.response_reasoning_item import Summary
from agents.items import (
ModelResponse,
ReasoningItem,
ToolApprovalItem,
ToolCallItem,
ToolCallOutputItem,
)
from agents.usage import Usage
from agents.run_internal.oai_conversation import OpenAIServerConversationTracker
class FakeAgent:
name = "fake"
agent = FakeAgent()
reasoning_obj = ResponseReasoningItem(
id="rs_001", type="reasoning",
summary=[Summary(text="thinking", type="summary_text")],
)
function_call_obj = ResponseFunctionToolCall(
id="fc_001", type="function_call", call_id="call_ABC",
name="my_tool", arguments='{"x": 1}', status="completed",
)
reasoning_raw = {
"type": "reasoning", "id": "rs_001",
"summary": [{"text": "thinking", "type": "summary_text"}],
}
function_call_raw = {
"type": "function_call", "id": "fc_001", "call_id": "call_ABC",
"name": "my_tool", "arguments": '{"x": 1}', "status": "completed",
}
function_call_raw_copy = dict(function_call_raw)
generated_items = [
ReasoningItem(agent=agent, raw_item=reasoning_obj),
ToolCallItem(agent=agent, raw_item=function_call_raw),
ToolApprovalItem(agent=agent, raw_item=function_call_raw_copy, tool_name="my_tool"),
]
model_response = ModelResponse(
output=[reasoning_obj, function_call_obj],
usage=Usage(),
response_id="resp_001",
)
# --- Step 1: Hydrate tracker (simulates a resumed run) ---
tracker = OpenAIServerConversationTracker(previous_response_id="resp_001")
tracker.hydrate_from_state(
original_input="Do something",
generated_items=generated_items,
model_responses=[model_response],
)
print(f"sent_items after hydrate: {tracker.sent_items}")
print(f" count: {len(tracker.sent_items)}")
# --- Step 2: Find the stale id (from the GC'd temp dict) ---
known_raw_ids = {id(reasoning_obj), id(function_call_raw), id(function_call_raw_copy)}
stale_ids = tracker.sent_items - known_raw_ids
print(f"\nStale ids (from GC'd temp objects): {stale_ids}")
if not stale_ids:
print("No stale ids found (unexpected).")
sys.exit(1)
stale_id = stale_ids.pop()
print(f" stale id = {stale_id}")
# --- Step 3: Force-allocate a rejection dict at that exact address ---
gc.collect()
rejection_raw = None
pool = []
for i in range(500_000):
candidate = {
"type": "function_call_output",
"call_id": "call_ABC",
"output": "Rejected.",
}
if id(candidate) == stale_id:
rejection_raw = candidate
break
pool.append(candidate)
if rejection_raw is None:
print(f"Could not reproduce id() reuse after {i+1} attempts.")
sys.exit(0)
print(f"\nReproduced id() reuse after {i+1} attempts")
print(f" rejection_raw id = {id(rejection_raw)}")
print(f" id in sent_items: {id(rejection_raw) in tracker.sent_items}")
del pool
gc.collect()
# --- Step 4: Build items as they'd appear after HITL rejection ---
rejection_item = ToolCallOutputItem(
agent=agent, raw_item=rejection_raw, output="Rejected.",
)
items_after_resolve = [
generated_items[0], # reasoning
generated_items[1], # tool_call
rejection_item, # rejection output (replaces tool_approval)
]
# --- Step 5: Observe the bug ---
result = tracker.prepare_input("Do something", items_after_resolve)
print(f"\nprepare_input returned {len(result)} items")
if len(result) == 0:
print("\n*** BUG REPRODUCED ***")
print("The rejection function_call_output was dropped because its id()")
print("matched a stale entry in sent_items from a GC'd temporary dict.")
print("The API would receive 0 input items and return:")
print(' 400 - "No tool output found for function call call_ABC."')
else:
print("\nBug not reproduced — items included correctly.")
for idx, item in enumerate(result):
t = item.get("type") if isinstance(item, dict) else "?"
print(f" [{idx}] {t}")
Expected behavior
prepare_input should include the rejection function_call_output in the API payload. The item is new and was never sent to the API, so it should not be filtered by dedup.
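More generally, deduplicating raw payload dicts by object identity is fragile once the dicts are transient. One hedged alternative is a content-derived key; this sketch assumes that `(type, call_id/id)` uniquely identifies an item within a single run, which may not hold for every item shape:

```python
# Hypothetical content-based dedup key for raw payload dicts; assumes
# (type, call_id or id) uniquely identifies an item within one run.
def dedup_key(item: dict) -> tuple:
    return (item.get("type"), item.get("call_id") or item.get("id"))

sent_keys = set()
sent_keys.add(dedup_key(
    {"type": "function_call", "id": "fc_001", "call_id": "call_ABC"}
))

rejection = {"type": "function_call_output", "call_id": "call_ABC", "output": "Rejected."}
print(dedup_key(rejection) in sent_keys)  # False: the output item has its own key
```

Under such a scheme the rejection output can never collide with the call that produced it, because the two have different `type` components even though they share a `call_id`.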