Please read this first
- Have you read the docs? Yes — Agents SDK docs
- Have you searched for related issues? Yes — no existing issues cover this.
Describe the bug
OpenAIServerConversationTracker.hydrate_from_state stores id() of temporary dict objects in self.sent_items. When original_input is a string, ItemHelpers.input_to_new_input_list wraps it in a temporary dict. The id() of that dict is added to sent_items, but no strong reference is kept, so the dict is immediately eligible for garbage collection.
When CPython reuses the same memory address for a later allocation (e.g. a function_call_output dict created during HITL rejection), that new object's id() collides with the stale entry in sent_items. prepare_input then considers it "already sent" and drops it from the API payload, producing:
400 - "No tool output found for function call <call_id>."
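The address-reuse mechanism can be demonstrated in isolation. This is a minimal sketch, independent of the SDK: it stores only the `id()` of a temporary dict, lets the dict be collected, and then allocates a fresh dict that may land at the same address.

```python
# Minimal, SDK-independent demonstration of the mechanism: storing only
# id() of a temporary dict lets a later allocation collide with it.
sent = set()

def remember_id():
    temp = {"type": "message", "content": "hi"}  # temporary dict
    sent.add(id(temp))  # only the address is stored; no reference survives
    # temp's refcount drops to zero here, freeing its address for reuse

remember_id()
later = {"type": "function_call_output", "call_id": "call_X"}
# CPython's allocator frequently hands `later` the just-freed address,
# so the stale entry yields a false positive (common, not guaranteed):
print(id(later) in sent)
```

Because `id()` is only guaranteed unique among *live* objects, any identity-based set that outlives the objects it indexes is vulnerable to exactly this collision.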
The bug is in oai_conversation.py, inside hydrate_from_state:
```python
for item in ItemHelpers.input_to_new_input_list(normalized_input):
    if item is None:
        continue
    self.sent_items.add(id(item))  # ← id() of ephemeral dict; no reference kept
```
sent_initial_input / remaining_initial_input already control whether the original user input is replayed, so these transient IDs don't appear necessary for correctness. The cleanest fix would be to not add them to sent_items in the first place.
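If the ids did turn out to be needed, a hedged alternative is to pin a strong reference next to each stored id, so no tracked address can ever be recycled while its entry is live. This is a hypothetical sketch, not SDK code; the names only mirror the issue:

```python
# Hypothetical hardening sketch (not SDK code): if id()-based tracking is
# kept, pin a strong reference alongside each stored id so the address
# cannot be reused while the entry is live.
class TrackerSketch:
    def __init__(self) -> None:
        self.sent_items: set[int] = set()
        self._pinned: list[object] = []  # keeps every tracked object alive

    def mark_sent(self, item: object) -> None:
        self.sent_items.add(id(item))
        self._pinned.append(item)  # strong ref: id() stays unambiguous

tracker = TrackerSketch()
tracker.mark_sent({"type": "message", "content": "hi"})
fresh = {"type": "function_call_output", "call_id": "call_ABC"}
print(id(fresh) in tracker.sent_items)  # False: the pinned dict still owns its address
```

The pin makes the `False` result deterministic, at the cost of holding the tracked objects for the tracker's lifetime; simply not recording the transient ids, as proposed above, avoids both the bug and the extra memory.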
Debug information
- Agents SDK version: v0.13.1
- Python version: Python 3.12.8
Repro steps
Self-contained script — no API calls or network required, runs in < 1 second:
"""
Deterministic repro: OpenAIServerConversationTracker.hydrate_from_state stores
id() of ephemeral objects in sent_items, causing false-positive dedup after GC.
No API calls or network required. Runs in < 1 second.
Bug: When original_input is a string, hydrate_from_state converts it via
ItemHelpers.input_to_new_input_list into a temporary dict, stores id(dict) in
sent_items, then lets the dict be garbage collected. Any later dict allocated at
the same address (e.g. the rejection function_call_output) is wrongly considered
"already sent" and dropped from the API payload.
SDK version: openai-agents 0.13.1
File: agents/run_internal/oai_conversation.py, hydrate_from_state
"""
import gc
import sys
from openai.types.responses import ResponseFunctionToolCall, ResponseReasoningItem
from openai.types.responses.response_reasoning_item import Summary
from agents.items import (
ModelResponse,
ReasoningItem,
ToolApprovalItem,
ToolCallItem,
ToolCallOutputItem,
)
from agents.usage import Usage
from agents.run_internal.oai_conversation import OpenAIServerConversationTracker
class FakeAgent:
name = "fake"
agent = FakeAgent()
reasoning_obj = ResponseReasoningItem(
id="rs_001", type="reasoning",
summary=[Summary(text="thinking", type="summary_text")],
)
function_call_obj = ResponseFunctionToolCall(
id="fc_001", type="function_call", call_id="call_ABC",
name="my_tool", arguments='{"x": 1}', status="completed",
)
reasoning_raw = {
"type": "reasoning", "id": "rs_001",
"summary": [{"text": "thinking", "type": "summary_text"}],
}
function_call_raw = {
"type": "function_call", "id": "fc_001", "call_id": "call_ABC",
"name": "my_tool", "arguments": '{"x": 1}', "status": "completed",
}
function_call_raw_copy = dict(function_call_raw)
generated_items = [
ReasoningItem(agent=agent, raw_item=reasoning_obj),
ToolCallItem(agent=agent, raw_item=function_call_raw),
ToolApprovalItem(agent=agent, raw_item=function_call_raw_copy, tool_name="my_tool"),
]
model_response = ModelResponse(
output=[reasoning_obj, function_call_obj],
usage=Usage(),
response_id="resp_001",
)
# --- Step 1: Hydrate tracker (simulates a resumed run) ---
tracker = OpenAIServerConversationTracker(previous_response_id="resp_001")
tracker.hydrate_from_state(
original_input="Do something",
generated_items=generated_items,
model_responses=[model_response],
)
print(f"sent_items after hydrate: {tracker.sent_items}")
print(f" count: {len(tracker.sent_items)}")
# --- Step 2: Find the stale id (from the GC'd temp dict) ---
known_raw_ids = {id(reasoning_obj), id(function_call_raw), id(function_call_raw_copy)}
stale_ids = tracker.sent_items - known_raw_ids
print(f"\nStale ids (from GC'd temp objects): {stale_ids}")
if not stale_ids:
print("No stale ids found (unexpected).")
sys.exit(1)
stale_id = stale_ids.pop()
print(f" stale id = {stale_id}")
# --- Step 3: Force-allocate a rejection dict at that exact address ---
gc.collect()
rejection_raw = None
pool = []
for i in range(500_000):
candidate = {
"type": "function_call_output",
"call_id": "call_ABC",
"output": "Rejected.",
}
if id(candidate) == stale_id:
rejection_raw = candidate
break
pool.append(candidate)
if rejection_raw is None:
print(f"Could not reproduce id() reuse after {i+1} attempts.")
sys.exit(0)
print(f"\nReproduced id() reuse after {i+1} attempts")
print(f" rejection_raw id = {id(rejection_raw)}")
print(f" id in sent_items: {id(rejection_raw) in tracker.sent_items}")
del pool
gc.collect()
# --- Step 4: Build items as they'd appear after HITL rejection ---
rejection_item = ToolCallOutputItem(
agent=agent, raw_item=rejection_raw, output="Rejected.",
)
items_after_resolve = [
generated_items[0], # reasoning
generated_items[1], # tool_call
rejection_item, # rejection output (replaces tool_approval)
]
# --- Step 5: Observe the bug ---
result = tracker.prepare_input("Do something", items_after_resolve)
print(f"\nprepare_input returned {len(result)} items")
if len(result) == 0:
print("\n*** BUG REPRODUCED ***")
print("The rejection function_call_output was dropped because its id()")
print("matched a stale entry in sent_items from a GC'd temporary dict.")
print("The API would receive 0 input items and return:")
print(' 400 - "No tool output found for function call call_ABC."')
else:
print("\nBug not reproduced — items included correctly.")
for idx, item in enumerate(result):
t = item.get("type") if isinstance(item, dict) else "?"
print(f" [{idx}] {t}")
Expected behavior
prepare_input should include the rejection function_call_output in the API payload. The item is new and was never sent to the API, so it should not be filtered by dedup.
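More generally, deduplicating raw payload dicts by object identity is fragile once the dicts are transient. One hedged alternative is a content-derived key; this sketch assumes that `(type, call_id/id)` uniquely identifies an item within a single run, which may not hold for every item shape:

```python
# Hypothetical content-based dedup key for raw payload dicts; assumes
# (type, call_id or id) uniquely identifies an item within one run.
def dedup_key(item: dict) -> tuple:
    return (item.get("type"), item.get("call_id") or item.get("id"))

sent_keys = set()
sent_keys.add(dedup_key(
    {"type": "function_call", "id": "fc_001", "call_id": "call_ABC"}
))

rejection = {"type": "function_call_output", "call_id": "call_ABC", "output": "Rejected."}
print(dedup_key(rejection) in sent_keys)  # False: the output item has its own key
```

Under such a scheme the rejection output can never collide with the call that produced it, because the two have different `type` components even though they share a `call_id`.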