豆豆友情提示:这是一个非官方 GitHub 代理镜像,主要用于网络测试或访问加速。请勿在此进行登录、注册或处理任何敏感信息。进行这些操作请务必访问官方网站 github.com。 Raw 内容也通过此代理提供。
Skip to content

Add: Eliminating mathematical hallucinations with deterministic tool use#2599

Open
michaelwinczuk wants to merge 1 commit intoopenai:mainfrom
michaelwinczuk:add-math-hallucination-elimination
Open

Add: Eliminating mathematical hallucinations with deterministic tool use#2599
michaelwinczuk wants to merge 1 commit intoopenai:mainfrom
michaelwinczuk:add-math-hallucination-elimination

Conversation

@michaelwinczuk
Copy link
Copy Markdown

Summary

A practical notebook showing how to eliminate LLM mathematical hallucinations by routing computation to SymPy via tool use.

What it covers

  • Why LLMs hallucinate math (token prediction vs computation)
  • A simple compute_math tool using SymPy
  • Side-by-side comparison: GPT-4o-mini without tools vs with SymPy tool
  • Benchmark data: 3B (55%), 7B (77%), 32B (93%), SymPy tool (100%)
  • The "separate perception from execution" pattern

Why it belongs in the cookbook

  • Directly demonstrates OpenAI's function calling / tool use feature
  • Solves a common production problem (math accuracy)
  • Complements the existing Developing_hallucination_guardrails.ipynb
  • Pattern generalizes beyond math (facts, code, dates)

Short, practical, copy-pasteable. Links to Math Swarm for the full 1,079-test, 12-category implementation.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 760ea983e3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +150 to +152
" tools=[math_tool],\n",
" tool_choice=\"auto\",\n",
" temperature=0,\n",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Force math tool invocation in deterministic path

The “WITH SYMPY TOOL (deterministic computation)” call uses tool_choice="auto", which still allows the model to skip the tool and answer from token prediction (your own else branch already handles [NO TOOL USED]). That means this path cannot guarantee deterministic/no-hallucination behavior and can regress to incorrect answers for some prompts; use a required/specific function tool choice for this section.

Useful? React with 👍 / 👎.

"cell_type": "markdown",
"metadata": {},
"source": [
"# Eliminating Mathematical Hallucinations with Deterministic Tool Use\n",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Add registry entry for the new notebook

This commit introduces a new cookbook notebook but does not update registry.yaml, so the publication pipeline will not index/render this page on cookbook.openai.com. Per this repo’s metadata workflow, new content must be added to the registry in the same change to avoid shipping an effectively hidden example.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant