豆豆友情提示:这是一个非官方 GitHub 代理镜像,主要用于网络测试或访问加速。请勿在此进行登录、注册或处理任何敏感信息。进行这些操作请务必访问官方网站 github.com。 Raw 内容也通过此代理提供。
Skip to content

Add Dewey managed RAG pipeline example#2586

Open
lambdabaa wants to merge 2 commits intoopenai:mainfrom
lambdabaa:dewey-rag-pipeline
Open

Add Dewey managed RAG pipeline example#2586
lambdabaa wants to merge 2 commits intoopenai:mainfrom
lambdabaa:dewey-rag-pipeline

Conversation

@lambdabaa
Copy link
Copy Markdown

@lambdabaa lambdabaa commented Apr 4, 2026

Summary

Adds a notebook demonstrating how to build production document Q&A using Dewey as a managed RAG backend alongside the OpenAI Python SDK.

Dewey handles the full ingestion pipeline (PDF conversion, section extraction, chunking, embedding) behind a single API, letting developers focus on the application layer rather than infrastructure assembly.

The notebook covers:

  • Uploading PDFs (three foundational AI papers from ArXiv) to a Dewey collection
  • Waiting for async ingestion and inspecting the extracted section hierarchy
  • Hybrid BM25 + vector search (RRF) with chunk-level citation metadata
  • Section-aware retrieval: scan section titles/summaries cheaply before loading full chunk content
  • Streaming agentic research endpoint with tool-call trace and source attribution
  • BYOK (bring your own OpenAI key) for direct cost transparency
  • A RAG chat loop using Dewey retrieval + OpenAI gpt-4o-mini generation

Notebook location

examples/dewey_rag_pipeline.ipynb

Dependencies

  • meetdewey — Dewey Python SDK
  • openai — OpenAI Python SDK
  • requests — for downloading ArXiv PDFs (stdlib-only alternative available)

Both are installed via %pip install at the top of the notebook.

@lambdabaa lambdabaa force-pushed the dewey-rag-pipeline branch from f95685a to 1e64afa Compare April 4, 2026 06:28
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f95685a454

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

"cell_type": "markdown",
"metadata": {},
"source": [
"# Production Document Q&A with Dewey's Managed RAG Backend\n",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Register new notebook in registry.yaml

This commit adds examples/dewey_rag_pipeline.ipynb but does not add a matching registry.yaml entry, so the new content will not be discoverable/published on cookbook.openai.com even though the notebook exists in the repo. Please add a registry record for this path in the same change set to keep metadata in sync with content additions.

Useful? React with 👍 / 👎.

Comment thread examples/dewey_rag_pipeline.ipynb Outdated
"metadata": {},
"outputs": [],
"source": [
"%pip install meetdewey openai --quiet"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Install requests in setup cell

The setup cell installs only meetdewey and openai, but the next import cell uses requests for PDF downloads; in a clean virtual environment this will raise ModuleNotFoundError before ingestion starts. Add requests to the %pip install line (or remove the dependency) so the notebook runs end-to-end from a fresh environment.

Useful? React with 👍 / 👎.

Comment thread examples/dewey_rag_pipeline.ipynb Outdated
Comment on lines +326 to +327
"print(\"BYOK configured: Dewey will route generation through your OpenAI account.\")\n",
"print(\"Credit metering is bypassed. deep/exhaustive depths are unlocked.\")"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid claiming BYOK is configured when no API call ran

This cell only shows the provider-key creation as comments, but then unconditionally prints that BYOK is configured and deep/exhaustive are unlocked. Users who skip actual key registration will get failures later while believing setup succeeded, so this should either execute/verify the registration or print instructional text that clearly states configuration is still pending.

Useful? React with 👍 / 👎.

@lambdabaa lambdabaa force-pushed the dewey-rag-pipeline branch from 1e64afa to 2db7628 Compare April 4, 2026 06:33
Demonstrates building production document Q&A with Dewey's managed
RAG backend alongside the OpenAI Python SDK.

Covers:
- Uploading PDFs to a Dewey collection
- Hybrid BM25 + vector search with citation metadata
- Section-aware retrieval (scan titles before loading chunks)
- Streaming agentic research endpoint with source attribution
- BYOK (bring your own OpenAI key) for cost transparency
- RAG chat loop using Dewey retrieval + OpenAI generation
@lambdabaa lambdabaa force-pushed the dewey-rag-pipeline branch from 2db7628 to 785edfb Compare April 4, 2026 06:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant