
Commit 9a31ac7

docs: add a mention of evals into contributing.md (#773)
1 parent 67c1f80 commit 9a31ac7

1 file changed: +32 -0 lines changed

CONTRIBUTING.md

Lines changed: 32 additions & 0 deletions
@@ -87,3 +87,35 @@ You can use the `DEBUG` environment variable as usual to control categories that
### Updating documentation

When adding a new tool or updating a tool name or description, make sure to run `npm run docs` to generate the tool reference documentation.

### Contributing to Evals

We use Gemini to evaluate how the MCP server tools are used. The eval scenarios live in `scripts/eval_scenarios`.
Each scenario is a TypeScript file that exports a `scenario` object implementing `TestScenario`.

- **prompt**: The prompt to send to the model.
- **maxTurns**: Maximum number of conversation turns.
- **expectations**: A function that verifies the tool calls made by the model.
- **htmlRoute** (Optional): Serve custom HTML content for the test at a specific path.
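As a rough sketch of how these fields fit together, the block below defines stand-in types and a scenario that serves its own HTML. Everything here is an assumption made for illustration and is not taken from the real `TestScenario` definition: the names `SketchToolCall`, `SketchScenario`, and `sketchScenario`, the `path` and `htmlContent` fields of `htmlRoute`, and the `click_element` tool name are all hypothetical. Only the `name` and `args` fields of a call are inferred from the example in this section.

```ts
// Hypothetical stand-ins for the real types; the field shapes are assumptions.
interface SketchToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface SketchScenario {
  prompt: string;
  maxTurns: number;
  expectations: (calls: SketchToolCall[]) => void;
  // Assumed shape: the path to serve and the HTML to return for it.
  htmlRoute?: {path: string; htmlContent: string};
}

// In a real scenario file this would be `export const scenario: TestScenario`.
const sketchScenario: SketchScenario = {
  prompt: 'Open the test page and click the "Submit" button',
  maxTurns: 3,
  htmlRoute: {
    path: '/form.html',
    htmlContent: '<button id="submit">Submit</button>',
  },
  expectations: calls => {
    // 'click_element' is a made-up tool name used only for illustration.
    const click = calls.find(c => c.name === 'click_element');
    if (!click) throw new Error('Model did not click anything');
    // Check the core parameter (the selector), not any free-form text.
    if (!String(click.args.selector).includes('submit')) {
      throw new Error(`Unexpected selector: ${String(click.args.selector)}`);
    }
  },
};
```

The `expectations` function throws on failure and returns nothing on success, which matches the pattern used by the example in this section.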

The goal is to check that the model uses the tools correctly without overly rigid assertions. Avoid asserting exact argument values that can vary (e.g., natural-language reasoning), but do verify that the core parameters (such as URLs or selectors) are correct.

Example:

```ts
import {TestScenario} from '../eval_gemini.js';

export const scenario: TestScenario = {
  prompt: 'Navigate to example.com',
  maxTurns: 2,
  expectations: calls => {
    // Check that at least one call was 'browse_page'
    const navigation = calls.find(c => c.name === 'browse_page');
    if (!navigation) throw new Error('Model did not browse the page');
    // Verify essential args
    if (navigation.args.url !== 'http://example.com') {
      throw new Error(`Wrong URL: ${navigation.args.url}`);
    }
  },
};
```
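The strict string comparison in the example above would reject `https://example.com/` or `www.example.com`, which a model might reasonably produce. One possible way to keep the URL check meaningful but less brittle is to compare hostnames instead. This is a minimal sketch under assumptions: the `ToolCall` interface and the helpers `urlPointsToHost` and `expectNavigationTo` are hypothetical names introduced here, and the call shape (`name`, `args`) is inferred from the example rather than taken from the real types.

```ts
// Hypothetical shape of a recorded tool call, inferred from the example above.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// Returns true when `value` is a URL whose hostname is `expectedHost`,
// ignoring the scheme, a leading "www.", the port, and the path.
function urlPointsToHost(value: unknown, expectedHost: string): boolean {
  if (typeof value !== 'string') return false;
  // Models sometimes omit the scheme; assume https so the URL parser accepts it.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(value);
  let parsed: URL;
  try {
    parsed = new URL(hasScheme ? value : `https://${value}`);
  } catch {
    return false;
  }
  return parsed.hostname.replace(/^www\./, '') === expectedHost;
}

// Throws if no 'browse_page' call was made or if it targets another host.
function expectNavigationTo(calls: ToolCall[], host: string): void {
  const navigation = calls.find(c => c.name === 'browse_page');
  if (!navigation) throw new Error('Model did not browse the page');
  if (!urlPointsToHost(navigation.args.url, host)) {
    throw new Error(`Wrong URL: ${String(navigation.args.url)}`);
  }
}
```

With helpers like these, a scenario's `expectations` could be written as `calls => expectNavigationTo(calls, 'example.com')`. Comparing parsed hostnames rather than substrings avoids accepting a URL that merely mentions the expected host in its query string.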
