Notion MCP Server OpenAPI Schema Incomplete #249

@callanwu

🐛 Bug Description

Summary

The @notionhq/notion-mcp-server@1.9.1 package exposes tool schemas derived from an incomplete notion-openapi.json. The MCP server itself performs no schema validation (arguments are passed straight through to the Notion REST API), but the restrictive schemas rendered into the prompt systematically penalize models that strictly follow declared tool schemas, while models that ignore schema constraints bypass the limitations and succeed.

This means MCPMark Notion benchmark scores partly measure a model's willingness to violate tool schemas, rather than its actual task-solving capability.

Affected Component

src/agents/react_agent.py → _render_tools_description() (line ~449)
src/agents/base_agent.py → _create_stdio_server() (line ~173)

The tool schemas from mcp_server.list_tools() are rendered verbatim into the prompt without any correction.
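
As an illustration of what a correction step before rendering could look like, here is a minimal sketch. The helper name `relax_schema` is hypothetical and not part of MCPMark or the Notion MCP server. It only strips `additionalProperties: false` and would not repair the narrow `enum`, `required`, or `type` declarations listed below.

```python
from typing import Any


def relax_schema(schema: Any) -> Any:
    """Return a copy of a JSON-schema-like structure without `additionalProperties: false`.

    Hypothetical helper: it walks dicts and lists recursively and leaves
    every other keyword (enum, required, type, ...) untouched.
    """
    if isinstance(schema, dict):
        return {
            key: relax_schema(value)
            for key, value in schema.items()
            # Drop only the restrictive form; `additionalProperties: {...}` is kept.
            if not (key == "additionalProperties" and value is False)
        }
    if isinstance(schema, list):
        return [relax_schema(item) for item in schema]
    return schema  # scalars (str, int, bool, None) are immutable, so reuse them


# Assumed shape of a tool input schema, for demonstration only.
declared = {
    "type": "object",
    "additionalProperties": False,
    "properties": {
        "children": {
            "type": "array",
            "items": {"type": "object", "additionalProperties": False},
        }
    },
}
relaxed = relax_schema(declared)
```

The original `declared` dict is left unmodified, so the verbatim schema would still be available for comparison or logging.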

Root Cause

The upstream @notionhq/notion-mcp-server package uses a notion-openapi.json that only partially describes the Notion API. The MCP server's architecture (parser.ts → proxy.ts → http-client.ts) faithfully converts this incomplete spec into tool schemas but never validates arguments against them; it just forwards everything to the Notion REST API.

Specific Schema Issues

Issue 1 (Critical): API-patch-block-children - only 2 of 25+ block types declared

Schema says:

{
  "children.items.properties.type": {
    "type": "string",
    "enum": ["paragraph", "bulleted_list_item"]
  },
  "children.items.additionalProperties": false
}

API actually accepts: heading_1, heading_2, heading_3, to_do, toggle, callout, quote, divider, table, column_list, code, equation, bookmark, numbered_list_item, table_of_contents, breadcrumb, synced_block, image, video, file, pdf, audio, etc.

Also: nested rich_text items have additionalProperties: false, blocking annotations (bold, italic, color, etc.)

Impact: Models following the schema cannot create headings, callouts, dividers, toggles, or any formatted text.
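
To make the mismatch concrete, the sketch below builds a `heading_2` block with a bold annotation in the Notion block format and checks it against the two-value enum quoted above. The helper `violates_declared_enum` and the sample text are illustrative assumptions, not code from either project.

```python
# Enum values copied from the declared tool schema quoted above.
DECLARED_BLOCK_TYPES = {"paragraph", "bulleted_list_item"}

# A block the schema does not declare: a heading with bold text.
heading_block = {
    "object": "block",
    "type": "heading_2",
    "heading_2": {
        "rich_text": [
            {
                "type": "text",
                "text": {"content": "Release notes"},
                # `annotations` is also blocked by additionalProperties: false
                "annotations": {"bold": True},
            }
        ]
    },
}


def violates_declared_enum(block: dict) -> bool:
    """True when the block's type is outside the enum the tool schema declares."""
    return block.get("type") not in DECLARED_BLOCK_TYPES


print(violates_declared_enum(heading_block))  # True
```

A schema-following model would treat this block as invalid and never send it, even though the server would forward it unchanged.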

Issue 2 (Critical): API-post-page children items typed as string

Schema says:

{ "children.items": { "type": "string" } }

API actually accepts: Block objects (same as patch-block-children)

Impact: Models serialize block objects as JSON strings → the Notion API rejects the request with "body.children[0] should be an object, instead was string". This causes 100% failure for all API-post-page calls with inline children.
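
The difference between the two request shapes can be sketched as follows. The bodies are trimmed to the fields relevant here (other required fields such as `properties` are omitted), the UUID is a placeholder, and `build_body` and `children_are_objects` are hypothetical helpers.

```python
import json

paragraph_block = {
    "object": "block",
    "type": "paragraph",
    "paragraph": {"rich_text": [{"type": "text", "text": {"content": "Hello"}}]},
}


def build_body(children: list) -> dict:
    """Trimmed API-post-page body; `<page-uuid>` is a placeholder."""
    return {"parent": {"page_id": "<page-uuid>"}, "children": children}


# What the declared schema asks for: each child is a string.
schema_compliant_body = build_body([json.dumps(paragraph_block)])

# What the error message quoted above asks for: each child is an object.
api_compatible_body = build_body([paragraph_block])


def children_are_objects(body: dict) -> bool:
    """True when every child is a dict, i.e. would serialize as a JSON object."""
    return all(isinstance(child, dict) for child in body["children"])


print(children_are_objects(schema_compliant_body))  # False
print(children_are_objects(api_compatible_body))    # True
```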

Issue 3 (High): API-post-page parent requires page_id, doesn't declare database_id

Schema says:

{ "parent.required": ["page_id"] }

No database_id property is declared.

API actually accepts: { "database_id": "<uuid>" } (without page_id) for creating database entries.

Impact: Schema-compliant models cannot add entries to databases. They try dozens of page_id permutations (dummy UUIDs, empty strings, etc.) and exhaust their turn budget.
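
A small check shows why a database parent can never satisfy the declared schema. The nested form of the schema and the `page_id` property type are assumptions reconstructed from the dotted path above, and `missing_required` is a hypothetical helper.

```python
# Assumed nested form of the declared `parent` schema.
DECLARED_PARENT_SCHEMA = {
    "type": "object",
    "required": ["page_id"],
    "properties": {"page_id": {"type": "string"}},
}


def missing_required(schema: dict, value: dict) -> list:
    """List the keys the schema requires that are absent from `value`."""
    return [key for key in schema.get("required", []) if key not in value]


# Parent shape for creating a database entry; the UUID is a placeholder.
database_parent = {"database_id": "<database-uuid>"}

print(missing_required(DECLARED_PARENT_SCHEMA, database_parent))  # ['page_id']
```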

Issue 4 (High): API-post-page / API-patch-page properties - additionalProperties: false

Schema says:

{
  "properties.additionalProperties": false,
  "properties.properties": { "title": {...}, "type": {...} }
}

API actually accepts: Any property name/type (select, number, rich_text, date, checkbox, url, formula, relation, etc.)

Impact: Models can only set title properties. Cannot populate any custom database columns.
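
As a rough illustration, the snippet below lists which keys of a typical database row fall outside the two declared property keys. The column names `Name`, `Status`, and `Score` and their values are made-up examples, not taken from the benchmark.

```python
# Keys declared under `properties.properties` in the tool schema quoted above.
DECLARED_PROPERTY_KEYS = {"title", "type"}

# Hypothetical row for a database with three custom columns.
row_properties = {
    "Name": {"title": [{"text": {"content": "Task A"}}]},
    "Status": {"select": {"name": "Done"}},
    "Score": {"number": 42},
}

# With additionalProperties: false, every key not declared is a violation.
undeclared = sorted(set(row_properties) - DECLARED_PROPERTY_KEYS)
print(undeclared)  # ['Name', 'Score', 'Status']
```

Even the title column is rejected here, because a database's title property is addressed by its column name rather than by the literal key `title`.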

Issue 5 (Moderate): API-create-a-database properties - oneOf only allows title

Schema says:

{
  "properties.additionalProperties.oneOf": [{
    "required": ["title"],
    "additionalProperties": false
  }]
}

API actually accepts: Any property type: title, rich_text, number, select, multi_select, date, checkbox, url, email, phone_number, formula, relation, rollup, status, etc.

Impact: Models following the schema can only create databases with title-type properties.
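
The sketch below applies the declared `oneOf` branch to a small database definition. It assumes `title` is the only key that branch declares, so an object passes only when its key set is exactly `{"title"}`. The column names and `allowed_by_declared_oneof` are hypothetical.

```python
def allowed_by_declared_oneof(prop: dict) -> bool:
    """True when the property has exactly the `title` key.

    Models `required: ["title"]` plus `additionalProperties: false`,
    assuming `title` is the only key the branch declares.
    """
    return set(prop) == {"title"}


# Hypothetical database definition using several property types.
database_properties = {
    "Name": {"title": {}},
    "Priority": {"select": {"options": [{"name": "High"}]}},
    "Due": {"date": {}},
    "Done": {"checkbox": {}},
}

rejected = [
    name
    for name, prop in database_properties.items()
    if not allowed_by_declared_oneof(prop)
]
print(rejected)  # ['Priority', 'Due', 'Done']
```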

Impact on Benchmark Fairness

Until the schemas are corrected, the Notion benchmark conflates two unrelated capabilities:

  • Task-solving ability (understanding Notion structures, computing correct data, etc.)
  • Schema violation willingness (ignoring additionalProperties: false, required, and enum constraints)

Models that are more instruction-following (treating tool schemas as contracts) are systematically penalized, while models that treat schemas as advisory guidance are rewarded. This undermines the benchmark's ability to measure actual MCP tool-use competence.

📷 Recurrence Steps

No response

🚦 Expected Behavior

No response

📝 Additional Information

No response
