Commit bfb3705

docs(positioning): ADR-001 — Reposition FirstData as External Facts Context Layer
Proposes repositioning from '数据源知识库 / Open Data Source Repository / knowledge base' to 'The External Facts Context Layer for AI Agents'.

Context:
- DataHub declared 'data catalog' category dead (2026-04-30 blog)
- OpenMetadata overtook DataHub on GitHub stars via MCP narrative
- Standalone MCP-only repos fail to pull weight (165-1728x gap)

Scope lock v3 (authoritative, 2026-05-07 02:23 GMT+8):
  hits = 23
  CHANGE = 22
  KEEP = 1 (ja:592, business-process wording)
  files = 8
  base = bad4772

This commit contains ONLY the ADR + index + rollout tracker. The 22 copy edits land in a follow-up PR-1 commit on the same branch.

Deciders: @ningzimu (rollback owner), @墨子 (proposer), @明察 + @明鉴 (reviewers)

Refs:
- memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md
- memory/reflections/2026-05-07-enumeration-discipline.md

Anti-patterns sunk during this scope lock:
- MLT-OSS#29 BB: Cross-language-self-title-blindspot
- MLT-OSS#30 CC: Memory-Ground-Truth-Drift

NEVER 'gh pr merge --admin' - Order-44 applies.
1 parent e6c6436 commit bfb3705

3 files changed

Lines changed: 312 additions & 0 deletions

Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
# ADR-001: Reposition FirstData as "The External Facts Context Layer for AI Agents"

- **Status**: Proposed
- **Date**: 2026-05-07
- **Deciders**: @ningzimu (owner), @墨子 (AI-0000001, proposer), @明察 (AI-0000002, reviewer), @明鉴 (AI-0000003, reviewer)
- **Rollback Owner**: @ningzimu
- **Scope lock**: v3 — 23 hits / 22 CHANGE + 1 KEEP (ja:592) / 8 files / base `bad47726fc50a3c7c69aaab1fae64286cb44350b`
- **Supersedes**: N/A (first positioning ADR)

---

## 1. Context

FirstData has described itself as a **"数据源知识库 / Open Data Source Repository / knowledge base"** across `README.md`, `README.en.md`, `README.ja.md`, `pyproject.toml`, `AGENTS.md`, `CLAUDE.md`, `skills/firstdata/SKILL.md`, and `firstdata/sources/china/README.md` since 2026-03.

Three external forces in 2026-04 → 2026-05 invalidate the "data source repository" category framing:

1. **DataHub declared the "data catalog" category dead** in its 2026-04-30 blog *Context Platform vs. Data Catalog*, rebranding itself as a "Context Platform" and coining *Agent Context Kit* to occupy the "Agent brain" mindshare. Source: `memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md`.
2. **OpenMetadata overtook DataHub on GitHub stars** (13,816 vs 11,874 as of 2026-05-06) after embedding an MCP server in v1.8.0 (2025-06) and narrating itself as "the first enterprise-grade MCP data platform".
3. **Standalone MCP-only repos failed to pull weight** (`acryldata/mcp-server-datahub` = 72⭐, `metadata-ai-sdk` = 8⭐, `okfn/mcp-ckan` = 0⭐; 165–1728× gap vs parent repo). The category fight is decided by **narrative** on the parent repo, not by an accessory MCP repo.
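
The gap figures in item 3 can be roughly reproduced from the star counts quoted in items 2 and 3. The sketch below is a plain arithmetic check, not part of the repository. The pairing of each standalone repo with a parent project is an assumption read from the repo names, and `okfn/mcp-ckan` is left out because a zero star count gives no finite ratio. With these inputs the upper ratio comes out at 1727, marginally below the quoted 1728, which may simply reflect a slightly different snapshot.

```python
# Star counts as quoted in the ADR (snapshot 2026-05-06 / 2026-05-07).
PARENT_STARS = {"DataHub": 11_874, "OpenMetadata": 13_816}

# (standalone MCP repo, stars, assumed parent project)
STANDALONE = [
    ("acryldata/mcp-server-datahub", 72, "DataHub"),
    ("metadata-ai-sdk", 8, "OpenMetadata"),  # parent pairing is an assumption
]


def star_gap(parent_stars: int, child_stars: int) -> float:
    """Return how many times more stars the parent repo has than the child."""
    if child_stars <= 0:
        raise ValueError("ratio is undefined for a repo with zero stars")
    return parent_stars / child_stars


gaps = {
    name: star_gap(PARENT_STARS[parent], stars)
    for name, stars, parent in STANDALONE
}

for name, gap in gaps.items():
    print(f"{name}: {gap:.1f}x")
```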

Meanwhile, competitor watch (see R14 Step 1 CDN distribution report) shows FirstData's MCP endpoint `firstdata.deepminer.com.cn/mcp` is the project's only user-facing surface, and "data source repository" framing **places FirstData in a category DataHub is actively devaluing**.

### What FirstData actually is, stripped of legacy wording

- 494 (actively expanding toward 1000+) authoritative, curated, structured external data sources
- Delivered as **context into agent loops** via MCP (+ JSON schema + ask_agent)
- Designed for the *external facts* half of an agent's context (DataHub/OpenMetadata/CKAN cover the *internal enterprise metadata* half)

The correct positioning is therefore **complementary** to DataHub's "Context Platform" land-grab rather than competitive: FirstData carves out a purpose-built, non-overlapping slot.

## 2. Decision

**FirstData is repositioned from "Open Data Source Repository / 数据源知识库 / knowledge base" to:**

> ### The External Facts Context Layer for AI Agents
>
> *Purpose-built, authoritative, structured data sources — delivered as context into every agent loop via MCP.*

**Why this exact phrasing (not alternatives)**:

- `External Facts` anchors the **non-overlap** with DataHub/OpenMetadata/CKAN, which cover *internal enterprise metadata*. "External" is the disambiguator DataHub cannot claim.
- `Context Layer` (not "Context Platform") explicitly avoids the word **Acryl/DataHub are trying to consolidate**. We ride the Context Engineering wave, but stay a **layer** (a component), not a **platform** (a competitor).
- `for AI Agents` fixes the end-user from day 1, closing the door to "BI analyst" / "data scientist" persona drift.
- `Purpose-built` (replacing earlier drafts of "Lightweight") signals engineering intent without self-belittling on scope.

**Rejected alternatives** (see §5):

- "Open Data Catalog" — in DataHub's declared-dead category.
- "Context Platform" — consolidation word owned by Acryl; half-life uncertain (see §6).
- "MCP Data Gateway" — over-indexes on one transport; MCP ≠ the product.
- "Agent Knowledge Base" — still category-adjacent to "knowledge base" (the word we are retiring).

### Scope of this ADR

This ADR covers **copy-only** changes in **8 files** (scope lock v3):

| File | CHANGE | KEEP |
|---|---|---|
| `README.md` | 7 | 0 |
| `README.en.md` | 4 | 0 |
| `README.ja.md` | 5 | 1 (L592, business-process wording) |
| `pyproject.toml` | 1 | 0 |
| `AGENTS.md` | 1 | 0 |
| `CLAUDE.md` | 1 | 0 |
| `skills/firstdata/SKILL.md` | 2 | 0 |
| `firstdata/sources/china/README.md` | 1 | 0 |
| **Total** | **22** | **1** |

This ADR does **NOT** change:

- Any file under `sources/**/*.json` (frozen by contract)
- Any file under `firstdata/indexes/*.json` (build artefacts)
- The MCP server name (`firstdata` — frozen; server-name change requires a 2-week ChangeLog + email notice)
- The HTTP endpoint (`https://firstdata.deepminer.com.cn/mcp`)
- The GitHub repo name (`MLT-OSS/FirstData`)
- The ClawHub skill slug (`firstdata`)

## 3. Rollout Plan

This ADR is delivered across **4 PRs**. Proposer: @墨子. Reviewers: @明察 + @明鉴. Merge method: **never `gh pr merge --admin`**.

| # | Branch | Scope | Gate |
|---|---|---|---|
| PR-A | `feat/positioning-adr-001` (this) | ADR-001 (this file) + `docs/adr/README.md` index + tracker only | reviewer matrix × 2 |
| PR-1 | same branch, later commit | 22 CHANGE copy edits across 8 files (the 1 KEEP line stays untouched) | `scripts/check-positioning-consistency.sh` CHANGE == 0 |
| PR-2 | `feat/positioning-tooling` | `scripts/check-positioning-consistency.sh` + `.pre-commit-config.yaml` | local `pre-commit run --all-files` clean |
| PR-3 | `feat/positioning-ci` | `.github/workflows/positioning-check.yml` | CI green on main |

**Tolerance window**: 3–7 days (data-backed, see §5) before CKAN MCP space closes. @ningzimu to decide final number; ClawHub `installsAllTime=0` means no downstream cache to thrash (明察 ClawHub API snapshot, msg `1501661431802888405`).

## 4. Consequences

### Positive

- Exits the "data catalog" category DataHub is devaluing.
- Occupies **"External Facts Context Layer"** — a word-pair not yet claimed by any competitor (as of 2026-05-07 snapshot).
- Prepares for the 6–12 month CKAN MCP window targeted by P1 (`firstdata-ckan-plugin`).
- All four parties (proposer + 2 reviewers + owner) agree on scope lock v3, so there is no hidden disagreement at merge time.

### Negative

- **Category education cost**: "Context Layer" is less searchable than "data catalog" today; offset by §5 P2 blog matrix.
- **Old user confusion** during the 3–7 day window; mitigated by `installsAllTime=0` on ClawHub and by the Draft PR halt clause (see §7).
- **Reversibility cost**: rollback requires a second PR touching the same 8 files. Captured under §7.

### Neutral

- The MCP server name is **not changed** in this ADR. Any future rename enters a separate ADR-002 with a 2-week ChangeLog + email notice.

## 5. Alternatives Considered

### 5a. "Open Data Catalog for AI Agents"

Rejected. DataHub's 2026-04-30 post *Context Platform vs. Data Catalog* explicitly declares the "data catalog" category dead. Adopting this framing now means entering a category that DataHub (11.8K⭐, Series funded) and OpenMetadata (13.8K⭐) are both abandoning in their narratives. **Downside > upside**.

### 5b. "Context Platform for External Data"

Rejected. "Context Platform" is the consolidation word **Acryl is actively buying up**. Using it makes FirstData a clone of DataHub's pivot, not a disambiguation. The half-life of "Context Platform" as a term is **itself uncertain**: if it deflates, we burn with it (see §6).

### 5c. "MCP Data Gateway"

Rejected. Over-indexes on one transport. The MCP number wars (`110M` tool calls, "MCP is dead" / Durable Agent terminal form discourse from 2026-04-22 trend scan) warn that **MCP itself may not be the final transport**. The product is authoritative *data*, not *MCP*.

### 5d. "Agent Knowledge Base"

Rejected. Still adjacent to "knowledge base" — the exact word we are retiring from 23 hits across 8 files. Would also collide with the embedding-retrieval "knowledge base" meaning (OpenAI Assistants File Search, etc.), which is **different** from curated authoritative data sources.

### 5e. Do nothing

Rejected. Competitor watch shows the window is closing (DataHub already moved, OpenMetadata already moved, CKAN next to move in 6–12 months). Static positioning = silent irrelevance.

## 6. Risks

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| "Context Platform" narrative collapses in <12 months | Medium | Low | We positioned as Context **Layer**, not **Platform** — decoupled from Acryl's fortune. Revision cost: 1 ADR. |
| External readers confuse "Context Layer" with vector DB / embedding store | Medium | Medium | Tagline explicit: "authoritative, structured data sources" — never "unstructured documents / embeddings / chunks". |
| Old ClawHub users (n=0 installs) affected | Very Low | None | `installsAllTime=0` per 明察 ClawHub API snapshot 2026-05-07. |
| Regression: someone PR-merges "knowledge base" again post-rename | Medium | Low | `scripts/check-positioning-consistency.sh` + pre-commit (PR-2) + CI gate (PR-3). |
| Scope creep re-opens during PR-1 review | Medium | High | v3 scope frozen by three-party ack on 2026-05-07 02:23 GMT+8; script v7 wide vs narrow debate archived as review-gate tool only, does NOT reopen main scope (anti-pattern #30 CC defense). |

## 7. Rollback Plan

**Owner**: @ningzimu (no other party may unilaterally roll back)

**Trigger conditions** (any one):

1. Three separate external readers (non-MLT, non-Discord) report category confusion within 14 days of PR-1 merge
2. "Context Layer" term contaminated by an unrelated product launch before 2026-06-30
3. @ningzimu direct call
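
The three triggers form an any-of rule, which can be written down as a small predicate. This is only an illustrative sketch: the `RollbackSignals` fields and the function name are hypothetical, the 14-day window is assumed to be inclusive, and "before 2026-06-30" is read as strictly earlier. The result would only inform @ningzimu, who alone initiates a rollback.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import List, Optional

CONTAMINATION_DEADLINE = date(2026, 6, 30)  # from trigger 2
REPORT_WINDOW = timedelta(days=14)          # from trigger 1
REPORT_THRESHOLD = 3                        # from trigger 1


@dataclass
class RollbackSignals:
    pr1_merged_on: date
    # One date per distinct external reader who reported category confusion.
    confusion_reports: List[date] = field(default_factory=list)
    # Date an unrelated "Context Layer" product launch was observed, if any.
    contamination_on: Optional[date] = None
    owner_call: bool = False


def should_roll_back(s: RollbackSignals) -> bool:
    """Return True if any one of the three trigger conditions holds."""
    window_end = s.pr1_merged_on + REPORT_WINDOW
    in_window = [d for d in s.confusion_reports if s.pr1_merged_on <= d <= window_end]
    contaminated = (
        s.contamination_on is not None and s.contamination_on < CONTAMINATION_DEADLINE
    )
    return len(in_window) >= REPORT_THRESHOLD or contaminated or s.owner_call
```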

**Procedure**:

```bash
# For true merge commits add `-m 1` (mainline parent); squash merges revert without it.
git revert <pr-1-merge-commit>
git revert <pr-a-merge-commit>   # this ADR becomes "Rejected" with dated note
```

**Cost estimate**: ≤ 30 min mechanical revert + 0.25 person-day of comms to update ClawHub listing.

## 8. Method & Verification

### 8.1 Enumeration method (how we got to 23 hits)

The 8-file / 23-hit / 22 CHANGE + 1 KEEP figure is the three-party locked **v3** scope from 2026-05-07 02:23 GMT+8 (see `memory/reflections/2026-05-07-enumeration-discipline.md`). The authoritative script is maintained by @明察 on the PR-2 branch.

> **Anti-pattern #30 (CC: Memory-Ground-Truth-Drift)** fired during this ADR's preparation. Local `v7 wide` reproduction yielded 25 hits (+en:7 subtitle, +KEEP hardcoding), which **tempted** the proposer to override the authoritative scope. Defense: the proposer's local `exec` output is a **challenge signal**, not an override right; authority rests with the reviewer script. See PR-2 (§3) for the eventual reconciliation.

### 8.2 Byte-level verification

- Base commit: `bad47726fc50a3c7c69aaab1fae64286cb44350b` (all three parties executed scripts against the same tree)
- Proposer independent grep (regex v1.1 narrow): 23 hits, sha256 match with reviewer authoritative output
- Reviewer independent exec (msg `1501649361`): byte-identical
- Third-party independent exec (明鉴 v7 wide local): 25 hits; delta (+2) traced to en:7 subtitle + en:592/ja:592 KEEP whitelisting; all delta items captured in §2 scope table or archived as review-gate-only.
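
A sha256 match between independent runs only means something if every party serialises its hit list the same way. The sketch below shows one way that comparison could work; the `file:line` canonical form, the `digest` helper, and the sample hit lists are assumptions made for illustration, not the format the reviewers actually used.

```python
import hashlib
from typing import List, Tuple

Hit = Tuple[str, int]  # (file path, 1-based line number)


def digest(hits: List[Hit]) -> str:
    """Hash a hit list in a canonical form so independent runs can be compared."""
    # Sorting and de-duplicating removes differences caused by scan order.
    canonical = "".join(f"{path}:{line}\n" for path, line in sorted(set(hits)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


proposer = [("README.md", 7), ("AGENTS.md", 7), ("CLAUDE.md", 7)]
reviewer = [("CLAUDE.md", 7), ("README.md", 7), ("AGENTS.md", 7)]  # same hits, other order
wide = reviewer + [("README.en.md", 592)]  # one extra hit, as a wider regex might add

print(digest(proposer) == digest(reviewer))  # True
print(digest(proposer) == digest(wide))      # False
```

A mismatch under this scheme says only that the two hit sets differ; tracing the delta to specific lines, as done for the 25-hit run, still needs a line-by-line diff.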

### 8.3 Merge gate

The PR-1 branch merges only when:

1. `scripts/check-positioning-consistency.sh` returns `CHANGE == 0` on HEAD
2. Byte-level diff against v3 lock matches file-line enumeration
3. Two reviewer approvals from @明察 + @明鉴 (no admin merge — **Order-44** applies)
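
Gate 1 depends on a script that is not part of this commit. As a rough illustration of what a `CHANGE == 0` check could look like, the sketch below scans file contents for legacy wording and classifies each hit. The pattern is assembled from phrases that appear in the rollout tracker and is only a stand-in for the authoritative regex held by @明察; `KEEP_WHITELIST`, `scan`, and `change_count` are hypothetical names.

```python
import re
from typing import Dict, List, NamedTuple

# Hypothetical pattern built from phrases in the tracker excerpts;
# the authoritative regex lives in PR-2 and may differ.
LEGACY = re.compile(
    r"数据源知识库|Open Data Source Repository|knowledge base"
    r"|ナレッジベース|データソースリポジトリ",
    re.IGNORECASE,
)

# (file, 1-based line) pairs allowed to keep legacy wording.
KEEP_WHITELIST = {("README.ja.md", 592)}


class Hit(NamedTuple):
    file: str
    line: int
    action: str  # "CHANGE" or "KEEP"


def scan(files: Dict[str, str]) -> List[Hit]:
    """Return one Hit per line that still contains legacy wording."""
    hits = []
    for name, text in sorted(files.items()):
        for lineno, line in enumerate(text.splitlines(), start=1):
            if LEGACY.search(line):
                action = "KEEP" if (name, lineno) in KEEP_WHITELIST else "CHANGE"
                hits.append(Hit(name, lineno, action))
    return hits


def change_count(files: Dict[str, str]) -> int:
    """The gate passes when this returns 0."""
    return sum(1 for h in scan(files) if h.action == "CHANGE")


demo = {
    "AGENTS.md": "FirstData\n数据源知识库\n",
    "CLAUDE.md": "The External Facts Context Layer for AI Agents\n",
}
print(change_count(demo))  # 1
```

Keying the whitelist on file and line keeps a KEEP decision from silently covering new occurrences elsewhere, at the cost of needing an update whenever the whitelisted file is re-flowed.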

## 9. Reviewers & Acknowledgements

- **@明察** (AI-0000002): SOP-7 adjudication, authoritative regex & script, ClawHub API snapshot
- **@明鉴** (AI-0000003): methodology audit, anti-pattern sinking (#29 BB, #30 CC), reviewer matrix design
- **@ningzimu**: rollback owner, final merge authority, category word arbiter

Three-party scope lock v3 confirmed at **2026-05-07 02:23 GMT+8 (UTC 2026-05-06 18:23)**, re-confirmed after v4/v8/v9/v10 override attempts were unanimously withdrawn by 03:24 GMT+8.

## 10. References

- Competitor watch: `memory/growth-studies/2026-05-07-competitor-watch-data-catalog-ai-pivot.md`
- Enumeration discipline: `memory/reflections/2026-05-07-enumeration-discipline.md`
- SOP: `docs/conventions.md` (anti-patterns #1–#30)
- R14 CDN distribution: `docs/verification/cdn-distribution-r14.md`
- Base commit: `bad47726fc50a3c7c69aaab1fae64286cb44350b`
- Authoritative script (PR-2): `scripts/check-positioning-consistency.sh`
- Lock-time: 2026-05-07 02:23 GMT+8 (UTC 2026-05-06 18:23)

docs/adr/README.md

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
# Architecture Decision Records (ADR)

This directory captures architectural / strategic decisions for FirstData. We use ADRs for choices that would otherwise be lost in chat — category positioning, protocol boundaries, migration plans, rollback owners, and any decision whose reversal cost is > 1 person-day.

## Conventions

- **File name**: `ADR-<NNN>-<kebab-case-title>.md`
- **Status values**: `Proposed` → `Accepted` → (`Deprecated` | `Superseded by ADR-<NNN>` | `Rejected`)
- **Status transitions are commit-visible**: change the `Status:` field in a dated follow-up commit; never rewrite history.
- **Scope**: one ADR per decision. Do not bundle unrelated decisions for convenience.
- **Reviewers**: ADRs touching public positioning / protocol / rollback must be reviewed by **at least two** non-proposer parties.
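
The file-name and status conventions are mechanical enough to lint. The following is a minimal sketch under two assumptions: `<NNN>` is exactly three digits (as in `ADR-001`), and the status chain is read literally, so `Proposed` can only move to `Accepted`. Whether a direct `Proposed` to `Rejected` move is permitted is not stated here, so it is left out. The helper names are hypothetical.

```python
import re

# Assumes <NNN> is exactly three digits, as in ADR-001.
FILENAME = re.compile(r"ADR-\d{3}-[a-z0-9]+(?:-[a-z0-9]+)*\.md")
SUPERSEDED = re.compile(r"Superseded by ADR-\d{3}")

# Allowed moves, read literally from the status convention.
TRANSITIONS = {
    "Proposed": {"Accepted"},
    "Accepted": {"Deprecated", "Superseded", "Rejected"},
}


def valid_filename(name: str) -> bool:
    """True if the name follows ADR-<NNN>-<kebab-case-title>.md."""
    return FILENAME.fullmatch(name) is not None


def can_transition(old: str, new: str) -> bool:
    """True if moving from status `old` to status `new` is allowed."""
    if new == "Superseded":
        return False  # a superseding ADR id is required
    key = "Superseded" if SUPERSEDED.fullmatch(new) else new
    return key in TRANSITIONS.get(old, set())


print(valid_filename("ADR-001-positioning-context-layer.md"))  # True
print(can_transition("Proposed", "Accepted"))                  # True
print(can_transition("Accepted", "Proposed"))                  # False
```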

## Index

| ID | Status | Title | Date |
|---|---|---|---|
| [ADR-001](./ADR-001-positioning-context-layer.md) | Proposed | Reposition FirstData as "The External Facts Context Layer for AI Agents" | 2026-05-07 |

## Workflow

1. Proposer copies the template (or an existing ADR) into a branch `feat/adr-<NNN>-<slug>`.
2. Proposer opens a Draft PR against `main` with the ADR file only (content changes land in follow-up PRs).
3. Reviewers leave inline comments; any `Deciders` line change requires a new commit.
4. When all listed `Deciders` approve, proposer flips `Status: Proposed` → `Status: Accepted` in a follow-up commit and drops the Draft flag.
5. Follow-up implementation PRs reference the ADR ID in their description.

## Rollback

Every ADR that can be reverted must have a `Rollback Plan` section that names a **single** rollback owner. No party other than the rollback owner may initiate a revert.
Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
# Positioning Rollout Tracker

> Living companion to `docs/adr/ADR-001-positioning-context-layer.md`.
> Edits merge into `main` only via reviewed PRs; no direct pushes.

## Scope Lock v3 (authoritative)

- **Locked**: 2026-05-07 02:23 GMT+8 (UTC 2026-05-06 18:23)
- **Re-confirmed**: 2026-05-07 03:24 GMT+8 (after v4/v8/v9/v10 override attempts withdrawn)
- **Base commit**: `bad47726fc50a3c7c69aaab1fae64286cb44350b`
- **Authoritative regex**: held by @明察 in PR-2's `scripts/check-positioning-consistency.sh`
- **Totals**: 23 hits / 22 CHANGE + 1 KEEP / 8 files

## Per-file breakdown (v3)

| File | Line | Content (excerpt) | Action |
|---|---|---|---|
| `README.md` | 7 | 全球最全面、最权威、最结构化的开源数据源知识库 | CHANGE |
| `README.md` | 9 | 全球最全面的权威数据源知识库 | CHANGE |
| `README.md` | 11 | Structured Open Data Source Repository | CHANGE |
| `README.md` | 32 | 权威数据源知识库 | CHANGE |
| `README.md` | 68 | Primary Sources knowledge | CHANGE |
| `README.md` | 148 | 结构化数据源知识库 | CHANGE |
| `README.md` | 150 | Structured 数据源知识库 | CHANGE |
| `README.en.md` | 7 | (subtitle) Open Data Source Repository — Agent First | CHANGE |
| `README.en.md` | 30 | authoritative knowledge base | CHANGE |
| `README.en.md` | 66 | primary-sources knowledge base | CHANGE |
| `README.en.md` | 146 | structured knowledge base | CHANGE |
| `README.ja.md` | 7 | オープンデータソースリポジトリ — Agent First | CHANGE |
| `README.ja.md` | 30 | 権威的ナレッジベース | CHANGE |
| `README.ja.md` | 66 | 一次情報ナレッジベース | CHANGE |
| `README.ja.md` | 146 | 構造化ナレッジベース | CHANGE |
| `README.ja.md` | 148 | 構造化データソースナレッジベース | CHANGE |
| `README.ja.md` | 592 | 公式にデータソースリポジトリに収録されます | **KEEP** (business-process wording, not category self-title) |
| `pyproject.toml` | 4 | description: "Open Data Source Repository ..." | CHANGE |
| `AGENTS.md` | 7 | 数据源知识库 | CHANGE |
| `CLAUDE.md` | 7 | 数据源知识库 | CHANGE |
| `skills/firstdata/SKILL.md` | 20 | 全球权威数据源知识库 | CHANGE |
| `skills/firstdata/SKILL.md` | 179 | 数据源知识库 | CHANGE |
| `firstdata/sources/china/README.md` | 186 | 中国数据源知识库 | CHANGE |
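
The locked totals can be re-derived from the rows of this table. The snippet below is a bookkeeping sketch that transcribes the file and line columns by hand (the variable names are illustrative) and recomputes hits, CHANGE, KEEP, and file counts.

```python
# File -> line numbers, transcribed from the per-file breakdown table.
CHANGE_LINES = {
    "README.md": [7, 9, 11, 32, 68, 148, 150],
    "README.en.md": [7, 30, 66, 146],
    "README.ja.md": [7, 30, 66, 146, 148],
    "pyproject.toml": [4],
    "AGENTS.md": [7],
    "CLAUDE.md": [7],
    "skills/firstdata/SKILL.md": [20, 179],
    "firstdata/sources/china/README.md": [186],
}
KEEP_LINES = {"README.ja.md": [592]}

change = sum(len(lines) for lines in CHANGE_LINES.values())
keep = sum(len(lines) for lines in KEEP_LINES.values())
hits = change + keep
files = len(set(CHANGE_LINES) | set(KEEP_LINES))

print(hits, change, keep, files)  # 23 22 1 8
```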

## Supersedes chain (for audit)

| Version | Status | Numbers | Source | Retired at |
|---|---|---|---|---|
| v3 | **AUTHORITATIVE** | 23 / 22 / 1 | @明察 SOP-7 adjudication | n/a |
| v4 | withdrawn | 24 / 24 / 0 | @墨子 symmetry-flip over en:592+ja:592 | 2026-05-07 03:05 |
| v7 | withdrawn | 23 / 22 / 1 (same as v3, different lock-time) | prior naming attempt | 2026-05-07 02:40 |
| v8 | withdrawn | 26 / 26 / 0 | @明察 v1.3 regex upgrade proposal | 2026-05-07 03:15 |
| v9 | withdrawn | 25 / 23 / 2 | @明鉴 local v7 wide exec override | 2026-05-07 03:24 |
| v10 | withdrawn | 26 / 23 / 3 | @墨子 compromise proposal (KEEP L592×2 + L593) | 2026-05-07 03:26 |

> All withdrawals are documented with message IDs in `memory/reflections/2026-05-07-enumeration-discipline.md`.

## PR Map

| PR | Branch | Scope | Status |
|---|---|---|---|
| PR-A | `feat/positioning-adr-001` | `docs/adr/ADR-001-*`, `docs/adr/README.md`, this tracker | Draft |
| PR-1 | same branch, later commit | 22 copy edits (CHANGE) across 8 files | Pending PR-A merge |
| PR-2 | `feat/positioning-tooling` | `scripts/check-positioning-consistency.sh`, `.pre-commit-config.yaml` | Pending |
| PR-3 | `feat/positioning-ci` | `.github/workflows/positioning-check.yml` | Pending PR-2 merge |

## Merge gate (applies to every PR above)

1. `scripts/check-positioning-consistency.sh` returns `CHANGE == 0` on HEAD (PR-1/PR-3 only; PR-A has no content diff, PR-2 adds the script)
2. Byte-level diff matches the per-file breakdown above (for PR-1)
3. Two reviewer approvals from @明察 + @明鉴
4. **NEVER `gh pr merge --admin`** — Order-44 applies

## Tolerance window

- **Proposal**: 3–7 days (data-backed by ClawHub `installsAllTime=0`)
- **Decider**: @ningzimu
- **Start**: time of PR-1 merge
- **Exit**: external-facing surfaces (README, ClawHub description, `pyproject.toml`, SKILL.md) all read as "External Facts Context Layer" language

## Defensive artefacts

- `scripts/check-positioning-consistency.sh` (authoritative, PR-2)
- Three-language self-title cross-reference table (enforced by `KEEP_WHITELIST` empty after v3 close)
- Anti-pattern #29 BB (Cross-language-self-title-blindspot) and #30 CC (Memory-Ground-Truth-Drift) both sunk into `docs/conventions.md`
