594 changes: 594 additions & 0 deletions ipfs_datasets_py/core_operations/knowledge_graph_manager.py

Large diffs are not rendered by default.

94 changes: 94 additions & 0 deletions ipfs_datasets_py/knowledge_graphs/CHANGELOG_KNOWLEDGE_GRAPHS.md
@@ -5,6 +5,100 @@ All notable changes to the knowledge_graphs module will be documented in this fi
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [3.22.38] - 2026-02-23

### Added — 2 new MCP server tools for KG analytics and link prediction (Session 84) — 42 tests

**`mcp_server/tools/graph_tools/graph_analytics.py`** (new MCP tool):
- `graph_analytics(kg_data, include_completion_analysis, include_quality_metrics, include_topology, max_completion_suggestions)` — comprehensive KG analytics
- Quality metrics via `KnowledgeGraphExtractor.compute_extraction_quality_metrics()`
- KG completion via `KnowledgeGraphCompleter` (missing relationships, isolated entities)
- Topology: entity/relationship type distributions + degree statistics
- Returns `status / entity_count / relationship_count / quality_metrics / missing_relationships / isolated_entities / topology`

**`mcp_server/tools/graph_tools/graph_link_predict.py`** (new MCP tool):
- `graph_link_predict(entity_a_id, entity_b_id, kg_data, layer_type, top_candidates, top_k)` — GNN link prediction
- Delegates to `GraphNeuralNetworkAdapter.link_prediction_score()`
- Optional top-k ranked candidates via cosine similarity
- Returns `status / score / prediction ("likely"/"unlikely") / top_predictions`

**`core_operations/knowledge_graph_manager.py`** (updated):
- Added `analytics()` — full analytics pipeline
- Added `link_predict()` — link prediction with optional top-k ranking

**`mcp_server/tools/graph_tools/__init__.py`** (updated):
- 22 → 24 tools; `graph_analytics` and `graph_link_predict` added to `__all__`

**`mcp_server/tools/graph_tools/README.md`** (updated):
- 2 new rows for session 84 tools

## [3.22.37] - 2026-02-23

### Added — 3 new MCP server tools for GNN, ZKP, and Federation (Session 83) — 48 tests

**`mcp_server/tools/graph_tools/graph_gnn_embed.py`** (new MCP tool):
- `graph_gnn_embed(kg_data, entity_ids, top_k_similar, layer_type, embedding_dim, num_layers)` — compute GNN node embeddings
- Delegates to `KnowledgeGraphManager.gnn_embed()` → `GraphNeuralNetworkAdapter`
- Supports layer types: `"graph_conv"` / `"graph_sage"` / `"graph_attention"`
- Returns `status / entity_count / embedding_dim / layer_type / embeddings / similar`

**`mcp_server/tools/graph_tools/graph_zkp_prove.py`** (new MCP tool):
- `graph_zkp_prove(proof_type, entity_type, entity_name, ...)` — generate ZK proofs
- Proof types: `entity_exists` / `entity_property` / `path_exists` / `query_answer_count`
- Optional `build_tdfol_witness=True` to also generate TDFOL_v1 witness dict
- Returns `status / proof_type / proof / valid / tdfol_witness`

**`mcp_server/tools/graph_tools/graph_federate_query.py`** (new MCP tool):
- `graph_federate_query(graphs, query_entity_name, resolution_strategy, merge, ...)` — query across federated KGs
- Delegates to `KnowledgeGraphManager.federate_query()` → `FederatedKnowledgeGraph`
- Strategies: `"type_and_name"` / `"exact_name"` / `"property_match"`
- Returns `status / graph_count / entity_matches / query_hits / merged_entity_count`

**`core_operations/knowledge_graph_manager.py`** (updated):
- Added `gnn_embed()` — compute GNN embeddings + similar entities
- Added `zkp_prove()` — generate ZK proofs with optional TDFOL_v1 witness
- Added `federate_query()` — cross-graph entity resolution and query

**`mcp_server/tools/graph_tools/__init__.py`** (updated):
- 19 → 22 tools; `graph_gnn_embed`, `graph_zkp_prove`, `graph_federate_query` added to `__all__`

**`mcp_server/tools/graph_tools/README.md`** (updated):
- 3 new rows for session 83 tools

## [3.22.36] - 2026-02-23

### Added — TDFOL_v1 Witness Builder for Groth16 backend (Session 82) — 50 tests

**`query/groth16_kg_witness.py`** (new module):
- `KGAtomEncoder(max_length=64)` — normalize arbitrary KG strings to valid
single-word TDFOL_v1 atoms required by the Groth16 Rust backend
(`processors/groth16_backend`). Core `normalize(s)` + domain-specific encoders:
`encode_entity_type`, `encode_name`, `encode_relationship_type`,
`encode_entity_id`, `encode_property_key`. Compound atoms:
`atom_for_entity`, `atom_for_entity_exists`, `atom_for_path_exists`,
`atom_for_entity_property`.
- `KGWitnessBuilder(circuit_version=1, ruleset_id="TDFOL_v1")` — build complete
TDFOL_v1 witness input dicts compatible with `WitnessInput` struct:
- `entity_exists(entity_type, name, entity_id, confidence)` → witness proving a
named entity exists without revealing its ID
- `path_exists(path_ids, rel_types, start_type, end_type)` → witness proving a
path exists without revealing node IDs
- `entity_property(entity_id, property_key, value_hash)` → witness proving an
entity has a property (value hidden behind SHA-256 hash)
- `query_answer_count(min_count, actual_count, query_type)` → witness proving
result count ≥ threshold
- All builders auto-compute `theorem_hash_hex` and `axioms_commitment_hex`
- Circuit v2: auto-generates `intermediate_steps` when not provided

**`query/groth16_bridge.py`** (updated):
- `KGEntityFormula.to_tdfol_atoms(proof_type, entity_type, name_or_end_type,
entity_id, confidence) -> dict` (new classmethod) — returns valid TDFOL_v1
single-word atoms for `entity_exists` / `path_exists` / `entity_property`
proof types using `KGAtomEncoder` internally.

**`query/__init__.py`** (updated):
- `KGAtomEncoder` and `KGWitnessBuilder` exported + added to `__all__`.

## [3.22.35] - 2026-02-23

### Added — 5 new MCP server tools for query/extraction features (Session 81) — 42 tests
154 changes: 154 additions & 0 deletions ipfs_datasets_py/knowledge_graphs/DEFERRED_FEATURES.md
@@ -688,6 +688,160 @@ assert kg.list_snapshots() == ["before_merge"]

---

## P15: Delivered in v3.22.38 (KG analytics and link prediction MCP tools)

### 32. Graph Analytics MCP Tool

**Status:** ✅ Implemented (v3.22.38 — 2026-02-23)
**Location:** `mcp_server/tools/graph_tools/graph_analytics.py` + `core_operations/knowledge_graph_manager.KnowledgeGraphManager.analytics()`
**Implementation:**
- `graph_analytics(kg_data, include_completion_analysis, include_quality_metrics, include_topology, max_completion_suggestions)` — comprehensive analytics
- Quality metrics via `KnowledgeGraphExtractor.compute_extraction_quality_metrics()`
- KG completion analysis via `KnowledgeGraphCompleter.find_missing_relationships()` + `find_isolated_entities()`
- Topology stats: entity/relationship type distributions, degree stats, source-only + sink-only counts
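
The topology statistics listed above can be illustrated with a small standalone sketch. It is not the `KnowledgeGraphManager.analytics()` implementation: the `kg_data` shape (an `entities` list with `id`/`type` and a `relationships` list with `source`/`target`/`type`), the function name `topology_stats`, and the output keys are assumptions made for illustration.

```python
from collections import Counter


def topology_stats(kg_data):
    """Type distributions and degree statistics for a KG given as a plain dict."""
    entities = kg_data.get("entities", [])
    relationships = kg_data.get("relationships", [])

    # Count how many edges leave and enter each entity id.
    out_degree = Counter(r["source"] for r in relationships)
    in_degree = Counter(r["target"] for r in relationships)
    ids = [e["id"] for e in entities]
    degrees = [out_degree[i] + in_degree[i] for i in ids]

    return {
        "entity_types": dict(Counter(e["type"] for e in entities)),
        "relationship_types": dict(Counter(r["type"] for r in relationships)),
        "min_degree": min(degrees, default=0),
        "max_degree": max(degrees, default=0),
        "avg_degree": sum(degrees) / len(degrees) if degrees else 0.0,
        # Source-only: outgoing edges but none incoming. Sink-only: the reverse.
        "source_only": sum(1 for i in ids if out_degree[i] and not in_degree[i]),
        "sink_only": sum(1 for i in ids if in_degree[i] and not out_degree[i]),
    }


# Hypothetical input data, used only to exercise the sketch.
kg = {
    "entities": [
        {"id": "alice", "type": "Person"},
        {"id": "bob", "type": "Person"},
        {"id": "acme", "type": "Org"},
    ],
    "relationships": [
        {"source": "alice", "target": "bob", "type": "KNOWS"},
        {"source": "bob", "target": "acme", "type": "WORKS_AT"},
    ],
}
stats = topology_stats(kg)
```

Here `alice` is source-only and `acme` is sink-only, which is the kind of signal the topology section is meant to surface.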

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session84.py`

---

### 33. Graph Link Prediction MCP Tool

**Status:** ✅ Implemented (v3.22.38 — 2026-02-23)
**Location:** `mcp_server/tools/graph_tools/graph_link_predict.py` + `core_operations/knowledge_graph_manager.KnowledgeGraphManager.link_predict()`
**Implementation:**
- `graph_link_predict(entity_a_id, entity_b_id, kg_data, layer_type, top_candidates, top_k)` — GNN link prediction
- Delegates to `GraphNeuralNetworkAdapter.link_prediction_score()` (cosine similarity of node embeddings)
- Optional top-k ranking: cosine similarity for each candidate vs. entity_a embedding
- Returns `score ∈ [-1, 1]` + `prediction ("likely" ≥ 0.5, "unlikely" otherwise)` + `top_predictions`
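
The scoring rule described above (cosine similarity of two node embeddings, labelled "likely" at 0.5 or higher) can be sketched without the GNN adapter. The helper names `cosine_similarity`, `predict_link`, and `rank_candidates`, and the hand-written embeddings, are hypothetical; in the real tool the embeddings come from `GraphNeuralNetworkAdapter`.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity in [-1, 1]; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_link(embeddings, entity_a_id, entity_b_id, threshold=0.5):
    """Score one candidate link and label it with the 0.5 threshold."""
    score = cosine_similarity(embeddings[entity_a_id], embeddings[entity_b_id])
    return {"score": score, "prediction": "likely" if score >= threshold else "unlikely"}


def rank_candidates(embeddings, entity_a_id, candidates, top_k=3):
    """Rank candidate entities by similarity to entity_a, best first."""
    scored = [
        (c, cosine_similarity(embeddings[entity_a_id], embeddings[c]))
        for c in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 2-dimensional embeddings for illustration only.
embeddings = {
    "alice": [1.0, 0.0],
    "bob": [1.0, 0.0],
    "carol": [0.0, 1.0],
    "dave": [-1.0, 0.0],
}
result = predict_link(embeddings, "alice", "bob")
ranking = rank_candidates(embeddings, "alice", ["carol", "dave", "bob"], top_k=2)
```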

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session84.py`

---

## P14: Delivered in v3.22.37 (MCP tools for GNN, ZKP, and Federation)

### 29. GNN Embed MCP Tool

**Status:** ✅ Implemented (v3.22.37 — 2026-02-23)
**Location:** `mcp_server/tools/graph_tools/graph_gnn_embed.py` + `core_operations/knowledge_graph_manager.KnowledgeGraphManager.gnn_embed()`
**Implementation:**
- `graph_gnn_embed(kg_data, entity_ids, top_k_similar, layer_type, embedding_dim, num_layers)` — compute GNN node embeddings via `GraphNeuralNetworkAdapter`
- Layer types: `GRAPH_CONV` / `GRAPH_SAGE` *(default)* / `GRAPH_ATTENTION`
- Returns per-entity embedding vectors + optional top-*k* similar entities
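
As rough intuition for what a GNN layer does to node features, the sketch below runs one round of mean-neighbour aggregation with no learned weights. It is a simplified stand-in, not the `GraphNeuralNetworkAdapter` code or a full GraphSAGE layer; the function name `mean_aggregate`, the undirected edge list, and the feature vectors are assumptions for illustration.

```python
def mean_aggregate(features, edges):
    """One round of mean aggregation: each node averages itself with its neighbours."""
    neighbours = {node: [] for node in features}
    for src, dst in edges:
        # Treat edges as undirected for the purpose of this sketch.
        neighbours[src].append(dst)
        neighbours[dst].append(src)

    updated = {}
    for node, own in features.items():
        vectors = [own] + [features[n] for n in neighbours[node]]
        updated[node] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(len(own))
        ]
    return updated


# Hypothetical initial features and edges.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
edges = [("a", "b"), ("b", "c")]
embeddings = mean_aggregate(features, edges)
```

After one round, connected nodes have moved toward each other in feature space, which is why similarity between the resulting vectors can serve as a "similar entities" signal.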

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session83.py`

---

### 30. ZKP Prove MCP Tool

**Status:** ✅ Implemented (v3.22.37 — 2026-02-23)
**Location:** `mcp_server/tools/graph_tools/graph_zkp_prove.py` + `core_operations/knowledge_graph_manager.KnowledgeGraphManager.zkp_prove()`
**Implementation:**
- `graph_zkp_prove(proof_type, ..., build_tdfol_witness, circuit_version)` — generate ZK proofs
- Proof types: `entity_exists` / `entity_property` / `path_exists` / `query_answer_count`
- Optional `build_tdfol_witness=True` produces TDFOL_v1 witness dict for Groth16 Rust backend

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session83.py`

---

### 31. Federate Query MCP Tool

**Status:** ✅ Implemented (v3.22.37 — 2026-02-23)
**Location:** `mcp_server/tools/graph_tools/graph_federate_query.py` + `core_operations/knowledge_graph_manager.KnowledgeGraphManager.federate_query()`
**Implementation:**
- `graph_federate_query(graphs, query_entity_name, resolution_strategy, merge, ...)` — cross-graph entity resolution and query
- Strategies: `"type_and_name"` *(default)* / `"exact_name"` / `"property_match"`
- Returns entity matches, query hits (name-based lookup), and optional merged graph counts
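
To make the resolution strategies concrete, here is a minimal sketch of how entities from several graphs might be grouped under `"type_and_name"` and `"exact_name"`. It does not reproduce `FederatedKnowledgeGraph`: the key functions, the case-insensitive comparison for `"type_and_name"`, and the graph dict shape are assumptions. `"property_match"` is left out because the source does not say which properties are compared.

```python
from collections import defaultdict


def resolution_key(entity, strategy="type_and_name"):
    """Build the key used to decide that two entities are the same."""
    if strategy == "type_and_name":
        # Assumption: type and name are compared case-insensitively.
        return (entity["type"].lower(), entity["name"].lower())
    if strategy == "exact_name":
        # Assumption: the name must match exactly, type is ignored.
        return entity["name"]
    raise ValueError(f"strategy not covered by this sketch: {strategy}")


def find_matches(graphs, strategy="type_and_name"):
    """Group entities that share a key and appear in more than one graph."""
    groups = defaultdict(list)
    for graph_index, graph in enumerate(graphs):
        for entity in graph["entities"]:
            key = resolution_key(entity, strategy)
            groups[key].append((graph_index, entity["id"]))
    return {
        key: members
        for key, members in groups.items()
        if len({graph_index for graph_index, _ in members}) > 1
    }


# Hypothetical graphs for illustration only.
graphs = [
    {"entities": [{"id": "a1", "type": "Person", "name": "Alice"}]},
    {"entities": [
        {"id": "b7", "type": "person", "name": "alice"},
        {"id": "b8", "type": "Org", "name": "Alice"},
    ]},
]
by_type_and_name = find_matches(graphs, "type_and_name")
by_exact_name = find_matches(graphs, "exact_name")
```

The two strategies disagree on this data: `"type_and_name"` pairs `a1` with `b7`, while `"exact_name"` pairs `a1` with `b8`, which shows why the choice of strategy matters for merged entity counts.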

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session83.py`

---

## P13: Delivered in v3.22.36 (TDFOL_v1 Witness Builder for Groth16 backend)

### 27. TDFOL_v1 Atom Encoder

**Status:** ✅ Implemented (v3.22.36 — 2026-02-23)
**Location:** `query/groth16_kg_witness.py` — `KGAtomEncoder`
**Implementation:**
- `KGAtomEncoder(max_length=64)` — normalizes arbitrary Knowledge Graph strings
(entity types, names, relationship types, entity IDs, property keys) to valid
single-word TDFOL_v1 atoms accepted by the Groth16 Rust backend in
`processors/groth16_backend`.
- `normalize(s) -> str` — core normalizer: lower-case, replace invalid chars with
`_`, strip leading non-letters, truncate, fallback to `"entity"` for empty input.
- Domain-specific encoders: `encode_entity_type`, `encode_name`,
`encode_relationship_type`, `encode_entity_id`, `encode_property_key`.
- Compound atoms: `atom_for_entity(type, name)` → `"type_name"`;
`atom_for_entity_exists(type, name)` → `"type_name_exists"`;
`atom_for_path_exists(start, end)` → `"path_start_to_end_exists"`;
`atom_for_entity_property(id, key)` → `"id_has_key"`.

**Example (now works):**
```python
from ipfs_datasets_py.knowledge_graphs.query.groth16_kg_witness import KGAtomEncoder

enc = KGAtomEncoder()
enc.encode_entity_type("Person") # "person"
enc.encode_name("Acme Corp") # "acme_corp"
enc.encode_name("Alice-Jane O'Brien") # "alice_jane_o_brien"
enc.atom_for_entity_exists("Person", "Alice") # "person_alice_exists"
enc.atom_for_path_exists("Person", "Org") # "path_person_to_org_exists"
```
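
The normalization steps listed above (lower-case, replace invalid characters with `_`, strip leading non-letters, truncate, fall back to `"entity"`) can be approximated in a few lines. This is a sketch rather than the `KGAtomEncoder.normalize` source: it assumes that "invalid" means anything outside `[a-z0-9_]` and that runs of invalid characters are not collapsed.

```python
import re


def normalize(s, max_length=64):
    """Approximate the documented normalization of a KG string to a single-word atom."""
    atom = s.lower()
    # Assumption: valid atom characters are lower-case letters, digits, underscore.
    atom = re.sub(r"[^a-z0-9_]", "_", atom)
    # Atoms must start with a letter, so drop any leading digits or underscores.
    atom = re.sub(r"^[^a-z]+", "", atom)
    atom = atom[:max_length]
    return atom or "entity"


examples = {s: normalize(s) for s in ["Person", "Acme Corp", "Alice-Jane O'Brien", ""]}
```

Under these assumptions the sketch agrees with the documented outputs in the example above, such as `"acme_corp"` and `"alice_jane_o_brien"`.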

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session82.py`

---

### 28. TDFOL_v1 Witness Builder

**Status:** ✅ Implemented (v3.22.36 — 2026-02-23)
**Location:** `query/groth16_kg_witness.py` — `KGWitnessBuilder`
**Implementation:**
- `KGWitnessBuilder(circuit_version=1, ruleset_id="TDFOL_v1", encoder=None)` —
builds complete TDFOL_v1 witness input dicts compatible with the `WitnessInput`
struct in the Groth16 Rust backend (`processors/groth16_backend`).
- `entity_exists(entity_type, name, entity_id, confidence) -> dict` — proves
existence of a named entity without revealing its ID.
- `path_exists(path_ids, rel_types, start_type, end_type) -> dict` — proves a
graph path exists without revealing node IDs.
- `entity_property(entity_id, property_key, value_hash) -> dict` — proves an
entity has a specific property value (via SHA-256 hash).
- `query_answer_count(min_count, actual_count, query_type) -> dict` — proves the
result count meets a threshold.
- All builders auto-compute `theorem_hash_hex` and `axioms_commitment_hex`.
- Circuit v2 support: auto-generates `intermediate_steps` when not provided.

**Example (now works):**
```python
from ipfs_datasets_py.knowledge_graphs.query.groth16_kg_witness import KGWitnessBuilder
import json

builder = KGWitnessBuilder()
witness = builder.entity_exists("Person", "Alice", "eid_001", confidence=0.95)
# witness["theorem"] → "person_alice_exists"
# witness["private_axioms"] → ["eid_001_is_person", "eid_001_has_name_alice", ...]
# witness["theorem_hash_hex"] → 64-char hex SHA-256
# witness is JSON-serializable and ready for the Groth16 binary

# For the real Groth16 backend (when binary is compiled):
# import os; os.environ["IPFS_DATASETS_ENABLE_GROTH16"] = "1"
# from ipfs_datasets_py.logic.zkp.backends.groth16_ffi import Groth16Backend
# backend = Groth16Backend()
# proof_json = backend.prove(json.dumps(witness))
```
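
The hash fields mentioned above can be illustrated with `hashlib`. The source only states that `theorem_hash_hex` is a 64-character hex SHA-256; how `KGWitnessBuilder` serializes the theorem and how it commits to the axioms are not specified, so the UTF-8 encoding, the sorted compact-JSON commitment, and the helper names below are assumptions.

```python
import hashlib
import json


def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_witness(theorem, private_axioms):
    """Assemble a witness-like dict with a theorem hash and an axioms commitment."""
    # Assumption: commit to the axioms via compact JSON of the sorted list,
    # so the commitment does not depend on the order the axioms were supplied in.
    commitment_input = json.dumps(sorted(private_axioms), separators=(",", ":"))
    return {
        "theorem": theorem,
        "private_axioms": list(private_axioms),
        "theorem_hash_hex": sha256_hex(theorem),
        "axioms_commitment_hex": sha256_hex(commitment_input),
    }


witness = build_witness(
    "person_alice_exists",
    ["eid_001_is_person", "eid_001_has_name_alice"],
)
serialized = json.dumps(witness)
```

Sorting before hashing is a design choice of this sketch: it makes the commitment stable across axiom orderings, which may or may not match what the Rust backend expects.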

**Also added:** `KGEntityFormula.to_tdfol_atoms(proof_type, entity_type,
name_or_end_type, entity_id, confidence) -> dict` — returns valid TDFOL_v1
atoms for entity_exists / path_exists / entity_property proof types, bridging
the human-readable formula strings with the single-word atom requirement.

**Tests:** `tests/unit/knowledge_graphs/test_master_status_session82.py`

---

## P7: Delivered in v3.22.26 (formerly v4.0+ "GraphQL API support")

### 19. GraphQL API Support
3 changes: 3 additions & 0 deletions ipfs_datasets_py/knowledge_graphs/IMPROVEMENT_TODO.md
@@ -30,6 +30,9 @@

## Session log (most recent first)

- **Session 84 (2026-02-23):** 2 new MCP server tools — `graph_analytics.py` (comprehensive KG analytics: quality metrics via `KnowledgeGraphExtractor.compute_extraction_quality_metrics()`; KG completion analysis via `KnowledgeGraphCompleter`; topology stats; `KnowledgeGraphManager.analytics()`) + `graph_link_predict.py` (GNN link-prediction score between two entities via `GraphNeuralNetworkAdapter.link_prediction_score()`; optional top-k ranking; `KnowledgeGraphManager.link_predict()`); graph_tools/__init__.py 22→24 tools; README.md updated; DEFERRED_FEATURES P15 §32-33; 42 tests. v3.22.37→v3.22.38.
- **Session 83 (2026-02-23):** 3 new MCP server tools — `graph_gnn_embed.py` (compute GNN node embeddings via `GraphNeuralNetworkAdapter`; GRAPH_CONV/SAGE/ATTENTION; top-k similar entities; `KnowledgeGraphManager.gnn_embed()`) + `graph_zkp_prove.py` (generate ZK proofs for entity_exists/path_exists/entity_property/query_answer_count; optional TDFOL_v1 witness build via `KGWitnessBuilder`; `KnowledgeGraphManager.zkp_prove()`) + `graph_federate_query.py` (cross-graph entity resolution + entity lookup + merge via `FederatedKnowledgeGraph`; type_and_name/exact_name/property_match strategies; `KnowledgeGraphManager.federate_query()`); graph_tools/__init__.py 19→22 tools; README.md updated; DEFERRED_FEATURES P14 §29–31; 48 tests. v3.22.36→v3.22.37.
- **Session 82 (2026-02-23):** TDFOL_v1 witness builder — `query/groth16_kg_witness.py` (`KGAtomEncoder`: normalize KG strings to valid single-word TDFOL_v1 atoms via `normalize()`/`encode_entity_type()`/`encode_name()`/`encode_relationship_type()`/`encode_entity_id()`/`encode_property_key()`/`atom_for_entity()`/`atom_for_entity_exists()`/`atom_for_path_exists()`/`atom_for_entity_property()`; `KGWitnessBuilder`: build complete TDFOL_v1 witness input dicts for `entity_exists`/`path_exists`/`entity_property`/`query_answer_count` proofs; auto-computes `theorem_hash_hex`+`axioms_commitment_hex`; circuit v2 support); `KGEntityFormula.to_tdfol_atoms()` classmethod added to `groth16_bridge.py`; `query/__init__.py` + `__all__` updated; DEFERRED_FEATURES P13 §27+§28; 80 tests. v3.22.35→v3.22.36.
- **Session 81 (2026-02-23):** 5 new MCP server tools exposing query/extraction features — `graph_graphql_query.py` (execute GraphQL query via `KnowledgeGraphQLExecutor`) + `graph_visualize.py` (DOT/Mermaid/D3 JSON/ASCII via `KnowledgeGraphVisualizer`) + `graph_complete_suggestions.py` (missing-relationship suggestions via `KnowledgeGraphCompleter`) + `graph_explain.py` (explainable-AI entity/relationship/path/why_connected via `QueryExplainer`) + `graph_provenance_verify.py` (tamper-detection via `ProvenanceChain.verify_chain()`); 5 new `KnowledgeGraphManager` async methods; `graph_tools/__init__.py` + `README.md` updated (11→19 tools); 42 tests. v3.22.34→v3.22.35.
- **Session 80 (2026-02-23):** ROADMAP Research Areas delivered — `query/completion.py` (`KnowledgeGraphCompleter`; 6 structural completion patterns: triadic closure, common neighbour, symmetric relation, transitive relation, inverse relation, type compatibility; `CompletionSuggestion` + `CompletionReason`; `find_missing_relationships`/`find_isolated_entities`/`compute_completion_score`/`explain_suggestion`) + `query/explanation.py` (`QueryExplainer`; `explain_entity`/`explain_relationship`/`explain_path`/`explain_query_result`/`why_connected`/`entity_importance_score`; `EntityExplanation`/`RelationshipExplanation`/`PathExplanation`/`ExplanationDepth` dataclasses); 8 new symbols in `query/__init__.py`; DEFERRED_FEATURES P12 §25+§26; 52 tests. v3.22.33→v3.22.34.
- **Session 79 (2026-02-23):** Comprehensive documentation update — `query/README.md` v2.1.0→v3.22.33: added 5 new module rows (graphql.py/federation.py/gnn.py/zkp.py/groth16_bridge.py) + "Advanced Query Features" code examples for each + "Recent Additions" table; stale "Future Enhancements" (listing GraphQL as future) removed. `docs/knowledge_graphs/API_REFERENCE.md` v3.22.22→v3.22.33: new "Advanced Extraction APIs" section (KGDiff/GraphEvents/Snapshots/ProvenanceChain/Visualizer) + new "Advanced Query APIs" section (GraphQL/FederatedKG/GNN/ZKP/Groth16) with full examples; ToC updated. `docs/knowledge_graphs/USER_GUIDE.md` v2.0.0→v3.22.33: §11 "Future Roadmap" (features listed as planned for Q2-Q1 2027) → "Delivered Features (v3.22.x)" (15-row delivery table all ✅ + usage examples; stale Experimental Features block removed). 46 doc integrity tests. v3.22.32→v3.22.33.
8 changes: 4 additions & 4 deletions ipfs_datasets_py/knowledge_graphs/MASTER_STATUS.md
@@ -1,9 +1,9 @@
# Knowledge Graphs Module - Master Status Document

**Version:** 3.22.35
**Version:** 3.22.38
**Status:** ✅ Production Ready
**Last Updated:** 2026-02-23 (session 81)
**Last Major Release:** v3.22.35 (session 81: 5 new MCP server tools for query/extraction features — `graph_graphql_query` / `graph_visualize` / `graph_complete_suggestions` / `graph_explain` / `graph_provenance_verify`; 5 new `KnowledgeGraphManager` methods; `graph_tools/__init__.py` + `README.md` updated; 42 tests)
**Last Updated:** 2026-02-23 (session 84)
**Last Major Release:** v3.22.38 (session 84: 2 new MCP server tools — `graph_analytics` / `graph_link_predict`; 2 new `KnowledgeGraphManager` methods; `graph_tools/__init__.py` updated 22→24 tools; `README.md` updated; 42 tests)

---

@@ -19,7 +19,7 @@
| **Folder Refactoring** | ✅ Complete | All root-level modules moved to subpackages (2026-02-20) |
| **New MCP Tools** | ✅ Complete | graph_srl_extract, graph_ontology_materialize, graph_distributed_execute, graph_graphql_query, graph_visualize, graph_complete_suggestions, graph_explain, graph_provenance_verify |
| **Test Coverage** | **99.99% (1 missed line)** | Session 58: 3,759 pass, 2 skip, **0 fail** (full dep env); 1 missed line |
| **Documentation** | ✅ Up to Date | Reflects v3.22.35 structure |
| **Documentation** | ✅ Up to Date | Reflects v3.22.38 structure |
| **Known Issues** | None | 0 failures; all skips intentional (libipld/spaCy absent when not installed) |
| **Next Milestone** | v4.0 (2027+) | 1 missed line: `_entity_helpers.py:117` (intentional defensive guard) — 99.99% coverage |
