Fix/neo4j nested attributes serialization#1

Merged
Ataxia123 merged 45 commits into main from fix/neo4j-nested-attributes-serialization
Apr 7, 2026

Conversation

@Ataxia123

Summary

Brief description of the changes in this PR.

Type of Change

  • Bug fix
  • New feature
  • Performance improvement
  • Documentation/Tests

Objective

For new features and performance improvements: Clearly describe the objective and rationale for this change.

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • All existing tests pass

Breaking Changes

  • This PR contains breaking changes

If this is a breaking change, describe:

  • What functionality is affected
  • Migration path for existing users

Checklist

  • Code follows project style guidelines (make lint passes)
  • Self-review completed
  • Documentation updated where necessary
  • No secrets or sensitive information committed

Related Issues

Closes #[issue number]

prasmussen15 and others added 30 commits February 23, 2026 10:31
* fix: replace edge name with uuid in resolution debug log

Edge names can contain PII. Use UUIDs instead in the
resolve_extracted_edge debug log message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: remove PII from remaining debug logs

- nodes.py: replace entity name with uuid and char count in embedding logs
- edges.py: replace edge fact text with uuid and char count in embedding log
- community_operations.py: replace full object dump with uuid and edge count
- search/search.py: remove user query from search latency log

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: convert embedding log timing from seconds to milliseconds

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
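
The logging pattern described in the three commits above (identify a record by UUID and character count, report timing in milliseconds) can be sketched as follows. This is a minimal illustration, not the code from nodes.py or edges.py; `embedding_log_message` and its parameters are hypothetical names.

```python
import logging
from time import perf_counter

logger = logging.getLogger(__name__)


def embedding_log_message(uuid: str, text: str, start: float, end: float) -> str:
    """Build a debug message that identifies a record without exposing its text."""
    # perf_counter() returns seconds, so convert the difference to milliseconds.
    elapsed_ms = (end - start) * 1000
    # Only the UUID and the length of the text are logged, never the text itself.
    return f"embedded text for {uuid} ({len(text)} chars) in {elapsed_ms:.1f} ms"


start = perf_counter()
# ... the embedding call would run here ...
end = perf_counter()
logger.debug(embedding_log_message("node-uuid-123", "Jane Doe", start, end))
```

The message still lets a maintainer correlate a slow embedding call with a specific node or edge, but a name such as "Jane Doe" never reaches the log.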

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The Docker images were pinned to graphiti-core 0.23.1, which is 4 months
behind the current release. This updates all Dockerfiles and compose files
to default to 0.28.1. It also fixes the sed version-replacement patterns,
which matched only >= while pyproject.toml uses ==, so the build-arg
override was silently failing.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
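
To illustrate the pattern bug described above, here is a rough Python equivalent of a version-override substitution that accepts both operators. The actual fix is in the sed expressions in the Dockerfiles; the regex and the `override_version` helper below are assumptions made for the sketch.

```python
import re

# Capture the package name, optional extras, and the operator (== or >=);
# the version number that follows is what gets replaced.
PIN_PATTERN = re.compile(r"(graphiti-core(?:\[[^\]]*\])?\s*(?:==|>=)\s*)[0-9][\w.]*")


def override_version(pyproject_text: str, version: str) -> str:
    """Rewrite every graphiti-core pin to the given version, keeping its operator."""
    return PIN_PATTERN.sub(lambda m: f"{m.group(1)}{version}", pyproject_text)


print(override_version('"graphiti-core[falkordb]==0.23.1"', "0.28.1"))
print(override_version('"graphiti-core>=0.23.1"', "0.28.1"))
```

A pattern that lists only `>=` would leave the first line unchanged without reporting an error, which is the silent failure the commit describes.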
* feat: add GLiNER2 hybrid LLM client

Implements GLiNER2Client, a hybrid LLM client that uses GLiNER2 (lightweight extraction model) for entity and relation extraction while delegating reasoning tasks (deduplication, summarization, community operations) to a secondary LLMClient.

Key features:
- Local CPU-friendly extraction using GLiNER2
- Message parsing to extract entity types, relations, and text from Graphiti prompts
- Response-model-based dispatch (ExtractedEntities/ExtractedEdges → GLiNER2, others → delegated LLM)
- Support for both local and API-based GLiNER2 modes
- Full async integration via asyncio.to_thread()

Includes example usage in examples/gliner2/ with Neo4j integration.

Dependencies: gliner2>=1.2.0 (optional)
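
The response-model-based dispatch and the asyncio.to_thread() integration listed above can be sketched roughly as below. This is not the GLiNER2Client implementation: `HybridClient`, `Summary`, and the two callables are hypothetical stand-ins, and the empty classes only mimic the response models named in the commit.

```python
import asyncio
from typing import Any, Awaitable, Callable


class ExtractedEntities:  # stand-in for the response model named in the commit
    pass


class ExtractedEdges:  # stand-in for the response model named in the commit
    pass


class Summary:  # hypothetical model representing any reasoning task
    pass


EXTRACTION_MODELS = {ExtractedEntities, ExtractedEdges}


class HybridClient:
    """Routes extraction requests to a local extractor and the rest to an LLM."""

    def __init__(
        self,
        extract: Callable[[str], dict[str, Any]],
        llm: Callable[[str], Awaitable[dict[str, Any]]],
    ) -> None:
        self._extract = extract  # blocking, CPU-bound callable
        self._llm = llm  # async callable wrapping the delegated LLM client

    async def generate_response(self, text: str, response_model: type) -> dict[str, Any]:
        if response_model in EXTRACTION_MODELS:
            # Run the blocking extractor in a worker thread so the event loop stays free.
            return await asyncio.to_thread(self._extract, text)
        return await self._llm(text)
```

Dispatching on the response model keeps the caller unaware of which backend handled the request, so the pipeline code does not need to change when the routing does.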

* fix: address code review feedback on GLiNER2 client

- Use _generate_response_with_retry() for tenacity retry support
- Case-insensitive entity matching in relation filtering
- Add DEBUG logging for filtered relations
- Remove redundant env var defaults in example
- Add docstring note about synchronous model loading
- Clarify token estimation is approximate

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add python_version>=3.11 marker to gliner2 dependency

onnxruntime 1.24.2 (transitive dep of gliner2) dropped Python 3.10
support. CI runs `uv sync --all-extras` on Python 3.10, causing all
jobs to fail. Adding a version marker ensures the gliner2 extra is
only resolved on Python 3.11+.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: expand example with longer texts and multilingual episodes

- Add detailed English political biography and mortgage settlement text
- Add Spanish, French, and Portuguese episodes with overlapping entities
  (Kamala Harris, California, San Francisco, Gavin Newsom)
- Expand JSON metadata with additional fields
- Add multiple search queries to demonstrate retrieval
- Fix pyright errors: use typing.Any for model type (GLiNER2 vs GLiNER2API)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: delegate edge extraction and summarization to LLM client

GLiNER2 extracts structured triples (head, relation_type, tail) but
cannot generate natural-language facts, temporal bounds, or proper
relation types. This produced low-quality facts like "Kamala Harris
related to San Francisco".

Now GLiNER2 only handles entity extraction (ExtractedEntities). All
other pipeline operations — edge/relation extraction, node summary,
deduplication — are delegated to the LLM client which generates proper
facts paraphrased from source text.

Removed: _handle_relation_extraction, _extract_entity_names,
_extract_relation_types, _EDGE_EXTRACTION_MODEL.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: parse Python repr entity types from prompt templates

The prompt templates interpolate entity_types as Python list[dict]
directly (str()), producing Python repr with single quotes and None
rather than valid JSON. json.loads() fails on this format.

Now tries json.loads first, then falls back to ast.literal_eval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
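
The fallback described above, json.loads first and then ast.literal_eval, can be sketched like this. The function name `parse_entity_types` is hypothetical, and the sample strings are invented to show the two input shapes.

```python
import ast
import json
from typing import Any


def parse_entity_types(raw: str) -> Any:
    """Parse entity types given either as JSON or as a Python repr."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # A Python repr uses single quotes and None, which JSON rejects.
        # literal_eval accepts only Python literals, not arbitrary expressions.
        return ast.literal_eval(raw)


as_json = '[{"name": "Person", "description": null}]'
as_repr = "[{'name': 'Person', 'description': None}]"
print(parse_entity_types(as_json) == parse_entity_types(as_repr))
```

Both inputs parse to the same list of dicts, so the rest of the client can ignore how the prompt template rendered the value.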

* feat: add custom entity types, extraction latency tracking, and detailed output

- Add Person, Organization, Location, Initiative entity types to example
- Pass entity_types to add_episode() for typed GLiNER2 extraction
- Track extraction latencies in GLiNER2Client.extraction_latencies
- Print extracted entities, edges, attributes, and summaries per episode
- Print latency summary (mean/min/max/total) at end of example
- Use gpt-5.2 with reasoning='none' in example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
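
A latency summary of the kind mentioned above (mean/min/max/total) could be computed as in the sketch below. It assumes the recorded latencies are a plain list of floats; the helper name `summarize_latencies` is hypothetical and the unit is whatever the client records.

```python
from statistics import mean


def summarize_latencies(latencies: list[float]) -> dict[str, float]:
    """Return mean, min, max, and total for a list of recorded latencies."""
    if not latencies:
        # Nothing was recorded, so there is nothing to summarize.
        return {}
    return {
        "mean": mean(latencies),
        "min": min(latencies),
        "max": max(latencies),
        "total": sum(latencies),
    }


print(summarize_latencies([10.0, 20.0, 30.0]))
```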

* feat: switch example from OpenAI to Gemini for LLM and embeddings

- Replace OpenAIClient with GeminiClient (gemini-2.5-flash-lite)
- Replace default OpenAI embedder with GeminiEmbedder (gemini-embedding-001)
- Example now uses GOOGLE_API_KEY only (no OpenAI dependency)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: sort imports in gliner2 example

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add README, raise threshold to 0.7, use gliner2-large-v1 default

- Add examples/gliner2/README.md with GLiNER2 repo, paper, and HuggingFace links
- Mark GLiNER2Client as experimental
- Document swappable LLM/embedding providers
- Raise extraction threshold from 0.5 to 0.7 to reduce spurious entities
- Switch default model to gliner2-large-v1
- Update .env.example for Gemini (GOOGLE_API_KEY)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* refresh readme content

* remove readme roadmap section

* fix readme review issues
* restore readme title block

* center readme badges
* harden search filter inputs

* validate entity node labels on save

* tighten security regression coverage
* Bump graphiti-core version to 0.28.2

Update version across pyproject.toml, MCP server, server, Docker configs, and root lock file.
MCP server and server lock files will need regeneration after PyPI publish.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert MCP server version bump until release

MCP server depends on graphiti-core from PyPI, so the version bump
should happen after the 0.28.2 release is published.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert server graphiti-core requirement bump until release

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Updates mcp-server version to 1.0.2 and bumps the graphiti-core dependency to >=0.28.2 to address a security vulnerability (Cypher injection hardening added in 0.28.2).

Co-authored-by: Claude Haiku 4.5 <noreply@anthropic.com>
Add a prominent 'We're Hiring!' callout to the README promoting open Engineer and Developer Relations positions at Zep, linking to the careers page.
* zep upstream

* Remove Kuzu from test infrastructure and internal Go reference

Kuzu is being deprecated — remove it from the test driver list and
all Kuzu-specific test skips. Also remove a comment referencing an
internal Go file path that should not be in the public repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dependabot Bot and others added 15 commits March 31, 2026 17:50
Bumps the uv group with 1 update in the / directory: [langchain-core](https://github.com/langchain-ai/langchain).
Bumps the uv group with 2 updates in the /mcp_server directory: [langchain-core](https://github.com/langchain-ai/langchain) and [cryptography](https://github.com/pyca/cryptography).


Updates `langchain-core` from 1.2.12 to 1.2.22
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](langchain-ai/langchain@langchain-core==1.2.12...langchain-core==1.2.22)

Updates `langchain-core` from 1.2.12 to 1.2.22
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](langchain-ai/langchain@langchain-core==1.2.12...langchain-core==1.2.22)

Updates `cryptography` from 46.0.5 to 46.0.6
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@46.0.5...46.0.6)

---
updated-dependencies:
- dependency-name: langchain-core
  dependency-version: 1.2.22
  dependency-type: indirect
  dependency-group: uv
- dependency-name: langchain-core
  dependency-version: 1.2.22
  dependency-type: indirect
  dependency-group: uv
- dependency-name: cryptography
  dependency-version: 46.0.6
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* feat: add automated PR triage system with evaluation rubric

Add a Claude Code-powered PR triage workflow that evaluates incoming PRs
against Graphiti's project principles and produces structured priority
assessments. This helps maintainers quickly identify high-value PRs among
the 128+ open PRs.

Components:
- .github/prompts/pr-triage.md: Evaluation rubric covering 5 dimensions
  (category, quality, alignment, slop detection, impact) with structured
  JSON output and human-readable PR comments
- .github/workflows/pr-triage.yml: GitHub Action with three trigger modes:
  auto on PR open (pull_request_target), manual single-PR dispatch, and
  batch mode for all open PRs
- .github/scripts/setup-triage-labels.sh: One-time label creation script

Security mitigations for fork PRs:
- Uses pull_request_target (never checks out fork code)
- Reads diffs only via gh pr diff (GitHub API, text only)
- Strict tool allowlist (no arbitrary Bash execution)
- Post-step label validation removes unexpected labels
- Explicit prompt injection warnings in evaluation prompt

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

* harden PR triage: diff size limits, re-triage, injection defense, timeouts

- Add diff size limit (>5000 lines skips triage, applies needs-rfc label)
- Add synchronize trigger so updated PRs get re-triaged automatically
- Remove stale triage labels before re-evaluation
- Add --append-system-prompt with injection defense at system level
- Add --max-turns (30 for single PR, 500 for batch) to prevent runaway loops
- Add timeout-minutes: 360 to batch job
- Gate validation step behind diff size check
- Add wc to batch job allowed tools for diff size checking

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd
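
The diff size gate described above can be pictured with the following sketch. The workflow itself is a GitHub Action; this Python version only illustrates the decision, and `triage_decision` and its return shape are assumptions.

```python
MAX_DIFF_LINES = 5000  # threshold stated in the commit message


def triage_decision(diff_text: str) -> dict:
    """Decide whether a PR diff is small enough to send to triage."""
    line_count = len(diff_text.splitlines())
    if line_count > MAX_DIFF_LINES:
        # Oversized diffs skip evaluation and are labeled instead.
        return {"run_triage": False, "labels": ["needs-rfc"]}
    return {"run_triage": True, "labels": []}
```

Gating on size before invoking the model bounds the cost of each run and avoids evaluating diffs too large to review meaningfully.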

* skip triage for maintainer PRs using fork check

Add check-fork job (same pattern as claude-code-review.yml) to skip
triage on PRs from getzep/graphiti (non-fork = maintainer). Only
fork PRs from external contributors get auto-triaged.

- Auto-trigger (pull_request_target): gated on is_fork == true
- Manual dispatch: always runs (maintainers can triage any PR)
- Batch mode: filters out PRs where headRepository is getzep/graphiti
- Uses always() so triage job runs even when check-fork is skipped
  (workflow_dispatch events skip the check-fork job)

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

* prioritize bug fixes, require RFC for all new features/integrations

Update triage rubric:
- Bug fixes to existing functionality are now top priority (HIGH)
- New features and integrations (drivers, LLM providers, embedders)
  require a linked RFC issue regardless of PR size
- PRs adding new integrations without RFC get request-rfc action
- Alignment check updated: has_rfc_if_needed applies to all features,
  not just >500 LOC PRs

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

* docs: update CONTRIBUTING.md with RFC and priority rules

- Bug fixes to existing functionality are the top priority
- All new features and integrations (drivers, LLM providers, embedders)
  require an RFC issue before submitting a PR, not just >500 LOC changes
- PRs without a linked RFC will be tagged needs-rfc and not reviewed

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

* enable code review for fork PRs via pull_request_target

Switch claude-code-review.yml from pull_request to pull_request_target
so fork PRs get automatic code review with access to ANTHROPIC_API_KEY.

Security model (same as pr-triage.yml):
- Always check out the BASE repo, never the fork
- Read diffs only via gh pr diff (GitHub API, text only)
- Strict tool allowlist (no arbitrary Bash execution)
- --append-system-prompt marks all PR content as untrusted
- --max-turns 30 to prevent runaway loops
- Explicit prompt injection warnings

Changes to claude-code-review.yml:
- pull_request -> pull_request_target (enables fork PR reviews)
- Removed check-fork job (all PRs reviewed, not just internal)
- Added concurrency group to prevent duplicate reviews
- Switched to direct_prompt with security rules
- Added tool restrictions matching triage workflow

Changes to claude-code-review-manual.yml:
- Removed unsafe `gh pr checkout` (was executing fork code)
- Now checks out base repo and reads diff via API
- Added same security hardening (tool allowlist, injection defense)
- Replaced actions/github-script with simpler gh pr comment

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

* add priority and RFC rules to triage prompt context section

The Contribution Requirements section at the top of the triage prompt
was missing the updated rules (bug fix priority, RFC for all new
features/integrations). Added them so Claude sees these rules in the
initial context, not just in the evaluation logic later.

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

---------

Co-authored-by: Claude <noreply@anthropic.com>
---
updated-dependencies:
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: indirect
  dependency-group: uv
- dependency-name: aiohttp
  dependency-version: 3.13.4
  dependency-type: indirect
  dependency-group: uv
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…#1370)

fix: use prompt instead of direct_prompt in all workflows

direct_prompt is not a valid input for anthropics/claude-code-action@v1.
The correct input is prompt. This caused all three workflows to receive
empty instructions, making Claude do nothing useful.

Fixed in:
- .github/workflows/pr-triage.yml (2 occurrences)
- .github/workflows/claude-code-review.yml (1 occurrence)
- .github/workflows/claude-code-review-manual.yml (1 occurrence)

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

Co-authored-by: Claude <noreply@anthropic.com>
…ep#1372)

Slop detection changes:
- tests-missing alone is not slop — slop is the combination of
  overarchitected code + verbose/unfocused description + no tests
- Added overarchitected and verbose-unfocused-description as signals
- Replaced boilerplate-description with more specific signal
- Updated slop-detected label to require the combination, not just 3+
  arbitrary signals
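
The combination rule above can be expressed as a subset check. This is a sketch of the logic only: the rubric lives in a prompt file, and the set-based `is_slop` helper is hypothetical. The three signal names are taken from the commit message.

```python
# The label requires all three signals together, not any three signals.
SLOP_COMBINATION = {"overarchitected", "verbose-unfocused-description", "tests-missing"}


def is_slop(signals: set[str]) -> bool:
    """Return True only when every signal in the required combination is present."""
    return SLOP_COMBINATION <= signals


print(is_slop({"tests-missing"}))
print(is_slop({"overarchitected", "verbose-unfocused-description", "tests-missing"}))
```

A PR that only lacks tests is therefore not flagged, which matches the stated intent that tests-missing alone is not slop.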

Triage comment changes:
- Added "Note to Author" section that tells the PR author exactly what
  they need to do to comply with CONTRIBUTING.md (missing RFC, missing
  tests, slop rework, etc.)
- Updated needs-rfc label to apply for features/integrations without
  RFC, not just >500 LOC

https://claude.ai/code/session_01VJPHGChKzqPEThSkPw7sqd

Co-authored-by: Claude <noreply@anthropic.com>
Neo4j was crashing when entity/edge attributes contained nested structures
(Maps of Lists, Lists of Maps) because attributes were being spread as
individual properties instead of serialized to JSON strings.

Changes:
- Serialize attributes to JSON for Neo4j (like Kuzu already does)
- Update read path to handle both JSON strings and legacy dict format
- Add integration tests for nested attribute structures
- Maintain backward compatibility with existing code

Fixes issue where LLM extraction with complex structured attributes
would cause: Neo.ClientError.Statement.TypeError - Property values
can only be of primitive types or arrays thereof.
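
The write-path change can be sketched as follows. Neo4j property values must be primitives or arrays of primitives, so a nested attributes dict is stored as a single JSON string. This is a minimal illustration, not the code in bulk_utils.py; `to_neo4j_properties` and the sample entity are hypothetical.

```python
import json
from typing import Any


def to_neo4j_properties(entity: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the entity whose attributes are one JSON string property."""
    props = dict(entity)  # shallow copy so the caller's dict is not modified
    props["attributes"] = json.dumps(props.get("attributes") or {})
    return props


entity = {
    "uuid": "n1",
    "attributes": {"roles": [{"title": "CEO", "years": [2020, 2021]}]},
}
props = to_neo4j_properties(entity)
print(type(props["attributes"]).__name__)
```

Spreading `attributes` into individual properties would pass the list of maps under `roles` straight to the driver, which is the case that raised the TypeError quoted above.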

Modified Files:
- graphiti_core/utils/bulk_utils.py: Serialize attributes for Neo4j
- graphiti_core/nodes.py: Handle JSON string attributes in read path
- graphiti_core/edges.py: Handle JSON string attributes in read path
- graphiti_core/models/nodes/node_db_queries.py: Use n.attributes for Neo4j
- graphiti_core/models/edges/edge_db_queries.py: Use e.attributes for Neo4j

New Files:
- tests/test_neo4j_nested_attributes_int.py: Integration tests
- docs/neo4j-attributes-fix.md: Comprehensive documentation
…e behavior

Issues fixed:
1. Only serialize attributes for Neo4j, not FalkorDB/Neptune
2. Maintain backward compatibility with existing Neo4j data

Changes:
- Write path: Use elif to specifically target Neo4j only
- Query path: Use COALESCE and return both n.attributes and properties(n)
- Read path: Try JSON string first, fall back to spread properties
- FalkorDB/Neptune: Restore original spread behavior

This ensures:
- New Neo4j nodes: attributes as JSON string (supports nesting)
- Old Neo4j nodes: attributes spread as properties (backward compatible)
- FalkorDB/Neptune: unchanged behavior (no breaking changes)
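
The read path described above, JSON string first and spread properties as the fallback, might look like the sketch below. It assumes a record is a flat dict of node properties; `read_attributes` and the `CORE_FIELDS` set are hypothetical and do not reflect the actual field list in nodes.py or edges.py.

```python
import json
from typing import Any

# Hypothetical set of fixed fields that are not user-defined attributes.
CORE_FIELDS = {"uuid", "name", "group_id", "created_at", "attributes"}


def read_attributes(record: dict[str, Any]) -> dict[str, Any]:
    """Recover attributes from a new-style or legacy node record."""
    raw = record.get("attributes")
    if isinstance(raw, str):
        # New nodes: attributes were serialized to a JSON string on write.
        return json.loads(raw)
    if isinstance(raw, dict):
        # Legacy dict format is passed through as a copy.
        return dict(raw)
    # Old nodes: attributes were spread as individual properties.
    return {k: v for k, v in record.items() if k not in CORE_FIELDS}
```

Checking the stored type at read time means existing Neo4j data needs no migration, at the cost of keeping two code paths until old nodes are rewritten.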
Pin all workflow actions to full-length commit SHAs

Pin all 44 external action references across 13 workflow files to
full-length commit SHAs for supply chain security, preventing
compromised tags from injecting malicious code. Original version
tags are preserved as inline comments for readability.

https://claude.ai/code/session_01QfWs95xMGKUKGH5ppgNGgh

Co-authored-by: Claude <noreply@anthropic.com>
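
One way to check that a workflow file follows the pinning rule above is to scan its `uses:` lines for references that do not end in a 40-character commit SHA. The sketch below is an assumption about how such a check could be written, not tooling from this PR; the sample workflow and its SHA are placeholders.

```python
import re

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)")
FULL_SHA = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return external action references that are not pinned to a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        ref = match.group(1)
        if ref.startswith("./"):
            continue  # local actions live in the repository and have no ref to pin
        if not FULL_SHA.search(ref):
            unpinned.append(ref)
    return unpinned


sample = """
steps:
  - uses: actions/checkout@v4
  - uses: actions/setup-python@0123456789abcdef0123456789abcdef01234567 # v5
  - uses: ./.github/actions/local
"""
print(unpinned_actions(sample))
```

Because the capture stops at the first whitespace, a trailing comment such as `# v5` is ignored, which fits the convention of keeping the original tag as an inline comment.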
@Ataxia123 Ataxia123 merged commit f50cdbc into main Apr 7, 2026
2 of 7 checks passed


3 participants