Skip to content

feat(nodes): add ArangoDB db_arango node#1294

Open
kgarg2468 wants to merge 1 commit into
rocketride-org:developfrom
kgarg2468:feat/RR-1293-arango-node
Open

feat(nodes): add ArangoDB db_arango node#1294
kgarg2468 wants to merge 1 commit into
rocketride-org:developfrom
kgarg2468:feat/RR-1293-arango-node

Conversation

@kgarg2468

@kgarg2468 kgarg2468 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add db_arango — a dual ["database","tool"] node that answers natural-language questions against ArangoDB by translating them to read-only AQL via a connected LLM (mirrors db_neo4j).
  • Multi-model: document queries, graph traversals, and ArangoSearch — one read-only path. Tools get_data/get_schema/get_aql; lanes questions → [table, text, answers] with DIALECT discovery and gated EXECUTE.
  • Read-only by design: EXPLAIN-plan modification-node inspection (primary gate) + keyword blocklist (defense-in-depth); per-query row/runtime/memory caps; allow_execute off by default.

Why this fits the codebase

ArangoDB, like Neo4j, has no SQLAlchemy dialect — it's python-arango + AQL over HTTP. So this is a near-verbatim clone of db_neo4j (which subclasses the generic IGlobalBase/IInstanceBase, not the SQL DatabaseGlobalBase), with AQL/HTTP swapped for Cypher/Bolt.

What changed (10 files, purely additive — no packages/ changes)

nodes/src/nodes/db_arango/: services.json, IGlobal.py (driver lifecycle, execute/explain, multi-model reflection, caps), IInstance.py (tools + lane handlers + NL→AQL validate/retry), utils.py (read-only gate + EXPLAIN-plan inspection), __init__.py, requirements.txt (python-arango), arango.svg, README.md; plus nodes/test/test_db_arango.py (91 unit tests) and examples/db_arango.pipe.

Type

feature (new pipeline + agent-tool node) + tests + example + docs

Testing

  • Tests added — nodes/test/test_db_arango.py, 91 network-free unit tests (sys.modules stub pattern).
  • Example pipeline — examples/db_arango.pipe.
  • Tested locally — ruff clean; 91 unit + full contract suite (276) pass; live ArangoDB 3.12 harness 14/14 (reflection, graph traversal, EXPLAIN-plan read-only gate); live IDE canvas (real LLM → real DB): document query, graph traversal, aggregation, read-only refusal, and non-query fallback all verified.
  • ./builder test — ran ./builder nodes:test (1101 pass/71 skip) and nodes:test-contracts (276) without the C++ toolchain; the full engine build runs in CI.
  • No services.json test block — db-node carve-out: the test framework runs a live pipeline (real DB connect) and has no python-arango mock, so db_neo4j/db_postgres/db_mysql/db_clickhouse all omit it; the 91 unit tests carry the network-free coverage.

Checklist

  • Conventional commit messages
  • No secrets or credentials (example pipe uses ${VAR} placeholders)
  • Breaking changes — none (purely additive)
  • Docs — node README.md included

Linked Issue

Closes #1293

Summary by CodeRabbit

New Features

  • Added an ArangoDB node that turns natural-language questions into read-only queries and returns results as text, tables, or structured answers.
  • Added tools to fetch generated AQL (get_aql), query results (get_data), and reflected schema (get_schema).
  • Added optional, opt-in raw AQL execution with safety checks, configurable runtime, and maximum-row limits.

Documentation

  • Added a dedicated guide covering setup, authentication, configuration parameters, and question/execution behavior.

Tests

  • Added network-free tests for safety gating, validation flow, schema reflection, formatting, and row-limit enforcement.

Chores

  • Added node service wiring, declared the required driver dependency, and introduced a chat-to-ArangoDB example pipeline.

@github-actions github-actions Bot added docs Documentation module:nodes Python pipeline nodes labels Jun 16, 2026
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Introduces a complete db_arango node package: shared AQL safety utilities (keyword blocklist + EXPLAIN-plan modification detection), IGlobal managing ArangoDB connection lifecycle and schema reflection, IInstance translating natural-language questions to read-only AQL via a multi-attempt LLM generation loop, service registration JSON, requirements, package init, README, an example pipeline, and a stub-isolated unit test suite.

Changes

ArangoDB Node (db_arango)

Layer / File(s) Summary
AQL safety utilities and plan-based read-only gate
nodes/src/nodes/db_arango/utils.py
Defines _is_aql_safe (regex write-keyword blocklist with comment stripping), _parse_is_valid (LLM boolean normalization), _plan_nodes/_plan_is_modification (EXPLAIN-plan modification detection across multiple result shapes), and _strip_ns.
IGlobal: connection lifecycle and schema reflection
nodes/src/nodes/db_arango/IGlobal.py
Implements IGlobal with beginGlobal/endGlobal for client/db lifecycle, authentication support (token and userpass), _reflect_schema for multi-model schema enumeration (collections, indexes, graphs, views) with graceful degradation, validateConfig for save-time connectivity probe, and helpers for type inference and cursor statistics.
IGlobal: query execution with safety gates and validation
nodes/src/nodes/db_arango/IGlobal.py
Implements _run_query (safety-gated via _is_aql_safe, with runtime/memory limits and result truncation), _run_query_raw (EXECUTE path bypassing keyword gate but enforcing row cap), and _validate_query (EXPLAIN-based read-only modification detection).
IInstance: tool methods and pipeline handler
nodes/src/nodes/db_arango/IInstance.py
Implements IInstance exposing get_data, get_schema, and get_aql tool methods with proper validation and error handling; implements writeQuestions pipeline handler with branching for DIALECT, EXECUTE, and default NL question flows.
IInstance: multi-attempt AQL generation loop
nodes/src/nodes/db_arango/IInstance.py
Drives _buildAqlQuery multi-attempt loop calling _buildAqlQueryOnce to generate AQL via LLM prompts (synthesized schema context, few-shot examples, read-only constraints, LIMIT guidance), validates via _is_aql_safe and EXPLAIN, and retries with prior rejection details.
IInstance: result formatting and limit handling
nodes/src/nodes/db_arango/IInstance.py
Adds _formatResultAsMarkdown to convert query results into markdown tables (dict-keyed or row-based); adds _clamp_limit to coerce user row limits into bounded range [1, DEFAULT_MAX_EXECUTE_ROWS] with sensible defaults.
Service registration and package initialization
nodes/src/nodes/db_arango/services.json, nodes/src/nodes/db_arango/__init__.py, nodes/src/nodes/db_arango/requirements.txt
Registers the node service (protocol db_arango://, class database+tool, field schema for auth/endpoint/tuning, lane shape); installs python-arango at import time; pins driver requirement.
Documentation and example pipeline
nodes/src/nodes/db_arango/README.md, examples/db_arango.pipe
Comprehensive README covering node roles, schema reflection, safety model, QuestionType flows, tools, auth modes, connectivity probe, parameter reference; example pipeline wiring chat input through ArangoDB and Anthropic to answer/table outputs.
Unit test suite with network isolation
nodes/test/test_db_arango.py
Full test suite using sys.modules stubs for network-free execution; covers utils safety, IGlobal connection/reflection/execution, IInstance tools/pipeline/formatting, EXPLAIN-plan gating, truncation warnings, multi-model reflection normalization, and end-to-end prompt context validation.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(100, 100, 200, 0.5)
        Note over Pipeline,ArangoDB: Pipeline startup
        Pipeline->>IGlobal: beginGlobal()
        IGlobal->>ArangoDB: _open_database() with auth
        IGlobal->>ArangoDB: _reflect_schema()
        ArangoDB-->>IGlobal: graph_schema cached
    end
    rect rgba(100, 200, 100, 0.5)
        Note over User,ArangoDB: NL question flow
        User->>IInstance: writeQuestions(NL question)
        loop up to max_attempts
            IInstance->>LLM: _buildAqlQueryOnce(schema_ctx)
            LLM-->>IInstance: generated AQL (JSON)
            IInstance->>IGlobal: _validate_query() via EXPLAIN
            IGlobal->>ArangoDB: db.aql.explain(aql)
            ArangoDB-->>IGlobal: execution plan
            IGlobal-->>IInstance: (is_valid, message)
        end
        IInstance->>IGlobal: _run_query(safe AQL)
        IGlobal->>ArangoDB: execute with runtime/memory caps
        ArangoDB-->>IGlobal: cursor rows (≤ max_execute_rows)
        IGlobal-->>IInstance: truncated result list
        IInstance-->>User: emit answers + markdown table
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • jmaionchi
  • Rod-Christensen
  • stepmikhaylov

Poem

🐇 Hippity-hop through the graph database,
AQL queries in a read-only cascade!
EXPLAIN checks the plan, no writes sneak past,
Schema reflected — collections amassed.
The rabbit queries documents with glee,
Returning your answers, safe as can be! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a new ArangoDB database node to the node catalog.
Linked Issues check ✅ Passed All v1 scope requirements from #1293 are met: NL-to-AQL translation, multi-model schema reflection, three tools (get_data/get_schema/get_aql), read-only enforcement via EXPLAIN-plan inspection plus keyword blocklist, result caps, and gated EXECUTE access.
Out of Scope Changes check ✅ Passed All changes are scoped to the new db_arango node package and supporting test; no modifications to existing files or unrelated functionality.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Biome (2.5.0)
nodes/src/nodes/db_arango/services.json

File contains syntax errors that prevent linting: Line 2: Expected a property but instead found '//'.; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 6: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 11: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 16: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 22: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 28: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 34: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 40: End of file expected; Line 45: End of file expected; Li

... [truncated 850 characters] ...

End of file expected; Line 76: End of file expected; Line 82: End of file expected; Line 82: End of file expected; Line 83: Expected a property but instead found '// Define the values that will be merged into any profile configuration'.; Line 82: End of file expected; Line 83: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 85: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 87: End of file expected; Line 92: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 99: End of file expected; Line 210: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 216: End of file expected; Line 223: End of file expected

🔧 markdownlint-cli2 (0.22.1)
nodes/src/nodes/db_arango/README.md

markdownlint-cli2 v0.22.1 (markdownlint v0.40.0)
Error: Unable to use configuration file '/coderabbit-0.markdownlint-cli2.jsonc'; ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at throwForConfigurationFile (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:48:9)
at readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:169:5)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
[cause]: Error: ENOENT: no such file or directory, open '/coderabbit-0.markdownlint-cli2.jsonc'
at async open (node:internal/fs/promises:640:25)
at async Object.readFile (node:internal/fs/promises:1287:14)
at async readOptionsOrConfig (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:141:17)
at async main (file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2.mjs:927:21)
at async file:///usr/local/lib/node_modules/markdownlint-cli2/markdownlint-cli2-bin.mjs:14:22 {
errno: -2,
code: 'ENOENT',
syscall: 'open',
path: '/coderabbit-0.markdownlint-cli2.jsonc'
}
}


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

Copy link
Copy Markdown
🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-106: When `_buildAqlQuery` exhausts its validation retries, it
currently returns the last payload unchanged, which causes `get_aql` to treat a
previously rejected query as valid. Identify where `_buildAqlQuery` returns
after exhausting retries and explicitly mark the result with `valid=False`
before returning it. This ensures that downstream callers like `get_data`
properly recognize exhausted retries as invalid and handle them correctly
according to the tool contract.

In `@nodes/test/test_db_arango.py`:
- Around line 164-200: The cleanup in the finally block does not restore
pre-existing db_arango modules that may have been in sys.modules before this
test fixture ran. Before creating the new db_arango package and its submodules
(utils, IGlobal, IInstance), save any existing db_arango-related entries in
sys.modules to a separate dictionary. Then in the finally block, after removing
the scaffold modules, restore these saved db_arango entries just as the code
currently restores the stubbed modules in the saved dictionary. This ensures
that if db_arango modules existed before the fixture ran, they are properly
restored afterward, preventing import-state leakage between tests in a shared
pytest session.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 23705b92-1541-4e36-9272-46da2c150e3e

📥 Commits

Reviewing files that changed from the base of the PR and between 283ec8a and f1ab60c.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/IInstance.py
Comment thread nodes/test/test_db_arango.py
@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from f1ab60c to 0ca0de3 Compare June 16, 2026 19:23
@kgarg2468

Copy link
Copy Markdown
Contributor Author

Addressed both CodeRabbit suggestions in the latest push:

  1. Exhausted EXPLAIN retries_buildAqlQuery now marks the result invalid (and carries the EXPLAIN error) once retries are exhausted, so get_aql/get_data/writeQuestions report an error instead of treating a rejected query as valid: true. Refined vs the literal suggestion so the off-topic path (non-query questions → plain-text answer) is preserved — get_aql distinguishes an EXPLAIN failure (has error) from a genuine off-topic answer.
  2. Test loader_load_node now captures and restores any pre-existing db_arango* modules, so the shared builder nodes:test session can't see leaked import state.

Added 2 unit tests for these paths. Green locally: 93 unit + 276 contract, builder nodes:test 1185 pass / 28 skip.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 189-192: The row limit check in the loop occurs after appending
the document to the rows list, which allows max_rows + 1 rows to be accumulated
before raising the error. Move the limit check before the rows.append(doc)
statement so that the check prevents exceeding the limit at exactly max_rows
rows, making it consistent with the _run_query method's behavior of checking
before appending.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5cf48e8-9883-4359-af04-4108bfd796ab

📥 Commits

Reviewing files that changed from the base of the PR and between f1ab60c and 0ca0de3.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/IGlobal.py
@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 0ca0de3 to 3939b24 Compare June 16, 2026 19:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

♻️ Duplicate comments (1)
nodes/test/test_db_arango.py (1)

779-781: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not codify fail-open EXPLAIN handling.

Line 779 currently treats unknown plan shapes as safe. Since IGlobal._validate_query returns valid whenever _plan_is_modification(plan) is false, malformed or unrecognized EXPLAIN responses bypass the primary read-only gate. Keep _plan_nodes() returning [] if useful, but validation should reject empty/unrecognized plan shapes.

Suggested regression
-    def test_unknown_shape_is_safe(self):
+    def test_unknown_shape_has_no_plan_nodes(self):
         assert _plan_nodes('garbage') == []
         assert _plan_is_modification(None) is False
+
+    def test_unknown_explain_shape_is_rejected_by_validation(self):
+        ig = IGlobal.__new__(IGlobal)
+        ig.db = SimpleNamespace(aql=SimpleNamespace(explain=lambda aql: 'garbage'))
+        ok, msg = ig._validate_query('FOR u IN users RETURN u')
+        assert ok is False
+        assert 'plan' in msg.lower()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/test/test_db_arango.py` around lines 779 - 781, The test at
test_unknown_shape_is_safe codifies a fail-open pattern where unrecognized or
malformed EXPLAIN responses are treated as safe. Since IGlobal._validate_query
uses _plan_is_modification to gate read-only access, returning False for
unrecognized plans (like when _plan_is_modification receives None) bypasses the
primary validation gate. Rather than treating empty or unrecognized plan shapes
as safe defaults, update the validation logic in IGlobal._validate_query to
explicitly reject empty or unrecognized plan shapes instead of passing them
through as valid. You may keep _plan_nodes returning an empty list if useful,
but ensure that the validation layer detects and rejects unrecognized EXPLAIN
responses rather than defaulting to a permissive posture.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/IGlobal.py`:
- Around line 175-195: The _run_query_raw method is missing memory_limit
protection when executing AQL queries, which leaves the EXECUTE path vulnerable
to unbounded server-side memory consumption. Add a memory_limit parameter to the
_run_query_raw method signature (similar to the existing max_runtime parameter)
and pass it to the self.db.aql.execute call to mirror the memory protection
already implemented in the _run_query method and ensure consistent resource
limits across both query execution paths.

In `@nodes/src/nodes/db_arango/IInstance.py`:
- Around line 103-113: The limit parameter is passed to the LLM via get_aql()
but is not enforced on the actual query results returned by _run_query(). After
the query execution in the try block, the rows returned should be explicitly
limited to respect the requested limit. Slice the rows list to enforce the limit
before returning, so that the response dictionary containing rows never exceeds
the caller's requested row count regardless of what the generated AQL query
returns.
- Around line 467-469: The headers for the markdown table are being derived only
from the first row using first.keys(), which silently drops any fields that
appear in later rows since ArangoDB documents can be heterogeneous. Instead of
using only the keys from the first row, collect all unique keys from all rows in
the result by iterating through the entire result list and gathering a
comprehensive set of all field names, then convert this set to the headers list.
This ensures that all columns from all rows are included in the markdown table
output.
- Around line 323-327: When the AQL safety check via _is_aql_safe(aql) fails,
the result is returned unchanged, which can allow unsafe queries to pass through
to downstream callers if the LLM set isValid=true. Modify the return statement
in the condition that checks "if not is_valid or not aql or not
_is_aql_safe(aql)" to explicitly set result['isValid'] = False before returning
the result, ensuring unsafe AQL is properly marked as invalid for downstream
processing.

In `@nodes/src/nodes/db_arango/requirements.txt`:
- Line 1: Add a version pin constraint to the `python-arango` dependency in the
requirements.txt file to ensure deterministic builds. The `depends()` call in
`__init__.py` installs this requirement at import-time, and without a version
constraint, upstream releases can introduce behavior changes across deployments.
Review the version pinning patterns used in other db node requirements files
(clickhouse, mysql, postgres) and apply the same format to pin `python-arango`
to a tested stable version.

In `@nodes/src/nodes/db_arango/utils.py`:
- Around line 39-42: The _UNSAFE_AQL regex pattern and similar patterns at the
referenced locations do not account for keywords appearing inside string
literals or quoted identifiers, causing false rejections when safe keywords
appear in literal values and potential issues when comment markers appear in
strings. To fix this, either mask all string and quoted-identifier literals
before removing comments from the AQL query, ensuring the keyword detection runs
only on actual code, or alternatively run the EXPLAIN gate on every non-raw
execution path to validate query safety. This issue applies to both the
_UNSAFE_AQL pattern definition and the similar pattern at the other referenced
location.

In `@nodes/test/test_db_arango.py`:
- Around line 340-342: The test_modification_keyword_in_comment_is_ignored
method currently only covers write keywords appearing in comments marked with
//, but does not cover the case where // appears inside string literals (such as
in URLs like "http://example.com"). Add regression test cases to
test_modification_keyword_in_comment_is_ignored that verify _is_aql_safe
correctly handles scenarios where string literals containing // patterns appear
before actual write keywords (such as INSERT, UPDATE, or REMOVE). These tests
should ensure that queries like LET url = "http://example.com" INSERT {url} INTO
users are correctly identified as unsafe, not incorrectly stripped and marked as
safe.

---

Duplicate comments:
In `@nodes/test/test_db_arango.py`:
- Around line 779-781: The test at test_unknown_shape_is_safe codifies a
fail-open pattern where unrecognized or malformed EXPLAIN responses are treated
as safe. Since IGlobal._validate_query uses _plan_is_modification to gate
read-only access, returning False for unrecognized plans (like when
_plan_is_modification receives None) bypasses the primary validation gate.
Rather than treating empty or unrecognized plan shapes as safe defaults, update
the validation logic in IGlobal._validate_query to explicitly reject empty or
unrecognized plan shapes instead of passing them through as valid. You may keep
_plan_nodes returning an empty list if useful, but ensure that the validation
layer detects and rejects unrecognized EXPLAIN responses rather than defaulting
to a permissive posture.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4daf8fbb-1e46-4f36-bd43-cd71a58e02de

📥 Commits

Reviewing files that changed from the base of the PR and between 0ca0de3 and 3939b24.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/IGlobal.py
Comment thread nodes/src/nodes/db_arango/IInstance.py Outdated
Comment thread nodes/src/nodes/db_arango/IInstance.py
Comment thread nodes/src/nodes/db_arango/IInstance.py
@@ -0,0 +1 @@
python-arango

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== db_arango requirement =="
sed -n '1,40p' nodes/src/nodes/db_arango/requirements.txt

echo
echo "== import-time dependency installs in db nodes =="
rg -n --type=py 'depends\(' nodes/src/nodes -g '*/__init__.py' -C1 | rg 'db_'

echo
echo "== version spec style across db node requirements =="
fd -a requirements.txt nodes/src/nodes | rg '/db_' | while read -r f; do
  echo "---- $f ----"
  sed -n '1,80p' "$f"
  echo
done

Repository: rocketride-org/rocketride-server

Length of output: 169


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Check the actual structure of db_arango
echo "=== db_arango directory structure ==="
find nodes/src/nodes/db_arango -type f | head -20

echo
echo "=== db_arango __init__.py content ==="
if [ -f nodes/src/nodes/db_arango/__init__.py ]; then
  cat -n nodes/src/nodes/db_arango/__init__.py
fi

echo
echo "=== Search for import of python-arango or arango in db nodes ==="
rg -n 'arango|python-arango' nodes/src/nodes/db_arango -t py

echo
echo "=== All requirements.txt files in db nodes ==="
find nodes/src/nodes -name 'requirements.txt' -path '*/db_*' | sort

Repository: rocketride-org/rocketride-server

Length of output: 4652


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "=== Requirements content for all db nodes ==="
for req_file in nodes/src/nodes/db_*/requirements.txt; do
  echo "---- $req_file ----"
  cat "$req_file"
  echo
done

Repository: rocketride-org/rocketride-server

Length of output: 530


Pin python-arango to a tested version for deterministic builds.

The depends() call in __init__.py installs this requirement at import-time. Without a version constraint, upstream releases can cause behavior changes across deployments. Other db nodes (clickhouse, mysql, postgres) pin their dependencies; align with this pattern.

🧰 Tools
🪛 OSV Scanner (2.3.8)

[HIGH] 1-1: requests 2.9.2: undefined

(PYSEC-2018-28)


[HIGH] 1-1: requests 2.9.2: undefined

(PYSEC-2023-74)


[HIGH] 1-1: requests 2.9.2: Requests vulnerable to .netrc credentials leak via malicious URLs

(GHSA-9hjg-9r4m-mvj7)


[HIGH] 1-1: requests 2.9.2: Requests Session object does not verify requests after making first request with verify=False

(GHSA-9wx4-h78v-vm56)


[HIGH] 1-1: requests 2.9.2: Requests has Insecure Temp File Reuse in its extract_zipped_paths() utility function

(GHSA-gc5v-m9x4-r6x2)


[HIGH] 1-1: requests 2.9.2: Unintended leak of Proxy-Authorization header in requests

(GHSA-j8r2-6x86-q33q)


[HIGH] 1-1: requests 2.9.2: Insufficiently Protected Credentials in Requests

(GHSA-x84v-xcm2-53pg)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/requirements.txt` at line 1, Add a version pin
constraint to the `python-arango` dependency in the requirements.txt file to
ensure deterministic builds. The `depends()` call in `__init__.py` installs this
requirement at import-time, and without a version constraint, upstream releases
can introduce behavior changes across deployments. Review the version pinning
patterns used in other db node requirements files (clickhouse, mysql, postgres)
and apply the same format to pin `python-arango` to a tested stable version.

Comment thread nodes/src/nodes/db_arango/utils.py
Comment thread nodes/test/test_db_arango.py
@kgarg2468

Copy link
Copy Markdown
Contributor Author

Latest push addresses 2 of CodeRabbit's newest batch and declines the rest with reasons:

Applied

  • memory_limit now passed on the EXECUTE path (_run_query_raw), matching _run_query.
  • Markdown table headers are now the union of keys across all rows — ArangoDB is schemaless, so later rows can carry fields the first row lacks.

Declined (intentional)

  • Pin python-arango — the repo convention is unpinned (db_neo4jneo4j, mysql, clickhouse…); the engine resolves versions via constraints.txt. Pinning would diverge from every other node.
  • String-literal-aware safety regex (+ its test) — the EXPLAIN-plan modification-node check is the accurate primary read-only gate; the keyword regex is an intentional coarse backstop (same as db_neo4j). Parsing AQL string literals adds fragile complexity for a backstop.
  • Honor get_data limit post-execution / mark unsafe-AQL invalid — small consistency tweaks that match db_neo4j's current behavior; deferred.

Green locally: 96 unit + 276 contract; live ArangoDB 3.12 harness 14/14.

@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 3939b24 to 798847f Compare June 16, 2026 20:11

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/README.md`:
- Line 31: The word "markdown" on line 31 should be capitalized to "Markdown"
since it is a proper noun. Update the text that currently says "markdown table"
to "Markdown table" for consistency with UI and documentation style conventions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 64bae20f-0251-4482-8fbd-0373c66f4fee

📥 Commits

Reviewing files that changed from the base of the PR and between 3939b24 and 798847f.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment thread nodes/src/nodes/db_arango/README.md Outdated
Add `db_arango`, a dual database/tool node that answers natural-language
questions against ArangoDB by translating them to read-only AQL via a
connected LLM. Mirrors `db_neo4j` (the non-SQL NL->query pattern; ArangoDB
uses python-arango + AQL over HTTP, with no SQLAlchemy dialect).

- Tools get_data/get_schema/get_aql; lanes questions -> [table, text, answers]
  with DIALECT discovery and gated raw EXECUTE handling. get_data honors the
  requested row limit.
- Multi-model schema reflection: document/edge collections with sampled fields
  and indexes, named graphs (edge definitions), and ArangoSearch views.
- Read-only enforced by EXPLAIN-plan modification-node inspection (primary) plus
  a string-literal-aware keyword blocklist (defense-in-depth); per-query
  result/runtime/memory caps on both the read and EXECUTE paths; allow_execute
  off by default. Queries that fail EXPLAIN validation or generate unsafe AQL are
  marked invalid so callers report an error rather than treating them as valid.
- Markdown output unions keys across all rows (ArangoDB is schemaless).
- Ships nodes/test/test_db_arango.py (100 network-free unit tests) and
  examples/db_arango.pipe.

Validated: ruff clean; 100 unit tests + full contract suite (276) pass; live
ArangoDB 3.12 harness 14/14 (reflection, graph traversal, explain-plan gate);
live IDE canvas (real LLM -> real DB): document query, graph traversal,
aggregation, read-only refusal, and non-query fallback all verified.
`./builder nodes:test` + `nodes:test-contracts` pass without the C++ toolchain;
the full `./builder test` (engine build) runs in CI. No services.json test block
(db-node carve-out: the test framework needs a live DB; db_neo4j/postgres/mysql/
clickhouse all omit it). Committed with --no-verify (lefthook gitleaks/ruff not
on PATH); ruff + format + secret scan run manually.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kgarg2468

Copy link
Copy Markdown
Contributor Author

Another round addressing CodeRabbit's remaining flags:

  • Safety — the keyword gate now strips string literals before scanning, closing a backstop hole where a // inside a string (e.g. a URL) could swallow a trailing write keyword and hide it. (The EXPLAIN-plan modification check remains the primary gate.)
  • get_data now honors the requested row limit after execution.
  • Unsafe generated AQL is marked invalid so callers report an error instead of emitting the query as an answer.
  • README — 'Markdown' capitalization.

+4 unit tests (now 100 unit + 276 contract, ruff clean).

Still declining one: pin python-arango — this repo pins the resolved set globally via constraints.txt, and every other node's requirements.txt is unpinned; a per-node pin would be an inconsistency a maintainer would likely reverse. Happy to add it if you'd prefer.

@kgarg2468 kgarg2468 force-pushed the feat/RR-1293-arango-node branch from 798847f to c71d37d Compare June 22, 2026 18:50

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/db_arango/services.json`:
- Around line 164-177: The schema definitions in services.json specify bounds
for arangodb.max_attempts (1-20) and arangodb.max_execute_rows (1-1000000), but
these constraints are not enforced at runtime in IGlobal.py lines 78-105. Add
runtime validation to clamp max_attempts within the [1, 20] range and enforce
the upper bound of 1000000 for max_execute_rows (note that max_execute_rows
already has a lower bound check in place). This ensures that non-UI
configurations cannot bypass the defined limits and cause excessive retry
behavior or oversized result windows.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: e22877d8-819a-411a-a156-b313f22f5933

📥 Commits

Reviewing files that changed from the base of the PR and between 798847f and c71d37d.

⛔ Files ignored due to path filters (1)
  • nodes/src/nodes/db_arango/arango.svg is excluded by !**/*.svg
📒 Files selected for processing (9)
  • examples/db_arango.pipe
  • nodes/src/nodes/db_arango/IGlobal.py
  • nodes/src/nodes/db_arango/IInstance.py
  • nodes/src/nodes/db_arango/README.md
  • nodes/src/nodes/db_arango/__init__.py
  • nodes/src/nodes/db_arango/requirements.txt
  • nodes/src/nodes/db_arango/services.json
  • nodes/src/nodes/db_arango/utils.py
  • nodes/test/test_db_arango.py

Comment on lines +164 to +177
"arangodb.max_attempts": {
"type": "integer",
"title": "Max validation attempts",
"default": 5,
"minimum": 1,
"maximum": 20,
"description": "Maximum number of times to re-ask the LLM if EXPLAIN rejects the generated AQL query"
},
"arangodb.max_execute_rows": {
"type": "integer",
"title": "Max result rows",
"default": 25000,
"minimum": 1,
"maximum": 1000000,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Enforce these limits at runtime, not only in schema metadata.

Lines 164-177 define strict bounds, but nodes/src/nodes/db_arango/IGlobal.py:78-105 currently does not clamp max_attempts to [1, 20] and only lower-bounds max_execute_rows. Non-UI configs can therefore bypass these limits and cause excessive retry/cost behavior or oversized result windows.

Suggested fix
--- a/nodes/src/nodes/db_arango/IGlobal.py
+++ b/nodes/src/nodes/db_arango/IGlobal.py
@@
-        try:
-            self.max_validation_attempts = int(config.get('max_attempts', 5))
-        except (ValueError, TypeError):
-            self.max_validation_attempts = 5
+        try:
+            self.max_validation_attempts = int(config.get('max_attempts', 5))
+        except (ValueError, TypeError):
+            self.max_validation_attempts = 5
+        self.max_validation_attempts = min(20, max(1, self.max_validation_attempts))
@@
-        try:
-            self.max_execute_rows = max(1, int(config.get('max_execute_rows', DEFAULT_MAX_EXECUTE_ROWS)))
-        except (TypeError, ValueError):
-            self.max_execute_rows = DEFAULT_MAX_EXECUTE_ROWS
+        try:
+            self.max_execute_rows = int(config.get('max_execute_rows', DEFAULT_MAX_EXECUTE_ROWS))
+        except (TypeError, ValueError):
+            self.max_execute_rows = DEFAULT_MAX_EXECUTE_ROWS
+        self.max_execute_rows = min(1_000_000, max(1, self.max_execute_rows))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/db_arango/services.json` around lines 164 - 177, The schema
definitions in services.json specify bounds for arangodb.max_attempts (1-20) and
arangodb.max_execute_rows (1-1000000), but these constraints are not enforced at
runtime in IGlobal.py lines 78-105. Add runtime validation to clamp max_attempts
within the [1, 20] range and enforce the upper bound of 1000000 for
max_execute_rows (note that max_execute_rows already has a lower bound check in
place). This ensures that non-UI configurations cannot bypass the defined limits
and cause excessive retry behavior or oversized result windows.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs Documentation module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an ArangoDB node (db_arango): natural-language → AQL over a multi-model database

1 participant