bug: HTTP route heuristic misclassifies filesystem paths; client/server routes never join #455

@ilyabrykau-orca

Describe the bug

On a Python repo (~82.6k graph nodes), route / HTTP_CALLS extraction produces false and unjoinable data:

  • 1062 / 1213 Route nodes are empty URL-string-literal stubs (no method/route_path).
  • url_path includes filesystem paths misclassified as HTTP (/root/.aws/credentials, /etc/crio/crio.conf) and str.split('/locations/') delimiters; os.remove and os.path.join calls are emitted as HTTP_CALLS.
  • Client and server route sets are fully disjoint: Route<-[:HANDLES]-Route<-[:HTTP_CALLS]- = 0. No client call resolves to a handled endpoint; internal URLs (http://api-admin:8001) exist only as orphan Route name strings, 0 wired.
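
One possible way to separate these cases, sketched here under assumptions: classify each string literal using both the callee it is passed to and the shape of the literal. The function name `classify_literal`, the prefix/suffix lists, and the `/v1/users` path are hypothetical and are not taken from the project's extractor; the other literals are the ones reported above.

```python
from urllib.parse import urlsplit

# Hypothetical allow/deny lists for illustration only.
HTTP_CALLEE_PREFIXES = ("requests.", "httpx.", "urllib.request.", "aiohttp.")
NON_HTTP_CALLEE_PREFIXES = ("os.", "shutil.", "pathlib.")
NON_HTTP_METHOD_SUFFIXES = (".split", ".rsplit", ".replace", ".startswith", ".endswith")
FS_ROOTS = ("/root/", "/etc/", "/var/", "/usr/", "/tmp/", "/home/", "/opt/", "/proc/", "/sys/", "/dev/")


def classify_literal(callee: str, literal: str) -> str:
    """Return 'http', 'filesystem', or 'unknown' for a string literal passed to `callee`."""
    # 1. Callee evidence wins: os.* calls and string methods are never HTTP calls.
    if callee.startswith(NON_HTTP_CALLEE_PREFIXES) or callee.endswith(NON_HTTP_METHOD_SUFFIXES):
        return "filesystem" if literal.startswith(FS_ROOTS) else "unknown"
    # 2. An absolute URL with an http(s) scheme and a host is HTTP.
    try:
        parts = urlsplit(literal)
    except ValueError:  # malformed literal, e.g. an unbalanced IPv6 bracket
        return "unknown"
    if parts.scheme in ("http", "https") and parts.netloc:
        return "http"
    # 3. Paths under well-known filesystem roots are not routes.
    if literal.startswith(FS_ROOTS):
        return "filesystem"
    # 4. A bare path counts as HTTP only when passed to a known HTTP client.
    if literal.startswith("/") and callee.startswith(HTTP_CALLEE_PREFIXES):
        return "http"
    return "unknown"


print(classify_literal("os.remove", "/root/.aws/credentials"))
print(classify_literal("name.split", "/locations/"))
print(classify_literal("requests.get", "http://api-admin:8001"))
```

Under these rules only literals classified as 'http' would become Route nodes or HTTP_CALLS targets; 'unknown' literals would be dropped rather than emitted as empty stubs.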

Expected

Route extraction distinguishes HTTP URLs from filesystem paths / string-split args; client HTTP_CALLS resolve to server Routes (intra-repo at minimum).
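
A minimal sketch of what intra-repo resolution could look like, assuming server routes are available as (method, route_path) pairs and that path parameters use `{name}`, `<name>`, or `:name` syntax. The helper names `template_to_regex` and `resolve_call` and the example routes are hypothetical; the host is ignored on purpose, so `http://api-admin:8001/...` is matched by path alone.

```python
import re
from typing import Optional, Sequence, Tuple
from urllib.parse import urlsplit

# Matches one path segment that is a parameter placeholder.
PARAM_SEGMENT = re.compile(r"\{[^{}]+\}|<[^<>]+>|:[A-Za-z_]\w*")


def template_to_regex(route_path: str) -> "re.Pattern[str]":
    """Compile a route template such as '/users/{user_id}' into a regex."""
    pieces = []
    for segment in route_path.strip("/").split("/"):
        if PARAM_SEGMENT.fullmatch(segment):
            pieces.append(r"[^/]+")  # a parameter matches any single segment
        else:
            pieces.append(re.escape(segment))
    return re.compile("/" + "/".join(pieces) + "/?")  # tolerate a trailing slash


def resolve_call(
    method: str, url: str, routes: Sequence[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Return the first (method, route_path) that handles the client call, or None."""
    path = urlsplit(url).path or "/"  # drop scheme, host, query, fragment
    for route_method, route_path in routes:
        if route_method.upper() != method.upper():
            continue
        if template_to_regex(route_path).fullmatch(path):
            return (route_method, route_path)
    return None


# Hypothetical server routes, for illustration only.
routes = [("GET", "/users/{user_id}"), ("POST", "/users")]
print(resolve_call("get", "http://api-admin:8001/users/42?verbose=1", routes))
print(resolve_call("GET", "http://api-admin:8001/orders", routes))
```

A resolved pair would be the point where an HTTP_CALLS edge attaches to the Route node that already carries the HANDLES edge, instead of creating a separate orphan Route for the client-side string.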

Repro

Index any repo mixing requests/HTTP clients with os.path/file I/O, then:

MATCH (a)-[:HTTP_CALLS]->(b:Route) RETURN b.url_path

Labels

bug (Something isn't working), parsing/quality (Graph extraction bugs, false positives, missing edges)
