4 changes: 2 additions & 2 deletions BUNDLING.md
Original file line number Diff line number Diff line change
@@ -46,9 +46,9 @@ linux/amd64`).
Releases, symlinks `codegraph` onto PATH. Re-run to upgrade; `--uninstall` to
remove.
2. **npm** ([`scripts/npm-shim.js`](scripts/npm-shim.js)) — preserves
`npm i -g @colbymchenry/codegraph`. The main package is a tiny shim; the
`npm i -g @andersonlimahw/lemon-codegraph`. The main package is a tiny shim; the
bundles ship as per-platform `optionalDependencies`
(`@colbymchenry/codegraph-<target>` with `os`/`cpu`), so npm installs only the
(`@andersonlimahw/lemon-codegraph-<target>` with `os`/`cpu`), so npm installs only the
matching one. The shim — run by the user's Node — execs the bundle, so the
real work runs on the bundled Node 24. Works even on old Node. On Windows it
invokes the bundled `node.exe` against the app entry directly (not the `.cmd`
141 changes: 94 additions & 47 deletions CHANGELOG.md

Large diffs are not rendered by default.

68 changes: 62 additions & 6 deletions CLAUDE.md
@@ -4,9 +4,9 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

CodeGraph is a local-first code intelligence library + CLI + MCP server. It parses any supported codebase with tree-sitter, stores symbols/edges/files in SQLite (FTS5), and exposes a knowledge graph to AI agents (Claude Code, Cursor, Codex CLI, opencode) over MCP. Per-project data lives in `.codegraph/`. Extraction is deterministic — derived from AST, not LLM-summarized.
CodeGraph (lemon-codegraph) is a fork of [colbymchenry/codegraph](https://github.com/colbymchenry/codegraph), extended for the full-stack/frontend/mobile stack. It is a local-first code intelligence library + CLI + MCP server. It parses any supported codebase with tree-sitter, stores symbols/edges/files in SQLite (FTS5), and exposes a knowledge graph to AI agents (Claude Code, Cursor, Codex CLI, opencode) over MCP. Per-project data lives in `.codegraph/`. Extraction is deterministic — derived from AST, not LLM-summarized.

Distributed as `@colbymchenry/codegraph` on npm; same binary serves as installer, indexer, and MCP server.
Distributed as `@andersonlimahw/lemon-codegraph` on npm; same binary serves as installer, indexer, and MCP server.

## Build, Test, Run

@@ -52,7 +52,7 @@ The public API surface is `src/index.ts` — the `CodeGraph` class wires all the
- `src/index.ts` — `CodeGraph` class: `init`/`open`/`close`, `indexAll`, `sync`, `searchNodes`, `getCallers`/`getCallees`, `getImpactRadius`, `buildContext`, `watch`/`unwatch`.
- `src/db/` — `DatabaseConnection`, `QueryBuilder` (prepared statements), `schema.sql`. Backed by `better-sqlite3` (native) when available, transparently falls back to `node-sqlite3-wasm`. `codegraph status` surfaces which backend is live; wasm is the slow path.
- `src/extraction/` — `ExtractionOrchestrator`, tree-sitter wrappers, per-language extractors under `languages/` (one file per language), plus standalone extractors for non-tree-sitter formats (`svelte-extractor.ts`, `vue-extractor.ts`, `liquid-extractor.ts`, `dfm-extractor.ts` for Delphi). `parse-worker.ts` runs heavy parsing off the main thread.
- `src/resolution/` — `ReferenceResolver` orchestrates `import-resolver.ts` (with `path-aliases.ts` for tsconfig path aliases + cargo workspace member globs), `name-matcher.ts`, and `frameworks/` (Express, Laravel, Rails, FastAPI, Django, Flask, Spring, Gin, Axum, ASP.NET, Vapor, React Router, SvelteKit, Vue/Nuxt, Cargo workspaces). Frameworks emit `route` nodes and `references` edges.
- `src/resolution/` — `ReferenceResolver` orchestrates `import-resolver.ts` (with `path-aliases.ts` for tsconfig path aliases + cargo workspace member globs), `name-matcher.ts`, and `frameworks/` (Express, NestJS, React/Next.js, Angular, Vue/Nuxt, SvelteKit, React Query, Bun/Elysia, React Native, Android, iOS/macOS, FastAPI, Spring, Gin, ASP.NET). Frameworks emit `route` and `component` nodes with `references` edges.
- `src/graph/` — `GraphTraverser` (BFS/DFS, impact radius, path finding) and `GraphQueryManager` (high-level queries).
- `src/context/` — `ContextBuilder` + formatter for markdown/JSON output.
- `src/search/` — full-text query parser and helpers for FTS5.
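
The impact-radius traversal mentioned for `GraphTraverser` can be pictured as a bounded breadth-first walk over incoming edges. The following is only a minimal sketch under stated assumptions: the `Edge` shape, the `impactRadius` name, and the depth semantics are hypothetical and are not taken from `src/graph/`.

```typescript
// Hypothetical types for illustration; the real definitions live in src/types.ts.
type NodeId = string;

interface Edge {
  source: NodeId; // the symbol that references or calls...
  target: NodeId; // ...this symbol
}

// Breadth-first walk over reverse edges: which nodes depend on `start`,
// directly or transitively, within `maxDepth` hops.
function impactRadius(edges: Edge[], start: NodeId, maxDepth: number): Map<NodeId, number> {
  // Index incoming edges once so each lookup during the walk is cheap.
  const dependents = new Map<NodeId, NodeId[]>();
  for (const { source, target } of edges) {
    const list = dependents.get(target) ?? [];
    list.push(source);
    dependents.set(target, list);
  }

  const depth = new Map<NodeId, number>([[start, 0]]);
  const queue: NodeId[] = [start];
  // Iterating an array with for-of also visits elements pushed during the loop.
  for (const current of queue) {
    const d = depth.get(current) ?? 0;
    if (d >= maxDepth) continue;
    for (const next of dependents.get(current) ?? []) {
      if (!depth.has(next)) {
        depth.set(next, d + 1);
        queue.push(next);
      }
    }
  }

  depth.delete(start); // the changed node is not part of its own impact set
  return depth;
}
```

The returned map pairs each affected node with its hop distance, which is one plausible way to rank results by how directly they depend on the changed symbol.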
@@ -71,7 +71,7 @@ Defined in `src/types.ts`. Both extractors and resolvers must use these exact st

### Multi-agent installer

`src/installer/` is the entry point for `codegraph install` (and the bare `codegraph`/`npx @colbymchenry/codegraph` invocation). Architecture:
`src/installer/` is the entry point for `codegraph install` (and the bare `codegraph`/`npx @andersonlimahw/lemon-codegraph` invocation). Architecture:

- `targets/registry.ts` lists every supported agent.
- `targets/types.ts` defines the `AgentTarget` interface — adding a 5th agent (Continue, Zed, Windsurf…) is **one new file in `targets/` + one entry in `registry.ts`**. Each target owns its config-file location, MCP-server JSON/TOML/JSONC writing, and instructions-file path.
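
To illustrate the "one new file plus one registry entry" pattern, here is a rough sketch. Every name below is an assumption made for illustration: the real `AgentTarget` interface in `targets/types.ts` may use different fields, and `example-agent` and its paths are invented.

```typescript
// Hypothetical shape; the real interface lives in src/installer/targets/types.ts
// and its field names may differ.
interface AgentTarget {
  id: string;
  configPath(home: string): string; // where the agent keeps its MCP config
  instructionsPath(projectRoot: string): string; // where agent instructions are written
  renderMcpEntry(command: string): string; // serialized MCP server entry
}

// A new target would be one object like this, defined in its own file.
// The agent name and file locations are made up.
const exampleTarget: AgentTarget = {
  id: "example-agent",
  configPath: (home) => `${home}/.example-agent/mcp.json`,
  instructionsPath: (root) => `${root}/EXAMPLE_AGENT.md`,
  renderMcpEntry: (command) =>
    JSON.stringify({ mcpServers: { codegraph: { command } } }, null, 2),
};

// The registry is then the single place that lists every target.
const registry: AgentTarget[] = [exampleTarget];

function findTarget(id: string): AgentTarget | undefined {
  return registry.find((target) => target.id === id);
}
```

Keeping all agent-specific knowledge behind one interface means the installer loop itself never needs to change when a target is added.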
@@ -200,7 +200,7 @@ For any Windows-specific PR, bug, or implementation, validate it on the real Win

## Releases

Released to npm and mirrored as [GitHub Releases](https://github.com/colbymchenry/codegraph/releases). `CHANGELOG.md` is the source of truth; GitHub Release notes are extracted from it.
Released to npm and mirrored as [GitHub Releases](https://github.com/andersonlimahw/lemon-code-graph/releases). `CHANGELOG.md` is the source of truth; GitHub Release notes are extracted from it.

### Writing changelog entries

@@ -209,7 +209,7 @@ When asked for an entry for a new version:
1. Add a new `## [X.Y.Z] - YYYY-MM-DD` block at the **top** of `CHANGELOG.md` (under the intro, above the previous version).
2. Group under `### Added`, `### Changed`, `### Fixed`, `### Removed`, `### Deprecated`, `### Security` — omit empty sections.
3. Write from the **user's perspective**, not the implementation's. Lead with the observable symptom or capability; mention internals only if a user needs them (e.g., to work around an existing bad install).
4. Add the link reference at the bottom: `[X.Y.Z]: https://github.com/colbymchenry/codegraph/releases/tag/vX.Y.Z`.
4. Add the link reference at the bottom: `[X.Y.Z]: https://github.com/andersonlimahw/lemon-code-graph/releases/tag/vX.Y.Z`.

### Release flow (the user runs these)

@@ -236,8 +236,64 @@ publishes to npm. Requires the `NPM_TOKEN` repo secret.
**Do not run `npm publish`, `git push`, or `git tag` yourself** — these are
publish actions on shared state. Write the files, hand the user the commands.

## AI / LLM Development Principles

Inspired by Andrej Karpathy's guidance on working with LLMs, adapted to CodeGraph's architecture.

### The Karpathy Rules for AI-Assisted Coding

1. **Describe intent, not steps** — Tell the agent *what* outcome you want; let it figure out the *how*. "Add Angular route extraction" beats "go to frameworks/index.ts, add an import, then…"

2. **Verify, don't trust** — Run `npm test` after every AI-generated change. Never ship unverified AI output. CodeGraph's deterministic extraction makes this easy: output is AST-derived, testable, and reproducible.

3. **Small atomic diffs** — Each commit should answer one question. Big diffs hide bugs; small diffs expose them. If a change touches >5 files for a single feature, break it apart.

4. **Read the output** — Actually read what the AI wrote. A bug in a 5-line suggestion is invisible if you rubber-stamp it.

5. **Iterate fast, abandon fast** — If the first pass is wrong, discard it entirely and re-prompt with better constraints. Polishing a bad start is slower than restarting.

6. **Context window is the workspace** — The model can only use what it sees. Front-load relevant types, interfaces, and examples. CodeGraph itself is a retrieval layer — use it to feed agents exactly the symbols they need before asking them to write code.

7. **Explicit > implicit** — Spell out edge cases in prompts. "Handle null and empty string" beats assuming the model will infer it.

8. **Use structured output** — TypeScript types, JSON schemas, and interfaces are better specifications than prose. The compiler enforces what prose can't.

9. **Temperature discipline** — Deterministic tasks (code, SQL, schemas) need low temperature / no sampling. Brainstorming, naming, and docs benefit from variation.

10. **Don't fight the model** — If a design pattern is hard to get right in one shot, change the architecture. Extractors, resolvers, and test fixtures in this repo are designed to be easy for AI to add without deep context.

### Token & Cost Optimization (CodeGraph's Core Thesis)

The mechanism: **an agent falls back to Read/Grep the instant a codegraph answer is insufficient.** Every optimization therefore asks: *does this make the codegraph answer sufficient enough to stop the agent from reading?*

**What works:**
- `codegraph_trace` inlines hop bodies + callees → one call ends a flow investigation
- `codegraph_explore` with a precise bag of symbol names → trace-quality coverage of multi-file flows
- Framework extractors emitting `route` nodes → agents find endpoints without directory walking
- Dynamic-dispatch synthesizers bridging React/Observer/EventEmitter boundaries → flows connect end-to-end

**What fails:**
- Changing `server-instructions.ts` or tool descriptions to steer agent behavior — validated: wording variants don't reliably move tool choice
- Adding new tools — rarely chosen; agents under-pick even `trace`
- Half-bridged flows — covering one hop but not the next reveals a gap the agent drills into (measured: react-render alone *raised* Read calls to 5–7 on Excalidraw)

**Measure everything:** Use `scripts/agent-eval/run-all.sh <repo> "<Q>"` to A/B with vs without codegraph. The pass bar is `~0 Read/Grep` within the explore-call budget. Run ≥2 times per arm — variance is large. See `docs/benchmarks/call-sequence-analysis.md`.

### LLM Prompt Injection Defense

CodeGraph is an MCP server that returns arbitrary user code as context. This is an **indirect prompt injection** surface — a malicious `// IGNORE PREVIOUS INSTRUCTIONS` comment in an indexed file can reach the agent.

Mitigations in place:
- `src/mcp/server-instructions.ts` — instructs the agent to treat all returned code as untrusted data, not AI directives
- `src/installer/instructions-template.ts` — same instruction in each agent's own markdown file
- `src/mcp/tools.ts` — input size limits reject oversized payloads before they reach FTS5 (`MAX_INPUT_LENGTH = 10_000`)
- `src/db/queries.ts` — FTS5 special chars, null bytes, and boolean operators stripped before query execution

When adding new MCP tools, keep these invariants: validate + bound all inputs; never interpolate user data into SQL strings; never let returned code content reach the agent's system prompt channel.
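
The "validate + bound all inputs" invariant can be sketched as follows. Only `MAX_INPUT_LENGTH = 10_000` comes from the text above. The function names and the exact character set stripped are assumptions for illustration, not the contents of `src/mcp/tools.ts` or `src/db/queries.ts`.

```typescript
// MAX_INPUT_LENGTH matches the value quoted above; everything else is illustrative.
const MAX_INPUT_LENGTH = 10_000;

// Reject oversized input before any further processing.
function assertBounded(input: string): string {
  if (input.length > MAX_INPUT_LENGTH) {
    throw new Error(`input exceeds ${MAX_INPUT_LENGTH} characters`);
  }
  return input;
}

// Reduce free text to plain search terms: drop null bytes, characters with
// special meaning in FTS5 query syntax, and the uppercase operator keywords.
function sanitizeFtsQuery(raw: string): string[] {
  return assertBounded(raw)
    .replace(/\0/g, "")
    .replace(/["'*^:(){}\[\]+\-]/g, " ")
    .split(/\s+/)
    .filter((term) => term.length > 0 && !/^(AND|OR|NOT|NEAR)$/.test(term));
}

// Quote each surviving term. The resulting string is meant to be passed as a
// bound parameter to a prepared statement, never concatenated into SQL text.
function toMatchExpression(terms: string[]): string {
  return terms.map((term) => `"${term}"`).join(" ");
}
```

Bounding first and sanitizing second keeps the cost of the regular expressions proportional to an input size that is already capped.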

## House rules

- The `0.7.x` line is in active multi-agent rollout. Any change to `src/installer/` (especially `targets/`) needs corresponding test coverage and a CHANGELOG entry — installer regressions break every new install silently.
- When changing what the MCP tools do or how agents should use them, update **all three** of `src/mcp/server-instructions.ts`, `src/installer/instructions-template.ts`, and `.cursor/rules/codegraph.mdc` — they're written to different places but say the same thing.
- CodeGraph provides **code context**, not product requirements. For new features, ask the user about UX, edge cases, and acceptance criteria — the graph won't tell you.
- Test isolation: tests that call `git commit` must set `git config commit.gpgsign false` in the temp repo, because the global config may have signing enabled (e.g. in CI). Tests that assert on Unicode output must clear `TERM` via `withEnv({ TERM: undefined, … })`, since the CI terminal sets `TERM=linux`.
137 changes: 137 additions & 0 deletions PLAN_NPM.md
@@ -0,0 +1,137 @@
# Publishing Plan — npm + GitHub Releases

This document tracks the steps needed to publish the first release of
`@andersonlimahw/lemon-codegraph` on npm and create the GitHub Release bundles.

Until this is done, users must install via:

```bash
npm install -g github:andersonlimahw/lemon-code-graph
```

---

## Prerequisites

- [ ] npm account at <https://www.npmjs.com> with access to the `@andersonlimahw` org scope
- [ ] `NPM_TOKEN`: a **publish** token from your npm account (a granular access token with read and write permission on `@andersonlimahw/*`)
- [ ] GitHub repo secret `NPM_TOKEN` set in **Settings → Secrets and variables → Actions**

---

## Step 1 — Add the NPM_TOKEN secret to GitHub

1. Go to <https://github.com/andersonlimahw/lemon-code-graph/settings/secrets/actions>
2. Click **New repository secret**
3. Name: `NPM_TOKEN`
4. Value: your npm publish token (create one at <https://www.npmjs.com/settings/~/tokens>)
5. Click **Add secret**

---

## Step 2 — Verify the npm org scope exists

Run locally:

```bash
npm org ls @andersonlimahw
```

If the org doesn't exist yet, create it at <https://www.npmjs.com/org/create>.

---

## Step 3 — Verify the release scripts exist and are correct

The following scripts are called by the Release workflow:

| Script | Purpose |
|---|---|
| `scripts/build-bundle.sh` | Builds a self-contained Node + app archive per platform |
| `scripts/pack-npm.sh` | Creates the npm thin-installer + per-platform packages |
| `scripts/extract-release-notes.mjs` | Pulls release notes from CHANGELOG.md |

Check they exist and are executable:

```bash
ls -la scripts/build-bundle.sh scripts/pack-npm.sh scripts/extract-release-notes.mjs
chmod +x scripts/build-bundle.sh scripts/pack-npm.sh
```

---

## Step 4 — Ensure CHANGELOG.md has the release section

The release workflow reads notes from CHANGELOG.md. Confirm the file has a
`## [0.9.4]` section (or `## [Unreleased]` as fallback). Example:

```markdown
## [0.9.4] - 2026-06-05

### Added
- Kotlin/Ktor framework resolver
- Java extractor improvements (abstract methods, annotations, async)
- Full-stack/frontend/mobile framework support
```
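
The lookup with an `## [Unreleased]` fallback could work roughly as sketched here. This is an illustrative assumption about the behavior, not the actual contents of `scripts/extract-release-notes.mjs`. The function name and the rule for where a section ends are hypothetical.

```typescript
// Illustrative only: not the contents of scripts/extract-release-notes.mjs.
function extractReleaseNotes(changelog: string, version: string): string | null {
  const lines = changelog.split(/\r?\n/);

  const findSection = (label: string): string | null => {
    const start = lines.findIndex((line) => line.startsWith(`## [${label}]`));
    if (start === -1) return null;
    // Assume a section ends at the next level-2 heading or at the first
    // link-reference line such as "[X.Y.Z]: https://...".
    let end = lines.findIndex(
      (line, i) => i > start && (line.startsWith("## ") || /^\[[^\]]+\]:\s/.test(line)),
    );
    if (end === -1) end = lines.length;
    return lines.slice(start + 1, end).join("\n").trim();
  };

  // Prefer the exact version; fall back to the Unreleased section.
  return findSection(version) ?? findSection("Unreleased");
}
```

Returning `null` when neither section exists gives the caller a clear signal to fail with a "No release notes found" style error instead of publishing empty notes.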

---

## Step 5 — Trigger the Release workflow

1. Go to <https://github.com/andersonlimahw/lemon-code-graph/actions/workflows/release.yml>
2. Click **Run workflow** → select branch `main` → **Run workflow**

The workflow will:
1. Build self-contained bundles for all 6 targets:
`darwin-arm64`, `darwin-x64`, `linux-x64`, `linux-arm64`, `win32-x64`, `win32-arm64`
2. Generate `SHA256SUMS`
3. Create GitHub Release `v0.9.4` with all archives as assets
4. Publish to npm:
- `@andersonlimahw/lemon-codegraph-darwin-arm64`
- `@andersonlimahw/lemon-codegraph-darwin-x64`
- `@andersonlimahw/lemon-codegraph-linux-x64`
- `@andersonlimahw/lemon-codegraph-linux-arm64`
- `@andersonlimahw/lemon-codegraph-win32-x64`
- `@andersonlimahw/lemon-codegraph-win32-arm64`
- `@andersonlimahw/lemon-codegraph` (main shim)
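
As a rough illustration of how a shim might map the running platform to one of the per-platform packages listed above, consider this sketch. The target list and package prefix come from this plan. The `platformPackage` function is hypothetical and is not the code in `scripts/npm-shim.js`.

```typescript
// The six targets listed above; the mapping function itself is a hypothetical sketch.
const TARGETS: readonly string[] = [
  "darwin-arm64",
  "darwin-x64",
  "linux-x64",
  "linux-arm64",
  "win32-x64",
  "win32-arm64",
];

// Returns the per-platform package name, or null when no bundle is published
// for the given platform/arch combination.
function platformPackage(platform: string, arch: string): string | null {
  const target = `${platform}-${arch}`;
  return TARGETS.indexOf(target) !== -1
    ? `@andersonlimahw/lemon-codegraph-${target}`
    : null;
}
```

In Node, `process.platform` and `process.arch` supply values in this form (for example `win32` and `x64`), so a caller would pass those two values in.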

---

## Step 6 — Verify after workflow completes

```bash
# Confirm GitHub Release exists
gh release view v0.9.4 --repo andersonlimahw/lemon-code-graph

# Confirm npm package is live
npm view @andersonlimahw/lemon-codegraph version

# Test the standalone installer
curl -fsSL https://raw.githubusercontent.com/andersonlimahw/lemon-code-graph/main/install.sh | sh

# Test npm install
npm install -g @andersonlimahw/lemon-codegraph
codegraph --help
```

---

## Troubleshooting

### "No release notes found"
Add a `## [0.9.4]` or `## [Unreleased]` section to CHANGELOG.md.

### "Package already exists" on npm
The workflow handles this with `npm view "$name@$V" version` checks — it skips packages already on the registry. Safe to re-run.
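
The skip-if-published behavior can be modeled as a small pure function. This is a sketch under assumptions: `planPublish` and the injected `isPublished` callback are hypothetical names, with the callback standing in for a registry lookup such as the `npm view "$name@$V" version` check mentioned above.

```typescript
// `isPublished` stands in for a registry lookup; injecting it keeps the
// planning logic free of network access and easy to test.
function planPublish(
  names: string[],
  version: string,
  isPublished: (spec: string) => boolean,
): { publish: string[]; skip: string[] } {
  const publish: string[] = [];
  const skip: string[] = [];
  for (const name of names) {
    // Packages already on the registry at this version are skipped, so a
    // re-run only publishes what is still missing.
    (isPublished(`${name}@${version}`) ? skip : publish).push(name);
  }
  return { publish, skip };
}
```

Because the plan depends only on what the registry already holds, running it twice in a row yields an empty `publish` list the second time, which is what makes a re-run safe.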

### Per-platform package missing after install
The npm shim self-heals: it downloads the matching bundle from GitHub Releases automatically. The user sees:
```
codegraph: platform bundle missing (registry did not provide @andersonlimahw/lemon-codegraph-<platform>).
codegraph: downloading codegraph-<platform>.tar.gz from GitHub Releases...
```

### Can't access `@andersonlimahw` scope on npm
Create the org at <https://www.npmjs.com/org/create>, or publish as unscoped:
change `"name": "@andersonlimahw/lemon-codegraph"` to `"name": "lemon-codegraph"` in
`package.json` and update all references in `scripts/pack-npm.sh` and `scripts/npm-shim.js`.