Convert source code to logical representation for LLM analysis.
Code2Logic analyzes codebases and generates compact, LLM-friendly representations with semantic understanding. Perfect for feeding project context to AI assistants, building code documentation, or analyzing code structure.
- 🌳 Multi-language support - Python, JavaScript, TypeScript, Java, Go, Rust, and more
- 🎯 Tree-sitter AST parsing - 99% accuracy with graceful fallback
- 📊 NetworkX dependency graphs - PageRank, hub detection, cycle analysis
- 🔍 Rapidfuzz similarity - Find duplicate and similar functions
- 🧠 NLP intent extraction - Human-readable function descriptions
- 📦 Zero dependencies - Core works without any external libs
pip install code2logic # Core, no external dependencies
pip install code2logic[full] # All optional features
pip install code2logic[treesitter] # High-accuracy AST parsing
pip install code2logic[graph] # Dependency analysis
pip install code2logic[similarity] # Similar function detection
pip install code2logic[nlp] # Enhanced intents
code2logic ./ -f yaml --compact --function-logic --with-schema -o project.yaml
code2logic ./ -f toon --ultra-compact --function-logic --with-schema -o project.toon
# Standard Markdown output
code2logic /path/to/project
# If the `code2logic` entrypoint is not available (e.g. running from source without install):
python -m code2logic /path/to/project
# Compact YAML (14% smaller, meta.legend transparency)
code2logic /path/to/project -f yaml --compact -o analysis-compact.yaml
# Ultra-compact TOON (71% smaller, single-letter keys)
code2logic /path/to/project -f toon --ultra-compact -o analysis-ultra.toon
# Generate schema alongside output
code2logic /path/to/project -f yaml --compact --with-schema
# With detailed analysis
code2logic /path/to/project -d detailed
from code2logic import analyze_project, MarkdownGenerator
# Analyze a project
project = analyze_project("/path/to/project")
# Generate output
generator = MarkdownGenerator()
output = generator.generate(project, detail_level='standard')
print(output)
# Access analysis results
print(f"Files: {project.total_files}")
print(f"Lines: {project.total_lines}")
print(f"Languages: {project.languages}")
# Get hub modules (most important)
hubs = [p for p, n in project.dependency_metrics.items() if n.is_hub]
print(f"Key modules: {hubs}")# Core analysis
from code2logic import ProjectInfo, ProjectAnalyzer, analyze_project
# Format generators
from code2logic import (
YAMLGenerator,
JSONGenerator,
TOONGenerator,
LogicMLGenerator,
GherkinGenerator,
)
# LLM clients
from code2logic import get_client, BaseLLMClient
# Development tools
from code2logic import run_benchmark, CodeReviewer
Human-readable documentation with:
- Project structure tree with hub markers (★)
- Dependency graphs with PageRank scores
- Classes with methods and intents
- Functions with signatures and descriptions
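For example, this documentation can be rendered from Python and saved alongside your repo; a minimal sketch (the 'standard' level is used in the quick-start above, and 'detailed' is assumed to mirror the CLI's `-d detailed`):
from pathlib import Path
from code2logic import analyze_project, MarkdownGenerator
project = analyze_project("/path/to/project")
# 'standard' is shown in the quick-start; 'detailed' is an assumption based on the CLI flag
markdown = MarkdownGenerator().generate(project, detail_level='standard')
Path("PROJECT_OVERVIEW.md").write_text(markdown)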
Ultra-compact format optimized for LLM context:
# myproject | 102f 31875L | typescript:79/python:23
ENTRY: index.ts main.py
HUBS: evolution-manager llm-orchestrator
[core/evolution]
evolution-manager.ts (3719L) C:EvolutionManager | F:createEvolutionManager
task-queue.ts (139L) C:TaskQueue,Task
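The same ultra-compact output is available from Python through the `TOONGenerator` import listed above; a minimal sketch, assuming it follows the same `generate(project)` shape as the other generators in this README:
from code2logic import analyze_project, TOONGenerator
project = analyze_project("/path/to/project")
toon = TOONGenerator().generate(project)  # assumed to mirror the other generators' API
print(toon.splitlines()[0])  # e.g. "# myproject | 102f 31875L | typescript:79/python:23"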
Machine-readable format for:
- RAG (Retrieval-Augmented Generation)
- Database storage
- Further analysis
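A minimal sketch of producing the JSON form and persisting it for downstream tooling (the output filename is just an example):
import json
from code2logic import analyze_project, JSONGenerator
project = analyze_project("/path/to/project")
data = json.loads(JSONGenerator().generate(project))
# Persist for RAG pipelines, database import, or further analysis
with open("project-analysis.json", "w") as fh:
    json.dump(data, fh, indent=2)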
Check which features are available:
code2logic --status
Library Status:
tree_sitter: ✓
networkx: ✓
rapidfuzz: ✓
nltk: ✗
spacy: ✗
Manage LLM providers, models, API keys, and routing priorities:
code2logic llm status
code2logic llm set-provider auto
code2logic llm set-model openrouter nvidia/nemotron-3-nano-30b-a3b:free
code2logic llm key set openrouter <OPENROUTER_API_KEY>
code2logic llm priority set-provider openrouter 10
code2logic llm priority set-mode provider-first
code2logic llm priority set-llm-model nvidia/nemotron-3-nano-30b-a3b:free 5
code2logic llm priority set-llm-family nvidia/ 5
code2logic llm config list
Notes:
- `code2logic llm set-provider auto` enables automatic fallback selection: providers are tried in priority order.
- API keys should be stored in `.env` (or environment variables), not in `litellm_config.yaml`.
- These commands write configuration files:
  - `.env` in the current working directory
  - `litellm_config.yaml` in the current working directory
  - `~/.code2logic/llm_config.json` in your home directory
You can choose how automatic fallback ordering is computed:
- `provider-first` - providers are ordered by provider priority (defaults + overrides)
- `model-first` - providers are ordered by priority rules for the provider's configured model (exact/prefix)
- `mixed` - providers are ordered by the best (lowest) priority from either provider priority or model rules
Configure the mode:
code2logic llm priority set-mode provider-first
code2logic llm priority set-mode model-first
code2logic llm priority set-mode mixed
Model priority rules are stored in `~/.code2logic/llm_config.json`.
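From Python, the configured provider chain is reachable through the `get_client` import listed above; a minimal sketch, assuming a no-argument call resolves the highest-priority configured client (the exact signature is not documented here):
from code2logic import BaseLLMClient, get_client
client = get_client()  # assumption: picks the provider selected/prioritised above
print(isinstance(client, BaseLLMClient), type(client).__name__)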
from code2logic import get_library_status
status = get_library_status()
# {'tree_sitter': True, 'networkx': True, ...}
- PageRank - Identifies most important modules
- Hub detection - Central modules marked with ★
- Cycle detection - Find circular dependencies
- Clustering - Group related modules
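These metrics are exposed per module via `project.dependency_metrics`, as used in the quick-start example above; a minimal sketch (only `is_hub` appears earlier in this README, so the `pagerank` attribute name is an assumption):
from code2logic import analyze_project
project = analyze_project("/path/to/project")
for path, node in project.dependency_metrics.items():
    rank = getattr(node, "pagerank", None)  # attribute name assumed; adjust to the real field
    marker = "★" if node.is_hub else " "
    print(f"{marker} {path} pagerank={rank}")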
Functions get human-readable descriptions:
methods:
async findById(id:string) -> Promise<User> # retrieves user by id
async createUser(data:UserDTO) -> Promise<User> # creates user
validateEmail(email:string) -> boolean # validates email
Find duplicate and similar functions:
Similar Functions:
core/auth.ts::validateToken:
- python/auth.py::validate_token (92%)
- services/jwt.ts::verifyToken (85%)
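The scores come from Rapidfuzz string similarity; a standalone sketch of the same idea (plain Rapidfuzz usage, not code2logic's internal implementation):
from rapidfuzz import fuzz
ts_body = "function validateToken(token) { return jwt.verify(token, SECRET); }"
py_body = "def validate_token(token): return jwt.decode(token, SECRET)"
# Token-based ratio is tolerant of renamed identifiers and reordering
print(f"similarity: {fuzz.token_set_ratio(ts_body, py_body):.0f}%")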
code2logic/
├── analyzer.py          # Main orchestrator
├── parsers.py # Tree-sitter + fallback parser
├── dependency.py # NetworkX dependency analysis
├── similarity.py        # Rapidfuzz similarity detection
├── intent.py # NLP intent generation
├── generators.py # Output generators (MD/Compact/JSON)
├── models.py # Data structures
└── cli.py # Command-line interface
from code2logic import analyze_project, CompactGenerator
project = analyze_project("./my-project")
context = CompactGenerator().generate(project)
# Use in your LLM prompt
prompt = f"""
Analyze this codebase and suggest improvements:
{context}
"""import json
from code2logic import analyze_project, JSONGenerator
project = analyze_project("./my-project")
data = json.loads(JSONGenerator().generate(project))
# Index in vector DB
for module in data['modules']:
for func in module['functions']:
embed_and_store(
text=f"{func['name']}: {func['intent']}",
metadata={'path': module['path'], 'type': 'function'}
)
git clone https://github.com/wronai/code2logic
cd code2logic
poetry install --with dev -E full
poetry run pre-commit install
# Alternatively, you can use Makefile targets (prefer Poetry if available)
make install-full
make test
make test-cov
# Or directly:
poetry run pytest
poetry run pytest --cov=code2logic --cov-report=html
make typecheck
# Or directly:
poetry run mypy code2logic
make lint
make format
# Or directly:
poetry run ruff check code2logic
poetry run black code2logic
| Codebase Size | Files | Lines | Time | Output Size |
|---|---|---|---|---|
| Small | 10 | 1K | <1s | ~5KB |
| Medium | 100 | 30K | ~2s | ~50KB |
| Large | 500 | 150K | ~10s | ~200KB |
Compact format is ~10-15x smaller than Markdown.
Code2Logic can reproduce code from specifications using LLMs. Benchmark results:
| Format | Score | Token Efficiency | Spec Tokens | Runs OK |
|---|---|---|---|---|
| YAML | 71.1% | 42.1 | 366 | 66.7% |
| Markdown | 65.6% | 48.7 | 385 | 100% |
| JSON | 61.9% | 23.7 | 605 | 66.7% |
| Gherkin | 51.3% | 19.1 | 411 | 66.7% |
- YAML is best for score - 71.1% reproduction accuracy
- Markdown is best for token efficiency - 48.7 score/1000 tokens
- YAML uses 39.6% fewer tokens than JSON with 9.2% higher score
- Markdown has 100% runs OK - generated code always executes
# Token-aware benchmark
python examples/11_token_benchmark.py --folder tests/samples/ --no-llm
# Async multi-format benchmark
python examples/09_async_benchmark.py --folder tests/samples/ --no-llm
# Function-level reproduction
python examples/10_function_reproduction.py --file tests/samples/sample_functions.py --no-llm
python examples/15_unified_benchmark.py --folder tests/samples/ --no-llm
# Terminal markdown rendering demo
python examples/16_terminal_demo.py --folder tests/samples/
Contributions welcome! Please read our Contributing Guide.
Apache 2.0 License - see LICENSE for details.
Generate test scaffolds from Code2Logic output:
# Show what can be generated
python -m logic2test out/code2logic/project.c2l.yaml --summary
# Generate unit tests
python -m logic2test out/code2logic/project.c2l.yaml -o out/logic2test/tests/
# Generate all test types (unit, integration, property)
python -m logic2test out/code2logic/project.c2l.yaml -o out/logic2test/tests/ --type all
from logic2test import TestGenerator
generator = TestGenerator('out/code2logic/project.c2l.yaml')
result = generator.generate_unit_tests('tests/')
print(f"Generated {result.tests_generated} tests")Generate source code from Code2Logic output:
# Show what can be generated
python -m logic2code out/code2logic/project.c2l.yaml --summary
# Generate Python code
python -m logic2code out/code2logic/project.c2l.yaml -o out/logic2code/generated_code/
# Generate stubs only
python -m logic2code out/code2logic/project.c2l.yaml -o out/logic2code/generated_code/ --stubs-only
from logic2code import CodeGenerator
generator = CodeGenerator('out/code2logic/project.c2l.yaml')
result = generator.generate('output/')
print(f"Generated {result.files_generated} files")# 1. Analyze existing codebase
code2logic src/ -f yaml -o out/code2logic/project.c2l.yaml
# 2. Generate tests for the codebase
python -m logic2test out/code2logic/project.c2l.yaml -o out/logic2test/tests/ --type all
# 3. Generate code scaffolds (for refactoring)
python -m logic2code out/code2logic/project.c2l.yaml -o out/logic2code/generated_code/ --stubs-only
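The same workflow can be driven from Python; a minimal sketch combining the APIs shown above (it assumes the YAML emitted by `YAMLGenerator` is the same `.c2l.yaml` format that logic2test and logic2code consume):
from pathlib import Path
from code2logic import analyze_project, YAMLGenerator
from logic2test import TestGenerator
from logic2code import CodeGenerator

# 1. Analyze the codebase and persist the logic file
spec = Path("out/code2logic/project.c2l.yaml")
spec.parent.mkdir(parents=True, exist_ok=True)
spec.write_text(YAMLGenerator().generate(analyze_project("src/")))

# 2. Generate tests from the logic file
tests = TestGenerator(str(spec)).generate_unit_tests("out/logic2test/tests/")
print(f"Generated {tests.tests_generated} tests")

# 3. Generate code scaffolds for refactoring
code = CodeGenerator(str(spec)).generate("out/logic2code/generated_code/")
print(f"Generated {code.files_generated} files")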
- 00 - Docs Index - Documentation home (start here)
- 01 - Getting Started - Install and first steps
- 02 - Configuration - API keys, environment setup
- 03 - CLI Reference - Command-line usage
- 04 - Python API - Programmatic usage
- 05 - Output Formats - Format comparison and usage
- 06 - Format Specifications - Detailed format specs
- 07 - TOON Format - Token-Oriented Object Notation
- 08 - LLM Integration - OpenRouter/Ollama/LiteLLM
- 09 - LLM Comparison - Provider/model comparison
- 10 - Benchmarking - Benchmark methodology and results
- 11 - Repeatability - Repeatability testing
- 12 - Examples - Usage workflows and examples
- 13 - Architecture - System design and components
- 14 - Format Analysis - Deeper format evaluation
- 15 - Logic2Test - Test generation from logic files
- 16 - Logic2Code - Code generation from logic files
- 17 - LOLM - LLM provider management
- 18 - Reproduction Testing - Format validation and code regeneration
- 19 - Monorepo Workflow - Managing all packages from repo root
- examples/ - All runnable examples
- examples/run_examples.sh - Example runner script (multi-command workflows)
- examples/code2logic/ - Minimal project + docker example for code2logic
- examples/logic2test/ - Minimal project + docker example for logic2test
- examples/logic2code/ - Minimal project + docker example for logic2code

