Sketch2Graphviz

Convert hand-drawn sketches and images of graphs into proper Graphviz DOT code using a fine-tuned Vision-Language Model with Retrieval-Augmented Generation.

Features • Demos • Architecture • Getting Started • API Reference • Model Details • Evaluation Results

Demos

Features

Sketch-to-Code Conversion - Draw a graph or flowchart on an interactive canvas, or upload an image, and get Graphviz DOT code generated automatically
RAG-Enhanced Generation - Retrieval-Augmented Generation via a PostgreSQL/PGVector vector database provides few-shot examples for higher quality output (89.07% vs 86.17% LLM-as-Judge accuracy)
Live Graphviz Rendering - Generated DOT code is rendered as an SVG in real time, with download options for SVG and PNG
Code Editing - Refine generated code with natural language edit requests using rewrite or selective edit modes
Interactive Drawing Canvas - Full-featured sketchpad with multiple shapes, colors, and drawing tools built with Konva

Architecture

Sketch2Graphviz uses a LoRA fine-tuned Llama 3.2 11B Vision Instruct model as its core Vision-Language Model (VLM). At inference time, the input image is embedded using the VLM's vision encoder, and a similarity search against a PostgreSQL + PGVector vector database retrieves the top-K most similar Graphviz DOT code samples. These retrieved examples are injected into the prompt as few-shot context before the model generates the final DOT code.

Component	Technology
Vision-Language Model	Llama 3.2 11B Vision Instruct + LoRA
Vector Database	PostgreSQL + PGVector
Backend API	FastAPI + Uvicorn
Frontend	React 19, Vite, Tailwind CSS
Drawing Canvas	Konva / react-konva
Graphviz Rendering	Viz.js
Containerization	Docker
Deployment	Runpod (GPU)

Project Structure

Sketch2Graphviz/
├── client/                          # React + Vite frontend
│   ├── public/
│   │   └── assets/                  # Static assets (icon, etc.)
│   ├── src/
│   │   ├── api/
│   │   │   ├── server.js            # API client for FastAPI server
│   │   │   └── axios.js             # Axios instance configuration
│   │   ├── components/
│   │   │   └── GraphSketchpad.jsx   # Interactive Konva drawing canvas
│   │   ├── App.jsx                  # Main app layout, state, rendering
│   │   └── main.jsx                 # React entry point
│   ├── package.json
│   └── vite.config.js
│
├── model/                           # Python backend and ML model
│   ├── main.py                      # FastAPI server entry point
│   ├── llm_judge.py                 # LLM-based evaluation (Azure AI)
│   ├── requirements.txt             # Python dependencies
│   ├── Dockerfile                   # Full-stack Docker image
│   ├── scripts/
│   │   ├── model.py                 # Sketch2GraphvizVLM model class
│   │   ├── inference.py             # Inference pipeline with RAG
│   │   ├── finetune_lora.py         # LoRA fine-tuning loop
│   │   ├── eval.py                  # Evaluation metrics (SSIM, LPIPS, F1, etc.)
│   │   ├── data.py                  # Dataset classes and augmentations
│   │   ├── prompts.py               # System and generation prompts
│   │   ├── graphviz_renderer.py     # DOT code to PNG rendering
│   │   ├── embeddings.py            # Image embedding generation for RAG
│   │   ├── selective_edit.py        # Selective JSON-based code editing
│   │   ├── psql_vector_db.py        # Vector DB operations (PGVector)
│   │   └── synthetic_data_gen.py    # Synthetic DOT data generation (OpenAI)
│   ├── lora_checkpoints/            # Saved LoRA adapter weights
│   │   ├── epoch_1_vlm_lora/
│   │   ├── epoch_2_vlm_lora/
│   │   └── epoch_3_vlm_lora/
│   ├── testing_graphs/              # Test images for evaluation
│   ├── outputs/                     # Model output artifacts
│   ├── postgreSQL_data/             # PostgreSQL database dump
│   └── results.md                   # Model evaluation results
│
├── data/                            # Synthetic training datasets (JSON)
│   ├── simple_synthetic_data_gen.json
│   ├── synthetic_data_gen.json
│   └── complex_synthetic_data_gen.json
│
├── README.md
└── .env

Getting Started

Prerequisites

Node.js (v18+) and npm for the frontend
Python 3.10+ for the backend
Docker for containerized deployment
A HuggingFace account with access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct model
A GPU with 12-16 GB VRAM (4-bit quantized model) or 24 GB VRAM (16-bit quantized model)

Clone the Repository

git clone https://github.com/RishabSA/Sketch2Graphviz.git
cd Sketch2Graphviz

Frontend Setup

cd client
npm install

Create a .env file in the client/ directory with the server URL:

VITE_SERVER_URL=https://<YOUR_SERVER_URL>

Start the development server:

npm run dev

The frontend will be available at http://localhost:5173.

Docker Deployment (Backend)

The pre-built Docker image is available on DockerHub:

docker pull rishabsa/sketch2graphviz:latest

Or build from source:

cd model
docker buildx build --platform linux/amd64 -t sketch2graphviz:latest .

Runpod Deployment

Create a pod with an appropriate GPU (~12-16 GB VRAM for 4-bit quantized, ~24 GB VRAM for 16-bit). An RTX A5000 (24 GB) works well for the 4-bit model.
Use the rishabsa/sketch2graphviz:latest Docker image as the template
Set the container disk and volume disk storage both to ~40 GB
Expose HTTP port 8000
Set an environment variable HF_TOKEN with a HuggingFace token that has read permissions
Uncheck "Start Jupyter notebook"
Deploy the pod on-demand

Note: You must have been granted access to meta-llama/Llama-3.2-11B-Vision-Instruct on HuggingFace as it is a gated model.

On startup, the FastAPI server will download and load the base model into VRAM with 4-bit quantization, load the LoRA adapters from storage, and initialize the PostgreSQL + PGVector database with pre-computed embeddings.

Verify the Server

curl -X POST https://<RUNPOD_POD>-8000.proxy.runpod.net/graphviz_code_from_image \
  -F "file=@model/testing_graphs/graph_6.png"

API Reference

`POST /graphviz_code_from_image`

Generate Graphviz DOT code from an image.

Request:

Parameter	Type	Location	Description
`file`	File (PNG)	Form data	Image of a graph or flowchart
`use_rag`	Boolean	Header	Enable RAG retrieval (default: `true`)
`top_K_rag`	Integer	Header	Number of similar examples to retrieve (default: `5`)

Response: Plain text Graphviz DOT code

Example:

curl -X POST https://<SERVER_URL>/graphviz_code_from_image \
  -F "file=@image.png" \
  -H "use_rag: true" \
  -H "top_K_rag: 5"

`POST /graphviz_code_edit`

Edit existing Graphviz DOT code using natural language instructions.

Request Body (JSON):

Field	Type	Description
`edit_text`	String	Natural language description of the edit
`graphviz_code`	String	Existing DOT code to modify
`use_selective_changes`	Boolean	Use selective edit mode (default: `true`)

Response: Updated Graphviz DOT code

Example:

curl -X POST https://<SERVER_URL>/graphviz_code_edit \
  -H "Content-Type: application/json" \
  -d '{
    "edit_text": "change all node colors to lightblue",
    "graphviz_code": "digraph { A -> B; B -> C; }",
    "use_selective_changes": true
  }'

Model Details

Base Model and Fine-Tuning

Parameter	Value
Base Model	`meta-llama/Llama-3.2-11B-Vision-Instruct` (11B parameters)
Fine-Tuning Method	LoRA (Low-Rank Adaptation)
LoRA Rank	64
LoRA Alpha	128
LoRA Dropout	0.05
Target Modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, fc1, fc2, multi_modal_projector
Training Quantization	16-bit
Inference Quantization	4-bit

Training Configuration

Parameter	Value
Epochs	3
Batch Size	1 (per GPU)
Gradient Accumulation Steps	16 (effective batch size of 16)
Optimizer	AdamW
Learning Rate	2e-4 with linear warmup (10% of training steps)

Synthetic Data Generation

Because no publicly available Graphviz datasets provide DOT code of sufficient quality, training data was synthesized using the OpenAI API (gpt-4o-mini). Specific graph types are generated with prompt suffixes that specify different node types, edge types, and attributes across batches. The datasets include:

Dataset	Samples
Simple graphs	~447
Standard graphs	~700+
Complex graphs	~298

Retrieval-Augmented Generation (RAG)

Embedding: Input images are passed through the VLM's vision encoder to extract hidden states. These are mean-pooled across token positions and L2-normalized to produce a 4096-dimensional embedding vector.
Storage: Embeddings and their corresponding DOT code are stored in PostgreSQL with the PGVector extension.
Retrieval: At inference time, the input image is embedded, and the top-K most similar embeddings are retrieved by Euclidean (L2) distance.
Prompting: The DOT code corresponding to the retrieved embeddings is injected into the model's prompt as few-shot examples.

Editing Modes

Rewrite Edit: The full DOT code and edit request are sent to the model, which regenerates the entire code with the requested changes applied.

Selective Edit: The DOT code is split into indexed statements. The model outputs a JSON action plan:

{
	"actions": [
		{ "command": "add", "idx": 3, "content": "A -> D [label=\"new edge\"];" },
		{ "command": "edit", "idx": 5, "content": "B [shape=diamond];" },
		{ "command": "delete", "idx": 7 }
	]
}

Each action is validated and applied to the original code, allowing precise modifications without regenerating unrelated parts.

Evaluation Results

Test Loss by Epoch

Epoch	Test Loss
1	0.1974
2	0.1466
3	0.1296

Final Model Evaluation (Epoch 3, No RAG)

Metric	Value
Render Success Rate	97.96% (914/933)
Mean SSIM	0.9306
Mean LPIPS	0.1352
Graph Isomorphism Rate	43.52%
Node F1	0.5453
Edge F1	0.4974
Node Attribute F1	0.4879
Edge Attribute F1	0.4703

LLM-as-a-Judge Accuracy

Configuration	Correct	Total	Accuracy
Without RAG	804	933	86.17%
With RAG	831	933	89.07%

RAG improves LLM-as-Judge accuracy by 2.90%, demonstrating the value of few-shot prompting with retrieved examples.

Name		Name	Last commit message	Last commit date
Latest commit History 58 Commits
client		client
data		data
model		model
resources		resources
.DS_Store		.DS_Store
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sketch2Graphviz

Demos

Features

Architecture

Project Structure

Getting Started

Prerequisites

Clone the Repository

Frontend Setup

Docker Deployment (Backend)

Runpod Deployment

Verify the Server

API Reference

`POST /graphviz_code_from_image`

`POST /graphviz_code_edit`

Model Details

Base Model and Fine-Tuning

Training Configuration

Synthetic Data Generation

Retrieval-Augmented Generation (RAG)

Editing Modes

Evaluation Results

Test Loss by Epoch

Final Model Evaluation (Epoch 3, No RAG)

LLM-as-a-Judge Accuracy

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Sketch2Graphviz

Demos

Features

Architecture

Project Structure

Getting Started

Prerequisites

Clone the Repository

Frontend Setup

Docker Deployment (Backend)

Runpod Deployment

Verify the Server

API Reference

POST /graphviz_code_from_image

POST /graphviz_code_edit

Model Details

Base Model and Fine-Tuning

Training Configuration

Synthetic Data Generation

Retrieval-Augmented Generation (RAG)

Editing Modes

Evaluation Results

Test Loss by Epoch

Final Model Evaluation (Epoch 3, No RAG)

LLM-as-a-Judge Accuracy

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /graphviz_code_from_image`

`POST /graphviz_code_edit`

Packages