🦙 Llama Embedder

This project demonstrates how to generate text embeddings with a local GGUF model and compare them by cosine similarity, using node-llama-cpp from JavaScript and llamafile from the command line. It uses the quantized nomic-embed-text-v1.5.Q4_K_M.gguf model.

✨ Features

🚀 Run embeddings on CPU with optimized GGUF models
📊 Calculate cosine similarity between text samples
💾 Save embeddings and similarity scores to JSON
⚡ Works with quantized models (e.g., Q4_K_M) for smaller files and lower memory use
🖥️ No GPU required (but recommended for faster inference)

🚀 Quick Start

Prerequisites

Node.js >=18
A Vulkan-compatible GPU (optional; recommended for hardware acceleration)
Linux (tested on Linux Mint)

Installation

# Clone the repository
git clone https://github.com/yourusername/llama-embedder
cd llama-embedder

# Install dependencies
npm install node-llama-cpp

📥 Download and Test the Model

1. Download the Q4_K_M GGUF Model

Visit: nomic-embed-text-v1.5-GGUF on Hugging Face
Download: nomic-embed-text-v1.5.Q4_K_M.gguf (~81 MB)

Create a models directory and place the model file in it:

mkdir -p models
mv ~/Downloads/nomic-embed-text-v1.5.Q4_K_M.gguf models/

2. Using llamafile (Quick CLI Test)

Step 1: Download llamafile

Get the latest prebuilt llamafile binary from: llamafile Releases

Make it executable:

chmod +x ./llamafile

Step 2: Run a test embedding

./llamafile --model ./models/nomic-embed-text-v1.5.Q4_K_M.gguf \
  --embedding \
  --ctx-size 2048 \
  --prompt "search_query: The sun is a star"

This prints the embedding for the prompt: a vector of floating-point numbers (768 values for this model).

3. Using node-llama-cpp (JavaScript)

Step 1: Install Dependencies

Make sure you have Node.js 18+ installed, then install the required package (skip this if you already ran it during installation):

npm install node-llama-cpp

Step 2: Run the Example Script

The project includes an example script (index.mjs) that demonstrates how to:

Load the GGUF model
Generate embeddings for sample texts
Calculate cosine similarities between them
Save the results to a JSON file
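
The first two steps above could look roughly like the sketch below. It assumes the node-llama-cpp v3 API (`getLlama`, `loadModel`, `createEmbeddingContext`, `getEmbeddingFor`); the helper names `createEmbedder` and `embedAll` are illustrative and are not taken from index.mjs.

```javascript
// Sketch only: assumes node-llama-cpp v3. Helper names are hypothetical.
async function createEmbedder(modelPath) {
  // Dynamic import so the package is only loaded when an embedder is created.
  const { getLlama } = await import("node-llama-cpp");
  const llama = await getLlama();
  const model = await llama.loadModel({ modelPath });
  const context = await model.createEmbeddingContext();

  // Return a function that maps one text to a plain array of numbers.
  return async (text) => {
    const embedding = await context.getEmbeddingFor(text);
    return Array.from(embedding.vector);
  };
}

// Embed texts one at a time, keeping each text next to its vector.
async function embedAll(samples, embed) {
  const results = [];
  for (const text of samples) {
    results.push({ text, vector: await embed(text) });
  }
  return results;
}
```

Passing the `embed` function in as a parameter keeps `embedAll` independent of the model, so it can be exercised with a stub before the GGUF file is downloaded. With the model in place, usage would be along the lines of `embedAll(samples, await createEmbedder("./models/nomic-embed-text-v1.5.Q4_K_M.gguf"))`.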

To run the example:

node index.mjs

Example Output

Starting Llama...
Loading model...
Creating embedding context...
Generating embeddings...
✅ Embedded: "search_query: The sun is a star"
✅ Embedded: "search_query: The moon is a rock"
✅ Embedded: "search_query: Apples are fruits"
✅ Embedded: "search_query: Stars produce light"

📊 Similarity Matrix:

Similarity between:
  "search_query: The sun is a star"
  "search_query: The moon is a rock"
  → 0.6891

...
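
Each score in this output is a cosine similarity: the dot product of two embedding vectors divided by the product of their lengths. A minimal sketch of how one score and the full set of pairs could be computed follows; the function names and the `{ text, vector }` item shape are assumptions for illustration, not the script's actual code.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), a value between -1 and 1.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // An all-zero vector has no direction; report 0 instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare every unordered pair exactly once (i < j).
function pairwiseSimilarities(items) {
  const pairs = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      pairs.push({
        a: items[i].text,
        b: items[j].text,
        similarity: cosineSimilarity(items[i].vector, items[j].vector),
      });
    }
  }
  return pairs;
}
```

For the four sample texts this produces six pairs. Recent node-llama-cpp versions also expose a `calculateCosineSimilarity` method on the embedding object, which may be used instead of a hand-written function.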

The script will save the complete results (including all embeddings and similarity scores) to embeddings-output.json.
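
As a rough illustration of the save step, the sketch below builds a result object and writes it as formatted JSON. The field names (`model`, `embeddings`, `similarities`, and so on) are hypothetical; the actual layout of embeddings-output.json may differ.

```javascript
// Hypothetical output shape; the real embeddings-output.json may differ.
function buildOutput(modelName, embeddings, similarities) {
  return {
    model: modelName,
    createdAt: new Date().toISOString(),
    embeddings: embeddings.map(({ text, vector }) => ({
      text,
      dimensions: vector.length,
      vector,
    })),
    similarities,
  };
}

// Write the object as indented JSON so it is easy to inspect by hand.
async function saveOutput(path, output) {
  const { writeFile } = await import("node:fs/promises");
  await writeFile(path, JSON.stringify(output, null, 2), "utf8");
}
```

Storing the dimension count next to each vector makes it easier to spot a mismatch when results from different models end up in the same file.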

Customizing the Input

To use your own texts, modify the samples array in index.mjs:

const samples = [
  'Your first text here',
  'Your second text here',
  // Add more texts as needed
];
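
The sample texts in the example output all start with `search_query: `. nomic-embed-text-v1.5 expects a task prefix of this kind (`search_query`, `search_document`, `clustering`, or `classification`), so custom texts should carry one as well. A small helper along these lines could add it; `withTaskPrefix` is an illustrative name, not part of index.mjs.

```javascript
// Task prefixes documented for nomic-embed-text-v1.5.
const TASK_PREFIXES = ["search_query", "search_document", "clustering", "classification"];

// Hypothetical helper: prepend a task prefix unless the text already has one.
function withTaskPrefix(text, task = "search_query") {
  if (!TASK_PREFIXES.includes(task)) {
    throw new Error(`Unknown task prefix: ${task}`);
  }
  const trimmed = text.trim();
  if (TASK_PREFIXES.some((prefix) => trimmed.startsWith(`${prefix}: `))) {
    return trimmed;
  }
  return `${task}: ${trimmed}`;
}

const prefixedSamples = [
  "The sun is a star",
  "search_query: The moon is a rock",
].map((text) => withTaskPrefix(text));
```

Texts that already carry a prefix are returned unchanged, so the helper is safe to apply to a mixed list.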

Advanced Usage

For more advanced usage, you can:

Change the model path in the script
Adjust the context size
Modify the similarity calculation
Process larger batches of text
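
For larger inputs, one possible approach is to split the texts into fixed-size batches and report progress after each one. The sketch below assumes an `embed(text)` function that returns a vector (for example, one built on node-llama-cpp); `chunk`, `embedInBatches`, and the default batch size of 32 are illustrative choices, not part of the project.

```javascript
// Split an array into consecutive batches of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed texts batch by batch; `onBatch` receives (batchNumber, totalBatches).
async function embedInBatches(texts, embed, batchSize = 32, onBatch = () => {}) {
  const results = [];
  const batches = chunk(texts, batchSize);
  for (const [index, batch] of batches.entries()) {
    for (const text of batch) {
      results.push({ text, vector: await embed(text) });
    }
    onBatch(index + 1, batches.length);
  }
  return results;
}
```

The texts are still embedded one at a time here; the batching only provides natural points for logging progress or saving intermediate results.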

Refer to the node-llama-cpp documentation for more details.

🗂️ Project Structure

llama-embedder/
├── models/                     # Store GGUF models here
│   └── nomic-embed-text-v1.5.Q4_K_M.gguf
├── index.mjs                   # Main script
└── embeddings-output.json      # Generated output file

🔍 Example Usage

Using Node.js (`index.mjs`)

The included index.mjs script will:

Load the GGUF model
Generate embeddings for sample texts
Calculate cosine similarities between all pairs
Save results to embeddings-output.json

Using Llamafile (CLI)

./llamafile \
  --model ./models/nomic-embed-text-v1.5.Q4_K_M.gguf \
  --embedding \
  --ctx-size 2048 \
  --prompt "search_query: Your text here"

📊 Expected Output

Embedded: "search_query: The sun is a star"
Embedded: "search_query: The moon is a rock"
Embedded: "search_query: Apples are fruits"
Embedded: "search_query: Stars produce light"

Similarity Matrix:

Similarity between:
  "search_query: The sun is a star"
  "search_query: The moon is a rock"
  → 0.6891

...

🧠 Use Cases

🔍 Semantic search
🏷️ Text classification
📊 Clustering similar documents
🎯 Recommendation systems
📐 Similarity detection
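
To make the semantic search use case concrete, the sketch below ranks documents against a query by cosine similarity, computed as the dot product of length-normalized vectors. The function names are illustrative, and toy two-dimensional vectors stand in for real embeddings.

```javascript
// Scale a vector to unit length (all-zero vectors are returned unchanged).
function normalize(vector) {
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vector.slice() : vector.map((x) => x / norm);
}

// Return the topK documents most similar to the query, best match first.
function rankBySimilarity(queryVector, documents, topK = 3) {
  const q = normalize(queryVector);
  return documents
    .map((doc) => {
      const d = normalize(doc.vector);
      // For unit vectors, the dot product equals the cosine similarity.
      const score = q.reduce((sum, x, i) => sum + x * d[i], 0);
      return { text: doc.text, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

// Toy vectors for illustration only; real embeddings come from the model.
const docs = [
  { text: "Apples are fruits", vector: [0, 1] },
  { text: "Stars produce light", vector: [1, 0.1] },
];
const ranked = rankBySimilarity([1, 0], docs, 1);
console.log(ranked[0].text);
```

With nomic-embed-text-v1.5, queries would typically be embedded with the `search_query: ` prefix and the documents being searched with `search_document: `.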

🏆 Recommended Models

Model                   Size (Q4)   Dimensions   Native GGUF   Notes
nomic-embed-text-v1.5   ~81 MB      768          ✅            Best general-purpose
bge-small-en-v1.5       ~300 MB     384          ❌            Lightweight, fast
all-MiniLM-L6-v2        ~250 MB     384          ❌            Good for short texts

📝 Notes

Quantized models (e.g., Q4_K_M) load faster and use less memory, usually at a small cost in accuracy
First run may be slow as models are loaded into memory
See node-llama-cpp documentation for advanced configuration

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Credits

node-llama-cpp
Nomic AI for the nomic-embed-text model
llama.cpp for GGUF support
