TheeValcode/Llama-Embedder

🦙 Llama Embedder

This project demonstrates how to use a local GGUF model with node-llama-cpp to generate text embeddings and compute the cosine similarity between them. It uses the quantized nomic-embed-text-v1.5.Q4_K_M.gguf model and also shows how to run a quick embedding test from the command line with llamafile.

✨ Features

  • 🚀 Run embeddings on CPU with optimized GGUF models
  • 📊 Calculate cosine similarity between text samples
  • 💾 Save embeddings and similarity scores to JSON
  • ⚡ Works with quantized models for better performance
  • 🖥️ No GPU required (but recommended for faster inference)

🚀 Quick Start

Prerequisites

  • Node.js >=18
  • A Vulkan-compatible GPU (optional; recommended for hardware acceleration)
  • Linux (tested on Linux Mint)

Installation

# Clone the repository
git clone https://github.com/TheeValcode/Llama-Embedder
cd Llama-Embedder

# Install dependencies
npm install node-llama-cpp

📥 Download and Test the Model

1. Download the Q4_K_M GGUF Model

  1. Visit: nomic-embed-text-v1.5-GGUF on Hugging Face
  2. Download: nomic-embed-text-v1.5.Q4_K_M.gguf (~81 MB)
  3. Create a models directory and place the model file in it:
    mkdir -p models
    mv ~/Downloads/nomic-embed-text-v1.5.Q4_K_M.gguf models/

2. Using llamafile (Quick CLI Test)

Step 1: Download llamafile

Get the latest prebuilt llamafile binary from: llamafile Releases

Make it executable:

chmod +x ./llamafile

Step 2: Run a test embedding

./llamafile --model ./models/nomic-embed-text-v1.5.Q4_K_M.gguf \
  --embedding \
  --ctx-size 2048 \
  --prompt "search_query: The sun is a star"

This prints the embedding: a vector of floating-point numbers (768 dimensions for nomic-embed-text-v1.5).

3. Using node-llama-cpp (JavaScript)

Step 1: Install Dependencies

Make sure you have Node.js 18+ installed, then install the required package (skip this if you already did so during Installation):

npm install node-llama-cpp

Step 2: Run the Example Script

The project includes an example script (index.mjs) that demonstrates how to:

  • Load the GGUF model
  • Generate embeddings for sample texts
  • Calculate cosine similarities between them
  • Save the results to a JSON file

To run the example:

node index.mjs
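
The embedding step could be sketched roughly as follows. This is not the actual contents of index.mjs: `embedAll` is a hypothetical helper name, and the setup calls shown in the comments (`getLlama`, `loadModel`, `createEmbeddingContext`, `getEmbeddingFor`) assume the node-llama-cpp v3 API.

```javascript
// Hypothetical helper; index.mjs may be organized differently.
// `context` is assumed to be a node-llama-cpp embedding context, created e.g. with:
//   import { getLlama } from "node-llama-cpp";
//   const llama = await getLlama();
//   const model = await llama.loadModel({
//     modelPath: "./models/nomic-embed-text-v1.5.Q4_K_M.gguf",
//   });
//   const context = await model.createEmbeddingContext();
async function embedAll(context, texts) {
  const results = [];
  for (const text of texts) {
    // getEmbeddingFor resolves to an object whose `vector` holds the floats.
    const embedding = await context.getEmbeddingFor(text);
    results.push({ text, vector: Array.from(embedding.vector) });
    console.log(`Embedded: "${text}"`);
  }
  return results;
}
```

Passing the context in as a parameter keeps the helper independent of how the model is loaded, so the model path and context options can change without touching it.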

Example Output

Starting Llama...
Loading model...
Creating embedding context...
Generating embeddings...
✅ Embedded: "search_query: The sun is a star"
✅ Embedded: "search_query: The moon is a rock"
✅ Embedded: "search_query: Apples are fruits"
✅ Embedded: "search_query: Stars produce light"

📊 Similarity Matrix:

Similarity between:
  "search_query: The sun is a star"
  "search_query: The moon is a rock"
  → 0.6891

...

The script will save the complete results (including all embeddings and similarity scores) to embeddings-output.json.
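
For reference, the similarity scores above are cosine similarities: the dot product of two vectors divided by the product of their lengths. A minimal sketch of that calculation is shown below; the function names are illustrative and the implementation in index.mjs may differ.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; report 0 instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Compare every unordered pair of { text, vector } items exactly once.
function pairwiseSimilarities(items) {
  const pairs = [];
  for (let i = 0; i < items.length; i++) {
    for (let j = i + 1; j < items.length; j++) {
      pairs.push({
        a: items[i].text,
        b: items[j].text,
        similarity: cosineSimilarity(items[i].vector, items[j].vector),
      });
    }
  }
  return pairs;
}

// Toy 2-dimensional vectors, not real model output.
console.log(cosineSimilarity([1, 0], [1, 1]).toFixed(4)); // prints 0.7071
```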

Customizing the Input

To use your own texts, modify the samples array in index.mjs, keeping the search_query: task prefix used in the example output:

const samples = [
  'search_query: Your first text here',
  'search_query: Your second text here',
  // Add more texts as needed
];
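
The sample texts in the example output all start with `search_query: `. nomic-embed-text-v1.5 is trained with task prefixes (`search_query`, `search_document`, `classification`, `clustering`), so raw strings are best prefixed before embedding. A small helper along these lines can do that; `withTaskPrefix` is a hypothetical name and not part of index.mjs.

```javascript
// Task prefixes documented for nomic-embed-text-v1.5.
const NOMIC_TASKS = ["search_query", "search_document", "classification", "clustering"];

// Hypothetical helper: prepend a task prefix to each raw text.
function withTaskPrefix(texts, task = "search_query") {
  if (!NOMIC_TASKS.includes(task)) {
    throw new Error(`Unknown task prefix: ${task}`);
  }
  return texts.map((text) => `${task}: ${text.trim()}`);
}

const samples = withTaskPrefix(["The sun is a star", "The moon is a rock"]);
console.log(samples[0]); // prints search_query: The sun is a star
```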

Advanced Usage

For more advanced usage, you can:

  • Change the model path in the script
  • Adjust the context size
  • Modify the similarity calculation
  • Process larger batches of text

Refer to the node-llama-cpp documentation for more details.
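
As one way to process larger batches of text, the input can be split into fixed-size chunks that are embedded one chunk at a time. The sketch below assumes `context` is a node-llama-cpp embedding context exposing `getEmbeddingFor`; `chunk`, `embedInBatches`, and the default batch size of 16 are illustrative choices, not part of the project.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Embed texts chunk by chunk; vectors are returned in input order.
// `context` is assumed to expose getEmbeddingFor(text) -> { vector }.
async function embedInBatches(context, texts, batchSize = 16) {
  const vectors = [];
  for (const batch of chunk(texts, batchSize)) {
    const embeddings = await Promise.all(
      batch.map((text) => context.getEmbeddingFor(text))
    );
    for (const embedding of embeddings) {
      vectors.push(Array.from(embedding.vector));
    }
  }
  return vectors;
}
```

Limiting the number of requests in flight keeps memory use predictable for long input lists; the best batch size depends on the machine and is worth measuring.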

🗂️ Project Structure

llama-embedder/
├── models/                   # Store GGUF models here
│   └── nomic-embed-text-v1.5.Q4_K_M.gguf
├── index.mjs                 # Main script
└── embeddings-output.json    # Generated output file

🔍 Example Usage

Using Node.js (index.mjs)

The included index.mjs script will:

  1. Load the GGUF model
  2. Generate embeddings for sample texts
  3. Calculate cosine similarities between all pairs
  4. Save results to embeddings-output.json
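
The final step, writing the results to disk, might be sketched like this. The JSON layout (`embeddings` and `similarities` keys) and the `buildOutput` name are assumptions for illustration; check embeddings-output.json for the structure the script actually produces.

```javascript
// Hypothetical output layout; the real embeddings-output.json may differ.
function buildOutput(embeddings, similarities) {
  const payload = {
    // Copy vectors into plain arrays so they serialize as JSON lists.
    embeddings: embeddings.map(({ text, vector }) => ({
      text,
      vector: Array.from(vector),
    })),
    similarities,
  };
  return JSON.stringify(payload, null, 2);
}

// In an ES module such as index.mjs, the string could then be written with:
//   import { writeFile } from "node:fs/promises";
//   await writeFile("embeddings-output.json", buildOutput(embeddings, similarities));
```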

Using Llamafile (CLI)

./llamafile \
  --model ./models/nomic-embed-text-v1.5.Q4_K_M.gguf \
  --embedding \
  --ctx-size 2048 \
  --prompt "search_query: Your text here"

📊 Expected Output

Embedded: "search_query: The sun is a star"
Embedded: "search_query: The moon is a rock"
Embedded: "search_query: Apples are fruits"
Embedded: "search_query: Stars produce light"

Similarity Matrix:

Similarity between:
  "search_query: The sun is a star"
  "search_query: The moon is a rock"
  → 0.6891

...

🧠 Use Cases

  • πŸ” Semantic search
  • 🏷️ Text classification
  • πŸ“Š Clustering similar documents
  • 🎯 Recommendation systems
  • πŸ“ Similarity detection
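
As an illustration of the first use case, semantic search reduces to ranking stored document vectors by their cosine similarity to a query vector. The sketch below uses toy 3-dimensional vectors in place of real model output, and `rankBySimilarity` is a hypothetical helper. With nomic-embed-text-v1.5, documents would typically be embedded with the `search_document: ` prefix and queries with `search_query: `.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Return the topK documents most similar to the query, best first.
function rankBySimilarity(queryVector, documents, topK = 3) {
  return documents
    .map((doc) => ({ text: doc.text, score: cosine(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Toy vectors standing in for real embeddings.
const docs = [
  { text: "Apples are fruits", vector: [0, 1, 0] },
  { text: "Stars produce light", vector: [1, 0, 0.2] },
  { text: "The moon is a rock", vector: [0.6, 0, 0.8] },
];
const hits = rankBySimilarity([1, 0, 0], docs, 2);
console.log(hits.map((hit) => hit.text));
```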

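🏆 Recommended Models

| Model                 | Size (Q4) | Dim | GGUF Native | Notes                |
|-----------------------|-----------|-----|-------------|----------------------|
| nomic-embed-text-v1.5 | ~81MB     | 768 | ✅          | Best general-purpose |
| bge-small-en-v1.5     | ~300MB    | 384 | ❌          | Lightweight, fast    |
| all-MiniLM-L6-v2      | ~250MB    | 384 | ❌          | Good for short texts |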

📝 Notes

  • For best performance, use quantized models (e.g., Q4_K_M)
  • First run may be slow as models are loaded into memory
  • See node-llama-cpp documentation for advanced configuration

🀝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
