This project demonstrates how to use a local GGUF model with node-llama-cpp to generate embeddings from text and compute their cosine similarities. It uses a quantized nomic-embed-text-v1.5.Q4_K_M.gguf model.
Generate and compare text embeddings using local GGUF models with node-llama-cpp and llamafile.
- Run embeddings on CPU with optimized GGUF models
- Calculate cosine similarity between text samples
- Save embeddings and similarity scores to JSON
- Works with quantized models for better performance
- No GPU required (but recommended for faster inference)
- Node.js >=18
- A Vulkan-compatible GPU (recommended for hardware acceleration)
- Linux (tested on Linux Mint)
```bash
# Clone the repository
git clone https://github.com/yourusername/llama-embedder
cd llama-embedder

# Install dependencies
npm install node-llama-cpp
```

- Visit: nomic-embed-text-v1.5-GGUF on Hugging Face
- Download: `nomic-embed-text-v1.5.Q4_K_M.gguf` (~81 MB)
- Create a `models` directory and place the model file in it:

```bash
mkdir -p models
mv ~/Downloads/nomic-embed-text-v1.5.Q4_K_M.gguf models/
```
Get the latest prebuilt llamafile binary from: llamafile Releases
Make it executable:
```bash
chmod +x ./llamafile
```

Then run it against the downloaded model:

```bash
./llamafile --model ./models/nomic-embed-text-v1.5.Q4_K_M.gguf \
  --embedding \
  --ctx-size 2048 \
  --prompt "search_query: The sun is a star"
```

This will return a vector (list of floats), which is the embedding.
Make sure you have Node.js 18+ installed, then install the required package:
```bash
npm install node-llama-cpp
```

The project includes an example script (`index.mjs`) that demonstrates how to:
- Load the GGUF model
- Generate embeddings for sample texts
- Calculate cosine similarities between them
- Save the results to a JSON file
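The first three steps can be sketched roughly as follows. This is a minimal illustration, assuming the node-llama-cpp v3 API (`getLlama`, `loadModel`, `createEmbeddingContext`, `getEmbeddingFor`); the helper names `embedTexts` and `cosineSimilarity` are hypothetical and may not match how `index.mjs` is actually organized.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|) for equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Convention chosen for this sketch: a zero vector is similar to nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: loads a GGUF model and returns one vector per text.
async function embedTexts(modelPath, texts) {
  // Imported lazily so cosineSimilarity stays usable without the package installed.
  const { getLlama } = await import("node-llama-cpp");
  const llama = await getLlama();
  const model = await llama.loadModel({ modelPath });
  const context = await model.createEmbeddingContext();

  const vectors = [];
  for (const text of texts) {
    const embedding = await context.getEmbeddingFor(text);
    vectors.push(Array.from(embedding.vector)); // copy into a plain array
  }
  return vectors;
}
```

With the model in place, `await embedTexts("./models/nomic-embed-text-v1.5.Q4_K_M.gguf", samples)` would yield the vectors that `cosineSimilarity` compares pair by pair.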
To run the example:
```bash
node index.mjs
```

Example output:

```
Starting Llama...
Loading model...
Creating embedding context...
Generating embeddings...
Embedded: "search_query: The sun is a star"
Embedded: "search_query: The moon is a rock"
Embedded: "search_query: Apples are fruits"
Embedded: "search_query: Stars produce light"
Similarity Matrix:
Similarity between:
"search_query: The sun is a star"
"search_query: The moon is a rock"
→ 0.6891
...
```
The script will save the complete results (including all embeddings and similarity scores) to embeddings-output.json.
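The exact layout of `embeddings-output.json` is not documented here, so the following is only one plausible shape. The field names (`embeddings`, `similarities`, `text`, `vector`, `score`) and the helpers `buildOutput` and `serializeOutput` are assumptions made for illustration.

```javascript
// Hypothetical output shape; the real embeddings-output.json may differ.
function buildOutput(texts, vectors, similarities) {
  return {
    // One entry per input text, pairing it with its embedding vector.
    embeddings: texts.map((text, i) => ({ text, vector: vectors[i] })),
    // Pair scores, e.g. [{ a: "first text", b: "second text", score: 0.5 }]
    similarities,
  };
}

function serializeOutput(output) {
  // Two-space indentation keeps the file readable in an editor.
  return JSON.stringify(output, null, 2);
}
```

The resulting string could then be written with Node's `fs.writeFileSync("embeddings-output.json", serializeOutput(output))`.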
To use your own texts, modify the samples array in index.mjs:
```javascript
const samples = [
  'Your first text here',
  'Your second text here',
  // Add more texts as needed
];
```

For more advanced usage, you can:
- Change the model path in the script
- Adjust the context size
- Modify the similarity calculation
- Process larger batches of text
Refer to the node-llama-cpp documentation for more details.
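When swapping in your own texts, note that every sample in the example output starts with `search_query: `. The nomic-embed-text model card describes task prefixes of this kind, so custom texts probably want one too. A small sketch, where `withTaskPrefix` and `TASK_PREFIXES` are hypothetical names introduced here:

```javascript
// Task prefixes described in the nomic-embed-text model card.
const TASK_PREFIXES = ["search_query", "search_document", "classification", "clustering"];

// Hypothetical helper: prepend a task prefix unless the text already has it.
function withTaskPrefix(text, task = "search_query") {
  if (!TASK_PREFIXES.includes(task)) {
    throw new Error(`Unknown task prefix: ${task}`);
  }
  const prefix = `${task}: `;
  return text.startsWith(prefix) ? text : prefix + text;
}

// Use an arrow function so map's index argument is not mistaken for the task.
const prefixedSamples = ["The sun is a star", "The moon is a rock"].map((t) => withTaskPrefix(t));
```

Queries and the documents they are matched against typically use different prefixes (`search_query` versus `search_document`), which is why the task is a parameter rather than hard-coded.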
```
llama-embedder/
├── models/                 # Store GGUF models here
│   └── nomic-embed-text-v1.5.Q4_K_M.gguf
├── index.mjs               # Main script
└── embeddings-output.json  # Generated output file
```
The included index.mjs script will:
- Load the GGUF model
- Generate embeddings for sample texts
- Calculate cosine similarities between all pairs
- Save results to `embeddings-output.json`
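Comparing "all pairs" can be done by visiting each unordered pair once, which gives n(n-1)/2 comparisons for n texts (6 for the four samples). The sketch below assumes that interpretation; the script may instead compute a full matrix. `pairwiseSimilarities` is a hypothetical name, and the scoring function is passed in so any similarity measure can be used.

```javascript
// Score each unordered pair (i < j) exactly once.
// `similarity` is any function taking two vectors and returning a number.
function pairwiseSimilarities(texts, vectors, similarity) {
  if (texts.length !== vectors.length) {
    throw new Error("texts and vectors must have the same length");
  }
  const pairs = [];
  for (let i = 0; i < texts.length; i++) {
    for (let j = i + 1; j < texts.length; j++) {
      pairs.push({
        a: texts[i],
        b: texts[j],
        score: similarity(vectors[i], vectors[j]),
      });
    }
  }
  return pairs;
}
```

Skipping `j <= i` avoids both the trivial self-comparison and the mirrored duplicate of each pair.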
```bash
./llamafile \
  --model ./models/nomic-embed-text-v1.5.Q4_K_M.gguf \
  --embedding \
  --ctx-size 2048 \
  --prompt "search_query: Your text here"
```

Example output from `index.mjs`:

```
Embedded: "search_query: The sun is a star"
Embedded: "search_query: The moon is a rock"
Embedded: "search_query: Apples are fruits"
Embedded: "search_query: Stars produce light"
Similarity Matrix:
Similarity between:
"search_query: The sun is a star"
"search_query: The moon is a rock"
→ 0.6891
...
```
- Semantic search
- Text classification
- Clustering similar documents
- Recommendation systems
- Similarity detection
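As an illustration of the first use case, semantic search amounts to ranking stored document vectors by their similarity to a query vector. The sketch below is not part of the project; `normalize`, `dot`, and `search` are hypothetical helpers, and the vectors are assumed to have been produced beforehand by the embedding model.

```javascript
// Scale a vector to unit length so that the dot product equals cosine similarity.
function normalize(v) {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) throw new Error("Cannot normalize a zero vector");
  return v.map((x) => x / norm);
}

function dot(a, b) {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Rank documents ({ text, vector }) by similarity to the query, best first,
// and keep the top k results.
function search(queryVector, docs, k = 3) {
  const q = normalize(queryVector);
  return docs
    .map((doc) => ({ text: doc.text, score: dot(q, normalize(doc.vector)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For larger collections, normalizing each document vector once at indexing time, rather than on every query as done here, would avoid repeated work.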
| Model | Size (Q4) | Dim | GGUF Native | Notes |
|---|---|---|---|---|
| nomic-embed-text-v1.5 | ~81MB | 768 | β | Best general-purpose |
| bge-small-en-v1.5 | ~300MB | 384 | β | Lightweight, fast |
| all-MiniLM-L6-v2 | ~250MB | 384 | β | Good for short texts |
- For best performance, use quantized models (e.g., Q4_K_M)
- First run may be slow as models are loaded into memory
- See the `node-llama-cpp` documentation for advanced configuration
Contributions are welcome! Please open an issue or submit a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
- node-llama-cpp
- Nomic AI for the nomic-embed-text model
- llama.cpp for GGUF support