feat: local embedding provider for search #36

Open
jeffmm wants to merge 10 commits into 1st1:main from jeffmm:jmm/local-embedding-provider

Conversation

@jeffmm

jeffmm commented Mar 22, 2026

Adds a local embedding provider so lat search works out of the box without an API key.

  • @huggingface/transformers as optional dep — in-process inference, no external server
  • Three pre-selected model sizes via LAT_LOCAL_MODEL_SIZE (small/medium/large, default small at 384 dims)
  • Discriminated union (kind: 'api' | 'local') on EmbeddingProvider for clean dispatch between API and local paths
  • ensureSchema tracks dimensions in a meta table and rebuilds the index on mismatch (e.g. switching providers)
  • Pipeline promise cached so concurrent callers share a single model load
  • Tests for provider detection, dimension mismatch, local embedding quality, and MCP no-key behavior
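A minimal sketch of how the pieces in this list could fit together, not the PR's actual code. All field and function names here (`modelId`, `getLocalEmbedder`, `needsRebuild`, the `deps` object) are hypothetical; only the `kind: 'api' | 'local'` discriminant, the shared pipeline promise, and the rebuild-on-dimension-mismatch behavior come from the description above.

```typescript
// Hypothetical shapes: only the `kind` discriminant is taken from the PR description.
type EmbeddingProvider =
  | { kind: 'api'; baseUrl: string; apiKey: string; model: string; dimensions: number }
  | { kind: 'local'; modelId: string; dimensions: number };

type ApiProvider = Extract<EmbeddingProvider, { kind: 'api' }>;
type Embedder = (texts: string[]) => Promise<number[][]>;

// Cache the in-flight load itself (not its result), so concurrent callers
// await the same promise. Assumes one local model per process.
let localEmbedderPromise: Promise<Embedder> | null = null;

function getLocalEmbedder(load: () => Promise<Embedder>): Promise<Embedder> {
  if (localEmbedderPromise === null) {
    localEmbedderPromise = load().catch((err) => {
      localEmbedderPromise = null; // allow a retry after a failed load
      throw err;
    });
  }
  return localEmbedderPromise;
}

// The switch narrows on `kind`, so each branch only sees its own fields.
async function embed(
  provider: EmbeddingProvider,
  texts: string[],
  deps: {
    callApi: (p: ApiProvider, texts: string[]) => Promise<number[][]>;
    loadLocal: (modelId: string) => Promise<Embedder>;
  },
): Promise<number[][]> {
  switch (provider.kind) {
    case 'api':
      return deps.callApi(provider, texts);
    case 'local': {
      const embedder = await getLocalEmbedder(() => deps.loadLocal(provider.modelId));
      return embedder(texts);
    }
  }
}

// Rebuild only when an index exists and its stored dimension count differs
// from the active provider's (e.g. after switching providers).
function needsRebuild(storedDims: number | null, provider: EmbeddingProvider): boolean {
  return storedDims !== null && storedDims !== provider.dimensions;
}
```

Caching the promise rather than the loaded model is what prevents two simultaneous first calls from each starting their own model load.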

Full disclosure: Claude Code helped me write this PR with the help of lat.md's lat.md 😄

Closes #8

@vercel

vercel Bot commented Mar 22, 2026

@jeffmm is attempting to deploy a commit to the Yury Selivanov's projects Team on Vercel.

A member of the Team first needs to authorize it.

jeffmm force-pushed the jmm/local-embedding-provider branch from 5e362e0 to 6fe644b on March 23, 2026 15:26
@1st1

1st1 commented Mar 24, 2026

Owner

@jeffmm huge thanks! Let me ponder this. Quick question: what if I just want to use LM Studio's embedding model? Should we design for that too?

@anbuzin Andrey, what do you think?

@jeffmm

jeffmm commented Mar 24, 2026

Author

@1st1 I think support for custom endpoints would be easy enough to add, and would be a nice integration for people who are already running local servers (which I assume LM Studio does). We could prioritize it like:

  1. LAT_LLM_ENDPOINT for users who run local servers or otherwise want to use other OpenAI-compatible endpoints
  2. LAT_LLM_API_KEY for users who have OpenAI or Vercel API keys
  3. LAT_LOCAL_MODEL_SIZE as a fallback.

I could see adding LAT_LLM_ENDPOINT as a fast follow-up PR.
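The priority order proposed above could be resolved roughly as follows. This is a sketch under assumptions: `resolveProvider` and the `ProviderChoice` shape are hypothetical, passing the API key along to a custom endpoint is a guess, and falling back to `small` on an unrecognized size is one possible choice (an implementation could just as well reject it).

```typescript
// Hypothetical result type; only the three environment variable names
// and their ordering come from the proposal.
type ProviderChoice =
  | { kind: 'endpoint'; baseUrl: string; apiKey?: string }
  | { kind: 'api-key'; apiKey: string }
  | { kind: 'local'; size: 'small' | 'medium' | 'large' };

type Env = Record<string, string | undefined>;

const SIZES = ['small', 'medium', 'large'] as const;

function resolveProvider(env: Env): ProviderChoice {
  const endpoint = env.LAT_LLM_ENDPOINT?.trim();
  const apiKey = env.LAT_LLM_API_KEY?.trim();

  // 1. A custom OpenAI-compatible endpoint wins when set.
  if (endpoint) {
    return apiKey
      ? { kind: 'endpoint', baseUrl: endpoint, apiKey }
      : { kind: 'endpoint', baseUrl: endpoint };
  }

  // 2. Otherwise use a hosted provider when an API key is present.
  if (apiKey) return { kind: 'api-key', apiKey };

  // 3. Otherwise fall back to the local model.
  // Assumption: unknown or missing sizes default to 'small'.
  const requested = env.LAT_LOCAL_MODEL_SIZE?.trim().toLowerCase();
  const size = SIZES.find((s) => s === requested) ?? 'small';
  return { kind: 'local', size };
}
```

In a Node process this would typically be called as `resolveProvider(process.env)`; taking the environment as a parameter keeps the function easy to test.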

@nickshirobokov

@jeffmm looks awesome! Thank you!

For future PRs, I'd consider exploring better-optimized local inference ('transformers.feature_extraction' on CPU might be really slow even for small models), and switching to late interaction instead of pooling, since we already have multi-vector local models.
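To make the pooling versus late-interaction distinction concrete, here is a small sketch that scores the same token vectors both ways. It is illustrative only: the function names are hypothetical, the late-interaction variant shown is ColBERT-style MaxSim, and nothing here reflects how the PR or any particular model produces its vectors.

```typescript
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  return a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0);
}

function normalize(v: Vec): Vec {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v : v.map((x) => x / norm);
}

// Pooling: average the token vectors into ONE vector per text.
function meanPool(tokens: Vec[]): Vec {
  const dims = tokens[0]?.length ?? 0;
  return Array.from(
    { length: dims },
    (_, i) => tokens.reduce((s, t) => s + (t[i] ?? 0), 0) / tokens.length,
  );
}

// Pooled score: cosine similarity between the two pooled vectors.
function pooledScore(query: Vec[], doc: Vec[]): number {
  return dot(normalize(meanPool(query)), normalize(meanPool(doc)));
}

// Late interaction (MaxSim): keep every token vector; for each query token
// take its best-matching document token, then sum those maxima.
function maxSimScore(query: Vec[], doc: Vec[]): number {
  const docUnit = doc.map(normalize);
  if (docUnit.length === 0) return 0;
  return query.reduce((total, q) => {
    const qUnit = normalize(q);
    const best = docUnit.reduce((m, d) => Math.max(m, dot(qUnit, d)), -Infinity);
    return total + best;
  }, 0);
}
```

The trade-off is storage and compute: pooling stores one vector per chunk, while late interaction stores one per token and compares all query tokens against all document tokens at query time.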

@lars20070

lars20070 commented May 4, 2026

Contributor

Adding the embedding models via Hugging Face has a great out-of-the-box user experience.
But I would suggest reviewing the alternative approach in #52 as well. IMO, only one of the two PRs needs to be merged.

@nickshirobokov

@lars20070 I see the value in the provider-first approach, and I think it fits the project better today, but I disagree that it's future-proof. Most recent advancements in semantic search are focused on reducing model sizes, combining dense embeddings with static sparse embeddings, and merging vectors to make semantic search work locally in most deployments without any HTTP requests. That's why I think lat.md should follow a local-first approach.

@jeffmm

jeffmm commented May 9, 2026

Author

I think adding a general API base is a good feature for lat.md, but I would not describe it as an alternative approach that is mutually exclusive with this PR. They are complementary.

I managed to convince my team to use lat.md in production, but one of the biggest barriers to getting other teams to fully embrace it is that it requires extra setup to work well. Right now, the experience is degraded for many of our engineers whose agents are simply relying on lat locate and grep, and I've had to disable hooks and steer the agent away from lat search in the AGENTS.md to avoid complaints of poor agent behavior. An out-of-the-box experience is crucial for maintaining a low barrier to entry, and would increase the likelihood of adoption.

@lars20070

Contributor

Thanks for the helpful replies — you're right that this isn't #36 vs #52, and either or both approaches can be adopted.

@jeffmm I've also come across several teams running lat.md in production, and the API-key setup keeps surfacing as a friction point. The longer that gap remains, the more likely teams are to build their own workarounds, which risks fragmenting the codebase.

It would be great to see #36 land soon. Happy to help review, test, or pick up follow-ups — just let me know what would be most useful.

@grumd

grumd commented May 13, 2026

@1st1 I can attest to what @jeffmm is saying. My team started using lat.md and it's been an amazing tool, but we don't use Vercel or OpenAI, so we have to resort to using lat locate and the agents keep trying lat search and getting errors. Sometimes agents will try lat search, get an error, and just skip lat altogether and get to work without it.

lat locate is rough to use because substring matching doesn't work: I can't match an existing section "Delete Transaction" by running lat locate 'Delete'. lat search is crucial for lat to work like it's supposed to.

This has honestly been the biggest (and only) barrier to using lat for us. I've cloned and built this PR and tried a few lat search commands on my existing project, and it works very well out of the box. API keys should be an optional feature for users who need higher quality embeddings, but the built-in small model that this PR provides is an incredible accessibility improvement. Please please get this merged if possible <3

@jeffmm What do you think about publishing your fork to npm until this is merged?

@jeffmm

jeffmm commented May 19, 2026

Author

@lars20070 Thanks. It would be great if you can give the PR a review. I've been testing locally for several weeks now without issues, but I'd encourage you to give it a try locally as well. Personally I think these changes give lat.md a smooth "it just works" experience, but I'm happy to iterate on it based on your recommendations.

Development

Successfully merging this pull request may close these issues.

feature: add local embeddings to search engine

5 participants