Local and arbitrary model support #9619

zachlloyd · 2026-04-30T15:44:40Z

zachlloyd
Apr 30, 2026
Maintainer

We are trying to figure out the best way to implement local model support and I wanted to start a discussion on our different potential approaches to see what resonates most with the community.

The reason local model support is not trivial for us to implement is that our harness is split between our client (rust, open-source) and server (golang, not currently open). Moving the harness to be entirely on the client is a fair amount of work.

The options we are considering here (not mutually exclusive):

Port the entire harness to the client and open-source it (most work).
Implement Warp as an ACP client and allow folks to use other harnesses within our rich terminal UI.
Implement a new rust-based "lite" local harness that speaks the same protocol that our client understands but supports local models and arbitrary endpoints.
Route local model requests through our server and back to the client via something like ngrok (hacky, but quick).

Questions on my mind:

How important to users is it to use our real harness as opposed to a harness that works as an ACP server?
How important is it that local model requests are truly local, with no server interaction?
Which aspects of our UI are most important for folks wanting local model support?

djdanielsson · 2026-04-30T16:36:10Z

djdanielsson
Apr 30, 2026

Answers to the Questions part: I am not sure I have a good answer for number 1 and 3 but 2 it's extremely important to me that local model is truly local that is likely the point of why I am using a local model to start with for that task. I haven't been using warp for a long time because it wasn't open source so I do not have many thoughts on what parts of the UI I want at this time for local models, the little I have done with like Claude cli and the information that the UI provides for that is really nice and to have something like that for local models would be cool but it might depend on what harness people are using idk.

2 replies

zachlloyd Apr 30, 2026
Maintainer Author

super helpful and tracks with what i expected

djdanielsson Apr 30, 2026

going to the options stuff I am kinda interested in number 2 the most I think, I personally want to control my harness and the context I am feeding my agents vs using just another harness.

FFatTiger · 2026-04-30T17:00:29Z

FFatTiger
Apr 30, 2026

No server transit for any requests

1 reply

harry-xm May 1, 2026

This. At my company it's a policy violation to use something like ngrok.

officiallymarky · 2026-04-30T17:04:52Z

officiallymarky
Apr 30, 2026

Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands.

0 replies

phidauex · 2026-04-30T17:08:27Z

phidauex
Apr 30, 2026

Thanks for opening up some discussion!

I think for your options, #1 is most appealing, but yes, more work. I'd like to think it could also help your architecture long term by relying less on scaling server side components along with client components. #2 is a bit hacky but could be quick. I connect to my Hermes agent using OpenWebUI because it is nicer than the raw terminal client - I'd connect to it through Warp if it were an option, but that doesn't really support Warp being a standalone tool. #3 could be most practical because it would be fully local, not require a second tool, and for most local models, not being feature-complete would be OK. The smaller context windows mean fewer turns, fewer tools available, etc. But my use case in Warp would mostly be "uh help me remember how this command is used" not "build out an entire ansible deployment for a lab". #4 is probably not worth doing - if someone wants to use a local model, its because they want it local.

I'd rank them - 1, 3, 2, 4

For the questions:

I'm split because I use another agent tool already. However for new users, only having it work if you already have another harness running feels duplicative.
Quite important - I either have no/patchy internet at the time I'm working, or I'm doing something that I need to keep local for confidentiality reasons, and in either case, server routing breaks the whole point.
I don't use the most advanced tools in Warp now, so for me, the "advanced command completion" and inline planning directly in the terminal UI is what I use, and would love to use with a local model that would be fully competent at that.

1 reply

zachlloyd Apr 30, 2026
Maintainer Author

extremely helpful as we think this through

mastertyko · 2026-04-30T17:45:42Z

mastertyko
Apr 30, 2026

My vote is strongly for option 1.

If Warp supports local or arbitrary models, I think it should mean truly local execution with no server transit. I understand that porting the harness to the client and open sourcing it is the most work, but it seems like the right long term architecture for privacy, offline use, trust, and extensibility.

Thanks!

0 replies

crazygamerZ783 · 2026-04-30T18:02:59Z

crazygamerZ783
Apr 30, 2026

hey got a warp fork of my own ,trying to get the ollama support work repo:https://github.com/crazygamerZ783/warp-ollama
i would appreciate a bit of help

3 replies

rozsazoltan May 1, 2026

The repo feels a bit strange because the original Git history is missing, and the first commit doesn't show what you actually changed.

crazygamerZ783 May 1, 2026

it was probably because of github glitches

regismesquita May 1, 2026

ollama supports openai-compatible as far as I remember, and there are already some implementations out there supporting that.

VicZhang6 · 2026-04-30T19:45:17Z

VicZhang6
Apr 30, 2026

Honestly, I want to use DeepSeek V4 Flash inside Warp — it’s cheap, and it allows me to interact with the terminal using natural language.

1 reply

jensenojs May 2, 2026

same here, Although I can understand from a business logic perspective why supporting an open method of simply providing a URL + API key is not allowed, from a user demand standpoint, I think it would be more natural.

FunkyFresh67 · 2026-04-30T20:07:06Z

FunkyFresh67
Apr 30, 2026

Local models in Ollama or similar should be configurable as sources within Warp. Once available, you should be able to select a model during a session, either manually or by directing Warp to use it automatically based on the task or preference.

0 replies

FelixZoe · 2026-04-30T22:13:37Z

FelixZoe
Apr 30, 2026

6666

0 replies

regismesquita · 2026-05-01T13:23:00Z

regismesquita
May 1, 2026

There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo.

People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream.

In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers.

0 replies

apetti1920 · 2026-05-01T15:33:28Z

apetti1920
May 1, 2026

Model selection should be allowed to be

local (lm studio, ollama, etc) 2. also allowed to be configurable and tiered, small model for cmd suggestions (with bash history injection) as well as large for agent interactions

0 replies

zachlloyd · 2026-05-01T15:46:10Z

zachlloyd
May 1, 2026
Maintainer Author

All this feedback makes sense. We will have a proposed solution here shortly.

0 replies

bernardodsanderson · 2026-05-01T18:47:14Z

bernardodsanderson
May 1, 2026

I am mostly interested as I want to use one source of models (openrouter/GLM Coding Plan) for it.

0 replies

chukwunonsomichael189-boop · 2026-05-02T00:52:14Z

chukwunonsomichael189-boop
May 2, 2026

Warp looks really cool, but the fact it only worked with cloud models always was a deal breaker for me. I would love to have full local AI support for not only for coding, but for the terminal agent when reacting to commands.

0 replies

chukwunonsomichael189-boop · 2026-05-02T00:53:03Z

chukwunonsomichael189-boop
May 2, 2026

There are already a handful of "local warp server" implementations on your PR list , and on the wild forking from this repo.

People just want to be able to use a software that they really like (warp) without going through something that they don't need (your servers). we might end up with some opensource spin-off leading this if you don't just release a minimalist opensource server that simply allows people to use warp with a openai-compatible upstream.

In the future you can add something feature-rich and supporting a bunch of stuff... but for now people just want to use warp and remote models without touching someone else servers.

0 replies

Sitric1 · 2026-05-02T12:20:47Z

Sitric1
May 2, 2026

Adding my two cents here for what it's worth:
I think it is really necessary long term to have the harness client side and open source it. Warp seems to have a very capable UI/UX team that as long as that stays consistent I can see people, including myself, sticking around on the Warp terminal for its nice UX over any forks or competitors. Option 2 is also preferable, to those wanting to use specific harnesses but within the Warp experience.
On your questions: I'd be in the mind that local has to be fully local (no server interaction). That feels like an absolute. Can't really answer the other questions well tbh.

0 replies

Vahn84 · 2026-05-02T12:43:46Z

Vahn84
May 2, 2026

We are trying to figure out the best way to implement local model support and I wanted to start a discussion on our different potential approaches to see what resonates most with the community.

The reason local model support is not trivial for us to implement is that our harness is split between our client (rust, open-source) and server (golang, not currently open). Moving the harness to be entirely on the client is a fair amount of work.

The options we are considering here (not mutually exclusive):

Port the entire harness to the client and open-source it (most work).

Implement Warp as an ACP client and allow folks to use other harnesses within our rich terminal UI.

Implement a new rust-based "lite" local harness that speaks the same protocol that our client understands but supports local models and arbitrary endpoints.

Route local model requests through our server and back to the client via something like ngrok (hacky, but quick).

Questions on my mind:

How important to users is it to use our real harness as opposed to a harness that works as an ACP server?

How important is it that local model requests are truly local, with no server interaction?

Which aspects of our UI are most important for folks wanting local model support?

I basically wouldn't change anything about it...I like the folder navigation, the agent tool belt, the ui is clear and appealing...the fact that at each session it can auto-detect the user intent, be it a terminal command or a natural language request to the agent. Probably too many sidebars. Also local models should be truly local, but let the user decide, and keep the account signin/signup if anyone wants to tinker with what warp offers as its models. Just don't paywall features pls

1 reply

zachlloyd May 2, 2026
Maintainer Author

lol on too many sidebars. i don't disagree. challenging design problem

understood on the local means local point

markhellier · 2026-05-02T21:27:02Z

markhellier
May 2, 2026

Thanks for opening this up for discussion. I wanted to share my perspective as a user, as my requirements for local LLM support are driven by privacy , Corporate policy and workflow needs.

1. On the "harness" vs. ACP client:

For my workflow, the specific protocol matters less than the capability. As long as the implementation (whether it's an ACP client or a lite harness) can seamlessly pull in the context I need to be productive, I don't have a strong preference between the two.

2. On the importance of "truly local" requests:

This is the most important point for me. In my professional environment is privacy and governance. Policies are strict: code, secrets, and sensitive data must stay local.
Because of this, I couldn't use a solution like Option 4 (routing through a server/ngrok). Even if the model itself is local, the act of routing my terminal data through an external server creates a compliance hurdle that would prevent me from using the feature in a professional setting. I am looking for a "Zero Trust" experience where I can be certain that my code and secrets never leave my machine.

3. On the most important UI aspects:

The reason I use Warp is the "rich" terminal experience. If I move to local models, I still want the AI to feel integrated into the terminal, be context-aware, be able to see my current command buffer, and capable of assisting within the flow of my work.

Privacy is my primary driver: I am looking for a way to use LLMs without compromising my data.
I don't mind paying a subscription for a tier that explicitly guarantees this local-first privacy.

1 reply

zachlloyd May 2, 2026
Maintainer Author

super helpful

snowyu · 2026-05-02T22:30:35Z

snowyu
May 2, 2026

A local terminal should, first and foremost, function offline; online services should be optional. However, Warp has completely inverted this paradigm—which is precisely the primary reason I do not use it. That said, now that the project has been open-sourced, I felt compelled to offer my two cents.

So, is this so-called "terminal" essentially just a conduit for piping all local data to an AI service provider?
Furthermore, the fact that this terminal only functions within a GUI environment is absolutely laughable.

1 reply

zachlloyd May 2, 2026
Maintainer Author

to be clear, warp does function fully offline, no server is requried at all for the terminal functionality.

it's the ai only right now that requires a server, which i think makes sense in general, but not for local model access.

Local and arbitrary model support #9619

Uh oh!

zachlloyd Apr 30, 2026 Maintainer

Replies: 19 comments · 11 replies

Uh oh!

Uh oh!

Uh oh!

zachlloyd Apr 30, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd Apr 30, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd May 1, 2026 Maintainer Author

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zachlloyd May 2, 2026 Maintainer Author

Uh oh!

1. On the "harness" vs. ACP client:

2. On the importance of "truly local" requests:

3. On the most important UI aspects:

Uh oh!

zachlloyd May 2, 2026 Maintainer Author

Uh oh!

Uh oh!

zachlloyd May 2, 2026 Maintainer Author

zachlloyd
Apr 30, 2026
Maintainer

Replies: 19 comments 11 replies

zachlloyd Apr 30, 2026
Maintainer Author

zachlloyd Apr 30, 2026
Maintainer Author

zachlloyd
May 1, 2026
Maintainer Author

zachlloyd May 2, 2026
Maintainer Author

zachlloyd May 2, 2026
Maintainer Author

zachlloyd May 2, 2026
Maintainer Author