
feat: enhance chat CLI with readline history, line editing, and distributed support #1382

Open
Vlor999 wants to merge 5 commits into ml-explore:main from Vlor999:main

Conversation

@Vlor999

@Vlor999 Vlor999 commented Jun 9, 2026


Description

This PR improves the mlx_lm.chat interactive interface by adding standard terminal features and ensuring compatibility across different execution environments.

Changes

  • Persistent History: Integrated readline to support command history (Up/Down arrows) saved to ~/.mlx_lm_chat_history.
  • Line Editing: Enabled full cursor movement (Left/Right arrows) and in-place text editing for prompts.
  • Distributed Support: Implemented broadcast_string logic to ensure the chat loop remains synchronized across all ranks in multi-GPU/distributed mode.
  • Graceful Fallbacks: Added safety checks and EOFError handling (Ctrl+D to exit) for a smoother terminal experience.
  • User Feedback: Added an explicit warning message when model output is truncated due to max_tokens limits.
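The history and Ctrl+D behaviour listed above could be wired up roughly as follows. This is a minimal sketch using only the Python standard library, not the code from this PR: the helper names `setup_history` and `read_query`, the `max_entries` limit, and the `">> "` prompt are hypothetical, and only the history path comes from the description.

```python
import atexit
import os

try:
    import readline  # not available on every platform
except ImportError:
    readline = None

# Path taken from the PR description.
HISTORY_PATH = os.path.expanduser("~/.mlx_lm_chat_history")


def _save_history(path):
    """Write the history file, ignoring filesystem errors at exit."""
    try:
        readline.write_history_file(path)
    except OSError:
        pass


def setup_history(path=HISTORY_PATH, max_entries=1000):
    """Load saved history and register a save at exit.

    Returns False when readline is unavailable, so the caller can fall
    back to plain input() without history or line editing.
    """
    if readline is None:
        return False
    try:
        readline.read_history_file(path)
    except OSError:
        pass  # no history file yet, or it is unreadable
    readline.set_history_length(max_entries)
    atexit.register(_save_history, path)
    return True


def read_query(prompt=">> ", input_fn=input):
    """Return the stripped input line, or None when the user presses Ctrl+D."""
    try:
        return input_fn(prompt).strip()
    except EOFError:
        return None
```

Importing `readline` is enough for `input()` to gain arrow-key editing and in-session history, so `read_query` needs no further changes to benefit from it. A chat loop would call `setup_history()` once and exit when `read_query()` returns `None`.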

Motivation

The current chat interface is a basic loop that lacks standard CLI ergonomics. Users cannot easily correct typos or recall previous instructions, which hinders the local testing workflow. These changes bring mlx-lm closer to the UX of modern LLM toolkits.

Future Work / UI Enhancements

I have also prototyped an enhanced UI version using the rich library (see screenshot below) which supports Markdown rendering and syntax highlighting for code blocks. I kept this PR minimal to avoid adding new dependencies, but I am open to integrating a "soft dependency" version if the maintainers are interested in a more polished visual output.

Closes #818

Screenshot 2026-02-03 at 13 34 08

Comment thread mlx_lm/chat.py
)


def broadcast_string(

Collaborator

Why exactly is this needed?

Comment thread mlx_lm/chat.py
  while True:
-     query = ui.prompt()
+     query = ui.prompt() if rank == 0 else ""
+     query = broadcast_string(query, group).strip()

Collaborator

If I understand correctly, here we communicate the prompt to the other ranks, but I am not sure I understand what the goal is.

Author

The goal of that section is to make interactive chat work correctly in distributed mode.

What problem it solves:

  • In distributed execution, several processes/ranks are running at once.
  • We do not want every rank to ask the user for input.
  • We want only rank 0 to read the prompt from the terminal.
  • Then we want that same prompt to be sent to all the other ranks so they all generate from the exact same user message.

So the flow is:

  • The user enters "Hello".
  • Rank 0 reads "Hello".
  • Rank 1, rank 2, etc. read "".
  • broadcast_string(...) sends "Hello" from rank 0 to everyone; after that, every rank has "Hello".

Without this, distributed chat would break in one of these ways:

  • every rank would try to read from stdin
  • non-root ranks would hang
  • different ranks could end up with inconsistent input
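
The rank 0 broadcast described in this comment can be sketched without any MLX dependency. This is an illustration under stated assumptions, not the implementation in this PR: the signature below (`rank`, `all_sum`, `max_bytes`) is hypothetical, and `all_sum` stands in for a collective sum across ranks, a role that `mx.distributed.all_sum` could play in MLX. Rank 0 writes the byte length and the UTF-8 payload into a fixed-size buffer and every other rank contributes zeros, so the elementwise sum equals rank 0's buffer on every rank.

```python
from typing import Callable, List

# Hypothetical stand-in for a collective: takes this rank's buffer and
# returns the elementwise sum of the buffers from all ranks.
AllSum = Callable[[List[int]], List[int]]


def broadcast_string(text: str, rank: int, all_sum: AllSum,
                     max_bytes: int = 4096) -> str:
    """Return rank 0's text on every rank using a single sum collective."""
    # Slot 0 carries the payload length; the remaining slots carry the bytes.
    buf = [0] * (max_bytes + 1)
    if rank == 0:
        data = text.encode("utf-8")
        if len(data) > max_bytes:
            raise ValueError("prompt is longer than the broadcast buffer")
        buf[0] = len(data)
        buf[1:1 + len(data)] = data  # bytes are copied in as integers
    # Non-root ranks send only zeros, so the sum is exactly rank 0's buffer.
    total = all_sum(buf)
    length = total[0]
    return bytes(total[1:1 + length]).decode("utf-8")
```

The buffer has the same size on every rank, so all ranks enter the collective with matching shapes and one call is enough. The cost is a hard cap on prompt length (`max_bytes`); a two-step variant that first shares the length and then a tightly sized payload would avoid the cap at the price of a second collective.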

@nastya236 nastya236 Jun 9, 2026

Collaborator

The prompt is read separately on each rank from its own stdin. Could you please clarify why non-root ranks would hang and why different ranks could end up with inconsistent input?

Author

A simple failure might look like this:

  • rank 0 reads "Hello"
  • rank 1 is still blocked in ui.prompt()
  • rank 0 starts generation or reaches a collective operation
  • rank 1 has not reached the same point yet

Now one rank is waiting for compute synchronization while the other is still waiting for terminal input.

Collaborator

Did you have this issue?

Author

It did not happen to me, no.
I added this because a friend and I were talking about it and decided that it was safer!

@nastya236

nastya236 commented Jun 9, 2026

Collaborator

Thanks for your contribution! Adding a history file sounds like a great idea. I am not sure about the default markdown prompt; I need some time to think about it. Do you mind recording two demo videos, one for single and one for distributed mlx_lm chat?

@Vlor999

Vlor999 commented Jun 9, 2026

Author

Here is the first demo using single mlx_chat.

@nastya236

Collaborator

Thanks! Single chat looks great. Could you do the same for the distributed setting?

@Vlor999

Vlor999 commented Jun 9, 2026

Author

Here is the version with distributed mlx_lm chat.
Sorry for the delay, I caught a small issue in the distributed version!

Development

Successfully merging this pull request may close these issues.

[Improvement] Enhance mlx_lm.chat CLI experience (history, line editing, and formatting)

3 participants