[vllm-ext] allows setting flashinfer workspace #1259

yiliu30 · 2026-01-12T10:07:54Z

Signed-off-by: yiliu30 <yi4.liu@intel.com>

for more information, see https://pre-commit.ci

Copilot

Pull request overview

This PR adds support for configuring the FlashInfer workspace buffer size through an environment variable. It introduces a new environment variable VLLM_AR_FLASHINFER_WORKSPACE_BUFFER_SIZE that allows users to override the default workspace buffer size.

Changes:

- Added new environment variable VLLM_AR_FLASHINFER_WORKSPACE_BUFFER_SIZE to control FlashInfer workspace buffer size
- Implemented automatic update of FlashInfer's workspace buffer size on module initialization when the environment variable is set

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| auto_round_extension/vllm_ext/envs_ext.py | Registers the new `VLLM_AR_FLASHINFER_WORKSPACE_BUFFER_SIZE` environment variable with default value of 0 |
| auto_round_extension/vllm_ext/vllm_oot_patches.py | Implements the workspace buffer size update logic and applies it during module initialization if the environment variable is set |
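
The flow summarized above (read the environment variable, then override the backend constant only when it is set) can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper names `read_workspace_size` and `apply_workspace_size` are hypothetical, the `SimpleNamespace` object stands in for `vllm.v1.attention.backends.flashinfer` so the snippet runs without vLLM installed, and treating the value as a plain integer where 0 means "leave the default alone" is an assumption based only on the stated default of 0.

```python
import os
from types import SimpleNamespace

ENV_NAME = "VLLM_AR_FLASHINFER_WORKSPACE_BUFFER_SIZE"


def read_workspace_size(environ=os.environ) -> int:
    """Return the requested size, or 0 (the default) when the variable is unset."""
    return int(environ.get(ENV_NAME, "0"))


def apply_workspace_size(backend, environ=os.environ) -> bool:
    """Override backend.WORKSPACE_BUFFER_SIZE only when a positive size is requested."""
    new_size = read_workspace_size(environ)
    if new_size <= 0:
        # Assumption: 0 means "not configured", so the backend default is kept.
        return False
    backend.WORKSPACE_BUFFER_SIZE = new_size
    return True


# Stand-in for the real backend module; the default value here is made up.
fake_backend = SimpleNamespace(WORKSPACE_BUFFER_SIZE=128 * 1024 * 1024)
```

Passing the environment mapping as a parameter keeps the helper easy to exercise in isolation, for example `apply_workspace_size(fake_backend, {ENV_NAME: "1024"})`.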

Copilot · 2026-01-12T10:08:36Z

auto_round_extension/vllm_ext/vllm_oot_patches.py

+    import vllm.v1.attention.backends.flashinfer as flashinfer_backend
+
+    flashinfer_backend.WORKSPACE_BUFFER_SIZE = new_size


This code executes at module import time and may fail if the flashinfer backend module is not available or hasn't been initialized. Consider wrapping this in a try-except block to handle ImportError gracefully, or defer the update until runtime when the backend is actually needed.

Suggested change

-    import vllm.v1.attention.backends.flashinfer as flashinfer_backend
-    flashinfer_backend.WORKSPACE_BUFFER_SIZE = new_size
+    try:
+        import vllm.v1.attention.backends.flashinfer as flashinfer_backend
+    except ImportError:
+        logger.warning(
+            "FlashInfer backend is not available; "
+            "skipping update of WORKSPACE_BUFFER_SIZE to %s",
+            new_size,
+        )
+        return
+    except Exception:
+        # Catch any unexpected errors during backend import or initialization
+        logger.exception(
+            "Failed to import FlashInfer backend while trying to update "
+            "WORKSPACE_BUFFER_SIZE to %s",
+            new_size,
+        )
+        return
+    flashinfer_backend.WORKSPACE_BUFFER_SIZE = new_size
+    logger.info("Updated FlashInfer WORKSPACE_BUFFER_SIZE to %s", new_size)

Copilot · 2026-01-12T10:08:36Z

auto_round_extension/vllm_ext/vllm_oot_patches.py



+def update_flashinfer_workspace_buffer_size(new_size: int):
+    import vllm.v1.attention.backends.flashinfer as flashinfer_backend


If the flashinfer backend module cannot be imported or WORKSPACE_BUFFER_SIZE doesn't exist, this will raise an unclear error. Add error handling with a descriptive message explaining that FlashInfer backend is required for this feature.

Suggested change

-    import vllm.v1.attention.backends.flashinfer as flashinfer_backend
+    try:
+        import vllm.v1.attention.backends.flashinfer as flashinfer_backend
+    except ImportError as e:
+        raise RuntimeError(
+            "FlashInfer backend is required to set WORKSPACE_BUFFER_SIZE, but it "
+            "could not be imported. Make sure vLLM is installed and built with "
+            "FlashInfer support before setting VLLM_AR_FLASHINFER_WORKSPACE_BUFFER_SIZE."
+        ) from e
+    if not hasattr(flashinfer_backend, "WORKSPACE_BUFFER_SIZE"):
+        raise RuntimeError(
+            "FlashInfer backend does not define WORKSPACE_BUFFER_SIZE. This feature "
+            "requires a FlashInfer-enabled version of vLLM that exposes this setting."
+        )
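
The two review suggestions differ in failure mode: one warns and skips, the other raises. One way to reconcile them is to let the caller choose. The sketch below is hypothetical and not part of the PR: the `strict` flag and the `module_name` parameter are introduced here purely for illustration, and `module_name` exists mainly so the helper can be exercised without vLLM installed.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

# Module path taken from the diff under review.
DEFAULT_BACKEND = "vllm.v1.attention.backends.flashinfer"


def update_flashinfer_workspace_buffer_size(
    new_size: int,
    module_name: str = DEFAULT_BACKEND,  # hypothetical parameter, eases testing
    strict: bool = False,  # hypothetical flag: raise instead of warn
) -> bool:
    """Set WORKSPACE_BUFFER_SIZE on the backend module; return True on success."""
    try:
        backend = importlib.import_module(module_name)
    except ImportError as e:
        if strict:
            raise RuntimeError(
                f"Backend {module_name!r} could not be imported; "
                "cannot set WORKSPACE_BUFFER_SIZE."
            ) from e
        logger.warning(
            "%s is not available; skipping update of WORKSPACE_BUFFER_SIZE to %s",
            module_name,
            new_size,
        )
        return False

    if not hasattr(backend, "WORKSPACE_BUFFER_SIZE"):
        if strict:
            raise RuntimeError(f"{module_name!r} does not define WORKSPACE_BUFFER_SIZE.")
        logger.warning("%s does not define WORKSPACE_BUFFER_SIZE; skipping", module_name)
        return False

    backend.WORKSPACE_BUFFER_SIZE = new_size
    logger.info("Updated WORKSPACE_BUFFER_SIZE to %s", new_size)
    return True
```

With `strict=False` a missing backend only produces a warning, which suits an import-time patch; `strict=True` gives the descriptive error when the caller explicitly depends on FlashInfer.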

allow set flashinfer workspace

51d7012

Signed-off-by: yiliu30 <yi4.liu@intel.com>

Copilot AI review requested due to automatic review settings January 12, 2026 10:07

yiliu30 added the vllm-ext label Jan 12, 2026

[pre-commit.ci] auto fixes from pre-commit.com hooks

5690ce9

for more information, see https://pre-commit.ci

yiliu30 changed the title ~~allow set flashinfer workspace~~ [vllm-ext] allows setting flashinfer workspace Jan 12, 2026

Copilot AI reviewed Jan 12, 2026

Merge branch 'main' into fix-flashinfer

1216013

yiliu30 mentioned this pull request Jan 12, 2026

[WIP]Add FP8KV for DS/QWEN intel/neural-compressor#2367

Open

2 tasks

2 participants
