Java/JNI SDK surface for OpenVINO GenAI with an Android-first runtime path and explicit OpenVINO plugin registration.
Current implementation focus:
- Java API surface for LLMPipeline, chat history, generation configuration, tokenizer access, continuous batching, and generation handles
- Android runtime bootstrap with explicit native preload ordering
- Android asset staging helpers for packaged OpenVINO runtime/model assets
- explicit GFX plugin registration for Android deployments
- JNI bridge selection between a real OpenVINO GenAI bridge and a stub bridge for API-only builds
The Java package is com.ovx.openvino.genai.
The native side now has two modes:
- STUB: fallback when OpenVINO GenAI is not discoverable by CMake
- REAL: direct JNI bridge for LLMPipeline, GenerationConfig, Tokenizer, chat generation, and streaming callbacks
Currently implemented in REAL mode:
- runtimeVersion
- runtimeConfigure with plugin registration support
- LLMPipeline create/dispose
- generate(prompt) and generate(ChatHistory) with streaming callback support
- getGenerationConfig / setGenerationConfig
- getTokenizer
- Tokenizer.applyChatTemplate
- typed PipelineProperties forwarding to the native OpenVINO GenAI ov::AnyMap
- GenerationPerfMetrics convenience accessors for generated-token, TTFT, TPOT, and throughput metrics
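For orientation, the metrics named above (time to first token, time per output token, throughput) can be derived from per-token arrival timestamps. The sketch below uses common definitions and is only an illustration: `PerfMetricsSketch` and its methods are hypothetical names, not the SDK's GenerationPerfMetrics API, and the native library may compute these values differently.

```java
import java.util.List;

/** Hypothetical helper, not part of the SDK: derives common latency metrics from timestamps. */
final class PerfMetricsSketch {
    private final long requestStartNanos;
    private final List<Long> tokenArrivalNanos; // one timestamp per generated token

    PerfMetricsSketch(long requestStartNanos, List<Long> tokenArrivalNanos) {
        if (tokenArrivalNanos.isEmpty()) {
            throw new IllegalArgumentException("at least one generated token is required");
        }
        this.requestStartNanos = requestStartNanos;
        this.tokenArrivalNanos = List.copyOf(tokenArrivalNanos);
    }

    int generatedTokens() {
        return tokenArrivalNanos.size();
    }

    /** TTFT: delay between the request start and the first generated token, in milliseconds. */
    double ttftMillis() {
        return (tokenArrivalNanos.get(0) - requestStartNanos) / 1e6;
    }

    /** TPOT: mean gap between consecutive tokens after the first one, in milliseconds. */
    double tpotMillis() {
        int n = tokenArrivalNanos.size();
        if (n < 2) {
            return 0.0; // no inter-token gap exists for a single token
        }
        long span = tokenArrivalNanos.get(n - 1) - tokenArrivalNanos.get(0);
        return span / 1e6 / (n - 1);
    }

    /** Throughput: generated tokens divided by total wall time, in tokens per second. */
    double tokensPerSecond() {
        long total = tokenArrivalNanos.get(tokenArrivalNanos.size() - 1) - requestStartNanos;
        if (total <= 0) {
            return 0.0; // avoid dividing by zero for degenerate timestamps
        }
        return generatedTokens() / (total / 1e9);
    }
}
```

With a request starting at 0 ms and tokens arriving at 100, 150, 200, and 250 ms, this yields a TTFT of 100 ms, a TPOT of 50 ms, and 16 tokens per second.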
Still stubbed:
- ContinuousBatchingPipeline
- GenerationHandle
Build selection is controlled by OV_GENAI_JNI_MODE. See docs/ARCHITECTURE.md for the full mode matrix.
The intended Android load order is explicit and application-controlled, for example:
- c++_shared
- openvino
- openvino_genai
- ov_genai_java_jni
- runtime plugin registration for GFX
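One way to enforce a load order like the one above is a small preload helper that loads libraries strictly in the order they were registered and reports which library failed. The sketch below is an assumption about how such ordering could be implemented, not the SDK's actual bootstrap code. `OrderedPreloader` is a hypothetical name, and the loader is injected as a `Consumer<String>` so the ordering logic can be exercised without native libraries being present.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical helper, not part of the SDK: loads native libraries in registration order. */
final class OrderedPreloader {
    // LinkedHashSet keeps insertion order and ignores repeated registrations.
    private final Set<String> libraries = new LinkedHashSet<>();

    OrderedPreloader add(String libraryName) {
        libraries.add(libraryName);
        return this;
    }

    /**
     * Invokes the loader once per library, in order, and stops at the first failure.
     * Returns the names that were loaded.
     */
    List<String> loadAll(Consumer<String> loader) {
        List<String> loaded = new ArrayList<>();
        for (String name : libraries) {
            try {
                loader.accept(name);
            } catch (UnsatisfiedLinkError | RuntimeException e) {
                throw new IllegalStateException(
                        "Failed to load '" + name + "' after loading " + loaded, e);
            }
            loaded.add(name);
        }
        return loaded;
    }
}
```

On a device, the loader argument would typically be `System::loadLibrary`, for example `new OrderedPreloader().add("c++_shared").add("openvino").add("openvino_genai").add("ov_genai_java_jni").loadAll(System::loadLibrary)`.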
Example bootstrap:
RuntimeConfiguration configuration = RuntimeConfiguration.builder()
    .loadLibrary("c++_shared")
    .loadLibrary("openvino")
    .loadLibrary("openvino_genai")
    .loadLibrary("ov_genai_java_jni")
    .registerPlugin(PluginRegistration.gfx("/data/local/tmp/ov_genai_android/libopenvino_gfx_plugin.so"))
    .build();
OpenVinoGenAiRuntime.initialize(configuration);

Pipeline options are configured per pipeline, not at runtime bootstrap:
PipelineProperties properties = PipelineProperties.builder()
    .cacheDir(context.getCacheDir().toPath().resolve("openvino-genai").toString())
    .attentionBackend(PipelineProperties.AttentionBackend.SDPA)
    .performanceHint(PipelineProperties.PerformanceHint.LATENCY)
    .enableMmap(true)
    .build();
LLMPipeline pipeline = new LLMPipeline(modelDir, DeviceSelection.gfx(), properties);

Android applications can use AndroidOpenVinoGenAiRuntime to stage packaged runtime metadata, preload native libraries, register the device plugin, and initialize the Java API bridge. AndroidAssetBundle provides the shared asset-directory copy/read helpers used by Android integrations; application code remains responsible for choosing and downloading model bundles. AndroidPipelineProperties.cpuLatency(...) is available when an app intentionally runs a CPU pipeline.
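To illustrate what staging an asset directory involves, the sketch below copies a directory tree into a target location and skips files that already appear to be staged. It is a stand-in only: AndroidAssetBundle's actual API is not shown here, `AssetStagingSketch.stageDirectory` is a hypothetical name, and plain `java.nio.file` paths are assumed in place of Android's packaged assets. The size comparison used to skip files is a simplifying assumption; a real integration may prefer checksums or version markers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper, not part of the SDK: copies a directory tree into a staging directory. */
final class AssetStagingSketch {
    private AssetStagingSketch() {}

    /** Copies regular files under source into target and returns how many were copied. */
    static int stageDirectory(Path source, Path target) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(source)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        int copied = 0;
        for (Path file : files) {
            // Preserve the relative layout of the source tree under the target directory.
            Path destination = target.resolve(source.relativize(file).toString());
            Files.createDirectories(destination.getParent());
            // Assumption: a file with the same size is treated as already staged.
            if (Files.exists(destination) && Files.size(destination) == Files.size(file)) {
                continue;
            }
            Files.copy(file, destination, StandardCopyOption.REPLACE_EXISTING);
            copied++;
        }
        return copied;
    }
}
```

Calling the method a second time with unchanged inputs copies nothing, which keeps repeated app launches cheap.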
Desktop stub build:
export JAVA_HOME=/path/to/jdk-17
cmake -S . -B build
cmake --build build

Android real build against existing OpenVINO and OpenVINO GenAI Android package configurations:
cmake -S . -B build/android-jni-real -G Ninja \
    -DOV_GENAI_JNI_MODE=REAL \
    -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=35 \
    -DANDROID_STL=c++_shared \
    -DOpenVINO_DIR=/path/to/openvino-android \
    -DOpenVINOGenAI_DIR=/path/to/openvino-genai-android
cmake --build build/android-jni-real

The next implementation step is to finish the remaining async layer against upstream openvino.genai:
- ContinuousBatchingPipeline
- GenerationHandle
- richer AnyMap coverage for advanced scheduler/plugin settings
- Android runtime validation on a fully working GFX generation path