genai-java-api exposes a small Java API over OpenVINO GenAI through JNI. The public package is com.ovx.openvino.genai; native implementation details stay under com.ovx.openvino.genai.internal.
- OpenVinoGenAiRuntime owns native library loading and plugin registration.
- RuntimeConfiguration defines the native preload order and runtime plugin list.
- LLMPipeline wraps prompt and chat generation.
- ChatHistory, ChatMessage, and ChatRole provide chat request construction.
- GenerationConfig is a native-friendly map wrapper for OpenVINO GenAI generation parameters.
- PipelineProperties is a native-friendly map wrapper for OpenVINO/OpenVINO GenAI pipeline properties such as CACHE_DIR, ATTENTION_BACKEND, PERFORMANCE_HINT, ENABLE_MMAP, and static prompt/response limits.
- GenerationPerfMetrics reads the known native perf metric keys from GenerationResult.perfMetrics() while preserving the raw map for advanced consumers.
- Tokenizer exposes chat-template application from an LLMPipeline tokenizer.
- ContinuousBatchingPipeline and GenerationHandle define the async API surface, but the native bridge still treats those calls as pending implementation work.
- com.ovx.openvino.genai.android contains Android integration helpers: runtime asset staging, native preload/plugin bootstrap, and shared asset-directory copy/read utilities.
The Android convenience path defaults LLMPipeline(String modelPath) to DeviceSelection.gfx(). Use LLMPipeline(String, DeviceSelection, PipelineProperties) when an application needs a different OpenVINO device or additional properties.
OV_GENAI_JNI_MODE selects the native bridge at CMake configure time:
- AUTO: build the real bridge when OpenVINOGenAI is found, otherwise build the stub bridge.
- REAL: require the OpenVINOGenAI CMake package and link openvino::genai.
- STUB: build an API-only JNI library that reports its stub version and throws UnsupportedOperationException for model execution.
The stub bridge is useful for Java API compilation, tests, IDE indexing, and CI environments that do not provide OpenVINO GenAI native artifacts. It is not a runtime inference implementation.
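The mode rules above can be modeled as a small decision function. This is only an illustrative sketch: the real selection happens in CMake at configure time, and the names JniMode, BridgeKind, and JniModeResolver are hypothetical, not part of genai-java-api. The packageFound flag stands for "the OpenVINOGenAI CMake package was found".

```java
// Hypothetical model of the OV_GENAI_JNI_MODE rule; not part of the library.
enum JniMode { AUTO, REAL, STUB }

enum BridgeKind { REAL, STUB }

final class JniModeResolver {
    private JniModeResolver() {}

    /**
     * Returns the bridge that would be built for the requested mode.
     * packageFound models whether the OpenVINOGenAI CMake package was found.
     */
    static BridgeKind resolve(JniMode mode, boolean packageFound) {
        switch (mode) {
            case AUTO:
                // AUTO prefers the real bridge and falls back to the stub.
                return packageFound ? BridgeKind.REAL : BridgeKind.STUB;
            case REAL:
                // REAL is a hard requirement, so a missing package is an error.
                if (!packageFound) {
                    throw new IllegalStateException(
                            "REAL mode requires the OpenVINOGenAI CMake package");
                }
                return BridgeKind.REAL;
            case STUB:
                // STUB never links OpenVINO GenAI, even when it is available.
                return BridgeKind.STUB;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }
}
```

For example, JniModeResolver.resolve(JniMode.AUTO, false) yields BridgeKind.STUB, which matches a CI machine without OpenVINO GenAI native artifacts.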
Applications should initialize the runtime before creating pipelines:

    // Native libraries are preloaded in the order declared here.
    RuntimeConfiguration configuration = RuntimeConfiguration.builder()
            .loadLibrary("c++_shared")
            .loadLibrary("openvino")
            .loadLibrary("openvino_genai")
            .loadLibrary("ov_genai_java_jni")
            .registerPlugin(PluginRegistration.gfx("/data/local/tmp/ov_genai_android/libopenvino_gfx_plugin.so"))
            .build();
    OpenVinoGenAiRuntime.initialize(configuration);

Library ordering is intentionally application-controlled because Android deployments often package OpenVINO, OpenVINO GenAI, plugin libraries, and c++_shared separately.
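To illustrate what an application-controlled preload order means, here is a minimal stand-in that records library names in declaration order and hands them to a loader one by one. PreloadPlan is a hypothetical class written for this example; it is not the library's RuntimeConfiguration and makes no claim about how that class is implemented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in used only to illustrate ordered preloading.
final class PreloadPlan {
    private final List<String> libraries;

    private PreloadPlan(List<String> libraries) {
        // Defensive copy so later builder changes cannot alter the plan.
        this.libraries = Collections.unmodifiableList(new ArrayList<>(libraries));
    }

    static Builder builder() {
        return new Builder();
    }

    List<String> libraries() {
        return libraries;
    }

    /** Passes each library name to the loader in declaration order. */
    void loadAll(Consumer<String> loader) {
        for (String name : libraries) {
            loader.accept(name);
        }
    }

    static final class Builder {
        private final List<String> names = new ArrayList<>();

        Builder loadLibrary(String name) {
            if (names.contains(name)) {
                throw new IllegalArgumentException("Library declared twice: " + name);
            }
            names.add(name);
            return this;
        }

        PreloadPlan build() {
            return new PreloadPlan(names);
        }
    }
}
```

In an application, plan.loadAll(System::loadLibrary) would perform the actual loading; passing a recording consumer instead makes the order easy to check in a unit test.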
On Android, AndroidOpenVinoGenAiRuntime is the convenience bootstrap around this same contract. It copies runtime metadata from packaged assets into app storage, mirrors plugin libraries where OpenVINO expects them, loads the preferred native libraries by absolute path, registers the requested device plugin, and initializes OpenVinoGenAiRuntime. AndroidAssetBundle is a small shared helper for reading and copying packaged asset directories; model selection, download policy, and cache invalidation remain application responsibilities.
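The staging step described above amounts to a recursive directory copy. The sketch below shows that idea with plain java.nio.file on ordinary file-system directories. It is an assumption-laden analogue: DirectoryStager is a hypothetical name, and the real AndroidAssetBundle reads packaged Android assets rather than a source directory on disk.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical plain-JVM analogue of asset staging; not the library's helper.
final class DirectoryStager {
    private DirectoryStager() {}

    /**
     * Recursively copies sourceDir into targetDir, overwriting existing files.
     * Returns the number of regular files copied.
     */
    static int stage(Path sourceDir, Path targetDir) throws IOException {
        if (!Files.isDirectory(sourceDir)) {
            throw new IllegalArgumentException("Not a directory: " + sourceDir);
        }
        int copied = 0;
        // Files.walk visits a directory before its contents, so parents exist first.
        try (Stream<Path> paths = Files.walk(sourceDir)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = targetDir.resolve(sourceDir.relativize(source).toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }
}
```

Overwriting on every call keeps the sketch simple. Deciding when staged files are stale is left out on purpose, since the text above assigns cache invalidation to the application.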
Pipeline properties are intentionally separate from runtime initialization. RuntimeConfiguration loads native libraries and registers plugins; PipelineProperties is passed into LLMPipeline and becomes the C++ ov::AnyMap. For example:

    // context is the application's Android Context; modelDir is the model directory path.
    PipelineProperties properties = PipelineProperties.builder()
            .cacheDir(context.getCacheDir().toPath().resolve("openvino-genai").toString())
            .attentionBackend(PipelineProperties.AttentionBackend.SDPA)
            .performanceHint(PipelineProperties.PerformanceHint.LATENCY)
            .enableMmap(true)
            .build();
    LLMPipeline pipeline = new LLMPipeline(modelDir, DeviceSelection.gfx(), properties);

Use DeviceSelection.gfx() for Android GFX deployments. AndroidPipelineProperties.cpuLatency(...) is a separate preset for applications that explicitly choose CPU execution.
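A "native-friendly map wrapper" can be pictured as a fluent builder that ends in a flat key/value map, which a JNI bridge could then translate entry by entry. The sketch below uses the property keys named earlier in this document. The class name PropertyMapSketch and the value encoding (strings for enum-like settings, a boxed Boolean for the flag) are assumptions made for illustration, not the library's actual representation.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in showing a fluent wrapper over a flat property map.
final class PropertyMapSketch {
    // LinkedHashMap keeps insertion order, which makes the result easy to inspect.
    private final Map<String, Object> values = new LinkedHashMap<>();

    PropertyMapSketch cacheDir(String dir) {
        return put("CACHE_DIR", dir);
    }

    PropertyMapSketch attentionBackend(String backend) {
        return put("ATTENTION_BACKEND", backend);
    }

    PropertyMapSketch performanceHint(String hint) {
        return put("PERFORMANCE_HINT", hint);
    }

    PropertyMapSketch enableMmap(boolean enabled) {
        return put("ENABLE_MMAP", enabled);
    }

    private PropertyMapSketch put(String key, Object value) {
        values.put(Objects.requireNonNull(key), Objects.requireNonNull(value));
        return this;
    }

    /** Returns a read-only snapshot of the properties set so far. */
    Map<String, Object> toMap() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(values));
    }
}
```

Returning a read-only snapshot keeps the map that crosses the JNI boundary stable even if the builder is reused afterwards.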
Public documentation should describe the Java API, JNI modes, Android packaging requirements, and validation commands. Do not publish local model directories, generated bundles, machine-specific paths, private architecture notes, or device logs.