LLM-API-Verifier is a modular, plugin-based framework for verifying the true identity of models behind LLM APIs.
🚧 This framework is currently in its early stages of development. We warmly welcome contributions from the community!
Access to frontier LLMs like GPT-5.4, Claude 4.6, and Gemini 3.1 is often hindered by high pricing and regional restrictions. This has led to the emergence of LLM proxies and API gateways — third-party services that claim to provide access to official models via indirect routing, often at discounted prices and without geographical barriers.
However, recent research [1] [2] [3] reveals deceptive practices among these third-party APIs. A large number of evaluated proxy endpoints fail fingerprint verification, secretly substituting premium models with cheaper alternatives. As a direct consequence of this bait-and-switch substitution, these APIs exhibit severe performance collapse across various tasks.
These deceptive practices not only critically undermine the reproducibility of scientific research but also cause direct harm to developers and individual users building downstream applications. This project was created to address this opacity, providing researchers, developers, and API consumers with an automated toolset to verify the authenticity of models behind APIs.
This project is a pure framework. You run it by loading an external adapter. Below is an example of how to use it in conjunction with LLMmap (an academic model fingerprinting project).
# Clone this framework
git clone https://github.com/qiyongzheng/LLM-API-Verifier.git
cd LLM-API-Verifier
# Install framework dependencies
pip install -r requirements.txt

For the built-in llmmap adapter to work, we need to download the original LLMmap repository. You can place it inside the framework directory or adjacent to it.
# Clone LLMmap into the framework directory
git clone https://github.com/pasquini-dario/LLMmap.git
# Install specific dependencies required by the algorithm (e.g., PyTorch)
cd LLMmap
pip install -r requirements.txt
cd ..

(Note: If you only plan to write your own custom detection methods, you do not need to clone LLMmap.)
(Note: The default LLMmap model supports a limited set of target models. See The Included LLMmap Adapter for the full list.)
Run the CLI, pointing it to the adapter:
python cli.py \
--api-base-url "https://api.openai.com/v1" \
--api-key "sk-xxx" \
--model-name "openai/gpt-5.4" \
--method "adapters/llmmap.py"

- Flexible Method Integration: The framework can integrate any detection method; you only need to write an adapter to plug it in. An adapter simply exposes a `create_method()` factory function, which the framework loads dynamically.
- Unified API layer: Powered by litellm, supporting 100+ LLM providers (OpenAI, Anthropic, Google, Azure, local endpoints, etc.) through a single interface.
- Async & cached: The framework handles all API concurrency (via semaphores) and per-sample SQLite response caching. This improves speed and reduces API costs, especially when multiple methods request the same prompts.
- Pluggable Fusion Strategies: When multiple adapters are loaded, the framework fuses their probability distributions into a single authoritative result, making it easy to combine complementary detection algorithms.
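
To make the "Async & cached" point concrete, here is a minimal, self-contained sketch of how a semaphore-bounded client with a per-sample SQLite cache could behave. It is not the framework's actual implementation: `CachedClient`, its cache-key scheme, and the fake `_call_api` method are all hypothetical, and it uses the standard-library `sqlite3` module rather than `aiosqlite` so that it runs without extra dependencies.

```python
import asyncio
import hashlib
import json
import sqlite3


class CachedClient:
    """Hypothetical stand-in: bounded concurrency plus a response cache."""

    def __init__(self, max_concurrency: int = 4) -> None:
        # At most `max_concurrency` API calls are in flight at once.
        self._semaphore = asyncio.Semaphore(max_concurrency)
        # In-memory database for the demo; a real cache would use a file path.
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE cache (key TEXT PRIMARY KEY, response TEXT)"
        )
        self.api_calls = 0

    @staticmethod
    def _key(model: str, prompt: str, temperature: float) -> str:
        # Assumed key scheme: hash of everything that affects the response.
        payload = json.dumps([model, prompt, temperature])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    async def _call_api(self, model: str, prompt: str) -> str:
        # Fake network call so the sketch runs offline.
        self.api_calls += 1
        await asyncio.sleep(0.01)
        return f"[{model}] reply to: {prompt}"

    async def query(self, model: str, prompt: str, temperature: float = 0.0) -> str:
        key = self._key(model, prompt, temperature)
        row = self._db.execute(
            "SELECT response FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no API call, no semaphore slot used
        async with self._semaphore:
            response = await self._call_api(model, prompt)
        self._db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?)", (key, response)
        )
        return response


async def demo():
    # Create the client inside the coroutine so the semaphore is bound
    # to the running event loop on all supported Python versions.
    client = CachedClient(max_concurrency=2)
    prompts = ["Who are you?", "What is 2+2?", "Name your creator."]
    first = await asyncio.gather(*(client.query("demo-model", p) for p in prompts))
    second = await asyncio.gather(*(client.query("demo-model", p) for p in prompts))
    return first, second, client.api_calls
```

Running `asyncio.run(demo())` sends each distinct prompt once; the second batch is served entirely from the cache, which is the effect the feature list describes when several methods request the same prompts.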
cli.py
 │
 ├─ load_adapter("adapters/llmmap.py")
 │    └─ dynamically imports external LLMmap code, returns BaseMethod
 │
 └─ FingerprintOrchestrator
      ├─ Phase 1: get_probes()    ← collect queries from each adapter
      ├─ Phase 2: batch_query()   ← send all probes via UnifiedLLMClient
      ├─ Phase 3: evaluate()      ← each adapter scores the API responses
      └─ Phase 4: fuse()          ← combine probabilities via weighted average
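
Phase 4 of the pipeline above combines per-method distributions by weighted average. The sketch below shows one plausible way to do that; it is an illustration only, and the function name `weighted_average_fusion`, its arguments, and the final renormalization step are assumptions rather than the actual `WeightedAverageFusion` code.

```python
def weighted_average_fusion(distributions, weights=None):
    """Fuse {method_name: {model_name: probability}} into one distribution.

    Hypothetical helper: models missing from a method's output are treated
    as having probability 0 for that method.
    """
    if not distributions:
        raise ValueError("at least one distribution is required")

    names = list(distributions)
    if weights is None:
        weights = {name: 1.0 for name in names}  # equal weights by default

    total_weight = sum(weights[name] for name in names)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    fused = {}
    for name in names:
        share = weights[name] / total_weight
        for model, prob in distributions[name].items():
            fused[model] = fused.get(model, 0.0) + share * prob

    # Renormalize in case an input distribution did not sum exactly to 1.
    norm = sum(fused.values())
    if norm <= 0:
        raise ValueError("fused distribution has no probability mass")
    return {model: prob / norm for model, prob in fused.items()}
```

For example, fusing `{"model-a": 0.8, "model-b": 0.2}` and `{"model-a": 0.4, "model-c": 0.6}` with equal weights gives `model-a` the average of 0.8 and 0.4, while `model-b` and `model-c` each keep half of their original mass.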
LLM-API-Verifier/
    cli.py                      # CLI entry point
    adapters/
        llmmap.py               # LLMmap adapter (reference implementation)
    llm_api_verifier/
        client/
            llm_client.py       # Async litellm wrapper + concurrency
            cache.py            # SQLite response cache (aiosqlite)
            models.py           # ProbeConfig and APIResponse dataclasses
        core/
            orchestrator.py     # FingerprintOrchestrator: 4-phase pipeline
            fusion.py           # WeightedAverageFusion
        methods/
            base.py             # BaseMethod abstract class
        utils/
            loader.py           # Dynamic adapter loading
            math.py             # softmax, normalize, distances_to_probabilities
| Parameter | Description |
|---|---|
| `--model-name` / `-m` | Target model identifier in litellm format (e.g. `openai/gpt-5.4`, `anthropic/claude-4-6-sonnet`) |
| `--method` | Path to an `adapter.py` (repeatable for multiple methods) |

| Parameter | Default | Description |
|---|---|---|
| `--api-key` / `-k` | `$OPENAI_API_KEY` | API key for the provider |
| `--api-base-url` / `-b` | `$OPENAI_BASE_URL` | Custom API base URL |
| `--method-arg` | (none) | `key=value` pairs passed to adapters (repeatable) |
| `--cache-path` | `./cache/fingerprint_cache.db` | SQLite response cache path |
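
As an illustration of how repeatable `key=value` options like `--method-arg` can be turned into keyword arguments, here is a small sketch built on the standard-library `argparse`. The helper names `parse_method_args` and `build_parser` are hypothetical, and the framework's real CLI may parse or type-convert these values differently; in this sketch every value stays a string.

```python
import argparse


def parse_method_args(pairs: list[str]) -> dict[str, str]:
    """Turn ["device=cuda", "max_tokens=8192"] into a kwargs dict."""
    kwargs = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        kwargs[key.strip()] = value.strip()
    return kwargs


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser covering only the repeatable option.
    parser = argparse.ArgumentParser(prog="cli-sketch")
    parser.add_argument(
        "--method-arg",
        action="append",  # each occurrence is appended to a list
        default=[],
        metavar="KEY=VALUE",
    )
    return parser
```

Because values arrive as strings in this sketch, an adapter receiving them would need to convert types itself, for example `int(kwargs["max_tokens"])`.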
# Pass method-specific arguments (e.g. telling the LLMmap adapter to use CUDA)
python cli.py \
--api-base-url "..." --api-key "..." --model-name "..." \
--method "adapters/llmmap.py" \
--method-arg device=cuda \
--method-arg max_tokens=8192
# Run multiple methods concurrently
python cli.py \
--api-base-url "..." --api-key "..." --model-name "..." \
--method "adapters/llmmap.py" \
--method "adapters/some_other_method.py"

You can also use this framework directly in your own Python code. The CLI is just a thin wrapper around the core API.
import asyncio

from llm_api_verifier import (
    UnifiedLLMClient,
    FingerprintOrchestrator,
    WeightedAverageFusion,
    load_adapter,
)


async def verify_api():
    # 1. Initialize the client targeting the API under test
    client = UnifiedLLMClient(
        model="openai/gpt-5.4",
        api_key="sk-...",
        base_url="https://api.openai.com/v1",
        cache_path="./cache/my_cache.db",
    )

    # 2. Set up the orchestrator and fusion strategy
    orchestrator = FingerprintOrchestrator(
        client=client,
        fusion=WeightedAverageFusion(),
    )

    # 3. Load and register adapters
    llmmap_adapter = load_adapter(
        "adapters/llmmap.py",
        device="cuda",
        max_tokens=4096,
    )
    orchestrator.register_method(llmmap_adapter)

    # 4. Run the pipeline
    result = await orchestrator.run()

    # 5. Inspect results
    print(f"Predicted Model: {result.predicted_model()}")
    print(f"Full Distribution: {result.final}")

    await client.close()


if __name__ == "__main__":
    asyncio.run(verify_api())

If you have developed your own model identification algorithm, you only need to write a simple adapter script to plug into this framework's asynchronous and caching ecosystem.
from llm_api_verifier.methods.base import BaseMethod
from llm_api_verifier.client.models import APIResponse, ProbeConfig


class MyMethod(BaseMethod):
    def __init__(self, **kwargs):
        super().__init__(name="my_method")
        # Store config. DO NOT load heavy PyTorch models here!

    async def get_probes(self) -> list[ProbeConfig]:
        # Define what queries to send to the target API
        return [
            ProbeConfig(prompt="Who are you?", max_tokens=512),
            ProbeConfig(prompt="What is 2+2?", temperature=0.0),
        ]

    async def evaluate(self, responses: list[APIResponse]) -> dict[str, float]:
        # Perform your algorithm inference here (lazy-load weights if needed)
        # Process responses → return {model_name: probability}
        return {"model-a": 0.85, "model-b": 0.10, "model-c": 0.05}


def create_method(**kwargs) -> BaseMethod:
    """Factory function, dynamically loaded by the CLI."""
    return MyMethod(**kwargs)

- Lazy-load heavy resources. If you need to import `torch` or run `model.load_state_dict()`, do it inside a guard pattern within `evaluate()` so memory is only consumed when the method actually runs.
- Accept `**kwargs`. The CLI injects `--method-arg` values into `create_method` as kwargs. Extract what you need (e.g. thresholds, paths) and safely ignore the rest using `**kwargs`.
- Return a full probability distribution. Keys are model names from your algorithm's own internal logic, and values should sum to ~1.0.
- Manage your own dependencies. This framework only requires lightweight core packages (see `requirements.txt`). Real algorithm dependencies like `torch` or `sentence-transformers` should be managed by your external project.
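
The "guard pattern" and `**kwargs` guidelines above can be sketched as follows. To stay runnable without the framework or PyTorch installed, this example uses a standalone class instead of subclassing `BaseMethod`, and a small dictionary stands in for real model weights; `LazyScorer`, `_ensure_model`, `load_count`, and the `threshold` option are all hypothetical names chosen for illustration.

```python
import asyncio


class LazyScorer:
    """Standalone stand-in for an adapter method with lazy loading."""

    def __init__(self, **kwargs):
        # Extract only what is needed; unknown keys are silently ignored.
        # Values are converted explicitly, assuming they may arrive as strings.
        self.threshold = float(kwargs.get("threshold", 0.5))
        self._model = None   # nothing heavy is loaded at construction time
        self.load_count = 0  # demo-only counter to show the guard works

    def _ensure_model(self):
        # Guard: load on first use, then reuse the cached object.
        if self._model is None:
            # A real adapter would do something like `import torch` and
            # load weights here; a dict stands in for the loaded model.
            self.load_count += 1
            self._model = {"model-a": 3.0, "model-b": 1.0}
        return self._model

    async def evaluate(self, responses: list[str]) -> dict[str, float]:
        model = self._ensure_model()
        total = sum(model.values())
        # Return a full distribution whose values sum to 1.
        return {name: score / total for name, score in model.items()}


def create_method(**kwargs) -> LazyScorer:
    """Factory with the same shape as the adapter entry point."""
    return LazyScorer(**kwargs)
```

Constructing the method is cheap, and calling `evaluate()` several times triggers the expensive load only once, so an adapter that is registered but never evaluated costs no extra memory.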
This repository includes adapters/llmmap.py as a reference implementation. It wraps the academic project LLMmap as a framework adapter.
Note on Supported Models: The original LLMmap paper evaluated the method on 120 target LLMs. Out of the box, the default pre-trained LLMmap model supports 52 models, including:
- OpenAI: `gpt-3.5-turbo`, `gpt-4-turbo-2024-04-09`, `gpt-4o-2024-05-13`
- Anthropic: `claude-3-haiku`, `claude-3-5-sonnet`, `claude-3-opus`
- Meta: `Llama-2`, `Llama-3/3.1/3.2` (8B, 70B, etc.)
- Qwen: `Qwen2/2.5` (various sizes)
- Google: `gemma-1.1/2`
- Microsoft: `Phi-3/3.5`
- Others: `Mistral/Mixtral`, `Cohere Aya`, `internlm2_5`, etc.

(You can add support for new models using LLMmap's dataset generation scripts.)
It works by converting LLMmap's 8 behavioral probes into ProbeConfigs, handing them to the framework for concurrent execution. Once responses are gathered, it calls LLMmap's native embedding and Euclidean distance pipeline, finally mapping those distances to a smooth probability distribution using temperature-scaled softmax.
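
The last step, mapping distances to probabilities with a temperature-scaled softmax, can be sketched like this. The function below is an assumption about the general technique (a softmax over negated distances divided by the temperature), not a copy of the adapter's code; the name `distances_to_probs_sketch` is hypothetical.

```python
import math


def distances_to_probs_sketch(
    distances: dict[str, float], temperature: float = 0.5
) -> dict[str, float]:
    """Smaller distance -> larger probability, via softmax(-d / T)."""
    if not distances:
        raise ValueError("distances must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    logits = {model: -dist / temperature for model, dist in distances.items()}
    # Subtract the maximum logit before exponentiating for numerical stability.
    peak = max(logits.values())
    exps = {model: math.exp(logit - peak) for model, logit in logits.items()}
    total = sum(exps.values())
    return {model: value / total for model, value in exps.items()}
```

Under this formulation, a lower temperature concentrates probability on the nearest model, while a higher temperature spreads it more evenly across candidates.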
Specific --method-arg parameters for LLMmap:
| Key | Default | Description |
|---|---|---|
| `device` | `cpu` | PyTorch device (`cpu` or `cuda`) |
| `max_tokens` | `4096` | Max tokens per probe (increase for reasoning models) |
| `softmax_temperature` | `0.5` | Temperature for distance-to-probability conversion |
🚧 Status: Under Active Development 🚧
Contributions from the community are very welcome. In particular, we are looking for:
- New Detection Methods: If you have a novel algorithm or methodology for identifying LLM models, please consider adding it.
- New Adapters: Implementations of existing model verification papers or projects as pluggable adapters.
- Bug fixes, documentation improvements, and core feature enhancements.
Please feel free to open issues or submit pull requests.
MIT