qiyongzheng/LLM-API-Verifier

LLM-API-Verifier

Chinese documentation (中文文档)

LLM-API-Verifier is a modular, plugin-based framework for verifying the true identity of models behind LLM APIs.

🚧 This framework is currently in its early stages of development. We warmly welcome contributions from the community!

Background: The Trust Problem in LLM Proxies

Access to frontier LLMs like GPT-5.4, Claude 4.6, and Gemini 3.1 is often hindered by high pricing and regional restrictions. This has led to the emergence of LLM proxies and API gateways — third-party services that claim to provide access to official models via indirect routing, often at discounted prices and without geographical barriers.

However, recent research [1] [2] [3] reveals deceptive practices among these third-party APIs. Many of the evaluated proxy endpoints fail fingerprint verification because they silently replace premium models with cheaper alternatives. As a direct consequence of this bait-and-switch, these APIs exhibit severe performance collapse across various tasks.

These deceptive practices not only critically undermine the reproducibility of scientific research but also cause direct harm to developers and individual users building downstream applications. This project was created to address this opacity, providing researchers, developers, and API consumers with an automated toolset to verify the authenticity of models behind APIs.

Quick Start

This project is a pure framework. You run it by loading an external adapter. Below is an example of how to use it in conjunction with LLMmap (an academic model fingerprinting project).

1. Setup Framework

# Clone this framework
git clone https://github.com/qiyongzheng/LLM-API-Verifier.git
cd LLM-API-Verifier

# Install framework dependencies
pip install -r requirements.txt

2. Download an External Detection Algorithm (e.g., LLMmap)

For the built-in llmmap adapter to work, we need to download the original LLMmap repository. You can place it inside the framework directory or adjacent to it.

# Clone LLMmap into the framework directory
git clone https://github.com/pasquini-dario/LLMmap.git

# Install specific dependencies required by the algorithm (e.g., PyTorch)
cd LLMmap
pip install -r requirements.txt
cd ..

(Note: If you are only planning to write your own custom detection methods, you do not need to clone LLMmap.)

(Note: The default LLMmap model supports a limited set of target models. See The Included LLMmap Adapter for the full list.)

3. Run Fingerprint Verification

Run the CLI, pointing it to the adapter:

python cli.py \
    --api-base-url "https://api.openai.com/v1" \
    --api-key "sk-xxx" \
    --model-name "openai/gpt-5.4" \
    --method "adapters/llmmap.py"

Highlights

  • Flexible Method Integration — Our framework can integrate any detection method; you just need to create an Adapter to plug it in. An adapter simply exposes a create_method() factory function to be dynamically loaded.
  • Unified API layer — powered by litellm, supporting 100+ LLM providers (OpenAI, Anthropic, Google, Azure, local endpoints, etc.) through a single interface.
  • Async & cached — The framework handles all API concurrency (via semaphores) and per-sample SQLite response caching. It significantly improves speed and reduces API costs, even when multiple methods request the same prompts.
  • Pluggable Fusion Strategies — When multiple adapters are loaded, the framework fuses their probability distributions into a single authoritative result, making it easy to combine complementary detection algorithms.
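The concurrency and caching behaviour described in the list above can be illustrated with a minimal sketch. It does not use the framework's own classes: `fake_api_call`, `batch_query`, the in-memory `dict` cache, and the concurrency limit of 2 are all hypothetical stand-ins (the framework itself is described as using litellm and a SQLite cache).

```python
import asyncio

# Hypothetical stand-in for a real provider call; records each prompt it receives.
calls = []

async def fake_api_call(prompt: str) -> str:
    calls.append(prompt)
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"

async def batch_query(prompts, cache, max_concurrency=2):
    """Answer every prompt, calling the API once per distinct uncached prompt."""
    semaphore = asyncio.Semaphore(max_concurrency)
    # Deduplicate (order-preserving) and skip anything already cached.
    pending = [p for p in dict.fromkeys(prompts) if p not in cache]

    async def worker(prompt):
        async with semaphore:  # at most max_concurrency calls in flight
            cache[prompt] = await fake_api_call(prompt)

    await asyncio.gather(*(worker(p) for p in pending))
    return [cache[p] for p in prompts]

if __name__ == "__main__":
    cache = {}
    # Two methods asking the same question cost only one API call.
    prompts = ["Who are you?", "What is 2+2?", "Who are you?"]
    answers = asyncio.run(batch_query(prompts, cache))
    print(len(answers), "answers from", len(calls), "API calls")
```

Deduplicating before dispatch, rather than checking the cache inside each worker, avoids the race where two concurrent tasks both miss the cache for the same prompt.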

Architecture

cli.py
  │
  ├─ load_adapter("adapters/llmmap.py")
  │     └─ dynamically imports external LLMmap code, returns BaseMethod
  │
  └─ FingerprintOrchestrator
        ├─ Phase 1: get_probes()       ← collect queries from each adapter
        ├─ Phase 2: batch_query()      ← send all probes via UnifiedLLMClient
        ├─ Phase 3: evaluate()         ← each adapter scores the API responses
        └─ Phase 4: fuse()             ← combine probabilities via weighted average
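As a rough illustration of Phase 4, the sketch below fuses per-method probability distributions with a weighted average. It is not the framework's `WeightedAverageFusion` implementation: the function name `fuse_weighted_average`, the equal default weights, and the choice to treat a model missing from one method as probability 0 are assumptions made for this example.

```python
def fuse_weighted_average(distributions, weights=None):
    """Fuse {method: {model: prob}} into a single normalized {model: prob}.

    Hypothetical helper: weights default to 1.0 per method, and a model
    that a method does not mention contributes probability 0 from it.
    """
    if not distributions:
        raise ValueError("at least one distribution is required")
    if weights is None:
        weights = {name: 1.0 for name in distributions}
    total_weight = sum(weights[name] for name in distributions)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")

    fused = {}
    for name, dist in distributions.items():
        share = weights[name] / total_weight
        for model, prob in dist.items():
            fused[model] = fused.get(model, 0.0) + share * prob

    # Renormalize in case the inputs did not sum exactly to 1.
    norm = sum(fused.values())
    if norm <= 0:
        raise ValueError("fused distribution has no probability mass")
    return {model: prob / norm for model, prob in fused.items()}

if __name__ == "__main__":
    result = fuse_weighted_average({
        "method_1": {"model-a": 0.8, "model-b": 0.2},
        "method_2": {"model-a": 0.4, "model-c": 0.6},
    })
    print(max(result, key=result.get), result)
```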

Directory Structure

LLM-API-Verifier/
  cli.py                                # CLI entry point
  adapters/
    llmmap.py                           # LLMmap adapter (reference implementation)
  llm_api_verifier/
    client/
      llm_client.py                     # Async litellm wrapper + concurrency
      cache.py                          # SQLite response cache (aiosqlite)
      models.py                         # ProbeConfig and APIResponse dataclasses
    core/
      orchestrator.py                   # FingerprintOrchestrator — 4-phase pipeline
      fusion.py                         # WeightedAverageFusion
    methods/
      base.py                           # BaseMethod abstract class
    utils/
      loader.py                         # Dynamic adapter loading
      math.py                           # softmax, normalize, distances_to_probabilities

CLI Reference

Required Parameters

| Parameter | Description |
| --- | --- |
| --model-name / -m | Target model identifier in litellm format (e.g. openai/gpt-5.4, anthropic/claude-4-6-sonnet) |
| --method | Path to an adapter.py (repeatable for multiple methods) |

Optional Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| --api-key / -k | $OPENAI_API_KEY | API key for the provider |
| --api-base-url / -b | $OPENAI_BASE_URL | Custom API base URL |
| --method-arg | (none) | key=value pairs passed to adapters (repeatable) |
| --cache-path | ./cache/fingerprint_cache.db | SQLite response cache path |
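How the CLI turns repeated --method-arg values into keyword arguments is not documented here, so the following is only one plausible approach. The helper names `parse_method_args` and `_coerce` are hypothetical, and the type coercion (try int, then float, otherwise keep the string) is an assumption rather than the framework's confirmed behaviour.

```python
def _coerce(value: str):
    """Best-effort conversion of a CLI string to int or float (assumed rule)."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value

def parse_method_args(pairs):
    """Convert ["key=value", ...] into a kwargs dict for create_method()."""
    kwargs = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        kwargs[key] = _coerce(value)
    return kwargs

if __name__ == "__main__":
    print(parse_method_args(["device=cuda", "max_tokens=8192"]))
```

An adapter receiving these values would still be wise to validate them itself, since a different coercion rule (or none at all) may be in use.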

Examples

# Pass method-specific arguments (e.g. telling the LLMmap adapter to use CUDA)
python cli.py \
    --api-base-url "..." --api-key "..." --model-name "..." \
    --method "adapters/llmmap.py" \
    --method-arg device=cuda \
    --method-arg max_tokens=8192

# Run multiple methods concurrently
python cli.py \
    --api-base-url "..." --api-key "..." --model-name "..." \
    --method "adapters/llmmap.py" \
    --method "adapters/some_other_method.py"

Programmatic Usage

You can also use this framework directly in your own Python code. The CLI is just a thin wrapper around the core API.

import asyncio
from llm_api_verifier import (
    UnifiedLLMClient,
    FingerprintOrchestrator,
    WeightedAverageFusion,
    load_adapter
)

async def verify_api():
    # 1. Initialize the client targeting the API under test
    client = UnifiedLLMClient(
        model="openai/gpt-5.4",
        api_key="sk-...",
        base_url="https://api.openai.com/v1",
        cache_path="./cache/my_cache.db"
    )

    # 2. Setup orchestrator and fusion strategy
    orchestrator = FingerprintOrchestrator(
        client=client,
        fusion=WeightedAverageFusion()
    )

    # 3. Load and register adapters
    llmmap_adapter = load_adapter(
        "adapters/llmmap.py", 
        device="cuda", 
        max_tokens=4096
    )
    orchestrator.register_method(llmmap_adapter)

    try:
        # 4. Run the pipeline
        result = await orchestrator.run()

        # 5. Inspect results
        print(f"Predicted Model: {result.predicted_model()}")
        print(f"Full Distribution: {result.final}")
    finally:
        # Always release the client, even if the pipeline raises
        await client.close()

if __name__ == "__main__":
    asyncio.run(verify_api())

Writing an Adapter

If you have developed your own model identification algorithm, you only need to write a simple adapter script to plug into this framework's asynchronous and caching ecosystem.

from llm_api_verifier.methods.base import BaseMethod
from llm_api_verifier.client.models import APIResponse, ProbeConfig

class MyMethod(BaseMethod):
    def __init__(self, **kwargs):
        super().__init__(name="my_method")
        # Store config. DO NOT load heavy PyTorch models here!

    async def get_probes(self) -> list[ProbeConfig]:
        # Define what queries to send to the target API
        return [
            ProbeConfig(prompt="Who are you?", max_tokens=512),
            ProbeConfig(prompt="What is 2+2?", temperature=0.0),
        ]

    async def evaluate(self, responses: list[APIResponse]) -> dict[str, float]:
        # Perform your algorithm inference here (lazy-load weights if needed)
        # Process responses → return {model_name: probability}
        return {"model-a": 0.85, "model-b": 0.10, "model-c": 0.05}

def create_method(**kwargs) -> BaseMethod:
    """Factory function — dynamically loaded by the CLI."""
    return MyMethod(**kwargs)

Guidelines

  1. Lazy-load heavy resources. If you need to import torch or run model.load_state_dict(), do it inside a guard pattern within evaluate() so memory is only consumed when the method actually runs.
  2. Accept **kwargs. The CLI injects --method-arg values into create_method as kwargs. Extract what you need (e.g. thresholds, paths) and safely ignore the rest using **kwargs.
  3. Return a full probability distribution. Keys are model names from your algorithm's own internal logic, and values should sum to ~1.0.
  4. Manage your own dependencies. This framework only requires lightweight core packages (see requirements.txt). Real algorithm dependencies like torch or sentence-transformers should be managed by your external project.
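Guidelines 1 and 3 can be sketched with two small helpers. Both are hypothetical and independent of the framework: `LazyResource` shows one way to write the load-once guard, with a placeholder loader standing in for an expensive step such as importing torch and loading weights, and `normalize` turns raw non-negative scores into a distribution that sums to 1.

```python
class LazyResource:
    """Defer an expensive load until first use, then reuse the result."""

    def __init__(self, loader):
        self._loader = loader
        self._value = None
        self._loaded = False

    def get(self):
        if not self._loaded:  # guard: the loader runs only on the first call
            self._value = self._loader()
            self._loaded = True
        return self._value

def normalize(scores):
    """Scale non-negative scores so that the values sum to 1."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {name: value / total for name, value in scores.items()}

if __name__ == "__main__":
    # Placeholder loader; a real adapter might import torch and load weights here.
    weights = LazyResource(lambda: {"bias": 1.0})
    weights.get()  # triggers the load
    weights.get()  # reuses the cached value
    print(normalize({"model-a": 17.0, "model-b": 2.0, "model-c": 1.0}))
```

In an adapter, `get()` would be called from inside evaluate(), so constructing the method stays cheap.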

The Included LLMmap Adapter

This repository includes adapters/llmmap.py as a reference implementation. It wraps the academic project LLMmap as a framework adapter.

Note on Supported Models: The original LLMmap paper evaluated the method on 120 target LLMs. Out of the box, the default pre-trained LLMmap model supports 52 models, including:

  • OpenAI: gpt-3.5-turbo, gpt-4-turbo-2024-04-09, gpt-4o-2024-05-13
  • Anthropic: claude-3-haiku, claude-3-5-sonnet, claude-3-opus
  • Meta: Llama-2, Llama-3/3.1/3.2 (8B, 70B, etc.)
  • Qwen: Qwen2/2.5 (various sizes)
  • Google: gemma-1.1/2
  • Microsoft: Phi-3/3.5
  • Others: Mistral/Mixtral, Cohere Aya, internlm2_5, etc.

(You can add support for new models using LLMmap's dataset generation scripts).

The adapter converts LLMmap's 8 behavioral probes into ProbeConfig objects and hands them to the framework for concurrent execution. Once the responses are gathered, it calls LLMmap's native embedding and Euclidean distance pipeline, then maps those distances to a smooth probability distribution using a temperature-scaled softmax.

Specific --method-arg parameters for LLMmap:

| Key | Default | Description |
| --- | --- | --- |
| device | cpu | PyTorch device (cpu or cuda) |
| max_tokens | 4096 | Max tokens per probe (increase for reasoning models) |
| softmax_temperature | 0.5 | Temperature for distance-to-probability conversion |
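To make the role of softmax_temperature concrete, here is a minimal sketch of a temperature-scaled softmax over negated distances, so that smaller distances receive higher probability. The function name `distance_softmax` is hypothetical, and the adapter's actual conversion may differ in detail (for example in how distances are scaled before the softmax).

```python
import math

def distance_softmax(distances, temperature=0.5):
    """Map {model: distance} to {model: probability}; closer means more likely."""
    if not distances:
        raise ValueError("at least one distance is required")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Negate so that a small distance becomes a large logit.
    logits = {model: -d / temperature for model, d in distances.items()}
    # Subtract the max logit for numerical stability; the result is unchanged.
    max_logit = max(logits.values())
    exps = {model: math.exp(l - max_logit) for model, l in logits.items()}
    total = sum(exps.values())
    return {model: e / total for model, e in exps.items()}

if __name__ == "__main__":
    distances = {"model-a": 1.0, "model-b": 2.0, "model-c": 3.5}
    print(distance_softmax(distances, temperature=0.5))
    print(distance_softmax(distances, temperature=0.1))  # sharper distribution
```

Lower temperatures concentrate probability on the nearest model, while higher temperatures flatten the distribution toward uniform.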

Status & Contributing

🚧 Status: Under Active Development 🚧

Contributions from the community are welcome. In particular, we are looking for:

  • New Detection Methods: If you have a novel algorithm or methodology for identifying LLM models, please consider adding it.
  • New Adapters: Implementations of existing model verification papers or projects as pluggable adapters.
  • Bug fixes, documentation improvements, and core feature enhancements.

Please feel free to open issues or submit pull requests.

License

MIT
