Add AI bot classification for event enrichment #57

Open
jaredmixpanel wants to merge 4 commits into master from feature/ai-bot-classification

@jaredmixpanel
Contributor

Summary

Adds AI bot classification with a BotClassifyingMessageBuilder decorator that automatically detects AI crawler requests and enriches tracked events with classification properties.

What it does

  • Classifies user-agent strings against a database of 12 known AI bots
  • Enriches events with $is_ai_bot, $ai_bot_name, $ai_bot_provider, and $ai_bot_category properties
  • Supports custom bot patterns that take priority over built-in patterns
  • Case-insensitive matching
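
The matching rules listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `BotPattern` and `SketchClassifier` are hypothetical names, the category string used in examples is a placeholder, and substring matching via `Matcher.find()` is an assumption about how patterns are applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a bot database entry; the PR's AiBotEntry may differ.
final class BotPattern {
    final Pattern pattern;
    final String name;
    final String provider;
    final String category;

    BotPattern(String regex, String name, String provider, String category) {
        // CASE_INSENSITIVE lets "gptbot" and "GPTBot" match the same entry.
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        this.name = name;
        this.provider = provider;
        this.category = category;
    }
}

// Hypothetical classifier: checks custom patterns before built-in ones.
final class SketchClassifier {
    private final List<BotPattern> ordered = new ArrayList<>();

    SketchClassifier(List<BotPattern> custom, List<BotPattern> builtIn) {
        ordered.addAll(custom);   // custom entries come first, so they win on overlap
        ordered.addAll(builtIn);
    }

    /** Returns the first matching entry, or null when nothing matches. */
    BotPattern classify(String userAgent) {
        if (userAgent == null || userAgent.isEmpty()) {
            return null; // null and empty input are treated as "not an AI bot"
        }
        for (BotPattern entry : ordered) {
            if (entry.pattern.matcher(userAgent).find()) {
                return entry;
            }
        }
        return null;
    }
}
```

Putting the custom list ahead of the built-in list is one simple way to get the stated priority: the first match wins, so no explicit precedence field is needed.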

AI Bots Detected

GPTBot, ChatGPT-User, OAI-SearchBot, ClaudeBot, Claude-User, Google-Extended, PerplexityBot, Bytespider, CCBot, Applebot-Extended, Meta-ExternalAgent, cohere-ai

Files Added

  • src/main/java/com/mixpanel/mixpanelapi/AiBotClassification.java
  • src/main/java/com/mixpanel/mixpanelapi/AiBotClassifier.java
  • src/main/java/com/mixpanel/mixpanelapi/AiBotEntry.java
  • src/main/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilder.java
  • src/test/java/com/mixpanel/mixpanelapi/AiBotClassifierTest.java
  • src/test/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilderTest.java

Files Modified

  • None

Test Plan

  • All 12 AI bot user-agents correctly classified
  • Non-AI-bot user-agents return $is_ai_bot: false (Chrome, Googlebot, curl, etc.)
  • Empty string and null inputs handled gracefully
  • Case-insensitive matching works
  • Custom bot patterns checked before built-in
  • Event properties preserved through enrichment
  • No regressions in existing test suite

Part of AI bot classification feature for Java SDK.

Copilot AI left a comment


Pull request overview

Adds optional AI-bot user-agent classification and event enrichment to the Mixpanel Java SDK via a BotClassifyingMessageBuilder decorator, plus unit tests for classification and enrichment behavior.

Changes:

  • Introduces an AI bot “database” (AiBotEntry) and classifier (AiBotClassifier) that returns an immutable result (AiBotClassification).
  • Adds BotClassifyingMessageBuilder wrapper to enrich event and importEvent properties with $is_ai_bot and related $ai_bot_* fields based on $user_agent.
  • Adds focused JUnit tests for classifier behavior and message enrichment/passthrough behavior.
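
The enrichment step described above might look something like the sketch below. To stay self-contained it uses a plain `Map<String, Object>` in place of the SDK's JSON properties object, and `EnrichmentSketch`, `Result`, and `enrich` are hypothetical names. Passing properties through unchanged when `$user_agent` is absent is an assumption about the passthrough behavior, not something the PR text confirms.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the enrichment logic; not the PR's BotClassifyingMessageBuilder.
final class EnrichmentSketch {

    /** Minimal stand-in for a classification result (assumed shape). */
    static final class Result {
        final String name;
        final String provider;
        final String category;

        Result(String name, String provider, String category) {
            this.name = name;
            this.provider = provider;
            this.category = category;
        }
    }

    /**
     * Returns a copy of the properties with $is_ai_bot and, on a match,
     * the $ai_bot_* fields added. The classifier returns null for "no match".
     */
    static Map<String, Object> enrich(Map<String, Object> properties,
                                      Function<String, Result> classifier) {
        if (properties == null) {
            return null; // nothing to enrich
        }
        Object userAgent = properties.get("$user_agent");
        if (!(userAgent instanceof String)) {
            return properties; // assumed passthrough: no user agent, no classification
        }
        // Copy first so the caller's map is untouched and existing properties survive.
        Map<String, Object> enriched = new HashMap<>(properties);
        Result result = classifier.apply((String) userAgent);
        enriched.put("$is_ai_bot", result != null);
        if (result != null) {
            enriched.put("$ai_bot_name", result.name);
            enriched.put("$ai_bot_provider", result.provider);
            enriched.put("$ai_bot_category", result.category);
        }
        return enriched;
    }
}
```

Working on a copy keeps the wrapper free of side effects on the caller's data, which is the property the "preservation" tests appear to be checking.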

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/main/java/com/mixpanel/mixpanelapi/AiBotClassification.java | Immutable classification result model returned by the classifier. |
| src/main/java/com/mixpanel/mixpanelapi/AiBotClassifier.java | Default bot database + classification logic + builder for custom bot patterns. |
| src/main/java/com/mixpanel/mixpanelapi/AiBotEntry.java | Immutable DB entry mapping regex patterns to bot metadata. |
| src/main/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilder.java | Decorator around MessageBuilder that enriches event/import properties based on $user_agent. |
| src/test/java/com/mixpanel/mixpanelapi/AiBotClassifierTest.java | Tests for default classification, negative cases, case-insensitivity, and custom pattern priority. |
| src/test/java/com/mixpanel/mixpanelapi/BotClassifyingMessageBuilderTest.java | Tests for enrichment, passthrough behavior, preservation, and end-to-end delivery serialization. |

- Add null guard in AiBotEntry.matches()
- Validate null elements in Builder.addBots()
- Remove unused ArrayList import
- Fix getBotDatabase() Javadoc
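
The two null-handling fixes in this list could take roughly the following shape. This is only a sketch: `GuardedEntry` and `GuardedBuilder` are hypothetical stand-ins for `AiBotEntry` and the classifier's builder, and throwing `IllegalArgumentException` is an assumed choice of exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for AiBotEntry with a null-safe matches().
final class GuardedEntry {
    private final Pattern pattern;

    GuardedEntry(String regex) {
        this.pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    }

    boolean matches(String userAgent) {
        // Guard first: Pattern.matcher(null) would throw a NullPointerException.
        return userAgent != null && pattern.matcher(userAgent).find();
    }
}

// Hypothetical stand-in for the builder that accepts custom bot entries.
final class GuardedBuilder {
    private final List<GuardedEntry> customBots = new ArrayList<>();

    GuardedBuilder addBots(List<GuardedEntry> bots) {
        if (bots == null) {
            throw new IllegalArgumentException("bots must not be null");
        }
        // Validate every element before adding any, so a failure leaves the builder unchanged.
        for (GuardedEntry bot : bots) {
            if (bot == null) {
                throw new IllegalArgumentException("bots must not contain null elements");
            }
        }
        customBots.addAll(bots);
        return this;
    }

    int size() {
        return customBots.size();
    }
}
```

Rejecting null elements at build time surfaces the mistake where the list is assembled, rather than later as a `NullPointerException` in the middle of classifying a request.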