Hyperliquid Copy Trader - Wallet Profiling System

A Python package for tracking Hyperliquid trading activity, recording market data, and applying machine learning to real-time data streams to detect and classify profitable traders.

Overview

This system consists of three main stages:

  1. Data Collection (1_record_data.py): Subscribes to blockchain events and Hyperliquid public API streams to record trading data
  2. Model Training & Classification (2_run_model.py): Uses PyTorch multi-task learning to analyze wallets and classify trader behavior
  3. Visualization (3_frontend.py): Generates an interactive HTML dashboard to visualize wallet profiling results

The system monitors trading activity on Hyperliquid (a decentralized exchange on its own L1 blockchain) by subscribing to trading events and recording them in a parseable, human-readable format. The recorded data is designed for machine learning tasks related to profitable trader detection and wallet classification.

Why WebSocket Streams vs Blockchain Events?

WebSocket streams are recommended for the following reasons:

  1. Lower Latency: Real-time data delivery without waiting for block confirmations
  2. Efficiency: No need to poll blockchain nodes or parse transaction logs
  3. Rich Data: Direct access to trade data, user fills, and position information
  4. Reliability: Sidesteps the rate limits that come with polling free public RPC endpoints
  5. Completeness: Captures all trading activity including order book interactions

Features

  • Real-time Trade Monitoring: Captures all trades across all active markets on Hyperliquid
  • Comprehensive Data Collection: Records blocks, transactions, trades, BBO, L2Book, and candles
  • ML-Ready Data Format: JSON Lines format (one JSON object per line) for easy parsing
  • Human-Readable Output: Timestamped, formatted log entries for manual inspection
  • Wallet Profiling: Multi-task learning model for wallet classification
  • Market Context: Candle data provides market conditions during trading activity

Installation

From PyPI (when published)

pip install hypertrack

From source

  1. Clone the repository:
git clone https://github.com/yourusername/hypertrack.git
cd hypertrack
  2. Install the package:
pip install -e .

Or install dependencies only:

pip install -r requirements.txt

Environment Variables

The system requires RPC endpoints for blockchain data access. These are configured via a .env file to keep sensitive API keys out of the codebase.

Why .env is necessary:

  • Security: API keys should never be committed to version control
  • Flexibility: Different users can use different RPC providers or API keys
  • Best Practice: Standard approach for managing configuration and secrets

Setup:

  1. Copy the example file:

    cp .env.example .env
  2. Edit .env and add your Alchemy API key:

    MAINNET_RPC_HTTP=https://hyperliquid-mainnet.g.alchemy.com/v2/YOUR_API_KEY
    MAINNET_RPC_WS=wss://hyperliquid-mainnet.g.alchemy.com/v2/YOUR_API_KEY
  3. Get your API key:

    • Sign up at Alchemy
    • Create a new app for Hyperliquid Mainnet
    • Copy your API key and replace YOUR_API_KEY in .env

Note: The .env file is already in .gitignore and will not be committed to the repository. The .env.example file serves as a template for other users.
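
As a minimal sketch of how these variables might be consumed, assuming the python-dotenv package (the repository may load them differently):

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

MAINNET_RPC_HTTP = os.environ["MAINNET_RPC_HTTP"]
MAINNET_RPC_WS = os.environ["MAINNET_RPC_WS"]

if "YOUR_API_KEY" in MAINNET_RPC_HTTP:
    raise RuntimeError("Replace YOUR_API_KEY in .env with your Alchemy API key")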

Usage

Stage 1: Data Collection

Record blockchain and market data:

python -m hypertrack.1_record_data

This will:

  • Subscribe to blockchain events (blocks, transactions)
  • Subscribe to Hyperliquid API (trades, BBO, L2Book, candles)
  • Save all data to recorded_data/ directory
  • Generate a data collection report

Configuration: Edit flags in 1_record_data.py to enable/disable specific data types:

  • RECORD_BLOCKS = True
  • RECORD_TRANSACTIONS = True
  • RECORD_TRADES = True
  • RECORD_BBO = True
  • RECORD_L2BOOK = True
  • RECORD_CANDLES = True

Stage 2: Model Training & Classification

Train the model and generate wallet classifications:

python -m hypertrack.2_run_model

This will:

  • Load recorded data from recorded_data/
  • Extract features from wallets
  • Train a multi-task learning model
  • Generate predictions and classifications
  • Create final_report.txt with results

Stage 3: Visualization

Generate an interactive dashboard to visualize wallet profiling results:

python -m hypertrack.3_frontend

This will:

  • Parse final_report.txt
  • Generate an interactive HTML dashboard (wallet_dashboard.html)
  • Open the dashboard automatically in your browser

Features:

  • Dark theme with modern UI
  • Summary statistics (total wallets, wallets with categories)
  • Grid layout showing all wallets as cards
  • Interactive gauge charts for each wallet (Risk, Profitability, Bot Probability, Sophistication)
  • Copy-to-clipboard functionality for wallet addresses
  • Documentation modal (click "Docs" button to view README.md)

Machine Learning Model

Model Architecture: Multi-Task Learning (MTL)

The system uses a PyTorch Multi-Task Learning (MTL) neural network to simultaneously predict multiple wallet characteristics.

Why Multi-Task Learning?

Multi-task learning was chosen for several reasons:

  1. Shared Representations: The model learns a shared encoder that captures common patterns across all tasks, improving generalization
  2. Data Efficiency: By learning multiple related tasks together, the model can leverage shared information, requiring less data per task
  3. Regularization: Learning multiple tasks acts as a form of regularization, preventing overfitting to any single task
  4. Realistic Use Case: In practice, we want to know multiple things about a wallet simultaneously (risk, profitability, bot probability, etc.), making MTL a natural fit
  5. Transfer Learning: Knowledge learned for one task (e.g., bot detection) can help with related tasks (e.g., sophistication scoring)

Model Structure

Input Features (36 features)
    ↓
Shared Encoder (3-layer MLP with BatchNorm & Dropout)
    ├─→ Trading Style Head (6D vector)
    ├─→ Risk Score Head (0-1)
    ├─→ Profitability Score Head (0-1)
    ├─→ Bot Probability Head (0-1)
    ├─→ Influence Score Head (≥0)
    └─→ Sophistication Score Head (0-1)

Architecture Details:

  • Input Dimension: 36 features (see Feature Engineering section)
  • Hidden Dimensions: 256 → 256 → 128 (shared encoder)
  • Activation: ReLU with BatchNorm and 30% dropout for regularization
  • Output Layers: Task-specific heads with appropriate activations (Sigmoid for probabilities, Tanh for bounded vectors, ReLU for non-negative scores)
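
The structure above can be sketched in PyTorch as follows. This is an illustrative reconstruction from the description, not the repository's code; the BatchNorm/Dropout placement and the single-layer heads are assumptions:

import torch
import torch.nn as nn

class WalletMTL(nn.Module):
    """Sketch: shared 3-layer MLP encoder (36 -> 256 -> 256 -> 128) + 6 task heads."""

    def __init__(self, in_dim=36):
        super().__init__()

        def block(d_in, d_out):
            # Linear -> BatchNorm -> ReLU -> 30% Dropout, per the details above
            return nn.Sequential(nn.Linear(d_in, d_out), nn.BatchNorm1d(d_out),
                                 nn.ReLU(), nn.Dropout(0.3))

        self.encoder = nn.Sequential(block(in_dim, 256), block(256, 256), block(256, 128))
        self.heads = nn.ModuleDict({
            "style": nn.Sequential(nn.Linear(128, 6), nn.Tanh()),             # 6D, -1..1
            "risk": nn.Sequential(nn.Linear(128, 1), nn.Sigmoid()),           # 0..1
            "profitability": nn.Sequential(nn.Linear(128, 1), nn.Sigmoid()),  # 0..1
            "bot": nn.Sequential(nn.Linear(128, 1), nn.Sigmoid()),            # 0..1
            "influence": nn.Sequential(nn.Linear(128, 1), nn.ReLU()),         # >= 0, unbounded
            "sophistication": nn.Sequential(nn.Linear(128, 1), nn.Sigmoid()), # 0..1
        })

    def forward(self, x):
        z = self.encoder(x)  # shared representation, shape (batch, 128)
        return {name: head(z) for name, head in self.heads.items()}

model = WalletMTL()
scores = model(torch.randn(32, 36))  # e.g. scores["bot"] has shape (32, 1)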

Training Process

  • Optimizer: Adam with learning rate 0.001, weight decay 1e-5
  • Loss Function: Weighted MSE (Mean Squared Error) with task-specific weights:
    • Bot detection: 25% weight (high importance)
    • Profitability: 25% weight (high importance)
    • Risk: 15% weight
    • Sophistication: 15% weight
    • Trading Style: 10% weight
    • Influence: 10% weight
  • Epochs: Up to 20 epochs with early stopping (patience=5)
  • Batch Size: 32
  • Normalization: Features are normalized using z-score normalization (mean=0, std=1)
  • Learning Rate Scheduling: ReduceLROnPlateau (reduces LR when loss plateaus)
  • Regularization:
    • Output diversity regularization (prevents trivial solutions)
    • Gradient clipping (max_norm=1.0) for training stability
    • L2 weight decay
  • Early Stopping: Stops training if no improvement for 5 epochs, loads best model
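
A sketch of the weighted loss and one training step, reusing the WalletMTL sketch above; how the self-supervised targets are built is left open, and anything beyond the listed hyperparameters is an assumption:

import torch
import torch.nn.functional as F

# Task weights from the list above
TASK_WEIGHTS = {"bot": 0.25, "profitability": 0.25, "risk": 0.15,
                "sophistication": 0.15, "style": 0.10, "influence": 0.10}

def weighted_mse(outputs, targets):
    # Weighted MSE summed across the six task heads
    return sum(w * F.mse_loss(outputs[t], targets[t]) for t, w in TASK_WEIGHTS.items())

model = WalletMTL()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer)  # call step(epoch_loss) each epoch

def train_step(x, targets):
    optimizer.zero_grad()
    loss = weighted_mse(model(x), targets)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # training stability
    optimizer.step()
    return loss.item()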

Score Calculation

The model outputs 6 different scores for each wallet, each calculated by a dedicated neural network head:

  1. Trading Style Score (6D vector)

    • Range: -1.0 to 1.0 (Tanh activation)
    • Calculation: 6-dimensional vector representing trading style preferences
    • Dimensions: May represent different trading strategies (momentum, mean reversion, volatility trading, market making, trend following, contrarian)
    • Usage: Used to identify strategy preferences in categorization (e.g., momentum followers, contrarian traders)
  2. Risk Score

    • Range: 0.0 to 1.0 (Sigmoid activation)
    • Calculation: Neural network output based on 36 input features, trained to identify risk-taking behavior patterns
    • Interpretation: Higher values indicate higher risk-taking behavior
    • Usage: Used in "High Risk Trader" and "Moderate Risk Trader" categories
  3. Profitability Score

    • Range: 0.0 to 1.0 (Sigmoid activation)
    • Calculation: Neural network output based on 36 input features, trained to identify profitable trading patterns
    • Interpretation: Higher values suggest the wallet exhibits patterns associated with profitable trading
    • Usage: Used in "Profitable Trader" category and to boost confidence in other categories (Active Trader, Arbitrageur, etc.)
  4. Bot Probability

    • Range: 0.0 to 1.0 (Sigmoid activation)
    • Calculation: Neural network output based on 36 input features, trained to identify bot-like behavior patterns
    • Interpretation: Higher values indicate higher likelihood of automated trading (bot)
    • Usage: Directly used for "Bot" and "Possible Bot" classification
  5. Influence Score

    • Range: ≥ 0.0 (ReLU activation, unbounded)
    • Calculation: Neural network output based on 36 input features, trained to identify market influence patterns
    • Interpretation: Higher values suggest the wallet has significant market impact (e.g., market makers, large traders)
    • Usage: Used in "Influential Trader" category
    • Note: Unlike other scores, this is unbounded (can exceed 1.0)
  6. Sophistication Score

    • Range: 0.0 to 1.0 (Sigmoid activation)
    • Calculation: Neural network output based on 36 input features, trained to identify sophisticated trading patterns
    • Interpretation: Higher values indicate more sophisticated trading strategies and execution
    • Usage: Used in "Sophisticated Trader" category and to boost confidence in other categories (Bot, Scalper, Arbitrageur, etc.)

How Scores Are Generated:

  1. Feature Extraction: 36 features are extracted from wallet transaction history and market context
  2. Feature Normalization: Features are normalized using z-score normalization (mean=0, std=1)
  3. Shared Encoder: All features pass through a shared 3-layer MLP encoder (256 → 256 → 128 dimensions)
  4. Task-Specific Heads: The encoded representation is passed to 6 separate neural network heads
  5. Activation Functions: Each head applies its specific activation function (Sigmoid, Tanh, or ReLU) to produce the final score
  6. Self-Supervised Learning: The model is trained using self-supervised learning (learning patterns in the feature space without explicit labels)

Important Notes:

  • Scores are relative and based on patterns learned from the data
  • The model learns to distinguish between different wallet behaviors through the shared encoder
  • Higher scores don't necessarily mean "better" - they indicate the strength of a particular characteristic
  • Scores are used both directly (for ML-driven categories) and as confidence boosters (for rule-based categories)

Feature Engineering

The model uses 36 features extracted from wallet transaction history and market context:

Basic Transaction Features (4)

  1. tx_count: Total number of transactions performed by the wallet

    • Purpose: Measures overall trading activity level
    • Usage: Used to filter low-activity wallets and identify active traders
  2. erc20_count: Number of ERC-20 token transfers

    • Purpose: Distinguishes between regular transactions and token transfers
    • Usage: Helps identify token collectors and DeFi participants
  3. unique_tokens: Number of unique tokens the wallet has interacted with

    • Purpose: Measures token diversity and trading breadth
    • Usage: Identifies arbitrageurs (high diversity) and specialized traders (low diversity)
  4. unique_addresses: Number of unique addresses the wallet has interacted with

    • Purpose: Measures network connectivity and interaction patterns
    • Usage: Helps identify market makers and active network participants

Temporal Features (4)

  1. age_days: Wallet age in days (time from first seen to last seen)

    • Purpose: Measures wallet maturity and trading history length
    • Usage: Distinguishes new wallets from established traders; used in HODLer classification
  2. tx_per_day: Average number of transactions per day

    • Purpose: Measures trading frequency and activity rate
    • Usage: Key metric for Scalper (high) and HODLer (low) classification
  3. burstiness: Coefficient of variation of inter-transaction times

    • Purpose: Measures irregularity of trading patterns
    • Calculation: std(inter_tx_times) / mean(inter_tx_times); a sketch follows this list
    • Usage: High burstiness = human-like irregular trading; low burstiness = bot-like regular trading
  4. hour_entropy: Entropy of transaction hour distribution

    • Purpose: Measures time diversity of trading activity
    • Calculation: Shannon entropy of hour-of-day distribution
    • Usage: Low entropy = trades at specific times (possibly bot); high entropy = trades throughout day (human)
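
A minimal sketch of both temporal calculations, assuming transaction timestamps in UNIX seconds (the units in the recorded data are an assumption):

import numpy as np

def burstiness(tx_times):
    # Coefficient of variation of inter-transaction times
    gaps = np.diff(np.sort(np.asarray(tx_times, dtype=float)))
    if len(gaps) == 0 or gaps.mean() == 0:
        return 0.0
    return float(gaps.std() / gaps.mean())

def hour_entropy(tx_times):
    # Shannon entropy of the hour-of-day distribution
    hours = (np.asarray(tx_times, dtype=np.int64) // 3600) % 24
    counts = np.bincount(hours, minlength=24)
    if counts.sum() == 0:
        return 0.0
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())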

Value Features (4)

  1. total_value: Sum of all transaction values

    • Purpose: Measures total trading volume
    • Usage: Identifies high-volume traders and whales
  2. avg_value: Average transaction value

    • Purpose: Measures typical transaction size
    • Usage: Distinguishes between small and large traders
  3. max_value: Maximum single transaction value

    • Purpose: Identifies the largest transaction
    • Usage: Key metric for Whale classification (very large single transactions)
  4. value_std: Standard deviation of transaction values

    • Purpose: Measures consistency of transaction sizes
    • Usage: Low std = consistent trader; high std = varied trading patterns

ERC-20 Features (1)

  1. erc20_volume: Total volume of ERC-20 token transfers
    • Purpose: Measures token transfer activity separate from regular transactions
    • Usage: Identifies token collectors and DeFi participants

Directional Features (2)

  1. long_ratio: Ratio of long positions to total positions

    • Purpose: Measures directional bias (bullish vs bearish)
    • Usage: Identifies directional traders vs market-neutral strategies
  2. flip_rate: Rate of direction changes (long→short or short→long)

    • Purpose: Measures how often the wallet changes position direction
    • Calculation: Number of direction changes / total position changes
    • Usage: High flip rate suggests arbitrage or mean reversion strategies

Size Features (1)

  1. size_entropy: Entropy of trade size distribution
    • Purpose: Measures consistency vs variety in trade sizes
    • Calculation: Shannon entropy of trade size distribution
    • Usage: Low entropy = consistent sizes (bot-like); high entropy = varied sizes (human-like)

Network Features (1)

  1. contract_interaction_count: Number of unique smart contracts interacted with
    • Purpose: Measures DeFi protocol engagement and sophistication
    • Usage: Higher counts suggest sophisticated DeFi users

Market Impact Features (4)

  1. avg_market_impact: Average market impact per trade

    • Purpose: Measures how much each trade moves the market price
    • Usage: High impact = large trader or illiquid markets; low impact = small trader or liquid markets
  2. max_market_impact: Maximum market impact observed

    • Purpose: Identifies the largest single market impact event
    • Usage: Helps identify whales and high-impact traders
  3. avg_slippage: Average slippage experienced per trade

    • Purpose: Measures execution quality and market liquidity
    • Usage: High slippage = large orders or illiquid markets; low slippage = efficient execution
  4. slippage_std: Standard deviation of slippage

    • Purpose: Measures consistency of execution quality
    • Usage: Low std = consistent execution; high std = variable market conditions

Trading Strategy Features (5)

  1. momentum_score: Tendency to follow momentum (buy when price rising, sell when falling)

    • Purpose: Identifies momentum-following strategies
    • Usage: Used in Momentum Follower classification
  2. mean_reversion_score: Tendency to trade against trends (buy when price falling, sell when rising)

    • Purpose: Identifies contrarian and mean reversion strategies
    • Usage: Used in Contrarian Trader classification
  3. volatility_trading_score: Activity during volatile market periods

    • Purpose: Measures preference for trading during high volatility
    • Usage: Used in Volatility Trader classification
  4. market_maker_score: Market making activity level

    • Purpose: Measures provision of liquidity (placing orders on both sides)
    • Usage: Identifies market makers and liquidity providers
  5. order_book_participation: Level of order book interaction

    • Purpose: Measures engagement with limit orders vs market orders
    • Usage: Higher values suggest sophisticated order placement strategies

Candle-Based Features (10)

These features provide market context during active trading periods by analyzing OHLCV candle data matched to transaction timestamps (a computation sketch follows the list):

  1. avg_candle_volatility: Average volatility during active trading periods

    • Calculation: mean((high - low) / open) for candles matched to transactions
    • Purpose: Measures typical market volatility when the wallet trades
    • Usage: Used in Volatility Trader classification
  2. max_candle_volatility: Maximum volatility encountered

    • Purpose: Identifies the most volatile period the wallet traded in
    • Usage: Helps identify volatility-seeking traders
  3. volatility_std: Standard deviation of volatility levels

    • Purpose: Measures consistency of volatility preferences
    • Usage: Low std = consistent volatility preference; high std = trades in various conditions
  4. avg_candle_momentum: Average price momentum during active periods

    • Calculation: mean((close - open) / open) for matched candles
    • Purpose: Measures typical price direction when the wallet trades
    • Usage: Used in Momentum Follower and Contrarian Trader classification
  5. momentum_std: Standard deviation of momentum

    • Purpose: Measures consistency of momentum preferences
    • Usage: Low std = consistent momentum preference; high std = trades in various conditions
  6. positive_momentum_ratio: Ratio of periods with positive momentum (price rising)

    • Calculation: count(positive_momentum) / total_periods
    • Purpose: Measures tendency to trade during upswings vs downswings
    • Usage: High ratio = momentum follower; low ratio = contrarian trader
  7. avg_trend_strength: Average trend strength during active periods

    • Calculation: mean(|close - open| / (high - low)) for matched candles
    • Purpose: Measures how strong trends are when the wallet trades
    • Usage: Identifies trend-following vs range-trading strategies
  8. avg_candle_volume: Average trading volume during active periods

    • Purpose: Measures typical market liquidity when the wallet trades
    • Usage: Used in High Volume Trader classification
  9. max_candle_volume: Maximum volume encountered

    • Purpose: Identifies the highest volume period the wallet traded in
    • Usage: Helps identify liquidity-seeking traders
  10. volume_std: Standard deviation of volume

    • Purpose: Measures consistency of volume preferences
    • Usage: Low std = consistent volume preference; high std = trades in various liquidity conditions
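
The volatility, momentum, and trend-strength formulas above can be sketched as follows; the candle field names (open/high/low/close) are assumptions about the recorded schema:

import numpy as np

def candle_context_features(candles):
    # `candles` are the OHLCV records matched to a wallet's transactions
    o = np.array([c["open"] for c in candles], dtype=float)
    h = np.array([c["high"] for c in candles], dtype=float)
    l = np.array([c["low"] for c in candles], dtype=float)
    close = np.array([c["close"] for c in candles], dtype=float)

    volatility = (h - l) / o                         # (high - low) / open
    momentum = (close - o) / o                       # (close - open) / open
    candle_range = np.where(h > l, h - l, np.nan)    # guard against flat candles
    trend_strength = np.abs(close - o) / candle_range

    return {
        "avg_candle_volatility": float(volatility.mean()),
        "max_candle_volatility": float(volatility.max()),
        "avg_candle_momentum": float(momentum.mean()),
        "positive_momentum_ratio": float((momentum > 0).mean()),
        "avg_trend_strength": float(np.nanmean(trend_strength)),
    }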

Why These Features?

  • Comprehensive Coverage: Features span transaction patterns, temporal behavior, value metrics, market interactions, and market context
  • Bot Detection: Temporal features (burstiness, hour_entropy) and size features (size_entropy) help identify automated trading
  • Strategy Identification: Directional, strategy, and candle features help classify trading strategies
  • Market Context: Candle features provide crucial context about market conditions during trading activity
  • Whale Identification: Value features (max_value, total_value) help identify large traders
  • Condition Preferences: Volatility traders trade during high volatility, momentum followers trade during uptrends, contrarians trade during downtrends
  • Timing Analysis: Reveals whether traders prefer specific market conditions

Wallet Classification Categories

The system classifies wallets into multiple categories based on behavioral patterns. A wallet can belong to multiple categories with confidence scores.

Bot Detection Categories

  1. Bot (confidence: calculated)

    • Metrics: bot_probability > 0.699 AND tx_count >= 2
    • Why: ML model detects automated behavior patterns
    • Confidence: min(1.0, bot_probability * 0.8 + sophistication_score * 0.2)
  2. Possible Bot (confidence: calculated)

    • Metrics: bot_probability > 0.499 AND tx_count >= 2
    • Why: Moderate likelihood of automation
    • Confidence: min(0.9, bot_probability * 0.7 + sophistication_score * 0.1)

Trading Frequency Categories

  1. Scalper (confidence: calculated)

    • Metrics: tx_per_day > 49.7 AND size_entropy < 0.401 AND tx_count >= 5
    • Why: High frequency + consistent sizes = scalping behavior
    • Confidence: min(1.0, (tx_per_day / 100.0) * (1.0 - size_entropy) * 0.7 + ML_boost)
  2. Possible Scalper (confidence: calculated)

    • Metrics: tx_per_day > 29.7 AND size_entropy < 0.446 AND tx_count >= 5
    • Confidence: min(0.85, 0.6 + ML_boost)
  3. Active Trader (confidence: calculated)

    • Metrics: 0.995 <= tx_per_day <= 50 AND burstiness > 0.299 AND tx_count >= 3
    • Why: Moderate-high frequency with irregular timing (human-like)
    • Confidence: min(1.0, tx_per_day / 50.0 * 0.6 + ML_boost)
  4. Moderate Trader (confidence: calculated)

    • Metrics: 0.497 <= tx_per_day <= 30 AND tx_count >= 3
    • Confidence: min(0.8, 0.6 + ML_boost)
  5. HODLer (confidence: calculated)

    • Metrics: tx_per_day < 0.101 AND age_days > 29
    • Why: Very low frequency + old wallet = holding strategy
    • Confidence: min(1.0, (28.0 / age_days) * (0.11 / tx_per_day))
  6. Possible HODLer (confidence: 0.6)

    • Metrics: tx_per_day < 0.401 AND age_days > 59

Value-Based Categories

  1. Whale (confidence: calculated)

    • Metrics: tx_count >= 5 AND ((max_value > 49700.0 AND total_value > 99500.0) OR total_value > 497000.0)
    • Why: Very large transaction values indicate significant capital (whales are rare)
    • Confidence: min(1.0, (max_value / 500000.0) * 0.5 + (total_value / 5000000.0) * 0.5)
    • Note: Thresholds were tightened to reduce over-classification while still identifying true whales
  2. Possible Whale (confidence: calculated)

    • Metrics: tx_count >= 5 AND ((max_value > 9950.0 AND total_value > 49700.0) OR total_value > 199000.0)
    • Confidence: min(0.85, (max_value / 100000.0) * 0.4 + (total_value / 1000000.0) * 0.4)

Token Interaction Categories

  1. Token Collector (confidence: calculated)

    • Metrics: unique_tokens > 9.9 AND erc20_count > tx_count * 1.995 AND tx_count >= 5
    • Why: High ERC-20 activity relative to regular transactions
    • Confidence: min(1.0, (unique_tokens / 50.0) * (erc20_count / (tx_count * 3)))
  2. Possible Token Collector (confidence: 0.6)

    • Metrics: unique_tokens > 4.9 AND erc20_count > tx_count * 0.998 AND tx_count >= 5
  3. Arbitrageur (confidence: calculated)

    • Metrics: unique_tokens > 4.9 AND flip_rate > 0.497 AND tx_count >= 5
    • Why: Multiple tokens + high direction changes = arbitrage
    • Confidence: min(1.0, (unique_tokens / 20.0) * flip_rate * 0.5 + ML_boost)
  4. Possible Arbitrageur (confidence: calculated)

    • Metrics: unique_tokens > 2.9 AND flip_rate > 0.297 AND tx_count >= 5
    • Confidence: min(0.85, 0.6 + ML_boost)

Market Condition Categories (Candle-Based) - NEW

  1. Volatility Trader (confidence: calculated)

    • Metrics: avg_candle_volatility > 0.0199 (1.99%+) AND tx_count >= 5
    • Why: Trades during high volatility periods (volatility trading strategy)
    • Confidence: min(1.0, (avg_volatility / 0.05) * (tx_count / 20.0) * 0.7 + style_boost)
  2. Possible Volatility Trader (confidence: calculated)

    • Metrics: avg_candle_volatility > 0.00995 (0.995%+) AND tx_count >= 5
    • Confidence: min(0.85, 0.6 + style_boost)
  3. Momentum Follower (confidence: calculated)

    • Metrics: positive_momentum_ratio > 0.699 AND avg_momentum > 0.000995 AND tx_count >= 5
    • Why: Trades primarily when price is rising (momentum strategy)
    • Confidence: min(1.0, positive_momentum_ratio * (avg_momentum / 0.01) * 0.5 + style_boost + ML_boost)
  4. Possible Momentum Follower (confidence: calculated)

    • Metrics: positive_momentum_ratio > 0.599 AND tx_count >= 5
    • Confidence: min(0.85, 0.6 + style_boost)
  5. Contrarian Trader (confidence: calculated)

    • Metrics: positive_momentum_ratio < 0.301 AND avg_momentum < -0.000995 AND tx_count >= 5
    • Why: Trades when price is falling (mean reversion/contrarian strategy)
    • Confidence: min(1.0, (1.0 - positive_momentum_ratio) * (abs(avg_momentum) / 0.01) * 0.5 + style_boost + ML_boost)
  6. Possible Contrarian Trader (confidence: calculated)

    • Metrics: positive_momentum_ratio < 0.401 AND tx_count >= 5
    • Confidence: min(0.85, 0.6 + style_boost)
  7. High Volume Trader (confidence: calculated)

    • Metrics: avg_candle_volume > 99.5 AND tx_count >= 5
    • Why: Prefers trading during high volume periods (liquidity seeking)
    • Confidence: min(1.0, (avg_volume / 500.0) * (tx_count / 20.0) * 0.7 + ML_boost)

ML-Driven Categories (NEW)

These categories are based primarily on ML model predictions, enhanced with rule-based validation:

  1. Profitable Trader (confidence: calculated)

    • Metrics: profitability_score > 0.599 AND tx_count >= 4
    • Why: ML model identifies wallets with high profitability patterns
    • Confidence: min(1.0, profitability_score * 0.7 + sophistication_score * 0.3)
  2. Possibly Profitable Trader (confidence: calculated)

    • Metrics: profitability_score > 0.399 AND tx_count >= 4
    • Confidence: min(0.85, profitability_score * 0.6 + sophistication_score * 0.2)
  3. High Risk Trader (confidence: calculated)

    • Metrics: risk_score > 0.599 AND tx_count >= 4
    • Why: ML model identifies wallets with high risk-taking behavior
    • Confidence: min(1.0, risk_score * 0.7 + sophistication_score * 0.2)
  4. Moderate Risk Trader (confidence: calculated)

    • Metrics: risk_score > 0.399 AND tx_count >= 4
    • Confidence: min(0.85, risk_score * 0.6)
  5. Sophisticated Trader (confidence: calculated)

    • Metrics: sophistication_score > 0.599 AND tx_count >= 5
    • Why: ML model identifies wallets with sophisticated trading patterns
    • Confidence: min(1.0, sophistication_score * 0.6 + profitability_score * 0.3 + risk_score * 0.1)
  6. Possibly Sophisticated Trader (confidence: calculated)

    • Metrics: sophistication_score > 0.399 AND tx_count >= 4
    • Confidence: min(0.85, sophistication_score * 0.7)
  7. Influential Trader (confidence: calculated)

    • Metrics: influence_score > 2.985 AND tx_count >= 5
    • Why: ML model identifies wallets with high market influence (market makers, large traders)
    • Confidence: min(1.0, min(1.0, influence_score / 50.0) * 0.7 + sophistication_score * 0.3)
  8. Possibly Influential Trader (confidence: calculated)

    • Metrics: influence_score > 0.995 AND tx_count >= 4
    • Confidence: min(0.85, min(1.0, influence_score / 20.0) * 0.6)

Fallback Categories

  1. Occasional Trader (confidence: 0.5)
    • Metrics: tx_count >= 5 AND tx_per_day > 0.5 AND no other categories matched
    • Why: Fallback category for occasional traders with sufficient activity
    • Note: "Active Wallet" category has been removed as it was not informative

Classification Logic

Categories are determined using a hybrid approach combining rule-based classification with ML-enhanced confidence:

  1. Rule-Based Foundation: Each category has specific thresholds for relevant metrics (transaction frequency, values, patterns)
  2. ML Enhancement: ML model scores (risk, profitability, sophistication, bot probability, trading style vector) are used to:
    • Boost confidence when ML confirms the category
    • Add new ML-driven categories (Profitable Trader, Sophisticated Trader, etc.)
    • Refine category confidence based on learned patterns
  3. Multi-Category Assignment: A wallet can belong to multiple categories (e.g., "Bot" + "Scalper" + "Profitable Trader")
  4. Ranking: Categories are sorted by confidence score (highest first)
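
As an illustration of this hybrid logic, here is a sketch covering two of the categories above; classify_wallet() in 2_run_model.py is the actual implementation, and the ML-boost form shown here is an assumption:

def classify_wallet_sketch(feats, scores):
    categories = []

    # ML-driven: Bot (bot_probability > 0.699 AND tx_count >= 2)
    if scores["bot_probability"] > 0.699 and feats["tx_count"] >= 2:
        conf = min(1.0, scores["bot_probability"] * 0.8 + scores["sophistication"] * 0.2)
        categories.append(("Bot", conf))

    # Rule-based with ML boost: Scalper (high frequency + consistent sizes)
    ml_boost = 0.1 * scores["sophistication"]  # assumed boost form
    if (feats["tx_per_day"] > 49.7 and feats["size_entropy"] < 0.401
            and feats["tx_count"] >= 5):
        conf = min(1.0, (feats["tx_per_day"] / 100.0)
                   * (1.0 - feats["size_entropy"]) * 0.7 + ml_boost)
        categories.append(("Scalper", conf))

    # Multi-category assignment, ranked by confidence (highest first)
    return sorted(categories, key=lambda c: c[1], reverse=True)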

Hybrid Approach Benefits:

  • Interpretability: Rule-based thresholds are transparent and explainable
  • Domain Knowledge: Incorporates trading domain expertise through rules
  • ML Intelligence: ML model learns patterns that rules might miss
  • Robustness: Rule-based categories work even when ML scores are low (ML is optional boost)
  • Flexibility: Easy to adjust thresholds based on observed data

How ML Enhances Categories:

  • Bot Classification: Uses bot_probability directly, boosted by sophistication_score
  • Scalper: Rule-based frequency/size + ML sophistication/risk boost
  • Active Trader: Rule-based frequency + ML profitability/sophistication boost
  • Arbitrageur: Rule-based tokens/flip_rate + ML profitability/sophistication boost
  • Strategy Categories: Rule-based market conditions + Trading Style vector + ML profitability
  • New ML Categories: Purely ML-driven (Profitable, Risk, Sophisticated, Influential)

Why Some Wallets Have No Categories

A wallet may show "Categories: None" in the report, but this is now rare due to loosened thresholds and fallback categories. This does not mean the model is inconclusive or that the wallet wasn't analyzed.

Reasons for No Categories:

  1. Very Low Activity: Wallets with tx_count < 2 are excluded from categorization

    • Most categories require at least 3-4 transactions to establish patterns
    • This ensures categories are based on meaningful activity
  2. Missing Market Context: If candle data is unavailable or doesn't match transaction timestamps:

    • Candle-based categories (Volatility Trader, Momentum Follower, etc.) won't match
    • This is normal if the wallet traded when no candle data was recorded
    • However, other categories (Bot, Active Trader, etc.) may still apply

Fallback Categories:

To ensure very active wallets get categorized, the system includes fallback categories for wallets with tx_count >= 5 that don't match specific patterns:

  • Occasional Trader: tx_per_day > 0.5

What This Means:

  • The ML model still works: All wallets receive ML scores (bot probability, risk, profitability, etc.)
  • Categories are descriptive labels: They provide behavioral context for analysis
  • Selective categorization: With the tightened value thresholds (e.g., Whale), only wallets with clear behavioral patterns receive those categories, ensuring quality over quantity
  • You can adjust thresholds: Modify classify_wallet() in 2_run_model.py to change category thresholds

Configuration:

  • Set SHOW_NO_CATEGORIES = False in 2_run_model.py to exclude wallets with no categories from the report
  • Set TX_COUNT_THRESHOLD to filter out wallets with too few transactions (default: 1)

Output Format

Data Collection Output

Data is saved in JSON Lines format (JSONL) in the recorded_data/ directory; a parsing sketch follows the list:

  • blocks.jsonl: Blockchain blocks
  • transactions.jsonl: All transactions
  • trades.jsonl: Trade data from Hyperliquid API
  • bbo.jsonl: Best Bid Offer updates
  • l2book.jsonl: Level 2 order book updates
  • candles.jsonl: Completed OHLCV candles (1-minute intervals)
  • data_collection_report.txt: Summary of collected data
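
Because each file holds one JSON object per line, records can be streamed without loading whole files into memory; a minimal reader (the record schemas are simply whatever the recorder wrote):

import json
from pathlib import Path

def read_jsonl(path):
    # Yield one parsed record per non-empty line
    with Path(path).open() as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

for trade in read_jsonl("recorded_data/trades.jsonl"):
    ...  # each `trade` is a dict parsed from one line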

Model Output

  • wallet_model.pt: Trained PyTorch model checkpoint
  • final_report.txt: Human-readable report with wallet classifications

Example Report

================================================================================
WALLET PROFILING REPORT
================================================================================
Generated: 2026-01-22T12:00:00Z
Total wallets analyzed: 150

================================================================================

Wallet #1: 0x1234...
--------------------------------------------------------------------------------
Categories: Bot (0.85), Scalper (0.72), Momentum Follower (0.68)

Trading Style Score (6D vector): ['0.2341', '-0.1234', '0.5678', '0.1234', '-0.2341', '0.3456']
Risk Score: 0.7234
Profitability Score: 0.8123
Bot Probability: 0.8500
Influence Score: 5.2341
Sophistication Score: 0.6789

Transaction Count: 1250
Age (days): 45.23

Challenges and Solutions in Wallet Categorization

Initial Challenges

During development, we encountered several significant hurdles in creating an effective wallet categorization system:

1. Everyone Classified as Whales

Problem: Initially, almost all wallets were being classified as "Whale" or "Possible Whale", making the categorization useless.

Root Cause: The original thresholds were too low:

  • Whale: max_value > 100.0 OR total_value > 1000.0
  • Possible Whale: max_value > 10.0 OR total_value > 100.0

In crypto markets, especially on Hyperliquid, these values are quite common, so most participants met the criteria.

Solution: Increased thresholds by 100-500x:

  • Whale: (max_value > 50000.0 AND total_value > 100000.0) OR total_value > 500000.0
  • Possible Whale: (max_value > 10000.0 AND total_value > 50000.0) OR total_value > 200000.0
  • Added requirement: tx_count >= 5 to avoid one-off large transactions

This made whale classification much more exclusive and meaningful.

2. ML Model Not Contributing to Categorization

Problem: The ML model was trained and provided scores, but categories were purely rule-based. The ML scores weren't being used effectively, making the ML component feel disconnected.

Root Cause:

  • Categories were determined by simple threshold rules on raw features
  • Only bot_probability was used (for Bot classification)
  • Other ML scores (risk, profitability, sophistication, trading style) were reported but not used for categorization

Solution: Implemented hybrid ML-enhanced categorization:

  • ML scores integrated: All ML scores now influence category confidence
  • ML-driven categories added: New categories based purely on ML predictions (Profitable Trader, Sophisticated Trader, etc.)
  • Trading Style vector used: 6D style vector helps identify strategy preferences (momentum, mean reversion, volatility)
  • ML as boost, not requirement: Rule-based categories still work even with low ML confidence; ML provides enhancement when available

3. No Categories Assigned (0 Wallets with Categories)

Problem: After implementing ML enhancements, sometimes no wallets were getting categories assigned.

Root Cause:

  • ML confidence filter was too strict: removed categories when ml_confidence < 0.2
  • ML boost thresholds were too high: required ml_confidence > 0.3 to apply boosts
  • ML-driven category thresholds were too strict (0.7/0.5 for scores, high tx_count requirements)
  • Rule-based thresholds were also too strict (high tx_count requirements, high value thresholds)

Solution:

  • Removed strict ML filter: Rule-based categories now work regardless of ML confidence
  • Lowered ML boost threshold: Changed from ml_confidence > 0.3 to ml_confidence > 0.1 (now ml_confidence > 0.2 in some places)
  • Lowered ML category thresholds: Reduced from 0.7/0.5 to 0.5/0.3, tx_count from 3/5 to 2
  • Lowered rule-based thresholds: Reduced transaction count requirements from 5 to 2, lowered value/frequency thresholds across categories
  • Added fallback categories: Wallets with 5+ transactions that don't match specific patterns get basic categories (Occasional Trader). "Active Wallet" category was removed as it was not informative.
  • ML is optional: Categories work based on rules; ML provides boost when available

4. Self-Supervised Learning Limitations

Problem: The ML model uses self-supervised learning (training against zeros), which may not learn meaningful patterns.

Solution: Enhanced training process:

  • Weighted loss: Higher weight for important tasks (bot, profitability)
  • Output diversity regularization: Prevents model from collapsing to trivial solutions (one plausible form is sketched below)
  • Better training: Increased epochs (10→20), learning rate scheduling, early stopping
  • Gradient clipping: Improves training stability

Note: For production use, consider supervised learning with labeled data if available.
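
The exact diversity term is not specified, so the following is an assumed form: a hinge penalty on low within-batch standard deviation of each head's outputs:

import torch

def diversity_penalty(outputs, margin=0.1):
    # Assumed form: penalize any head whose batch-wise std falls below `margin`,
    # discouraging collapse to the same score for every wallet
    penalty = 0.0
    for out in outputs.values():  # dict of head tensors, shape (batch, dim)
        penalty = penalty + torch.relu(margin - out.std(dim=0)).mean()
    return penalty

# total_loss = weighted_mse(outputs, targets) + lambda_div * diversity_penalty(outputs)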

Current Approach

The system now uses a hybrid rule-based + ML-enhanced approach:

  1. Rule-based foundation: Categories are primarily determined by interpretable rules on features
  2. ML enhancement: ML scores boost confidence and add new categories
  3. Robustness: Works even when ML scores are low (ML is enhancement, not requirement)
  4. Flexibility: Easy to adjust thresholds and add new categories

This approach balances interpretability (rules) with intelligence (ML), ensuring the system works reliably while learning from data.

Three-Stage Architecture

Stage 1: Data Collection (1_record_data.py)

Purpose: Collect and store raw data for later analysis

Features:

  • Runs continuously until stopped (Ctrl+C)
  • Clears recorded_data/ folder on startup
  • Optional recording flags for each data type
  • Real-time status updates
  • Automatic reconnection on connection loss
  • Filters intra-candle updates (only completed candles)

Why Separate Stage?

  • Allows data collection to run independently
  • Enables multiple model runs on same dataset
  • Separates data collection from analysis concerns

Stage 2: Model Training & Classification (2_run_model.py)

Purpose: Analyze collected data and generate wallet classifications

Features:

  • Loads all recorded data
  • Extracts features from wallets
  • Trains multi-task learning model
  • Generates predictions and classifications
  • Creates comprehensive report (final_report.txt)

Why Separate Stage?

  • Can run offline on collected data
  • Allows experimentation with different models
  • Enables batch processing of historical data

Stage 3: Visualization (3_frontend.py)

Purpose: Generate interactive dashboard to visualize wallet profiling results

Features:

  • Parses final_report.txt to extract wallet data
  • Creates interactive HTML dashboard with dark theme
  • Displays wallet cards in responsive grid layout
  • Shows summary statistics (total wallets, wallets with categories)
  • Interactive gauge charts for each wallet (Risk, Profitability, Bot Probability, Sophistication)
  • Copy-to-clipboard functionality for wallet addresses
  • Documentation modal (displays README.md content)
  • Opens automatically in browser

Why Separate Stage?

  • Can be run independently after model training
  • Generates static HTML file (works offline)
  • Provides visual interface for exploring results
  • Separates visualization from data collection and analysis

API Endpoints

The package uses Hyperliquid's official public API endpoints:

  • Mainnet: https://api.hyperliquid.xyz (default)
  • WebSocket: wss://api.hyperliquid.xyz/ws (for real-time streams)
  • Blockchain RPC: Alchemy RPC endpoint (configured via .env, see Environment Variables)

The Hyperliquid endpoints are free, public, and reliable; the blockchain RPC endpoint requires your own provider API key.
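
A minimal sketch of subscribing to the public trades stream with websocket-client; the subscription message follows Hyperliquid's public WebSocket API, but the coin naming and payload shape should be verified against the official docs:

import json

import websocket  # pip install websocket-client

def on_open(ws):
    ws.send(json.dumps({
        "method": "subscribe",
        "subscription": {"type": "trades", "coin": "ETH"},
    }))

def on_message(ws, message):
    print(message)  # one JSON payload per update; a recorder would append to trades.jsonl

ws = websocket.WebSocketApp("wss://api.hyperliquid.xyz/ws",
                            on_open=on_open, on_message=on_message)
ws.run_forever()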

Requirements

  • Python 3.8+
  • torch (PyTorch for ML model)
  • numpy (numerical operations)
  • requests (for HTTP REST API calls)
  • websocket-client (for WebSocket real-time streams)

File Structure

hypertrack/
├── __init__.py              # Package initialization
├── 1_record_data.py         # Stage 1: Data collection
├── 2_run_model.py          # Stage 2: Model training & classification
├── 3_frontend.py            # Stage 3: Interactive dashboard visualization
├── recorder_trades.py       # Example: Trade recorder for testing Hyperliquid API
├── recorder_bbo.py          # Example: BBO recorder for testing Hyperliquid API
├── recorder_l2Book.py       # Example: L2Book recorder for testing Hyperliquid API
└── recorder_candle.py       # Example: Candle recorder for testing Hyperliquid API

Note: The recorder_*.py files are standalone example recorders used to test the Hyperliquid API. They demonstrate how to subscribe to individual data streams (trades, BBO, L2Book, candles) and can be run independently for testing. The main data collection is handled by 1_record_data.py, which integrates all these streams.

recorded_data/               # Data collection output
├── blocks.jsonl
├── transactions.jsonl
├── trades.jsonl
├── bbo.jsonl
├── l2book.jsonl
├── candles.jsonl
└── data_collection_report.txt

sample_data/                 # Sample recorder outputs
├── trades_ETH.log
├── bbo_ETH.log
├── l2Book_ETH.log
└── candle_ETH.log

Important Notes

  1. No Private Data: Does NOT subscribe to userEvents or userTrades (those are private)
  2. Public Blockchain Only: Only uses publicly available blockchain data
  3. Candle Filtering: Only completed candles are recorded (intra-candle updates filtered)
  4. Data Clearing: recorded_data/ folder is cleared on each run of 1_record_data.py
  5. GPU Support: Model automatically uses GPU if available (CUDA)

Limitations

  • Rate Limits: The recorder includes delays to respect API rate limits
  • Historical Data: The recorder starts from the current time. For historical data, use Hyperliquid's historical API endpoints separately
  • Model Training: Requires sufficient data (50+ wallets recommended for meaningful results)
  • Candle Matching: Candles are matched to transactions by timestamp; if no candle exists for a transaction period, candle features default to 0
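
A sketch of the timestamp matching described in the last point, flooring each transaction's millisecond timestamp to its 1-minute candle; the "t" open-time field is an assumption about the recorded candle schema:

def build_candle_index(candles):
    # Index candles by their open timestamp in milliseconds
    return {c["t"]: c for c in candles}

def match_candle(tx_ts_ms, candle_index):
    # Floor to the containing minute; None means candle features default to 0
    minute_start = (tx_ts_ms // 60_000) * 60_000
    return candle_index.get(minute_start)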

Future Enhancements

  • Real-time streaming classification (classify wallets as data arrives)
  • Database storage option (SQLite, PostgreSQL)
  • Integration with live trading signals
  • Additional market data sources
  • Model versioning and A/B testing
  • Web dashboard for visualization

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

This project is provided as-is for educational and research purposes.

Disclaimer

This tool is for monitoring and analysis purposes only. Always do your own research and never invest more than you can afford to lose. Trading cryptocurrencies involves substantial risk.
