AdFL: In-browser Federated Learning for Online Advertisement - Model and Data Samples

This repository contains the code and a data sample for the AdFL research paper, published at ICWSM 2026. All feature names and data values have been obfuscated to protect private information while preserving the scientific reproducibility of the results.

model.py - Main federated learning implementation
data_sample.csv - Obfuscated data sample (2,000 records, 10 users, 200 samples each)
datasheet_of_dataset.pdf - Datasheet for the dataset
README.md - This documentation file
requirements.txt - Python package dependencies
LICENSE - MIT License
CITATION.bib - BibTeX citation for academic use
.gitignore - Git ignore file for version control

Features

The model uses 27 obfuscated features plus a user identifier:

Binary Features (4)

bin_1, bin_2, bin_3, bin_4 - Binary indicators (0/1)

Numeric Features (14)

num_1 through num_14 - Continuous numeric features
All numeric values have been normalized to the [0, 1] range
Statistical properties are preserved while protecting original values

Categorical Features (9)

cat_1 through cat_9 - Categorical features
Values have been obfuscated by hashing (e.g., cat_1_a3f2b8c9)

Additional Fields

user_id - Obfuscated user identifier; required to split the data per user.
target - Binary prediction target (0/1)
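Because each column's type is encoded only in its name prefix, a loader can recover the three feature groups directly from the CSV header. The following is a minimal sketch under that assumption, not code taken from model.py; `group_columns` is a hypothetical helper.

```python
from typing import Dict, List

# Column-name prefixes used in data_sample.csv (see the Features section).
FEATURE_PREFIXES = {"binary": "bin_", "numeric": "num_", "categorical": "cat_"}
# Columns present in the file that are not model inputs.
NON_FEATURE_COLUMNS = {"user_id", "target"}


def group_columns(header: List[str]) -> Dict[str, List[str]]:
    """Group CSV column names into feature types by prefix.

    user_id and target are excluded: user_id only splits the data per
    client, and target is the label.
    """
    groups: Dict[str, List[str]] = {kind: [] for kind in FEATURE_PREFIXES}
    for name in header:
        if name in NON_FEATURE_COLUMNS:
            continue
        for kind, prefix in FEATURE_PREFIXES.items():
            if name.startswith(prefix):
                groups[kind].append(name)
                break
    return groups


if __name__ == "__main__":
    header = (
        [f"bin_{i}" for i in range(1, 5)]
        + [f"num_{i}" for i in range(1, 15)]
        + [f"cat_{i}" for i in range(1, 10)]
        + ["user_id", "target"]
    )
    groups = group_columns(header)
    print({kind: len(cols) for kind, cols in groups.items()})
    # prints {'binary': 4, 'numeric': 14, 'categorical': 9}
```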

Requirements

python >= 3.8
tensorflow >= 2.8
pandas >= 1.3
numpy >= 1.20
scikit-learn >= 1.0

Install dependencies:

pip install tensorflow pandas numpy scikit-learn

Usage

Quick Start

By default, the model runs with 5 users:

python model.py

Run the model with 2, 5, 10, 50, 100, or 500 users:

python model.py --num_users 2 --data_file data_sample.csv
python model.py --num_users 5 --data_file data_sample.csv
python model.py --num_users 10 --data_file data_sample.csv

Note that the AdFL paper reports results for 50, 100, and 500 users; because only a limited data sample is publicly available, this demonstration uses 2, 5, and 10 users.

User Configuration Options

The model file supports five user configurations:

Users	Command	Early stopping patience	Early stopping start	Min samples/user
5	`python model.py --num_users 5 --data_file data_sample.csv --rounds 1000 --experiments 10`	10	Round 20	100
10	`python model.py --num_users 10 --data_file data_sample.csv --rounds 1000 --experiments 10`	15	Round 30	100
50	`python model.py --num_users 50 --data_file data_sample.csv --rounds 1000 --experiments 10`	20	Round 50	150
100	`python model.py --num_users 100 --data_file data_sample.csv --rounds 1000 --experiments 10`	30	Round 100	110
500	`python model.py --num_users 500 --data_file data_sample.csv --rounds 1000 --experiments 10`	40	Round 200	55
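The table reads naturally as a lookup from --num_users to early-stopping settings. The sketch below encodes it that way; the stopping rule (no check before the start round, then stop once `patience` consecutive rounds pass without a new best validation loss) is an assumed interpretation of the columns, and `CONFIGS` and `should_stop` are hypothetical names. The 2-user setting accepted by --num_users is not in the table, so it is omitted.

```python
from typing import Dict, List, NamedTuple


class UserConfig(NamedTuple):
    patience: int     # rounds without improvement before stopping
    start_round: int  # first round at which early stopping is checked
    min_samples: int  # minimum samples per user


# Values copied from the configuration table.
CONFIGS: Dict[int, UserConfig] = {
    5: UserConfig(10, 20, 100),
    10: UserConfig(15, 30, 100),
    50: UserConfig(20, 50, 150),
    100: UserConfig(30, 100, 110),
    500: UserConfig(40, 200, 55),
}


def should_stop(val_losses: List[float], patience: int, start_round: int) -> bool:
    """Return True if training should stop after the latest round.

    val_losses[i] is the global validation loss after round i + 1.
    Assumed rule: never stop before start_round; afterwards, stop once
    `patience` rounds have passed since the best (lowest) loss.
    """
    current_round = len(val_losses)
    if current_round < start_round:
        return False
    best_round = val_losses.index(min(val_losses)) + 1  # ties: earliest round
    return current_round - best_round >= patience


if __name__ == "__main__":
    cfg = CONFIGS[10]
    history = [0.50, 0.45, 0.44] + [0.46] * 40  # made-up validation losses
    for r in range(1, len(history) + 1):
        if should_stop(history[:r], cfg.patience, cfg.start_round):
            print(f"stop after round {r}")  # prints: stop after round 30
            break
```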

Command Line Arguments

--num_users         Number of federated clients (2, 5, 10, 50, 100, or 500) [default: 5]
--data_file         Path to obfuscated CSV data file [default: data_sample.csv]
--rounds            Maximum training rounds [default: 1000]
--experiments       Number of independent experiments [default: 1]
--batch_size        Mini-batch size for training [default: 32]

# Differential Privacy (Optional)
--use_dp            Enable differential privacy protection
--l2_norm_clip      L2 norm clipping threshold [default: 1.0]
--noise_multiplier  Noise multiplier for DP [default: 0.1, higher=more privacy]
--num_microbatches  Number of microbatches for DP [default: 1]
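For reference, the documented flags map onto Python's standard argparse module roughly as follows. This is a reconstruction from the list above, not the parser in model.py; in particular, enforcing the allowed --num_users values with `choices` is an assumption.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the flags and defaults documented above."""
    p = argparse.ArgumentParser(description="AdFL federated training")
    p.add_argument("--num_users", type=int, default=5,
                   choices=[2, 5, 10, 50, 100, 500],
                   help="Number of federated clients")
    p.add_argument("--data_file", default="data_sample.csv",
                   help="Path to obfuscated CSV data file")
    p.add_argument("--rounds", type=int, default=1000,
                   help="Maximum training rounds")
    p.add_argument("--experiments", type=int, default=1,
                   help="Number of independent experiments")
    p.add_argument("--batch_size", type=int, default=32,
                   help="Mini-batch size for training")
    # Differential privacy (optional)
    p.add_argument("--use_dp", action="store_true",
                   help="Enable differential privacy protection")
    p.add_argument("--l2_norm_clip", type=float, default=1.0,
                   help="L2 norm clipping threshold")
    p.add_argument("--noise_multiplier", type=float, default=0.1,
                   help="Noise multiplier for DP (higher = more privacy)")
    p.add_argument("--num_microbatches", type=int, default=1,
                   help="Number of microbatches for DP")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    print(parse_args(["--num_users", "10", "--use_dp"]))
```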

Differential Privacy Support

The model supports standard DP-SGD (Differential Privacy Stochastic Gradient Descent) to provide formal (ε, δ)-differential privacy guarantees during federated training.

Implementation: Uses the tensorflow-privacy library, which implements DP-SGD (Abadi et al., 2016).

Installation

To use differential privacy, install tensorflow-privacy:

pip install tensorflow-privacy
# or
conda install -c conda-forge tensorflow-privacy

Usage

# Run with standard DP-SGD enabled
python model.py \
    --num_users 50 \
    --rounds 1000 \
    --use_dp \
    --noise_multiplier 0.5 \
    --l2_norm_clip 1.0

Important: This uses the standard DP-SGD algorithm, which provides formal privacy guarantees. When DP is enabled, the model automatically trains with model.fit() for compatibility with the DP optimizer.

DP Parameters

--use_dp: Enables differential privacy (requires tensorflow-privacy)
--l2_norm_clip: Maximum L2 norm of gradients (default: 1.0)
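- Clips gradients (per microbatch) to bound each contribution's sensitivity
- The added noise is scaled by this bound (std = noise_multiplier × l2_norm_clip), so smaller values do not by themselves strengthen privacy; they clip more aggressively and may slow convergence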
--noise_multiplier: Amount of noise added to gradients (default: 0.1)
- Higher values = stronger privacy guarantees
- Typical range: 0.1 to 2.0
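To make the roles of the two parameters concrete, here is a minimal, framework-free sketch of one DP-SGD gradient estimate as described by Abadi et al. (2016): clip each per-example gradient, sum, add Gaussian noise with standard deviation noise_multiplier × l2_norm_clip, and average. tensorflow-privacy clips per microbatch rather than per example, so this corresponds to setting --num_microbatches equal to the batch size; the helper names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def clip(v: Sequence[float], l2_norm_clip: float) -> Vector:
    """Scale v down so its L2 norm is at most l2_norm_clip."""
    norm = l2_norm(v)
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def dp_sgd_gradient(per_example_grads: Sequence[Sequence[float]],
                    l2_norm_clip: float,
                    noise_multiplier: float,
                    rng: Optional[random.Random] = None) -> Vector:
    """One noisy, clipped average gradient (DP-SGD)."""
    rng = rng or random.Random()
    # 1. Bound each example's influence.
    clipped = [clip(g, l2_norm_clip) for g in per_example_grads]
    # 2. Sum clipped gradients coordinate-wise.
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # 3. Add Gaussian noise calibrated to the clipping bound.
    sigma = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    # 4. Average over the batch.
    return [x / len(per_example_grads) for x in noisy]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.1, -0.2], [-1.0, 0.5]]
    print(dp_sgd_gradient(grads, l2_norm_clip=1.0, noise_multiplier=0.5,
                          rng=random.Random(0)))
```

Because the noise scale is tied to l2_norm_clip, the privacy guarantee is governed by noise_multiplier (together with the sampling rate and number of steps), while the clip value mainly affects how much gradient signal survives clipping.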

Privacy Budget

Differential privacy provides (ε, δ)-DP guarantees. The privacy budget depends on:

Noise multiplier
Number of training steps
Dataset size and batch size (together they set the sampling rate)
Target δ

Lower ε means stronger privacy (typical target: ε < 10)
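The dependence on the noise multiplier and the number of steps can be illustrated with a Rényi-DP bound for the Gaussian mechanism. This sketch is deliberately loose: it ignores privacy amplification by subsampling, which is how dataset size and batch size lower ε in practice, so it overstates ε and should not be used to report a budget (tensorflow-privacy provides accountants for that). The step count assumes one local epoch per round, as described under Federated Learning; the function names are hypothetical.

```python
import math
from typing import Iterable, Optional


def local_steps(rounds: int, local_samples: int, batch_size: int) -> int:
    """Optimizer steps one client takes, assuming one local epoch per round."""
    return rounds * math.ceil(local_samples / batch_size)


def gaussian_rdp_epsilon(noise_multiplier: float, steps: int, delta: float,
                         orders: Optional[Iterable[float]] = None) -> float:
    """Loose epsilon for `steps` Gaussian-mechanism updates.

    - One step with noise std = noise_multiplier * clip is
      (alpha, alpha / (2 * noise_multiplier**2))-RDP.
    - RDP composes additively over steps.
    - (alpha, r)-RDP implies (r + log(1/delta) / (alpha - 1), delta)-DP.
    No subsampling amplification is applied, so epsilon is overstated.
    """
    if orders is None:
        orders = [1.0 + k / 10.0 for k in range(1, 100)] + list(range(11, 257))
    best = math.inf
    for alpha in orders:
        rdp = steps * alpha / (2.0 * noise_multiplier ** 2)
        best = min(best, rdp + math.log(1.0 / delta) / (alpha - 1.0))
    return best


if __name__ == "__main__":
    steps = local_steps(rounds=100, local_samples=200, batch_size=32)
    for sigma in (0.5, 1.0, 2.0):
        eps = gaussian_rdp_epsilon(sigma, steps, delta=1e-5)
        print(f"noise_multiplier={sigma}: epsilon <= {eps:.1f} after {steps} steps")
```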

Output Files

The model file generates:

{num_users}users_experiment_{id}.txt - Final metrics for each experiment
training_YYYYMMDD_HHMMSS.log - Detailed training logs
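These names can be produced with the standard library as sketched below; the timestamp follows the YYYYMMDD_HHMMSS pattern above. Whether experiment IDs start at 0 or 1 is not stated, so the caller passes one in. Function names are hypothetical.

```python
from datetime import datetime


def experiment_result_path(num_users: int, experiment_id: int) -> str:
    """Per-experiment metrics file, e.g. '5users_experiment_1.txt'."""
    return f"{num_users}users_experiment_{experiment_id}.txt"


def training_log_path(now: datetime) -> str:
    """Log file named after the run's start time: training_YYYYMMDD_HHMMSS.log."""
    return now.strftime("training_%Y%m%d_%H%M%S.log")


if __name__ == "__main__":
    print(experiment_result_path(10, 1))
    print(training_log_path(datetime.now()))
```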

Model Architecture

Input Processing

Categorical features → Hash layer → Embedding layer (dim = min(50, bins/2))
Numeric features → Dense layer (64 units, ReLU)
Binary features → Dense layer (200 units, ReLU)
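The categorical path can be sketched in plain Python as hashing a value into a fixed number of bins and sizing its embedding with the min(50, bins/2) rule. This is an illustration only: the number of bins per feature is not given here, bins/2 is assumed to use integer division, and MD5 stands in for the hash function. Keras' `tf.keras.layers.Hashing` layer uses FarmHash64 by default, so if model.py uses that layer (an assumption), its bucket indices will differ from these.

```python
import hashlib
from typing import Dict


def hash_bucket(value: str, num_bins: int) -> int:
    """Deterministically map a categorical string to a bin in [0, num_bins)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_bins


def embedding_dim(num_bins: int) -> int:
    """Embedding width: min(50, bins / 2), using integer division (assumed)."""
    return min(50, num_bins // 2)


if __name__ == "__main__":
    num_bins = 40  # hypothetical bin count
    samples: Dict[str, str] = {"cat_1": "cat_1_a3f2b8c9", "cat_2": "cat_2_7e4d1f5a"}
    for feature, value in samples.items():
        print(feature, "dim:", embedding_dim(num_bins),
              "bucket:", hash_bucket(value, num_bins))
```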

Neural Network

Concatenated embeddings
Dense layers: 500 → 250 → 100 → 50 → 30 neurons (all ReLU)
Output: Single sigmoid neuron (binary classification)
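As a rough size check, the parameter count of the dense head follows directly from the listed widths. The input width depends on how many hash bins each categorical feature uses, which is not stated, so the example assumes (hypothetically) that the 64-unit numeric branch, the 200-unit binary branch, and nine 10-dimensional embeddings are all concatenated, giving 354 inputs.

```python
from typing import List

# Hidden widths from the README, followed by the single sigmoid output unit.
LAYER_WIDTHS = [500, 250, 100, 50, 30, 1]


def dense_param_count(input_dim: int, widths: List[int]) -> int:
    """Weights plus biases of a stack of fully connected layers."""
    total, fan_in = 0, input_dim
    for width in widths:
        total += (fan_in + 1) * width  # weight matrix plus one bias per unit
        fan_in = width
    return total


if __name__ == "__main__":
    concat_dim = 64 + 200 + 9 * 10  # hypothetical: 10-dim embeddings
    print(dense_param_count(concat_dim, LAYER_WIDTHS))  # prints 334461
```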

Federated Learning

Algorithm: Federated Averaging (FedAvg)
Client training: One epoch per round on local data
Aggregation: Simple weight averaging across all clients
Evaluation: Validation loss for early stopping, test metrics (Loss, Accuracy, AUC) for reporting
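The aggregation step amounts to an element-wise mean of the clients' weights. The sketch below represents each model as a list of flat per-layer weight lists and takes local training as a caller-supplied function (`local_train` is hypothetical). The original FedAvg formulation (McMahan et al.) weights each client by its sample count; with equally sized clients, as in the 200-samples-per-user data sample, the weighted and simple averages coincide.

```python
from typing import Callable, List, Sequence, TypeVar

Weights = List[List[float]]  # one flat list of parameters per layer
ClientData = TypeVar("ClientData")


def fedavg(client_weights: Sequence[Weights]) -> Weights:
    """Simple (unweighted) element-wise average of client models."""
    num_clients = len(client_weights)
    averaged: Weights = []
    for layer_idx in range(len(client_weights[0])):
        layer_copies = [w[layer_idx] for w in client_weights]
        averaged.append([sum(vals) / num_clients for vals in zip(*layer_copies)])
    return averaged


def run_round(global_weights: Weights,
              clients: Sequence[ClientData],
              local_train: Callable[[Weights, ClientData], Weights]) -> Weights:
    """One round: each client trains from the global model, then average."""
    updated = [local_train(global_weights, data) for data in clients]
    return fedavg(updated)


if __name__ == "__main__":
    # Toy "training": each client shifts every weight by its own offset.
    def toy_train(w: Weights, offset: float) -> Weights:
        return [[x + offset for x in layer] for layer in w]

    print(run_round([[0.0, 0.0], [1.0]], [1.0, 3.0], toy_train))
    # prints [[2.0, 2.0], [3.0]]
```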

Data Format

The data_sample.csv file contains 2,000 records with the following structure:

bin_1,bin_2,bin_3,bin_4,num_1,num_2,...,cat_1,cat_2,...,user_id,target
0,1,1,0,0.523,0.891,...,cat_1_a3f2b8c9,cat_2_7e4d1f5a,...,user_3a4e5f1e,0
1,1,0,1,0.234,0.456,...,cat_1_b8c2a9d3,cat_2_9f3e7b1c,...,user_b544f557,1
...
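A minimal loading sketch using only the standard library: types are inferred from the column prefixes, and rows are partitioned into one local dataset per user_id, which is what makes each user a separate federated client. How model.py chooses which users (and how many rows) to include for a given --num_users is not described here, so the sketch stops at the partition; the function names are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_rows(f: TextIO) -> List[dict]:
    """Read CSV rows, converting values by column prefix."""
    rows = []
    for raw in csv.DictReader(f):
        row = {}
        for name, value in raw.items():
            if name.startswith("bin_") or name == "target":
                row[name] = int(value)
            elif name.startswith("num_"):
                row[name] = float(value)
            else:  # cat_* hashes and user_id stay as strings
                row[name] = value
        rows.append(row)
    return rows


def split_by_user(rows: List[dict]) -> Dict[str, List[dict]]:
    """Partition rows into one local dataset per user_id."""
    clients: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        clients[row["user_id"]].append(row)
    return dict(clients)


if __name__ == "__main__":
    # Rows in the data_sample.csv layout, with most columns omitted.
    demo = io.StringIO(
        "bin_1,num_1,cat_1,user_id,target\n"
        "0,0.523,cat_1_a3f2b8c9,user_3a4e5f1e,0\n"
        "1,0.234,cat_1_b8c2a9d3,user_b544f557,1\n"
    )
    clients = split_by_user(load_rows(demo))
    print({user: len(rows) for user, rows in clients.items()})
```

With the real file, `open("data_sample.csv", newline="")` takes the place of the in-memory demo.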

Data Obfuscation

All data has been obfuscated for publication:

Feature Names: Replaced with generic names (bin_X, num_X, cat_X)
Numeric Values: Normalized to the [0, 1] range, with small Gaussian noise added
Categorical Values: Hash-based anonymization (MD5 truncated)
User IDs: Hash-based anonymization
Target Labels: Preserved as binary (0/1)

The obfuscation preserves:

Statistical distributions
Correlations between features
Relative relationships within features
Model training dynamics
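The two value transformations listed above can be sketched as follows. Only the ingredients come from this README (scaling to [0, 1], small Gaussian noise, truncated MD5 hashes); the 8-character truncation is inferred from example values such as cat_1_a3f2b8c9, and the noise level, the optional salt, and clipping back into [0, 1] are hypothetical choices, not the authors' actual pipeline.

```python
import hashlib
import random
from typing import List, Optional


def obfuscate_category(feature: str, value: str, salt: str = "") -> str:
    """Replace a raw category with '<feature>_<first 8 hex chars of MD5>'."""
    digest = hashlib.md5((salt + value).encode("utf-8")).hexdigest()
    return f"{feature}_{digest[:8]}"


def obfuscate_numeric(values: List[float], noise_std: float = 0.01,
                      rng: Optional[random.Random] = None) -> List[float]:
    """Min-max scale to [0, 1], add small Gaussian noise, clip back to [0, 1]."""
    rng = rng or random.Random()
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # constant column: avoid division by zero
    out = []
    for v in values:
        scaled = (v - lo) / span + rng.gauss(0.0, noise_std)
        out.append(min(1.0, max(0.0, scaled)))
    return out


if __name__ == "__main__":
    print(obfuscate_category("cat_1", "some raw value"))
    print(obfuscate_numeric([12.0, 40.0, 7.5, 19.0], rng=random.Random(1)))
```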

Example Results

After training, you will see output like:

Round 173, Global Model Metrics - Loss: 0.3227, Accuracy: 0.8848, AUC: 0.9301

Each experiment saves final metrics to a text file for analysis.
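To collect these metrics from a training log, a regular expression over the line format shown above is enough. The pattern assumes every metrics line matches that single example exactly; `parse_metrics` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

LOG_PATTERN = re.compile(
    r"Round (?P<round>\d+), Global Model Metrics - "
    r"Loss: (?P<loss>[\d.]+), Accuracy: (?P<accuracy>[\d.]+), AUC: (?P<auc>[\d.]+)"
)


def parse_metrics(line: str) -> Optional[Dict[str, float]]:
    """Extract round, loss, accuracy, and AUC from one log line, or None."""
    m = LOG_PATTERN.search(line)
    if m is None:
        return None
    return {
        "round": int(m.group("round")),
        "loss": float(m.group("loss")),
        "accuracy": float(m.group("accuracy")),
        "auc": float(m.group("auc")),
    }


if __name__ == "__main__":
    line = "Round 173, Global Model Metrics - Loss: 0.3227, Accuracy: 0.8848, AUC: 0.9301"
    print(parse_metrics(line))
```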

Data Sample Statistics

Total Records: 2,000
Unique Users: 10
Samples per User: 200 (balanced distribution)
Target Distribution:
- Class 0: 611 records (30.5%)
- Class 1: 1,389 records (69.5%)
Features: 27 (4 binary + 14 numeric + 9 categorical) + 1 user_id + 1 target
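The class balance can be recomputed from the target column with a few lines of standard-library code. Applied to the counts above (611 and 1,389 out of 2,000), the exact shares are 30.55% and 69.45%. `class_distribution` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def class_distribution(targets: Iterable[int]) -> Dict[int, float]:
    """Percentage of records per target class, sorted by class label."""
    counts = Counter(targets)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in sorted(counts.items())}


if __name__ == "__main__":
    targets = [0] * 611 + [1] * 1389  # counts from the statistics above
    for label, pct in class_distribution(targets).items():
        print(f"Class {label}: {pct:.2f}%")
```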

Note:

The user_id column is included for proper user-based data splitting but is not used as a model feature.
The provided dataset contains 2,000 records from 10 users. Configurations with more than 10 users (50, 100, 500) would require a larger dataset and are included in the code for completeness but cannot be tested with the provided sample data.

Privacy and Ethics

All data has been anonymized and obfuscated
No personally identifiable information (PII) is included
Feature names and values do not reveal private information
The obfuscation maintains scientific validity while protecting privacy
Differential privacy support available for additional privacy guarantees

Citation

If you use this code or data in your research, please cite:

@article{alemariAdFLInBrowserFederated2026,
	title = {{AdFL}: {In}-{Browser} {Federated} {Learning} for {Online} {Advertisement}},
	volume = {20},
	copyright = {Copyright (c) 2026 Association for the Advancement of Artificial Intelligence},
	issn = {2334-0770},
	shorttitle = {{AdFL}},
	url = {https://ojs.aaai.org/index.php/ICWSM/article/view/42624},
	doi = {10.1609/icwsm.v20i1.42624},
	language = {en},
	number = {1},
	urldate = {2026-06-01},
	journal = {Proceedings of the International AAAI Conference on Web and Social Media},
	author = {Alemari, Ahmad and Sen, Pritam and Borcea, Cristian},
	month = may,
	year = {2026},
	pages = {45--57},
}

See CITATION.bib for the complete citation.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Installation

For easy installation of all dependencies:

pip install -r requirements.txt

See requirements.txt for the complete list of dependencies.

Contact

For questions about the code or data, please open an issue on the GitHub repository.

Note: The provided data sample (2,000 records, 10 users) is for demonstration, validation, and reproducibility purposes. All data has been obfuscated to protect private information while maintaining the statistical properties necessary for scientific reproducibility.
