
crowd4u/COLLAPSE


Aggregating Labels from Humans and AIs with Asymmetric Performance

This repository provides supplementary material for the paper "Aggregating Labels from Humans and AIs with Asymmetric Performance," currently under peer review.

Appendix of the paper

Please see appendix.pdf.

EXP1: COMPREHENSIVE BENCHMARK

We provide a Docker container for easy reproduction.

Note: JupyterLab is available at http://localhost:8008/ while this container is running (check the access token with docker exec -it collapse jupyter server list).

Re-run this experiment with the same data as in the paper

$ docker compose up -d
$ docker exec -it collapse bash
$ cd exp1
$ rm -r results results_cbcc results_human results_human_cbcc
$ mkdir results results_cbcc results_human results_human_cbcc
$ python exp.py

Note: Completing all the experiments takes 2-3 weeks (over 30,000 runs are required).

CBCC can only be run on Windows, so please run exp_cbcc.py on a Windows PC. We used Python 3.11.3 with the libraries listed in requirements_python3_win.txt.

However, this method cannot generate the data containing only human worker results (num_ai=0). To produce those, run notebooks/human_only_results.ipynb (in the container) and notebooks/human_only_results_cbcc.ipynb (in the Windows venv).
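Conceptually, the human-only subset is just the responses recorded under a num_ai=0 setting. A minimal, purely illustrative sketch of that filter (the column names "worker", "task", "label", and "num_ai" are assumptions, not taken from this repository):

```python
# Illustrative only: filter a response table down to the human-only
# setting (num_ai=0), mimicking what the human-only notebooks produce.
# Column names are assumptions, not this repository's schema.

def human_only(rows):
    """Keep only rows recorded under a num_ai=0 (human-only) setting."""
    return [r for r in rows if r["num_ai"] == 0]

rows = [
    {"worker": "w1", "task": "t1", "label": 1, "num_ai": 0},
    {"worker": "a1", "task": "t1", "label": 0, "num_ai": 3},
    {"worker": "w2", "task": "t2", "label": 1, "num_ai": 0},
]
human_rows = human_only(rows)
```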

Data Preprocessing

For reproducibility, we provide the code used to process and generate human and AI responses in the preprocessing folder.

If you want to reproduce the experiment from the data generation step, you can regenerate the data with the following commands.

$ docker compose up -d
$ docker exec -it collapse bash
$ cd exp1/preprocessing
$ python generate_human_responses.py
$ python generate_ai_responses.py

The preprocessing/raw_datasets directory contains the raw datasets before redundancy adjustment. These data were copied from the following publicly available data sources (excluding Tiny).

Visualize Results

We obtained a total of 38,125 lines of experimental results and provide a visualization tool to analyze them.

$ docker compose up -d
$ docker exec -it collapse bash
$ cd exp1/streamlit
$ streamlit run app.py --server.port 9999

Please visit http://localhost:9999/ to use this app.

This app can regenerate some of the plots in Figures 5 and 8.

Case Studies & Analysis

We provide notebooks that allow you to re-run the case studies and analyses performed in our paper.

  • Confusion matrices (Figure 6): notebooks/cm_analysis.ipynb
  • Analysis of convergence (Section 4.4.4): notebooks/analysis_convergence.ipynb
  • Communities of CBCC (Figure 7): notebooks/CBCC_analysis.ipynb
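The confusion-matrix analysis compares aggregated labels against ground truth by counting (true, predicted) pairs. A minimal stdlib sketch of that idea (illustrative only, not the notebook's code; the example labels are made up):

```python
# Illustrative sketch of building a confusion matrix from ground-truth
# labels and aggregated (predicted) labels; not code from this repository.
from collections import Counter

def confusion_matrix(true_labels, pred_labels):
    """Count how often each (true, predicted) label pair occurs."""
    return Counter(zip(true_labels, pred_labels))

true_labels = ["cat", "cat", "dog", "dog"]
pred_labels = ["cat", "dog", "dog", "dog"]
cm = confusion_matrix(true_labels, pred_labels)
```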

EXP2: REAL-WORLD EVALUATION

Our final experimental results can be found in ds1task5_aggregation_results_all_processed.csv.

Methods

The truth_infer_methods folder contains implementations of various aggregation methods, copied from the following repositories with minimal modifications.
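All of these methods share the same input/output shape: worker responses go in, one inferred label per task comes out. A minimal sketch of the simplest such baseline, majority voting (illustrative only, not code from truth_infer_methods; the (task, worker, label) triple format is an assumption):

```python
# Minimal majority-vote aggregation sketch, to illustrate the
# input/output contract shared by the aggregation methods above.
# Not code from this repository; the triple format is an assumption.
from collections import Counter, defaultdict

def majority_vote(responses):
    """Aggregate (task, worker, label) triples into one label per task."""
    by_task = defaultdict(list)
    for task, _worker, label in responses:
        by_task[task].append(label)
    # Pick the most frequent label for each task.
    return {task: Counter(labels).most_common(1)[0][0]
            for task, labels in by_task.items()}

responses = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"),
]
aggregated = majority_vote(responses)
```

More sophisticated methods (e.g. LFC, ZenCrowd) replace the simple count with a model of each worker's reliability, but consume the same kind of response triples.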

Method Link
CATD, LFC, Minmax, PM-CRH, ZenCrowd https://github.com/zhydhkcws/crowd_truth_infer
LA https://github.com/yyang318/LA_onepass

Re-Run the Additional Experiment

General Methods

For all methods except CBCC and Minmax, you can rerun the experiment with the following commands:

$ docker compose up -d
$ docker exec -it collapse bash
$ cd exp2
$ rm ds1task5_aggregation_results_*.csv
$ python 1_run_exp2.py

CBCC

For CBCC, activate the appropriate virtual environment on a Windows PC and run:

PS> cd exp2
PS> .\1_run_exp2_cbcc.bat

(We used Python 3.11.3 with the libraries listed in requirements_python3_win.txt.)

Minmax

For Minmax, you have to use MATLAB (paid) or MATLAB Online (free).

In this document, we describe the procedure using MATLAB Online.

  1. Upload the truth_infer_methods/l_minimax-s folder to MATLAB Online.
  2. Run prepare.m.
  3. Download the result_*.csv files and save them in the minmax_results_raw folder.
  4. Run 2_transform_minmax_results.ipynb in the container.

Note that the __ds_*__.csv files in the truth_infer_methods/l_minimax-s folder are generated in main_scripts/ds1task5/workdir/datasets by 1_run_exp2.py while it runs, so you can reproduce those files yourself.

Obtain the summary (table data)

Run 3_summarize_results.ipynb in the container.
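The summary step boils the per-run results down to one figure of merit per method. A purely illustrative sketch of that kind of aggregation (the "method"/"accuracy" column names and values are assumptions, not the notebook's actual schema):

```python
# Illustrative sketch of summarizing per-run results into a mean score
# per method; column names and data are assumptions, not this repo's schema.
from collections import defaultdict

def summarize(rows):
    """Compute the mean accuracy per aggregation method."""
    by_method = defaultdict(list)
    for r in rows:
        by_method[r["method"]].append(r["accuracy"])
    return {m: sum(v) / len(v) for m, v in by_method.items()}

rows = [
    {"method": "LFC", "accuracy": 0.90},
    {"method": "LFC", "accuracy": 0.80},
    {"method": "ZenCrowd", "accuracy": 0.70},
]
summary = summarize(rows)
```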

Data Preprocessing

For reproducibility, we provide the code used to process human and AI responses in the preprocessing folder.

If you want to reproduce the experiment from the data preparation step, you can regenerate the data using the following steps.

  1. Run preprocessing/1_human_data_transform.ipynb in the container.
  2. Run preprocessing/2_AI_data_transform.ipynb in the container.
  3. Run preprocessing/3_preprocess_responses.ipynb in the container.

The preprocessing/original directory contains the raw datasets. These data were copied from the following publicly available data sources.

Reproducing Figure 9

Run 0_figure9.ipynb.

BDS / HS-DS implementations

The methods folder contains code for BDS, HS-DS, and CBCC in Crowd-Kit format.

Some of the code reuses Crowd-Kit code under its license, and we are grateful to the Crowd-Kit team. We also made minimal modifications to the original CBCC code by its authors and included it in this repository; we thank the authors of the CBCC code as well.
