Take a look at the presentation!
This work is based on a novel approach to the anonymization of time series, with a special focus on pattern loss, presented in this paper.
foo@bar:~$ python3 -m pip install -r requirements.txt

- numpy==1.18.1
- pandas==1.0.3
- loguru==0.5.1
- saxpy==1.0.1.dev167 (https://github.com/seninp/saxpy)
- pathlib==1.0.1
- matplotlib==3.2.2
foo@bar:~$ python3 kp-anonymity.py algorithm k_value p_value paa_value dataset_path dataset_output_path

- kp-anonymity.py : main program
- algorithm : algorithm to run, either naive or kapra
- k_value : value of k for k-anonymity
- p_value : value of p for p-anonymity (pattern)
- paa_value : PAA (Piecewise Aggregate Approximation) value, used to reduce the dimensionality of patterns
- dataset_path : CSV input file
- dataset_output_path : CSV output file
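
To make the role of paa_value more concrete, here is a minimal sketch of Piecewise Aggregate Approximation in plain Python. It is not the code used by kp-anonymity.py (which depends on saxpy). The function name `paa`, the sample series, and the assumption that paa_value is the number of segments are illustrative only. Each segment of the series is replaced by its mean, so a series of length n shrinks to `segments` values.

```python
from statistics import mean


def paa(series, segments):
    """Reduce `series` to `segments` values by averaging consecutive chunks.

    Hypothetical helper for illustration; the boundaries use integer
    division, which is a simplification when len(series) is not a
    multiple of `segments`.
    """
    n = len(series)
    if not 1 <= segments <= n:
        raise ValueError("segments must be between 1 and len(series)")
    # Segment i covers indices [i*n//segments, (i+1)*n//segments),
    # which is never empty as long as segments <= n.
    return [
        mean(series[i * n // segments:(i + 1) * n // segments])
        for i in range(segments)
    ]


# Made-up series of 10 points reduced to 5 segment means
weekly_sales = [2, 4, 6, 8, 10, 12, 1, 3, 5, 7]
print(paa(weekly_sales, 5))
```

A smaller number of segments gives coarser patterns, which generally makes it easier for several records to share the same pattern, at the cost of detail.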
foo@bar:~$ python3 kp-anonymity.py kapra 10 2 5 Dataset/Input/Sales_Transaction_Dataset_Weekly_Final.csv Dataset/Anonymized/output.csv

To compare time scalability between naive and kapra approaches, launch the test utility:
foo@bar:~$ cd Utility
foo@bar:~$ ./test.sh

- Dataset: contains the datasets used in my tests
  - Input: input datasets for the tool
  - Anonymized: stores the output of the tool
- Paper: contains the two papers studied for this project
- Utility: contains scripts for verifying time efficiency and for resetting the tool
- kp-anonymity.py: main script, which implements the naive and kapra algorithms
- node.py: manages the create-tree phase of both algorithms
- dataset_anonymized.py: manages printing the output dataset and replacing its values with the anonymized ones
- create_dataset.py: script that generates sub-datasets of the News Social Dataset, used to measure the time scalability of both algorithms
- requirements.txt: list of necessary packages
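
The idea behind measuring time scalability on sub-datasets can be sketched as follows. This is not the content of test.sh or create_dataset.py. The names `time_scalability` and `dummy_anonymize` and the generated rows are hypothetical, and the sketch assumes a sub-dataset is simply the first `size` rows of the full dataset.

```python
import time


def time_scalability(algorithm, dataset, sizes):
    """Return a list of (size, seconds) for `algorithm` run on growing subsets."""
    timings = []
    for size in sizes:
        subset = dataset[:size]  # assumed: sub-dataset = first `size` rows
        start = time.perf_counter()
        algorithm(subset)
        timings.append((size, time.perf_counter() - start))
    return timings


def dummy_anonymize(rows):
    """Placeholder workload standing in for a naive or kapra run."""
    return sorted(rows, key=sum)


# Made-up rows; a real run would load them from a CSV input file
rows = [[i % 7, i % 11, i % 13] for i in range(2000)]
for size, seconds in time_scalability(dummy_anonymize, rows, [500, 1000, 2000]):
    print(f"{size} rows: {seconds:.4f} s")
```

Running the same sizes once per algorithm and comparing the two timing lists gives a rough picture of how each approach scales.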
Author: Giorgio Rossi, student of Computer Engineering (LM) - UNIGE - a.y. 2019/2020.
Final project of the course Data Protection and Privacy. Work based on Davide Caputo's repository.