Implementation of the paper "Local Hierarchy-Aware Text-Label Association for Hierarchical Text Classification" (HTLA), accepted at the 2024 IEEE 11th International Conference on Data Science and Advanced Analytics (DSAA). [Paper link](https://doi.org/10.1109/DSAA61799.2024.10722840)
- Python >= 3.6
- torch >= 1.6.0
- transformers >= 4.30.2
- The libraries below are required only if you want to use GAT/GCN as the graph encoder (a quick version check is sketched after this list):
  - torch-geometric == 2.4.0
  - torch-sparse == 0.6.17
  - torch-scatter == 2.1.1
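As a quick sanity check (not part of the repository), the following snippet prints the installed versions; the `torch_geometric` import is optional and only matters if you plan to use GCN/GAT:

```python
# Hypothetical environment check, not part of this repo.
import torch
import transformers

print("torch:", torch.__version__)                # expect >= 1.6.0
print("transformers:", transformers.__version__)  # expect >= 4.30.2
try:
    import torch_geometric
    print("torch-geometric:", torch_geometric.__version__)  # needed only for GCN/GAT
except ImportError:
    print("torch-geometric not installed (fine unless you use GCN/GAT)")
```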
- All datasets are publicly available and can be accessed at WOS, RCV1-V2, and NYT.
- We followed the specific details mentioned in the contrastive-htc repository to obtain and preprocess the original datasets (WOS, RCV1-V2, and NYT).
- After accessing the datasets, run the scripts in the `preprocess` folder for each dataset separately to obtain the tokenized version of the dataset and the related files (a toy sketch of this tokenization is shown after this list). These will be added to the `data/x` folder, where x is the name of the dataset, with possible choices wos, rcv, and nyt.
- Detailed steps on how to obtain and preprocess each dataset are given in the README of the `preprocess` folder.
- For reference, we have added tokenized versions of the WOS and NYT datasets along with their related files in the `data` folder. The RCV1-V2 dataset exceeds 400 MB in size, so it could not be uploaded due to GitHub's file size limits.
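For intuition, here is a minimal sketch of the kind of tokenization the `preprocess` scripts produce. It assumes the `bert-base-uncased` tokenizer; the sample text is made up, and the authoritative steps are in the `preprocess` README:

```python
# Illustrative sketch only; the real logic lives in the preprocess/ scripts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed backbone
doc = "Sample Web of Science abstract ..."
encoded = tokenizer(doc, truncation=True, max_length=512)
print(encoded["input_ids"][:10])  # token ids of the kind stored under data/wos
```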
The script `train.py` is used to train all models; different variants are obtained by setting the arguments below.
```shell
python train.py --name='ckp_htla' --batch 10 --data='wos' --graph 1 --graph_type='graphormer' --msl 1 --msl_pen 1 --mg_list 0.1 0.1
```
Some important arguments:
- `--name` name of the directory in which your model will be saved. For example, the above model will be saved in `./HTLA/data/wos/ckp_htla`
- `--data` name of the dataset directory which contains your data and related files. Possible options are 'wos', 'rcv', and 'nyt'
- `--graph` whether to use a graph encoder
- `--graph_type` type of graph encoder. Possible choices are 'graphormer', 'GCN', and 'GAT'. HTLA uses Graphormer as the graph encoder; the code for the graph encoders is in the script `graph.py`
- `--msl` whether the Margin Separation Loss (MSL) is applied or not. The code for MSL is in `criterion.py` (a rough sketch is given after this list)
- `--msl_pen` weight for the MSL component (we set it to 1 for all datasets)
- `--mg_list` margin distance for each level (we use 0.1 as the margin distance for each level in all datasets)
  - For rcv: `--mg_list 0.1 0.1 0.1`
  - For nyt: `--mg_list 0.1 0.1 0.1 0.1 0.1 0.1`
  - Note: for RCV and NYT, the last level contains only 1 and 2 labels, respectively, so MSL is not applied there.
- The node feature size is fixed at 768 to match the text feature size and is not exposed as a runtime argument.
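The exact MSL formulation is implemented in `criterion.py`; the sketch below is only one plausible reading of a per-level margin separation loss, in which the positive labels of a sample are pushed closer to its text representation than the negative labels by at least that level's margin. All tensor shapes and names here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def margin_separation_loss(text_emb, label_emb, target, label_levels, margins):
    """Hinge-style per-level separation (illustrative, not the repo's exact code).

    text_emb:     (B, 768) text representations
    label_emb:    (L, 768) label representations (e.g., from the graph encoder)
    target:       (B, L) multi-hot gold labels
    label_levels: (L,) 0-based hierarchy level of each label
    margins:      per-level margins, i.e., the --mg_list values
    """
    # Cosine similarity between every text and every label: (B, L)
    sim = F.cosine_similarity(text_emb.unsqueeze(1), label_emb.unsqueeze(0), dim=-1)
    loss = text_emb.new_zeros(())
    for lvl, margin in enumerate(margins):
        cols = label_levels == lvl          # labels belonging to this level
        t, s = target[:, cols].float(), sim[:, cols]
        pos = (t * s).sum(1) / t.sum(1).clamp(min=1)              # mean positive similarity
        neg = ((1 - t) * s).sum(1) / (1 - t).sum(1).clamp(min=1)  # mean negative similarity
        loss = loss + F.relu(margin - (pos - neg)).mean()         # want pos - neg >= margin
    return loss
```

Because `margins` carries one entry per level, any level without a margin (the last level of RCV and NYT, per the note above) is simply skipped.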
To train BERT+Graphormer without MSL (ablation):

```shell
python train.py --name='ckp_bgrapho' --batch 10 --data='wos' --graph 1 --graph_type='graphormer' --msl 0
```
To train the BERT-only baseline (no graph encoder):

```shell
python train.py --name='ckp_bert' --batch 10 --data='wos' --graph 0
```
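The `--graph_type` flag in the commands above switches the structure encoder. A rough sketch of how such a switch might look with torch-geometric follows; the factory name is hypothetical, and the actual encoders (including Graphormer) are implemented in `graph.py`:

```python
# Illustrative sketch; the repo's encoders live in graph.py.
from torch_geometric.nn import GCNConv, GATConv  # optional dependencies (see above)

def build_graph_encoder(graph_type: str, dim: int = 768):
    """Hypothetical factory mirroring the --graph_type choices."""
    if graph_type == "GCN":
        return GCNConv(dim, dim)            # node feature size fixed at 768
    if graph_type == "GAT":
        return GATConv(dim, dim, heads=1)
    raise NotImplementedError("'graphormer' uses the custom encoder in graph.py")
```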
To evaluate a trained model on the test set, run the script `test.py`:
```shell
python test.py --name ckp_htla --data wos --extra _macro
```
Some important arguments:
- `--name` name of the directory which contains the saved checkpoint. The checkpoint is saved in `../HTLA/data/wos/` when working with the WOS dataset
- `--data` name of the dataset directory which contains your data and related files
- `--extra` two checkpoints are kept, based on the best macro-F1 and micro-F1, respectively. The possible choices are `_macro` and `_micro` to select between the two checkpoints (see the toy computation below)
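For reference, the two checkpoint criteria correspond to the standard macro- and micro-averaged F1 scores; an illustrative computation with scikit-learn (not part of the repo) is:

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1], [0, 1, 0]])  # multi-hot gold labels (toy example)
y_pred = np.array([[1, 0, 0], [0, 1, 0]])  # thresholded model predictions
print(f1_score(y_true, y_pred, average="macro"))  # criterion behind `_macro`
print(f1_score(y_true, y_pred, average="micro"))  # criterion behind `_micro`
```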
If you find our work helpful, please cite it using the following BibTeX entry:
```bibtex
@INPROCEEDINGS{10722840,
  author={Kumar, Ashish and Toshniwal, Durga},
  booktitle={2024 IEEE 11th International Conference on Data Science and Advanced Analytics (DSAA)},
  title={Local Hierarchy-Aware Text-Label Association for Hierarchical Text Classification},
  year={2024},
  volume={},
  number={},
  pages={1-10},
  doi={10.1109/DSAA61799.2024.10722840}}
```