less-lab-uva/RBT4DNN

RBT4DNN

RBT4DNN is a framework designed to generate high-quality test cases tailored to specific requirements, as described in the paper "RBT4DNN: Requirements-based Testing of Neural Networks". This repository showcases the usability of the proposed approach and enables the replication of the results presented in the paper.

Overview

Illustration of Training and Generated Data across different requirements

Precondition | Real Images | Generated Images
MNIST. The digit is a 7 and is very thick. | [image] | [image]
CelebA-HQ. The person is wearing eyeglasses and has black hair. | [image: cori] | [image: cgen]
SGSM. The ego is in the rightmost lane and is not in an intersection. | [image: sori] | [image: sgen]
ImageNet. The single animal has no limbs and no ears. | [image: iori] | [image: igen]

Glossary Term Preparation

MNIST Glossary Terms

The following table lists the glossary terms for MNIST digits. Each term corresponds to a range of values of a morphometric attribute and has an associated SNL text phrasing.

[Table image: mnist_glossary_terms]

We converted the Morpho-MNIST morphometric values into binary labels. Each SNL text entry in the table above is mapped to its corresponding binary label by the Python script gt_label_mnist.py.

CelebA-HQ Glossary Terms

The CelebA official website provides 40 attributes. From these, we selected a subset to define the glossary terms. To demonstrate that RBT4DNN can also work with unlabeled data, we re-labeled the dataset using the MiniCPM-o-2_6 model. Label generation can be reproduced with the script gt_label_generation_celebahq.py

SGSM Glossary Terms

For the SGSM dataset, we adopted the glossary terms from the paper "S3C: Spatial Semantic Scene Coverage for Autonomous Vehicles".

ImageNet Glossary Terms

For ImageNet, we selected intermediate nodes from the WordNet taxonomic tree—a lexical database that organizes words based on semantic relationships—that correspond to animals (e.g., bird). We then defined preconditions as combinations of morphological features, following a Zoology textbook. Specifically, we used standard morphological features of animals (e.g., feathers, wings, hooves, antennae) that distinguish levels in zoological taxa (e.g., birds, insects), and adopted these features as our glossary terms. To obtain labels for these glossary terms, we used the MiniCPM-o-2_6 model. The labeling process can be reproduced with the script gt_label_generation_imagenet.py

Requirements (M = MNIST, C = CelebA-HQ, S = SGSM, I = ImageNet; generated images: data/images from loras)

Id | Precondition | Postcondition
M1 | The digit is a 2 and has very low height | label as 2
M2 | The digit is a 3 and is very thick | label as 3
M3 | The digit is a 7 and is very thick | label as 7
M4 | The digit is a 9 and is very left leaning | label as 9
M5 | The digit is a 6 and is very right leaning | label as 6
M6 | The digit is a 0 and has very low height | label as 0
M7 | The digit is an 8 and is very thin or very thick | label as 8
C1 | The person is wearing eyeglasses and has black hair | label as eyeglasses
C2 | The person is wearing eyeglasses and has brown hair | label as eyeglasses
C3 | The person is wearing eyeglasses and has a mustache | label as eyeglasses
C4 | The person is wearing eyeglasses and has wavy hair | label as eyeglasses
C5 | The person is wearing eyeglasses and is bald | label as eyeglasses
C6 | The person is wearing eyeglasses and a hat | label as eyeglasses
C7 | The person is wearing eyeglasses and has a 5 o’clock shadow or goatee or mustache or beard or sideburns | label as eyeglasses
S1 | A vehicle is within 10 meters, in front, and in the same lane | not accelerate
S2 | The ego lane is controlled by a red or yellow light | decelerate
S3 | The ego lane is controlled by a green light, and no vehicle is in front, in the same lane, and within 10 meters | accelerate
S4 | The ego is in the rightmost lane and is not in an intersection | do not steer to the right
S5 | The ego is in the leftmost lane and is not in an intersection | do not steer to the left
S6 | A vehicle is in the lane to the left and within 7 meters | do not steer to the left
S7 | A vehicle is in the lane to the right and within 7 meters | do not steer to the right
I1 | The single real animal has feathers, wings, a beak, and two legs | label as a hyponym of bird
I2 | The single real animal has fur or hair, hooves, and four legs | label as a hyponym of ungulate
I3 | The single real animal has an exoskeleton, antennae, and six legs | label as a hyponym of insect
I4 | The single animal has no limbs and no ears | label as a hyponym of snake

GTC Training

To train a Glossary Term Classifier (GTC), we first held out test data from the training data. We computed the list $D = [D_1, D_2,..., D_l]$, where $D_i$ is the set of inputs that satisfy requirement $i$, and the list $\overline{D} = [\overline{D_1}, \overline{D_2},..., \overline{D_l}]$, where $\overline{D_i}$ is the set of inputs that do not satisfy requirement $i$. To construct the test set, we sorted $D$ by set size and inserted $r$% of the smallest set into the test set. We then moved to the next-smallest set and inserted data that was absent from both the test set and the previously considered sets. While inserting data from $D_i$, we also ensured that the data in the test set satisfying requirement $i$ did not exceed $r$% of $D_i$. We repeated the same procedure for $\overline{D}$, with an additional check that the inserted data did not appear in $D$.
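The hold-out procedure described above can be sketched roughly as follows. This is a minimal illustration, not the code in make_train_test_dataset.py: the function name `build_test_set`, its parameters, the use of sample ids, and the random choice within each set are all assumptions made for the example. The optional `exclude` argument stands in for the extra check used when the procedure is repeated for $\overline{D}$.

```python
import random

def build_test_set(sets, r, exclude=frozenset(), seed=0):
    """Greedy hold-out over per-requirement sets, smallest set first.

    sets:    dict mapping a requirement id to a set of sample ids (hypothetical layout).
    r:       hold-out percentage, e.g. 10 for 10%.
    exclude: sample ids that must never be selected (assumed to be the
             union of D when this is run on the complement sets).
    """
    rng = random.Random(seed)
    test = set()
    seen = set()  # samples belonging to previously considered sets
    for req in sorted(sets, key=lambda k: len(sets[k])):
        d_i = sets[req]
        quota = int(len(d_i) * r / 100)
        # Samples of D_i already in the test set count against the quota.
        budget = max(quota - len(d_i & test), 0)
        # Only samples absent from the test set and from earlier sets are eligible.
        candidates = sorted(d_i - test - seen - exclude)
        rng.shuffle(candidates)
        test.update(candidates[:budget])
        seen |= d_i
    return test

# Toy example with two overlapping requirement sets.
D = {1: set(range(0, 100)), 2: set(range(50, 250))}
held_out = build_test_set(D, r=10)
```

In this toy run the test set never holds more than 10% of either set, even though the two sets overlap.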

For each glossary term, we split the training data to include an equal number of randomly chosen inputs with and without the glossary term. We randomly held out 10% of the data with and without the glossary term for the validation set. Then, we trained the GTC model over the filtered train set and validated it using the validation set.
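As a rough illustration of the balanced split, the sketch below draws an equal number of inputs with and without a glossary term and holds out 10% of each group for validation. The function name `balanced_split`, the `labels` dictionary layout, and the choice to balance before holding out the validation data are assumptions for this example; train_classifier.py may order these steps differently.

```python
import random

def balanced_split(labels, val_fraction=0.1, seed=0):
    """labels: dict mapping a sample id to True/False (glossary term present?)."""
    rng = random.Random(seed)
    pos = sorted(k for k, v in labels.items() if v)
    neg = sorted(k for k, v in labels.items() if not v)
    # Keep the same number of randomly chosen positives and negatives.
    n = min(len(pos), len(neg))
    pos = rng.sample(pos, n)
    neg = rng.sample(neg, n)
    # Hold out the same share of each group for validation.
    n_val = int(n * val_fraction)
    val = pos[:n_val] + neg[:n_val]
    train = pos[n_val:] + neg[n_val:]
    return train, val

# Toy example: 30 inputs have the glossary term, 70 do not.
labels = {i: i < 30 for i in range(100)}
train_ids, val_ids = balanced_split(labels)
```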

Result Data

The results of our experiments can be found in results/. The files are described below:

  • rq1_[dataset].json: Contains the percentage match for the test data and the generated data of the dataset. The dataset value can be mnist, celeba, or sgsm.
  • rq1_[dataset]_fulldata.json: Contains the detailed results for the dataset. For each requirement and each image, this file contains the glossary term classifiers' decisions together with the image id.
  • [dataset]_rq2.txt: Contains the KID score for each requirement of a dataset.
  • [dataset]_rq3.txt: Contains the detailed JS divergence calculation for each requirement of a dataset.
  • [dataset]_r[req]_passrate.txt: Contains the pass rates for 10 repetitions of the RQ4 study. req is 1-6 for the MNIST and CelebA-HQ datasets and 1-7 for the SGSM dataset.
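For readers unfamiliar with the JS divergence reported in the RQ3 files, the sketch below shows the textbook Jensen-Shannon divergence between two discrete distributions. It is not taken from rq3.py; the function name `js_divergence`, the base-2 logarithm, and the example histograms are assumptions for illustration only.

```python
import math

def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions."""
    # Normalize so that each input sums to 1.
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # Mixture distribution M = (P + Q) / 2.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # KL(a || b); terms with a == 0 contribute nothing.
        return sum(x * math.log(x / y, base) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical histograms over the same three bins.
score = js_divergence([10, 20, 70], [30, 30, 40])
```

With base 2 the value lies between 0 (identical distributions) and 1 (disjoint support), and it is symmetric in its two arguments.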

RQ1:

[Figure: rq1]

RQ2:

[Figure: rq2]

RQ3:

[Figure: rq3]

RQ4 and RQ5:

[Figure: rq4_5]

RQ6:

[Figures: rq6, rq6_1]

Usage

To reproduce the results, first create a Python virtual environment (we used Python 3.11.6). Then activate the environment and run the following command inside it:

pip install -r requirements.txt

Dataset Preparation

For MNIST, download the train images and train-morpho.csv from MorphoMNIST. Put the images in data/MNIST/train_images.

For CelebA-HQ, download the images and the list of 40 attribute annotations from CelebAMask-HQ.

For SGSM, download the dataset from SGSM.

For ImageNet, download the dataset from ImageNet.

To create the train and test sets, run the following command:

python make_train_test_dataset.py

Note that the sets are generated randomly, so your results may deviate slightly from the ones we report. The train and test sets used in our experiments are provided in data so that the exact results can be reproduced.

Train LoRA

To train a LoRA, we first need to create a folder with images and their associated text for each requirement. Use the following commands to create the folders for MNIST and CelebA-HQ.

python create_mnist_image_datafolder.py

python create_celeba_image_datafolder.py

For SGSM, run the notebook create_sgsm_image_datafolder.ipynb.

For ImageNet, run python imagenet_precondition_filter.py.

To train a LoRA, follow the steps described here: LoRA

Generate Images from LoRA

For each requirement of each dataset, we provide 100 images generated from the per-requirement LoRA in images_from_loras. To produce more images, use the following command.

python sample_from_lora.py --dataset [dataset] --num_samples [num_of_samples_to_generate] --num_samples_per_epoch [num_of_samples_to_generate_at_once] --req [list_of_requirements]

Train Glossary Term Classifier

Use the following command to train the classifier for a glossary term.

python train_classifier.py --fet [glossary_term] --dataset [dataset]

Pretrained Models

The pretrained models used in the paper (the classifier and the diffusion model checkpoints) are available here: trained models

Run RQ Studies

To produce the RQ results, run the following commands.

For RQ1:

python rq1.py (for MNIST and CelebA-HQ)

For SGSM, run the notebook rq1_sgsm.ipynb.

For RQ2:

python rq2.py

For RQ3:

python rq3.py

For RQ4 and RQ5:

Before running the SGSM experiment, download the ComfyUI code from https://github.com/comfyanonymous/ComfyUI into the project directory. We used v0.2.6. Copy the pretrained SGSM LoRAs from trained models to the ComfyUI/models/loras/ directory.

Follow the instructions at https://comfyanonymous.github.io/ComfyUI_examples/flux/ and copy the Flux models to the appropriate ComfyUI subdirectories.

Copy driving_model.ckpt from trained models to the output directory.

python rq4_gen_samples.py --req r[1,6] --dataset [mnist,celeba]

python rq4_sgsm.py --req r[1,7]

python rq4_imagenet.py --req r[1,4]

python rq4_mnist.py

python rq4_celeba.py
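The RQ4 and RQ5 commands above can be enumerated with a small helper such as the one below. It is only a sketch: the helper name `rq4_commands` is hypothetical, and it assumes that `r[1,6]` means the values r1 through r6 passed one at a time and that `[mnist,celeba]` means one dataset per invocation. Neither reading is confirmed by the scripts themselves, and the order of the commands simply follows the list above.

```python
def rq4_commands():
    """Build the RQ4/RQ5 command lines, one per requirement (assumed format)."""
    cmds = []
    # MNIST and CelebA-HQ: requirements r1..r6 (assumed meaning of r[1,6]).
    for dataset in ("mnist", "celeba"):
        for i in range(1, 7):
            cmds.append(["python", "rq4_gen_samples.py",
                         "--req", f"r{i}", "--dataset", dataset])
    # SGSM: requirements r1..r7.
    for i in range(1, 8):
        cmds.append(["python", "rq4_sgsm.py", "--req", f"r{i}"])
    # ImageNet: requirements r1..r4.
    for i in range(1, 5):
        cmds.append(["python", "rq4_imagenet.py", "--req", f"r{i}"])
    # Scripts that take no arguments in the list above.
    cmds.append(["python", "rq4_mnist.py"])
    cmds.append(["python", "rq4_celeba.py"])
    return cmds

# Print the commands for inspection; pass each list to subprocess.run to execute it.
for cmd in rq4_commands():
    print(" ".join(cmd))
```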

For RQ6:

python rq6_imagenet.py
