RBT4DNN

RBT4DNN is a framework designed to generate high-quality test cases tailored to specific requirements, as described in the paper "RBT4DNN: Requirements-based Testing of Neural Networks". This repository showcases the usability of the proposed approach and enables the replication of the results presented in the paper.

Overview

Illustration of Training and Generated Data across different requirements

Precondition	Real Images	Generated Images
MNIST. The digit is a 7 and is very thick.
CelebA-HQ. The person is wearing eyeglasses and has black hair.
SGSM. The ego is in the rightmost lane and is not in an intersection.
ImageNet. The single animal has no limbs and no ears.

Glossary Term Preparation

MNIST Glossary Terms

The following table shows the glossary terms for MNIST digits. Each term covers a range of values of one morphometric attribute and is paired with its SNL text phrasing.

We converted the Morpho-MNIST values from this source into binary labels. Each SNL text entry in the table above is mapped to its corresponding binary label by the Python script gt_label_mnist.py.
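As a rough illustration of this mapping, the sketch below turns one row of morphometric values into binary glossary-term labels. The attribute names (`thickness`, `height`) are assumed to match the Morpho-MNIST CSV columns, and the glossary terms and numeric ranges are hypothetical placeholders, not the values used by gt_label_mnist.py.

```python
import math

# Hypothetical glossary terms: (attribute, lower bound, upper bound).
# These ranges are placeholders for illustration only, not the paper's values.
GLOSSARY_RANGES = {
    "very thick": ("thickness", 4.0, math.inf),
    "very thin": ("thickness", -math.inf, 2.0),
    "very low height": ("height", -math.inf, 14.0),
}

def binary_labels(row):
    """Return {glossary term: 0 or 1} for one row of morphometric values."""
    labels = {}
    for term, (attribute, low, high) in GLOSSARY_RANGES.items():
        # A term holds when the attribute value falls inside its half-open range.
        labels[term] = int(low <= row[attribute] < high)
    return labels

example = binary_labels({"thickness": 4.6, "height": 19.2})
```

Keeping the ranges in one dictionary means a new glossary term only needs a new entry, not new code.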

CelebA-HQ Glossary Terms

The CelebA official website provides 40 attributes. From these, we selected a subset to define the glossary terms. To demonstrate that RBT4DNN can also work with unlabeled data, we re-labeled the dataset using the MiniCPM-o-2_6 model. Label generation can be reproduced with the script gt_label_generation_celebahq.py

SGSM Glossary Terms

For the SGSM dataset, we adopted the glossary terms from the paper: S3C: Spatial Semantic Scene Coverage for Autonomous Vehicles

ImageNet Glossary Terms

For ImageNet, we selected intermediate nodes of the WordNet taxonomic tree (a lexical database that organizes words by semantic relationships) that correspond to animals (e.g., bird). We then defined preconditions as combinations of morphological features, following a zoology textbook. Specifically, we used standard morphological features of animals (e.g., feathers, wings, hooves, antennae) that distinguish levels in zoological taxa (e.g., birds, insects), and adopted these features as our glossary terms. To obtain labels for these glossary terms, we used the MiniCPM-o-2_6 model. The labeling process can be reproduced with the script gt_label_generation_imagenet.py.

Requirements (M = MNIST, C = CelebA-HQ, S = SGSM, I = ImageNet, generated images: data/images from loras)

Id	Precondition	Postcondition
M1	The digit is a 2 and has very low height	label as 2
M2	The digit is a 3 and is very thick	label as 3
M3	The digit is a 7 and is very thick	label as 7
M4	The digit is a 9 and is very left leaning	label as 9
M5	The digit is a 6 and is very right leaning	label as 6
M6	The digit is a 0 and has very low height	label as 0
M7	The digit is an 8 and is very thin or very thick	label as 8
C1	The person is wearing eyeglasses and has black hair	label as eyeglasses
C2	The person is wearing eyeglasses and has brown hair	label as eyeglasses
C3	The person is wearing eyeglasses and has a mustache	label as eyeglasses
C4	The person is wearing eyeglasses and has wavy hair	label as eyeglasses
C5	The person is wearing eyeglasses and is bald	label as eyeglasses
C6	The person is wearing eyeglasses and a hat	label as eyeglasses
C7	The person is wearing eyeglasses and has a 5 o’clock shadow or goatee or mustache or beard or sideburns	label as eyeglasses
S1	A vehicle is within 10 meters, in front, and in the same lane	not accelerate
S2	The ego lane is controlled by a red or yellow light	decelerate
S3	The ego lane is controlled by a green light, and no vehicle is in front, in the same lane, and within 10 meters	accelerate
S4	The ego is in the rightmost lane and is not in an intersection	do not steer to the right
S5	The ego is in the leftmost lane and is not in an intersection	do not steer to the left
S6	A vehicle is in the lane to the left and within 7 meters	do not steer to the left
S7	A vehicle is in the lane to the right and within 7 meters	do not steer to the right
I1	The single real animal has feathers, wings, a beak, and two legs	label as a hyponym of bird
I2	The single real animal has fur or hair, hooves, and four legs	label as a hyponym of ungulate
I3	The single real animal has an exoskeleton, antennae, and six legs	label as a hyponym of insect
I4	The single animal has no limbs and no ears	label as a hyponym of snake
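Each precondition in the table is a Boolean combination of glossary terms, so it can be checked against per-image binary labels. The sketch below shows one possible encoding for M7 and S3. The dictionary keys and function names are illustrative assumptions, not identifiers from this repository.

```python
# Each image is described by binary glossary-term labels (1 = the term holds).
# The term names used as keys are illustrative, not the repository's identifiers.

def m7_precondition(labels):
    """M7: the digit is an 8 and is very thin or very thick."""
    return bool(labels["digit 8"] and (labels["very thin"] or labels["very thick"]))

def s3_precondition(labels):
    """S3: green light, and no vehicle in front, in the same lane, within 10 meters."""
    return bool(labels["green light"] and not labels["vehicle ahead same lane within 10m"])

def select(images, precondition):
    """Keep the ids of images whose labels satisfy the precondition."""
    return [image_id for image_id, labels in images.items() if precondition(labels)]

images = {
    "a": {"digit 8": 1, "very thin": 0, "very thick": 1},
    "b": {"digit 8": 1, "very thin": 0, "very thick": 0},
    "c": {"digit 8": 0, "very thin": 1, "very thick": 0},
}
matching = select(images, m7_precondition)
```

Here only image "a" satisfies M7: "b" is an 8 of ordinary thickness, and "c" is thin but not an 8.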

GTC Training

To train a Glossary Term Classifier (GTC), we first held out test data from the training data. We computed the collection $D = [D_1, D_2, ..., D_l]$, where $D_i$ is the set of inputs satisfying requirement $i$, and $\overline{D} = [\overline{D_1}, \overline{D_2}, ..., \overline{D_l}]$, where $\overline{D_i}$ is the set of inputs that do not satisfy requirement $i$. To construct the test set, we sorted $D$ by set size and inserted $r$% of the smallest set into the test set. We then moved to the next smallest set and inserted data that was absent from both the test set and the previously considered sets. While inserting data from $D_i$, we ensured that the amount of data in the test set satisfying requirement $i$ did not exceed $r$% of $D_i$. We repeated the same procedure for $\overline{D}$, with the additional check that the inserted data did not appear in $D$.
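The hold-out procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in make_train_test_dataset.py: inputs are represented by integer ids, the function name `build_holdout` is hypothetical, the $r$% cap is checked when each set is processed, and "did not appear in $D$" is read as excluding any input that satisfies at least one requirement.

```python
import random

def build_holdout(sets, r, exclude=frozenset(), seed=0):
    """Pick a test set holding roughly r% of each set, smallest set first."""
    rng = random.Random(seed)
    test, seen = set(), set()
    for d in sorted(sets, key=len):           # process the smallest set first
        quota = int(len(d) * r / 100)         # at most r% of this set
        need = max(0, quota - len(test & d))  # items already held out count toward the quota
        # Only draw items not yet in the test set, not in earlier sets, and not excluded.
        candidates = sorted(d - test - seen - exclude)
        rng.shuffle(candidates)
        test.update(candidates[:need])
        seen |= d
    return test

# Toy data: two overlapping requirement sets over a universe of 100 input ids.
D = [set(range(10, 60)), set(range(0, 20))]
universe = set(range(100))
D_bar = [universe - d for d in D]

positives = build_holdout(D, r=10)
# For the complements, also exclude every input that satisfies some requirement.
negatives = build_holdout(D_bar, r=10, exclude=set().union(*D))
```

Processing the smallest set first gives rare requirements their share of test data before larger, overlapping sets use up the shared inputs.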

For each glossary term, we filtered the training data to include an equal number of randomly chosen inputs with and without the glossary term. We randomly held out 10% of the data with the glossary term and 10% of the data without it for the validation set. We then trained the GTC model on the filtered training set and validated it on the validation set.
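A minimal sketch of this balanced split is shown below. It assumes inputs are identified by ids and that the class with fewer examples sets the number drawn from each side. The function name `balanced_split` and its parameters are illustrative, not taken from train_classifier.py.

```python
import random

def balanced_split(with_term, without_term, val_frac=0.1, seed=0):
    """Return (train, val) lists with equal numbers of inputs with and without the term."""
    rng = random.Random(seed)
    n = min(len(with_term), len(without_term))  # balance on the smaller class
    pos = rng.sample(list(with_term), n)
    neg = rng.sample(list(without_term), n)
    n_val = int(n * val_frac)                   # hold out the same count from each class
    val = pos[:n_val] + neg[:n_val]
    train = pos[n_val:] + neg[n_val:]
    return train, val

train, val = balanced_split(range(30), range(100, 200))
```

With 30 positive and 100 negative inputs, this draws 30 from each class and moves 3 of each into the validation set.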

Result Data

The results of our experiments can be found in results/. The files are described below:

rq1_[dataset].json: Contains the percentage match between the test data and the generated data for the dataset. The dataset value can be mnist, celeba, or sgsm.
rq1_[dataset]_fulldata.json: Contains the detailed results for the dataset. For each requirement and each image, this file records the image id and the glossary term classifiers' decisions.
[dataset]_rq2.txt: Contains the KID score for each requirement of a dataset.
[dataset]_rq3.txt: Contains the detailed JS divergence calculation for each requirement of a dataset.
[dataset]_r[req]_passrate.txt: Contains pass rates for 10 repetitions of the RQ4 study. req is 1-6 for the MNIST and CelebA-HQ datasets and 1-7 for the SGSM dataset.
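For readers unfamiliar with the RQ3 metric, the Jensen-Shannon (JS) divergence between two discrete distributions can be computed as in the sketch below. This is the textbook definition with base-2 logarithms, which bounds the value between 0 and 1. It is not necessarily how rq3.py computes it.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Terms with zero probability in x contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

same = js_divergence([0.5, 0.5], [0.5, 0.5])
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])
```

Identical distributions give 0, and distributions with no overlapping support give 1.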

RQ1:

RQ2:

RQ3:

RQ4 and RQ5:

RQ6:

Usage

To reproduce the results, first create a Python virtual environment (we used Python 3.11.6). Then activate the environment and run the following command inside it:

pip install -r requirements.txt

Dataset Preparation

For MNIST, download the train images and train-morpho.csv from MorphoMNIST, and put the images in 'data/MNIST/train_images'.

For CelebA-HQ, download the images and the list of 40 attribute annotations from CelebAMask-HQ.

For SGSM, download the data from SGSM.

For ImageNet, download the data from ImageNet.

To create the train and test sets, run the following command:

python make_train_test_dataset.py

Note that the sets are randomly generated, so your results may deviate slightly from those we report. We provide the train and test sets used in our experiments in data so that the exact results can be reproduced.

Train LoRA

To train a LoRA, we first need to create a folder containing the images and their associated text for each requirement. Use the following commands to create the folders for MNIST and CelebA-HQ.

python create_mnist_image_datafolder.py

python create_celeba_image_datafolder.py

For SGSM, run the ipynb file: create_sgsm_image_datafolder.ipynb

For ImageNet, run python imagenet_precondition_filter.py

To train a LoRA, follow the steps described in LoRA.

Generate Images from LoRA

We provide 100 generated images from the per-requirement LoRA for each requirement of each dataset in images_from_loras. To produce more images, use the following command.

python sample_from_lora.py --dataset [dataset] --num_samples [num_of_samples_to_generate] --num_samples_per_epoch [num_of_samples_to_generate_at_once] --req [list_of_requirements]

Train Glossary Term Classifier

Use the following command to train the classifier for a glossary term.

python train_classifier.py --fet [glossary_term] --dataset [dataset]

Pretrained Models

The pretrained models (the classifier and diffusion model checkpoints) used in the paper can be found in trained models.

Run RQ Studies

To produce the RQ results, run the following commands.

For RQ1:

python rq1.py (for MNIST and CelebA-HQ)

Run rq1_sgsm.ipynb for SGSM.

For RQ2:

python rq2.py

For RQ3:

python rq3.py

For RQ4 and RQ5:

Before running the SGSM experiment, download the ComfyUI code from https://github.com/comfyanonymous/ComfyUI into the project directory. We used v0.2.6. Copy the pretrained SGSM LoRAs from trained models to the ComfyUI/models/loras/ directory.

Follow the instructions at https://comfyanonymous.github.io/ComfyUI_examples/flux/ and copy the flux models to the appropriate ComfyUI subdirectories.

Copy driving_model.ckpt from trained models to the output directory.

python rq4_gen_samples.py --req r[1,6] --dataset [mnist,celeba]

python rq4_sgsm.py --req r[1,7]

python rq4_imagenet.py --req r[1,4]

python rq4_mnist.py

python rq4_celeba.py

For RQ6:

python rq6_imagenet.py
