RBT4DNN is a framework designed to generate high-quality test cases tailored to specific requirements, as described in the paper "RBT4DNN: Requirements-based Testing of Neural Networks". This repository showcases the usability of the proposed approach and enables the replication of the results presented in the paper.
The following table lists the glossary terms for MNIST digits. Each term covers a range of values of a morphometric attribute and has an associated SNL text phrasing.
We converted the Morpho-MNIST values from this source into binary labels. Each SNL text entry in the table above is mapped to its corresponding binary label by the Python script gt_label_mnist.py.
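
As a rough illustration of this conversion, the sketch below maps one continuous morphometric value to binary glossary-term labels by range membership. The term names and range bounds are placeholders chosen for the example; they are not the values used in gt_label_mnist.py, and the function name `binarize` is hypothetical.

```python
# Hypothetical ranges: the real bounds come from the glossary table,
# not from this sketch.
EXAMPLE_RANGES = {
    "very thin": (0.0, 1.5),
    "very thick": (6.0, float("inf")),
}


def binarize(value, ranges):
    """Return {term: 0 or 1}, with 1 when low <= value < high."""
    return {term: int(low <= value < high) for term, (low, high) in ranges.items()}


labels = binarize(7.2, EXAMPLE_RANGES)
print(labels)  # {'very thin': 0, 'very thick': 1}
```

A value that falls in none of the ranges simply gets 0 for every term.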
The CelebA official website provides 40 attributes. From these, we selected a subset to define the glossary terms. To demonstrate that RBT4DNN can also work with unlabeled data, we re-labeled the dataset using the MiniCPM-o-2_6 model. Label generation can be reproduced with the script gt_label_generation_celebahq.py.
For the SGSM dataset, we adopted the glossary terms from the paper "S3C: Spatial Semantic Scene Coverage for Autonomous Vehicles".
For ImageNet, we selected intermediate nodes that correspond to animals (e.g., bird) from the WordNet taxonomic tree, a lexical database that organizes words by semantic relationships. We then defined preconditions as combinations of morphological features, following a Zoology textbook. Specifically, we used standard morphological features of animals (e.g., feathers, wings, hooves, antennae) that distinguish levels in zoological taxa (e.g., birds, insects), and adopted these features as our glossary terms. To obtain labels for these glossary terms, we used the MiniCPM-o-2_6 model. The labeling process can be reproduced with the script gt_label_generation_imagenet.py.
Requirements (M = MNIST, C = CelebA-HQ, S = SGSM, I = ImageNet, generated images: data/images from loras)
| Id | Precondition | Postcondition |
|---|---|---|
| M1 | The digit is a 2 and has very low height | label as 2 |
| M2 | The digit is a 3 and is very thick | label as 3 |
| M3 | The digit is a 7 and is very thick | label as 7 |
| M4 | The digit is a 9 and is very left leaning | label as 9 |
| M5 | The digit is a 6 and is very right leaning | label as 6 |
| M6 | The digit is a 0 and has very low height | label as 0 |
| M7 | The digit is an 8 and is very thin or very thick | label as 8 |
| C1 | The person is wearing eyeglasses and has black hair | label as eyeglasses |
| C2 | The person is wearing eyeglasses and has brown hair | label as eyeglasses |
| C3 | The person is wearing eyeglasses and has a mustache | label as eyeglasses |
| C4 | The person is wearing eyeglasses and has wavy hair | label as eyeglasses |
| C5 | The person is wearing eyeglasses and is bald | label as eyeglasses |
| C6 | The person is wearing eyeglasses and a hat | label as eyeglasses |
| C7 | The person is wearing eyeglasses and has a 5 o’clock shadow or goatee or mustache or beard or sideburns | label as eyeglasses |
| S1 | A vehicle is within 10 meters, in front, and in the same lane | not accelerate |
| S2 | The ego lane is controlled by a red or yellow light | decelerate |
| S3 | The ego lane is controlled by a green light, and no vehicle is in front, in the same lane, and within 10 meters | accelerate |
| S4 | The ego is in the rightmost lane and is not in an intersection | do not steer to the right |
| S5 | The ego is in the leftmost lane and is not in an intersection | do not steer to the left |
| S6 | A vehicle is in the lane to the left and within 7 meters | do not steer to the left |
| S7 | A vehicle is in the lane to the right and within 7 meters | do not steer to the right |
| I1 | The single real animal has feathers, wings, a beak, and two legs | label as a hyponym of bird |
| I2 | The single real animal has fur or hair, hooves, and four legs | label as a hyponym of ungulate |
| I3 | The single real animal has an exoskeleton, antennae, and six legs | label as a hyponym of insect |
| I4 | The single animal has no limbs and no ears | label as a hyponym of snake |
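
Each precondition in the table is a Boolean combination of glossary terms, so it can be checked against an image's binary glossary-term labels. The sketch below shows this for M7 and S3. The dictionary keys are illustrative names rather than the repository's label schema, the grouping of the S3 clause is one reading of the requirement text, and `match_percentage` is only one plausible way to compute a percentage match of the kind reported for RQ1.

```python
# Each precondition is a predicate over one image's binary glossary-term labels.
# Key names are hypothetical; they are not taken from the repository.
PRECONDITIONS = {
    # M7: the digit is an 8 and is very thin or very thick
    "M7": lambda g: g["digit_8"] and (g["very_thin"] or g["very_thick"]),
    # S3: green light, and no vehicle in front, in the same lane, within 10 m
    "S3": lambda g: g["green_light"]
    and not g["vehicle_in_front_same_lane_within_10m"],
}


def satisfies(req_id, glossary_labels):
    """True when the labels of one image meet the requirement's precondition."""
    return bool(PRECONDITIONS[req_id](glossary_labels))


def match_percentage(req_id, images):
    """Percentage of images whose labels satisfy the precondition."""
    if not images:
        return 0.0
    hits = sum(satisfies(req_id, g) for g in images)
    return 100.0 * hits / len(images)


images = [
    {"digit_8": 1, "very_thin": 0, "very_thick": 1},
    {"digit_8": 1, "very_thin": 0, "very_thick": 0},
]
print(match_percentage("M7", images))  # 50.0
```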
To train a Glossary Term Classifier (GTC), we first held out test data from the training data.
To do that, we first computed the set,
For each glossary term, we split the training data to include an equal number of randomly chosen inputs with and without the glossary term. We randomly held out 10% of the data with and without the glossary term for the validation set. Then, we trained the GTC model over the filtered train set and validated it using the validation set.
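
The balanced split described above can be sketched as follows. This is a minimal illustration, not the code in train_classifier.py: the function name, the `has_term` predicate, and the fixed seed are assumptions made for the example.

```python
import random


def balanced_split(samples, has_term, val_fraction=0.1, seed=0):
    """Balance positives and negatives, then hold out a validation share of each."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    pos = [s for s in samples if has_term(s)]
    neg = [s for s in samples if not has_term(s)]
    n = min(len(pos), len(neg))  # equal number with and without the term
    pos, neg = rng.sample(pos, n), rng.sample(neg, n)
    n_val = int(n * val_fraction)  # held out from each side
    train = pos[n_val:] + neg[n_val:]
    val = pos[:n_val] + neg[:n_val]
    return train, val


# Toy data: 30 of 100 samples carry the glossary term.
train, val = balanced_split(list(range(100)), lambda s: s < 30)
print(len(train), len(val))  # 54 6
```

Sampling the same count from both sides keeps the classifier from learning the class prior of a rare glossary term.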
The results of our experiments can be found in results/. The files are described below:
- rq1_[dataset].json: Contains the percentage match of the test and generated data for the dataset. The dataset value can be mnist, celeba or sgsm.
- rq1_[dataset]_fulldata.json: Contains the detailed results for the dataset. For each requirement and each image, this file records the glossary term classifiers' decisions together with the image id.
- [dataset]_rq2.txt: Contains the KID score for each requirement of a dataset.
- [dataset]_rq3.txt: Contains the detailed JS divergence calculation for each requirement of a dataset.
- [dataset]_r[req]_passrate.txt: Contains the pass rates for 10 repetitions of the RQ4 study. req is 1-6 for the MNIST and CelebA-HQ datasets and 1-7 for the SGSM dataset.
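
For readers unfamiliar with the JS divergence reported in the RQ3 files, the sketch below gives the standard definition for two discrete distributions. It uses base-2 logarithms, so the result lies in [0, 1]. Which distributions rq3.py compares, and which logarithm base it uses, are not stated here, so treat this as a reference definition only.

```python
import math


def kl(p, q):
    """KL divergence in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


print(js_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(js_divergence([1.0, 0.0], [0.0, 1.0]))  # 1.0
```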
To reproduce the results, first create a Python virtual environment (we used Python 3.11.6). Then activate the environment and run the following command inside it:
pip install -r requirements.txt
For MNIST, download the train images and train-morpho.csv from MorphoMNIST. Put the images in 'data/MNIST/train_images'.
For CelebA-HQ, download the images and the list of 40 attribute annotations from CelebAMask-HQ.
For SGSM, download the data from SGSM.
For ImageNet, download the data from ImageNet.
To create the train and test sets, run the following command:
python make_train_test_dataset.py
Note that the sets are generated randomly, so your results may deviate slightly from the ones we report. The train and test sets used in our experiments are provided in data so that the exact results can be reproduced.
To train a LoRA, we first need to create a folder with images and their associated text for each requirement. Use the following commands to create the folders for MNIST and CelebA-HQ.
python create_mnist_image_datafolder.py
python create_celeba_image_datafolder.py
For SGSM, run the notebook create_sgsm_image_datafolder.ipynb
For ImageNet, run python imagenet_precondition_filter.py
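
To make the idea of a per-requirement folder concrete, the sketch below copies a set of images into one folder and writes the requirement text next to each image. The sidecar .txt layout, the function name, and the use of the precondition as the caption are assumptions made for illustration; the scripts above define the layout actually used.

```python
import shutil
import tempfile
from pathlib import Path


def build_requirement_folder(image_paths, caption, out_dir):
    """Copy images into out_dir and write one caption file per image."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in map(Path, image_paths):
        shutil.copy(src, out / src.name)
        # Assumed layout: a.png is paired with a.txt holding the text.
        (out / src.name).with_suffix(".txt").write_text(caption)
    return out


# Demo with empty placeholder files standing in for real images.
with tempfile.TemporaryDirectory() as tmp:
    src_dir = Path(tmp, "src")
    src_dir.mkdir()
    images = []
    for name in ("a.png", "b.png"):
        p = src_dir / name
        p.write_bytes(b"")
        images.append(p)
    folder = build_requirement_folder(
        images, "The digit is a 2 and has very low height", Path(tmp, "M1")
    )
    created = sorted(f.name for f in folder.iterdir())

print(created)  # ['a.png', 'a.txt', 'b.png', 'b.txt']
```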
To train a LoRA, follow the steps from here LoRA
For each requirement of each dataset, we provide 100 images generated from the per-requirement LoRA in images_from_loras. To produce more images, use the following command.
python sample_from_lora.py --dataset [dataset] --num_samples [num_of_samples_to_generate] --num_samples_per_epoch [num_of_samples_to_generate_at_once] --req [list_of_requirements]
Use the following command to train the classifier for a glossary term.
python train_classifier.py --fet [glossary_term] --dataset [dataset]
The pretrained models used in the paper (the classifier and diffusion model checkpoints) can be found in trained models.
To produce the RQ results, run the following commands.
For RQ1:
python rq1.py (for MNIST and CelebA-HQ)
run rq1_sgsm.ipynb for SGSM
For RQ2:
python rq2.py
For RQ3:
python rq3.py
For RQ4 and RQ5:
Before running the SGSM experiment, download the ComfyUI code from (https://github.com/comfyanonymous/ComfyUI) into the project directory. We used v0.2.6. Copy the pretrained SGSM LoRAs from trained models to the ComfyUI/models/loras/ directory.
Follow the instructions at (https://comfyanonymous.github.io/ComfyUI_examples/flux/) and copy the flux models to the appropriate ComfyUI subdirectories.
Copy driving_model.ckpt from trained models to the output directory.
python rq4_gen_samples.py --req r[1,6] --dataset [mnist,celeba]
python rq4_sgsm.py --req r[1,7]
python rq4_imagenet.py --req r[1,4]
python rq4_mnist.py
python rq4_celeba.py
For RQ6:
python rq6_imagenet.py