
VeriFIT/rsaregex-artifact


Towards Efficient Matching of Regexes with Backreferences using Register Set Automata

This is an artifact reproducing the experimental results of the above paper.
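For background (not part of the artifact), a backreference lets a pattern reuse the exact text captured earlier by a group, which is what makes such regexes hard to match efficiently. A minimal Python sketch:

```python
import re

# \1 matches exactly the text captured by the first group (a+),
# so this pattern accepts a block of a's followed by an identical copy
# of that block.
pattern = re.compile(r"^(a+)\1$")

print(bool(pattern.match("aaaa")))  # True: "aa" followed by "aa"
print(bool(pattern.match("aaa")))   # False: no equal split exists
```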

Getting Started

  1. Import the rsaregex-artifact.ova file into a VM player.

We used VirtualBox (https://www.virtualbox.org/wiki/Downloads). To import the VM in VirtualBox, use File -> Import Appliance and select the .ova file.

  2. Start the VM, log in, and open a terminal (pinned to the dash).

username: artifact

password: artifact

The VM requires at least 8 GiB of memory, one CPU core, and at least 25 GiB of free storage.

  3. Enter the directory /home/artifact/rsaregex-artifact with cd rsaregex-artifact.

  4. Run the smoke test experiments by running ./run-very-short.sh.

After the script finishes, the output can be found in ./experiments/short-run-results.txt. The script also generates scatterplots comparing our tool to the other tools; they can be found in the experiments directory as vs_*.pdf.

The contents of the short-run-results.txt file should look similar to this (times are in seconds):

time:  2026-03-17 19:46:01.133194
# of attacks: 9
| Tool        |   < 1 s |   1--5 s |   5--10 s |   10--50 s |   50--100 s |   > 100 s (TOs) |   Errors |   > 1 s |
|-------------|---------|----------|-----------|------------|-------------|-----------------|----------|---------|
| rsaregexMat |       9 |        0 |         0 |          0 |           0 |               0 |        0 |       0 |
| grep        |       9 |        0 |         0 |          0 |           0 |               0 |        0 |       0 |
| re          |       8 |        0 |         1 |          0 |           0 |               0 |        0 |       1 |
| pcre2       |       9 |        0 |         0 |          0 |           0 |               0 |        0 |       0 |
| js          |       7 |        0 |         0 |          0 |           0 |               0 |        2 |       0 |
| java        |       6 |        0 |         0 |          0 |           0 |               0 |        3 |       0 |
| net         |       6 |        0 |         0 |          1 |           0 |               0 |        2 |       1 |

| method      |      mean |   median |   std. dev |
|-------------|-----------|----------|------------|
| rsaregexMat | 0.216272  | 0.196367 | 0.111951   |
| grep        | 0.0244444 | 0.01     | 0.0433333  |
| re          | 0.747778  | 0.01     | 2.09885    |
| pcre2       | 0.02      | 0.01     | 0.03       |
| js          | 0.144286  | 0.03     | 0.311226   |
| java        | 0.0566667 | 0.055    | 0.00816497 |
| net         | 1.82143   | 0.03     | 4.66514    |

Full experiment reproduction

To reproduce the full experimental results, run the ./run-experiments-full.sh script (from the rsaregex-artifact directory). The full run takes about 10 hours.

After the script finishes, experiments/evaluation-results.txt will contain a textual table as in the paper (Table 1), and PDF scatterplots comparing our tool to all the other tools will be generated (stored in experiments/vs_TOOL.pdf; cf. Fig. 6).

Note that this overwrites the scatterplots from the previous section.

Artifact structure

The directories packages and pi-packages contain the deb and pip packages necessary for the artifact to run. The directory tools-install-files contains the sources of the tools that our approach was compared against, together with shell scripts that install them (install-*.sh). The directory rsaregex is the Python package for our tool.

The experiments directory contains the pycobench and pyco_proc scripts (see https://github.com/VeriFIT/pycobench) for running benchmarks and processing results, respectively. The file config.yaml tells pycobench which tools to run and with which arguments. The directory also contains the run-measurements.sh, run-very-short.sh, and eval.py scripts. The run-measurements.sh script calls pycobench with the correct arguments to reproduce the results; run-very-short.sh does the same with just a few regexes. The eval.py script generates the tables and scatterplots from the paper (provided the .csv result files already exist).
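To give a feel for the kind of post-processing such an evaluation script performs (a sketch under our own assumptions, not the artifact's actual eval.py), bucketing per-tool runtimes into the time ranges used in the results table might look like:

```python
# Hypothetical sketch: count runtimes (in seconds) per bucket, mirroring
# the column ranges of the results table. The artifact's eval.py may
# compute this differently.
BUCKETS = [(1, "< 1 s"), (5, "1--5 s"), (10, "5--10 s"),
           (50, "10--50 s"), (100, "50--100 s")]

def bucket(seconds: float) -> str:
    """Map a runtime to its table column label."""
    for limit, label in BUCKETS:
        if seconds < limit:
            return label
    return "> 100 s (TOs)"

def summarize(times):
    """Count how many runtimes fall into each bucket."""
    counts = {}
    for t in times:
        label = bucket(t)
        counts[label] = counts.get(label, 0) + 1
    return counts
```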

Inside the experiments directory are the directories inputs, matchers, and results. The inputs directory contains JSON files with regexes and malicious inputs, as well as .input text files that list the line numbers to be fed to the benchmarking script. The matchers directory contains the *-wrap.py files called by the benchmarking script, source files for minimal matchers in their respective languages (Java, JS, .NET), and scripts that compile them where necessary (compile-java.sh, compile-dotnet.sh). The results directory contains the script create-csvs.sh, which calls pyco_proc with the proper arguments after the benchmarks have run, turning the pycobench output into .csv files. By default, the pycobench results are stored in the files (lf | rengar).output in the results directory.
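As a purely illustrative sketch (the artifact's actual *-wrap.py files are more involved; the run_case helper and the JSON shape here are our assumptions), a minimal matcher wrapper in Python could look like:

```python
import json
import re
import sys

# Hypothetical sketch only -- the real wrappers may read their input
# differently and also measure matching time.
def run_case(case: dict) -> bool:
    """Return True iff the regex matches somewhere in the input string."""
    return re.search(case["regex"], case["input"]) is not None

if __name__ == "__main__" and len(sys.argv) > 1:
    # Expect a path to a JSON file like {"regex": "(a+)\\1", "input": "aaaa"}
    with open(sys.argv[1]) as f:
        case = json.load(f)
    print("match" if run_case(case) else "no match")
```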

Note that running ANY experiment overwrites the results stored in results/(lf | rengar).(output | csv).

Installation on a fresh VM

This section describes how to set up the artifact on a VM without the tools preinstalled. We advise against running this outside a VM, since the tools are installed system-wide.

The installation assumes that make and gcc are already installed on the machine.

The following commands install and compile tools and packages necessary for the experiments and their evaluation:

cd rsaregex-artifact
./install.sh

The script prompts for the sudo password.
