Add benchmark and dataset #34

Merged: klaasmeinke merged 9 commits into main from benchmark on Dec 18, 2025

@EduardoTerres EduardoTerres commented Nov 18, 2025

Benchmark overview

We compare four subdomain discovery pipelines per apex domain. They test the subwiz model in a realistic setting, where a user first discovers a set of subdomains using traditional tools (e.g. Subfinder, Amass, Gobuster) and then passes that output as input to the subwiz model. The task is difficult because these tools may already have discovered most of the subdomains for a given apex domain; however, it remains useful when the goal is exhaustive subdomain discovery.
Here are the specific quantities compared:

  • Subfinder subdomains: This baseline captures the unique resolved subdomains identified by running the subfinder tool for each apex domain. Since running subfinder across many domains can be very time-consuming, we have already performed this step and stored the results in benchmark_dataset.json, but feel free to do it yourself. This data is as of May 2025.

  • Subfinder subdomains --> subwiz v0: Starting with the subdomains discovered by Subfinder as seed inputs, subwiz v0 (version 0.4.1) generates additional candidate subdomains. This version uses only the input subdomains themselves as context for generation, without incorporating the apex domain information. We use max-recursion=1 because v0 originally had no recursion; that feature arrived later with v1.

  • Subfinder subdomains --> subwiz v1: Starting with the Subfinder results as seed inputs, subwiz v1 generates candidates using the improved model with apex domain context. Additionally, this pipeline uses the default recursive generation (max_recursion=5): newly discovered subdomains from each iteration are automatically fed back as inputs for the next iteration, allowing the model to discover deeper nested subdomains that might only be found by building upon previously generated candidates.

  • Subfinder subdomains --> subwiz v1 + maximum recursion: Starting with the Subfinder results as seed inputs, subwiz v1 generates candidates using the improved model with apex domain context. Additionally, this pipeline uses the maximum allowed recursion (max_recursion=50) to portray the model's full potential.
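
The recursive generation described in the v1 pipelines can be sketched roughly as follows. This is a minimal illustration, not the subwiz implementation: `recursive_discovery`, `generate`, and `resolves` are hypothetical names, the model call and the DNS check are passed in as plain callables, and the assumption that only resolving candidates are fed back into the next iteration is ours.

```python
from typing import Callable, Iterable, Set


def recursive_discovery(
    seeds: Iterable[str],
    generate: Callable[[Set[str]], Set[str]],  # hypothetical stand-in for the model
    resolves: Callable[[str], bool],           # hypothetical stand-in for a DNS check
    max_recursion: int = 5,
) -> Set[str]:
    """Return subdomains found beyond the seeds by feeding new finds back in."""
    known = set(seeds)
    found: Set[str] = set()
    for _ in range(max_recursion):
        # Ask the generator for candidates and drop those we already know.
        candidates = generate(known) - known
        # Keep only candidates that resolve (assumed filtering step).
        new = {c for c in candidates if resolves(c)}
        if not new:
            break  # nothing new this round, so further rounds cannot help
        found |= new
        known |= new  # new finds become inputs for the next iteration
    return found


# Toy example: the "model" prepends a label, and only two names resolve.
live = {"dev.a.example.com", "dev.dev.a.example.com"}
toy_generate = lambda known: {"dev." + s for s in known}

one_pass = recursive_discovery({"a.example.com"}, toy_generate, live.__contains__, max_recursion=1)
full = recursive_discovery({"a.example.com"}, toy_generate, live.__contains__, max_recursion=5)
```

In this toy setup a single pass finds only the first nested name, while additional iterations also reach the deeper one, which is the effect the higher max_recursion settings are meant to capture.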

@klaasmeinke klaasmeinke left a comment

lgtm

@klaasmeinke klaasmeinke merged commit 36e66ed into main Dec 18, 2025
7 checks passed
@klaasmeinke klaasmeinke deleted the benchmark branch December 18, 2025 10:38