Add benchmark and dataset by EduardoTerres · Pull Request #34 · hadriansecurity/subwiz

EduardoTerres · 2025-11-18T12:49:22Z

Benchmark overview

We compare four subdomain discovery pipelines per apex domain. They are designed to test the subwiz model in a realistic setting, where a user first discovers a set of subdomains using traditional tools (e.g. Subfinder, Amass, Gobuster) and then feeds that output to the subwiz model. The problem is difficult because these tools may already have discovered most of the subdomains for a given apex domain; the approach nevertheless remains useful for exhaustive subdomain discovery.
Here are the specific quantities compared:

Subfinder subdomains: This baseline captures the unique resolved subdomains identified by running the subfinder tool for each apex domain. Since running subfinder across many domains can be very time-consuming, we have already performed this step and stored the results in benchmark_dataset.json, but feel free to do it yourself. This data is as of May 2025.
Subfinder subdomains --> subwiz v0: Starting with the subdomains discovered by Subfinder as seed inputs, subwiz v0 (version 0.4.1) generates additional candidate subdomains. This version uses only the input subdomains as context for generation and does not incorporate apex domain information. We use max-recursion=1 because v0 originally had no recursion; recursion was introduced later with v1.
Subfinder subdomains --> subwiz v1: Starting with the Subfinder results as seed inputs, subwiz v1 generates candidates using the improved model with apex domain context. Additionally, this pipeline uses the default recursive generation (max_recursion=5): newly discovered subdomains from each iteration are automatically fed back as inputs for the next iteration, allowing the model to discover deeper nested subdomains that might only be found by building upon previously generated candidates.
Subfinder subdomains --> subwiz v1 + maximum recursion: Starting with the Subfinder results as seed inputs, subwiz v1 generates candidates using the improved model with apex domain context. Additionally, this pipeline uses the maximum allowed recursion (max_recursion=50) to portray the model's full potential.
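
To make the effect of the recursion limit concrete, here is a minimal sketch of the feedback loop described above: candidates that resolve in one iteration are added to the inputs of the next. This is not the subwiz API. `generate` and `resolves` are hypothetical stand-ins for the model and for DNS resolution, and the toy data is invented purely for illustration.

```python
from typing import Callable, Iterable, Set


def recursive_discovery(
    seeds: Iterable[str],
    generate: Callable[[Set[str]], Set[str]],
    resolves: Callable[[str], bool],
    max_recursion: int = 5,
) -> Set[str]:
    """Return subdomains found beyond the seeds (hypothetical helper)."""
    known = set(seeds)
    found: Set[str] = set()
    for _ in range(max_recursion):
        # Only candidates that are not already known count as new.
        candidates = generate(known) - known
        new = {c for c in candidates if resolves(c)}
        if not new:
            break  # nothing new resolved, so further iterations cannot help
        found |= new
        known |= new  # feed discoveries back in as inputs
    return found


# Toy stand-ins (assumptions, not real model or DNS behaviour).
LIVE = {"dev.www.example.com", "staging.dev.www.example.com"}


def toy_generate(known: Set[str]) -> Set[str]:
    # Propose "dev." and "staging." in front of every known name.
    return {f"{label}.{host}" for host in known for label in ("dev", "staging")}


def toy_resolves(host: str) -> bool:
    return host in LIVE


seeds = {"www.example.com"}
shallow = recursive_discovery(seeds, toy_generate, toy_resolves, max_recursion=1)
deep = recursive_discovery(seeds, toy_generate, toy_resolves, max_recursion=5)
```

In this toy setup, `shallow` contains only `dev.www.example.com`, while `deep` also contains `staging.dev.www.example.com`, which can only be proposed once the first discovery has been fed back as input.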

klaasmeinke

lgtm

EduardoTerres added 2 commits November 18, 2025 13:47

Add benchmark and dataset

01ed882

fix couple titles

4c4b285

EduardoTerres requested a review from klaasmeinke November 18, 2025 12:49

EduardoTerres added 7 commits November 18, 2025 13:56

Fix version naming in the text

e5c124f

nit

254e91f

correct version note

b8b556a

dont be so dramatic

f3f9d5e

small fix in benchmark overview

1f3d76f

subfinder data retrieval note

8436c13

Filter outliers

cc7c7e7

klaasmeinke approved these changes Dec 18, 2025


klaasmeinke merged commit 36e66ed into main Dec 18, 2025
7 checks passed

klaasmeinke deleted the benchmark branch December 18, 2025 10:38

klaasmeinke merged 9 commits into main from benchmark
