Open
Description
I sometimes run into domains for which very few subdomain examples are available: for example, only www.some_domain.com or subdomain.com is known, and there is no other data.
Two simple ideas arise:
- Before tokenization, add fake subdomains that give the model additional information and possibly improve search results, for example [git, gitlab, ...] or [ftp, mail, ...]. Which subdomains to choose seems like a difficult question. Perhaps it is worth considering the distribution of the training data.
- Add an option to automatically restart subdomain scanning using information from newly found subdomains. Sure, you could take the new list of domains, paste it into example_input.txt, and restart, but that doesn't seem very convenient. The search could continue in a loop until no new domains appear.
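Both ideas can be sketched roughly as follows. This is a minimal illustration, not the tool's actual API: `SEED_LABELS`, `add_seed_subdomains`, `scan_until_stable`, and the `scan_once` callback (standing in for a single scan run) are all hypothetical names, and the seed labels are just the examples from this issue.

```python
from typing import Callable, Iterable, Set

# Hypothetical seed labels, taken from the examples in this issue.
SEED_LABELS = ("git", "gitlab", "ftp", "mail")


def add_seed_subdomains(known: Iterable[str], apex: str,
                        labels: Iterable[str] = SEED_LABELS) -> Set[str]:
    """Return the known hosts plus one fake '<label>.<apex>' host per label."""
    hosts = {h.strip().lower() for h in known if h.strip()}
    hosts.update(f"{label}.{apex.lower()}" for label in labels)
    return hosts


def scan_until_stable(seeds: Iterable[str],
                      scan_once: Callable[[Set[str]], Iterable[str]],
                      max_rounds: int = 10) -> Set[str]:
    """Feed results back into scan_once until a round finds nothing new.

    scan_once is an assumed callback: it takes the hosts known so far and
    returns the hosts discovered in one scan. max_rounds bounds the loop
    in case each round keeps producing new hosts.
    """
    found = {s.strip().lower() for s in seeds if s.strip()}
    for _ in range(max_rounds):
        discovered = {h.strip().lower() for h in scan_once(set(found))}
        new = discovered - found
        if not new:
            break  # fixed point reached: no new domains appeared
        found |= new
    return found
```

Note that in this sketch any fake seed hosts passed in as `seeds` also end up in the returned set, so they would need to be filtered out or verified before being reported as real results.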