Vplatform v2#49

Open
rom1504 wants to merge 7 commits into main from vplatform_v2

rom1504 (Owner) commented Nov 15, 2023

No description provided.

does not actually work well
rom1504 mentioned this pull request Nov 15, 2023
Comment thread cc2dataset/main.py
def is_link_suitable(link, extractors):
    """Check if link is valid given an extractor."""
    try:
        return any([ie.suitable(link) for ie in extractors])

rom1504 (Owner, Author) commented:

This wouldn't make a huge difference, but we can at least stop early as soon as any extractor accepts the link.
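
A minimal sketch of the early-stop idea, assuming it means passing a generator expression to `any()` instead of a list, so evaluation stops at the first extractor that accepts the link. `KeywordExtractor` is a hypothetical stand-in for a yt-dlp extractor (only the `suitable` method mirrors the snippet), and the `try` block from the original is left out because its `except` clause is not shown in the diff.

```python
class KeywordExtractor:
    """Hypothetical stand-in for an extractor; only `suitable` mirrors the snippet."""

    def __init__(self, keyword):
        self.keyword = keyword
        self.calls = 0  # how many times suitable() was invoked

    def suitable(self, link):
        self.calls += 1
        return self.keyword in link


def is_link_suitable(link, extractors):
    """Return True as soon as one extractor accepts the link."""
    # A generator expression lets any() short-circuit;
    # a list comprehension would call every extractor before any() runs.
    return any(ie.suitable(link) for ie in extractors)


extractors = [
    KeywordExtractor("video"),
    KeywordExtractor("clip"),
    KeywordExtractor("stream"),
]
accepted = is_link_suitable("https://example.com/video/1", extractors)
calls = [ie.calls for ie in extractors]  # only the first extractor was consulted
```

With the list form, all three extractors would be called even though the first one already accepts the link.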

Comment thread cc2dataset/main.py
import unicodedata

generic_extractors = [yt_dlp.extractor.generic.GenericIE, yt_dlp.extractor.lazy_extractors.GenericIE]
porn_patterns = ["porn", "adult", "xxx", "xvideos", "xhamster", "redtube", "xtube", "xstream", "xfileshare", "sex"]

Add "tnaflix" to the list of porn patterns.

We also need to exclude http://xnxx.com, i.e. add "xnxx" to the list of porn patterns.
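
A rough sketch of what the two suggested additions could look like. The diff does not show how `porn_patterns` is consumed, so `matches_porn_pattern` is a hypothetical helper that assumes a case-insensitive substring check against the link.

```python
porn_patterns = [
    "porn", "adult", "xxx", "xvideos", "xhamster", "redtube",
    "xtube", "xstream", "xfileshare", "sex",
    # additions suggested in this review thread
    "tnaflix", "xnxx",
]


def matches_porn_pattern(link, patterns=porn_patterns):
    """Hypothetical helper: True if any pattern occurs in the lowercased link."""
    lowered = link.lower()
    return any(pattern in lowered for pattern in patterns)


blocked = matches_porn_pattern("http://xnxx.com")
allowed = not matches_porn_pattern("https://example.com/video")
```

Substring matching is coarse: a short pattern such as "sex" also matches unrelated hosts like "essex.ac.uk", so the list trades some false positives for simplicity.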

Projects

Status: Important to finish

2 participants