@vladimish commented Dec 30, 2025

Hello! I stumbled upon this project while looking for a way to dump a few wikis to my local drive. I tried to dump one with the following command:

wikiteam3dumpgenerator --api https://whiteknuckle.miraheze.org/w/api.php --xml --images

However, I ran into an error near the end of the export process:

->  Downloaded 2850 pages

    Module:Infobox/doc, 2 edits
    Module:Key, 1 edit
    Module:Mbox, 1 edit
    Module:Mbox/data, 1 edit
    Module:Mbox/data/doc, 1 edit
    Module:Mbox/doc, 1 edit
    Module:Namespace detect, 1 edit
    Module:Namespace detect/config, 1 edit
    Module:Namespace detect/config/doc, 1 edit
    Module:Namespace detect/data, 1 edit
           
->  Downloaded 2860 pages

    Module:Namespace detect/data/doc, 1 edit
    Module:Namespace detect/doc, 1 edit
    Module:Navbox, 2 edits
    Module:Navbox/doc, 1 edit
    Module:Quote, 1 edit
    Module:Quote/doc, 1 edit
    Module:RandomImage, 1 edit
XML dump saved at... whiteknuckle.miraheze.org_w-20251229-history.xml
Retrieving image filenames
Using API to retrieve image names...
    Found 1675 images                                            so far...
Sorting image filenames (1675 images)...
Done
Image metadata (images.txt) saved at: whiteknuckle.miraheze.org_w-20251229-images.txt
Estimated size of all images (images.txt): 1241180249 bytes (1.16 GiB)
--assert_max_images: None, passed
--assert_max_images_bytes: None, passed
Retrieving images...
Creating "whiteknuckle.miraheze.org_w-20251229-wikidump/images" directory
Creating "whiteknuckle.miraheze.org_w-20251229-wikidump/images_mismatch" directory
Traceback (most recent call last):
  File "/Users/vladimish/3rdparty/wikiteam3/.env/bin/wikiteam3dumpgenerator", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/Users/vladimish/3rdparty/wikiteam3/.env/lib/python3.14/site-packages/wikiteam3/dumpgenerator/__init__.py", line 4, in main
    DumpGenerator()
    ~~~~~~~~~~~~~^^
  File "/Users/vladimish/3rdparty/wikiteam3/.env/lib/python3.14/site-packages/wikiteam3/dumpgenerator/dump/generator.py", line 86, in __init__
    DumpGenerator.createNewDump(config=config, other=other)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/vladimish/3rdparty/wikiteam3/.env/lib/python3.14/site-packages/wikiteam3/dumpgenerator/dump/generator.py", line 119, in createNewDump
    Image.generate_image_dump(
    ~~~~~~~~~~~~~~~~~~~~~~~~~^
        config=config, other=other, images=images, session=other.session
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/Users/vladimish/3rdparty/wikiteam3/.env/lib/python3.14/site-packages/wikiteam3/dumpgenerator/dump/image/image.py", line 267, in generate_image_dump
    check_response(r)
    ~~~~~~~~~~~~~~^^^
  File "/Users/vladimish/3rdparty/wikiteam3/.env/lib/python3.14/site-packages/wikiteam3/dumpgenerator/dump/image/image.py", line 39, in check_response
    raise RuntimeError("Found cf-polished header in response, use --bypass-cdn-image-compression to bypass it")
RuntimeError: Found cf-polished header in response, use --bypass-cdn-image-compression to bypass it

Following the advice in the error message to use --bypass-cdn-image-compression, I re-ran the dumper like this:

wikiteam3dumpgenerator --api https://whiteknuckle.miraheze.org/w/api.php --xml --images --bypass-cdn-image-compression

Unfortunately, the result was the same. Apparently, setting the _wiki_t parameter does not prevent Cloudflare caching in some configurations, or I have some other network issue.
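To illustrate the idea behind a cache-busting parameter such as _wiki_t, here is a minimal sketch. It assumes the parameter simply carries a changing value (a millisecond timestamp here) so that each image URL looks new to the CDN. The helper name `add_cache_buster` and its arguments are hypothetical; this is not the code wikiteam3 uses.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_buster(url, param="_wiki_t", value=None):
    """Return url with a changing query parameter appended (hypothetical helper)."""
    if value is None:
        # Assumption: a millisecond timestamp is "unique enough" per request.
        value = int(time.time() * 1000)
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any stale copy of `param`.
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != param
    ]
    query.append((param, str(value)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Whether a CDN treats such a URL as a cache miss, or still applies image optimization to it, depends on the zone's configuration, which would be consistent with the flag working on one machine or day and not another.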
Either way, the original discussion on the PR that introduced the --bypass-cdn-image-compression flag suggests that showing a warning here would have been preferable. I fully agree: I doubt most users mind CDN compression on the images, and only those who want to bypass it should need the flag.
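A rough sketch of the behavior I have in mind, assuming the check only needs the response headers. The function below is a hypothetical variant, not the current check_response in image.py, and the `bypass_requested` argument is my own invention for illustration: it warns by default and only raises when the user explicitly asked for a bypass that did not take effect.

```python
import warnings


def check_response(headers, bypass_requested=False):
    """Hypothetical variant: warn about cf-polished unless a bypass was requested.

    Returns True when the response looks unmodified, False after warning.
    """
    # Header names are case-insensitive, so normalise a plain mapping first.
    lowered = {k.lower(): v for k, v in headers.items()}
    if "cf-polished" not in lowered:
        return True
    message = "Found cf-polished header in response"
    if bypass_requested:
        # The user asked for original images, so failing loudly is justified.
        raise RuntimeError(message + " even though a bypass was requested")
    warnings.warn(message + ", use --bypass-cdn-image-compression to bypass it")
    return False
```

With this shape, a default run would finish the image dump with a warning per affected response, while a run with the flag would still stop if the CDN keeps recompressing images.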

@yzqzss (Member) commented Dec 30, 2025

Sorry, I can't reproduce the issue.
--bypass-cdn-image-compression is working fine with the wiki on my machine, hmmm.

@vladimish (Author)

Damn, it's now working for me for the same wiki, both with and without --bypass-cdn-image-compression. The logs I attached to the PR were taken yesterday, so apparently something has changed in either my network config or Cloudflare's since then.
In that case, this doesn't seem that critical. If you're fine with the current default behavior of raising an exception when the flag is not set, this PR can be closed for now, I suppose.
Sorry for taking up your time. Have a happy New Year!
