Pukeko

Pukeko goes into the wild, creates tailored wordlists, and enumerates credentials from a local folder.

Tired of using sort input.txt | uniq > output.txt, I wanted to create a cross-OS script that could read any possible file, take each word once, and list them all in a wordlist.
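The "take each word once" idea can be sketched in a few lines of Python. This is only an illustration, not Pukeko's actual code: the function name `unique_words` and the choice to split on whitespace are assumptions made for the example.

```python
def unique_words(text):
    """Return each distinct whitespace-delimited word once, sorted.

    Hypothetical helper: splitting on whitespace is an assumption,
    Pukeko may tokenise differently.
    """
    return sorted(set(text.split()))


sample = "admin password admin Summer2024 password"
for word in unique_words(sample):
    print(word)
```

Unlike sort | uniq, which de-duplicates whole lines, this works word by word and behaves the same on every operating system.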

Requirements

Python packages (required)

Package	Purpose	Install
`python-magic`	Detect plain text files by content	`pip install python-magic`
`pyxtxt`	Extract text from the supported document and image formats	`pip install pyxtxt`
`openai-whisper`	Transcribe audio and video locally	`pip install openai-whisper`
`colorama`	Colour terminal output on all platforms	`pip install colorama`

Install all at once:

pip install python-magic openai-whisper colorama pyxtxt

For broader format coverage install pyxtxt with optional extras:

pip install "pyxtxt[pdf,docx,presentation,spreadsheet,html,ocr]"

System tools (required for certain file types)

Tool	Purpose	Without it
Tesseract OCR	Extract text from images (`.jpg`, `.png`, `.gif`, `.tif`)	Images will be skipped
ffmpeg	Decode audio and video for Whisper	Audio/video will not work

Pukeko will warn you at startup if any system tool is missing.
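A startup check of this kind can be approximated with `shutil.which` from the standard library. The sketch below is an assumption about how such a check might look, not a copy of what Pukeko does; the executable names `tesseract` and `ffmpeg` are the usual binary names, and `missing_tools` is a hypothetical helper.

```python
import shutil

# Assumed executable names mapped to the consequence listed in the table above.
REQUIRED_TOOLS = {
    "tesseract": "images will be skipped",
    "ffmpeg": "audio/video will not work",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return {tool: consequence} for every tool that is not found on PATH."""
    return {name: effect for name, effect in tools.items() if which(name) is None}


for name, effect in missing_tools().items():
    print(f"WARNING: {name} not found on PATH: {effect}")
```

Passing `which` as a parameter keeps the lookup replaceable, which makes the check easy to exercise without installing or removing anything.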

How to Install

pip install python-magic openai-whisper colorama pyxtxt
Install Tesseract OCR for image support
Install ffmpeg for audio/video support

Troubleshooting:

python-magic: as if the world wasn't complicated enough, there are two 'Magic' libraries. You can find the right one here on GitHub or here on pypi.python.org
openai-whisper: runs fully locally, with no API key and no token limits. The model weights are downloaded once on first use; after that no internet connection is needed. See the GitHub repo for details. A GPU is optional but speeds up transcription significantly.
pyxtxt: supports many formats out of the box; install with extras (pyxtxt[ocr], pyxtxt[pdf], etc.) for broader coverage. See the pyxtxt PyPI page for the full list.

If the situation gets tragic, open an issue and I will help you troubleshoot.

How to use it

Pukeko can currently parse:

Documents & images (via pyxtxt): '.csv', '.doc', '.docx', '.eml', '.epub', '.gif', '.htm', '.html', '.jpeg', '.jpg', '.json', '.log', '.msg', '.odt', '.pdf', '.png', '.pptx', '.ps', '.psv', '.rtf', '.tff', '.tif', '.tiff', '.tsv', '.txt', '.xls', '.xlsx'

Audio & video (via openai-whisper, transcribed locally): '.mp3', '.wav', '.ogg', '.flac', '.m4a', '.aac', '.wma', '.mp4', '.avi', '.mkv', '.mov', '.wmv', '.flv', '.webm', '.m4v'

Plain text files (via python-magic): any file identified as plain text by the system, e.g. .py, .js, .xml, shell scripts, config files, and other text-based formats.
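The three categories above suggest a simple routing step based on file extension. The following is a minimal sketch under that assumption; the names `classify`, `DOCUMENT_EXTS`, and `MEDIA_EXTS` are hypothetical, and the real script may decide differently (for example by inspecting content first).

```python
from pathlib import Path

# Extension sets copied from the lists in this README.
DOCUMENT_EXTS = {
    ".csv", ".doc", ".docx", ".eml", ".epub", ".gif", ".htm", ".html",
    ".jpeg", ".jpg", ".json", ".log", ".msg", ".odt", ".pdf", ".png",
    ".pptx", ".ps", ".psv", ".rtf", ".tff", ".tif", ".tiff", ".tsv",
    ".txt", ".xls", ".xlsx",
}
MEDIA_EXTS = {
    ".mp3", ".wav", ".ogg", ".flac", ".m4a", ".aac", ".wma", ".mp4",
    ".avi", ".mkv", ".mov", ".wmv", ".flv", ".webm", ".m4v",
}


def classify(path):
    """Name the backend that would handle a file, judged by extension only."""
    ext = Path(path).suffix.lower()
    if ext in DOCUMENT_EXTS:
        return "pyxtxt"
    if ext in MEDIA_EXTS:
        return "openai-whisper"
    # Anything else would need a content check (python-magic) to confirm
    # that it really is plain text before being read.
    return "python-magic"


for name in ("Report.PDF", "interview.mp3", "deploy.sh"):
    print(name, "->", classify(name))
```

Lower-casing the suffix means `Report.PDF` and `report.pdf` are routed the same way.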

The -model flag lets you choose the Whisper model size for audio/video transcription:

Model	Speed	Accuracy
`tiny`	fastest	lowest
`base`	fast	low
`small`	balanced	good (default)
`medium`	slow	very good
`large`	slowest	best

Example: python Pukeko.py -input /path/to/files -output wordlist.txt -model medium
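For readers curious how such single-dash flags can be declared, here is a minimal `argparse` sketch that accepts the same command line. It is an illustration only: whether `-input` and `-output` are mandatory in Pukeko is an assumption, and `build_parser` is a hypothetical name. The default of `small` comes from the model table above.

```python
import argparse

WHISPER_MODELS = ["tiny", "base", "small", "medium", "large"]


def build_parser():
    """Build a parser for the three flags shown in the example (a sketch)."""
    parser = argparse.ArgumentParser(prog="Pukeko.py")
    # Marking these as required is an assumption made for this example.
    parser.add_argument("-input", required=True, help="folder to scan")
    parser.add_argument("-output", required=True, help="wordlist file to write")
    parser.add_argument("-model", default="small", choices=WHISPER_MODELS,
                        help="Whisper model size for audio/video")
    return parser


args = build_parser().parse_args(
    ["-input", "/path/to/files", "-output", "wordlist.txt", "-model", "medium"]
)
print(args.input, args.output, args.model)
```

Restricting `-model` with `choices` makes a mistyped model name fail immediately rather than partway through a long transcription run.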

Have a look at my YouTube presentation:

Future development

In my spare time, my TODO list is:

add a -URL option to create wordlists from a target web page, like CeWL
add a -site option to create wordlists from a target website
add a Leet (or 1337) option, also known as eleet or leetspeak (so many passwords are weak because of leetspeak)
add multi-language support (pip install alphabet-detector)
add highlighting of HotWords in strings
add e-mail addresses to HotWords
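As an illustration of what the planned Leet option might produce, the sketch below expands a word into its leetspeak variants. Nothing here exists in Pukeko yet: the substitution table `LEET` and the function `leet_variants` are hypothetical, and a real implementation would probably need a larger table and a cap on the number of variants.

```python
from itertools import product

# Hypothetical, deliberately small substitution table.
LEET = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def leet_variants(word):
    """Return every mix of original and substituted characters for a word."""
    choices = [
        (ch, LEET[ch.lower()]) if ch.lower() in LEET else (ch,)
        for ch in word
    ]
    return ["".join(combo) for combo in product(*choices)]


for variant in leet_variants("pass"):
    print(variant)
```

The number of variants doubles with every substitutable character, so long words grow the wordlist quickly.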
