Wiki-Analyzer is a program designed to retrieve data from Wikipedia and create a Word2Vec model based on that data.
- Clone the repository:
    git clone https://github.com/XXXFQ/Wiki-Analyzer.git
    cd Wiki-Analyzer
- Install the required packages:
    poetry install
Run the following command to download the latest Japanese Wikipedia data:
    curl https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2 -o jawiki-latest-pages-articles.xml.bz2

Use the following command to extract article content from the downloaded XML data:

    python -m wikiextractor.WikiExtractor jawiki-latest-pages-articles.xml.bz2

To build the database file, run the appropriate script based on your operating system:
- For Windows: Run Wiki-Analyzer.cmd.
- For Linux: Run Wiki-Analyzer.sh.
The generated database file will be saved in the data directory.
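
To inspect what the extraction step produced before building the database, the extracted articles can be read back with a few lines of Python. The following is a minimal sketch, not part of Wiki-Analyzer itself. It assumes WikiExtractor was run with its default settings, so that the output lands in a `text` directory containing subdirectories of `wiki_NN` files in which each article is wrapped in `<doc id="..." url="..." title="...">` and `</doc>` lines. The function names `iter_docs` and `iter_extracted` are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Opening line that WikiExtractor's default (non-JSON) output puts before each article.
DOC_OPEN = re.compile(r'<doc id="[^"]*" url="[^"]*" title="(?P<title>[^"]*)">')


def iter_docs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) for every <doc> ... </doc> block in the given lines."""
    title = None
    body = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = DOC_OPEN.match(line)
        if match:
            # Start of a new article: remember its title and reset the body.
            title = match.group("title")
            body = []
        elif line.startswith("</doc>"):
            # End of the current article: emit it.
            if title is not None:
                yield title, "\n".join(body).strip()
            title = None
        elif title is not None:
            body.append(line)


def iter_extracted(root: str = "text") -> Iterator[Tuple[str, str]]:
    """Walk the assumed default output layout (root/<subdir>/wiki_NN)."""
    for path in sorted(Path(root).glob("*/wiki_*")):
        with path.open(encoding="utf-8") as handle:
            yield from iter_docs(handle)


if __name__ == "__main__":
    # Print the titles of the first five extracted articles, if any exist.
    for index, (title, text) in enumerate(iter_extracted("text")):
        print(title, len(text))
        if index >= 4:
            break
```

Running the file from the repository root prints the title and character count of the first few articles. If nothing is printed, the output directory is probably somewhere other than `text`.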
- Python: 3.10
Copyright (C) 2025 ARM