Wiki-Analyzer

Wiki-Analyzer is a program designed to retrieve data from Wikipedia and create a Word2Vec model based on that data.

Setup

  1. Clone the repository:

    git clone https://github.com/XXXFQ/Wiki-Analyzer.git
    cd Wiki-Analyzer

  2. Install the required packages (this step requires Poetry):

    poetry install

Downloading Wikipedia Data

Run the following command to download the latest Japanese Wikipedia data:

    curl https://dumps.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2 -o jawiki-latest-pages-articles.xml.bz2
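The dump is a large file, so it can be worth confirming that what was saved is really a bz2-compressed XML file (and not, say, an HTML error page) before starting the extraction. The following is a minimal sketch using only Python's standard bz2 module. The function name `looks_like_bz2_xml` and the probe size are illustrative and not part of Wiki-Analyzer, and the check only inspects the beginning of the file, so it will not detect a download that was cut off partway through.

```python
import bz2


def looks_like_bz2_xml(path: str, probe_bytes: int = 1024) -> bool:
    """Return True if `path` is a bz2 archive whose content starts like XML."""
    try:
        with bz2.open(path, "rb") as fh:
            head = fh.read(probe_bytes)
    except (OSError, EOFError):
        # Not a bz2 stream, or the stream ends before any data is decoded.
        return False
    # An XML dump should begin with an opening tag or declaration.
    return head.lstrip().startswith(b"<")
```

For example, `looks_like_bz2_xml("jawiki-latest-pages-articles.xml.bz2")` should return `True` for a correctly downloaded dump.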

Extracting Articles from the Dump Data

Use the following command to extract article content from the downloaded XML data:

    python -m wikiextractor.WikiExtractor jawiki-latest-pages-articles.xml.bz2
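In its default mode, WikiExtractor writes plain-text files in which each article is wrapped in a `<doc ... title="...">` ... `</doc>` block; recent versions place these files under a directory named `text` (for example `text/AA/wiki_00`). As a rough sketch of how such output could be read back, the helper below yields one `(title, text)` pair per article. The name `iter_docs` is hypothetical and is not taken from Wiki-Analyzer, and the sketch assumes the default (non-JSON) output format.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches the title attribute inside a <doc ...> opening tag.
_TITLE_RE = re.compile(r'title="([^"]*)"')


def iter_docs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (title, text) for each <doc ...>...</doc> block in `lines`."""
    title = None
    body = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("<doc "):
            # Start of a new article: remember its title, reset the body.
            match = _TITLE_RE.search(line)
            title = match.group(1) if match else ""
            body = []
        elif line.startswith("</doc>"):
            # End of the article: emit what was collected.
            if title is not None:
                yield title, "\n".join(body).strip()
            title = None
        elif title is not None:
            body.append(line)
```

Because the function accepts any iterable of lines, an open file object can be passed directly, e.g. `iter_docs(open("text/AA/wiki_00", encoding="utf-8"))`.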

Building the Database File

To build the database file, run the appropriate script based on your operating system:

  • For Windows: Run Wiki-Analyzer.cmd.
  • For Linux: Run Wiki-Analyzer.sh.

The generated database file will be saved in the data directory.
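This README does not describe the format of the database file or what the scripts do internally. Purely as an illustration of collecting extracted articles into a single file, the sketch below uses SQLite through Python's standard sqlite3 module. The choice of SQLite, the table layout, and the name `save_articles` are all assumptions made for this example and may not match what Wiki-Analyzer actually does.

```python
import sqlite3
from typing import Iterable, Tuple


def save_articles(db_path: str, articles: Iterable[Tuple[str, str]]) -> int:
    """Store (title, text) pairs in a SQLite file and return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            # Hypothetical schema: one row per article, keyed by title.
            conn.execute(
                "CREATE TABLE IF NOT EXISTS articles ("
                "title TEXT PRIMARY KEY, body TEXT NOT NULL)"
            )
            # A repeated title overwrites the earlier row.
            conn.executemany(
                "INSERT OR REPLACE INTO articles (title, body) VALUES (?, ?)",
                articles,
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM articles").fetchone()
        return count
    finally:
        conn.close()
```

Keying on the title keeps the sketch idempotent: running it twice over the same input leaves one row per article rather than duplicates.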

Environment Requirements

  • Python: 3.10

Copyright

Copyright (C) 2025 ARM
