This is an Arabic stemming algorithm written in the Snowball language. It offers light stemming and text normalization.
You can download Snowball and the test vocabulary (voc) automatically using:
$ make download
- Install the Python requirements:
$ sudo pip install -r requirements.txt
Instead of running make download, you can fetch the files manually by:
- extracting snowball into the root folder {Root}/snowball
- extracting snowball-data/arabic/voc.txt.gz into {Root}/test_data/voc.txt
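
The manual vocabulary step above can also be scripted. The following is a minimal sketch, not part of the project's Makefile: the helper name `extract_voc` is hypothetical, and it assumes the snowball-data archive has been unpacked under the repository root so that `snowball-data/arabic/voc.txt.gz` exists there.

```python
import gzip
import shutil
from pathlib import Path


def extract_voc(src: Path, dest: Path) -> Path:
    """Decompress a gzip-compressed vocabulary file into dest."""
    # Create test_data/ (or whatever parent folder dest has) if it is missing.
    dest.parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest


if __name__ == "__main__":
    root = Path(".")  # assumption: the script is run from {Root}
    src = root / "snowball-data" / "arabic" / "voc.txt.gz"  # assumed location
    if src.exists():
        extract_voc(src, root / "test_data" / "voc.txt")
```

The sketch only covers the vocabulary file; Snowball itself still has to be extracted into {Root}/snowball as described above.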
- Build the light stemmer:
$ make build
- Build the root-based stemmer:
$ make build_root_based_stemmer
- Run the light stemmer:
$ make run
الطالب
طالب
- Run the root-based stemmer:
$ make run_root
الطالب
طلب
We configured the tests to run against the snowball-data Arabic sample.
- time:
$ make time
- grouping effect:
$ make grouping
- all:
$ make test
- Test SAS with the golden Arabic corpus:
$ make test_arabicstemmer
- Test the ISRI Stemmer with the golden Arabic corpus:
$ make test_isri
- dist the light stemmer to the available languages:
$ make dist
- dist the root-based stemmer to the available languages:
$ make dist_rooter
Snowball Arabic (Stemmer & rooter) Results

| Word | Stem | Root |
|---|---|---|
| طفل | طفل | طفل |
| اطفال | اطفال | طفل |
| الاطفال | اطفال | طفل |
| اطفالكم | اطفال | طفل |
| فأطفالكم | اطفال | طفل |
| اطفالهم | اطفال | طفل |
| والاطفال | اطفال | طفل |
| فاطفالهم | اطفال | طفل |
| وطفل | طفل | طفل |
| الطفولة | طفول | طفل |
| والطفلتين | طفل | طفل |
| طفلتان | طفل | طفل |
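
To make the difference between the two columns concrete, the rows above can be grouped by their output. This is an illustrative sketch only: it does not call the stemmer, the helper `group_by` is hypothetical, and the data is copied by hand from the table.

```python
from collections import defaultdict

# (word, stem, root) rows copied from the results table above.
ROWS = [
    ("طفل", "طفل", "طفل"),
    ("اطفال", "اطفال", "طفل"),
    ("الاطفال", "اطفال", "طفل"),
    ("اطفالكم", "اطفال", "طفل"),
    ("فأطفالكم", "اطفال", "طفل"),
    ("اطفالهم", "اطفال", "طفل"),
    ("والاطفال", "اطفال", "طفل"),
    ("فاطفالهم", "اطفال", "طفل"),
    ("وطفل", "طفل", "طفل"),
    ("الطفولة", "طفول", "طفل"),
    ("والطفلتين", "طفل", "طفل"),
    ("طفلتان", "طفل", "طفل"),
]


def group_by(rows, column):
    """Map each distinct value in the given column to the words that produce it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[0])
    return dict(groups)


stem_groups = group_by(ROWS, 1)  # grouped by light stem
root_groups = group_by(ROWS, 2)  # grouped by root

print(len(stem_groups), len(root_groups))  # 3 1
```

On this sample the light stemmer leaves the twelve words in three groups, while the root-based stemmer merges all of them into one.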