How to convert an aspell dictionary to a simple list of words?

Solution 1:

Give this a try:

aspell -d pl dump master | aspell -l pl expand > my.dict


Solution 2:

For some languages, e.g. Italian, expanding is not enough: each expanded line holds several forms of a word, many of them prefixed with an elided article or preposition (l', dell', quell'), so more processing is needed to get a list of plain words.

This is the command I use to get a list of words in Italian (note that it takes a while to run):

aspell -d it dump master | aspell -l it expand | sed "s/\w*'//g;s/ \+/\n/g" |
awk '{ print tolower($0) }' | uniq > wordlist.txt
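The post-processing half of this pipeline can be tried without a dictionary installed. The sketch below wraps it in a shell function and feeds it a hand-written sample; the name clean_words and the sample lines are assumptions made for illustration, and the sed expression relies on GNU sed, as the original command does.

```shell
# clean_words is a hypothetical helper name, not part of aspell.
# It reads expanded aspell output on stdin and prints one lowercase word
# per line: strip "prefix'" runs, split on spaces, lowercase, then collapse
# adjacent duplicates. Assumes GNU sed (\w, \+ and \n in the replacement).
clean_words() {
    sed "s/\w*'//g;s/ \+/\n/g" |
    awk '{ print tolower($0) }' |
    uniq
}

# Hand-written sample shaped like the expanded output (not real aspell data).
sample_out=$(printf '%s\n' "abaco quell'Abaco d'abaco abachi" "Abacuc" | clean_words)
printf '%s\n' "$sample_out"
```

With aspell available, the same function would slot into the original command as: aspell -d it dump master | aspell -l it expand | clean_words > wordlist.txt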

Breaking down the pipeline

Aspell expansion:

aspell -d it dump master | aspell -l it expand > list1

a
ab
abaco Quell'Abaco quell'Abaco quell'abaco Quest'Abaco quest'Abaco quest'abaco D'Abaco d'Abaco d'abaco Coll'Abaco coll'Abaco coll'abaco Sull'Abaco sull'Abaco sull'abaco Nell'Abaco nell'Abaco nell'abaco Dall'Abaco dall'Abaco dall'abaco Dell'Abaco dell'Abaco dell'abaco All'Abaco all'Abaco all'abaco L'Abaco l'Abaco l'abaco Bell'Abaco bell'Abaco bell'abaco Brav'Abaco brav'Abaco brav'abaco abachi
Abacuc
...

Remove any run of word characters up to and including an apostrophe:

sed "s/\w*'//g" list1 > list2

a
ab
abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco Abaco Abaco abaco abachi
Abacuc
...

Put each word on its own line by replacing every run of spaces with a newline:

sed "s/ \+/\n/g" list2 > list3

a
ab
abaco
Abaco
...
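The \+ quantifier and the \n in the replacement are GNU sed extensions, so this step may not work with other sed implementations. A possible alternative, shown here as a sketch on a made-up sample line, is tr with -s, which turns spaces into newlines and squeezes repeated ones:

```shell
# Sample line with a double space, to show that runs of spaces collapse.
sample="abaco  Abaco abachi"

# Translate every space to a newline; -s squeezes repeated newlines into one.
split=$(printf '%s\n' "$sample" | tr -s ' ' '\n')
printf '%s\n' "$split"
```

On real data this would read: tr -s ' ' '\n' < list2 > list3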

Lowercase everything, so that case variants of the same word become identical neighbouring lines that uniq can collapse without a prior sort:

awk '{ print tolower($0) }' list3 > list4

a
ab
abaco
abaco
...

Remove adjacent duplicates:

uniq list4 > list5

a
ab
abaco
abachi
...
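Note that uniq only compares neighbouring lines, so a word that reappears later in the list survives. Whether that happens depends on the dictionary; if it does, sort -u removes every duplicate at the cost of reordering the list. A small sketch of the difference, using a made-up input with a non-adjacent duplicate:

```shell
# Hypothetical input: "abaco" appears twice, but not on neighbouring lines.
lowered=$(printf '%s\n' abaco abachi Abaco | awk '{ print tolower($0) }')

# uniq keeps both copies because they are not adjacent.
with_uniq=$(printf '%s\n' "$lowered" | uniq)

# sort -u sorts first, so every duplicate is removed.
with_sort_u=$(printf '%s\n' "$lowered" | sort -u)

printf 'uniq:\n%s\n\nsort -u:\n%s\n' "$with_uniq" "$with_sort_u"
```

On real data this would read: sort -u list4 > list5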
