TreeTagger installation successful but cannot open .par file
Does anyone know how to resolve this file-reading error in TreeTagger,
a common Natural Language Processing tool used to POS-tag,
lemmatize and chunk sentences?
alvas@ikoma:~/treetagger$ echo 'Hello world!' | cmd/tree-tagger-english
reading parameters ...
ERROR: Can't open for reading: /home/alvas/treetagger/lib/english.par
aborted.
I didn't run into any of the installation problems described in http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/installation-hints.txt. I followed the instructions on the webpage (http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/#Linux) and the installation appeared to complete properly:
alvas@ikoma:~$ mkdir treetagger
alvas@ikoma:~$ cd treetagger
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tree-tagger-linux-3.2.tar.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/tagger-scripts.tar.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/install-tagger.sh
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/dutch-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/german-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/italian-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/spanish-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ wget ftp://ftp.ims.uni-stuttgart.de/pub/corpora/french-par-linux-3.2-utf8.bin.gz
alvas@ikoma:~/treetagger$ sh install-tagger.sh
Linux version of TreeTagger installed.
Tagging scripts installed.
German parameter file (Linux, UTF8) installed.
German chunker parameter file (Linux) installed.
French parameter file (Linux, UTF8) installed.
French chunker parameter file (Linux, UTF8) installed.
Italian parameter file (Linux, UTF8) installed.
Spanish parameter file (Linux, UTF8) installed.
Dutch parameter file (Linux, UTF8) installed.
Path variables modified in tagging scripts.
You might want to add /home/alvas/treetagger/cmd and /home/alvas/treetagger/bin to the PATH variable so that you do not need to specify the full path to run the tagging scripts.
But when I try to test the software, I get these errors:
alvas@ikoma:~/treetagger$ echo 'Hello world!' | cmd/tree-tagger-english
reading parameters ...
ERROR: Can't open for reading: /home/alvas/treetagger/lib/english.par
aborted.
alvas@ikoma:~/treetagger$ echo 'Das ist ein Test.' | cmd/tagger-chunker-german
ERROR: Can't open for reading: /home/alvas/treetagger/lib/german-chunker.par
aborted.
ERROR: Can't open for reading: /home/alvas/treetagger/lib/german.par
aborted.
reading parameters ...
ERROR: Can't open for reading: /home/alvas/treetagger/lib/german.par
aborted.
Solution 1:
I think there are two problems. First, the scripts should have "-utf8" in their name, e.g. cmd/tagger-chunker-german-utf8, because you downloaded the UTF-8 data. Second, tagging and chunking each require their own data file. See the homepage, which has the sections "Parameter files for PC" and "Chunker parameter files for PC": download the files from both sections, then re-run install-tagger.sh.
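Before re-running anything, it may help to confirm which parameter files are actually present in lib/. The following is only a minimal sketch: check_par is a hypothetical helper (not part of TreeTagger), and the names passed to it should be the ones that appear in your own error messages, such as english, german and german-chunker above.

```shell
# check_par DIR NAME...
# Hypothetical helper (not shipped with TreeTagger): for each NAME, report
# whether DIR/lib/NAME.par exists and is readable by the current user.
check_par() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -r "$dir/lib/$name.par" ]; then
            echo "found: $dir/lib/$name.par"
        else
            echo "missing: $dir/lib/$name.par"
        fi
    done
}
```

For the setup in the question, this could be called as check_par "$HOME/treetagger" english german german-chunker. Running ls on the lib/ directory then shows what the installer really created, which should make it clearer whether the "-utf8" scripts are the ones that match your files. Note also that the wget commands in the question do not include an English parameter file, so the English script may have nothing to load at all.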