Note: this is no longer a wiki, only a static archive of the original!

Home

Software

This page contains the software for evaluation and validation of data used in the CoNLL 2007 Shared Task, as well as links to other software that participants may have a use for. It is to a very large extent based on the software page from the CoNLL 2006 Shared Task. We gratefully acknowledge the work of our predecessors, Sabine Buchholz, Erwin Marsi, Yuval Krymolowski, and Amit Dubey.

Contents

External software

eval07.pl

CoNLL-07 evaluation script:

This script evaluates a system output with respect to a gold standard. Both files should be in UTF-8 encoded CoNLL-07 tabular format.

Unlike last year, punctuation tokens (those in which every character has the Unicode category property "Punctuation") are included in scoring by default. (The -p flag can be used to exclude them.)
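The Unicode criterion for punctuation tokens can be sketched in a few lines of Python (an illustration of the test, not the actual Perl code in eval07.pl):

```python
import unicodedata

def is_punctuation_token(token):
    """True if every character's Unicode general category starts with 'P'
    (the punctuation categories Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )
```

A token like "..." or a guillemet pair counts as punctuation; a token containing any letter or digit does not.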

The output breaks down the errors according to their type and context.

Optional parameters:
-o FILE : output: print output to FILE (default is standard output)
-q : quiet: only print overall performance, without the details
-b : evalb: produce output in a format similar to evalb (http://nlp.cs.nyu.edu/evalb/ ); use together with -q
-p : punctuation: do not score on punctuation (default is to score)
-d : deriv: do not score on DERIV links (default is to score)
-v : version: show the version number
-h : help: print this help text and exit

Download latest release of eval07.pl.

History

This is the official CoNLL-07 shared task evaluation script. It computes the official scoring metric "labeled attachment score" and also provides details useful for error analysis. It was first released on 19 December 2006.
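The core metric is easy to state: the labeled attachment score is the fraction of tokens whose predicted head and dependency label both match the gold standard, and the unlabeled attachment score drops the label requirement. A minimal Python sketch of the idea (not the script itself, which also handles punctuation filtering and the error-analysis breakdowns):

```python
def attachment_scores(gold, system):
    """gold and system: parallel lists of (head, deprel) pairs, one per token.
    Returns (labeled_attachment_score, unlabeled_attachment_score)."""
    assert len(gold) == len(system) and gold
    # labeled: head AND dependency label must both match the gold standard
    labeled = sum(1 for g, s in zip(gold, system) if g == s)
    # unlabeled: only the head has to match
    unlabeled = sum(1 for g, s in zip(gold, system) if g[0] == s[0])
    n = len(gold)
    return labeled / n, unlabeled / n
```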

It is based on the CoNLL-X evaluation script, which was first released on 9 January 2006. An improved version was released on 22 January 2006. The first release required Perl v5.8; however, that version of Perl contains bugs in its Unicode handling, so the new release of the evaluation script requires at least Perl v5.8.1. Given at least Perl v5.8.1, the new release gives scores identical to the first release, but it provides more output:

A version with additional output for error analysis was released on 8 February 2006.

A version with a new option for significance testing (-b), with "label accuracy score" and one more error analysis table (thanks to Prokopis Prokopidis) was released on 12 March 2006. Use the -b option as follows:

perl eval.pl -b -q -g GOLD_FILE -s SYSTEM_FILE1 > system1.txt
perl eval.pl -b -q -g GOLD_FILE -s SYSTEM_FILE2 > system2.txt
perl compare.pl system1.txt system2.txt

where compare.pl is Dan Bikel's (http://www.cis.upenn.edu/~dbikel/ ) Randomized Parsing Evaluation Comparator, a statistical significance tester for evalb output (http://www.cis.upenn.edu/~dbikel/software.html#comparator ). Its output reports "recall" and "precision", but for the output of eval.pl these are really "unlabeled attachment" and "labeled attachment" respectively.

The main difference in eval07.pl is that the punctuation scoring is turned on by default and the -p option is used to turn it off if needed. Also the -d option was added to ignore the DERIV links during scoring.

validateFormat.py

usage:

purpose:

args:

options:

This script can be used to check files for compliance with the CoNLL-X shared task format. It prints detailed warnings and error messages to STDERR. The returned status code indicates whether the files passed the test (status 0) or not (status 1). The requirements for training data (-t train) are stricter than for system-produced output (-t system): errors in the (P)HEAD and (P)DEPREL columns cause status 1 for training data but not for system output (the STDERR messages are the same). System output is allowed but not required to have the PHEAD and PDEPREL columns.
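As an illustration of the kind of per-line check involved, here is a heavily simplified sketch (not the real validator, which checks far more than this):

```python
def check_token_line(line):
    """Heavily simplified check of one CoNLL token line (illustration only):
    8 or 10 tab-separated columns (PHEAD/PDEPREL are optional in system
    output), with integer ID and HEAD fields. Returns a list of problems."""
    cols = line.rstrip("\n").split("\t")
    problems = []
    if len(cols) not in (8, 10):
        problems.append("expected 8 or 10 columns, found %d" % len(cols))
        return problems
    if not cols[0].isdigit():
        problems.append("ID column is not an integer: %r" % cols[0])
    if not cols[6].isdigit():
        problems.append("HEAD column is not an integer: %r" % cols[6])
    return problems
```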

You can suppress warnings by using the -s option. E.g. if you already know that your system sometimes predicts cycles in the dependency structure, you could call the script with:

./validateFormat.py -t system -s cycle systems_output.conll

You cannot suppress error messages.
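The cycle condition behind the "cycle" warning can be stated simply: starting from any token, repeatedly follow HEAD pointers; if you revisit a token before reaching the artificial root (HEAD 0), the dependency structure contains a cycle. A sketch of that check in Python (an illustration of the idea, not the code in validateFormat.py):

```python
def find_cycles(heads):
    """heads[i] is the 1-based head of token i+1; 0 marks the root.
    Returns the set of 1-based token ids that lie on a cycle."""
    on_cycle = set()
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        # follow head pointers until we hit the root or revisit a node
        while node != 0 and node not in seen:
            seen.add(node)
            node = heads[node - 1]
        if node != 0:
            # we revisited 'node', so it lies on a cycle;
            # walk the cycle once more to collect exactly its members
            cycle = {node}
            nxt = heads[node - 1]
            while nxt != node:
                cycle.add(nxt)
                nxt = heads[nxt - 1]
            on_cycle |= cycle
    return on_cycle
```

Tokens that merely lead into a cycle without being on it (such as token 1 in the last test below) are correctly excluded.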

Download validateFormat.py (version 1.4)

Download SharedTaskCommon.py which is needed by validateFormat.py

Note: I have fixed a bug in version 1.2 that caused validateFormat.py to complain if a file followed the Windows end-of-line convention (of using \r\n)

Note: The script may not be as strict as it needs to be: for example it does not complain when there are extra tabs at the end of a line, or extra spaces in blank lines.

tabs2blanks.py

usage:

purpose:

options:

Download tabs2blanks.py.

blanks2tabs.py

usage:

purpose:

options:

Download blanks2tabs.py.

conlltab2dot.py

usage:

purpose:

examples:

options:

Download conlltab2dot.py.
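The script's details are not documented here, but judging from its name it renders CoNLL tabular sentences as Graphviz dot graphs. A minimal sketch of that idea (the actual script's options and output almost certainly differ):

```python
def conll_to_dot(tokens):
    """tokens: list of (id, form, head) triples for one sentence, with
    1-based token ids and head 0 for the artificial root.
    Returns a Graphviz dot digraph as a string (forms are assumed not to
    contain double quotes)."""
    lines = ["digraph sentence {", '  0 [label="ROOT"];']
    for tid, form, _head in tokens:
        lines.append('  %d [label="%s"];' % (tid, form))
    for tid, _form, head in tokens:
        lines.append("  %d -> %d;" % (head, tid))
    lines.append("}")
    return "\n".join(lines)
```

Piping the result through `dot -Tps` would give a drawing of the dependency tree.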

Treebank conversion software

For the CoNLL-X shared task, 13 treebanks were converted from their original formats to the data format used in the shared task. The software to do that was developed by several different people and for practical reasons, no effort was made to standardize it. We provide this software here without any warranty but hope that it will be useful to other researchers. For general questions about this page, please contact conll06st@uvt.nl. For questions about specific software, however, please contact the respective author directly.

Amit Dubey's software to convert the Cast3LB, Sinica and TIGER treebanks

Download tarred and zipped software: dubey-software.tar.bz2. This tarball is zipped using bzip2, and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar).

This software is written in OCaml (http://caml.inria.fr/ocaml/index.en.html ). The tarball contains library functions for the conversion of other treebanks as well.

See tools/nlX4/README.CONLL for general information.

You can contact Amit Dubey at "adubey at inf dot ed dot ac dot uk"

Erwin Marsi's software to convert the DDT, Alpino, Japanese Verbmobil and Talbanken05 treebanks

Download tarred and zipped software: erwins-software.tar.bz2. This tarball is zipped using bzip2, and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar).

The software is organized in the same way as the training and test data.

See the files data/<language>/<treebank>/README for general information.

The treebank-specific conversion software is in data/<language>/<treebank>/tools/.

Some general software is in tools/.

The shell scripts that control the conversion processes are data/<language>/<treebank>/tools/build.sh (go there and execute build.sh).

The main conversion scripts are data/<language>/<treebank>/tools/<treebank>2tab.py (written in Python).

The treebank files are expected in data/<language>/<treebank>/treebank/.

The resulting files can be found in data/<language>/<treebank>/dist/.

There are also some log files that were created during the original conversion. They might be useful for comparing against your log files as a sanity check that your conversion worked the same way as ours.

The Dutch Alpino treebank was not only converted but also retagged. The software used for tagging is MBT (http://ilk.uvt.nl/mbt/ ). You will need to get the following files and put them into these locations:

data/dutch/alpino/tools/mbt-nl/gen-optimal-mbt
data/dutch/alpino/tools/mbt-nl/Mbt
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.5paxes
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.known.ddwfWawa
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.lex
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.lex.ambi.20
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.settings
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.top50
data/dutch/alpino/tools/mbt-nl/wotan.all.tag.unknown.chnppddwFawasss

Other Software

Other software to convert the PADT, PDT, Bosque, Metu-Sabanci and SDT treebanks, and for making the training-test-split for the BulTreeBank

Download tarred and zipped software: other-software.tar.bz2. This tarball is zipped using bzip2, and can be unpacked with either 'tar xjf filename' or 'tar xyf filename' (depending on your version of tar).

The software is organized in the same way as the training and test data.

The treebank-specific conversion software is in data/<language>/<treebank>/tools/.

Some general software is in tools/.

The Makefiles that control the conversion processes are data/<language>/<treebank>/Makefile (go there and type "make"). They also contain some comments about how the training-test-split was determined.

The conversion scripts for PDT, Bosque, and Metu-Sabanci are data/<language>/<treebank>/tools/<treebank>2MALT.py (written in Python). The other conversions were done by different people, so there is no pattern in the naming.

The Bosque, SDT and BulTreeBank treebank files are expected in data/<language>/<treebank>/treebank/. The Metu-Sabanci treebank files are expected in data/<language>/<treebank>/tb_corrected because that's what "tb_corrected_versionConll.zip" (the version of the treebank that we used) expands to. For PDT, you have to modify the Makefile to point it to the location of your PDT CD. The PADT Makefile does not control the complete conversion process: you will have to convert the treebank files individually (using data/<language>/<treebank>/tools/padt2tab.py) and put the results into the directories data/<language>/<treebank>/train/ and data/<language>/<treebank>/test/, respectively.

The resulting files can be found in data/<language>/<treebank>/dist/.

There are also some log files that were created during the original conversion. They might be useful for comparing against your log files as a sanity check that your conversion worked the same way as ours.

Atanas Chanev's software for converting the BulTreeBank

The CoNLL-X shared task conversion of the BulTreeBank is based on, but not identical to, Atanas' scripts. The final software used in the shared task will hopefully be released later. You can contact Atanas Chanev at "artanisz at gmail dot com". See the beginning of "BTB_HPSG2Dep.pl" for a short explanation.

Scoring software

Deniz Yuret, 2007-04-29

scoring.tar.gz contains automated scripts to produce scores, emails, significance tests, and tables for the CoNLL-XI dependency parsing shared task.

1. The scripts look for the data in the following subdirectories:

2. The uploads directory should have the following structure, which is automatically produced by the upload script:

3. Simply running "make" should produce the following:

4. These are the scripts:

SoftwarePage (last edited 2007-04-30 14:01:58 by DenizYuret)