Treebank vs. X-BAR based Automatic F-Structure Annotation

Josef van Genabith
Dublin City University

Anette Frank
DFKI GmbH

Andy Way
Dublin City University

 


ABSTRACT

In a number of papers [Frank et al., 2001] [Sadler et al., 2000] [Frank, 2000] we have developed a method for automatically annotating treebank resources with feature structure information. The method involves stating f-structure annotation principles that express generalisations over correspondences between partial trees or CFG rules and f-structures.

Treebank trees and the CFGs extracted from treebank resources tend not to follow strongly hierarchical and recursive X-BAR design principles; instead they feature a large number of rather flat trees and CFG rules. This flat analysis is often motivated by the fact that it relieves the human annotator of having to make subtle, difficult and time-consuming decisions between alternative structural analyses.

Because of these flat analyses, and in contrast to X-BAR inspired analyses, the RHS of a treebank grammar rule may correspond not to a single constituent but to a flat sequence of several constituents. As a consequence, feature structure annotation principles are partial and underspecified: they are designed to pick out subsequences in the RHSs of flat treebank rules in order to induce the required f-structure annotations.
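To make the idea of a partial, underspecified principle more concrete, the following minimal Python sketch applies one hypothetical NP principle to flat treebank rules. The tag names, the head rule (rightmost noun), the function name `annotate_np_rhs` and the principle itself are illustrative assumptions, not the principles used in the papers cited above. Annotations are written in ASCII, with `^` for ↑, `!` for ↓ and `$` for set membership (∈).

```python
from typing import List

# Assumed noun tags, chosen for readability; a real treebank tagset may differ.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def annotate_np_rhs(rhs: List[str]) -> List[str]:
    """Apply one hypothetical, partial NP principle to a flat RHS.

    The rightmost noun is the head (^=!); a determiner before the head is
    the SPEC; adjectives, APs or nouns before the head are ADJUNCT members.
    Daughters the principle says nothing about get '' (left underspecified).
    """
    nouns = [i for i, cat in enumerate(rhs) if cat in NOUN_TAGS]
    if not nouns:
        return [""] * len(rhs)  # the principle does not apply to this rule
    head = nouns[-1]
    annotations = []
    for i, cat in enumerate(rhs):
        if i == head:
            annotations.append("^=!")
        elif i < head and cat == "DT":
            annotations.append("(^SPEC)=!")
        elif i < head and (cat in {"JJ", "ADJP"} or cat in NOUN_TAGS):
            annotations.append("!$(^ADJUNCT)")
        else:
            annotations.append("")  # e.g. a post-head PP: not covered here
    return annotations


# The same principle picks out the relevant subsequence in flat RHSs of any length.
print(annotate_np_rhs(["DT", "JJ", "NN", "NN"]))
# ['(^SPEC)=!', '!$(^ADJUNCT)', '!$(^ADJUNCT)', '^=!']
print(annotate_np_rhs(["DT", "NN", "PP"]))
# ['(^SPEC)=!', '^=!', '']
```

The principle never mentions the full RHS; it only locates the head and inspects the daughters before it, which is what lets a single statement cover many distinct flat rules.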

This contrasts with principle-based architectures for f-structure annotation in the theoretical literature [Bresnan, 2001], which assume X-BAR based c-structure configurations to drive feature structure annotation.

We present an experiment comparing treebank grammar based annotation with X-BAR based annotation: we recode a fragment of the AP treebank using broad, vanilla-flavour X-BAR design principles, extract a CFG from the recoded treebank trees, state annotation principles over the extracted CFG and compile these principles over the extracted grammar. Finally, we evaluate the outcome of the compilation and compare it with automatic f-structure annotation of the original flat treebank grammar.

The findings shed light on three further questions: (i) to what extent, and at what cost, can flat c-structure treebank representations be automatically converted into more hierarchical X-BAR inspired representations? (ii) to what extent are X-BAR motivated analyses a real alternative in treebank construction? (iii) what is the influence of X-BAR analyses on the scalability of automatic feature structure annotation?

 

References:

[Bresnan, 2001] Joan Bresnan. Lexical-Functional Syntax. Oxford: Blackwell.
[Frank et al., 2001] Anette Frank, Louisa Sadler, Josef van Genabith and Andy Way. "From Treebank Resources to LFG F-Structures". To appear in: A. Abeillé (ed.), Treebanks: Building and Using Syntactically Annotated Corpora. Kluwer Academic Publishers, The Netherlands.
[Frank, 2000] Anette Frank. "Automatic F-Structure Annotation of Treebank Trees". In: Miriam Butt and Tracy Holloway King (eds.), Proceedings of the LFG00 Conference, University of California at Berkeley, 19-20 July 2000. CSLI Online Publications.
[Sadler et al., 2000] Louisa Sadler, Josef van Genabith and Andy Way. "Automatic F-Structure Annotation from the AP Treebank". In: Proceedings of the Fifth International Conference on Lexical-Functional Grammar (LFG-2000), University of California at Berkeley, 19-20 July 2000. CSLI Online Publications.