XLFG: a parsing scheme for French
Lionel Clement
ABSTRACT
We present XLFG, a parser which has been developed by Lionel Clement (Clement 01). This parser, written in C, uses an LR algorithm. It comprises a graphical user interface and is used for educational purposes. To give students a better understanding of the LFG formalism, the parser can display ill-formed F-structures and C-structures for sentences which are ungrammatical because of: 1- Unification (including problems with constraint equations and existential equations). The parser can be downloaded from http://talana.linguist.jussieu.fr/~lionel/xlfg-main.html. A demo, with a limited lexicon and grammar, is also available online.

We also present an evaluation of the parser for French: we developed a medium-size French grammar and lexicon consisting of 560,000 lexical entries and more than 50 grammar rules, which aim to cover the following syntactic phenomena: clitics, passive, sentential complements, infinitival complements (control + raising verbs), some idiomatic expressions, some coordinations and comparatives (including gapping phenomena), compound tenses, negation. We then parsed approximately 1200 grammatical sentences from the TSNLP test suite (Estival & Lehman 97). 81.5% of these sentences were assigned at least one correct F-structure. 65% of these sentences (781 sentences) were ambiguous and were assigned on average 3.36 F-structures per sentence. These results are similar to those obtained for French on the same sentences using a wide-coverage Tree-Adjoining Grammar for French of approximately 5000 elementary trees (Abeille & al. 00).

When writing grammars, there is always a trade-off between coverage and ambiguity rate. In order to perform disambiguation, we departed from traditional LFG disambiguation approaches, both probabilistic (Riezler & al. 2000) and based on optimality theory (Frank & al. 98). Instead, we observed that LFG F-structures and TAG derivation trees (i.e. dependency-like structures) are intuitively very similar, modulo reentrant features, so we decided to adapt TAG-based disambiguation principles to LFG. We implemented the following disambiguation principles: 1- Prefer the F-structure which resorts to the most constrained lexical
items. This makes it possible to prefer the idiomatic interpretation of a sentence such as (a) over its literal interpretation, since LFG encodes idioms at the level of lexical entries.
(a) John breaks the ice.

These principles are motivated from a cognitive point of view, as argued in (Kinyon 00), and have yielded good results when implemented on TAG derivation trees. When implemented on F-structures for the 781 ambiguous TSNLP sentences mentioned above, the principles fully disambiguated 408 sentences (52.2%) and partially disambiguated another 40 sentences (5%). After disambiguation, the average number of F-structures per sentence decreased from 3.36 to 1.29. More interestingly, this partial disambiguation very rarely discards an F-structure which was deemed correct.

Future work will include augmenting the coverage of the grammar to deal with more syntactic phenomena, although the goal of this tool is to test and investigate linguistic hypotheses rather than to achieve robustness and wide coverage. We also hope to investigate more systematically the similarities between the LFG framework and the TAG framework, in order to port resources between these two formalisms.
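The first disambiguation principle (prefer the analysis which resorts to the most constrained lexical items) can be illustrated with a minimal Python sketch. It is not the XLFG implementation, which is written in C. The abstract does not say how "most constrained" is measured, so the sketch assumes a simple score: the number of constraining equations carried by the lexical entries an analysis uses. The class names, the scoring function and the equations attached to the idiomatic entry are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical lexical entry: a word form plus its constraining equations."""
    form: str
    constraints: tuple = ()


@dataclass(frozen=True)
class Analysis:
    """Hypothetical candidate analysis: a label plus the lexical entries it uses."""
    label: str
    entries: tuple


def constraint_score(analysis):
    # Assumed measure: total number of constraining equations
    # over all lexical entries used by the analysis.
    return sum(len(entry.constraints) for entry in analysis.entries)


def prefer_most_constrained(analyses):
    # Keep every analysis that reaches the maximal score. Ties are not
    # broken, so the result may still be ambiguous (partial disambiguation).
    if not analyses:
        return []
    best = max(constraint_score(a) for a in analyses)
    return [a for a in analyses if constraint_score(a) == best]


john = LexicalEntry("John")
the = LexicalEntry("the")
ice = LexicalEntry("ice")
break_literal = LexicalEntry("breaks")
# Assumed encoding of the idiom: the verb entry constrains its object.
break_idiom = LexicalEntry(
    "breaks",
    ("(^ OBJ PRED) =c 'ice'", "(^ OBJ SPEC) =c 'the'"),
)

candidates = [
    Analysis("literal", (john, break_literal, the, ice)),
    Analysis("idiomatic", (john, break_idiom, the, ice)),
]
preferred = prefer_most_constrained(candidates)
print([a.label for a in preferred])
```

Under this assumed score, the idiomatic reading of (a) is kept because its verb entry carries constraints that the literal entry lacks. Analyses with equal scores are all retained, which corresponds to the partial disambiguation case.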
References
Abeille A., Candito M.H., Kinyon A. 2000. FTAG: Developing and maintaining a wide-coverage grammar for French. Proceedings of the ESSLLI-2000 Workshop on Linguistic Theory and Grammar Implementation. Birmingham.