XLFG: a parsing scheme for French

Lionel Clement
Universite Paris 7

Alexandra Kinyon
University of Pennsylvania

ABSTRACT

We present XLFG, a parser developed by Lionel Clement (Clement 01). The parser is written in C and uses an LR algorithm. It includes a graphical user interface and is used for educational purposes. To help students better understand the LFG formalism, the parser can display ill-formed F-structures and C-structures for sentences that are ungrammatical because of a failure of:

1- Unification (including problems with constraint equations and existential equations)
2- Completeness
3- Coherence
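The three kinds of failure above can be illustrated with a toy checker. This is a minimal sketch, not XLFG's C implementation: it assumes a hypothetical encoding in which an F-structure is a Python dict, atomic values are strings, and a PRED value is a (lemma, governable functions) pair. The inventory of governable functions and all function names (unify, check_complete_coherent, etc.) are illustrative assumptions. As a simplification, two identical PRED values are allowed to unify, whereas in LFG each semantic form is a unique instantiation.

```python
# Toy LFG well-formedness checks (hypothetical encoding, not XLFG's own).
# An F-structure is a dict; atomic values are strings; a PRED value is a
# (lemma, governable_functions) tuple, e.g. ("break", ("SUBJ", "OBJ")).
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "OBL", "COMP", "XCOMP"}  # assumed inventory


class UnificationError(Exception):
    """Raised when two F-structures assign clashing values to an attribute."""


def unify(f: dict, g: dict) -> dict:
    """Return a new F-structure combining f and g, or raise on a feature clash."""
    result = dict(f)
    for attr, g_val in g.items():
        if attr not in result:
            result[attr] = g_val
        elif isinstance(result[attr], dict) and isinstance(g_val, dict):
            result[attr] = unify(result[attr], g_val)
        elif result[attr] != g_val:
            # Note: identical PRED tuples unify here; real semantic forms never do.
            raise UnificationError(f"{attr}: {result[attr]!r} vs {g_val!r}")
    return result


def satisfies_constraint(f: dict, attr: str, value: str) -> bool:
    """Constraint equation (↑ attr) =c value: the value must already be present."""
    return f.get(attr) == value


def satisfies_existential(f: dict, attr: str) -> bool:
    """Existential equation (↑ attr): the attribute must be defined."""
    return attr in f


def check_complete_coherent(f: dict, path: str = "") -> list[str]:
    """Collect completeness and coherence violations, recursing into sub-structures."""
    errors = []
    pred = f.get("PRED")
    required = set(pred[1]) if isinstance(pred, tuple) else set()
    present = {attr for attr in f if attr in GOVERNABLE}
    for missing in sorted(required - present):  # completeness
        errors.append(f"incomplete{path}: {missing} required by {pred[0]}")
    for extra in sorted(present - required):  # coherence
        errors.append(f"incoherent{path}: {extra} not governed")
    for attr, val in f.items():
        if isinstance(val, dict):
            errors.extend(check_complete_coherent(val, f"{path}/{attr}"))
    return errors


# "John breaks" with a transitive "break": unification succeeds, but the result is incomplete.
verb = {"PRED": ("break", ("SUBJ", "OBJ")), "TENSE": "pres", "SUBJ": {"NUM": "sg"}}
sentence = unify(verb, {"SUBJ": {"PRED": ("John", ()), "NUM": "sg"}})
print(check_complete_coherent(sentence))  # -> ['incomplete: OBJ required by break']
```

A clash such as unify({"NUM": "sg"}, {"NUM": "pl"}) raises UnificationError, which corresponds to the first failure type.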

XLFG can be downloaded from http://talana.linguist.jussieu.fr/~lionel/xlfg-main.html. A demo with a limited lexicon and grammar is also available online.

We also present an evaluation of the parser for French: we developed a medium-size French grammar (more than 50 rules) and lexicon (560,000 lexical entries), which aim to cover the following syntactic phenomena: clitics, passives, sentential complements, infinitival complements (control and raising verbs), some idiomatic expressions, some coordinations and comparatives (including gapping), compound tenses, and negation.

We then parsed approximately 1,200 grammatical sentences from the TSNLP test suite (Estival & Lehman 97). Of these, 81.5% were assigned at least one correct F-structure, and 65% (781 sentences) were ambiguous, receiving on average 3.36 F-structures each. These results are similar to those obtained on the same sentences with a wide-coverage Tree-Adjoining Grammar for French of approximately 5,000 elementary trees (Abeille et al. 00).

When writing grammars, there is always a trade-off between coverage and ambiguity rate. For disambiguation, we departed from the usual LFG approaches, whether probabilistic (Riezler et al. 2000) or based on Optimality Theory (Frank et al. 98). Instead, observing that LFG F-structures and TAG derivation trees (i.e. dependency-like structures) are intuitively very similar, modulo reentrant features, we decided to adapt TAG-based disambiguation principles to LFG.
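To make the analogy between F-structures and derivation trees concrete, here is a minimal sketch that flattens an F-structure into (governor, function, dependent) triples. It assumes a hypothetical dict encoding (PRED values as (lemma, arguments) tuples, sub-structures as dicts, set-valued functions such as ADJUNCT as lists), and the function name fstruct_to_dependencies is illustrative. Reentrancy is handled by linking each shared sub-structure only the first time it is reached; this is one possible reading of "modulo reentrant features", not necessarily the authors' treatment.

```python
def fstruct_to_dependencies(f: dict) -> list[tuple]:
    """Flatten an F-structure into (governor, function, dependent) triples.

    Hypothetical encoding: PRED values are (lemma, args) tuples, sub-structures
    are dicts, and set-valued functions such as ADJUNCT are lists of dicts.
    A shared (reentrant) sub-structure is linked only the first time it is reached.
    """
    edges = []
    seen = {id(f)}

    def lemma(g: dict):
        pred = g.get("PRED")
        return pred[0] if isinstance(pred, tuple) else None

    def visit(g: dict) -> None:
        for attr, val in g.items():
            for child in (val if isinstance(val, list) else [val]):
                if not isinstance(child, dict) or id(child) in seen:
                    continue  # atomic value, or reentrant structure already linked
                seen.add(id(child))
                edges.append((lemma(g), attr, lemma(child)))
                visit(child)

    visit(f)
    return edges


# "John wants to sleep": the XCOMP's SUBJ is the same object as the matrix SUBJ.
john = {"PRED": ("John", ())}
wants = {
    "PRED": ("want", ("SUBJ", "XCOMP")),
    "SUBJ": john,
    "XCOMP": {"PRED": ("sleep", ("SUBJ",)), "SUBJ": john},
}
print(fstruct_to_dependencies(wants))
# -> [('want', 'SUBJ', 'John'), ('want', 'XCOMP', 'sleep')]
```

The result resembles a TAG derivation tree for a control verb, where "want" combines with both "John" and "sleep". Note that which link a shared structure receives depends on traversal order (here, the attribute order of the dict).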

We implemented the following disambiguation principles:

1- Prefer the F-structure that uses the most constrained lexical items. Since LFG encodes idioms at the level of lexical entries, this favors the idiomatic interpretation of a sentence such as (a) over its literal one.
2- Prefer the F-structure with the highest number of arguments. This favors attaching a constituent as an argument rather than as a modifier in a sentence such as (b), where "to be honest" is taken as an argument of "prefer" rather than as a sentence modifier (i.e. rather than "To be honest, John prefers his daughter").
3- Attach arguments to their closest potential governors. For each F-structure, we compute the sum of the distances, in the linear order of the sentence, between each node and its arguments, and prefer the F-structure with the lowest sum. For a sentence such as (c), this favors "of the demonstration" as an argument of "organizers" rather than of "suspects".

(a) John breaks the ice.
(b) John prefers his daughter to be honest.
(c) John suspects the organizers of the demonstration.
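The three principles can be sketched as a filtering cascade. This is a hypothetical illustration: the abstract does not say how principle 1 measures how constrained a lexical item is, nor how the principles interact, so the sketch assumes an integer lexical_constraints score per analysis and applies the principles in the listed order, each step keeping every analysis tied for best. Word positions are 0-based, and a prepositional argument is located at its preposition; both are assumptions, as are the names Analysis, distance_sum and disambiguate.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """One candidate F-structure, reduced to what the three principles need."""
    name: str
    lexical_constraints: int  # hypothetical score: how constrained the lexical items used are
    arguments: list[tuple[int, int]]  # (governor position, argument position), 0-based


def argument_count(a: Analysis) -> int:
    return len(a.arguments)


def distance_sum(a: Analysis) -> int:
    # Principle 3: total linear distance between each governor and its arguments.
    return sum(abs(gov - arg) for gov, arg in a.arguments)


def disambiguate(analyses: list[Analysis]) -> list[Analysis]:
    """Apply principles 1-3 in order, keeping all analyses tied for best at each step."""
    survivors = list(analyses)
    if not survivors:
        return []
    criteria = (
        lambda a: -a.lexical_constraints,  # 1: most constrained lexical items
        lambda a: -argument_count(a),      # 2: highest number of arguments
        distance_sum,                      # 3: closest potential governors
    )
    for key in criteria:
        best = min(key(a) for a in survivors)
        survivors = [a for a in survivors if key(a) == best]
    return survivors


# (c) John(0) suspects(1) the(2) organizers(3) of(4) the(5) demonstration(6)
noun_attach = Analysis("organizers <- of", 0, [(1, 0), (1, 3), (3, 4)])
verb_attach = Analysis("suspects <- of", 0, [(1, 0), (1, 3), (1, 4)])
print([a.name for a in disambiguate([noun_attach, verb_attach])])  # -> ['organizers <- of']
```

Under these assumptions, both readings of (c) tie on principles 1 and 2, and principle 3 decides: attaching "of the demonstration" to "organizers" gives a distance sum of 1 + 2 + 1 = 4, against 1 + 2 + 3 = 6 for attachment to "suspects".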

These principles are motivated from a cognitive point of view, as argued in (Kinyon 00), and have yielded good results when implemented on TAG derivation trees. Applied to the F-structures of the 781 ambiguous TSNLP sentences mentioned above, the principles fully disambiguated 408 sentences (52.2%) and partially disambiguated another 40 (5%). After disambiguation, the average number of F-structures per sentence decreased from 3.36 to 1.29. Importantly, this partial disambiguation very rarely discards an F-structure that was deemed correct.

Future work will include extending the grammar to more syntactic phenomena, although the goal of this tool is to test and investigate linguistic hypotheses rather than to achieve robustness and wide coverage. We also hope to investigate more systematically the similarities between the LFG and TAG frameworks, in order to port resources between the two formalisms.

References

Abeille A., Candito M.H., Kinyon A. 2000. FTAG: Developing and maintaining a wide-coverage grammar for French. Proc. ESSLLI-2000 Workshop on Linguistic Theory and Grammar Implementation. Birmingham.
Clement L. 2001. XLFG: a parser to learn the LFG framework. Proc. NAACL'01. Pittsburgh.
Estival & Lehman D. 1997. TSNLP: Des jeux de phrases-test pour l'évaluation d'applications dans le domaine du TALN [TSNLP: test-sentence sets for evaluating NLP applications]. Proc. TALN'98. Marseille.
Frank A., Holloway King T., Kuhn J., Maxwell J. 2000. Optimality Theory Style Constraint Ranking in Large-scale LFG Grammars. Proc. LFG'98 Conference. Australia.
Kinyon A. 2000. Are Structural Principles Useful for Automatic Disambiguation? Proc. CogSci'00. Philadelphia.
Riezler S., Prescher D., Kuhn J., Johnson M. 2000. Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training. Proc. ACL'00. Hong Kong.
