|
|
Vincent Claveau,
Acquisition automatique de lexiques
sémantiques pour la recherche d'information,
PhD Thesis, Rennes 1 University, December 2003,
Document (pdf), Document (ps.gz) |
|
Abstract: Many applications
in the field of Natural Language Processing (information
retrieval, machine translation, etc.) need semantic resources that are
specific
to their tasks and domains.
To satisfy this need we have developed ASARES, a corpus-based lexical
semantic
acquisition system. It fulfills three objectives: it achieves good
extraction results;
these results and the whole acquisition process are interpretable; and
it is
generic and automatic enough to be easily portable from one corpus to
another. To achieve these goals, ASARES uses a machine learning method,
inductive logic
programming, which makes it possible to infer part-of-speech and
semantic
patterns from examples of the semantic elements we want to acquire.
These
patterns are then used to extract new elements from the corpus.
We also show that it is possible to combine this symbolic method with
statistical acquisition methods to make ASARES more automatic.
To validate our system, we have used it to acquire the semantic
relations
between nouns and verbs that are defined in the Generative Lexicon and called
qualia
relations. This task is interesting for two main reasons. On the one hand, these
relations are
defined only from a theoretical point of view; the linguistic
interpretation of
the patterns thus allows a deeper understanding of their
contextual
realizations. On the other hand, several authors have noticed that such
relations can be useful in information retrieval tasks because they
make
semantically equivalent reformulations of ideas accessible.
With the help of a query expansion experiment using qualia relations
extracted
with ASARES, we show that this assumption is true to a certain extent:
the
performance of an information retrieval system is significantly
improved, though
the gains are localized. |
|
|