features, and different combinations of them, are then categorised into
attribute configuration files, which are fed into three supervised learning
methods: Naive Bayes, maximum entropy and support vector machines. The
work evaluates the improvement in definition extraction achieved with these
machine learning methods and the different feature combinations. Accuracy
varies across methods and configuration files, with the best result obtained
by the maximum entropy classifier on the configuration using the first
three features described above.
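To make the idea of feeding feature configurations into a classifier concrete, here is a minimal sketch of Naive Bayes (the Bernoulli variant, with add-one smoothing) trained on binary features restricted to one configuration. The feature names, their toy values and the helper `project` are hypothetical; the original feature set and configuration files are not reproduced here, and a maximum entropy or SVM classifier would slot into the same place.

```python
import math

# Hypothetical binary features describing a candidate sentence.
FEATURES = ["has_is_a", "starts_with_noun", "has_colon", "in_first_sentence"]


def project(example, config):
    """Restrict an example to the features named in one configuration."""
    return {f: example[f] for f in config}


class BernoulliNB:
    """Naive Bayes over binary features with add-one (Laplace) smoothing."""

    def fit(self, examples, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        self.prior = {c: labels.count(c) / n for c in self.classes}
        # P(feature = 1 | class), smoothed so no probability is 0 or 1.
        self.p_one = {}
        for c in self.classes:
            rows = [x for x, y in zip(examples, labels) if y == c]
            self.p_one[c] = {
                f: (sum(r[f] for r in rows) + 1) / (len(rows) + 2)
                for f in examples[0]
            }
        return self

    def predict(self, x):
        def log_posterior(c):
            score = math.log(self.prior[c])
            for f, v in x.items():
                p = self.p_one[c][f]
                score += math.log(p if v else 1 - p)
            return score

        return max(self.classes, key=log_posterior)


# Toy training data (hypothetical feature values).
train = [
    {"has_is_a": 1, "starts_with_noun": 1, "has_colon": 0, "in_first_sentence": 1},
    {"has_is_a": 1, "starts_with_noun": 0, "has_colon": 1, "in_first_sentence": 1},
    {"has_is_a": 0, "starts_with_noun": 1, "has_colon": 0, "in_first_sentence": 0},
    {"has_is_a": 0, "starts_with_noun": 0, "has_colon": 0, "in_first_sentence": 0},
]
labels = ["def", "def", "non", "non"]

# One "configuration": here, the first three (hypothetical) features.
config = FEATURES[:3]
model = BernoulliNB().fit([project(x, config) for x in train], labels)
print(model.predict(project(train[0], config)))  # def
```

Comparing configurations amounts to repeating the fit with different `config` subsets and measuring accuracy on held-out data.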

Westerhout & Monachesi (2007a, 2008) experiment with both rule-based and machine
learning techniques. They use a Dutch eLearning corpus which is partially
annotated with manually identified definitions. They argue that since the definition
extractor is intended for an eLearning setting, where learning objects tend to be
small, both precision and recall must be taken into account, unlike in similar work
where only precision is considered. The manually annotated definitions are divided
into five categories identified through observation. Linguistic rules are first
used to capture a large number of candidate definitions, after which machine
learning techniques similar to those of Fahmi & Bouma (2006) are applied as a filter.
Their final results are slightly lower than those of Fahmi & Bouma, which is to be
expected since their corpus is less structured than Wikipedia.
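The rule-then-filter design can be sketched as follows, with precision, recall and F1 computed before and after filtering. This is an illustration under stated assumptions: the English regular expressions, the pronoun-based `stand_in_filter` (standing in for a trained classifier) and the example sentences are all hypothetical and are not taken from Westerhout & Monachesi, whose corpus is Dutch.

```python
import re

# Hypothetical English definition patterns (high recall, low precision).
RULES = [
    re.compile(r"\b(is|are) (a|an|the) \w+"),  # copula: "X is a Y"
    re.compile(r"\b(is|are) defined as\b"),    # explicit: "X is defined as Y"
    re.compile(r"\bmeans\b"),                  # verbal: "X means Y"
]

# Hypothetical stand-in for a trained classifier: reject candidates that
# open with a pronoun, which rarely introduce the term being defined.
PRONOUNS = {"it", "this", "that", "they", "these", "those"}


def rule_candidates(sentences):
    """Stage 1: keep every sentence matched by at least one rule."""
    return [s for s in sentences if any(r.search(s) for r in RULES)]


def stand_in_filter(candidates):
    """Stage 2: discard candidates the (stand-in) filter rejects."""
    return [s for s in candidates if s.split()[0].lower() not in PRONOUNS]


def precision_recall_f1(predicted, gold):
    tp = len(set(predicted) & set(gold))
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


sentences = [
    "A glossary is a list of terms with their meanings.",
    "A browser is defined as a program for viewing web pages.",
    "This is a good example of a learning object.",
    "It is a common mistake to skip the exercises.",
    "Phishing means tricking users into revealing passwords.",
    "The course is a bit too long.",
    "Students should save their work often.",
]
gold = [sentences[0], sentences[1], sentences[4]]

candidates = rule_candidates(sentences)
filtered = stand_in_filter(candidates)
print(precision_recall_f1(candidates, gold))
print(precision_recall_f1(filtered, gold))
```

On this toy data the rules alone reach full recall at 0.5 precision, and the filter raises precision to 0.75 without losing recall. A real learned filter would usually give up some recall as well, which is why reporting both measures matters when learning objects are small.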

et al. (2008) focus on Polish eLearning material, with
the purpose of extracting definitions to be presented to a tutor for glossary
creation. Initial attempts using manually crafted grammar rules did not achieve
a good f-measure, so the authors try several machine learning classifiers
available in the Weka toolset (Witten & Frank 2005) to improve results. The
techniques used are Naive Bayes, decision trees such as ID3 and C4.5, the lazy
classifier IB1, AdaBoostM1 with Decision Stump, and AdaBoostM1 with nu-SVC. In
these experiments they report an increase in f-measure, with the best result
obtained by the ID3 decision tree classifier.
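ID3 grows a decision tree by choosing, at each node, the attribute with the highest information gain, that is, the largest reduction in label entropy. The sketch below shows only that selection step on hypothetical binary features; it is not the Weka implementation used in the cited experiments.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(labels)
    remainder = 0.0
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """ID3's choice at one node: the attribute with maximal information gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))


# Hypothetical binary features for four candidate sentences.
rows = [
    {"has_is_a": 1, "has_colon": 0},
    {"has_is_a": 1, "has_colon": 1},
    {"has_is_a": 0, "has_colon": 1},
    {"has_is_a": 0, "has_colon": 0},
]
labels = ["def", "def", "non", "non"]
print(best_attribute(rows, labels, ["has_is_a", "has_colon"]))  # has_is_a
```

Here `has_is_a` separates the classes perfectly (gain of 1 bit), while `has_colon` leaves each branch evenly mixed (gain of 0), so ID3 would split on `has_is_a` first and recurse on each branch.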

Kobyliński & Przepiórkowski (2008) also experiment on Polish, using Balanced
Random Forest (BRF). BRF is a classification technique built on decision trees:
each tree is grown from a bootstrap sample in which the classes are equally
represented, and at each node the split is chosen as the best among a randomly
selected subset of attributes. Both techniques improve on manually crafted rules,
achieving results close to those of Fahmi & Bouma (2007) and Westerhout &
Monachesi (2008).
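The two sources of randomness in a Balanced Random Forest can be sketched as below: a class-balanced bootstrap sample for each tree, and a random subset of `m` attributes at each node from which the best split is kept (scored here by Gini impurity, as in CART-style trees). This is an assumption-laden illustration, not Kobyliński & Przepiórkowski's implementation; it omits growing full trees and combining their votes, and the feature names are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(rows, labels, rng):
    """Bootstrap sample with the same number of examples from every class.

    The size per class is the size of the smallest (minority) class, and each
    class is sampled with replacement.
    """
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    k = min(len(members) for members in by_class.values())
    sample_rows, sample_labels = [], []
    for y, members in by_class.items():
        for r in rng.choices(members, k=k):
            sample_rows.append(r)
            sample_labels.append(y)
    return sample_rows, sample_labels


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_impurity(rows, labels, attr):
    """Weighted Gini impurity of the children produced by splitting on attr."""
    n = len(labels)
    total = 0.0
    for value in set(r[attr] for r in rows):
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        total += len(subset) / n * gini(subset)
    return total


def choose_node_attribute(rows, labels, attrs, m, rng):
    """At one node: draw m attributes at random and keep the best of them."""
    candidates = rng.sample(attrs, m)
    return min(candidates, key=lambda a: split_impurity(rows, labels, a))


# Hypothetical, imbalanced data: definitions are the minority class.
rows = [{"has_is_a": 1, "has_colon": c} for c in (0, 1)] + [
    {"has_is_a": 0, "has_colon": i % 2} for i in range(6)
]
labels = ["def"] * 2 + ["non"] * 6

rng = random.Random(0)
sample_rows, sample_labels = balanced_bootstrap(rows, labels, rng)
print(Counter(sample_labels))  # two examples of each class
# In a real forest this runs on each tree's balanced sample; the full data is
# used here, and m equals the number of attributes, so the choice is deterministic.
print(choose_node_attribute(rows, labels, ["has_is_a", "has_colon"], 2, rng))
```

In practice `m` is smaller than the number of attributes (often around its square root), so different nodes and trees see different attributes; the balanced samples keep the rare definition class from being swamped by non-definitions.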
