Workshop on Natural Language Processing and Knowledge Representation for eLearning environments
September 26th, 2007
Borovets, Bulgaria


In conjunction with RANLP '2007

Aims

Several initiatives have been launched in the area of Computational Linguistics, Language Resources and Knowledge Representation, at both the national and the international level, aiming at the development of resources and tools. Unfortunately, there are few initiatives that integrate these results within eLearning. The situation is slightly better with respect to the results achieved within Knowledge Representation, since ontologies are being developed which describe not only the content of the learning material but crucially also its context and structure. Furthermore, knowledge representation techniques and natural language processing play an important role in improving the adaptivity of learning environments, even though they are not fully exploited yet.

On the other hand, eLearning environments constitute valuable scenarios to demonstrate the maturity of computational linguistic methods as well as of natural language technologies and tools. This kind of task-based evaluation of resources, methods and tools is a crucial issue for the further development of language and information technology.

The goal of this workshop is to discuss:

The workshop will bring together computational linguists, language resources developers, knowledge engineers, researchers involved in technology-enhanced learning as well as developers of eLearning material, ePublishers and eLearning practitioners. It will provide a forum for interaction among members of different research communities, and a means for attendees to increase their knowledge and understanding of the potential of computational resources in eLearning.

Topics

Topics of interest include, but are not limited to:

Programme

9:30-9:45  Words of welcome
9:45-10:30  Educational Natural Language Processing - Electronic Career Guidance and Beyond (Iryna Gurevych, invited speaker, Technische Universität Darmstadt)
10:30-11:00  Keyword extraction for metadata annotation of Learning Objects (Lothar Lemnitzer, Paola Monachesi)
11:00-11:15  Break
11:15-11:45  Combining pattern-based and machine learning methods to detect definitions for eLearning purposes (Eline Westerhout, Paola Monachesi)
11:45-12:15  Supporting e-learning with automatic glossary extraction: Experiments with Portuguese (Rosa Del Gaudio, António Branco)
12:15-12:45  Grammar-based Automatic Extraction of Definitions and Applications for Romanian (Adrian Iftene, Diana Trandabăţ, Ionuţ Pistol)
12:45-13:15  On the evaluation of Polish definition extraction grammars (Adam Przepiórkowski, Łukasz Degórski, Beata Wójtowicz)
13:15-14:30  Lunch Break
14:30-15:00  ALPE as LT4eL processing chain environment (Dan Cristea, Corina Forăscu, Ionuţ Pistol)
15:00-15:30  Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects (Kiril Simov, Petya Osenova)
15:30-16:00  Crosslingual Ontology-Based Document Retrieval (Eelco Mossel)
16:00-16:30  Break
16:30-17:15  From multimedia semantic indexing to cross-lingual retrieval: the Prestospace approach to cultural heritage preservation and dissemination (Roberto Basili, invited speaker, University of Rome, Tor Vergata)
17:15-18:00  Discussion

Keynote Speakers

The following keynote speakers have been invited:

Abstracts of accepted papers

The following papers have been accepted, and can be downloaded here:

ALPE as LT4eL processing chain environment
(Dan Cristea, Corina Forăscu, Ionuţ Pistol - Faculty of Computer Science, University “Al. I. Cuza” of Iaşi, Romania)

(Download the paper)

(Download the presentation)

This paper briefly describes the concept, initial implementation and usage of the ALPE system for natural language processing. A hierarchy connecting annotation schemas, processing tools and resources is used as the working environment for the system, which can perform various complex NL processing tasks. ALPE will be used to build linguistic processing chains involving the annotation formats and tools developed in the LT4eL project. The particularities and advantages of such an endeavor are the main topics of this paper.

Combining pattern-based and machine learning methods to detect definitions for eLearning purposes
(Eline Westerhout, Paola Monachesi - Utrecht University)

(Download the paper)

(Download the presentation)

One of the aims of the Language Technology for eLearning project is to show that Natural Language Processing techniques can be employed to enhance the learning process. To this end, one of the functionalities that has been developed is a pattern-based glossary candidate detector which is capable of extracting definitions in eight languages. In order to improve the results obtained with the pattern-based approach, machine learning techniques are applied to the Dutch results to filter out incorrectly extracted definitions. In this paper, we discuss the machine learning techniques used and we present the results of the quantitative evaluation. We also discuss the integration of the tool into the Learning Management System ILIAS.

Supporting e-learning with automatic glossary extraction: Experiments with Portuguese
(Rosa Del Gaudio, António Branco - University of Lisbon)

(Download the paper)

(Download the presentation)

This paper reports preliminary work on automatic glossary extraction for e-learning purposes. Glossaries are an important resource for learners: they not only facilitate access to learning documents but also represent an important learning resource in themselves. The work presented here was carried out within the LT4eL project, whose aim is to improve the e-learning experience by means of natural language and semantic techniques. This work focuses on a system that automatically extracts glossaries from learning objects; in particular, the system extracts definitions from morpho-syntactically annotated documents using a rule-based grammar. In order to develop such a system, a corpus composed of a collection of learning objects covering three different domains was collected and annotated. A quantitative evaluation was carried out comparing the definitions retrieved by the system against the manually marked definitions. On average, we obtain 14% for precision, 86% for recall and 0.33 for F2 score.

Grammar-based Automatic Extraction of Definitions and Applications for Romanian
(Adrian Iftene, Diana Trandabăţ, Ionuţ Pistol - Faculty of Computer Science, University “Al. I. Cuza” of Iaşi, Romania)

(Download the paper)

(Download the presentation)

This paper presents part of our work in the LT4eL project regarding the grammar developed by the Romanian team in order to extract definitions from texts. Some qualitative results are given in order to evaluate our grammar rules. Among the applications of this kind of grammar, we will discuss the possible inclusion of the grammar rules into a question answering system in order to extract answers for definition-type questions. Another possible use of these rules is the extraction of supplementary knowledge from linguistic resources like Wikipedia. The benefits of such an extra knowledge resource are evident in textual entailment systems, where resources like WordNet, Acronyms database or Dirt cannot cover all the requirements of the system.

Crosslingual Ontology-Based Document Retrieval
(Eelco Mossel - University of Hamburg)

(Download the paper)

(Download the presentation)

An approach for crosslingual ontology-based document retrieval has been devised and is being implemented. It allows the user to enter a query in any language that is part of the system and retrieve documents in selected languages. A domain ontology and term-concept lexicons, containing synonymous terms where applicable, are used to overcome discrepancies between the search query and the words occurring in the documents, in a monolingual situation for the individual languages as well as in a crosslingual setting.

The ontology is used in two different ways. First, concepts relevant for a search query are found automatically and used to retrieve documents. Second, relevant parts of the ontology are displayed to the user, who can navigate further starting from the displayed part of the ontology, and explicitly select concepts to continue the search with.

Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects
(Kiril Simov, Petya Osenova - LML, IPOI, BAS)

(Download the paper)

(Download the presentation)

This paper discusses the role of the ontology in the definition of domain lexicons in several languages and its usage for the semantic annotation of Learning Objects (LOs). We assume that the ontology has the leading role and the lexicons are created on the basis of the meanings defined within the ontology. The semantic annotation requires the construction of special partial grammars connected to the terms in the lexicons. These special grammars are used for automatic annotation of domain texts. The ambiguous cases are resolved manually on the basis of the context. The process of semantic annotation plays a twofold role: first, it produces semantically annotated texts (gold standard corpus), and second, it helps in checking the coverage of the lexicon as well as the precision of the ontology.

Keyword extraction for metadata annotation of Learning Objects
(Lothar Lemnitzer, Paola Monachesi - Tübingen University, Utrecht University)

(Download the paper)

(Download the presentation)

One of the functionalities developed within the LT4eL project is the possibility to annotate learning objects semi-automatically with keywords that describe them. To this end, a keyword extractor has been created which can deal with documents in 8 languages. The approach employed is based on a linguistic processing step which is followed by a filtering step of candidate keywords and their subsequent ranking based on frequency criteria. Two tests have been carried out to provide a rough evaluation of the performance of the tool and to measure inter-annotator agreement in order to determine the complexity of the task and to evaluate its performance with respect to human annotators.

On the evaluation of Polish definition extraction grammars
(Adam Przepiórkowski, Łukasz Degórski, Beata Wójtowicz - Polish Academy of Sciences, Institute of Computer Science)

(Download the paper)

(Download the presentation)

This paper presents the results of experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from usually unstructured or only weakly structured e-learning texts in Polish. The extraction is performed by regular grammars over XML-encoded morphosyntactically-annotated documents. The results, although perhaps still not fully satisfactory, are carefully evaluated and compared to the inter-annotator agreement; they clearly improve on previous definition extraction attempts for Polish.

Program Committee

António Branco (University of Lisbon, Portugal)
Dan Cristea (University of Iaşi, Romania)
Diane Evans (Open University, United Kingdom)
Walther v. Hahn (University of Hamburg, Germany)
Erhard Hinrichs (University of Tübingen, Germany)
Susanne Jekat (Zürich Winterthur Hochschule, Switzerland)
Alex Killing (ETHZ, Switzerland)
Steven Krauwer (University of Utrecht, the Netherlands)
Vladislav Kubon (Charles University Prague, Czech Republic)
Petya Osenova (Bulgarian Academy of Sciences, Bulgaria)
Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences)
Anne de Roeck (Open University, United Kingdom)
Mike Rösner (University of Malta, Malta)
Paul Buitelaar (DFKI, Germany)
Lothar Lemnitzer (University of Tübingen, Germany)
Paola Monachesi (University of Utrecht, the Netherlands)
Marco Ronchetti (University of Trento, Italy)
Cristina Vertan (University of Hamburg, Germany)

Organizing Committee

Paola Monachesi
University of Utrecht, The Netherlands
Lothar Lemnitzer
University of Tübingen, Germany
Cristina Vertan
University of Hamburg, Germany

The workshop is partially supported by the European Community under the Information Society and Media Directorate, Learning and Cultural Heritage Unit via the LT4eL project, STREP-IST 027391.