Aims
Several initiatives have been launched in the areas of Computational Linguistics,
Language Resources and Knowledge Representation, at both the national and international level, aiming at the development of resources and
tools. Unfortunately, there are few initiatives that integrate these results
within eLearning. The situation is slightly better with respect to the results
achieved within Knowledge Representation since ontologies are being
developed which describe not only the content of the learning material but
crucially also its context and structure. Furthermore, knowledge representation techniques and natural language processing play an important
role in improving the adaptivity of learning environments even though they
are not fully exploited yet.
On the other hand, eLearning environments constitute valuable scenarios
to demonstrate the maturity of computational linguistic methods as well
as of natural language technologies and tools. This kind of task-based evaluation
of resources, methods and tools is a crucial issue for the further
development of language and information technology.
The goal of this workshop is to discuss:
- the use of language and knowledge resources and tools in eLearning
- requirements on natural language resources, standards, and applications
originating in eLearning activities and environments
- the expected added value of natural language resources and technology
to learning environments and the learning process
- strategies and methods for the task-based evaluation of Natural Language
Processing applications.
The workshop will bring together computational linguists, language resources
developers, knowledge engineers, researchers involved in technology-enhanced
learning as well as developers of eLearning material, ePublishers and eLearning
practitioners. It will provide a forum for interaction among members of
different research communities, and a means for attendees to increase their
knowledge and understanding of the potential of computational resources in
eLearning.
Topics
Topics of interest include, but are not limited to:
- ontology modelling in the eLearning domain
- Natural Language Processing techniques for supplying metadata for
learning objects on a (semi)-automatic basis, e.g. for the automatic
extraction of key terms and their definitions
- techniques for summarization of discussion threads and support of
discourse coherence in eLearning
- improvements to (semantic, cross-lingual) search methods in learning
environments
- techniques of matching the semantic representation of learning objects
with the user's knowledge in order to support personalized and adaptive
learning
- adaptive information filtering and retrieval (content-based filtering and
retrieval, collaborative filtering)
- intelligent tutoring (curriculum sequencing, intelligent solution analysis,
problem solving support)
- intelligent collaborative learning (adaptive group formation and peer
help, adaptive collaboration)
Programme
| Time | Title | Speaker(s) |
|---|---|---|
| 9:30-9:45 | Words of welcome | |
| 9:45-10:30 | Educational Natural Language Processing - Electronic Career Guidance and Beyond | Iryna Gurevych (invited speaker, Technische Universität Darmstadt) |
| 10:30-11:00 | Keyword extraction for metadata annotation of Learning Objects | Lothar Lemnitzer, Paola Monachesi |
| 11:00-11:15 | Break | |
| 11:15-11:45 | Combining pattern-based and machine learning methods to detect definitions for eLearning purposes | Eline Westerhout, Paola Monachesi |
| 11:45-12:15 | Supporting e-learning with automatic glossary extraction: Experiments with Portuguese | Rosa Del Gaudio, António Branco |
| 12:15-12:45 | Grammar-based Automatic Extraction of Definitions and Applications for Romanian | Adrian Iftene, Diana Trandabăţ, Ionuţ Pistol |
| 12:45-13:15 | On the evaluation of Polish definition extraction grammars | Adam Przepiórkowski, Łukasz Degórski, Beata Wójtowicz |
| 13:15-14:30 | Lunch Break | |
| 14:30-15:00 | ALPE as LT4eL processing chain environment | Dan Cristea, Corina Forăscu, Ionuţ Pistol |
| 15:00-15:30 | Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects | Kiril Simov, Petya Osenova |
| 15:30-16:00 | Crosslingual Ontology-Based Document Retrieval | Eelco Mossel |
| 16:00-16:30 | Break | |
| 16:30-17:15 | From multimedia semantic indexing to cross-lingual retrieval: the Prestospace approach to cultural heritage preservation and dissemination | Roberto Basili (invited speaker, University of Rome, Tor Vergata) |
| 17:15-18:00 | Discussion | |
Keynote Speakers
The following keynote speakers have been invited:
- Educational Natural Language Processing - Electronic Career Guidance and Beyond
Iryna Gurevych (Technische Universität Darmstadt)
- The talk aims to define Educational Natural Language Processing (e-NLP) as a field of research exploring the use of NLP techniques in educational contexts. The current renaissance of interest in e-NLP is due to eLearning 2.0, which leads to the creation of large repositories of user-generated discourse and user-generated metadata. This user-generated knowledge can be employed to create structured knowledge bases that improve NLP, but accessing it efficiently requires advanced information management capabilities and NLP. The talk will present snapshots from several ongoing e-NLP research projects in the Ubiquitous Knowledge Processing (UKP) Lab at the Technische Universität Darmstadt to illustrate some of the claims and challenges in e-NLP.
- From multimedia semantic indexing to cross-lingual retrieval: the Prestospace approach to cultural heritage preservation and dissemination
Roberto Basili (University of Rome, Tor Vergata)
- Digital archives of large European TV broadcasters constitute an immense resource for cultural, historical and scientific education. In the Prestospace project, a framework for the acquisition and delivery of semantic metadata (MAD) from the multimedia repositories of the major European TV broadcasters (e.g. BBC, RAI) has been defined, and a platform for annotation, indexing, publication and conceptual retrieval has been realised.
An innovative feature of the MAD system is its semantic annotation capability, which derives ontology-based metadata through speech recognition and natural language processing. The specific industrial requirements of the project pushed for the adoption of robust, efficient and portable models for language processing. In the talk, I will discuss some research aspects of the semantic analysis process applied in Prestospace. Particular emphasis will be given to the unsupervised learning approach adopted for cross-lingual retrieval from the annotated multimedia material. It integrates geometrical learning models (like Latent Semantic Analysis) with large-scale ontological and lexical resources (e.g. WordNet): the resulting semantic disambiguation process supports robust unsupervised query translation. The overall MAD approach supports effective multimedia annotation and indexing, and it is easily applicable to large-scale archives and Web scenarios (e.g. Web 2.0).
Abstracts of accepted papers
The following papers have been accepted:
ALPE as LT4eL processing chain environment
(Dan Cristea, Corina Forăscu, Ionuţ Pistol - Faculty of Computer Science, University “Al. I. Cuza” of Iaşi, Romania)
This paper briefly describes the concept, initial implementation and usage of the ALPE system for natural language processing. A hierarchy connecting annotation schemas, processing tools and resources is used as the working environment for the system, which can perform various complex NL processing tasks. ALPE will be used to build linguistic processing chains involving the annotation formats and tools developed in the LT4eL project. The particularities and advantages of such an endeavor are the main topics of this paper.
Combining pattern-based and machine learning methods to detect definitions for eLearning purposes
(Eline Westerhout, Paola Monachesi - Utrecht University)
One of the aims of the Language Technology for eLearning project is to show that Natural Language Processing techniques can be employed to enhance the learning process. To this end, one of the functionalities that has been developed is a pattern-based glossary candidate detector which is capable of extracting definitions in eight languages. In order to improve the results obtained with the pattern-based approach, machine learning techniques are applied to the Dutch results to filter out incorrectly extracted definitions. In this paper, we discuss the machine learning techniques used and present the results of the quantitative evaluation. We also discuss the integration of the tool into the Learning Management System ILIAS.
Supporting e-learning with automatic glossary extraction: Experiments with Portuguese
(Rosa Del Gaudio, António Branco - University of Lisbon)
This paper reports preliminary work on automatic glossary extraction for e-learning purposes. Glossaries are an important resource for learners: they not only facilitate access to learning documents but are also an important learning resource in themselves. The work presented here was carried out within the LT4eL project, whose aim is to improve the e-learning experience by means of natural language and semantic techniques. This work focuses on a system that automatically extracts glossaries from learning objects; in particular, the system extracts definitions from morpho-syntactically annotated documents using a rule-based grammar. To develop such a system, a corpus composed of learning objects covering three different domains was collected and annotated. A quantitative evaluation was carried out by comparing the definitions retrieved by the system against the manually marked definitions. On average, we obtain 14% precision, 86% recall and an F2 score of 0.33.
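The evaluation above compares extracted definitions with manually marked ones using precision, recall and a recall-weighted F-score. The following is a minimal sketch of those measures, assuming the standard F-beta definition; the abstract does not state which F2 formula the authors use, and the reported figures are averages, so this sketch is not meant to reproduce them. The counts and function names are hypothetical.

```python
def precision_recall(true_positives: int, false_positives: int, false_negatives: int):
    """Return (precision, recall) from raw counts; 0.0 where undefined."""
    retrieved = true_positives + false_positives   # everything the system extracted
    relevant = true_positives + false_negatives    # everything the annotators marked
    precision = true_positives / retrieved if retrieved else 0.0
    recall = true_positives / relevant if relevant else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Standard F-beta score: beta > 1 weights recall more heavily than precision."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical counts, not taken from the paper.
p, r = precision_recall(true_positives=30, false_positives=70, false_negatives=10)
score = f_beta(p, r, beta=2.0)
print(f"precision={p:.2f} recall={r:.2f} F2={score:.2f}")
```

Weighting recall more heavily suits semi-automatic glossary construction, where a missed definition is harder for a human editor to recover than a spurious candidate is to discard.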
Grammar-based Automatic Extraction of Definitions and Applications for Romanian
(Adrian Iftene, Diana Trandabăţ, Ionuţ Pistol - Faculty of Computer Science, University “Al. I. Cuza” of Iaşi, Romania)
This paper presents part of our work in the LT4eL project regarding the grammar developed by the Romanian team in order to extract definitions from texts. Some qualitative results are presented in order to evaluate our grammar rules. Among the applications of this kind of grammar, we discuss the possible inclusion of the grammar rules in a question answering system in order to extract answers to definition-type questions. Another possible use of these rules is the extraction of supplementary knowledge from linguistic resources like Wikipedia. The benefits of such an additional knowledge resource are evident in textual entailment systems, where resources like WordNet, Acronyms database or Dirt cannot cover all the requirements of the system.
Crosslingual Ontology-Based Document Retrieval
(Eelco Mossel - University of Hamburg)
An approach for crosslingual ontology-based document retrieval has been devised and is being implemented. It allows the user to enter a query in any language that is part of the system and retrieve documents in selected languages. A domain ontology and term-concept lexicons, containing synonymous terms where applicable, are used to overcome discrepancies between the search query and the words occurring in the documents, in a monolingual situation for the individual languages as well as in a crosslingual setting.
The ontology is used in two different ways. First, concepts relevant for a search query are found automatically and used to retrieve documents. Second, relevant parts of the ontology are displayed to the user, who can navigate further starting from the displayed part of the ontology, and explicitly select concepts to continue the search with.
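The first use of the ontology described above, mapping query terms to concepts and retrieving documents in other languages through those concepts, can be sketched as follows. This is an illustration under stated assumptions rather than the system's implementation: the lexicon entries, concept IDs, documents and function names are all hypothetical.

```python
# Hypothetical term-concept lexicons: language -> term -> ontology concept ID.
# Synonymous terms map to the same concept.
LEXICONS = {
    "en": {"browser": "C_Browser", "web browser": "C_Browser",
           "hyperlink": "C_Link", "link": "C_Link"},
    "de": {"browser": "C_Browser", "webbrowser": "C_Browser", "verweis": "C_Link"},
    "nl": {"browser": "C_Browser", "koppeling": "C_Link"},
}

# Hypothetical documents, each annotated with its language and its concepts.
DOCUMENTS = [
    {"id": "doc1", "lang": "en", "concepts": {"C_Browser"}},
    {"id": "doc2", "lang": "de", "concepts": {"C_Link", "C_Browser"}},
    {"id": "doc3", "lang": "nl", "concepts": {"C_Link"}},
]


def query_to_concepts(query: str, query_lang: str) -> set:
    """Map a query to concept IDs via the lexicon of the query language."""
    lexicon = LEXICONS[query_lang]
    normalized = query.lower().strip()
    concepts = set()
    if normalized in lexicon:              # whole query as a (multi-word) term
        concepts.add(lexicon[normalized])
    for word in normalized.split():        # individual words as terms
        if word in lexicon:
            concepts.add(lexicon[word])
    return concepts


def retrieve(query: str, query_lang: str, target_langs: set) -> list:
    """Return IDs of documents in the target languages sharing a query concept."""
    concepts = query_to_concepts(query, query_lang)
    return [doc["id"] for doc in DOCUMENTS
            if doc["lang"] in target_langs and doc["concepts"] & concepts]


hits = retrieve("hyperlink", "en", {"de", "nl"})
print(hits)
```

Because matching happens on concept IDs rather than surface words, the same lookup covers synonyms within one language and translations across languages.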
Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects
(Kiril Simov, Petya Osenova - LML, IPOI, BAS)
This paper discusses the role of the ontology in the definition of domain lexicons in several languages and its usage for the semantic annotation of Learning Objects (LOs). We assume that the ontology has the leading role and the lexicons are created on the basis of the meanings defined within the ontology. The semantic annotation requires the construction of special partial grammars connected to the terms in the lexicons. These special grammars are used for automatic annotation of domain texts. The ambiguous cases are resolved manually on the basis of the context. The process of semantic annotation plays a twofold role: first, it produces semantically annotated texts (gold standard corpus), and second, it helps in checking the coverage of the lexicon as well as the precision of the ontology.
Keyword extraction for metadata annotation of Learning Objects
(Lothar Lemnitzer, Paola Monachesi - Tübingen University, Utrecht University)
One of the functionalities developed within the LT4eL project is the possibility to annotate learning objects semi-automatically with keywords that describe them. To this end, a keyword extractor has been created which can deal with documents in 8 languages. The approach employed is based on a linguistic processing step which is followed by a filtering step of candidate keywords and their subsequent ranking based on frequency criteria. Two tests have been carried out to provide a rough evaluation of the performance of the tool and to measure inter-annotator agreement in order to determine the complexity of the task and to evaluate its performance with respect to human annotators.
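The three steps named above (linguistic processing, candidate filtering, frequency-based ranking) can be sketched in a few lines. This is a deliberately simplified illustration, not the LT4eL tool: it assumes plain tokenisation of ASCII English text in place of real linguistic processing, a tiny hypothetical stopword list as the filter, and raw term frequency as the only ranking criterion.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list used as the candidate filter.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "for", "with", "that", "on"}


def extract_keywords(text: str, top_n: int = 3) -> list:
    """Return the top_n candidate keywords ranked by raw frequency."""
    # Step 1: stand-in for linguistic processing (lowercase and tokenise).
    tokens = re.findall(r"[a-z]+", text.lower())
    # Step 2: filter out stopwords and very short tokens.
    candidates = [t for t in tokens if t not in STOPWORDS and len(t) > 2]
    # Step 3: rank by descending frequency, alphabetically on ties.
    counts = Counter(candidates)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


sample = ("A browser displays web pages. The browser follows links in pages, "
          "and links connect pages on the web.")
keywords = extract_keywords(sample)
print(keywords)
```

In a semi-automatic setting the ranked list would be offered to a human annotator as suggestions rather than stored directly as metadata.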
On the evaluation of Polish definition extraction grammars
(Adam Przepiórkowski, Łukasz Degórski, Beata Wójtowicz - Polish Academy of Sciences, Institute of Computer Science)
This paper presents the results of experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from usually unstructured or only weakly structured e-learning texts in Polish. The extraction is performed by regular grammars over XML-encoded morphosyntactically-annotated documents. The results, although perhaps still not fully satisfactory, are carefully evaluated and compared to the inter-annotator agreement; they clearly improve on previous definition extraction attempts for Polish.
Program Committee
António Branco (University of Lisbon, Portugal)
Dan Cristea (University of Iaşi, Romania)
Diane Evans (Open University, United Kingdom)
Walther v. Hahn (University of Hamburg, Germany)
Erhard Hinrichs (University of Tübingen, Germany)
Susanne Jekat (Zürich Winterthur Hochschule, Switzerland)
Alex Killing (ETHZ, Switzerland)
Steven Krauwer (University of Utrecht, the Netherlands)
Vladislav Kubon (Charles University Prague, Czech Republic)
Petya Osenova (Bulgarian Academy of Sciences, Bulgaria)
Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences)
Anne de Roeck (Open University, United Kingdom)
Mike Rösner (University of Malta, Malta)
Paul Buitelaar (DFKI, Germany)
Lothar Lemnitzer (University of Tübingen, Germany)
Paola Monachesi (University of Utrecht, the Netherlands)
Marco Ronchetti (University of Trento, Italy)
Cristina Vertan (University of Hamburg, Germany)
Organizing Committee
Paola Monachesi
University of Utrecht, The Netherlands
Lothar Lemnitzer
University of Tübingen, Germany
Cristina Vertan
University of Hamburg, Germany
The workshop is partially supported by the European Community under
the Information Society and Media Directorate, Learning and Cultural Heritage Unit,
via the LT4eL project, STREP-IST 027391.