Workshop: What can Natural Language Processing and Semantic Web technologies do for eLearning?
June 24th, 2007
Faculty of Mathematics and Physics of the Charles University in Prague, Czech Republic. Address: Malostranske square 25 (Malostranske namesti 25), room S8

In conjunction with ACL 2007


The European project Language Technology for eLearning (LT4eL) exploits NLP techniques, language resources and Semantic Web technologies to improve the retrieval and accessibility of learning material within a Learning Management System. More generally, we believe that eLearning can benefit enormously from the results achieved in these areas, and it would therefore be useful to bring together researchers from these disciplines to exchange results, ideas and plans.

The workshop will provide a forum for interaction among members of these different research communities, and a means for attendees to increase their knowledge and understanding of the potential of Natural Language Processing, Language Technology and the Semantic Web in eLearning.

The workshop will have the following goals:

  1. identify what Natural Language Processing, Language Technology and the Semantic Web vision can do for eLearning, and what eLearning expects from results in these areas.
  2. identify the strong and weak points of the technologies developed within Natural Language Processing, Language Technology and the Semantic Web (on the basis of the applications and projects discussed in the presentations), in order to assess to what extent they can be employed within eLearning and possibly come up with new solutions and ideas.

Presentations will be limited to the invited speakers but participation will be open to the public. Attendance is free, but registration is needed. To register, please send an email message to: Paola Monachesi

Workshop Organization: Paola Monachesi (Utrecht University)
Local Organization: Vladislav Kubon (Charles University Prague)


Morning session: Applications of Natural Language Processing techniques, Language resources and semantic knowledge to eLearning

9:30-10:00  The LT4eL project: first results (Paola Monachesi et al., University of Utrecht)
10:00-10:30  Semantic and text-processing technologies for use within an integrated work-learn environment (Stefanie Lindstaed & Viktoria Pammer, Know Center, Austria)
10:30-11:00  Language Technology and e-learning at CST (Patrizia Paggio, CST, University of Copenhagen)
11:15-11:45  NEED: New E-learning Environment and Demonstrators (Maria Teresa Pazienza & Fabio Zanzotto, University of Roma Tor Vergata)
11:45-12:15  Applications of Memory-based NLP (Antal van den Bosch & Roser Morante, University of Tilburg)
12:15-12:30  Summing up
12:30-14:00  Lunch break

Afternoon Session: What can semantic knowledge and NLP do for eLearning: new ideas

14:00-14:30  Using Watson for Building Intelligent Applications in E-learning (Mathieu d'Aquin, Open University, United Kingdom)
14:30-15:00  Language Technology in the Ontology Life-Cycle (Paul Buitelaar, DFKI)
15:00-15:30  Mining Ontological Knowledge from Syntactically Annotated Corpora (Gosse Bouma & Gertjan van Noord, University of Groningen)
15:30-15:45  Summing up
16:00-16:30  Cross-lingual Information Retrieval for Cultural Heritage Appreciation (Shuly Wintner, Idan Szpektor, Ido Dagan, Alon Lavie & Danny Shacham, University of Haifa)
16:30-17:00  Bridging gapped related linguistic communities (Dan Tufis & Radu Ion, Romanian Academy of Science)
17:00-18:30  Summing up, Discussion, Plans


The LT4eL Project: first results
(Paola Monachesi et al., Utrecht University)

We will report the first results of the LT4eL project, whose aim is to improve the retrieval and accessibility of learning material within a learning management system. The project approaches this task by providing Language Technology based functionalities for the eight languages represented in the consortium and by integrating semantic knowledge through a domain-specific ontology.

We will present the results achieved in the development of:

We will discuss the integration of these modules into the learning management system ILIAS, as well as the problems we face in adapting NLP and Semantic Web techniques to eLearning.

Semantic and text-processing technologies for use within an integrated work-learn environment
(Stefanie Lindstaed and Viktoria Pammer, Know Center, Austria)

Work-integrated learning: Imagine that you do not separate working from learning, nor either of them from collaborating. Everything you do is productive, and for everything you do you need to learn something new. You often collaborate with your colleagues, or even with people you did not know beforehand.

Work-integrated authoring: All the artefacts you create during your daily activities, such as papers, documentation, source code and building plans, contain a great deal of knowledge. They should be easily available to your colleagues for learning, reference and reuse.

This is the kind of work environment we are interested in. Specifically, we want to provide systems that technologically enable, support or enhance such an environment.

In our presentation we will first analyse the requirements for such a system. Second, we will present the approach we have taken so far in a concrete system, APOSDLE. APOSDLE is an EU project on work-integrated learning, coordinated by the Know-Center. On the one hand, we make heavy use of semantic technologies to describe users (their competencies) and their work context (process and knowledge domain). On the other hand, we rely on text-based analysis (statistical text mining and content-based similarity detection) to retrieve learning material and to facilitate model creation.

Finally, we will point out how such a system could profit from advances in NLP and address issues that would be of interest to research done at the Know-Center.

Language Technology and e-learning at CST
(Patrizia Paggio, CST, University of Copenhagen)

CST's expertise in the areas of interest to the workshop spans e-learning, language technology and the Semantic Web. In my talk I will present three different projects. The first two are e-learning projects: one deals with reading and writing support for dyslexics in a mobile environment, the other with measuring learners' proficiency levels in language acquisition. The third project is not in the area of e-learning but shows our work in the Semantic Web area: it is a European project on question answering in a Semantic Web environment. In general, the linguistic and semantic knowledge we have exploited so far in e-learning projects is quite simple; we are therefore very interested in exploring more ambitious possibilities.

NEED: New E-learning Environment and Demonstrators
(Maria Teresa Pazienza and Fabio Zanzotto, University of Roma Tor Vergata)

It is of great interest for the enlarged European and Mediterranean community to preserve national languages and cultures, safeguarding their differences while fostering integration.

Being able to deal with different cultures (much more than with different languages) gives both easy access to many kinds of documentation (administrative, legal, technological, social, political, cultural, economic, and so on) and improved text comprehension. Indeed, multilingualism and the ability to interact effectively, at different levels of comprehension, with several knowledge sources are assets when accessing information provided in an unfamiliar language.

As a consequence, there is a growing need for improved education and training services, including second language learning, both for non-native learners and for citizens engaged in lifelong learning.

HLT (Human Language Technology) is expected to play a very important role in a "globalized" framework. It has reached maturity and is progressively moving towards being "ready for integration" in a number of text processing applications. On the other hand, there is still a lack of understanding of how advanced text processing tools and linguistic resources can contribute to technology-empowered language learning. As a consequence, it is currently difficult for learners and teachers to use existing, widely available HLT. A further difficulty for e-learning tools is the lack of a coherent evaluation framework with objective measures in the context of a technological environment.

What is required is much more than collecting corpora, creating linguistic resources, developing text processing systems, supporting web semantics and improving document classification: it requires an overall framework in which users can navigate autonomously, obtaining what they need by applying their own knowledge when accessing documentation.

Building an information and knowledge-based society across Europe requires the involvement of the research community in a long-term RTD project. The potential of knowledge-based services can be exploited in real life only after different approaches and emerging technologies have been investigated, tested and evaluated. As an example of the problems to be dealt with, consider the representation of concepts in ontologies: users access document collections through their own ontologies, while each document refers to its own ontology. Moreover, since the knowledge being managed will be used to access texts, it is important to emphasise the relations between conceptual descriptions and their various (linguistic) surface realisations in texts. At first glance this might seem tightly bound to a specific language, but we are interested in methodological aspects that require customisation to a particular language only in the application phase. Our approach will therefore analyse all the problems related to the dichotomy between "formal" and "linguistic" representations of concepts.

In such a framework it will be necessary to develop conceptual representation structures general enough to support access to text content across languages. That is, while the semantics remain specific to each language, the methodology for representing concepts, as well as the access tools, could be reused from one language to another.

NEED is a system environment under development at the University of Rome Tor Vergata which will explore innovative solutions in language learning by integrating HLT tools and resources, such as morpho-syntactic analysers, dictionaries, glossaries and ontologies.

Through a simple GUI, learners will be able to write a sentence and obtain a detailed explanation of its structure, including alternative syntactic links that suggest different sentence meanings. By clicking on different windows, users can learn about the rules underlying each syntactic link, different word senses, alternative linguistic forms and structures, slang, archaic or formal expressions, terminology details, glossary information, etc.

Both the level and the quality of the information provided could be tailored to each learner, by age or by interests and topics. Then, by comparing the sentence entered by the learner with several possible correct ones, it will be possible to evaluate its correctness at the morphological, syntactic and semantic levels through specific measures.

Applications of Memory-based NLP
(Antal van den Bosch and Roser Morante, University of Tilburg)

Some 15 years ago the ILK (Tilburg) and CNTS (Antwerp) research groups started developing and applying machine-learning methods to NLP tasks. In what was then a new field in NLP, the groups pioneered new methodologies and helped start initiatives such as ACL SIGNLL and CoNLL, the Conference on Computational Natural Language Learning. The focus of the Tilburg and Antwerp research has been on memory-based learning (MBL, Daelemans and Van den Bosch, 2005), a simple yet powerful processing engine that draws on a memory of literally stored examples and processes new examples by drawing analogies to the memorized instances. Robust language technology applications of MBL have been developed for speech synthesis, morpho-phonology, parsing, semantic analysis (lexical semantics, relations, entities) and dialogue management. These basic modules have been integrated into spoken dialogue systems, text mining systems and ontological knowledge management systems. New foci of research are memory-based machine translation, spelling correction, and authorship-based expertise ranking for information retrieval and recommendation systems.
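
To make the idea of "processing new examples by analogy to memorized instances" concrete, the following is a minimal k-nearest-neighbour sketch using the overlap metric. It is not the TiMBL API: the toy part-of-speech memory, the function names `overlap_distance` and `classify`, and the optional `weights` list (a stand-in for the feature weighting TiMBL offers) are all hypothetical illustrations.

```python
from collections import Counter

# Hypothetical toy memory: (previous word, focus word, next word) -> tag of the focus word.
MEMORY = [
    (("the", "can", "is"), "NN"),
    (("I", "can", "swim"), "MD"),
    (("a", "can", "of"), "NN"),
    (("you", "can", "go"), "MD"),
]


def overlap_distance(a, b, weights=None):
    """Overlap metric: (weighted) number of feature positions whose values differ."""
    if weights is None:
        weights = [1.0] * len(a)
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def classify(instance, memory, k=1, weights=None):
    """Classify by analogy: majority vote among stored examples at the k nearest distances.

    Following TiMBL's convention as we understand it, k counts distinct distances,
    so every example tied at one of those distances takes part in the vote.
    """
    scored = [(overlap_distance(instance, feats, weights), label) for feats, label in memory]
    nearest = sorted({dist for dist, _ in scored})[:k]
    votes = Counter(label for dist, label in scored if dist in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(classify(("we", "can", "go"), MEMORY))   # closest: ("you", "can", "go") -> MD
    print(classify(("the", "can", "of"), MEMORY))  # two NN examples tie at distance 1 -> NN
```

Nothing is abstracted at training time: "learning" is simply storing the examples, and all the work happens when a new instance is compared with the memory.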

Thanks to a modular approach to software development based on the TiMBL (Tilburg Memory-Based Learning) software package, the memory-based suite of NLP tools can be integrated into virtually any higher-level information assistant, including e-Learning environments. The output of existing web demos, for example of the memory-based shallow parser for English, is typically directly interpretable by laypersons and could be used in language learning environments. Highly accurate modules (such as speech synthesis modules and spelling correctors) may be included directly in interfaces to provide feedback. For modules that for now remain less accurate, such as semantic analysis, a mutually beneficial situation would arise if learning also went the other way: once a student has mastered a language task to some degree, (s)he could start teaching the computer (cf. the OpenMind project) through an interface that displays automatically generated analyses and allows the user to correct them, so that the machine learning engine underneath the interface can learn as well.



Using Watson for Building Intelligent Applications in E-learning
(Mathieu d'Aquin, Open University, United Kingdom)

Watson is a gateway to the Semantic Web: it collects, analyses and gives access to ontologies and semantic data that are represented using Semantic Web technologies and made available online. In this presentation, I will briefly describe the architecture of Watson, the advanced features it provides (semantic data search and querying, ontology navigation, etc.), and the design principles on which it relies: focusing on knowledge quality, considering relations between ontologies, and providing a wide range of access mechanisms, from simple keyword search to formal queries. Next, I will describe how Watson is used as the basic infrastructure for next-generation Semantic Web applications, giving concrete examples of such applications related to e-learning (namely, question answering with PowerAqua and semantic browsing with Magpie). This will lead to a discussion of how, thanks to tools such as Watson, Semantic Web technologies (RDF, OWL, SPARQL, etc.) can be integrated into e-learning environments to give learners intelligent and efficient access to the growing network of formal knowledge available online.

Language Technology in the Ontology Life-Cycle
(Paul Buitelaar, DFKI)

In this talk I will discuss the role of language technology in a data-driven approach to the ontology life-cycle, specifically with regard to ontology selection, population and learning. Solutions for each of these steps will be illustrated through applications that we are developing at DFKI. Additionally, I will address the multilingual dimension of ontologies and present current work towards a lexicon model for integrating linguistic information into ontologies. Most of the work presented here is carried out in the context of SmartWeb, a German-funded project on mobile access to the Semantic Web.

Mining Ontological Knowledge from Syntactically Annotated Corpora
(Gosse Bouma and Gertjan van Noord, Faculty of Information Science, University of Groningen)

Alpino is a linguistically motivated grammar and parser for Dutch which produces dependency graphs with state-of-the-art accuracy. It is a robust, wide-coverage grammar that has been used to annotate large volumes of text automatically.

In this talk, we will show that corpora annotated by Alpino with dependency relations (as well as part-of-speech tags and named entity classes) can be used for the automatic acquisition of various kinds of lexical and ontological knowledge, which in turn can help explicate the meaning of terms in a document.

Dependency triples, for instance, can be used to measure distributional similarity between words, which in turn can be used to detect semantically related words. Snow et al. (2005) present an approach to learning hypernym relations that relies on detecting dependency paths between a word and its hypernym in large corpora. We have experimented with a similar approach using Dutch EuroWordNet as a data source. Finally, we have developed an accurate method for identifying definitions in Dutch Wikipedia that combines a syntactic filter (for extracting potential definition sentences) with a classifier trained on the words in the sentence, syntactic features and document features (i.e. sentence position).
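
As an illustration of how dependency triples can yield distributional similarity, here is a minimal sketch that represents each word by counts of its (relation, head) contexts and compares words with cosine similarity. The toy triples are hypothetical English stand-ins for Alpino output, and raw counts are used for brevity; practical systems usually weight the counts (for instance with pointwise mutual information) and may use other similarity measures, so this should not be read as the Groningen method.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy triples (head, relation, dependent), standing in for parser output.
TRIPLES = [
    ("drink", "obj", "wine"),
    ("pour", "obj", "wine"),
    ("drink", "obj", "beer"),
    ("pour", "obj", "beer"),
    ("drink", "obj", "water"),
    ("read", "obj", "book"),
    ("write", "obj", "book"),
]


def context_vectors(triples):
    """Represent each dependent word by counts of its (relation, head) contexts."""
    vectors = {}
    for head, rel, dep in triples:
        vectors.setdefault(dep, Counter())[(rel, head)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[feat] for feat, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(word, vectors):
    """Rank all other words by distributional similarity to `word`."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    vectors = context_vectors(TRIPLES)
    for other, score in most_similar("wine", vectors):
        print(f"{other}\t{score:.2f}")
```

Words that occur as objects of the same verbs ("wine" and "beer") come out as most similar, which is the intuition behind using dependency contexts to detect semantically related words.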

Cross-lingual Information Retrieval for Cultural Heritage Appreciation
(Shuly Wintner, Idan Szpektor, Ido Dagan, Alon Lavie and Danny Shacham, University of Haifa)

We describe a system that enhances the experience of museum visits by providing users with language-technology-based information retrieval capabilities. The system consists of a cross-lingual search engine, augmented by state-of-the-art semantic expansion technology, specifically designed for the domain of the museum (history and archaeology of Israel). We discuss the technology incorporated in the system, its adaptation to the specific domain and its contribution to cultural heritage appreciation.

The main component of the system is a domain-specific search engine that enables users to specify queries and retrieve information pertaining to the domain of the museum. The engine is enriched with linguistic capabilities that provide several means of addressing semantic variation. Queries are expanded using two main techniques: semantic expansion based on textual entailment, and cross-lingual expansion based on translating Hebrew queries into English and vice versa. Retrieved documents are presented as links with associated snippets; the system also translates snippets from Hebrew into English.
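
The two expansion steps described above can be sketched as follows. This is only an illustration under stated assumptions: the entailment rules, the tiny bilingual lexicon (English and transliterated Hebrew) and the function `expand_query` are hypothetical, and the real system's resources and ranking are not reproduced here.

```python
# Hypothetical entailment rules (lhs, rhs): a text mentioning lhs entails rhs.
ENTAILMENT_RULES = [
    ("coin", "currency"),
    ("amphora", "vessel"),
]

# Hypothetical bilingual lexicon, English <-> transliterated Hebrew.
LEXICON = {
    "coin": ["matbea"],
    "matbea": ["coin"],
}


def expand_query(terms, rules=ENTAILMENT_RULES, lexicon=LEXICON):
    """Expand query terms semantically (via entailment) and then cross-lingually."""
    expanded = []

    def add(term):
        if term not in expanded:  # keep first-seen order, no duplicates
            expanded.append(term)

    for term in terms:
        add(term)
        # Semantic expansion: documents mentioning lhs are relevant to a query for rhs.
        for lhs, rhs in rules:
            if rhs == term:
                add(lhs)
    # Cross-lingual expansion: translate every term collected so far.
    for term in list(expanded):
        for translation in lexicon.get(term, []):
            add(translation)
    return expanded


if __name__ == "__main__":
    print(expand_query(["currency"]))  # ['currency', 'coin', 'matbea']
```

The direction of the entailment rules matters: a query for "currency" is expanded with "coin" because a text about coins is also about currency, not the other way round.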

The main contribution of this work is, of course, the system itself, which was recently demonstrated successfully at the museum and which we believe could be useful to a variety of museum visitors, from children to experts. For example, the system gives Hebrew speakers access to English documents pertaining to the domain of the museum, and vice versa, thereby expanding the multilingual material available to museum visitors. More generally, it is an instance of adapting state-of-the-art human language technology to the domain of cultural heritage appreciation, demonstrating how general resources and tools can be adapted to a specific domain, thereby improving their accuracy and usability. Finally, it provides a test-bed for evaluating the contribution of language technology in general, as well as of specific components and resources, to a large-scale natural language processing system.

Bridging gapped related linguistic communities
(Dan Tufis and Radu Ion, Romanian Academy of Science)

Given a mini-minority language (such as Aromanian) related to a reference language (such as Romanian), one would like to adapt whatever tools are available for the reference language so that they work on the mini-minority language. Most mini-minority languages or dialects lack the basic NLP tools and computational resources that would allow their use in electronic information systems. However, with recent advances in multilingual processing (alignment and annotation transfer, knowledge induction), the relatedness of a mini-minority language to a larger language that is better "equipped" from a technological point of view might be exploited to bridge the technological gap in processing the mini-minority language in question. Such an enterprise would exploit similarities, but would also highlight the discrepancies between a reference language and a related mini-minority language.


Krystal Hotel - Praha 6, José Martiho 407/2, District Praha

This hotel is a convenient choice for our workshop and can be booked individually by participants through agencies such as Ubytovani, where a single room costs €39 (double room: €51). At the time of writing, rooms were still available for the end of June. Note, however, that while this hotel suits the workshop, it is definitely not recommended for the whole duration of ACL: travel from the hotel to the main conference venue, the Top hotel, may take 60 to 90 minutes each way.

Accommodation offered through ACL: see the ACL page


to hotel Krystal

(for any of the following routes you will need one 20 Kc ticket):

from the airport to hotel Krystal

Take bus No. 119 to the stop "Divoka Sarka". It is a stop at the edge of the town, where several tram lines begin. Take any tram down the hill (numbers 20, 26) to the stop "Nad Dzbanem" (two stops). On the left-hand side you will see a huge building (Citibank). On the right-hand side, about 100 meters ahead in the direction the tram continues, you will see the yellowish building of hotel Krystal.

from the main railway station (Hlavni nadrazi) to hotel Krystal

Take tram No. 26 from a stop near the main railway station (you'll get to the stop by turning right and going through the park once you leave the main railway station), direction Divoka Sarka (take the tram going to the left when the main railway station is behind you). The tram goes through the town, to Dejvicka metro station. The ride to the stop "Nad Dzbanem" takes about 30 minutes.

from the Holesovice railway station (Nadrazi Holesovice) to hotel Krystal

Take the red ("C") metro line, transfer to green ("A") line at "Muzeum" station. Go to "Dejvicka" station (terminal station of the green line). Exit in the direction of "tram Vokovice", or "Divoka Sarka". Take tram no. 20 or 26 up the hill, get off at stop "Nad Dzbanem".


to the meeting venue

The meeting takes place in the building of the Faculty of Mathematics and Physics of the Charles University at Malostranske square 25 (Malostranske namesti). The best way to get there from the Krystal is to take tram No. 20 from the tram stop "Nad Dzbanem" to the stop "Malostranske namesti". It is the next stop after the Malostranska metro station (don't confuse the two!). The ride should take about 20-25 minutes, depending on traffic.

You will see our building immediately when you leave the tram: it is a huge white building just behind the parking area on the square. There is a gate and a door; use the door and go straight until you see a staircase on your right. Take the stairs, or the lift a few steps further on, to the first floor. There, turn right, go through a glass door to the end of the corridor, then turn left; after a few more steps, lecture room S8 will be on your right.

More information and maps can be found on the ACL website


A note on public transportation

The same tickets are valid for all trams, buses and the metro. Tickets can be bought in advance from tobacco or newspaper shops or from vending machines in the metro stations. Tram and bus drivers DON'T sell tickets. Once you board a tram or bus, you MUST validate the ticket by inserting it into a small machine located close to the door. The same validation machines are found at all metro entrances, NOT in the metro trains.

There are several types of tickets:

The buses and trams go along their regular routes between 5 AM and 1 AM, the metro between 5 AM and midnight. During the night there is a system of night trams and buses going along slightly different routes.

Taxis in Prague are infamous for cheating passengers, and taxis hailed in the street present the highest risk. If you want to use a taxi, always let the hotel reception call it for you; there are several reliable taxi companies that provide good service. For the airport to city center route you can expect to pay about 500 Kc; the fare from hotel Krystal to the city center should be slightly less.

The workshop is partially supported by the European Community under the Information Society and Media Directorate, Learning and Cultural Heritage Unit via the LT4eL project, STREP-IST 027391