ERCIM News No.26 - July 1996
The European Language Resources Association
by Khalid Choukri
The European Language Resources Association (ELRA) was established
as a non-profit organization in Luxembourg, in February 1995. The overall
goal of ELRA is to provide a centralised organization for the validation
and distribution of speech, text, and terminology resources and tools, and
to promote their use within the European telematics R&TD community.
Language Resources (LRs) are universally acknowledged to be critical for
the development of robust, broad-coverage, and cost-effective applications
in all sectors of telematics, in particular those for written and spoken
language. The cost of developing such resources is prohibitive, and due
to the lack of sufficient co-ordination, existing LRs cannot be easily adapted
for multiple users, thereby hindering the rapid deployment of new applications.
Market Situation
The LR area can be considered as three quite distinct fields, all of which
are covered in the three 'colleges' of ELRA: terminology, written resources,
and spoken resources. There is a great deal of terminology work going on
in all the main languages of Europe, both at a general level and in every
major sector of industrial and commercial activity. But the work is to a
large extent uncoordinated and very little effort has been made to turn
this work into commercial products using common standards, a situation that
ELRA intends to rectify.
In the written field, the collection of corpora has become important in
recent years, and is beginning to be a commercial activity; much remains
to be done to organise this activity systematically and to cover all languages
and user domains. The production of written lexica is very expensive and
although there are many toy systems, there are few commercial activities
outside those of the major publishing houses; an important source of material
is the work of the established national language centres.
Spoken resources have become a fully commercial product in the last few
years as the speech processing field has reached technical and commercial
maturity. The major telecommunications firms have moved in and it is here
that the market for ELRA's distribution activities is, in the short run,
at its most mature.
In all three fields, the LR project activities of the EU Language Engineering
(LE) programmes are producing new products and standards which must be a
prime target for ELRA distribution activity.
To achieve its objectives ELRA has established a distribution unit (European
Language Resources Distribution Agency - ELDA) as the infrastructure within
ELRA for identifying, collecting, classifying, validating, distributing,
and exploiting LRs. ELDA manages and oversees these activities. Additional
activities include developing evaluation guidelines, serving as a broker
between producers and users of LRs, and functioning as a central clearinghouse
for information.
ELRA has appointed several panels of experts, which will advise the ELRA Board
on crucial aspects of its activities. The initial panels appointed by the
board are:
- Panel for the identification and collection of LRs
- Panel for the validation of LRs
- Panel for the distribution of LRs
- Panel for external relationships
Each panel consists of a core of ELRA members, selected to represent the
expertise of the three colleges (speech, written, terminology) and chaired by
a convenor.
ELRA/ELDA has started addressing the fundamental organisational, technical,
and economic problems which constitute the crucial barriers to the development
of the market for LRs. To this end, ELRA is now working to:
- compile a catalogue of existing LRs and begin negotiating with
suppliers to acquire an initial selected set of best-selling LRs
for distribution
- define a variety of viable contractual options for the suppliers and
users of LRs
- establish a pricing policy
- study, with the assistance of a legal expert, practical methods and
licence agreements to overcome problems related to intellectual property
- establish co-operation links with permanent major suppliers of LRs
- define a methodology for LR validation: whereas methods and tools for
validating speech LRs have already been studied and tested to some extent,
very little is known about methods for validating written
and terminological LRs. ELRA will promote specific research on these issues,
in synergy with LR projects launched in the 4th Framework programme of the
European Commission
- actively market and distribute the initial set of LRs acquired by ELRA.
The services provided by ELRA could range from the simple cataloguing and
dissemination of information, through promotion and brokerage, to assisting
producers with the documentation, validation, and normalization of their
LRs, including their physical distribution.
Because the field is relatively immature, one of the first priorities is
to establish standards that facilitate reuse, performance, and interworking,
as well as standards for quality control of the resources. The work of the Expert
Advisory Group on Language Engineering Standards (EAGLES) and other LE projects
(SPEECHDAT, PAROLE, INTERVAL) will be used as the basis for establishing
these standards, but the role of ELRA is to ensure that the standards
are applied, not least in quality control of resources.
Ownership rights are also a major problem in the field, with the associated
problems of copyright and copying prevention. The project will analyse various
possible solutions, suggest codes of conduct, and draw up contracts that
regulate the status of the LRs distributed.
Results of the work of the association can be measured by the number of
members, by the number of LRs handled, and by the number and value of the
LRs collected, validated, and disseminated. In a more qualitative, but perhaps
in the long run more important sense, the success of ELRA will be judged
by how it succeeds in raising the profile of LRs and LE throughout the EU.
Results will also come from the stimulation it provides to the creation
of LRs, and in particular in those fields where some social or other non-commercial
incentive is provided for the creation and dissemination of LRs.
Please contact:
Khalid Choukri - ELRA
Tel: +33 1 45 86 53 00
E-mail: elra@calvanet.calvacom.fr