[PLing] Fwd: LT4HALA workshop: CFP and dissemination

Fri Dec 13 12:38:19 CET 2019

** With apologies for cross-posting **

Call for Papers: 1st Workshop on
Language Technologies for Historical and Ancient LAnguages (LT4HALA)

   - Website: https://circse.github.io/LT4HALA/
   - Date: May 12, 2020
   - Place: co-located with LREC 2020 <https://lrec2020.lrec-conf.org/>,
   May 11-16, Marseille, France

Description

LT4HALA is a one-day workshop that seeks to bring together scholars who are
developing and/or using Language Technologies (LTs) for historically
attested languages, so as to foster cross-fertilization between the
Computational Linguistics community and Humanities scholars who deal with
historical linguistic data, e.g. historians, philologists, linguists,
archaeologists and literary scholars. Despite the current availability of
large collections of digitized texts written in historical languages, such
interdisciplinary collaboration is still hampered by the limited
availability of annotated linguistic resources for most historical
languages. Creating such resources is a challenge and an obligation for
LTs, both to support historical linguistic research with the most up-to-date
technologies and to preserve the precious linguistic data that have survived
from the past.

Relevant topics for the workshop include, but are not limited to:

   - handling spelling variation;
   - detection and correction of OCR errors;
   - creation and annotation of digital resources;
   - deciphering;
   - morphological/syntactic/semantic analysis of textual data;
   - adaptation of tools to address diachronic/diatopic/diastratic variation
   in texts;
   - teaching ancient languages with NLP tools;
   - NLP-driven theoretical studies in historical linguistics;
   - evaluation of NLP tools.

Shared Tasks

Precisely because of the limited amount of data preserved for historical and
ancient languages, evaluation practices play an important role in
understanding the accuracy of the NLP tools used to build and analyze
resources. Given the prominence of Latin, by virtue of its wide diachronic
and diatopic span covering two millennia all over Europe, the workshop will
host the first edition of EvaLatin
<https://circse.github.io/LT4HALA/EvaLatin>, an evaluation campaign
entirely devoted to the evaluation of NLP tools for Latin. The first
edition of EvaLatin will focus on two tasks (i.e. Lemmatization and PoS
tagging), each featuring three sub-tasks (i.e. Classical, Cross-Genre,
Cross-Time). These sub-tasks are designed to measure the impact of genre
and diachrony on the performance of NLP tools, a relevant aspect to keep in
mind when dealing with the diachronic and diatopic diversity of Latin.
Participants will be provided with shared data in the CoNLL-U format and
the evaluation script.
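
As an illustration only (this is not the official EvaLatin evaluation
script, which this call does not describe), the following Python sketch
shows how token-level accuracy for lemmatization and PoS tagging could be
computed from gold and predicted data in the CoNLL-U format. The function
names, the sample sentence and its annotations are assumptions made for the
example.

```python
def read_tokens(conllu_text):
    """Return (form, lemma, upos) tuples for the word lines of a CoNLL-U text."""
    tokens = []
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines (sentence breaks) and comment lines
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns, got {len(cols)}")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges (1-2) and empty nodes (1.1)
        tokens.append((cols[1], cols[2], cols[3]))
    return tokens


def accuracy(gold_text, pred_text, field):
    """Share of tokens whose `field` ("lemma" or "upos") matches the gold value."""
    index = {"lemma": 1, "upos": 2}[field]
    gold = read_tokens(gold_text)
    pred = read_tokens(pred_text)
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and predicted data must have the same, non-zero length")
    correct = sum(g[index] == p[index] for g, p in zip(gold, pred))
    return correct / len(gold)


def make_conllu(rows):
    """Build a one-sentence CoNLL-U text; unused columns are filled with '_'."""
    lines = [
        "\t".join([str(i), form, lemma, upos] + ["_"] * 6)
        for i, (form, lemma, upos) in enumerate(rows, start=1)
    ]
    return "\n".join(lines) + "\n"


# Hypothetical gold annotation and system output for a four-word sentence.
GOLD = make_conllu([
    ("Gallia", "Gallia", "PROPN"),
    ("est", "sum", "AUX"),
    ("omnis", "omnis", "DET"),
    ("divisa", "divido", "VERB"),
])
PRED = make_conllu([
    ("Gallia", "Gallia", "PROPN"),
    ("est", "edo", "AUX"),      # wrong lemma
    ("omnis", "omnis", "ADJ"),  # wrong PoS tag
    ("divisa", "divido", "VERB"),
])

lemma_acc = accuracy(GOLD, PRED, "lemma")
upos_acc = accuracy(GOLD, PRED, "upos")
print(f"lemma accuracy: {lemma_acc:.2f}, PoS accuracy: {upos_acc:.2f}")
```

In this toy example each task has one error in four tokens, so both
accuracies are 0.75. A real evaluation would read the gold and system files
from disk and might report further measures.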

Submissions

For the workshop, we invite papers of different types, such as experimental
papers, reproduction papers, resource papers, position papers and survey
papers. Both long and short papers describing original and unpublished work
are welcome. Long papers should deal with substantial completed research
and/or report on the development of new methodologies. They may consist of
up to 8 pages of content plus 2 pages of references. Short papers are
instead appropriate for reporting on work in progress or for describing a
single tool or project. They may consist of up to 4 pages of content plus 2
pages of references. We encourage the authors of papers reporting
experimental results to make their results reproducible and the entire
process of analysis replicable, by making the data and the tools they used
available. Papers may be presented orally or as posters, but the
proceedings will make no distinction between the two. The
submission is NOT anonymous. The LREC official format
<https://lrec2020.lrec-conf.org/en/submission2020/authors-kit/> is
required. Each paper will be reviewed by three independent reviewers.

As for EvaLatin <https://circse.github.io/LT4HALA/EvaLatin>, participants
will be required to submit a technical report for each task (with all the
related sub-tasks) they took part in. Technical reports will be included in
the proceedings as short papers: the maximum length is 4 pages (excluding
references) and they should follow the LREC official format
<https://lrec2020.lrec-conf.org/en/submission2020/authors-kit/>. Reports
will receive a light review (we will check for the correctness of the
format, the exactness of results and ranking, and overall exposition). All
participants will have the possibility to present their results at the
workshop: we will allocate an oral session and a poster session fully
devoted to the shared tasks.

Important Dates

Workshop

   - 17 February 2020: submissions due
   - 10 March 2020: notifications to authors
   - 27 March 2020: camera-ready due
   - 12 May 2020: workshop

EvaLatin

   - 10 December 2019: training data available
   - Evaluation Window I - Task: Lemmatization
      - 17 February 2020: test data available
      - 21 February 2020: system results due to organizers
   - Evaluation Window II - Task: PoS tagging
      - 24 February 2020: test data available
      - 28 February 2020: system results due to organizers
   - 6 March 2020: assessment returned to participants
   - 27 March 2020: reports due to organizers
   - 10 April 2020: camera-ready version of reports due to organizers
   - 12 May 2020: workshop

Share your LRs!

Describing your Language Resources (LRs) in the LRE Map
<http://lremap.elra.info/> is now a
normal practice in the submission procedure of LREC (introduced in 2010 and
adopted by other conferences). To continue the efforts initiated at LREC
2014 about “Sharing LRs” (data, tools, web-services, etc.), authors will
have the possibility, when submitting a paper, to upload LRs in a special
LREC repository. This effort of sharing LRs, linked to the LRE Map for
their description, may become a new “regular” feature for conferences in
our field, thus contributing to creating a common repository where everyone
can deposit and share data.

ISLRN number

As scientific work requires accurate citations of referenced work so as to
allow the community to understand the whole context and also replicate the
experiments conducted by other researchers, LREC 2020 endorses the need to
uniquely identify LRs through the use of the International Standard
Language Resource Number (ISLRN
<https://circse.github.io/LT4HALA/www.islrn.org>), a Persistent Unique
Identifier to be assigned to each Language Resource. The assignment of
ISLRNs to LRs cited in LREC papers will be offered at submission time.

Organizers

   - Marco Passarotti
   <https://docenti.unicatt.it/ppd2/en/#/en/docenti/14144/marco-carlo-passarotti/profilo>,
   Università Cattolica del Sacro Cuore, Milan, Italy;
   - Rachele Sprugnoli
   <https://www.researchgate.net/profile/Rachele_Sprugnoli>, Università
   Cattolica del Sacro Cuore, Milan, Italy.

Programme Committee

   - Marcel Bollmann, University of Copenhagen, Denmark;
   - Gerlof Bouma, University of Gothenburg, Sweden;
   - Patrick Burns, University of Texas at Austin, USA;
   - Oksana Dereza, Insight Centre for Data Analytics, Ireland;
   - Stefanie Dipper, Ruhr-Universität Bochum, Germany;
   - Hanne Eckhoff, Oxford University, UK;
   - Maud Ehrmann, EPFL, Switzerland;
   - Hannes A. Fellner, Universität Wien, Austria;
   - Heidi Jauhiainen, University of Helsinki, Finland;
   - Julia Krasselt, Zurich University of Applied Sciences, Switzerland;
   - John Lee, City University of Hong Kong;
   - Chao-Lin Liu, National Chengchi University, Taiwan;
   - Barbara McGillivray, University of Cambridge, UK;
   - Beáta Megyesi, Uppsala University, Sweden;
   - So Miyagawa, University of Göttingen, Germany;
   - Joakim Nivre, Uppsala University, Sweden;
   - Eva Pettersson, Uppsala University, Sweden;
   - Michael Piotrowski, University of Lausanne, Switzerland;
   - Sophie Prévost, Laboratoire Lattice, France;
   - Halim Sayoud, USTHB University;
   - Olga Scrivner, Indiana University, USA;
   - Neel Smith, College of the Holy Cross, USA;
   - Sara Tonelli, Fondazione Bruno Kessler, Italy;
   - Amir Zeldes, Georgetown University, USA;
   - Daniel Zeman, Charles University, Czech Republic.

Contact

rachele.sprugnoli[AT]unicatt.it

Please write “LT4HALA” or “EvaLatin” in the subject line of your email.

Follow @ERC_LiLa <https://twitter.com/ERC_LiLa> and the hashtag #LT4HALA
<https://twitter.com/search?q=%23LT4HALA&src=typed_query> on Twitter for
updates.

-- 
Hannes A. Fellner

Department of Linguistics
University of Vienna
Sensengasse 3a
1090 Vienna
Austria

tinyurl.com/hannesafellner