[PLing] Spoken Torlak dialect corpus 1.0

Thu Nov 12 11:16:16 CET 2020

Dear Colleagues, 

We are glad to share with you a new linguistic resource.
The TraCeBa <https://traceba.net/> project at the University of Zurich has published the Spoken Torlak dialect corpus 1.0 <http://hdl.handle.net/11356/1281>.

The current version of the corpus includes language samples from Timok <https://goo.gl/maps/JZHSkBoArPsy98Lm6> in southeastern Serbia. The corpus can be searched freely on NoSketchEngine <https://www.clarin.si/noske/run.cgi/first_form?corpname=torlak;align=> and KonText <https://www.clarin.si/kontext/first_form?corpname=torlak>. Corpus files in different formats can be downloaded from the CLARIN.SI <http://hdl.handle.net/11356/1281> repository. The corpus contains 500,000 tokens, lemmatized and annotated with the MULTEXT-East tagset <http://nl.ijs.si/ME/V6/msd/html/msd-sr-tor.html>.
An upcoming version, planned for 2021, will include more data from other Torlak regions.

You will be able to read more about the corpus in an upcoming publication:
Teodora Vukovic. Representing variation in a spoken corpus of an endangered dialect. The case of Torlak. Language Resources and Evaluation <https://www.springer.com/journal/10579>.

Best regards, 
Teodora Vukovic