[PLing] Spoken Torlak dialect corpus 1.0
Teodora Vukovic
teodora.vukovic2 at uzh.ch
Thu Nov 12 11:16:16 CET 2020
Dear Colleagues,
We are glad to share with you a new linguistic resource.
The TraCeBa <https://traceba.net/> project at the University of Zurich has published the Spoken Torlak dialect corpus 1.0 <http://hdl.handle.net/11356/1281>.
The current version of the corpus includes language samples from Timok <https://goo.gl/maps/JZHSkBoArPsy98Lm6> in Southeast Serbia. The corpus can be searched freely on NoSketchEngine <https://www.clarin.si/noske/run.cgi/first_form?corpname=torlak;align=> and KonText <https://www.clarin.si/kontext/first_form?corpname=torlak>. Corpus files in different formats can be downloaded from the CLARIN.SI <http://hdl.handle.net/11356/1281> repository. The corpus contains 500,000 tokens, lemmatized and annotated using the MULTEXT-East tagset <http://nl.ijs.si/ME/V6/msd/html/msd-sr-tor.html>.
An upcoming version, planned for 2021, will include more data from other Torlak regions.
You will be able to read more about the corpus in an upcoming publication:
Teodora Vukovic. Representing variation in a spoken corpus of an endangered dialect. The case of Torlak. Language Resources and Evaluation <https://www.springer.com/journal/10579>.
Best regards,
Teodora Vukovic