Main topics of our publications on Swiss German:
2022
Technical Reports
Schraner, Yanick; Scheller, Christian; Plüss, Michel; Neukom, Lukas; Vogel, Manfred
Comparison of Unsupervised Learning and Supervised Learning with Noisy Labels for Low-Resource Speech Recognition (Technical Report)
2022.
Keywords: forced-alignment, low-resource, self-supervised, semi-supervised, Speech Recognition/Understanding, speech translation
@techreport{schraner2022comparison,
title = {Comparison of Unsupervised Learning and Supervised Learning with Noisy Labels for Low-Resource Speech Recognition},
author = {Yanick Schraner and Christian Scheller and Michel Plüss and Lukas Neukom and Manfred Vogel},
institution = {University of Applied Sciences and Arts Northwestern Switzerland},
url = {https://www.isca-speech.org/archive/pdfs/interspeech_2022/schraner22_interspeech.pdf},
year = {2022},
date = {2022-09-22},
keywords = {forced-alignment, low-resource, self-supervised, semi-supervised, Speech Recognition/Understanding, speech translation},
pubstate = {published},
tppubtype = {techreport}
}
2020
Technical Reports
Plüss, Michel; Neukom, Lukas; Scheller, Christian; Vogel, Manfred
Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus (Technical Report)
2020.
Keywords: Corpus, forced-alignment
@techreport{pluess2020spc,
title = {Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus},
author = {Michel Plüss and Lukas Neukom and Christian Scheller and Manfred Vogel},
institution = {Institute for Data Science, University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland},
url = {https://ceur-ws.org/Vol-2957/paper3.pdf},
year = {2020},
date = {2020-10-06},
urldate = {2020-10-06},
abstract = {We present the Swiss Parliaments Corpus (SPC), an automatically aligned Swiss German speech to Standard German text corpus. This first version of the corpus is based on publicly available data of the Bernese cantonal parliament and consists of 293 hours of data. It was created using a novel forced sentence alignment procedure and an alignment quality estimator, which can be used to trade off corpus size and quality. We trained Automatic Speech Recognition (ASR) models as baselines on different subsets of the data and achieved a Word Error Rate (WER) of 0.278 and a BLEU score of 0.586 on the SPC test set. The corpus is freely available for download.},
keywords = {Corpus, forced-alignment},
pubstate = {published},
tppubtype = {techreport}
}
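The abstract above reports a Word Error Rate (WER) of 0.278 as a fraction rather than a percentage. As a rough illustration of what that number means, the sketch below computes WER as the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. It is not the authors' evaluation code: the function name `word_error_rate`, the whitespace tokenization, and the example sentences are assumptions made here for illustration, and real evaluations usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr

    return prev[-1] / len(ref)


# Hypothetical example: one of four reference words is missing.
print(word_error_rate("das ist ein test", "das ist test"))  # 0.25
```

Under this definition, a WER of 0.278 corresponds to roughly 28 word-level errors (substitutions, deletions, or insertions) per 100 reference words.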