Main topics of our publications on Swiss German:
2022
Research reports
Schraner, Yanick; Scheller, Christian; Plüss, Michel; Vogel, Manfred
Swiss German Speech to Text Evaluation. Research report, 2022.
Keywords: speech translation, Speech-to-Text, Swiss German, System Evaluation
@techreport{nokey,
title = {Swiss German Speech to Text Evaluation},
author = {Yanick Schraner and Christian Scheller and Michel Plüss and Manfred Vogel},
institution = {University of Applied Sciences and Arts Northwestern Switzerland},
url = {https://arxiv.org/pdf/2207.00412.pdf},
year = {2022},
date = {2022-11-14},
urldate = {2022-11-14},
abstract = {We present an in-depth evaluation of four commercially available Speech-to-Text (STT) systems
for Swiss German. The systems are anonymized and referred to as system a, b, c and d in this
report. We compare the four systems to our STT models, referred to as FHNW in the following,
and provide details on how we trained our model. To evaluate the models, we use two STT datasets
from different domains. The Swiss Parliament Corpus (SPC) test set and the STT4SG-350 corpus,
which contains texts from the news sector with an even distribution across seven dialect regions. We
provide a detailed error analysis to detect the strengths and weaknesses of the different systems. On
both datasets, our model achieves the best results for both, the WER (word error rate) and the BLEU
(bilingual evaluation understudy) scores. On the SPC test set, we obtain a BLEU score of 0.607,
whereas the best commercial system reaches a BLEU score of 0.509. On the STT4SG-350 test set,
we obtain a BLEU score of 0.722, while the best commercial system achieves a BLEU score of 0.568.
However, we would like to point out that this analysis is somewhat limited by the domain-specific
idiosyncrasies of the selected texts of the two test sets.
},
keywords = {speech translation, Speech-to-Text, Swiss German, System Evaluation},
pubstate = {published},
tppubtype = {techreport}
}
We present an in-depth evaluation of four commercially available Speech-to-Text (STT) systems for Swiss German. The systems are anonymized and referred to as systems a, b, c, and d in this report. We compare the four systems to our STT models, referred to as FHNW in the following, and provide details on how we trained our model. To evaluate the models, we use two STT datasets from different domains: the Swiss Parliament Corpus (SPC) test set and the STT4SG-350 corpus, which contains texts from the news sector with an even distribution across seven dialect regions. We provide a detailed error analysis to detect the strengths and weaknesses of the different systems. On both datasets, our model achieves the best results for both the WER (word error rate) and the BLEU (bilingual evaluation understudy) scores. On the SPC test set, we obtain a BLEU score of 0.607, whereas the best commercial system reaches a BLEU score of 0.509. On the STT4SG-350 test set, we obtain a BLEU score of 0.722, while the best commercial system achieves a BLEU score of 0.568. However, we would like to point out that this analysis is somewhat limited by the domain-specific idiosyncrasies of the selected texts of the two test sets.
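The evaluation described above relies on WER (word error rate) and BLEU. As a minimal illustration of how WER is typically computed, the following Python sketch calculates the word-level edit distance divided by the reference length. The sentences are invented toy examples and whitespace tokenization is assumed; this is not the normalization or evaluation pipeline actually used in the report.

# Illustrative sketch: WER as word-level edit distance over reference length.
# Toy data only; not the report's actual evaluation pipeline or test sets.

def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    hyp = hypothesis.split()
    # Edit-distance table: d[i][j] = edits to turn ref[:i] into hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

if __name__ == "__main__":
    # Hypothetical Standard German reference vs. an STT hypothesis.
    reference = "wir evaluieren vier kommerzielle systeme"
    hypothesis = "wir evaluieren vier kommerziellen system"
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # 2 errors / 5 words = 0.400

For BLEU, established toolkits such as sacrebleu are commonly used in STT and machine-translation evaluation; note that such toolkits usually report BLEU on a 0-100 scale, whereas the abstract quotes scores on a 0-1 scale.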