ATCO2 corpus: A Large-Scale Dataset for Research on Automatic Speech Recognition and Natural Language Understanding of Air Traffic Control Communications, by Juan Zuluaga-Gomez, Karel Veselý, Igor Szöke, Alexander Blatt, Petr Motlicek, Martin Kocour, Mickael Rigault, Khalid Choukri, Amrutha Prasad, Seyyed Saeed Sarfjoo, Iuliia Nigmatulina, Claudia Cevenini, Pavel Kolčárek, Allan Tart, Jan Černocký and Dietrich Klakow

Abstract: Personal assistants, automatic speech recognizers and dialogue understanding systems are becoming more critical in our interconnected digital world. A clear example is air traffic control (ATC) communications. ATC aims at guiding aircraft and controlling the airspace in a safe and optimal manner. These voice-based dialogues are carried out between an air traffic controller (ATCO) and pilots via very-high frequency radio channels. In order to incorporate these novel technologies into ATC (a low-resource domain), large-scale annotated datasets are required to develop the data-driven AI systems. Two examples are automatic speech recognition (ASR) and natural language understanding (NLU). In this paper, we introduce the ATCO2 corpus, a dataset that aims at fostering research on the challenging ATC field, which has lagged behind due to lack of annotated data. The ATCO2 corpus covers 1) data collection and pre-processing, 2) pseudo-annotations of speech data, and 3) extraction of ATC-related named entities. The ATCO2 corpus is split into three subsets. 1) The ATCO2-test-set corpus contains 4 hours of ATC speech with manual transcripts and a subset with ...