Publication No.: US11107462B1
Publication Date: 2021-08-31
Application No.: US16175086
Filing Date: 2018-10-30
Applicant: Facebook, Inc.
Inventor: Christian Fuegen, Yongquiang Wang, Anuj Kumar, Baiyang Liu, Dmitrii Serdiuk
Abstract: Exemplary embodiments relate to improvements in spoken language understanding (SLU) systems. Conventionally, SLU systems include an automatic speech recognition (ASR) component configured to receive an input of audio data and to generate a textual representation of the audio data. Conventional SLU systems also include a natural language understanding (NLU) component configured to receive a text-based transcript and perform language-based tasks such as domain classification, intent determination, and slot-filling. However, these two components are typically trained separately based on different metrics. In real-world situations, errors in the ASR component propagate to the NLU component, which degrades the performance of the overall system. Exemplary embodiments described herein perform SLU in an end-to-end manner that infers semantic meaning directly from audio features without an intermediate text representation. This may allow for a more accurate translation performed in a more resource-efficient manner (particularly in terms of processing resources).
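The structural difference between the two designs described in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and is not taken from the patent: the names `SemanticFrame`, `cascaded_slu`, `end_to_end_slu`, and the toy components are assumptions made for illustration, and the toy functions are hard-coded stand-ins rather than trained models. The sketch only shows where the intermediate transcript appears in a conventional system and where it is absent in an end-to-end one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Sequence

# Assumption: audio features are a sequence of frames, each a list of floats.
AudioFeatures = Sequence[Sequence[float]]


@dataclass
class SemanticFrame:
    """Hypothetical container for the outputs named in the abstract."""
    domain: str
    intent: str
    slots: Dict[str, str] = field(default_factory=dict)


def cascaded_slu(
    features: AudioFeatures,
    asr: Callable[[AudioFeatures], str],
    nlu: Callable[[str], SemanticFrame],
) -> SemanticFrame:
    """Conventional design: audio -> transcript -> semantics."""
    transcript = asr(features)  # any ASR error here is passed on to the NLU
    return nlu(transcript)


def end_to_end_slu(
    features: AudioFeatures,
    model: Callable[[AudioFeatures], SemanticFrame],
) -> SemanticFrame:
    """End-to-end design: audio -> semantics, with no text in between."""
    return model(features)


# Toy stand-ins (illustrative only, not real recognizers or classifiers).
def toy_asr(features: AudioFeatures) -> str:
    # Returns a fixed transcript regardless of the input.
    return "play some jazz"


def toy_nlu(transcript: str) -> SemanticFrame:
    words = transcript.split()
    if "play" in words:
        return SemanticFrame("music", "play_music", {"genre": words[-1]})
    return SemanticFrame("unknown", "unknown")


def toy_e2e_model(features: AudioFeatures) -> SemanticFrame:
    # Decides the intent from the mean of the first feature dimension.
    mean = sum(frame[0] for frame in features) / len(features)
    if mean > 0.5:
        return SemanticFrame("music", "play_music")
    return SemanticFrame("unknown", "unknown")


if __name__ == "__main__":
    feats = [[0.9, 0.1], [0.7, 0.3]]
    print(cascaded_slu(feats, toy_asr, toy_nlu))
    print(end_to_end_slu(feats, toy_e2e_model))
```

In the cascaded function the only information the NLU stage receives is the transcript string, so a misrecognized word cannot be recovered downstream. The end-to-end function has a single mapping, which in a real system could be trained against one semantic objective instead of two separate metrics.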