METHOD OF DETECTING A TRANSCRIPTION ERROR IN SPEECH RECOGNITION CORPUS AND DEVICE FOR THE SAME

    Publication No.: US20240194203A1

    Publication Date: 2024-06-13

    Application No.: US18532770

    Filing Date: 2023-12-07

    IPC Classification: G10L15/26 G10L15/01 G10L15/22

    CPC Classification: G10L15/26 G10L15/01 G10L15/22

    Abstract: Provided are a method and a device for detecting a transcription error in a speech recognition corpus. The method includes the following steps: (a) receiving the speech recognition corpus, which includes a speech file and a text label for the speech file; (b) performing speech recognition on the speech file using a speech recognition model and converting the speech recognition result into text; (c) extracting a performance evaluation index of the speech recognition model; (d) extracting, using a language model, a perplexity PPL(s2) for the text label and a perplexity PPL(s1) for the text; and (e) detecting a transcription error in the text label of the speech recognition corpus using the extracted performance evaluation index, PPL(s2), and PPL(s1).
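
    Illustrative sketch (not part of the patent text): steps (c) to (e) can be pictured with the short Python example below. The abstract does not name the performance evaluation index, the language model, or the decision rule, so everything specific here is an assumption: word error rate stands in for the index, a toy add-one-smoothed unigram model stands in for the language model, and the rule "flag the label when the error rate is high and PPL(s2) exceeds PPL(s1)" is hypothetical. Step (b) is not implemented; the recognized text is passed in as a plain string. The names word_error_rate, UnigramLM, is_suspect_label, and wer_threshold are invented for this example.

```python
import math
from collections import Counter


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: WER stands in for the 'performance evaluation index'.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


class UnigramLM:
    """Toy add-one-smoothed unigram model (assumption: a real system
    would use a much stronger language model)."""

    def __init__(self, sentences):
        self.counts = Counter(w for s in sentences for w in s.split())
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # one extra slot for unseen words

    def perplexity(self, sentence: str) -> float:
        words = sentence.split()
        if not words:
            return float("inf")
        log_prob = sum(
            math.log((self.counts[w] + 1) / (self.total + self.vocab))
            for w in words
        )
        return math.exp(-log_prob / len(words))


def is_suspect_label(text_label: str, recognized_text: str,
                     lm: UnigramLM, wer_threshold: float = 0.3) -> bool:
    """Hypothetical decision rule: flag the label when it disagrees with
    the recognizer AND the language model finds the label less plausible."""
    wer = word_error_rate(text_label, recognized_text)
    ppl_s2 = lm.perplexity(text_label)       # PPL(s2): text label
    ppl_s1 = lm.perplexity(recognized_text)  # PPL(s1): recognized text
    return wer > wer_threshold and ppl_s2 > ppl_s1


lm = UnigramLM(["the cat sat on the mat", "the dog sat on the rug"])
label = "the cat zat on teh mat"       # label with two typos
recognized = "the cat sat on the mat"  # stand-in for the step (b) output
flagged = is_suspect_label(label, recognized, lm)
```

    Combining the two signals reflects one plausible reading of step (e): a high error rate alone may mean that the recognizer failed, whereas a label that is also less fluent than the recognized text points more strongly at the transcription itself.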