MULTI-SPEAKER OVERLAPPING VOICE DETECTION METHOD AND SYSTEM THEREOF

    Publication No.: US20250046336A1

    Publication date: 2025-02-06

    Application No.: US18788092

    Application date: 2024-07-29

    Abstract: Disclosed are a multi-speaker overlapping voice detection method and system. The method includes: obtaining a voice to be detected and removing silence from it; extracting features from the silence-removed voice to obtain a voice feature; and inputting the voice feature into an overlapping voice detection model, which outputs the number of overlapping speakers for the voice to be detected. The overlapping voice detection model is obtained by supervised training on voice features of sample voices and their corresponding overlapping-speaker-number labels. It extracts a speaker embedding from the voice feature and classifies the number of overlapping speakers in the voice to be detected based on that embedding.
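    The three steps in this abstract (silence removal, feature extraction, model inference) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and not taken from the patent: the energy-threshold silence removal, the log-energy feature, the frame length, and the `model` callable that stands in for the trained overlapping voice detection model.

```python
import math


def frame_signal(samples, frame_len):
    """Split samples into non-overlapping frames, dropping a trailing partial frame."""
    last_start = len(samples) - frame_len + 1
    return [samples[i:i + frame_len] for i in range(0, last_start, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def remove_silence(samples, frame_len=4, threshold=0.01):
    """Keep only frames whose energy reaches the threshold (assumed silence criterion)."""
    voiced = [f for f in frame_signal(samples, frame_len)
              if frame_energy(f) >= threshold]
    return [x for f in voiced for x in f]


def extract_features(samples, frame_len=4):
    """One log-energy value per frame, a stand-in for a real acoustic feature."""
    return [math.log(frame_energy(f) + 1e-10)
            for f in frame_signal(samples, frame_len)]


def detect_overlap_count(samples, model, frame_len=4, threshold=0.01):
    """Run the pipeline; `model` is a hypothetical callable mapping features to a count."""
    voiced = remove_silence(samples, frame_len, threshold)
    features = extract_features(voiced, frame_len)
    return model(features)
```

    In a real system the `model` argument would be the supervised classifier described in the abstract, and the feature would likely be a spectral representation rather than a single energy value per frame.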

CONSTRUCTION METHOD AND SYSTEM OF DESCRIPTIVE MODEL OF CLASSROOM TEACHING BEHAVIOR EVENTS

    Publication No.: US20230334862A1

    Publication date: 2023-10-19

    Application No.: US18011847

    Application date: 2021-09-07

    Abstract: The present invention discloses a construction method and system for a descriptive model of classroom teaching behavior events. The construction method includes the following steps: acquiring classroom teaching video data to be trained; dividing the classroom teaching video data to be trained into multiple events according to the utterances of a teacher by using a voice activity detection technology; performing multi-modal recognition on all events by using multiple artificial intelligence technologies to divide the events into sub-events in multiple dimensions; establishing an event descriptive model according to the sub-events; and describing various teaching behavior events of the teacher in a classroom. The present invention divides a classroom video according to voice, which preserves the completeness of the teacher's non-verbal behavior in each event to the greatest extent. In addition, a descriptive model that uniformly describes all events is established by extracting the commonality between different events. This model not only describes the teacher's various teaching behaviors but also reflects the correlation between events, so that the events are no longer isolated.
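    The event-division step in this abstract can be sketched as grouping consecutive speech frames into time intervals. This is only an illustrative sketch: the per-frame speech flags are assumed to come from some voice activity detector, and the function name `split_into_events` and the fixed frame duration are hypothetical, not details from the patent.

```python
def split_into_events(speech_flags, frame_seconds):
    """Group consecutive speech frames into (start_time, end_time) events.

    speech_flags: per-frame booleans from a voice activity detector (assumed input).
    frame_seconds: duration of one frame in seconds (assumed fixed).
    """
    events = []
    start = None  # index of the first frame of the event being built
    for i, is_speech in enumerate(speech_flags):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            events.append((start * frame_seconds, i * frame_seconds))
            start = None
    if start is not None:
        # Speech ran to the end of the recording; close the last event there.
        events.append((start * frame_seconds, len(speech_flags) * frame_seconds))
    return events


if __name__ == "__main__":
    flags = [False, True, True, False, False, True]
    print(split_into_events(flags, frame_seconds=0.5))
```

    Each resulting interval could then be used to cut the corresponding video segment, which is where the multi-modal recognition described in the abstract would apply.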
