-
Publication number: US11087763B2
Publication date: 2021-08-10
Application number: US16236295
Application date: 2018-12-28
Inventor: Chao Li , Weixin Zhu , Ming Wen
IPC: G10L15/28 , G06N3/08 , G06N5/04 , G10L15/16 , G10L15/183
Abstract: A voice recognition method is provided by embodiments of the present application. The method includes: obtaining a voice signal to be recognized; and recognizing a current frame in the voice signal using a pre-trained causal acoustic model, according to the current frame and a frame within a preset time period before the current frame, the causal acoustic model being obtained by training a causal convolutional neural network. Because only the current frame and the frame before it are used when recognizing the current frame, the method avoids the hard delay that arises in prior-art voice recognition based on convolutional neural networks, which must wait for frames after the current frame, and thereby improves the timeliness of voice recognition.
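As an illustration of the causal constraint described in this abstract, the minimal Python sketch below computes a 1-D causal convolution in which each output depends only on the current frame and earlier frames. The function name `causal_conv1d`, the scalar per-frame features, and the zero left-padding are assumptions made for illustration; the patent's actual model and training procedure are not reproduced here.

```python
from typing import List, Sequence


def causal_conv1d(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal 1-D convolution: output[t] uses only frames[t-k+1 .. t]."""
    k = len(kernel)
    # Assumption: pad the past with zeros so early frames have a full window.
    padded = [0.0] * (k - 1) + list(frames)
    outputs = []
    for t in range(len(frames)):
        window = padded[t:t + k]  # the current frame is the last element
        outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs


signal = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, 0.5]  # hypothetical weights; kernel[-1] applies to the current frame
print(causal_conv1d(signal, kernel))
```

Because no window ever reaches past index t, changing a later frame leaves all earlier outputs unchanged, which is the property that removes the wait for future frames.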
-
Publication number: US20210225363A1
Publication date: 2021-07-22
Application number: US16972420
Application date: 2019-03-12
Applicant: SONY CORPORATION
Inventor: HIRO IWASE , YUHEI TAKI , KUNIHITO SAWAI
IPC: G10L15/065 , G10L15/22 , G10L15/18 , G10L15/28
Abstract: The present invention addresses the problem of effectively reducing the input load related to a voice trigger. There is provided an information processing device comprising a registration control unit that dynamically controls registration of startup phrases used as start triggers of a voice interaction session, in which the registration control unit temporarily additionally registers at least one of the startup phrases based on input voice. There is also provided an information processing method comprising dynamically controlling, by a processor, registration of startup phrases used as start triggers of a voice interaction session, in which the controlling further includes temporarily additionally registering at least one of the startup phrases based on input voice.
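The idea of temporarily registering additional startup phrases can be sketched as a small registry with expiry times. Everything in this Python sketch is hypothetical: the class name `StartupPhraseRegistry`, the time-to-live mechanism, and the example phrases are illustrative assumptions, since the abstract does not say how temporary registration is implemented or ended.

```python
from typing import Dict, Iterable


class StartupPhraseRegistry:
    """Toy registry: permanent startup phrases plus temporary ones that expire."""

    def __init__(self, permanent: Iterable[str]) -> None:
        self._permanent = {p.lower() for p in permanent}
        self._temporary: Dict[str, float] = {}  # phrase -> expiry timestamp

    def register_temporarily(self, phrase: str, now: float, ttl: float) -> None:
        # Assumption: a temporary phrase stays valid for `ttl` seconds.
        self._temporary[phrase.lower()] = now + ttl

    def is_trigger(self, phrase: str, now: float) -> bool:
        # Drop expired temporary phrases before checking.
        self._temporary = {p: t for p, t in self._temporary.items() if t > now}
        key = phrase.lower()
        return key in self._permanent or key in self._temporary


registry = StartupPhraseRegistry({"hello agent"})  # hypothetical permanent phrase
registry.register_temporarily("by the way", now=0.0, ttl=10.0)
print(registry.is_trigger("By the way", now=5.0))
print(registry.is_trigger("by the way", now=11.0))
```

Passing `now` explicitly instead of reading the system clock keeps the sketch deterministic and easy to test.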
-
Publication number: US20210210093A1
Publication date: 2021-07-08
Application number: US17132112
Application date: 2020-12-23
Inventor: Lei GENG
IPC: G10L15/22 , G10L21/0208 , G10L15/28
Abstract: The present disclosure provides a smart audio device, including a front chip provided with a plurality of voice algorithm modules, and a main control chip signally connected with the front chip and configured to call the voice algorithm modules in the front chip according to a user request in a multi-thread mode. The smart audio device is low in cost and power consumption, has a long service life, and can improve user experience. The present disclosure further provides a calling method for an audio device, an electronic device, and a computer-readable medium.
-
Publication number: US20210191456A1
Publication date: 2021-06-24
Application number: US17196060
Application date: 2021-03-09
Applicant: Google LLC
Inventor: James Nelson Castro , Carl Alexander Cepress , Liang Ching Tseng , Darren Torrie , Frances Maria Hui Kwee , Rex Pinegar Price
IPC: G06F1/16 , G02F1/1333 , G02F1/1337 , G06F3/16 , G10L15/28 , G06F21/83 , H04R1/02 , H04R1/34
Abstract: In a display assistant device, a speaker is mounted in a waveguide structure which is at least partially disposed beneath a display screen. The waveguide structure is mounted in an exterior housing which includes speaker grills distributed on a plurality of surfaces of the exterior housing, permitting sound waves from the speaker to be projected outside the exterior housing. A cover structure is disposed on top of the waveguide structure to conceal the waveguide structure and speaker within the exterior housing. The cover structure has a tilted bottom surface configured to be suspended above the waveguide structure and to be separated by a first space. Sound waves projected from an upper portion of the speaker are reflected by the tilted bottom surface and are guided through the first space to exit the device from a speaker grill portion located on a rear side of the exterior housing.
-
Publication number: US20210151041A1
Publication date: 2021-05-20
Application number: US17127394
Application date: 2020-12-18
Applicant: Apple Inc.
Inventor: Thomas R. GRUBER , Harry J. SADDLER , Jerome Rene BELLEGARDA , Bryce H. NYEGGEN , Alessandro SABATELLI
IPC: G10L15/18 , G10L15/26 , G10L15/28 , G06F40/205
Abstract: Systems and processes are disclosed for handling a multi-part voice command for a virtual assistant. Speech input can be received from a user that includes multiple actionable commands within a single utterance. A text string can be generated from the speech input using a speech transcription process. The text string can be parsed into multiple candidate substrings based on domain keywords, imperative verbs, predetermined substring lengths, or the like. For each candidate substring, a probability can be determined indicating whether the candidate substring corresponds to an actionable command. Such probabilities can be determined based on semantic coherence, similarity to user request templates, querying services to determine manageability, or the like. If the probabilities exceed a threshold, the user intent of each substring can be determined, processes associated with the user intents can be executed, and an acknowledgment can be provided to the user.
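The parse-score-threshold flow in this abstract can be sketched in a few lines of Python. This is a toy under stated assumptions, not the patented process: the verb list, the conjunction list, the scoring rule in `score`, and the 0.5 threshold are all hypothetical stand-ins for the semantic-coherence and template-similarity measures the abstract mentions.

```python
from typing import List

IMPERATIVE_VERBS = {"call", "play", "set", "send", "open"}  # hypothetical vocabulary
CONJUNCTIONS = {"and", "then"}  # hypothetical joiners stripped from substring ends


def split_candidates(text: str) -> List[str]:
    """Start a new candidate substring at each imperative verb."""
    groups: List[List[str]] = []
    for word in text.lower().split():
        if word in IMPERATIVE_VERBS or not groups:
            groups.append([])
        groups[-1].append(word)
    candidates = []
    for group in groups:
        while group and group[-1] in CONJUNCTIONS:
            group.pop()
        if group:
            candidates.append(" ".join(group))
    return candidates


def score(candidate: str) -> float:
    """Placeholder probability that a candidate is an actionable command."""
    words = candidate.split()
    if words[0] not in IMPERATIVE_VERBS:
        return 0.1
    return 0.9 if len(words) > 1 else 0.4


def parse_multi_part(text: str, threshold: float = 0.5) -> List[str]:
    """Return the commands only if every candidate clears the threshold."""
    candidates = split_candidates(text)
    if candidates and all(score(c) > threshold for c in candidates):
        return candidates
    return []


print(parse_multi_part("Play some jazz and set a timer for ten minutes"))
```

In a real system each returned substring would then go to intent determination and execution; the sketch stops at the threshold check.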
-
Publication number: US20210141597A1
Publication date: 2021-05-13
Application number: US17064549
Application date: 2020-10-06
Applicant: TransEnterix Europe S.a.r.l.
Inventor: Gal ATAROT , Motti FRIMER , Tal NIR , Lior ALPERT
Abstract: The invention is a vocally activated control system for controlling an apparatus in a surgical setting. The system comprises: a. a voice sensor configured to detect vocal commands generated by surgeons during surgery; b. a signal transmitter connected to the voice sensor and configured to convert a vocal command into a transmittable signal and transmit it; c. a processor connected to the signal transmitter and configured to receive the transmittable vocal signal and convert it into a predetermined set of operative instructions associated with the apparatus, the predetermined set of operative instructions comprising at least one instruction; and d. control means connected to the processor and the apparatus, configured to receive the predetermined set of operative instructions and to cause the apparatus to operate accordingly. The voice sensor and the transmitter are integrated within a wearable element.
-
Publication number: US10996717B2
Publication date: 2021-05-04
Application number: US16597745
Application date: 2019-10-09
Applicant: GOOGLE LLC
Inventor: James Castro , Carl Cepress , Liang Ching Tseng , Darren Torrie , Frances Kwee , Rex Price
IPC: G06F1/16 , G10L15/28 , H04R1/02 , H04R1/34 , H04L12/28 , G02F1/1333 , G02F1/1337 , G06F3/16 , G06F21/83
Abstract: In a display assistant device, a speaker is mounted in a waveguide structure which is at least partially disposed beneath a display screen. The waveguide structure is mounted in an exterior housing which includes speaker grills distributed on a plurality of surfaces of the exterior housing, permitting sound waves from the speaker to be projected outside the exterior housing. A cover structure is disposed on top of the waveguide structure to conceal the waveguide structure and speaker within the exterior housing. The cover structure has a tilted bottom surface configured to be suspended above the waveguide structure and to be separated by a first space. Sound waves projected from an upper portion of the speaker are reflected by the tilted bottom surface and are guided through the first space to exit the device from a speaker grill portion located on a rear side of the exterior housing.
-
Publication number: US10984782B2
Publication date: 2021-04-20
Application number: US15640251
Application date: 2017-06-30
Applicant: Microsoft Technology Licensing, LLC
Inventor: Erich-Soren Finkelstein , Han Yee Mimi Fung , Oz Solomon , Keith Coleman Herold
IPC: G10L15/22 , G06F3/16 , G10L15/18 , G06K9/72 , G06K9/00 , G06T7/70 , G06K9/62 , G06N20/00 , G06T7/292 , H04W4/33 , H04W4/029 , A61B5/11 , A61B5/117 , A61B5/00 , G01S5/28 , G06F1/3206 , G06F1/3231 , G06F1/324 , G06F3/01 , G06F3/03 , G06F21/32 , G10L17/04 , G10L17/08 , H04L12/58 , H04L29/08 , H04N5/232 , H04N7/18 , H04N21/422 , H04N21/442 , G07C9/28 , G06F40/35 , G06F40/211 , G06T7/73 , G06T7/246 , G01S5/18 , G06T7/60 , G10L15/28 , H04R1/40 , H04R3/00 , H04N5/33 , G10L15/02 , G06N5/02 , G06N5/04 , G10L15/06 , G10L15/24 , G10L15/26 , G10L15/19 , G10L15/08 , G10L15/32 , G10L25/51 , H04L29/06 , A61B5/0205 , A61B5/0507 , G01S13/72 , G06F21/35 , G08B13/14 , G06F3/0482 , G06F3/0484 , H04N21/231 , G06F3/0488 , G06F16/70 , A61B5/05 , G01S5/16 , G01S11/14 , G01S13/86 , G06N3/04 , G08B29/18 , G10L17/00 , G07C9/32 , H04N5/247 , G01S13/38 , G01S13/88
Abstract: To address the issues of handling conversations with multiple users, an intelligent digital assistant system is provided. The system may include at least one microphone configured to receive an audio input, a speaker configured to emit an audio output, and a processor. The processor may be configured to engage in a conversation with a first user, and, concurrent with the first user being engaged in the conversation with the system, recognize speech of one or more additional users in the audio input. The processor may process the recognized speech of the one or more additional users to determine a context for each additional user, and execute a conversation disentanglement module to select and perform one or more predetermined conversation disentanglement actions according to the context of the recognized speech of each additional user.
-
Publication number: US20210109703A1
Publication date: 2021-04-15
Application number: US16499197
Application date: 2018-03-27
Applicant: Samsung Electronics Co., Ltd
Inventor: Jihyun KIM , Dongho JANG , Minkyung HWANG , Kyungtae KIM , Inwook SONG , Yongjoon JEON
IPC: G06F3/16 , G06F3/0488 , G06F3/0481 , G10L15/28 , G10L15/22
Abstract: Various embodiments of the present invention relate to an electronic device and a screen control method for processing a user input by using the same. According to the various embodiments, the electronic device comprises: a housing; a touchscreen display located inside the housing and exposed through a first part of the housing; a microphone located inside the housing and exposed through a second part of the housing; at least one speaker located inside the housing and exposed through a third part of the housing; a wireless communication circuit located inside the housing; a processor located inside the housing and electrically connected to the touchscreen display, the microphone, the at least one speaker, and the wireless communication circuit; and a memory located inside the housing and electrically connected to the processor, wherein the memory stores a first application program including a first user interface and a second application program including a second user interface, and wherein the memory stores instructions which, when executed, cause the processor to: display the first user interface on the display; while displaying the first user interface, receive a user input through at least one of the display or the microphone, wherein the user input includes a request for performing a task using the second application program; transmit data associated with the user input to an external server via the communication circuit; receive a response from the external server via the communication circuit, wherein the response includes information on a sequence of states of the electronic device to perform the task; and after receiving the response, display the second user interface on a first region of the display, based on the sequence of the states, while displaying a portion of the first user interface on a second region of the display. Other various embodiments, in addition to the various embodiments disclosed in the present invention, are possible.
-
Publication number: US10978048B2
Publication date: 2021-04-13
Application number: US15987115
Application date: 2018-05-23
Applicant: Samsung Electronics Co., Ltd.
Inventor: Tae Jin Lee , Young Woo Lee , Seok Yeong Jung , Chakladar Subhojit , Jae Hoon Jeong , Jun Hui Kim , Jae Geun Lee , Hyun Woong Lim , Soo Min Kang , Eun Hye Shin , Seong Min Je
Abstract: An apparatus comprises one or more processors, a communication circuit, and a memory storing instructions which, when executed, perform a method of recognizing a user utterance. The method comprises: receiving first data associated with a user utterance; performing a first determination to determine whether the user utterance includes the first data and a specified word; performing a second determination to determine whether the first data includes the specified word; transmitting the first data to an external server; receiving a text generated from the first data by the external server; performing a third determination to determine whether the received text matches the specified word; and determining whether to activate the voice-based input system based on the third determination.
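The staged determinations in this abstract can be sketched as a cascade of checks ending in a comparison against the server's transcript. The Python sketch below is an assumption-laden illustration: the function name `should_activate`, the callable parameters, and the choice to stop early when a local check fails are all hypothetical, since the abstract only states that activation is decided based on the third determination.

```python
from typing import Callable


def should_activate(
    first_data: bytes,
    specified_word: str,
    first_check: Callable[[bytes], bool],
    second_check: Callable[[bytes], bool],
    server_transcribe: Callable[[bytes], str],
) -> bool:
    """Decide whether to activate the voice-based input system."""
    # Assumption: a failed local determination ends the cascade early,
    # so the audio is not sent to the server.
    if not first_check(first_data):
        return False
    if not second_check(first_data):
        return False
    # Third determination: compare the server's text with the specified word.
    text = server_transcribe(first_data)
    return text.strip().lower() == specified_word.strip().lower()


# Stub detectors and a stub server stand in for real components.
print(should_activate(
    b"audio-bytes",
    "hello device",  # hypothetical specified word
    first_check=lambda data: True,
    second_check=lambda data: True,
    server_transcribe=lambda data: "Hello device",
))
```

Passing the detectors and the server call as parameters keeps the sketch independent of any particular keyword-spotting or speech-to-text service.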
-