Abstract:
In one embodiment, an electronic device includes an input device configured to provide an input stream, a first processing device, and a second processing device. The first processing device is configured to use a keyword-detection model to determine if the input stream comprises a keyword, wake up the second processing device in response to determining that a segment of the input stream comprises the keyword, and modify the keyword-detection model in response to a training input received from the second processing device. The second processing device is configured to use a first neural network to determine whether the segment of the input stream comprises the keyword and provide the training input to the first processing device in response to determining that the segment of the input stream does not comprise the keyword.
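The two-stage wake-up and feedback loop described above can be sketched as follows. This is a minimal sketch, not the claimed implementation. The keyword-detection model is reduced to a hypothetical score threshold. The first neural network is replaced by a caller-supplied `classify` function. "Modifying the keyword-detection model" is modeled as an assumed rule that raises the threshold after each false wake-up. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LowPowerDetector:
    """First processing device: always-on keyword detector (hypothetical model)."""
    threshold: float = 0.5  # hypothetical detection threshold on a keyword score
    step: float = 0.05      # hypothetical margin added after a false wake-up

    def detects_keyword(self, score: float) -> bool:
        return score >= self.threshold

    def apply_training_input(self, score: float) -> None:
        # Assumed adaptation rule: raise the threshold just above the score
        # that triggered the false wake-up, capped at 1.0.
        self.threshold = min(1.0, max(self.threshold, score) + self.step)


@dataclass
class HighPowerVerifier:
    """Second processing device; `classify` stands in for the first neural network."""
    classify: Callable[[List[float]], bool]
    awake: bool = False

    def verify(self, segment: List[float]) -> bool:
        self.awake = True  # woken up by the first processing device
        return self.classify(segment)


def process_segment(detector: LowPowerDetector, verifier: HighPowerVerifier,
                    segment: List[float], score: float) -> bool:
    """Return True only when both stages agree that the segment holds the keyword."""
    if not detector.detects_keyword(score):
        return False  # the second processing device is never woken
    if verifier.verify(segment):
        return True
    # The second stage rejected the segment: feed a training input back.
    detector.apply_training_input(score)
    return False
```

Only the cheap first stage runs on every segment. The expensive second stage wakes only when the first stage fires, and each rejection makes the first stage stricter.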
Abstract:
A method, performed by a connection manager, for connecting an input device and one of a plurality of electronic devices as a target device is disclosed. The method includes detecting a face of a user in a captured image, and determining a first gaze direction of the user from the face of the user in the captured image. Based on the first gaze direction, the method determines the target device in the plurality of electronic devices and connects the input device and the target device.
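One way to turn a gaze direction into a target-device choice is to pick the device whose bearing from the user's head makes the smallest angle with the gaze vector. The following is a minimal sketch of that idea under stated assumptions. The gaze direction is taken as an already-estimated 3D vector, and the face-analysis step is not shown. Head and device positions are assumed to share one coordinate frame. The 15-degree tolerance is hypothetical. The actual connection step is left out.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _unit(v: Sequence[float]) -> Tuple[float, ...]:
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0:
        raise ValueError("zero-length vector")
    return tuple(c / norm for c in v)


def select_target_device(head_position: Vec3, gaze_direction: Vec3,
                         device_positions: Dict[str, Vec3],
                         max_angle_deg: float = 15.0) -> Optional[str]:
    """Return the device closest to the gaze ray, or None if none is within max_angle_deg."""
    gaze = _unit(gaze_direction)
    best_name: Optional[str] = None
    best_angle = max_angle_deg
    for name, position in device_positions.items():
        to_device = _unit([p - h for p, h in zip(position, head_position)])
        cos_angle = sum(g * d for g, d in zip(gaze, to_device))
        # Clamp before acos to absorb floating-point rounding.
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
        if angle <= max_angle_deg and (best_name is None or angle < best_angle):
            best_name, best_angle = name, angle
    return best_name
```

Returning `None` when no device is close to the gaze ray lets the connection manager leave the input device unconnected rather than guess.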
Abstract:
Certain aspects of the present disclosure provide techniques for improved domain adaptation in machine learning. A feature tensor is generated by processing input data using a feature extractor. A first set of logits is generated by processing the feature tensor using a domain-agnostic classifier, and a second set of logits is generated by processing the feature tensor using a domain-specific classifier. A loss is computed based at least in part on the first set of logits and the second set of logits, where the loss includes a divergence loss component. The feature extractor, the domain-agnostic classifier, and the domain-specific classifier are refined using the loss.
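A loss of the kind described can be sketched for a single labeled example. The abstract does not specify how the terms are combined, so the following form is assumed. The loss is a cross-entropy term for each classifier plus a weighted KL divergence between their softmax outputs, with the domain-specific output measured against the domain-agnostic one. The `divergence_weight` parameter is hypothetical. Refining the three components would require differentiating this loss in an autodiff framework, which is not shown.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def domain_adaptation_loss(agnostic_logits: Sequence[float],
                           specific_logits: Sequence[float],
                           label: int, divergence_weight: float = 1.0) -> float:
    """Assumed combination: both classification losses plus a divergence penalty."""
    classification = (cross_entropy(agnostic_logits, label)
                      + cross_entropy(specific_logits, label))
    divergence = kl_divergence(softmax(specific_logits), softmax(agnostic_logits))
    return classification + divergence_weight * divergence
```

When the two classifiers agree, the divergence term vanishes and only the classification terms remain. Disagreement between the heads is penalized in proportion to `divergence_weight`.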
Abstract:
According to an aspect of the present disclosure, a method for controlling access to a plurality of electronic devices is disclosed. The method includes detecting whether a first device is in contact with a user, adjusting a security level of the first device to activate the first device when the first device is in contact with the user, detecting at least one second device within a communication range of the first device, and adjusting a security level of the at least one second device to control access to the at least one second device based on a distance between the first device and the at least one second device.
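The contact-then-distance policy can be sketched as a small state update. This is a minimal sketch under assumptions that are not in the abstract. Security levels are three hypothetical strings. The distance bands (1 m unlocked, 5 m restricted, otherwise locked) and the 10 m communication range are illustrative. Devices outside communication range are left unchanged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Device:
    name: str
    security_level: str = "locked"


# Hypothetical policy: (maximum distance in meters, security level), nearest band first.
DISTANCE_POLICY: List[Tuple[float, str]] = [(1.0, "unlocked"), (5.0, "restricted")]


def level_for_distance(distance_m: float) -> str:
    for max_distance, level in DISTANCE_POLICY:
        if distance_m <= max_distance:
            return level
    return "locked"  # beyond every band: keep the device locked


def update_access(first: Device, in_contact: bool,
                  second_devices: List[Tuple[Device, float]],
                  comm_range_m: float = 10.0) -> None:
    """second_devices pairs each candidate device with its distance from `first` in meters."""
    if not in_contact:
        first.security_level = "locked"
        return
    first.security_level = "unlocked"  # activate the device the user is touching or wearing
    for device, distance in second_devices:
        if distance <= comm_range_m:  # only devices within communication range are adjusted
            device.security_level = level_for_distance(distance)
```

For example, a watch in contact with the user unlocks a phone 0.5 m away and restricts a tablet 3 m away. It leaves a laptop 20 m away untouched.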
Abstract:
A method for generating a notification by an electronic device to alert a user of the electronic device is disclosed. In this method, a speech phrase may be received. Then, the received speech phrase may be recognized, by a processor, as a command to generate the notification. In addition, context data of the electronic device may be detected by at least one sensor. It may then be determined, based at least on the context data, whether the notification is to be generated. The notification may be generated, by the processor, based on the context data and the command to generate the notification.
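The decision step can be sketched as a function of the recognized command and the sensed context. Everything concrete here is an assumption for illustration. Speech recognition is replaced by exact matching on a normalized transcript, and the command phrases are hypothetical. The context fields (`is_held`, `is_face_down`, `ambient_noise_db`) are hypothetical, as are the rules that map them to a notification type, including the 70 dB threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical phrases that are recognized as a command to generate a notification.
NOTIFY_COMMANDS = {"where are you", "find my phone"}


@dataclass
class Context:
    """Hypothetical context data read from the device's sensors."""
    is_held: bool            # e.g. from touch or motion sensors
    is_face_down: bool       # e.g. from an accelerometer
    ambient_noise_db: float  # e.g. from a microphone


def is_notify_command(phrase: str) -> bool:
    # Stand-in for speech recognition: exact match on a normalized transcript.
    return phrase.strip().lower().rstrip("?") in NOTIFY_COMMANDS


def decide_notification(phrase: str, ctx: Context) -> Optional[str]:
    """Return the kind of notification to generate, or None to suppress it."""
    if not is_notify_command(phrase):
        return None
    if ctx.is_held:
        return None  # assumed rule: the user already has the device in hand
    if ctx.is_face_down:
        return "sound+vibration"  # the screen is not visible, so skip the display
    if ctx.ambient_noise_db > 70.0:  # hypothetical "noisy" threshold
        return "loud_sound+vibration+screen"
    return "sound+screen"
```

Separating the command check from the context check mirrors the method: a recognized command is necessary but not sufficient, and the context data decides both whether to notify and how.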
Abstract:
A method for controlling access to a plurality of applications in an electronic device includes receiving a voice command from a speaker for accessing a target application among the plurality of applications, and verifying whether the voice command is indicative of a user authorized to access the applications based on a speaker model of the authorized user. In this method, each application is associated with a security level having a threshold value. The method further includes updating the speaker model with the voice command if the voice command is verified to be indicative of the user, and adjusting at least one of the threshold values based on the updated speaker model.
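The verify, update, and adjust cycle can be sketched with a speaker model held as a single embedding vector. This is an assumption, as is every rule below. Verification compares the cosine similarity between the model and the command's embedding against the target application's threshold. A verified command pulls the model toward itself by an exponential moving average. Thresholds then tighten by a small hypothetical `step`, never past a hypothetical `cap`, on the reasoning that a refined model should yield higher genuine scores. Embedding extraction from audio is not shown.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class VoiceAccessController:
    def __init__(self, enrollment: Sequence[float], thresholds: Dict[str, float],
                 learning_rate: float = 0.1, step: float = 0.01, cap: float = 0.95):
        self.speaker_model: List[float] = list(enrollment)  # assumed: a single embedding
        self.thresholds = dict(thresholds)  # application -> verification threshold
        self.learning_rate = learning_rate
        self.step = step
        self.cap = cap

    def request_access(self, app: str, command_embedding: Sequence[float]) -> bool:
        score = cosine_similarity(self.speaker_model, command_embedding)
        if score < self.thresholds[app]:
            return False  # not verified at this application's security level
        self._update_model(command_embedding)
        self._adjust_thresholds()
        return True

    def _update_model(self, embedding: Sequence[float]) -> None:
        # Assumed update: exponential moving average toward the verified command.
        lr = self.learning_rate
        self.speaker_model = [(1 - lr) * m + lr * e
                              for m, e in zip(self.speaker_model, embedding)]

    def _adjust_thresholds(self) -> None:
        # Assumed rule: tighten each threshold by `step`, never past `cap`,
        # and never loosen a threshold that is already above `cap`.
        self.thresholds = {app: min(max(t, self.cap), t + self.step)
                           for app, t in self.thresholds.items()}
```

A command that clears a low-security application's threshold can still fail a high-security one. Only verified commands change the model.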
Abstract:
According to an aspect of the present disclosure, a method for generating a keyword model of a user-defined keyword in an electronic device is disclosed. The method includes receiving at least one input indicative of the user-defined keyword, determining a sequence of subwords from the at least one input, generating the keyword model associated with the user-defined keyword based on the sequence of subwords and a subword model of the subwords, wherein the subword model is configured to model a plurality of acoustic features of the subwords based on a speech database, and providing the keyword model associated with the user-defined keyword to a voice activation unit configured with a keyword model associated with a predetermined keyword.
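Building a keyword model from subwords can be sketched as concatenating per-subword models in sequence order. This is a minimal sketch under stated assumptions. The input is text, converted to subwords through a hypothetical pronunciation lexicon; a spoken input would instead need a subword recognizer, which is not shown. Each subword model is represented only by three hypothetical acoustic-state labels, standing in for a model trained on a speech database. The voice activation unit is reduced to a dictionary of keyword models.

```python
from typing import Dict, List

# Hypothetical phone inventory; each phone (subword) is modeled by three acoustic
# states, standing in for a subword model trained on a speech database.
PHONES = ["HH", "EY", "R", "OW", "B", "AA", "T", "K"]
SUBWORD_MODEL: Dict[str, List[str]] = {p: [f"{p}_s{i}" for i in range(3)] for p in PHONES}

# Hypothetical pronunciation lexicon used to turn a text input into subwords.
LEXICON: Dict[str, List[str]] = {"hey": ["HH", "EY"], "robot": ["R", "OW", "B", "AA", "T"]}


class VoiceActivationUnit:
    def __init__(self, predetermined: Dict[str, List[str]]):
        # Starts out configured with the predetermined keyword model(s).
        self.keyword_models: Dict[str, List[str]] = dict(predetermined)

    def add_keyword_model(self, keyword: str, model: List[str]) -> None:
        self.keyword_models[keyword] = model


def subwords_from_text(text: str) -> List[str]:
    sequence: List[str] = []
    for word in text.lower().split():
        sequence.extend(LEXICON[word])  # raises KeyError for out-of-lexicon words
    return sequence


def build_keyword_model(subwords: List[str]) -> List[str]:
    # Concatenate the state sequences of the subwords, in order.
    return [state for sw in subwords for state in SUBWORD_MODEL[sw]]
```

Because the keyword model is assembled from shared subword models, a new keyword needs no keyword-specific training recordings.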
Abstract:
A device to perform end-of-utterance detection includes a speaker vector extractor configured to receive a frame of an audio signal and to generate a speaker vector that corresponds to the frame. The device also includes an end-of-utterance detector configured to process the speaker vector and to generate an indicator that indicates whether the frame corresponds to an end of an utterance of a particular speaker.
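The abstract leaves both components unspecified, so the following is a hypothetical rule-based stand-in for them. The speaker vector extractor is a fixed linear projection followed by L2 normalization. The detector compares each frame's speaker vector with a reference vector for the particular speaker. It raises the end-of-utterance indicator on the frame that completes a run of `hangover_frames` consecutive non-matching frames. The threshold and hangover values are illustrative.

```python
import math
from typing import List, Sequence


def extract_speaker_vector(frame: Sequence[float],
                           projection: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical extractor: linear projection followed by L2 normalization."""
    v = [sum(w * x for w, x in zip(row, frame)) for row in projection]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else v


class EndOfUtteranceDetector:
    def __init__(self, reference: Sequence[float], threshold: float = 0.7,
                 hangover_frames: int = 3):
        norm = math.sqrt(sum(c * c for c in reference))
        self.reference = [c / norm for c in reference]  # the particular speaker
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self._misses = 0  # consecutive frames not attributed to the speaker

    def process(self, speaker_vector: Sequence[float]) -> bool:
        """Return True only for the frame at which the utterance is declared ended."""
        similarity = sum(a * b for a, b in zip(self.reference, speaker_vector))
        if similarity >= self.threshold:
            self._misses = 0
        else:
            self._misses += 1
        return self._misses == self.hangover_frames
```

Keying on the speaker rather than on silence alone means another talker starting up can also end the particular speaker's utterance.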
Abstract:
A device includes a screen and one or more processors configured to provide, at the screen, a graphical user interface (GUI) configured to display data associated with multiple devices on the screen. The GUI is also configured to illustrate a label and at least one control input for each device of the multiple devices. The GUI is also configured to provide feedback to a user. The feedback indicates that a verbal command is not recognized as being associated with an action to be performed. The GUI is also configured to provide instructions for the user on how to teach the one or more processors which action is to be performed in response to receiving the verbal command.
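The logic behind such a GUI can be sketched as a mapping from verbal commands to device actions, with a pending slot for the last unrecognized command. This is a hypothetical back end only; rendering is not shown. The command normalization, the `device:control` action format, and the instruction text are all assumptions made for illustration.

```python
from typing import Dict, Optional


class CommandTeacher:
    """Hypothetical GUI back end that maps verbal commands to device actions."""

    def __init__(self) -> None:
        self.actions: Dict[str, str] = {}   # normalized command -> "device:control"
        self.pending: Optional[str] = None  # last unrecognized command

    def handle(self, command: str) -> str:
        key = command.strip().lower()
        if key in self.actions:
            return f"Performing {self.actions[key]}"
        # Feedback plus instructions on how to teach the missing mapping.
        self.pending = key
        return (f'"{command}" is not recognized. To teach it, select a device '
                'and tap the control that should run for this command.')

    def teach(self, device: str, control: str) -> None:
        if self.pending is None:
            raise RuntimeError("no unrecognized command to teach")
        self.actions[self.pending] = f"{device}:{control}"
        self.pending = None
```

After the user taps a control in response to the instructions, the same phrase triggers that control on later requests.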