Abstract:
A method includes implementing an audio framework to be executed on a data processing device with a virtual audio driver component and a User Mode Component (UMC) communicatively coupled to each other. The virtual audio driver component enables modifying an original default audio endpoint device of an application executing on the data processing device to an emulated audio device associated with a new audio endpoint in response to an initiation through the application in conjunction with the UMC. The virtual audio driver component also enables registering the new audio endpoint as the modified default audio endpoint with an operating system executing on the data processing device. Further, the virtual audio driver component enables capturing audio data intended for the original default audio endpoint device at the new audio endpoint following the registration thereof to enable control of the audio data.
Abstract:
In various examples, audio alerts of emergency response vehicles may be detected and classified using audio captured by microphones of an autonomous or semi-autonomous machine in order to identify travel directions, locations, and/or types of emergency response vehicles in the environment. For example, a plurality of microphone arrays may be disposed on an autonomous or semi-autonomous machine and used to generate audio signals corresponding to sounds in the environment. These audio signals may be processed to determine a location and/or direction of travel of an emergency response vehicle (e.g., using triangulation). Additionally, to identify siren types—and thus emergency response vehicle types corresponding thereto—the audio signals may be used to generate representations of a frequency spectrum that may be processed using a deep neural network (DNN) that outputs probabilities of alert types being represented by the audio data.
Abstract:
In various examples, audio alerts of emergency response vehicles may be detected and classified using audio captured by microphones of an autonomous or semi-autonomous machine in order to identify travel directions, locations, and/or types of emergency response vehicles in the environment. For example, a plurality of microphone arrays may be disposed on an autonomous or semi-autonomous machine and used to generate audio signals corresponding to sounds in the environment. These audio signals may be processed to determine a location and/or direction of travel of an emergency response vehicle (e.g., using triangulation). Additionally, to identify siren types—and thus emergency response vehicle types corresponding thereto—the audio signals may be used to generate representations of a frequency spectrum that may be processed using a deep neural network (DNN) that outputs probabilities of alert types being represented by the audio data. The locations, direction of travel, and/or siren type may allow an ego-vehicle or ego-machine to identify an emergency response vehicle and to make planning and/or control decisions in response.
Abstract:
In various examples, audio alerts of emergency response vehicles may be detected and classified using audio captured by microphones of an autonomous or semi-autonomous machine in order to identify travel directions, locations, and/or types of emergency response vehicles in the environment. For example, a plurality of microphone arrays may be disposed on an autonomous or semi-autonomous machine and used to generate audio signals corresponding to sounds in the environment. These audio signals may be processed to determine a location and/or direction of travel of an emergency response vehicle (e.g., using triangulation). Additionally, to identify siren types—and thus emergency response vehicle types corresponding thereto—the audio signals may be used to generate representations of a frequency spectrum that may be processed using a deep neural network (DNN) that outputs probabilities of alert types being represented by the audio data. The locations, direction of travel, and/or siren type may allow an ego-vehicle or ego-machine to identify an emergency response vehicle and to make planning and/or control decisions in response.
Abstract:
In various examples, game session audio data—e.g., representing speech of users participating in the game—may be monitored and/or analyzed to determine whether inappropriate language is being used. Where inappropriate language is identified, the portions of the audio corresponding to the inappropriate language may be edited or modified such that other users do not hear the inappropriate language. As a result, toxic behavior or language within instances of gameplay may be censored—thereby enhancing the user experience and making online gaming environments safer for more vulnerable populations. In some embodiments, the inappropriate language may be reported—e.g., automatically—to the game developer or game application host in order to suspend, ban, or otherwise manage users of the system that have a proclivity for toxic behavior.
Abstract:
A virtual reality (VR) audio rendering system and method include spatializing microphone-captured real-world sounds according to a VR setting. In a game streaming system, when a player speaks through a microphone, the voice is processed by geometrical acoustic (GA) simulation configured for a virtual scene, and thereby spatialized audio effects specific to the scene are added. The GA simulation may include generating an impulse response using sound propagation simulation and dynamic HRTF-based listener directivity. When the GA-processed voice of the player is played, the local player or other fellow players can hear it as if the sound travels in the scenery and according to the geometries in the virtual scene. This mechanism can advantageously place the players' chatting in the same virtual world like built-in game audio, thereby advantageously providing enhanced immersive VR experience to users.
Abstract:
A virtual reality (VR) audio rendering system and method using pre-computed impulse responses (IRs) to generate audio frames in a VR setting for rendering. Based on a current position of a user or a VR object, a set of possible motions are predicted and a set of IRs are pre-computed by using a Geometric Acoustic (GA) model of a virtual scene. Once a position change is actually detected, one of the pre-computed IRs is selected and convolved with a set of audio frames to generate modified audio frames for rendering. As the modified audio frames are generated by using pre-computed IR without requiring intensive ray tracing computations, the audio latency can be significantly reduced.
Abstract:
In various examples, audio alerts of emergency response vehicles may be detected and classified using audio captured by microphones of an autonomous or semi-autonomous machine in order to identify travel directions, locations, and/or types of emergency response vehicles in the environment. For example, a plurality of microphone arrays may be disposed on an autonomous or semi-autonomous machine and used to generate audio signals corresponding to sounds in the environment. These audio signals may be processed to determine a location and/or direction of travel of an emergency response vehicle (e.g., using triangulation). Additionally, to identify siren types—and thus emergency response vehicle types corresponding thereto—the audio signals may be used to generate representations of a frequency spectrum that may be processed using a deep neural network (DNN) that outputs probabilities of alert types being represented by the audio data.
Abstract:
In various examples, game session audio data—e.g., representing speech of users participating in the game—may be monitored and/or analyzed to determine whether inappropriate language is being used. Where inappropriate language is identified, the portions of the audio corresponding to the inappropriate language may be edited or modified such that other users do not hear the inappropriate language. As a result, toxic behavior or language within instances of gameplay may be censored—thereby enhancing the user experience and making online gaming environments safer for more vulnerable populations. In some embodiments, the inappropriate language may be reported—e.g., automatically—to the game developer or game application host in order to suspend, ban, or otherwise manage users of the system that have a proclivity for toxic behavior.
Abstract:
A virtual reality (VR) audio rendering system and method include spatializing microphone-captured real-world sounds according to a VR setting. In a game streaming system, when a player speaks through a microphone, the voice is processed by geometrical acoustic (GA) simulation configured for a virtual scene, and thereby spatialized audio effects specific to the scene are added. The GA simulation may include generating an impulse response using sound propagation simulation and dynamic HRTF-based listener directivity. When the GA-processed voice of the player is played, the local player or other fellow players can hear it as if the sound travels in the scenery and according to the geometries in the virtual scene. This mechanism can advantageously place the players' chatting in the same virtual world like built-in game audio, thereby advantageously providing enhanced immersive VR experience to users.