Systems and methods for matching audio and image information

    Publication No.: US11580727B2

    Publication date: 2023-02-14

    Application No.: US17141985

    Filing date: 2021-01-05

    IPC classification: G06V10/10 G06V20/10 G06F1/16

    Abstract: Systems and methods for processing audio signals are disclosed. In one implementation, a system may comprise a wearable camera configured to capture images from an environment of a user; a microphone configured to capture sounds from the environment of the user; and a processor. The processor may be configured to receive at least one image of the plurality of images, the at least one image comprising a plurality of image portions associated with corresponding image portion timestamps; receive at least one audio signal representative of the sounds captured by the at least one microphone; identify an audio timestamp associated with a portion of the audio signal; identify an image portion from among the plurality of image portions, the image portion having an image portion timestamp associated with the audio timestamp; and analyze the image portion to identify a voice originating from an object represented in the image.
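The timestamp-matching step in this abstract can be sketched as a nearest-timestamp lookup. The abstract does not say what an "image portion" is. This sketch assumes a rolling-shutter sensor where each band of rows has its own readout time. The names `ImagePortion`, `match_image_portion` and the 5 ms `tolerance` are hypothetical. The voice-identification analysis of the matched portion is not shown.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ImagePortion:
    # Hypothetical layout: a band of sensor rows read out at one instant,
    # as in a rolling-shutter camera.
    row_start: int    # first row of the band
    row_end: int      # one past the last row of the band
    timestamp: float  # readout time of the band, in seconds


def match_image_portion(portions: Sequence[ImagePortion], audio_ts: float,
                        tolerance: float = 0.005) -> Optional[ImagePortion]:
    """Return the portion whose timestamp is closest to audio_ts, or None
    when no portion lies within `tolerance` seconds of it.

    The returned portion is what a later (not shown) analysis step would
    inspect, e.g. for lip motion, to attribute the voice to an object."""
    if not portions:
        return None
    best = min(portions, key=lambda p: abs(p.timestamp - audio_ts))
    if abs(best.timestamp - audio_ts) > tolerance:
        return None
    return best


# A frame split into four 120-row bands read out 8 ms apart.
frame = [ImagePortion(i * 120, (i + 1) * 120, 10.000 + i * 0.008) for i in range(4)]
print(match_image_portion(frame, 10.017))  # the band starting at row 240
print(match_image_portion(frame, 11.0))    # None: no band near that time
```

The tolerance keeps the matcher from pairing audio with a frame captured long before or after it. A real system would presumably tune it to the sensor's frame and row timing.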

    WEARABLE SYSTEMS AND METHODS FOR SELECTIVELY READING TEXT

    Publication No.: US20230012272A1

    Publication date: 2023-01-12

    Application No.: US17756991

    Filing date: 2020-12-10

    Abstract: Systems and methods are disclosed for selectively reading text. A system may comprise an image capture device, an audio capture device, and a processor. The processor may be configured to receive images captured by the image capture device and audio signals captured by the audio capture device. The processor may analyze the image to identify text represented in the image; identify, based on the image, a structural element of the text; identify a request to read a first portion of the text associated with the structural element, the request being identified by at least one of analyzing the audio signals to detect a spoken request or detecting a gesture in the plurality of images; and present the first portion of text to the user of the wearable device.
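Selecting the requested portion of text can be sketched as a lookup over structurally labeled text blocks. This sketch assumes that text recognition has already produced blocks tagged with a structural label. It also assumes the spoken request has already been transcribed. The labels, the `ORDINALS` vocabulary and `select_portion` are hypothetical. The gesture-based request path is not covered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBlock:
    kind: str  # hypothetical structural label, e.g. "title" or "paragraph"
    text: str


# Hypothetical ordinal vocabulary for spoken requests; -1 means "last".
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "last": -1}


def select_portion(blocks: List[TextBlock], request: str) -> Optional[str]:
    """Return the text of the structural element named in a transcribed
    request such as "read the second paragraph", or None if nothing matches."""
    words = request.lower().replace(",", " ").split()
    kinds = {b.kind for b in blocks}
    kind = next((w for w in words if w in kinds), None)
    if kind is None:
        return None
    matches = [b.text for b in blocks if b.kind == kind]
    # Default to the first element of that kind when no ordinal is spoken.
    index = next((ORDINALS[w] for w in words if w in ORDINALS), 1)
    if index == -1:
        return matches[-1]
    return matches[index - 1] if index <= len(matches) else None


page = [TextBlock("title", "Daily News"),
        TextBlock("paragraph", "First para."),
        TextBlock("paragraph", "Second para.")]
print(select_portion(page, "Read the second paragraph"))  # Second para.
```

The selected string would then go to a text-to-speech or display stage for presentation to the user.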

    RESPONDING TO A USER QUERY BASED ON CAPTURED IMAGES AND AUDIO

    Publication No.: US20230005471A1

    Publication date: 2023-01-05

    Application No.: US17850578

    Filing date: 2022-06-27

    Abstract: A method for responding to a user query based on captured images and audio. An audio signal captured by at least one microphone is analyzed to determine at least one word. At least one image captured by at least one image sensor is analyzed to determine at least one identifier of at least one of a person, an object, a location, or an event represented in the image. The at least one word and the at least one identifier are stored in a database. A question is received from the user and is analyzed to determine at least one term. The database is searched to determine a correlation between the at least one term and the at least one word or between the at least one term and the at least one identifier. A response to the question is generated based on the correlation and is provided to the user.
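The store-then-correlate flow can be sketched with Python's built-in `sqlite3` module. The abstract does not define "correlation". Here it is assumed to mean "recorded at the same capture timestamp as a word or identifier that matches a question term." The table layout and the helper names (`open_db`, `store`, `respond`) are hypothetical. Speech and image recognition are assumed to happen upstream.

```python
import sqlite3
from typing import Iterable, Set, Tuple


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # One row per recognized word / per image identifier, keyed by capture time.
    conn.execute("CREATE TABLE words (ts REAL, word TEXT)")
    conn.execute("CREATE TABLE identifiers (ts REAL, kind TEXT, label TEXT)")
    return conn


def store(conn: sqlite3.Connection, ts: float, words: Iterable[str],
          identifiers: Iterable[Tuple[str, str]]) -> None:
    conn.executemany("INSERT INTO words VALUES (?, ?)",
                     [(ts, w.lower()) for w in words])
    conn.executemany("INSERT INTO identifiers VALUES (?, ?, ?)",
                     [(ts, kind, label.lower()) for kind, label in identifiers])


def terms_of(question: str) -> Set[str]:
    return {t.strip("?.,!").lower() for t in question.split()} - {""}


def respond(conn: sqlite3.Connection, question: str) -> str:
    terms = sorted(terms_of(question))
    if not terms:
        return "No matching record found."
    marks = ",".join("?" * len(terms))
    # Capture times at which a question term was heard or seen.
    rows = conn.execute(
        f"SELECT ts FROM words WHERE word IN ({marks}) "
        f"UNION SELECT ts FROM identifiers WHERE label IN ({marks})",
        terms + terms).fetchall()
    times = [r[0] for r in rows]
    if not times:
        return "No matching record found."
    tmarks = ",".join("?" * len(times))
    # Everything else seen at those times is treated as correlated context.
    related = conn.execute(
        f"SELECT DISTINCT kind, label FROM identifiers WHERE ts IN ({tmarks}) "
        "ORDER BY kind, label", times).fetchall()
    facts = [f"{kind}: {label}" for kind, label in related if label not in terms]
    return "; ".join(facts) or "No related context found."


conn = open_db()
store(conn, 1.0, ["coffee", "meeting"], [("person", "Alice"), ("location", "Cafe")])
store(conn, 2.0, ["hello"], [("person", "Bob"), ("location", "Office")])
print(respond(conn, "Where did I meet Alice?"))  # location: cafe
```

Matching on exact lowercase tokens keeps the sketch short. A practical system would likely need stemming or semantic matching, for instance so that "meet" would match "meeting".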

    Processing audio and video

    Publication No.: US11546690B2

    Publication date: 2023-01-03

    Application No.: US17241283

    Filing date: 2021-04-27

    Abstract: A wearable device may include an image sensor configured to capture a plurality of images from an environment, a microphone configured to capture sounds from the environment, and at least one processor. The at least one processor may be programmed to receive audio signals representative of the sounds captured by the at least one microphone, and receive a first image including a representation of a first individual from among the plurality of images captured by the image sensor. The at least one processor may also be programmed to obtain a first audio segment from the audio signals using the first image. The first audio segment may include a first portion of the audio signals in which the first individual is speaking. The at least one processor may also be programmed to receive a second image including a representation of a second individual from among the plurality of images captured by the image sensor, and obtain a second audio segment from the audio signals using the second image. The second audio segment may include a second portion of the audio signals in which the second individual is speaking. The at least one processor may also be programmed to receive a third image including a representation of the first individual from among the plurality of images captured by the image sensor, and using the third image, obtain a third audio segment from the audio signals. The third audio segment may include a third portion of the audio signals in which the first individual is speaking. The at least one processor may also associate the first and third audio segments with the first individual and associate the second audio segment with the second individual.
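The final association step can be sketched as grouping voiced segments by the individual recognized in the corresponding image. This sketch makes two assumptions. First, face recognition has already turned each image into a `(person_id, image_time)` sighting. Second, "using the image" to obtain a segment means picking the voiced segment that contains the image's capture time. `AudioSegment`, `segment_at` and `associate` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AudioSegment:
    start: float  # seconds
    end: float    # seconds, exclusive


def segment_at(voiced: Sequence[AudioSegment], t: float) -> Optional[AudioSegment]:
    """Return the voiced segment that contains time t, if any."""
    return next((s for s in voiced if s.start <= t < s.end), None)


def associate(voiced: Sequence[AudioSegment],
              sightings: Iterable[Tuple[str, float]]) -> Dict[str, List[AudioSegment]]:
    """Map each recognized individual to the segments in which they speak."""
    by_person: Dict[str, List[AudioSegment]] = defaultdict(list)
    for person_id, t in sightings:
        seg = segment_at(voiced, t)
        if seg is not None and seg not in by_person[person_id]:
            by_person[person_id].append(seg)
    return dict(by_person)


voiced = [AudioSegment(0, 3), AudioSegment(3, 5), AudioSegment(6, 9)]
# First image shows A, second shows B, third shows A again.
result = associate(voiced, [("A", 1.0), ("B", 4.0), ("A", 7.5)])
print(result["A"])  # first and third segments
print(result["B"])  # second segment
```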

    SYSTEMS AND METHODS FOR TRANSMITTING AUDIO SIGNALS WITH VARYING DELAYS

    Publication No.: US20220248149A1

    Publication date: 2022-08-04

    Application No.: US17585853

    Filing date: 2022-01-27

    IPC classification: H04R25/00 H04R3/00

    Abstract: A hearing aid and related systems and methods are disclosed. In one implementation, a system may comprise a microphone and a processor. The processor may be configured to receive an original audio signal representative of sounds captured by the microphone; determine that the original audio signal includes a voice of the user; process the original audio signal according to a first processing scheme to generate a first processed audio signal; transmit the first processed audio signal to a hearing interface device after a first time delay associated with the first processing scheme; determine that the original audio signal includes an additional sound; process the original audio signal according to a second processing scheme to generate a second processed audio signal; and transmit the second processed audio signal to the hearing interface device after a second time delay associated with the second processing scheme.
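The two-scheme, two-delay idea can be sketched as a scheduler. The scheduler tags each processed copy of the original signal with its transmit time. This sketch rests on several assumptions. Voice detection happens upstream and arrives as booleans. Each scheme is modeled as a simple gain with a fixed latency. The names (`Scheme`, `gain`, `schedule`, `OWN_VOICE`, `AMBIENT`) and the delay and gain values are hypothetical placeholders, not values from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Signal = List[float]


@dataclass
class Scheme:
    name: str
    delay_ms: float                      # assumed fixed latency of this scheme
    process: Callable[[Signal], Signal]


def gain(factor: float) -> Callable[[Signal], Signal]:
    return lambda s: [factor * x for x in s]


# Placeholder schemes: the user's own voice is attenuated and sent quickly;
# other sounds are amplified (standing in for heavier processing) and sent later.
OWN_VOICE = Scheme("own_voice", 2.0, gain(0.5))
AMBIENT = Scheme("ambient", 8.0, gain(2.0))


def schedule(original: Signal, capture_ms: float, has_user_voice: bool,
             has_other_sound: bool) -> List[Tuple[float, str, Signal]]:
    """Return (send_time_ms, scheme_name, processed_signal) tuples, ordered by
    when each processed signal is sent to the hearing interface device."""
    out = []
    if has_user_voice:
        out.append((capture_ms + OWN_VOICE.delay_ms, OWN_VOICE.name,
                    OWN_VOICE.process(original)))
    if has_other_sound:
        out.append((capture_ms + AMBIENT.delay_ms, AMBIENT.name,
                    AMBIENT.process(original)))
    return sorted(out, key=lambda item: item[0])


for send_ms, name, _ in schedule([0.1, -0.2], 100.0, True, True):
    print(send_ms, name)  # 102.0 own_voice, then 108.0 ambient
```

A plausible motivation for a shorter own-voice delay is that users are sensitive to latency on their own voice. The abstract itself does not state the rationale.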

    SYSTEMS AND METHODS FOR PROCESSING AUDIO AND VIDEO USING A VOICE PRINT

    Publication No.: US20210350823A1

    Publication date: 2021-11-11

    Application No.: US17315780

    Filing date: 2021-05-10

    Abstract: A wearable device for processing audio signals may include a microphone configured to capture sounds from an environment of a user and at least one processor. The processor may be programmed to receive first audio signals captured by the microphone during a first time period during which the user is in a location, and obtain an audio segment from the first audio signals. The audio segment may include a portion of the first audio signals in which an individual is speaking. The processor may also be programmed to generate a voice print of the individual using at least the audio segment, and receive second audio signals representative of additional sounds captured by the microphone. The additional sounds may include sounds made by the individual. The second audio signals may be at least one of audio signals captured by the microphone within a predetermined time period after the first time period, or audio signals captured by the microphone while the user is in the location. The at least one processor may also be programmed to process the second audio signals using the generated voice print.
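The voice-print flow can be sketched in three parts:

- The voice print is built from the first segment.
- It is applied only when the second signals meet the abstract's time-window or same-location condition.
- Frames are matched to it by cosine similarity.

The sketch assumes per-frame speaker feature vectors come from some upstream model that is not shown. Averaging them is a deliberate simplification of real voice-print generation. The 0.8 threshold and the function names (`voice_print`, `cosine`, `process_second`) are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def voice_print(frames: Sequence[Vector]) -> List[float]:
    """Average the per-frame feature vectors from the first audio segment."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def process_second(print_: Vector, frames: Sequence[Vector],
                   t_second: float, t_first_end: float, window_s: float,
                   loc_second: str, loc_first: str,
                   threshold: float = 0.8) -> List[int]:
    """Return indices of frames in the second audio signals attributed to the
    individual, or [] when neither the time-window nor the same-location
    condition holds."""
    within_window = 0.0 <= t_second - t_first_end <= window_s
    if not (within_window or loc_second == loc_first):
        return []
    return [i for i, f in enumerate(frames) if cosine(print_, f) >= threshold]


vp = voice_print([[1.0, 0.0], [0.8, 0.2]])
frames = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
print(process_second(vp, frames, 100.0, 50.0, 600.0, "cafe", "office"))  # [0, 2]
```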

    SYSTEMS AND METHODS FOR MATCHING AUDIO AND IMAGE INFORMATION

    Publication No.: US20210209362A1

    Publication date: 2021-07-08

    Application No.: US17141985

    Filing date: 2021-01-05

    IPC classification: G06K9/00 G06F1/16 G06K9/20

    Abstract: Systems and methods for processing audio signals are disclosed. In one implementation, a system may comprise a wearable camera configured to capture images from an environment of a user; a microphone configured to capture sounds from the environment of the user; and a processor. The processor may be configured to receive at least one image of the plurality of images, the at least one image comprising a plurality of image portions associated with corresponding image portion timestamps; receive at least one audio signal representative of the sounds captured by the at least one microphone; identify an audio timestamp associated with a portion of the audio signal; identify an image portion from among the plurality of image portions, the image portion having an image portion timestamp associated with the audio timestamp; and analyze the image portion to identify a voice originating from an object represented in the image.