Systems and methods for active speaker detection

    Publication No.: US11983923B1

    Publication date: 2024-05-14

    Application No.: US18063107

    Filing date: 2022-12-08

    Applicant: NETFLIX, INC.

    Abstract: The disclosed computer-implemented method may include receiving, as input, an audio/video data object; isolating a video stream of a visible potential speaker over a plurality of frames of the audio/video data object; isolating an audio stream over the plurality of frames; providing the isolated video stream and the isolated audio stream to a machine learning model trained with contrastive learning, the contrastive learning using (i) a corpus of video segments of visible speakers with corresponding original audio for positive samples; and (ii) a corpus of video segments of visible speakers with corresponding dubbed audio for negative samples; and evaluating a match between the isolated audio stream and the isolated video stream based at least in part on an output of the machine learning model. Various other methods, systems, and computer-readable media are also disclosed.
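
    The pairing scheme in this abstract (original audio as positives, dubbed audio as negatives) can be illustrated with a minimal Python sketch. The abstract does not specify the loss, the embedding model, or any margin, so the margin-based loss, the cosine similarity measure, and all function and parameter names below are assumptions for illustration only, not the patented method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(video_emb, audio_emb, is_original_audio, margin=0.5):
    """Margin-based contrastive loss for one (video, audio) pair.

    Assumption: embeddings come from some hypothetical video and audio
    encoders that are not shown here.
    - Positive pair (original audio): penalize low similarity.
    - Negative pair (dubbed audio): penalize similarity above the margin.
    """
    sim = cosine_similarity(video_emb, audio_emb)
    if is_original_audio:
        return 1.0 - sim
    return max(0.0, sim - margin)


def match_score(video_emb, audio_emb):
    """Evaluate how well an isolated audio stream matches a video stream."""
    return cosine_similarity(video_emb, audio_emb)
```

    Under these assumptions, a trained model would assign a high match score when the visible speaker actually produced the audio and a low one otherwise; a threshold on the score could then decide whether the speaker is active.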

    INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

    Publication No.: US20240112450A1

    Publication date: 2024-04-04

    Application No.: US18537951

    Filing date: 2023-12-13

    CPC classification number: G06V10/774 G06V10/764 G06V2201/03 G06V2201/10

    Abstract: An information processing device is capable of collaborating with a learning device to determine whether an image group obtained in time series by a first endoscope was obtained at a first time or at a second time, and to create a first inference model for image feature determination of images for the first endoscope by learning with, as training data, the results of annotating the image group obtained at the second time. The information processing device comprises one or more classifying processors that classify image groups constituting training data candidates, within a newly acquired image group from the first endoscope or an image group from a second endoscope, using the image group that had been obtained at the first time when the first inference model was created.

    PREDICTING PERFORMANCE OF CREATIVE CONTENT
    Type: Invention publication (published application)

    Publication No.: US20240112217A1

    Publication date: 2024-04-04

    Application No.: US17937342

    Filing date: 2022-09-30

    CPC classification number: G06Q30/0242 G06F3/0482 G06V20/30 G06V2201/10

    Abstract: Methods and systems for predicting performance of creative content are disclosed. Exemplary implementations may: receive a collection of images; provide a context to a user; serially cause display of pairs of images on a computer interface; receive user responses indicating which image of each pair is preferred given the context; determine a resonance value for each image based on a number of times the user responses indicate each image is preferred when displayed in a pair of images; determine a confidence score for each image; generate one or more models for predicting image performance based on one or more of the resonance value and the confidence score for each image; receive a plurality of candidate images; determine, using at least one model, a first metric set for each candidate; and cause display of a listing of the candidate images, the listing including the first metric set for each candidate image.
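
    The pairwise preference counting described in this abstract can be sketched in Python as follows. The abstract says only that the resonance value is based on the number of times an image is preferred; the win-rate normalization, the response tuple layout, and the function name are assumptions. The confidence score is not defined in the abstract, so it is left out rather than guessed.

```python
from collections import Counter


def resonance_values(responses):
    """Tally pairwise preferences per image.

    `responses` is a list of (image_a, image_b, preferred) tuples, where
    `preferred` is whichever image of the pair the user chose.
    Returns, per image, the raw win count and the win rate (wins divided
    by appearances). The win rate is an illustrative normalization only.
    """
    wins = Counter()
    shown = Counter()
    for image_a, image_b, preferred in responses:
        if preferred not in (image_a, image_b):
            raise ValueError("preferred image must be one of the displayed pair")
        shown[image_a] += 1
        shown[image_b] += 1
        wins[preferred] += 1
    return {
        image: {"wins": wins[image], "win_rate": wins[image] / shown[image]}
        for image in shown
    }


# Example: three pairwise comparisons over images "a", "b", and "c".
example = resonance_values([("a", "b", "a"), ("a", "c", "a"), ("b", "c", "c")])
```

    In this example, image "a" wins both of its comparisons, "c" wins one of two, and "b" wins none; values like these could serve as training targets for the predictive models the abstract mentions.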

    INTEGRATING MODEL REUSE WITH MODEL RETRAINING FOR VIDEO ANALYTICS

    Publication No.: US20240096063A1

    Publication date: 2024-03-21

    Application No.: US18078402

    Filing date: 2022-12-09

    CPC classification number: G06V10/7715 G06V2201/10

    Abstract: Systems and methods are provided for reusing and retraining an image recognition model for video analytics. The image recognition model is used to perform inference on a frame of video data that is captured at edge devices. The edge devices transmit a captured frame of video data, periodically or under predetermined conditions, for inferencing. The disclosed technology is directed to selecting an image recognition model from a model store for reuse or for retraining. A model selector uses a gating network model to determine ranked candidate models for validation. The validation includes iterations of retraining the image recognition model, stopping when the rate of accuracy improvement from retraining becomes smaller than in the previous iteration step. Retraining a model includes generating reference data using a teacher model and retraining the model using the reference data. Integrating reuse and retraining of models improves accuracy and efficiency.
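
    The stopping rule for the retraining iterations can be sketched in Python as below. This is one reading of the abstract, assuming the "rate of improving accuracy" is the accuracy gain per iteration; the `retrain_step` callable, the `max_iterations` cap, and the function name are hypothetical and stand in for the actual retraining with teacher-generated reference data.

```python
def retrain_until_gains_shrink(initial_accuracy, retrain_step, max_iterations=20):
    """Repeat retraining until an iteration's accuracy gain drops below the previous gain.

    `retrain_step` is a hypothetical callable that takes the current
    accuracy, performs one retraining iteration, and returns the new
    accuracy. Returns the list of accuracies observed, starting with
    the initial one.
    """
    accuracies = [initial_accuracy]
    previous_gain = None
    for _ in range(max_iterations):
        new_accuracy = retrain_step(accuracies[-1])
        gain = new_accuracy - accuracies[-1]
        accuracies.append(new_accuracy)
        # Stop once this iteration improved accuracy less than the last one did.
        if previous_gain is not None and gain < previous_gain:
            break
        previous_gain = gain
    return accuracies


# Example with a canned accuracy schedule standing in for real retraining.
schedule = iter([0.6, 0.75, 0.8, 0.81])
history = retrain_until_gains_shrink(0.5, lambda accuracy: next(schedule))
```

    Here the gains are roughly 0.10, 0.15, and 0.05, so the loop stops after the third iteration, when the gain first falls below the previous one, and the fourth scheduled value is never consumed.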
