-
Publication number: US11823453B2
Publication date: 2023-11-21
Application number: US17590275
Filing date: 2022-02-01
Applicant: Microsoft Technology Licensing, LLC
Inventor: Oron Nir , Maria Zontak , Tucker Cunningham Burns , Apar Singhal , Lei Zhang , Irit Ofer , Avner Levi , Haim Sabo , Ika Bar-Menachem , Eylon Ami , Ella Ben Tov
IPC: G06V20/40 , G06F18/214 , G06F18/24 , G06V20/70
CPC classification number: G06V20/41 , G06F18/214 , G06F18/24765 , G06V20/46 , G06V20/70 , G06V20/47
Abstract: The technology described herein is directed to a media indexer framework including a character recognition engine that automatically detects and groups instances (or occurrences) of characters in a multi-frame animated media file. More specifically, the character recognition engine automatically detects and groups the instances (or occurrences) of the characters in the multi-frame animated media file such that each group contains images associated with a single character. The character groups are then labeled and used to train an image classification model. Once trained, the image classification model can be applied to subsequent multi-frame animated media files to automatically classify the animated characters included therein.
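Illustrative sketch (not taken from the patent): the grouping step described in this abstract could look roughly like the following, where each detected character instance is represented by a feature vector and instances are grouped so that each group holds one character. The feature vectors, the cosine-similarity threshold, and the greedy centroid assignment are all assumptions made for illustration; the abstract does not say how grouping is performed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_detections(embeddings, threshold=0.9):
    """Greedily assign each embedding to the most similar existing group.

    A new group is started when no group centroid reaches `threshold`
    (a hypothetical tuning parameter). Returns lists of member indices.
    """
    groups = []  # each group: {"centroid": [...], "members": [indices]}
    for idx, emb in enumerate(embeddings):
        best, best_sim = None, threshold
        for group in groups:
            sim = cosine(emb, group["centroid"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"centroid": list(emb), "members": [idx]})
        else:
            # Update the running mean of the group's centroid.
            n = len(best["members"])
            best["centroid"] = [(c * n + x) / (n + 1)
                                for c, x in zip(best["centroid"], emb)]
            best["members"].append(idx)
    return [group["members"] for group in groups]
```

The resulting index groups would then be labeled and passed to whatever classifier training routine is in use.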
-
Publication number: US20210056362A1
Publication date: 2021-02-25
Application number: US16831105
Filing date: 2020-03-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Oron Nir , Maria Zontak , Tucker Cunningham Burns , Apar Singhal , Lei Zhang , Irit Ofer , Avner Levi , Haim Sabo , Ika Bar-Menachem , Eylon Ami , Ella Ben Tov , Anika Zaman
Abstract: The technology described herein is directed to systems, methods, and software for indexing video. In an implementation, a method comprises identifying one or more regions of interest around target content in a frame of the video. Further, the method includes identifying, in a portion of the frame outside a region of interest, potentially empty regions adjacent to the region of interest. The method continues with identifying at least one empty region of the potentially empty regions that satisfies one or more criteria and classifying at least the one empty region as a negative sample of the target content. In some implementations, the negative sample of the target content is included in a set of negative samples of the target content, with which to train a machine learning model employed to identify instances of the target content.
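Illustrative sketch (not taken from the patent): one way to picture the negative-sample selection in this abstract is to propose regions adjacent to each region of interest and keep only those that pass some criteria. Here boxes are `(x, y, width, height)` tuples, candidates are the four same-sized neighbours of each region of interest, and the criteria are purely geometric (inside the frame, no overlap with any region of interest). These choices are assumptions; the abstract leaves the actual criteria unspecified.

```python
def intersects(a, b):
    """True if boxes a and b, given as (x, y, w, h), overlap with positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def adjacent_candidates(roi):
    """Same-sized boxes directly left, right, above, and below the ROI."""
    x, y, w, h = roi
    return [(x - w, y, w, h), (x + w, y, w, h),
            (x, y - h, w, h), (x, y + h, w, h)]

def negative_samples(frame_size, rois):
    """Return adjacent regions that lie inside the frame and touch no ROI."""
    frame_w, frame_h = frame_size
    negatives = []
    for roi in rois:
        for cand in adjacent_candidates(roi):
            cx, cy, cw, ch = cand
            inside = (cx >= 0 and cy >= 0
                      and cx + cw <= frame_w and cy + ch <= frame_h)
            empty = not any(intersects(cand, other) for other in rois)
            if inside and empty and cand not in negatives:
                negatives.append(cand)
    return negatives
```

A real system would likely also inspect pixel content before treating a region as empty; that check is omitted here.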
-
Publication number: US20210056313A1
Publication date: 2021-02-25
Application number: US16831353
Filing date: 2020-03-26
Applicant: Microsoft Technology Licensing, LLC
Inventor: Oron Nir , Maria Zontak , Tucker Cunningham Burns , Apar Singhal , Lei Zhang , Irit Ofer , Avner Levi , Haim Sabo , Ika Bar-Menachem , Eylon Ami , Ella Ben Tov
Abstract: The technology described herein is directed to a media indexer framework including a character recognition engine that automatically detects and groups instances (or occurrences) of characters in a multi-frame animated media file. More specifically, the character recognition engine automatically detects and groups the instances (or occurrences) of the characters in the multi-frame animated media file such that each group contains images associated with a single character. The character groups are then labeled and used to train an image classification model. Once trained, the image classification model can be applied to subsequent multi-frame animated media files to automatically classify the animated characters included therein.
-
Publication number: US10902288B2
Publication date: 2021-01-26
Application number: US15977517
Filing date: 2018-05-11
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Oron Nir , Royi Ronen , Ohad Jassin , Milan M. Gada , Mor Geva Pipek
Abstract: Aspects of the technology described herein improve an object recognition system by specifying a type of picture that would improve the accuracy of the object recognition system if used to retrain the object recognition system. The technology described herein can take the form of an improvement model that improves an object recognition model by suggesting the types of training images that would improve the object recognition model's performance. For example, the improvement model could suggest that a picture of a person smiling be used to retrain the object recognition system. Once trained, the improvement model can be used to estimate a performance score for an image recognition model given the characteristics of a set of training images. The improvement model can then select a feature of an image, which if added to the training set, would cause a meaningful increase in the recognition system's performance.
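Illustrative sketch (not taken from the patent): the selection logic in this abstract can be pictured as "estimate the score now, estimate it again with one more image of each candidate feature, and suggest the feature with the largest gain". The scoring function below is a toy stand-in for a trained improvement model; its diminishing-returns form, the `weights`, the `bias`, and the `min_gain` cutoff for a "meaningful increase" are all hypothetical.

```python
def estimate_score(counts, weights, bias=0.5):
    """Toy stand-in for a trained improvement model.

    `counts` maps a characteristic (e.g. "smiling") to the number of
    training images showing it. Each characteristic contributes with
    diminishing returns as its count grows.
    """
    return bias + sum(w * (1 - 0.5 ** counts.get(name, 0))
                      for name, w in weights.items())

def suggest_feature(counts, weights, min_gain=0.01):
    """Return the feature whose addition raises the estimate the most.

    Returns None when no candidate reaches `min_gain`.
    """
    base = estimate_score(counts, weights)
    best_name, best_gain = None, min_gain
    for name in weights:
        trial = dict(counts)
        trial[name] = trial.get(name, 0) + 1  # pretend one more image is added
        gain = estimate_score(trial, weights) - base
        if gain >= best_gain:
            best_name, best_gain = name, gain
    return best_name
```

For example, with a training set that already has several smiling images but no profile views, the sketch suggests adding a profile image.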
-
Publication number: US11900961B2
Publication date: 2024-02-13
Application number: US17804606
Filing date: 2022-05-31
Applicant: Microsoft Technology Licensing, LLC
Inventor: Oron Nir , Inbal Sagiv , Maayan Yedidia , Fardau Van Neerden , Itai Norman
IPC: G10L25/69 , G10L25/21 , G10L25/06 , G10L19/16 , G10L19/008
CPC classification number: G10L25/69 , G10L19/008 , G10L19/173 , G10L25/06 , G10L25/21
Abstract: Examples of the present disclosure describe systems and methods for multichannel audio speech classification. In examples, an audio signal comprising multiple audio channels is received at a processing device. Each of the audio channels in the audio signal is transcoded to a predefined audio format. For each of the transcoded audio channels, an average power value is calculated for one or more data windows in the audio signal. A correlation value is calculated between the average power value for each audio channel and the combined average power value of the other audio channels in the audio signal. Each of the correlation values (or an aggregated correlation value for the audio channels) is then compared against a threshold value to determine whether the audio signal is to be classified as a speech-based communication. Based on the classification, an action associated with the audio signal may be performed.
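Illustrative sketch (not taken from the patent): the pipeline in this abstract (windowed average power per channel, correlation of each channel against the combined power of the others, threshold comparison) can be approximated as below. The sketch assumes the channels are already transcoded to equal-length lists of samples, uses Pearson correlation, aggregates by averaging, and treats a correlation below the threshold as speech on the assumption that speakers on separate channels tend to take turns. The abstract does not state the correlation measure, the aggregation, or which side of the threshold indicates speech.

```python
import math

def window_power(samples, window):
    """Average power (mean squared amplitude) over non-overlapping windows."""
    return [sum(s * s for s in samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def channel_correlations(channels, window):
    """Correlate each channel's power with the summed power of the others."""
    powers = [window_power(ch, window) for ch in channels]
    result = []
    for i, own in enumerate(powers):
        rest = [p for j, p in enumerate(powers) if j != i]
        combined = [sum(vals) for vals in zip(*rest)]
        result.append(pearson(own, combined))
    return result

def is_speech(channels, window, threshold=0.0):
    """Assumed rule: mean correlation below `threshold` means speech."""
    corrs = channel_correlations(channels, window)
    return sum(corrs) / len(corrs) < threshold
```

Two channels that alternate activity yield strongly negative correlations and are classified as speech under this assumed rule; two identical channels are not.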
-
Publication number: US11768961B2
Publication date: 2023-09-26
Application number: US17513158
Filing date: 2021-10-28
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yun-Cheng Ju , Ashwarya Poddar , Royi Ronen , Oron Nir , Ami Turgman , Andreas Stolcke , Edan Hauon
IPC: G06F21/62 , G06F40/295 , G10L15/26 , G10L17/00 , G10L15/22
CPC classification number: G06F21/6254 , G06F40/295 , G10L15/26 , G10L17/00 , G10L2015/228
Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
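Illustrative sketch (not taken from the patent): the scrubbing half of this abstract can be pictured as operating on a word-level transcript with timestamps, as a speech recognizer might produce. In the sketch, a trigger phrase from a key-phrase set marks the words that follow it as identifying information; those words are replaced with a mask and the matching audio samples are replaced with silence. The transcript structure, the "scrub the next `span` words" rule, the mask text, and silence as the replacement audio are all assumptions.

```python
def scrub(words, audio, sample_rate, triggers, span=1, mask="[REDACTED]"):
    """Mask words following a trigger phrase and silence their audio.

    words:    list of (text, start_sec, end_sec) tuples
    audio:    list of samples for a single channel
    triggers: iterable of tuples of lowercase words, e.g. ("my", "name", "is")
    """
    texts = [w[0].lower() for w in words]
    out_words = list(words)
    out_audio = list(audio)
    for trigger in triggers:
        k = len(trigger)
        for i in range(len(texts) - k + 1):
            if tuple(texts[i:i + k]) != tuple(trigger):
                continue
            # Scrub up to `span` words that come right after the trigger.
            for j in range(i + k, min(i + k + span, len(words))):
                _, start, end = words[j]
                out_words[j] = (mask, start, end)
                lo = int(start * sample_rate)
                hi = min(int(end * sample_rate), len(out_audio))
                out_audio[lo:hi] = [0] * max(hi - lo, 0)
    return out_words, out_audio
```

The inputs are left unmodified; the function returns scrubbed copies of both the transcript and the audio.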
-
Publication number: US11182504B2
Publication date: 2021-11-23
Application number: US16397738
Filing date: 2019-04-29
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yun-Cheng Ju , Ashwarya Poddar , Royi Ronen , Oron Nir , Ami Turgman , Andreas Stolcke , Edan Hauon
IPC: G06F21/62 , G06F40/295 , G10L15/26 , G10L17/00 , G10L15/22
Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
-
Publication number: US11062706B2
Publication date: 2021-07-13
Application number: US16397745
Filing date: 2019-04-29
Applicant: Microsoft Technology Licensing, LLC
Inventor: Yun-Cheng Ju , Ashwarya Poddar , Royi Ronen , Oron Nir , Ami Turgman , Andreas Stolcke , Edan Hauon
IPC: G10L15/22 , G10L15/26 , G10L21/028 , G10L17/00
Abstract: Methods for speaker role determination and scrubbing identifying information are performed by systems and devices. In speaker role determination, data from an audio or text file is divided into respective portions related to speaking parties. Characteristics classifying the portions of the data for speaking party roles are identified in the portions to generate data sets from the portions corresponding to the speaking party roles and to assign speaking party roles for the data sets. For scrubbing identifying information in data, audio data for speaking parties is processed using speech recognition to generate a text-based representation. Text associated with identifying information is determined based on a set of key words/phrases, and a portion of the text-based representation that includes a part of the text is identified. A segment of audio data that corresponds to the identified portion is replaced with different audio data, and the portion is replaced with different text.
-
Publication number: US10560734B2
Publication date: 2020-02-11
Application number: US15492972
Filing date: 2017-04-20
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Ohad Jassin , Avner Levi , Oron Nir , Ori Ziv
IPC: H04N21/262 , H04N21/234 , H04N21/2343 , H04N21/266 , H04N21/482 , H04N21/845 , H04N21/2743
Abstract: In various embodiments, methods and systems for implementing video segmentation are provided. A video management system implements a video segment manager that supports generating enhanced segmented video. Enhanced segmented video is a time-based segment of video content. Enhanced segmented video is generated based on a video content cognitive index, segmentation dimensions, segmentation rules and segment reconstruction rules. The video content cognitive index is built for indexing video content. Segmentation rules are applied to the video content to break the video content into time-based segments; the time-based segments are associated with corresponding segmentation dimensions for the video content. Segment reconstruction rules are then applied to the time-based segments to reconstruct the time-based segments into enhanced segmented video. The enhanced segmented video and corresponding values of the segmentation dimensions can be leveraged as distinct portions of the video content for different types of functionality in the video management system.
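Illustrative sketch (not taken from the patent): the two-stage flow in this abstract (segmentation rules, then segment reconstruction rules) might be pictured as follows. The index is modeled as a time-ordered list of entries carrying a value for one segmentation dimension (here a hypothetical `speaker` field). The segmentation rule "start a new segment when the dimension value changes" and the reconstruction rule "fold segments shorter than `min_duration` into the preceding segment" are both assumptions chosen for illustration.

```python
def segment_by_dimension(index, dimension):
    """Break time-ordered index entries into segments of constant value.

    index: list of dicts with "start", "end", and a `dimension` key.
    """
    segments = []
    for entry in index:
        value = entry[dimension]
        last = segments[-1] if segments else None
        if last and last["value"] == value and last["end"] == entry["start"]:
            last["end"] = entry["end"]  # contiguous, same value: extend
        else:
            segments.append({"start": entry["start"],
                             "end": entry["end"],
                             "value": value})
    return segments

def reconstruct(segments, min_duration):
    """Fold each too-short segment into the segment before it."""
    result = []
    for seg in segments:
        if result and seg["end"] - seg["start"] < min_duration:
            result[-1]["end"] = seg["end"]
        else:
            result.append(dict(seg))
    return result
```

Each resulting segment keeps its time range and dimension value, so it can be addressed as a distinct portion of the video.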
-
Publication number: US12169984B2
Publication date: 2024-12-17
Application number: US17157427
Filing date: 2021-01-25
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventor: Oron Nir , Royi Ronen , Ohad Jassin , Milan M. Gada , Mor Geva Pipek
IPC: G06V10/70 , G06F18/21 , G06F18/211 , G06F18/214 , G06V10/774 , G06V10/776 , G06V40/16
Abstract: Aspects of the technology described herein improve an object recognition system by specifying a type of picture that would improve the accuracy of the object recognition system if used to retrain the object recognition system. The technology described herein can take the form of an improvement model that improves an object recognition model by suggesting the types of training images that would improve the object recognition model's performance. For example, the improvement model could suggest that a picture of a person smiling be used to retrain the object recognition system. Once trained, the improvement model can be used to estimate a performance score for an image recognition model given the characteristics of a set of training images. The improvement model can then select a feature of an image, which if added to the training set, would cause a meaningful increase in the recognition system's performance.
-