-
Publication No.: US20200084055A1
Publication Date: 2020-03-12
Application No.: US16684949
Application Date: 2019-11-15
Applicant: International Business Machines Corporation
Inventor: Jonathan F. Brunn , Rachael M.H. Dickens , Jonathan Dunne , Ethan A. Geyer , Liam S. Harpur , Bo Jiang , Andrew Penrose , Naama Tepper
Abstract: A method, computer system, and computer program product for calculating a group chat segment duration is provided. The embodiment may include capturing a plurality of group chat messages from a chat message repository. The embodiment may also include determining a probability distribution based on analyzing the captured group chat messages over a time vector. The embodiment may further include calculating a time parameter based on the determined probability distribution. The embodiment may also include calculating a content parameter based on one or more relevant chat topics. The embodiment may further include calculating an attendee parameter based on a plurality of attendees and one or more attendee associations. The embodiment may also include determining a chat duration prediction based on the calculated time parameter, the calculated content parameter, and the calculated attendee parameter.
-
Publication No.: US20190121907A1
Publication Date: 2019-04-25
Application No.: US15791200
Application Date: 2017-10-23
Applicant: International Business Machines Corporation
Inventor: Jonathan F. Brunn , Daniel Dulaney , Ami Dewar , Ethan A. Geyer , Bo Jiang , Rachael Dickens , Scott E. Chapman , Thomas Blanchflower , Naama Tepper
Abstract: Message grouping using temporal and multi-factor similarity includes grouping multiple messages of a corpus in a group messaging system into a number of message bursts. Each message burst includes a number of messages that have a temporal relationship. Multiple of the number of message bursts are grouped into a message cluster. The grouping is based on a similarity of the number of message bursts as defined by multiple features of the message bursts.
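The first stage described in this abstract, grouping messages into bursts by temporal relationship, might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the temporal relationship is defined, so the function name `group_into_bursts` and the fixed five-minute gap threshold are hypothetical, and the second stage (clustering bursts by multi-feature similarity) is not shown.

```python
from datetime import datetime, timedelta


def group_into_bursts(timestamps, max_gap=timedelta(minutes=5)):
    """Split message timestamps into bursts.

    Assumption (not from the abstract): two consecutive messages belong to
    the same burst when the gap between them is at most `max_gap`.
    """
    bursts = []
    for ts in sorted(timestamps):
        if bursts and ts - bursts[-1][-1] <= max_gap:
            # Close enough to the previous message: extend the current burst.
            bursts[-1].append(ts)
        else:
            # Gap too large (or first message): start a new burst.
            bursts.append([ts])
    return bursts


# Hypothetical message times, in minutes after 09:00.
base = datetime(2017, 10, 23, 9, 0)
stamps = [base + timedelta(minutes=m) for m in (0, 1, 3, 20, 22, 60)]
bursts = group_into_bursts(stamps)
burst_sizes = [len(b) for b in bursts]
```

With these sample times the messages fall into three bursts, because the gaps of 17 and 38 minutes exceed the assumed threshold.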
-
Publication No.: US11222058B2
Publication Date: 2022-01-11
Application No.: US15840559
Application Date: 2017-12-13
Applicant: International Business Machines Corporation
Inventor: Ethan A. Geyer , Jonathan F. Brunn , Jonathan Dunne , Naama Tepper
Abstract: Familiarity-based text classification framework selection is described. A list of participants in an electronic message thread is selected. For each pairing of participants, a familiarity score is determined based on a number of criteria. A familiarity model is formed based on multiple familiarity scores and a text classification framework for the electronic message thread is selected based on the familiarity model.
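The flow in this abstract (score each pair of participants, combine the scores into a model, pick a classification framework) could look roughly like the sketch below. Everything specific here is an assumption for illustration: the abstract does not name the criteria, so a single hypothetical criterion (capped count of prior exchanges) stands in for them, the "model" is simply the mean pair score, and the framework labels `"informal"` and `"formal"` are invented.

```python
from itertools import combinations
from statistics import mean


def familiarity_score(a, b, interactions):
    """Hypothetical criterion: prior exchanges between a and b, scaled to [0, 1]."""
    count = interactions.get(frozenset((a, b)), 0)
    return min(count, 10) / 10


def select_framework(participants, interactions, threshold=0.5):
    """Score every participant pair, average the scores, and choose a framework.

    The averaging step and the threshold are assumptions, not the patented method.
    """
    scores = {
        frozenset(pair): familiarity_score(pair[0], pair[1], interactions)
        for pair in combinations(participants, 2)
    }
    model = mean(scores.values()) if scores else 0.0
    framework = "informal" if model >= threshold else "formal"
    return framework, scores


# Hypothetical interaction counts for a three-person thread.
interactions = {
    frozenset(("ana", "ben")): 8,
    frozenset(("ana", "cy")): 10,
    frozenset(("ben", "cy")): 3,
}
framework, scores = select_framework(["ana", "ben", "cy"], interactions)
```

Here the three pair scores are 0.8, 1.0, and 0.3, so the average clears the assumed threshold and the hypothetical "informal" framework is selected.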
-
Publication No.: US20210350076A1
Publication Date: 2021-11-11
Application No.: US16870917
Application Date: 2020-05-09
Applicant: International Business Machines Corporation
Inventor: Amir Kantor , Ateret Anaby Tavor , Boaz Carmeli , Esther Goldbraich , George Kour , Segev Shlomov , Naama Tepper , Naama Zwerdling
IPC: G06F40/279 , G06N20/00 , G06N5/04
Abstract: Embodiments of the present systems and methods may provide techniques for augmenting textual data that may be used for textual classification tasks. Embodiments of such techniques may provide the capability to synthesize labeled data to improve text classification tasks. Embodiments may be specifically useful when only a small amount of data is available, and provide improved performance in such cases. For example, in an embodiment, a method implemented in a computer system may comprise a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, and the method may comprise fine-tuning a language model using a training dataset, synthesizing a plurality of samples using the fine-tuned language model, filtering the plurality of synthesized samples, and generating an augmented training dataset comprising the training dataset and the filtered plurality of synthesized sentences.
-
Publication No.: US11057230B2
Publication Date: 2021-07-06
Application No.: US16684949
Application Date: 2019-11-15
Applicant: International Business Machines Corporation
Inventor: Jonathan F. Brunn , Rachael M. H. Dickens , Jonathan Dunne , Ethan A. Geyer , Liam S. Harpur , Bo Jiang , Andrew Penrose , Naama Tepper
Abstract: A method, computer system, and computer program product for calculating a group chat segment duration is provided. The embodiment may include capturing a plurality of group chat messages from a chat message repository. The embodiment may also include determining a probability distribution based on analyzing the captured group chat messages over a time vector. The embodiment may further include calculating a time parameter based on the determined probability distribution. The embodiment may also include calculating a content parameter based on one or more relevant chat topics. The embodiment may further include calculating an attendee parameter based on a plurality of attendees and one or more attendee associations. The embodiment may also include determining a chat duration prediction based on the calculated time parameter, the calculated content parameter, and the calculated attendee parameter.
-
Publication No.: US20190179955A1
Publication Date: 2019-06-13
Application No.: US15840559
Application Date: 2017-12-13
Applicant: International Business Machines Corporation
Inventor: Ethan A. Geyer , Jonathan F. Brunn , Jonathan Dunne , Naama Tepper
Abstract: Familiarity-based text classification framework selection is described. A list of participants in an electronic message thread is selected. For each pairing of participants, a familiarity score is determined based on a number of criteria. A familiarity model is formed based on multiple familiarity scores and a text classification framework for the electronic message thread is selected based on the familiarity model.
-
Publication No.: US20190103982A1
Publication Date: 2019-04-04
Application No.: US15720265
Application Date: 2017-09-29
Applicant: International Business Machines Corporation
Inventor: Jonathan F. Brunn , Rachael M.H. Dickens , Jonathan Dunne , Ethan A. Geyer , Liam S. Harpur , Bo Jiang , Andrew Penrose , Naama Tepper
Abstract: A method, computer system, and computer program product for calculating a group chat segment duration is provided. The embodiment may include capturing a plurality of group chat messages from a chat message repository. The embodiment may also include determining a probability distribution based on analyzing the captured group chat messages over a time vector. The embodiment may further include calculating a time parameter based on the determined probability distribution. The embodiment may also include calculating a content parameter based on one or more relevant chat topics. The embodiment may further include calculating an attendee parameter based on a plurality of attendees and one or more attendee associations. The embodiment may also include determining a chat duration prediction based on the calculated time parameter, the calculated content parameter, and the calculated attendee parameter.
-
Publication No.: US11797516B2
Publication Date: 2023-10-24
Application No.: US17317922
Application Date: 2021-05-12
Applicant: International Business Machines Corporation
Inventor: Naama Tepper , Esther Goldbraich , Boaz Carmeli , Naama Zwerdling , George Kour , Ateret Anaby Tavor
CPC classification number: G06F16/2365 , G06N20/00
Abstract: Balancing an imbalanced dataset, by: Receiving a balancing policy and the imbalanced dataset. Performing initial adjustment of the imbalanced dataset to comply with the balancing policy, by: oversampling one or more underrepresented classes, and, if one or more of the classes are overrepresented, undersampling them. Operating a generative machine learning model to generate samples for the one or more underrepresented classes, based on the initially-adjusted dataset. Operating a machine learning classification model to label the generated samples with class labels corresponding to the one or more underrepresented classes. Selecting some of the generated samples which, according to the labeling, have a relatively high probability of preserving their class labels. Composing a balanced dataset which complies with the balancing policy and comprises: the samples belonging to the one or more underrepresented classes, the selected generated samples, and an undersampling of the samples belonging to the one or more overrepresented classes.
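Two of the steps in this abstract, the initial over/undersampling and the selection of generated samples that are likely to keep their class labels, might be sketched as below. This is a minimal illustration under assumptions: the balancing policy is reduced to one hypothetical target count per class, resampling is plain random sampling, and the generative model and classifier are replaced by precomputed `(text, label, probability)` tuples. The names `initial_adjust`, `select_generated`, and `min_prob` are not from the source.

```python
import random
from collections import Counter, defaultdict


def initial_adjust(samples, target_per_class, seed=0):
    """Randomly oversample small classes and undersample large ones.

    Assumption: the balancing policy is a single target count per class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for text, label in samples:
        by_class[label].append(text)

    adjusted = []
    for label, texts in by_class.items():
        if len(texts) > target_per_class:
            # Overrepresented class: keep a random subset.
            chosen = rng.sample(texts, target_per_class)
        else:
            # Underrepresented class: repeat random samples until the target is met.
            extra = [rng.choice(texts) for _ in range(target_per_class - len(texts))]
            chosen = texts + extra
        adjusted.extend((text, label) for text in chosen)
    return adjusted


def select_generated(generated, min_prob=0.9):
    """Keep generated samples whose classifier probability for the intended label is high."""
    return [(text, label) for text, label, prob in generated if prob >= min_prob]


# Hypothetical imbalanced dataset: five "billing" samples, two "refund" samples.
samples = [(f"billing {i}", "billing") for i in range(5)]
samples += [(f"refund {i}", "refund") for i in range(2)]
adjusted = initial_adjust(samples, target_per_class=3)
class_counts = Counter(label for _, label in adjusted)

# Hypothetical generator output already scored by a classifier.
generated = [("refund synthetic a", "refund", 0.95), ("refund synthetic b", "refund", 0.40)]
kept = select_generated(generated)
```

After the initial adjustment both classes hold three samples, and only the generated sample with the high assumed probability is retained for the final balanced dataset.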
-
Publication No.: US11526667B2
Publication Date: 2022-12-13
Application No.: US16870917
Application Date: 2020-05-09
Applicant: International Business Machines Corporation
Inventor: Amir Kantor , Ateret Anaby Tavor , Boaz Carmeli , Esther Goldbraich , George Kour , Segev Shlomov , Naama Tepper , Naama Zwerdling
IPC: G06F40/279 , G06N5/04 , G06N20/00
Abstract: Embodiments of the present systems and methods may provide techniques for augmenting textual data that may be used for textual classification tasks. Embodiments of such techniques may provide the capability to synthesize labeled data to improve text classification tasks. Embodiments may be specifically useful when only a small amount of data is available, and provide improved performance in such cases. For example, in an embodiment, a method implemented in a computer system may comprise a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, and the method may comprise fine-tuning a language model using a training dataset, synthesizing a plurality of samples using the fine-tuned language model, filtering the plurality of synthesized samples, and generating an augmented training dataset comprising the training dataset and the filtered plurality of synthesized sentences.
-
Publication No.: US20220374410A1
Publication Date: 2022-11-24
Application No.: US17317922
Application Date: 2021-05-12
Applicant: International Business Machines Corporation
Inventor: Naama Tepper , Esther Goldbraich , Boaz Carmeli , Naama Zwerdling , George Kour , Ateret Anaby Tavor
Abstract: Balancing an imbalanced dataset, by: Receiving a balancing policy and the imbalanced dataset. Performing initial adjustment of the imbalanced dataset to comply with the balancing policy, by: oversampling one or more underrepresented classes, and, if one or more of the classes are overrepresented, undersampling them. Operating a generative machine learning model to generate samples for the one or more underrepresented classes, based on the initially-adjusted dataset. Operating a machine learning classification model to label the generated samples with class labels corresponding to the one or more underrepresented classes. Selecting some of the generated samples which, according to the labeling, have a relatively high probability of preserving their class labels. Composing a balanced dataset which complies with the balancing policy and comprises: the samples belonging to the one or more underrepresented classes, the selected generated samples, and an undersampling of the samples belonging to the one or more overrepresented classes.