Abstract:
Systems and methods are provided for analyzing and visualizing document corpora based on user-defined semantic features, including initializing a Natural Language Inference (NLI) classification model pre-trained on a diverse linguistic dataset and analyzing a corpus of textual documents with respect to semantic features described in natural language by a user. For each semantic feature, a classification process is executed using the NLI model to assess implication strength between sentences in the documents and the semantic feature, the classification process including a confidence scoring mechanism to quantify implication strength. Implication scores can be aggregated for each of the documents to form a composite semantic implication profile, and a dimensionality reduction technique can be applied to the composite semantic implication profiles of the documents to generate a two-dimensional semantic space representation. The two-dimensional semantic space representation can be dynamically adjusted based on iterative user feedback regarding the accuracy of semantic implication assessments.
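The pipeline above can be sketched as follows. This is a minimal toy, assuming a word-overlap stand-in for the NLI confidence scorer, mean aggregation per document, and PCA (via SVD) as the dimensionality reduction technique; the real system would use a pre-trained NLI classifier and may aggregate and reduce differently.

```python
import numpy as np

def nli_score(sentence, feature):
    """Toy stand-in for an NLI implication score: word-overlap ratio.
    A real system would query a pre-trained NLI classifier here."""
    s, f = set(sentence.lower().split()), set(feature.lower().split())
    return len(s & f) / max(len(f), 1)

def implication_profile(doc_sentences, features):
    """Aggregate per-sentence implication scores into one composite
    profile per document (mean over sentences, one entry per feature)."""
    return np.array([
        np.mean([nli_score(s, feat) for s in doc_sentences])
        for feat in features
    ])

def to_2d(profiles):
    """Project composite profiles to 2-D via PCA (SVD on mean-centered
    data), mirroring the dimensionality-reduction step."""
    X = profiles - profiles.mean(axis=0)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:2].T

docs = [["the battery lasts long", "charging is slow"],
        ["the screen is bright", "battery drains fast"],
        ["great screen and display", "colors are vivid"]]
features = ["battery life is good", "screen quality is good"]
profiles = np.vstack([implication_profile(d, features) for d in docs])
points = to_2d(profiles)   # one (x, y) per document
```

User feedback would then adjust the scorer or the projection and the loop would re-run.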
Abstract:
A method for neural network training is provided. The method inputs a training set of textual claims, lists of evidence including gold evidence chains, and claim labels labelling the evidence with respect to the textual claims. The claim labels include refutes, supports, and not enough information (NEI). The method computes an initial set of document retrievals for each of the textual claims. The method also includes computing an initial set of page element retrievals including sentence retrievals from the initial set of document retrievals for each of the textual claims. The method creates, from the training set of textual claims, a Leave Out Training Set which includes input texts and target texts relating to the labels. The method trains a sequence-to-sequence neural network to generate new target texts from new input texts using the Leave Out Training Set.
Abstract:
Systems and methods are disclosed to answer free-form questions using a recursive neural network (RNN) by defining feature representations, applied recursively, at every node of the parse trees of questions and supporting sentences, starting with token vectors from a neural probabilistic language model; and extracting answers to arbitrary natural language questions from the supporting sentences.
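The recursive composition over a parse tree can be sketched as below. This is a minimal illustration, assuming random vectors as stand-ins for the neural probabilistic language model's token vectors and a single shared tanh layer as the composition function.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 4
W = rng.standard_normal((DIM, 2 * DIM)) * 0.1  # shared composition weights

def compose(node, token_vecs):
    """Recursively build a feature representation for each parse-tree
    node: leaves take token vectors from the language model (random
    stand-ins here); internal nodes combine their two children with a
    shared tanh layer, applied recursively up to the root."""
    if isinstance(node, str):                      # leaf = token
        return token_vecs[node]
    left, right = (compose(c, token_vecs) for c in node)
    return np.tanh(W @ np.concatenate([left, right]))

tokens = {w: rng.standard_normal(DIM)
          for w in ["what", "is", "the", "capital"]}
tree = (("what", "is"), ("the", "capital"))        # toy binary parse
vec = compose(tree, tokens)                        # representation of the root
```

Answer extraction would then score candidate spans in supporting sentences against such question representations.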
Abstract:
Systems and methods are disclosed for classifying histological tissues or specimens with two phases. In a first phase, the method includes providing off-line training using a processor during which one or more classifiers are trained based on examples, including: finding a split of features into sets of increasing computational cost, assigning a computational cost to each set; training for each set of features a classifier using training examples; training for each classifier, a utility function that scores a usefulness of extracting the next feature set for a given tissue unit using the training examples. In a second phase, the method includes applying the classifiers to an unknown tissue sample by extracting the first set of features for all tissue units; deciding for which tissue unit to extract the next set of features by finding the tissue unit for which a score S=U−h*C is maximized, where U is a utility function, C is a cost of acquiring the feature and h is a weighting parameter; iterating until a stopping criterion is met or no more features can be computed; and issuing a tissue-level decision based on a current state.
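The second-phase selection loop can be sketched as follows. The utility values and per-set costs here are toy stand-ins for the learned utility functions and assigned computational costs; the stopping criterion used is that no tissue unit attains a positive score S = U − h*C.

```python
def select_and_extract(units, utilities, costs, h=0.5, max_steps=10):
    """Repeatedly pick the tissue unit maximizing S = U - h*C for its
    next feature set, extract that set, and stop when no unit scores
    positively or no more feature sets remain. `utilities[u][k]` is a
    toy stand-in for the learned utility of extracting feature set k
    for unit u; `costs[k]` is the acquisition cost of set k."""
    extracted = {u: 1 for u in units}   # every unit starts with feature set 0
    n_sets = len(costs)
    for _ in range(max_steps):
        best_u, best_s = None, 0.0
        for u in units:
            k = extracted[u]            # index of the next feature set
            if k >= n_sets:
                continue                # no more features can be computed
            s = utilities[u][k] - h * costs[k]
            if s > best_s:
                best_u, best_s = u, s
        if best_u is None:              # stopping criterion: no positive score
            break
        extracted[best_u] += 1
    return extracted

units = ["A", "B"]
utilities = {"A": [0.9, 0.8, 0.1], "B": [0.9, 0.2, 0.1]}
costs = [0.0, 0.4, 0.6]
result = select_and_extract(units, utilities, costs, h=0.5)  # A gets set 1, B stops early
```

The tissue-level decision would then be issued from whatever features were extracted when the loop halted.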
Abstract:
Systems and methods for opinion summarization are provided for extracting and counting frequent opinions. The method includes performing a frequency analysis on an inputted list of product reviews for a single item and an inputted corpus of reviews for a product category containing the single item to identify one or more frequent phrases; fine-tuning a pretrained transformer model to produce a trained neural network claim generator model, and generating a trained neural network opposing claim generator model based on the trained neural network claim generator model. The method further includes generating a pair of opposing claims for each of the one or more frequent phrases, wherein a generated positive claim is entailed by the product reviews for the single item and a negative claim refutes the positive claim, and outputting a count of sentences entailing the positive claim and a count of sentences entailing the negative claim.
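The counting pipeline can be sketched as follows. This is a toy illustration: single frequent words stand in for frequent phrases, and a negation check stands in for the trained claim generator and entailment models.

```python
from collections import Counter

def frequent_phrases(item_reviews, category_reviews, min_count=2):
    """Toy frequency analysis: keep words appearing at least
    `min_count` times in the item's reviews. A real system would
    contrast against the category corpus and use multi-word phrases."""
    item_counts = Counter(w for r in item_reviews for w in r.lower().split())
    return [w for w, c in item_counts.items() if c >= min_count]

def entails(sentence, phrase, positive):
    """Stub entailment check standing in for the trained claim models:
    a sentence 'entails' the positive claim if it mentions the phrase
    without negation, and the opposing claim if it negates it."""
    words = sentence.lower().split()
    negated = "not" in words or "never" in words
    return phrase in words and (negated != positive)

def count_opinions(item_reviews, category_reviews):
    """For each frequent phrase, count sentences entailing the
    positive claim and sentences entailing the opposing claim."""
    counts = {}
    for phrase in frequent_phrases(item_reviews, category_reviews):
        pos = sum(entails(r, phrase, True) for r in item_reviews)
        neg = sum(entails(r, phrase, False) for r in item_reviews)
        counts[phrase] = (pos, neg)
    return counts

reviews = ["the battery is great", "battery does not last", "love the battery"]
counts = count_opinions(reviews, [])   # e.g. counts["battery"] == (2, 1)
```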
Abstract:
Systems and methods for matching job descriptions with job applicants are provided. The method includes allocating each of one or more job applicants' curriculum vitae (CV) into sections; applying max pooled word embedding to each section of the job applicants' CVs; using concatenated max-pooling and average-pooling to compose the section embeddings into an applicant's CV representation; allocating each of one or more job position descriptions into specified sections; applying max pooled word embedding to each section of the job position descriptions; using concatenated max-pooling and average-pooling to compose the section embeddings into a job representation; calculating a cosine similarity between each of the job representations and each of the CV representations to perform job-to-applicant matching; and presenting an ordered list of the one or more job applicants or an ordered list of the one or more job position descriptions to a user.
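The pooling and matching steps can be sketched as below. Random vectors stand in for pre-trained word embeddings; the same representation function serves both CVs and job descriptions, as in the method.

```python
import numpy as np

rng = np.random.default_rng(1)
DIM = 8
vocab = {}  # toy word vectors standing in for pre-trained embeddings

def word_vec(w):
    if w not in vocab:
        vocab[w] = rng.standard_normal(DIM)
    return vocab[w]

def section_embedding(section_text):
    """Max-pooled word embedding for one section."""
    vecs = np.stack([word_vec(w) for w in section_text.lower().split()])
    return vecs.max(axis=0)

def document_representation(sections):
    """Concatenate max-pooling and average-pooling over the section
    embeddings, as described for both CVs and job descriptions."""
    secs = np.stack([section_embedding(s) for s in sections])
    return np.concatenate([secs.max(axis=0), secs.mean(axis=0)])

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cv = document_representation(["python developer", "msc computer science"])
job = document_representation(["senior python developer wanted"])
score = cosine(cv, job)   # job-to-applicant match score
```

Ranking applicants (or jobs) by this score yields the ordered list presented to the user.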
Abstract:
Disclosed is a computer implemented method for fully automated tissue diagnosis that trains a region of interest (ROI) classifier in a supervised manner, wherein labels are given only at a tissue level, the training using a multiple-instance learning variant of backpropagation, and trains a tissue classifier that uses the output of the ROI classifier. For a given tissue, the method finds ROIs, extracts feature vectors in each ROI, applies the ROI classifier to each feature vector thereby obtaining a set of probabilities, provides the probabilities to the tissue classifier and outputs a final diagnosis for the whole tissue.
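The inference phase for a given tissue can be sketched as follows. Logistic regression stands in for the ROI classifier (which the method trains with a multiple-instance learning variant of backpropagation), and max pooling over ROI probabilities stands in for the tissue classifier; both are assumptions for illustration.

```python
import numpy as np

def roi_classifier(feature_vec, w):
    """ROI-level probability via logistic regression, a stand-in for
    the classifier trained with the MIL backpropagation variant."""
    return 1.0 / (1.0 + np.exp(-feature_vec @ w))

def tissue_classifier(roi_probs, threshold=0.5):
    """Tissue-level decision from the set of ROI probabilities; max
    pooling matches the MIL assumption that one positive ROI suffices
    for a tissue-level positive."""
    return "abnormal" if max(roi_probs) > threshold else "normal"

def diagnose(roi_features, w):
    """Apply the ROI classifier to each ROI's feature vector, then
    feed the resulting probabilities to the tissue classifier."""
    probs = [roi_classifier(f, w) for f in roi_features]
    return tissue_classifier(probs), probs

w = np.array([2.0, -1.0])                             # toy learned weights
rois = [np.array([0.1, 0.9]), np.array([1.5, 0.2])]   # feature vectors per ROI
label, probs = diagnose(rois, w)                      # final whole-tissue diagnosis
```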
Abstract:
Methods and systems for language processing include augmenting an original training dataset to produce an augmented dataset that includes a first example that includes a first scrambled replacement for a first word and a definition of the first word, and a second example that includes a second scrambled replacement for the first word and a definition of an alternative to the first word. A neural network classifier is trained using the augmented dataset.
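The augmentation step can be sketched as below. The bracketed-definition format and the seeded scrambling are assumptions for illustration; the key property from the abstract is that one example pairs a scrambled replacement of the word with its own definition while another pairs a different scrambled replacement with an alternative word's definition.

```python
import random

def scramble(word, seed=0):
    """Deterministically scramble a word so the classifier cannot rely
    on its surface form and must use the attached definition."""
    chars = list(word)
    random.Random(seed).shuffle(chars)
    return "".join(chars)

def augment(sentence, word, definition, alt_definition):
    """Produce the two augmented examples: a first scrambled
    replacement paired with the word's own definition, and a second
    scrambled replacement paired with an alternative definition."""
    ex1 = sentence.replace(word, scramble(word, seed=1))
    ex2 = sentence.replace(word, scramble(word, seed=2))
    return [f"{ex1} [{definition}]",
            f"{ex2} [{alt_definition}]"]

examples = augment("the bank approved the loan", "bank",
                   "a financial institution",
                   "the side of a river")
```

A neural network classifier would then be trained on the original dataset plus such pairs.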