Abstract:
Embodiments described herein locate objects in input. Embodiments first parse the input into a form that supports the analysis needed to construct a set of one or more objects. Embodiments then form, where possible, object character strings from the grammatical values of the underlying terms. The resulting set of object character strings can be used in a variety of textual analysis procedures, such as search, comparison, and other combinatorial analyses that use objects to perform tasks on an information repository of documents, files, messages, and the like.
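As a minimal sketch of forming object character strings from grammatical values, the Python below assumes those values are part-of-speech tags already attached to each token (the abstract does not say which grammatical values are used). It treats a maximal run of adjectives and nouns that ends in a noun as one object. The tag names and the `object_strings` function are hypothetical.

```python
from typing import List, Tuple

# Hypothetical tag set: the abstract does not name the grammatical values used.
NOMINAL_TAGS = {"ADJ", "NOUN"}

def object_strings(tagged: List[Tuple[str, str]]) -> List[str]:
    """Group maximal ADJ/NOUN runs that end in a NOUN into object strings."""
    objects: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        # Trim trailing non-nouns so every object string ends with its head noun.
        while run and run[-1][1] != "NOUN":
            run.pop()
        if run:
            objects.append(" ".join(word for word, _ in run))
        run.clear()

    for word, tag in tagged:
        if tag in NOMINAL_TAGS:
            run.append((word, tag))
        else:
            flush()  # any other tag ends the current candidate object
    flush()
    return objects

sentence = [("the", "DET"), ("red", "ADJ"), ("sports", "NOUN"),
            ("car", "NOUN"), ("is", "VERB"), ("fast", "ADJ")]
```

Here `object_strings(sentence)` yields the single object `red sports car`; the trailing adjective "fast" is dropped because no noun follows it. A real embodiment would presumably rely on a full parse rather than this flat pattern.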
Abstract:
A processor receives a string of binary data that represents an initial phrase that includes multiple words and is associated with a specific category. The processor removes one or more letters from the end of a word in the initial phrase to form an initial truncated version of the phrase. The processor runs a TF-IDF algorithm on the initial truncated version of the phrase, and lemmatizes subsequent truncated versions of the initial phrase by recursively removing remaining letters from the end of the word. The processor runs the TF-IDF algorithm on each subsequent truncated version until the highest TF-IDF value is identified. The processor defines the breadth of a lemma for a lexeme based on the specific category of the phrase, and assigns the truncated version having the highest TF-IDF value to the specific category.
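The truncate-and-score loop can be sketched in Python under several assumptions the abstract leaves open: the phrase is already tokenized, a truncated word "matches" any corpus token that begins with it, TF is measured over the category's documents and IDF over the whole corpus, and every truncation down to a hypothetical minimum length of three letters is scored before the best one is kept. `stem_tfidf`, `best_truncation`, and the toy corpus are illustrative, not part of the described system.

```python
import math
from typing import Dict, List, Sequence, Tuple

Doc = Sequence[str]

def stem_tfidf(stem: str, category_docs: Sequence[Doc], all_docs: Sequence[Doc]) -> float:
    """TF-IDF of a truncated word; any token starting with `stem` counts as a hit."""
    df = sum(1 for doc in all_docs if any(tok.startswith(stem) for tok in doc))
    if df == 0:
        return 0.0
    idf = math.log(len(all_docs) / df)
    hits = sum(tok.startswith(stem) for doc in category_docs for tok in doc)
    total = sum(len(doc) for doc in category_docs)
    return (hits / total) * idf if total else 0.0

def best_truncation(phrase: List[str], word_index: int, category_docs: Sequence[Doc],
                    all_docs: Sequence[Doc], min_len: int = 3) -> Tuple[str, float]:
    """Drop letters from the end of one word, score each version, keep the best."""
    word = phrase[word_index]
    if len(word) <= min_len:
        raise ValueError("word is too short to truncate")
    best_phrase, best_score = "", -1.0
    # The first cut removes one letter (the initial truncated version); later cuts
    # keep removing letters from the end down to the hypothetical minimum length.
    for cut in range(len(word) - 1, min_len - 1, -1):
        candidate = list(phrase)
        candidate[word_index] = word[:cut]
        score = stem_tfidf(word[:cut], category_docs, all_docs)
        if score > best_score:
            best_phrase, best_score = " ".join(candidate), score
    return best_phrase, best_score

# Toy corpus, invented for illustration.
finance_docs = [["rates", "rise"], ["rate", "cut"], ["rated", "bonds"]]
other_docs = [["rats", "run"], ["rattle", "snake"], ["cats", "sleep"]]

category_lemmas: Dict[str, str] = {}
phrase, score = best_truncation(["interest", "rates"], 1, finance_docs, finance_docs + other_docs)
category_lemmas["finance"] = phrase  # assign the best truncated version to the category
```

In this toy corpus "rate" beats "rat": both match the same three finance tokens, but "rat" also matches "rats" and "rattle" outside the category, which raises its document frequency and lowers its IDF.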
Abstract:
An information replying method includes: receiving to-be-replied information, where the to-be-replied information includes text content and contact information; searching a database for corresponding dialog style information according to the text content and the contact information; preprocessing the text content, where the preprocessing includes word segmentation and stop word removal; and searching the database corresponding to the dialog style information, using the preprocessed data, to determine reply information.
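A minimal Python sketch of this flow follows. It assumes a per-contact style table and per-style reply tables keyed by trigger words; these structures, the stop-word list, and the regex tokenizer standing in for real word segmentation are all hypothetical. It also simplifies the style lookup to use only the contact, whereas the abstract uses the text content as well.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical stop-word list; the abstract does not give one.
STOP_WORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "what", "do", "you"}

def preprocess(text: str) -> Set[str]:
    """Word segmentation (a simple regex split here) followed by stop-word removal."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {tok for tok in tokens if tok not in STOP_WORDS}

def find_reply(text: str, contact: str,
               style_by_contact: Dict[str, str],
               replies_by_style: Dict[str, List[Tuple[Set[str], str]]],
               default_style: str = "neutral") -> Optional[str]:
    # Simplification: the dialog style is looked up from the contact alone.
    style = style_by_contact.get(contact, default_style)
    keywords = preprocess(text)
    best_reply, best_overlap = None, 0
    # Search only this style's reply table; pick the entry sharing the most keywords.
    for trigger_words, reply in replies_by_style.get(style, []):
        overlap = len(keywords & trigger_words)
        if overlap > best_overlap:
            best_reply, best_overlap = reply, overlap
    return best_reply

style_by_contact = {"manager": "formal", "alice": "casual"}
replies_by_style = {
    "formal": [({"meeting", "time"}, "The meeting is scheduled for 10:00.")],
    "casual": [({"meeting", "time"}, "10ish, see you there!")],
}
```

The same question, "What time is the meeting?", gets the formal reply for `"manager"` and the casual one for `"alice"`; an unknown contact falls back to a style with no replies and gets `None`.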
Abstract:
A confidentiality-preserving system and method for performing a rank-ordered search and retrieval of the contents of a data collection. The system includes at least one computer system with a search and retrieval algorithm that uses term frequency and/or similar features to rank-order selected contents of the data collection and enables secure retrieval of those contents based on the rank order. The search and retrieval algorithm includes a baseline algorithm, a partially server-oriented algorithm, and/or a fully server-oriented algorithm. The partially and/or fully server-oriented algorithms use homomorphic and/or order-preserving encryption so that a user other than the owner of the contents of the data collection can search it. The confidentiality-preserving method includes using term frequency to rank-order selected contents of the data collection and retrieving those contents based on the rank order.
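The idea that a server can rank documents without seeing their term-frequency scores can be illustrated with a toy order-preserving encoding: the data owner maps each possible score to a strictly increasing random code, and the server sorts by code. This Python sketch covers only the term-frequency baseline plus that ordering property; it is not secure, does not use homomorphic encryption, and the function names and documents are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List

def tf_scores(docs: Dict[str, List[str]], term: str) -> Dict[str, int]:
    """Raw term frequency per document: the baseline ranking signal."""
    return {doc_id: Counter(tokens)[term] for doc_id, tokens in docs.items()}

def toy_ope_table(max_value: int, seed: int = 0) -> List[int]:
    """Toy order-preserving encoding: strictly increasing random codes. NOT secure."""
    rng = random.Random(seed)
    codes, current = [], 0
    for _ in range(max_value + 1):
        current += rng.randint(1, 1000)  # positive step keeps the codes strictly increasing
        codes.append(current)
    return codes

def server_rank(encrypted: Dict[str, int]) -> List[str]:
    """The server sorts by code; order preservation makes this the TF ranking."""
    return sorted(encrypted, key=encrypted.get, reverse=True)

docs = {"d1": ["cloud", "storage", "cloud"], "d2": ["cloud", "backup"], "d3": ["search", "index"]}
scores = tf_scores(docs, "cloud")                   # computed by the data owner
table = toy_ope_table(max(scores.values()))         # kept by the data owner
encrypted = {doc_id: table[tf] for doc_id, tf in scores.items()}  # sent to the server
ranking = server_rank(encrypted)
```

The server never sees the counts 2, 1, and 0, only their codes, yet its ranking matches a plaintext sort by term frequency. Practical order-preserving schemes leak more than this toy suggests, which is one reason the abstract also lists homomorphic encryption.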
Abstract:
A processor-implemented method, system, and/or computer program product lemmatizes a phrase for a specific category. A processor receives an initial phrase that is associated with a specific category. The processor removes the last letter or set of letters from a word in the initial phrase to form an initial truncated version of the phrase, and then runs a term frequency-inverse document frequency (TF-IDF) algorithm on that truncated version. The processor lemmatizes subsequent truncated versions of the initial phrase and runs the TF-IDF algorithm on each until it identifies the truncated version whose TF-IDF value is highest among all truncated versions of the initial phrase. That truncated version is then associated with the specific category.
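For reference, one common form of TF-IDF is given below, where tf(t, d) is the count of term t in document d (often normalized by document length), D is the document collection, and N is the number of documents in D. Implementations differ in smoothing and normalization, and the abstract does not say which variant is used.

```latex
\operatorname{tfidf}(t, d, D) = \operatorname{tf}(t, d) \cdot \log \frac{N}{\left|\{\, d' \in D : t \in d' \,\}\right|}
```

Read against the truncation procedure, shortening a word tends to raise its term frequency, because more tokens match, while also raising its document frequency and so shrinking the log factor; the truncated version with the highest product balances the two effects. This is an interpretation, not a statement from the abstract.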
Abstract:
A question-answering device, method, and program that can obtain an answer to an input query with high probability are described. A score calculation element 305 determines the matching degree between the style-and-topic group of an input query and the style-and-topic group of the query in each question-answer pair. A search result presentation element 306 narrows the question-answer pairs on the basis of the matching degree.
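A minimal Python sketch of the matching-degree score and the narrowing step, assuming the style and topic of each query have already been classified into labels and that the degree is simply the number of matching labels (zero, one, or two). The labels, the `QAPair` type, and both functions are hypothetical; `matching_degree` loosely corresponds to score calculation element 305 and `narrow` to search result presentation element 306.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class QAPair:
    style: str      # hypothetical style label, e.g. "how-to" or "definition"
    topic: str      # hypothetical topic label
    question: str
    answer: str

def matching_degree(query_style: str, query_topic: str, pair: QAPair) -> int:
    """One point for a matching style plus one point for a matching topic."""
    return int(query_style == pair.style) + int(query_topic == pair.topic)

def narrow(query_style: str, query_topic: str, pairs: List[QAPair]) -> List[QAPair]:
    """Keep only the pairs with the highest matching degree."""
    if not pairs:
        return []
    scores = [matching_degree(query_style, query_topic, p) for p in pairs]
    best = max(scores)
    # If nothing matches (best == 0), every pair ties and none is filtered out.
    return [p for p, s in zip(pairs, scores) if s == best]

pairs = [
    QAPair("how-to", "printer", "How do I reset the printer?", "Hold the power button."),
    QAPair("definition", "printer", "What is a duplex printer?", "One that prints both sides."),
    QAPair("how-to", "network", "How do I join the Wi-Fi?", "Select the network and sign in."),
]
```

A how-to question about printers narrows to the first pair alone, while a how-to question about an unseen topic keeps both how-to pairs, since each matches on style only.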
Abstract:
Systems and methods for building an interface that receives and responds to varied natural language expressions. In an embodiment, the system receives a natural language expression in text or audio, and translates it by building at least one data structure which reflects the concepts expressed in the natural language expression. The data structure may comprise a symbol representing each concept. In an embodiment, a parser utilizes the data structure to parse language expressions to single concept symbols that represent the meaning of the expressions. Response actions may also be performed in response to the parsed language expressions. In addition, a parser may receive a single concept symbol, and generate one or many natural language expressions of the meaning of the concept symbol. Furthermore, the system may be configured to understand the local meaning of words and phrases.
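A much simplified Python sketch of the two directions described here, parsing an expression to a single concept symbol and generating expressions from a symbol, assuming a flat lookup table of normalized phrasings instead of the compositional data structures the abstract describes. `ConceptLexicon`, the symbols, and the response action are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Optional

class ConceptLexicon:
    """Two-way table between normalized phrasings and concept symbols (hypothetical)."""

    def __init__(self) -> None:
        self._to_symbol: Dict[str, str] = {}
        self._to_expressions: Dict[str, List[str]] = defaultdict(list)

    @staticmethod
    def _normalize(text: str) -> str:
        # Lowercase and drop punctuation so "Hi there!" and "hi there" match.
        return " ".join(re.findall(r"[a-z0-9']+", text.lower()))

    def add(self, symbol: str, *expressions: str) -> None:
        for expr in expressions:
            self._to_symbol[self._normalize(expr)] = symbol
            self._to_expressions[symbol].append(expr)

    def parse(self, text: str) -> Optional[str]:
        """Expression -> single concept symbol, or None if unknown."""
        return self._to_symbol.get(self._normalize(text))

    def generate(self, symbol: str) -> List[str]:
        """Concept symbol -> every stored expression of its meaning."""
        return list(self._to_expressions.get(symbol, []))

lexicon = ConceptLexicon()
lexicon.add("GREETING", "Hello", "Hi there", "Good morning")
lexicon.add("LIGHTS_ON", "Turn on the lights", "Lights on, please")

# Hypothetical response actions keyed by concept symbol.
actions: Dict[str, Callable[[], str]] = {"LIGHTS_ON": lambda: "switching lights on"}
```

Exact lookup after normalization only handles phrasings that were registered; the abstract's data structures are what would let a parser cover varied expressions it has not seen verbatim.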
Abstract:
A stopword detection component detects stopwords (also stop-phrases) in search queries input to keyword-based information retrieval systems. Potential stopwords are initially identified by comparing the terms in the search query to a list of known stopwords. Context data is then retrieved based on the search query and the identified stopwords. In one implementation, the context data includes documents retrieved from a document index. In another implementation, the context data includes categories relevant to the search query. Sets of retrieved context data are compared to one another to determine if they are substantially similar. If the sets of context data are substantially similar, this fact may be used to infer that the removal of the potential stopword(s) is not material to the search. If the sets of context data are not substantially similar, the potential stopword can be considered material to the search and should not be removed from the query.
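A minimal Python sketch of the comparison, assuming document-based context data (the first implementation mentioned), Jaccard overlap of retrieved document IDs as the similarity measure, and a hypothetical threshold of 0.8 for "substantially similar". The stopword list, the toy AND-query retriever over `DOCS`, and the function names are invented for illustration.

```python
from typing import Callable, List, Set

KNOWN_STOPWORDS = {"a", "the", "of", "in"}  # hypothetical list of known stopwords

DOCS = {  # toy document index
    "d1": "history of rome",
    "d2": "a history of ancient rome",
    "d3": "rome travel guide",
    "d4": "the who live",
    "d5": "the who greatest hits",
    "d6": "who wrote hamlet",
    "d7": "who invented radio",
}

def retrieve(terms: List[str]) -> Set[str]:
    """Toy keyword retrieval: IDs of documents containing every query term."""
    wanted = set(terms)
    return {doc_id for doc_id, text in DOCS.items() if wanted <= set(text.split())}

def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def removable_stopwords(query: List[str], retriever: Callable[[List[str]], Set[str]],
                        threshold: float = 0.8) -> List[str]:
    """Return the potential stopwords whose removal leaves the context nearly unchanged."""
    full_context = retriever(query)
    removable = []
    for term in query:
        if term.lower() not in KNOWN_STOPWORDS:
            continue  # not a potential stopword
        reduced_context = retriever([t for t in query if t != term])
        if jaccard(full_context, reduced_context) >= threshold:
            removable.append(term)  # similar context: removal is not material
    return removable
```

Dropping "of" from "history of rome" retrieves the same two documents, so it is removable; dropping "the" from "the who" doubles the result set (Jaccard 0.5), so "the" is material and stays in the query.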