Abstract:
A search query containing a search word entered by a user is received, and the received search queries are stored in order of reception. A preceding search query, whose reception order is earlier than that of the received search query, is extracted on the basis of a preset search query extraction condition. The preceding search word constituting the extracted preceding search query and the search word constituting the received search query are stored as a character string set. A character string set having a search word that is the same as or similar to the preceding search word is extracted in accordance with a preset character string set extraction start condition (S51), a character string set to be treated as related words is specified from the extracted character string sets on the basis of a preset registration condition (S53), and the specified character string set is registered as related words in a related-word database (S54).
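The flow described above can be illustrated with a minimal Python sketch. The abstract does not say what the extraction or registration conditions are, so the ones used here are assumptions: a preceding query qualifies if it came from the same user within `max_gap_seconds`, and a word pair is registered once it has been seen `min_count` times. The function name, the tuple layout, and both parameters are hypothetical.

```python
from collections import Counter

def related_words(queries, max_gap_seconds=60, min_count=2):
    """Return word pairs that qualify as related words.

    queries: list of (user_id, timestamp_seconds, search_word) tuples,
    already in reception order. All conditions below are assumptions,
    not the conditions used in the patent.
    """
    last_by_user = {}        # user_id -> (timestamp, search_word)
    pair_counts = Counter()  # (preceding_word, word) -> occurrences

    for user, ts, word in queries:
        prev = last_by_user.get(user)
        if prev is not None:
            prev_ts, prev_word = prev
            # Assumed extraction condition: same user, short time gap,
            # and the word actually changed.
            if ts - prev_ts <= max_gap_seconds and prev_word != word:
                pair_counts[(prev_word, word)] += 1
        last_by_user[user] = (ts, word)

    # Assumed registration condition: the pair was seen often enough.
    return {pair for pair, n in pair_counts.items() if n >= min_count}

log = [
    ("u1", 0, "jaguar"),
    ("u2", 5, "jaguar"),
    ("u1", 10, "jaguar car"),
    ("u2", 20, "jaguar car"),
    ("u3", 0, "apple"),
    ("u3", 500, "pear"),   # too late to count as a follow-up query
]
related = related_words(log)
```

In this toy log only the pair `("jaguar", "jaguar car")` meets both assumed conditions; a real system would write the result to the related-word database instead of returning a set.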
Abstract:
Systems and methods provide domain name suggestions based on user preferences and terms extracted from one or more information sources. Terms may be continuously extracted from information sources and used to generate domain name suggestions. Generated domain name suggestions may then be delivered to customers. The systems and methods may utilize customer preferences in providing the domain name suggestions, such as preferences as to information sources or topics of interest. The systems and methods may be self-learning, taking historical domain name registration information into account to improve the domain name suggestions.
Abstract:
A clustering-based approach to data standardization is provided. Certain embodiments take as input a plurality of addresses, identify one or more features of the addresses, cluster the addresses based on the one or more features, utilize the cluster(s) to provide a data-based context useful in identifying one or more synonyms for elements contained in the address(es), and standardize the address(es) to an acceptable format, with one or more synonyms and/or other elements being added to or taken away from the input address(es) as part of the standardization process.
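One way to picture how a cluster can supply context for finding synonyms is the small Python sketch below. It is not the method of the abstract, which leaves the features and clustering technique open. Here the assumed features are the first token (house number) and last token (postal code), and two tokens are proposed as synonyms when two same-length addresses in one cluster differ in exactly one position. The function name and the sample addresses are hypothetical.

```python
from collections import defaultdict

def candidate_synonyms(addresses):
    """Propose synonym pairs such as ("st", "street").

    Assumed features: first token (house number) and last token
    (postal code). Addresses sharing both land in the same cluster.
    """
    clusters = defaultdict(list)
    for addr in addresses:
        tokens = addr.lower().replace(",", " ").replace(".", "").split()
        if not tokens:
            continue
        clusters[(tokens[0], tokens[-1])].append(tokens)

    pairs = set()
    for group in clusters.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if len(a) != len(b):
                    continue
                diffs = [(x, y) for x, y in zip(a, b) if x != y]
                # Exactly one differing token: treat it as a synonym candidate.
                if len(diffs) == 1:
                    pairs.add(tuple(sorted(diffs[0])))
    return pairs

sample = [
    "12 Main St, 90210",
    "12 Main Street 90210",
    "12 Main St. 90210",
    "40 Oak Ave 10001",
]
synonyms = candidate_synonyms(sample)
```

A standardization step could then rewrite every variant in a pair to one preferred form; which form counts as acceptable is a policy choice outside this sketch.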
Abstract:
Natural language vocabulary generation and usage techniques are described. In one or more implementations, one or more search results are mined for a domain to determine a frequency at which words occur in the one or more search results, respectively. A set of the words is selected based on the determined frequency. A sense is assigned to each of the selected set of the words that identifies a part-of-speech for a respective word. A vocabulary is then generated that includes the selected set of the words and a respective said sense, the vocabulary configured for use in natural language processing associated with the domain.
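The frequency-based selection and sense assignment can be sketched in a few lines of Python. This is an illustration under stated assumptions: the threshold `min_count`, the function name, and the tiny `TOY_POS` lookup table are all hypothetical, and a real system would likely use a proper part-of-speech tagger rather than a hand-written dictionary.

```python
import re
from collections import Counter

# Hypothetical part-of-speech lookup, standing in for a real tagger.
TOY_POS = {"flight": "noun", "book": "verb", "cheap": "adjective"}

def build_vocabulary(search_results, min_count=2, pos_lookup=TOY_POS):
    """Map each frequent word to an assumed part-of-speech sense."""
    counts = Counter()
    for text in search_results:
        # Count lowercase alphabetic words in each search result.
        counts.update(re.findall(r"[a-z]+", text.lower()))

    vocabulary = {}
    for word, n in counts.items():
        if n >= min_count:  # assumed selection rule: frequency threshold
            vocabulary[word] = pos_lookup.get(word, "unknown")
    return vocabulary

results = ["Book a cheap flight", "Cheap flight deals", "book flight now"]
vocab = build_vocabulary(results)
```

With these three toy results, only "book", "cheap", and "flight" occur at least twice, so the vocabulary contains those words with their looked-up senses.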
Abstract:
Embodiments are directed to refining hierarchies in object-oriented models. A method includes providing a business object model in the form of an object-oriented model having one or more members with multiple distinct verbalizations and identifying distinct verbalizations of a given business object model member. The method also includes reviewing existing rules of the business object model to produce mappings of the distinct verbalizations and any attributes or operations used in conjunction with the distinct verbalizations of members of the business object model and analysing the mappings to identify patterns of use of the distinct verbalizations. The method further includes categorising a distinct verbalization as a superclass or subclass.
Abstract:
A document-term matrix may be generated based on a corpus. A term representation matrix may be generated based on modifying a plurality of elements of the document-term matrix based on antonym information included in the corpus. Similarities may be determined based on a plurality of elements of the term representation matrix.
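The abstract does not specify how the matrix elements are modified, so the following Python sketch makes one assumption explicit: each document is a thesaurus-like entry listing synonyms and antonyms, and antonym occurrences receive a negative weight. Similarity is then taken as the cosine between term vectors. The function names and the sample entries are hypothetical.

```python
import math

def term_vectors(entries):
    """Build one vector per term, with one component per document.

    entries: list of (synonyms, antonyms) pairs of word lists.
    Assumed modification: synonyms add +1, antonyms add -1.
    """
    vocab = sorted({w for syn, ant in entries for w in list(syn) + list(ant)})
    vectors = {w: [0.0] * len(entries) for w in vocab}
    for d, (syn, ant) in enumerate(entries):
        for w in syn:
            vectors[w][d] += 1.0
        for w in ant:
            vectors[w][d] -= 1.0  # sign flip encodes the antonym relation
    return vectors

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

entries = [
    (["hot", "warm"], ["cold"]),
    (["cold", "chilly"], ["hot"]),
]
vecs = term_vectors(entries)
sim_hot_warm = cosine(vecs["hot"], vecs["warm"])
sim_hot_cold = cosine(vecs["hot"], vecs["cold"])
```

Under this assumed weighting, "hot" and "warm" get a positive similarity while "hot" and "cold" get a negative one, which is the kind of separation an unmodified document-term matrix would not give.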
Abstract:
The present invention is a method and system for enhancing the output of standard thesaurus databases. The user requires little knowledge of the meaning of a word for which he is seeking related words. The system requires at least one starter word, and it returns all synonyms, regardless of meaning, from multiple databases. The synonyms are then arranged in a two-dimensional array and sorted according to frequency. The user then scans the list, starting from the top, selects one or more entries from the sorted frequency array, and re-runs the search. After several cycles of running and selecting new entries, the related words having the highest relevance to the searcher will rise to the top of the frequency array. The end result is a group of related words having one or more meanings, and also having a relationship to a single concept being sought by the user.
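The run-and-select cycle can be sketched in Python as follows. This is a toy illustration, not the patented system: the two dictionaries stand in for thesaurus databases, the function name is hypothetical, and breaking frequency ties alphabetically is an assumption made only to keep the ordering deterministic.

```python
from collections import Counter

def rank_synonyms(starter_words, databases):
    """Return (word, frequency) rows, most frequent first.

    databases: list of dicts mapping a word to its list of synonyms.
    Every synonym of every starter word is counted, regardless of meaning.
    """
    counts = Counter()
    for db in databases:
        for word in starter_words:
            counts.update(db.get(word, []))
    # Two-column array: sort by descending frequency, then alphabetically.
    return sorted(counts.items(), key=lambda row: (-row[1], row[0]))

# Hypothetical stand-ins for two thesaurus databases.
db1 = {"bright": ["shiny", "smart", "vivid"], "smart": ["clever", "bright"]}
db2 = {"bright": ["smart", "luminous"], "smart": ["clever", "intelligent"]}

first_run = rank_synonyms(["bright"], [db1, db2])
# The user picks "smart" from the top of the list and re-runs.
second_run = rank_synonyms(["bright", "smart"], [db1, db2])
```

In the first run "smart" leads because both toy databases list it; after the user adds it as a starter word, "clever" rises to the top, which mirrors how repeated selection steers the list toward one concept.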
Abstract:
According to one embodiment, a speech translation apparatus includes a receiving unit, a first recognition unit, a second recognition unit, a first generation unit, a translation unit, a second generation unit, and a synthesis unit. The receiving unit is configured to receive speech in a first language and convert it to a speech signal. The first recognition unit is configured to perform speech recognition and generate a transcription. The second recognition unit is configured to recognize which emotion type is included in the speech and generate emotion identification information including the recognized emotion type(s). The first generation unit is configured to generate a filtered sentence. The translation unit is configured to translate the filtered sentence from the first language into a second language. The second generation unit is configured to generate an insertion sentence. The synthesis unit is configured to convert the filtered sentence and the insertion sentence into a speech signal.
Abstract:
A system for supervised automatic code generation and tuning for natural language interaction applications, comprising a build environment comprising a developer user interface, automated coding tools, automated testing tools, and automated optimization tools, and an analytics framework software module. Text samples are imported into the build environment and automated clustering is performed to assign them to a plurality of input groups, each input group comprising a plurality of semantically related inputs. Language recognition rules are generated by automated coding tools. Automated testing tools carry out automated testing of language recognition rules and generate recommendations for tuning language recognition rules. The analytics framework performs analysis of interaction log files to identify problems in a candidate natural language interaction application. Optimizations to the candidate natural language interaction application are carried out and an optimized natural language interaction application is deployed into production and stored in the solution data repository.
Abstract:
The invention enables creation of grammar networks that can regulate, control, and define the content and scope of human-machine interaction in natural language voice user interfaces (NLVUI). The invention enables phrase-based modeling of generic structures of verbal interaction to be used for the purpose of automating part of the design of such grammar networks. Most particularly, the invention enables such grammar networks to be used in providing a voice-controlled user interface to human readable text data that is also machine-readable (such as a Web page, a word processing document, a PDF document, or a spreadsheet).