Abstract:
A digitally readable identifier located proximate to a physical object communicates a personality identifier. The personality identifier is automatically or manually associated with document content. The personality identifier is automatically associated with document content using context information that identifies the place of a physical object and/or the time the personality identifier is communicated. Once the personality identifier is associated with document content, a meta-document server enriches the document content in accordance with a predefined thematic set of document services identified by the personality identifier.
Abstract:
A system includes a meta-document, i.e., a document including content information which has a set of document service requests associated with it. A document service is a process which uses a portion of the document content as a starting point to obtain other information pertaining to that content. A scheduler selects a document service request from the set, then initiates and manages communication with a service provider to satisfy the selected document service. Any results received from the selected document service are integrated into the document.
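The scheduler loop described in this abstract can be sketched as follows. This is a minimal illustration and not the patented implementation: the names `MetaDocument`, `ServiceRequest`, `run_scheduler`, and `word_count_service` are hypothetical, and the "service provider" is assumed to be a local function rather than a remote service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ServiceRequest:
    name: str
    # Hypothetical provider: receives document content, returns related information.
    provider: Callable[[str], str]
    done: bool = False


@dataclass
class MetaDocument:
    content: str
    requests: List[ServiceRequest] = field(default_factory=list)
    enrichments: List[str] = field(default_factory=list)


def run_scheduler(doc: MetaDocument) -> None:
    """Select each pending request, call its provider, and integrate the result."""
    for request in doc.requests:
        if request.done:
            continue
        result = request.provider(doc.content)
        if result:
            # Integration is assumed here to mean appending a labeled note.
            doc.enrichments.append(f"[{request.name}] {result}")
        request.done = True


def word_count_service(text: str) -> str:
    """Toy document service used only for this example."""
    return f"{len(text.split())} words"


doc = MetaDocument(
    "Meta-documents enrich themselves",
    [ServiceRequest("count", word_count_service)],
)
run_scheduler(doc)
print(doc.enrichments)
```

A real scheduler would presumably also handle provider failures, retries, and ordering between requests; none of that is specified in the abstract, so it is left out here.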
Abstract:
A method of generating an ideographic representation of a name given in a letter-based system begins with a determination of the language of origin. After determining the language of origin for the name, the name is segmented into a segmentation sequence in response to the determined language of origin. A candidate representation is generated for the segmentation sequence based on ideographic representations of the segments. A corpus is used to validate the candidate representation. The corpus can be either a monolingual corpus or a multilingual corpus. The method can also include an additional validation step using either a monolingual corpus or a multilingual corpus, whichever was not used in the first validation step. Because of the rules governing abstracts, this abstract should not be used to construe the claims.
Abstract:
The present invention relates to an automatic translation method. When a sentence in a source language is translated into a sentence in a target language, the method comprises: a step (1) of extracting the set of sentence portions of the target language from a textual database that correspond to a total or partial translation of the source sentence to be translated; a step (2) of determining all the assemblies of these target sentence portions that overlap the source sentence; a step (3) of choosing the best assemblies according to a criterion of maximum overlap between the target sentence portions assembled in the preceding step and according to a criterion of minimizing the number of assembled elements; a step (4) of determining the target sentence by choosing the best assembly according to coherence criteria. The invention is notably applicable to the translation of texts in a rare language. More generally, it applies to translation with no previously established bilingual texts.
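Steps (2) and (3) of this abstract can be illustrated with a small brute-force sketch. Several assumptions are made that the abstract does not state: each target portion is represented by the span of source tokens it translates, an assembly must cover the whole source sentence, and the two criteria are combined by preferring fewer elements first and breaking ties by larger overlap. The names `covers`, `total_overlap`, and `best_assemblies` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

# (start, end, target_text): source-token span [start, end) and its translation.
Fragment = Tuple[int, int, str]


def covers(chain: Tuple[Fragment, ...], n: int) -> bool:
    """True if fragments (sorted by start) cover source tokens 0..n without gaps."""
    reach = 0
    for start, end, _ in chain:
        if start > reach:
            return False
        reach = max(reach, end)
    return reach >= n


def total_overlap(chain: Tuple[Fragment, ...]) -> int:
    """Count source tokens shared by consecutive fragments."""
    return sum(
        max(0, min(a[1], b[1]) - b[0]) for a, b in zip(chain, chain[1:])
    )


def best_assemblies(fragments: List[Fragment], n: int) -> List[Tuple[Fragment, ...]]:
    """Enumerate covering assemblies; keep the fewest-element, most-overlapping ones."""
    ordered = sorted(fragments)
    candidates = [
        combo
        for k in range(1, len(ordered) + 1)
        for combo in combinations(ordered, k)
        if covers(combo, n)
    ]
    if not candidates:
        return []

    def key(chain):
        return (len(chain), -total_overlap(chain))

    best = min(key(c) for c in candidates)
    return [c for c in candidates if key(c) == best]


# Five source tokens; the target strings are placeholders, not real translations.
fragments = [(0, 2, "C"), (0, 3, "A"), (2, 5, "B"), (3, 5, "F"), (4, 5, "E")]
print(best_assemblies(fragments, 5))
```

The enumeration is exponential in the number of fragments, so it only suits toy inputs. Step (4), the coherence-based choice among the surviving assemblies, is not sketched because the abstract does not define the coherence criteria.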
Abstract:
A system and method are provided for translating an input text from a natural source language to a natural target language. The system stores a database that contains a plurality of pairs of text fragments with each pair including a text fragment in the source language and a corresponding text fragment in the target language. Each text fragment contains at least one word phrase and represents a primary grammatical unit such as a sentence or a clause. For translating a word phrase, the database is queried using a phrase index of the database, where the phrase index indexes text fragments by word phrases. Word phrases are noun phrases or word phrases. Alternatively, word phrases are predicates involving at least one verb and one noun or adjective used as a noun. The system further comprises a phrase extractor for extracting a word phrase from a text fragment of an input text.
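The phrase index described in this abstract can be sketched as an inverted index from word phrases to fragment pairs. This is an assumption-laden illustration: the function names are hypothetical, the example sentence pairs are invented, and the phrase extractor is replaced by a simple word-bigram stand-in rather than a real noun-phrase or predicate extractor.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source-language fragment, target-language fragment)


def extract_bigrams(text: str) -> Set[str]:
    """Stand-in phrase extractor: every pair of adjacent words, lowercased."""
    words = text.lower().split()
    return {" ".join(words[i:i + 2]) for i in range(len(words) - 1)}


def build_phrase_index(
    pairs: List[Pair], extract: Callable[[str], Set[str]]
) -> Dict[str, List[int]]:
    """Map each extracted phrase to the ids of the pairs whose source contains it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pair_id, (source, _target) in enumerate(pairs):
        for phrase in extract(source):
            index[phrase].append(pair_id)
    return index


def lookup(index: Dict[str, List[int]], pairs: List[Pair], phrase: str) -> List[Pair]:
    """Return all stored fragment pairs indexed under the given phrase."""
    return [pairs[i] for i in index.get(phrase.lower(), [])]


pairs = [
    ("the contract was signed", "der Vertrag wurde unterzeichnet"),
    ("the contract expires in May", "der Vertrag endet im Mai"),
]
index = build_phrase_index(pairs, extract_bigrams)
print(lookup(index, pairs, "the contract"))
print(lookup(index, pairs, "contract was"))
```

Querying by phrase returns whole sentence or clause pairs, which matches the abstract's point that each stored fragment is a primary grammatical unit while the index key is a smaller word phrase.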
Abstract:
Text is summarized using part-of-speech (POS) data indicating parts of speech for tokens in the text. The POS data can be obtained using input text data defining the text, such as by POS tagging. The POS data can be used to obtain group data indicating groups of tokens of the text, such as verb groups and noun groups. The group data can also indicate, within each group, any tokens that meet a POS-based removal criterion. The group data can be used to obtain summarized text data by removing tokens that meet the removal criterion. The original text may be obtained via scanner or video camera from a user's document, and may be recognized to obtain input text data. The summarized text may be output as text or as audio pronunciation using a speech synthesizer.
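The grouping and removal steps can be sketched as below. The sketch assumes the tokens are already POS-tagged (here by hand), and both the tag-to-group mapping `GROUP_OF` and the removal criterion `REMOVABLE` are illustrative choices, not taken from the abstract.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed mapping from POS tag to group type (noun group or verb group).
GROUP_OF = {
    "DET": "noun", "ADJ": "noun", "NOUN": "noun",
    "AUX": "verb", "ADV": "verb", "VERB": "verb",
}
# Assumed POS-based removal criterion: drop determiners, adjectives, adverbs.
REMOVABLE = {"DET", "ADJ", "ADV"}

Tagged = Tuple[str, str]  # (token, POS tag)


def group_tokens(tagged: List[Tagged]):
    """Split tagged tokens into consecutive groups and flag removable tokens."""
    groups = []
    for kind, members in groupby(tagged, key=lambda t: GROUP_OF.get(t[1], "other")):
        groups.append((kind, [(tok, tag, tag in REMOVABLE) for tok, tag in members]))
    return groups


def summarize(tagged: List[Tagged]) -> str:
    """Keep only the tokens that do not meet the removal criterion."""
    kept = [
        tok
        for _kind, members in group_tokens(tagged)
        for tok, _tag, removable in members
        if not removable
    ]
    return " ".join(kept)


tagged = [
    ("The", "DET"), ("old", "ADJ"), ("printer", "NOUN"),
    ("has", "AUX"), ("quickly", "ADV"), ("printed", "VERB"),
    ("the", "DET"), ("long", "ADJ"), ("report", "NOUN"),
]
print(summarize(tagged))  # printer has printed report
```

In practice the tags would come from a POS tagger applied to the recognized input text, and the removal criterion could differ per group type.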
Abstract:
A processor-implemented method of identifying the text genre of a machine-readable, untagged text. The processor-implemented method begins by generating a cue vector from the text, which represents occurrences in the text of a first set of nonstructural, surface cues, which are easily computable. Afterward, the processor determines whether the text is an instance of a first text genre using the cue vector and a weighting vector associated with the first text genre.
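The cue-vector decision can be sketched as follows. The abstract does not say how the cue vector and weighting vector are combined, so this example assumes a linear score compared against zero. The three cues, the weights in `EDITORIAL_WEIGHTS`, and the "editorial" genre label are all invented for illustration.

```python
import re
from typing import List

FIRST_PERSON = {"i", "we", "my", "our"}


def cue_vector(text: str) -> List[float]:
    """Compute three simple surface cues, each normalized by word count."""
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    n = max(len(tokens), 1)
    return [
        text.count("?") / n,                      # question marks
        sum(t in FIRST_PERSON for t in tokens) / n,  # first-person pronouns
        sum(ch.isdigit() for ch in text) / n,     # digit characters
    ]


def is_genre(cues: List[float], weights: List[float], bias: float = 0.0) -> bool:
    """Assumed decision rule: weighted sum of cues above zero means a match."""
    score = bias + sum(c * w for c, w in zip(cues, weights))
    return score > 0


# Illustrative weights only; real weights would be learned from labeled texts.
EDITORIAL_WEIGHTS = [2.0, 5.0, -3.0]

opinion = "I think we should ask: why do our taxes rise?"
report = "Revenue rose 12 percent to 340 million in 2023."
print(is_genre(cue_vector(opinion), EDITORIAL_WEIGHTS))
print(is_genre(cue_vector(report), EDITORIAL_WEIGHTS))
```

Because the cues need no parsing or tagging, the vector can be computed in a single pass over the raw text, which is the property the abstract highlights.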