Abstract:
Disclosed are a method and a device for expanding data of a bilingual corpus. The method includes: searching, in a source language-pivot language corpus, for at least one first pivot language phrase semantically matching a first source language phrase; searching, in the source language-pivot language corpus, for at least one second source language phrase semantically matching each of the first pivot language phrases, the second source language phrases forming a source language phrase set; searching, in a pivot language-target language corpus, for at least one first target language phrase semantically matching each of the first pivot language phrases, the first target language phrases forming a target language phrase set; combining the second source language phrases in the source language phrase set with the first target language phrases in the target language phrase set, so as to form at least one phrase pair in which a source language phrase and a target language phrase semantically match; and storing the at least one formed phrase pair into a source language-target language corpus. The data in the bilingual corpus is thereby expanded, which addresses the problem of data sparseness in the bilingual corpus.
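The steps of the method can be sketched in a few lines of Python. This is only an illustrative sketch under simplifying assumptions, not the disclosed implementation: each corpus is assumed to be a collection of aligned phrase pairs, "semantically matching" is approximated by two phrases appearing together in such a pair, and the function name `expand_corpus` and the sample phrases are hypothetical.

```python
from collections import defaultdict
from itertools import product


def expand_corpus(first_source, src_pivot_pairs, pivot_tgt_pairs, src_tgt_corpus):
    """Add source-target phrase pairs reachable from first_source via a pivot language.

    Assumption: two phrases "semantically match" when they occur together
    as an aligned pair in the corresponding corpus.
    """
    # Index the source-pivot corpus in both directions.
    src_to_pivot = defaultdict(set)
    pivot_to_src = defaultdict(set)
    for src, pivot in src_pivot_pairs:
        src_to_pivot[src].add(pivot)
        pivot_to_src[pivot].add(src)

    # Index the pivot-target corpus.
    pivot_to_tgt = defaultdict(set)
    for pivot, tgt in pivot_tgt_pairs:
        pivot_to_tgt[pivot].add(tgt)

    # Step 1: first pivot phrases matching the first source phrase.
    first_pivots = src_to_pivot.get(first_source, set())

    # Steps 2 and 3: collect the source phrase set and the target phrase set.
    source_set, target_set = set(), set()
    for pivot in first_pivots:
        source_set |= pivot_to_src.get(pivot, set())
        target_set |= pivot_to_tgt.get(pivot, set())

    # Step 4: combine the two sets into source-target phrase pairs.
    new_pairs = set(product(source_set, target_set))

    # Step 5: store the pairs in the source-target corpus.
    src_tgt_corpus |= new_pairs
    return new_pairs


# Hypothetical data: Spanish (source), English (pivot), French (target).
src_pivot = [("muy bien", "very good"), ("excelente", "very good"),
             ("muy bien", "great"), ("genial", "great"), ("hola", "hello")]
pivot_tgt = [("very good", "excellent"), ("great", "super"), ("hello", "bonjour")]
corpus = set()
added = expand_corpus("muy bien", src_pivot, pivot_tgt, corpus)
```

Because the sketch combines the two sets as a whole, as the abstract describes, it also produces pairs such as ("excelente", "super") whose members do not share a pivot phrase directly; a practical system would probably score or filter such pairs.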