-
Publication No.: US10185713B1
Publication date: 2019-01-22
Application No.: US14867932
Filing date: 2015-09-28
Applicant: Amazon Technologies, Inc.
Inventor: Michael Denkowski, Alon Lavie, Gregory Alan Hanneman, Austin Matthews, Matthew Ryan Fiorillo, Robert Thomas Olszewski, Christopher James Dyer, William Joseph Kaper, Alexandre Alexandrovich Klementiev, Gavin R. Jewell
Abstract: Technologies are disclosed herein for statistical machine translation. In particular, the disclosed technologies include extensions to conventional machine translation pipelines: the use of multiple domain-specific and non-domain-specific dynamic language translation models and language models; cluster-based language models; and large-scale discriminative training. Incremental update technologies are also disclosed for use in updating a machine translation system in four areas: word alignment; translation modeling; language modeling; and parameter estimation. A mechanism is also disclosed for training and utilizing a runtime machine translation quality classifier for estimating the quality of machine translations without the benefit of reference translations. The runtime machine translation quality classifier is generated in a manner to offset imbalances in the number of training instances in various classes, and to assign a greater penalty to the misclassification of lower-quality translations as higher-quality translations than to misclassification of higher-quality translations as lower-quality translations.
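Illustrative note: the last sentence of the abstract describes two ideas, offsetting class imbalance and penalizing "low quality predicted as high quality" more than the reverse. The following is only a minimal sketch of how such a scheme could look, not the patented method. The function names, the integer quality classes (higher means better), and the penalty values 3.0 and 1.0 are all assumptions made for illustration.

```python
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights so rare quality classes are not drowned out.

    Assumption: weight = total / (number_of_classes * class_count).
    """
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


def misclassification_cost(true_q, pred_q, over_penalty=3.0, under_penalty=1.0):
    """Asymmetric cost over ordered quality classes (higher int = better).

    Predicting a HIGHER quality than the truth (pred_q > true_q) is charged
    over_penalty per class of distance; predicting lower is charged
    under_penalty. Both penalty values are hypothetical.
    """
    if pred_q > true_q:
        return over_penalty * (pred_q - true_q)
    return under_penalty * (true_q - pred_q)


def min_risk_class(probs):
    """Pick the class with the lowest expected cost.

    probs maps each quality class to the classifier's estimated probability.
    """
    def expected_cost(pred):
        return sum(p * misclassification_cost(true, pred)
                   for true, p in probs.items())

    return min(probs, key=expected_cost)


if __name__ == "__main__":
    # Three low-quality training instances (0) and one high-quality one (1).
    print(class_weights([0, 0, 0, 1]))
    # Class 1 is more probable, but the asymmetric cost still selects class 0.
    print(min_risk_class({0: 0.4, 1: 0.6}))
```

With these assumed penalties, a translation is labeled high quality only when the classifier is confident enough to outweigh the larger cost of overestimating quality.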
-
Publication No.: US10437933B1
Publication date: 2019-10-08
Application No.: US15238101
Filing date: 2016-08-16
Applicant: Amazon Technologies, Inc.
Inventor: Ann Clifton, Michael Denkowski, Alon Lavie
IPC: G06F17/28
Abstract: A machine translation system capable of clustering training data and performing dynamic domain adaptation is disclosed. An unsupervised domain clustering process is utilized to identify domains in general training data that can include in-domain training data and out-of-domain training data. Segments in the general training data are then assigned to the domains in order to create domain-specific training data. The domain-specific training data is then utilized to create domain-specific language models, domain-specific translation models, and domain-specific model weights for the domains. An input segment to be translated can be assigned to a domain at translation time. The domain-specific model weights for the assigned domain can be utilized to translate the input segment.
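Illustrative note: the translation-time step described in this abstract (assign the input segment to a domain, then use that domain's model weights) can be sketched roughly as below. This is a toy under stated assumptions, not the patented process: it assumes the unsupervised clustering has already produced per-domain segment sets, represents each domain as a bag-of-words centroid, and assigns by cosine similarity. The domain names, the weight keys `lm` and `tm`, and all function names are hypothetical.

```python
import math
from collections import Counter


def bag(segment):
    """Lowercased whitespace-token counts for one segment."""
    return Counter(segment.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count bags (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(domain_segments):
    """Sum the token counts of every segment already assigned to each domain."""
    centroids = {}
    for domain, segments in domain_segments.items():
        centroid = Counter()
        for seg in segments:
            centroid.update(bag(seg))
        centroids[domain] = centroid
    return centroids


def assign_domain(segment, centroids):
    """Assign an input segment to the most similar domain centroid."""
    seg_bag = bag(segment)
    return max(centroids, key=lambda d: cosine(seg_bag, centroids[d]))


if __name__ == "__main__":
    # Hypothetical output of an earlier clustering step.
    domain_segments = {
        "electronics": ["usb cable for phone charger", "wireless phone charger"],
        "books": ["paperback novel by author", "hardcover history book"],
    }
    # Hypothetical domain-specific model weights (language model vs. translation model).
    domain_weights = {
        "electronics": {"lm": 0.6, "tm": 0.4},
        "books": {"lm": 0.5, "tm": 0.5},
    }
    centroids = build_centroids(domain_segments)
    domain = assign_domain("fast phone charger", centroids)
    print(domain, domain_weights[domain])
```

A decoder would then score translation hypotheses using the weights looked up for the assigned domain; that decoding step is outside the scope of this sketch.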
-
Publication No.: US10268684B1
Publication date: 2019-04-23
Application No.: US14868166
Filing date: 2015-09-28
Applicant: Amazon Technologies, Inc.
Inventor: Michael Denkowski, Alon Lavie, Gregory Alan Hanneman, Matthew Ryan Fiorillo, Laura Josephine Kieras, Robert Thomas Olszewski, William Joseph Kaper, Alexandre Alexandrovich Klementiev, Gavin Richard Jewell
Abstract: Technologies are disclosed herein for statistical machine translation. In particular, the disclosed technologies include extensions to conventional machine translation pipelines: the use of multiple domain-specific and non-domain-specific dynamic language translation models and language models; cluster-based language models; and large-scale discriminative training. Incremental update technologies are also disclosed for use in updating a machine translation system in four areas: word alignment; translation modeling; language modeling; and parameter estimation. A mechanism is also disclosed for training and utilizing a runtime machine translation quality classifier for estimating the quality of machine translations without the benefit of reference translations. The runtime machine translation quality classifier is generated in a manner to offset imbalances in the number of training instances in various classes, and to assign a greater penalty to the misclassification of lower-quality translations as higher-quality translations than to misclassification of higher-quality translations as lower-quality translations.
-
Publication No.: US09959271B1
Publication date: 2018-05-01
Application No.: US14868083
Filing date: 2015-09-28
Applicant: Amazon Technologies, Inc.
Inventor: Kartik Goyal, Alon Lavie, Michael Denkowski, Gregory Alan Hanneman, Matthew Ryan Fiorillo, Robert Thomas Olszewski, Ehud Hershkovich, William Joseph Kaper, Alexandre Alexandrovich Klementiev, Gavin R. Jewell
CPC classification number: G06F17/2818, G06F17/2854
Abstract: Technologies are disclosed herein for statistical machine translation. In particular, the disclosed technologies include extensions to conventional machine translation pipelines: the use of multiple domain-specific and non-domain-specific dynamic language translation models and language models; cluster-based language models; and large-scale discriminative training. Incremental update technologies are also disclosed for use in updating a machine translation system in four areas: word alignment; translation modeling; language modeling; and parameter estimation. A mechanism is also disclosed for training and utilizing a runtime machine translation quality classifier for estimating the quality of machine translations without the benefit of reference translations. The runtime machine translation quality classifier is generated in a manner to offset imbalances in the number of training instances in various classes, and to assign a greater penalty to the misclassification of lower-quality translations as higher-quality translations than to misclassification of higher-quality translations as lower-quality translations.
-