Unsupervised hypernym induction machine learning
摘要:
Training a machine learning model such as a neural network, which can automatically extract a hypernym from unstructured data, is disclosed. A preliminary candidate list of hyponym-hypernym pairs can be parsed from the corpus. A preliminary super-term—sub-term glossary can be generated from the corpus, the preliminary super-term—sub-term glossary containing one or more super-term—sub-term pairs. A super-term—sub-term pair can be filtered from the preliminary super-term—sub-term glossary, responsive to detecting that the super-term—sub-term pair is not a candidate for hyponym-hypernym pair, to generate a final super-term—sub-term glossary. The preliminary candidate list of hyponym-hypernym pairs and the final super-term—sub-term glossary can be combined to generate a final list of hyponym-hypernym pairs. An artificial neural network can be trained using the final list of hyponym-hypernym pairs as a training data set, the artificial neural network trained to identify a hypernym given new text data.
公开/授权文献
信息查询
0/0