Publication (Announcement) No.: US11847406B1
Publication (Announcement) Date: 2023-12-19
Application No.: US17217807
Application Date: 2021-03-30
Applicant: Amazon Technologies, Inc.
Inventor: Sunil Mallya Kasaragod, Yahor Pushkin, Saman Zarandioon, Graham Vintcent Horwood, Miguel Ballesteros Martinez, Yogarshi Paritosh Vyas, Yinxiao Zhang, Diego Marcheggiani, Yaser Al-Onaizan, Xuan Zhu, Liutong Zhou, Yusheng Xie, Aruni Roy Chowdhury, Bo Pang
IPC: G06F17/00, G06F40/143, G06F40/169, G06N20/00, G06F40/154, G06F40/103, G06F40/284
CPC classification number: G06F40/143, G06F40/103, G06F40/154, G06F40/169, G06F40/284, G06N20/00
Abstract: Techniques for performing natural language processing (NLP) on semi-structured data are described. An exemplary method includes receiving a semi-structured document to perform NLP on using a trained NLP model; converting the semi-structured document into a secondary format, wherein the secondary format includes spatial information for tokens of the semi-structured document; flattening the converted, secondary formatted semi-structured document into a Unicode Transformation Format text file; performing NLP on the Unicode Transformation Format text file using the trained NLP model; and providing a result of the NLP to a requester.
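The steps named in the abstract (convert to a format carrying spatial information, flatten to a Unicode Transformation Format text file, run the model, return the result) can be illustrated with a minimal Python sketch. This is not the patented implementation. The `Token` class, the `flatten_tokens` and `run_nlp` helpers, the `line_tolerance` parameter, and the reading-order heuristic (group tokens into lines by vertical position, then order left to right) are all assumptions made for illustration, and the "model" is a stand-in callable rather than a trained NLP model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    """Hypothetical secondary-format token: text plus spatial information."""
    text: str
    page: int
    x: float  # left edge of the token on the page
    y: float  # top edge of the token on the page


def flatten_tokens(tokens: List[Token], line_tolerance: float = 2.0) -> bytes:
    """Flatten spatially annotated tokens into UTF-8 encoded text.

    Assumed heuristic: tokens whose y coordinates lie within
    `line_tolerance` of the first token of a line share that line.
    """
    ordered = sorted(tokens, key=lambda t: (t.page, t.y, t.x))
    lines: List[List[Token]] = []
    current: List[Token] = []
    for tok in ordered:
        starts_new_line = current and (
            tok.page != current[0].page
            or abs(tok.y - current[0].y) > line_tolerance
        )
        if starts_new_line:
            lines.append(current)
            current = []
        current.append(tok)
    if current:
        lines.append(current)
    # Within each line, read left to right.
    text = "\n".join(
        " ".join(t.text for t in sorted(line, key=lambda t: t.x))
        for line in lines
    )
    return text.encode("utf-8")


def run_nlp(flat: bytes, model: Callable[[str], object]) -> object:
    """Decode the flattened file and hand it to a (stand-in) model."""
    return model(flat.decode("utf-8"))


# Example: three tokens from a hypothetical invoice page.
tokens = [
    Token("Total:", page=0, x=10.0, y=100.0),
    Token("$42", page=0, x=80.0, y=101.0),
    Token("Invoice", page=0, x=10.0, y=20.0),
]
flat = flatten_tokens(tokens)
result = run_nlp(flat, model=lambda text: text.split())
```

Here the stand-in model only splits on whitespace; in practice the callable would wrap whatever trained NLP model the requester selected, and the flattening rule would depend on the layout of the source documents.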