Document content analysis based on topic modeling

    Publication No.: US10558657B1

    Publication date: 2020-02-11

    Application No.: US15269458

    Filing date: 2016-09-19

    Abstract: A mechanism for progressive topic modeling is disclosed to facilitate document content analysis. Input documents can be sorted and divided into multiple groups. Topic modeling is performed for each group, where the topic modeling for one group is based on the generated topic model from a previous group, if available. The vocabulary used in the topic modeling process can also be updated for each group of documents. The generated topics can be presented in a user interface to facilitate a user in analyzing the documents. The topic modeling mechanism can also be utilized to enhance a document search experience by generating topics from documents contained in search results and presenting topic words to a user as suggested search terms.
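
The abstract does not name a specific topic model, so the following is only a minimal sketch of the progressive idea, assuming a collapsed Gibbs sampler for LDA in which the topic-word counts from the previous group seed the model for the next group. The function names, the `(date, tokens)` document shape, and all hyperparameters are illustrative assumptions, not taken from the patent.

```python
import random
from collections import defaultdict


def fit_group(docs, prior, num_topics, iters=30, alpha=0.1, beta=0.01, seed=0):
    """Fit one group of tokenized docs, starting from the previous group's counts."""
    rng = random.Random(seed)
    # Topic-word counts start from the earlier model (empty dicts for the first group).
    topic_word = [defaultdict(float, prior[k]) for k in range(num_topics)]
    # The vocabulary is updated per group: earlier words plus words new to this group.
    vocab = {w for doc in docs for w in doc} | {w for tw in topic_word for w in tw}
    topic_total = [sum(tw.values()) for tw in topic_word]
    doc_topic = [[0] * num_topics for _ in docs]
    assign = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for w, z in zip(doc, assign[d]):
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample the topic of this token.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha) * (topic_word[k][w] + beta)
                    / (topic_total[k] + beta * len(vocab))
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return [dict(tw) for tw in topic_word]


def progressive_topics(dated_docs, group_size, num_topics):
    """Sort docs by date, split them into groups, and model each group in turn."""
    ordered = [tokens for _, tokens in sorted(dated_docs, key=lambda item: item[0])]
    model = [{} for _ in range(num_topics)]  # no previous model for the first group
    for start in range(0, len(ordered), group_size):
        model = fit_group(ordered[start:start + group_size], model, num_topics)
    return model
```

Each call to `fit_group` returns accumulated topic-word counts, so the highest-count words per topic could be shown in a user interface or offered as suggested search terms, as the abstract describes.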

    Extracting keywords from a document

    Publication No.: US10796094B1

    Publication date: 2020-10-06

    Application No.: US16534407

    Filing date: 2019-08-07

    Abstract: An unsupervised keyword extraction process is disclosed. A single input document can be analyzed to identify multiple candidate keywords by utilizing splitting terms. A keyword score is calculated for each of the candidate keywords. The keyword score for a particular candidate keyword is determined based on the length of the candidate keywords that contain the candidate keyword and the frequency of the words appearing in the candidate keywords. One or more keywords having the highest keyword scores are selected as the extracted keywords. The extracted keywords can be used in applications, such as refining search results, providing suggested search terms, or improving the match rate of a network page at a search engine.
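
The scoring described above resembles RAKE-style keyword extraction. The sketch below assumes that reading: splitting terms (a small hypothetical stop-word list plus punctuation) cut the text into candidate keywords, each word is scored as the total length of the candidates containing it divided by its frequency, and a candidate's score is the sum of its word scores. The exact formula in the patent may differ.

```python
import re
from collections import defaultdict

# Hypothetical splitting terms; a real deployment would use a fuller list.
SPLITTING_TERMS = {
    "a", "an", "the", "of", "and", "or", "is", "are", "for",
    "to", "in", "on", "with", "by", "at", "be", "can", "as",
}


def candidate_keywords(text, splitting_terms=SPLITTING_TERMS):
    """Split text at punctuation and splitting terms into candidate word tuples."""
    candidates = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in splitting_terms:
                if current:
                    candidates.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(tuple(current))
    return candidates


def extract_keywords(text, top_k=3):
    """Return the top_k (keyword, score) pairs, highest score first."""
    candidates = candidate_keywords(text)
    freq = defaultdict(int)    # how often each word appears in candidates
    degree = defaultdict(int)  # summed length of the candidates containing the word
    for cand in candidates:
        for word in cand:
            freq[word] += 1
            degree[word] += len(cand)
    scores = {
        " ".join(cand): sum(degree[w] / freq[w] for w in cand)
        for cand in set(candidates)
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

For the text "Topic modeling of documents. Progressive topic modeling is fast.", this sketch ranks "progressive topic modeling" first because its words occur in longer candidates.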

    Document content analysis based on topic modeling

    Publication No.: US10255283B1

    Publication date: 2019-04-09

    Application No.: US15269189

    Filing date: 2016-09-19

    Abstract: A mechanism for progressive topic modeling is disclosed to facilitate document content analysis. Input documents can be sorted and divided into multiple groups. Topic modeling is performed for each group, where the topic modeling for one group is based on the generated topic model from a previous group, if available. The vocabulary used in the topic modeling process can also be updated for each group of documents. The generated topics can be presented in a user interface to facilitate a user in analyzing the documents. The topic modeling mechanism can also be utilized to enhance a document search experience by generating topics from documents contained in search results and presenting topic words to a user as suggested search terms.

    Generating a set of representative items using a clustering-selection strategy

    Publication No.: US10114885B1

    Publication date: 2018-10-30

    Application No.: US15162376

    Filing date: 2016-05-23

    Inventor: Weiwei Cheng

    Abstract: Systems and methods are directed to a computing device for selecting a set of representative items from a set of items using a clustering-selection strategy. The computing device may determine a first set of characteristics for each item in the set. The computing device may then include each item into one of a number of clusters based on the first set of characteristics of the item. For each cluster of items, the computing device may determine a utility value for each item in the cluster based on a second set of characteristics distinct from the first set of characteristics. The computing device may select the item from each cluster having the highest utility value within the cluster. The selected items may include a number of items that is desired and may substantially represent the diverse characteristics of the set of items.
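
A minimal sketch of the clustering-selection idea follows. It assumes the clusters are defined by pre-computed centroids and that each item is assigned to its nearest centroid; the abstract does not specify the clustering method. The `features` and `utility` callables, the review fields, and the sample data are all hypothetical.

```python
from collections import defaultdict


def select_representatives(items, centroids, features, utility):
    """Assign each item to its nearest centroid, then keep the best item per cluster."""
    clusters = defaultdict(list)
    for item in items:
        vec = features(item)  # first set of characteristics, used only for clustering
        nearest = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])),
        )
        clusters[nearest].append(item)
    # The utility uses a second, distinct set of characteristics.
    return [max(members, key=utility) for _, members in sorted(clusters.items())]


# Illustrative data: cluster reviews by star rating, rank by helpful votes.
reviews = [
    {"id": "r1", "rating": 5, "votes": 3},
    {"id": "r2", "rating": 4, "votes": 10},
    {"id": "r3", "rating": 1, "votes": 2},
    {"id": "r4", "rating": 2, "votes": 7},
]
picked = select_representatives(
    reviews,
    centroids=[(1.0,), (5.0,)],
    features=lambda r: (r["rating"],),
    utility=lambda r: r["votes"],
)
```

Here one review is picked from the low-rating cluster and one from the high-rating cluster, so the result covers both ends of the rating range.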

    Artificial intelligence system for automated generation of realistic question and answer pairs

    Publication No.: US10929392B1

    Publication date: 2021-02-23

    Application No.: US16193951

    Filing date: 2018-11-16

    Inventor: Weiwei Cheng

    Abstract: Generally described, one or more aspects of the present application correspond to machine learning techniques for generating realistic question-answer (QA) pairs for populating an initial community ask feature of electronic store item detail pages. The machine learning model can use a shared encoder to generate an embedding of a seed sentence from existing description of an item, and then pass that embedding to a question decoder to generate a question. The embedding of the seed sentence can be combined with a state representation of the question and provided to an answer decoder, which can generate an answer to the generated question. This can help overcome the cold start problem, where customers are less likely to ask questions about items that have no existing QA set. This can also help surface relevant information about items in a concise QA format that is easy for customers to find and read.

    Extracting keywords from a document

    Publication No.: US10387568B1

    Publication date: 2019-08-20

    Application No.: US15269539

    Filing date: 2016-09-19

    Abstract: An unsupervised keyword extraction process is disclosed. A single input document can be analyzed to identify multiple candidate keywords by utilizing splitting terms. A keyword score is calculated for each of the candidate keywords. The keyword score for a particular candidate keyword is determined based on the length of the candidate keywords that contain the candidate keyword and the frequency of the words appearing in the candidate keywords. One or more keywords having the highest keyword scores are selected as the extracted keywords. The extracted keywords can be used in applications, such as refining search results, providing suggested search terms, or improving the match rate of a network page at a search engine.

    Generating a set of representative items using a maximum-set-coverage selection strategy

    Publication No.: US10248712B1

    Publication date: 2019-04-02

    Application No.: US15162365

    Filing date: 2016-05-23

    Inventor: Weiwei Cheng

    Abstract: Systems and methods are directed to a computing device for selecting a set of representative items from a set of items using a maximum-set-coverage selection strategy. The computing device may derive an associated collection of elements from the set of items. The computing device may determine a marginal utility value for an item based on the elements related to that item. The computing device may similarly determine the marginal utility value for each item in the set and may select the item in the set having the highest marginal utility value. The computing device may remove elements related to the selected item from the associated collection of elements, determine updated marginal utility values for the items based on the remaining elements, and select another item having the highest updated marginal utility value. The computing device may repeat the above process until a number of items that is desired has been selected.
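
The loop described above matches the standard greedy heuristic for maximum set coverage. The sketch below assumes that the marginal utility of an item is simply the number of not-yet-covered elements it contains; the patent may weight elements differently. The item names and element sets are made up for illustration.

```python
def greedy_max_coverage(item_elements, k):
    """Pick up to k items, each time taking the one that covers the most remaining elements."""
    remaining = set().union(*item_elements.values())  # associated collection of elements
    candidates = dict(item_elements)
    selected = []
    while len(selected) < k and candidates:
        # Marginal utility = count of still-uncovered elements; ties go to the
        # alphabetically first item because max() keeps the first maximum.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & remaining))
        selected.append(best)
        # Remove the selected item's elements so later utilities are updated.
        remaining -= candidates.pop(best)
    return selected


# Illustrative data: each review mentions a set of product aspects.
aspects = {
    "r1": {"battery", "screen"},
    "r2": {"battery", "price", "camera"},
    "r3": {"screen"},
    "r4": {"shipping", "screen"},
}
chosen = greedy_max_coverage(aspects, 2)
```

With this data, `r2` is chosen first because it covers three aspects, and `r4` follows because it covers the two aspects that remain.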

    Generating a set of representative items using a dynamic selection strategy

    Publication No.: US10114887B1

    Publication date: 2018-10-30

    Application No.: US15162437

    Filing date: 2016-05-23

    Inventor: Weiwei Cheng

    Abstract: Systems and methods are directed to a computing device for dynamically selecting a subset of representative items from a set of items. The computing device may determine characteristics regarding the set of reviews. Based on these characteristics, the computing device may determine whether to utilize a maximum-set-coverage selection strategy or a clustering-selection strategy to select the representative set of reviews. Further, the computing device may monitor for change in the characteristics of the set of reviews (e.g., the addition or deletion of a review or the change of a preexisting review) and may make a subsequent determination regarding the selection strategy to use to select a set of representative reviews based at least in part on the changed characteristics.
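
The abstract does not say which characteristics drive the choice of strategy, so the sketch below uses a purely hypothetical rule: switch to clustering once the number of reviews exceeds a threshold. The class name, the threshold, and the stand-in strategy functions are assumptions made only to show how the decision could be re-evaluated whenever the review set changes.

```python
def choose_strategy(reviews, size_threshold=50):
    """Hypothetical rule: large review sets use clustering, small ones use set coverage."""
    return "clustering" if len(reviews) > size_threshold else "max_set_coverage"


class DynamicSelector:
    """Re-decides the selection strategy each time the review set changes."""

    def __init__(self, strategies, size_threshold=50):
        self.strategies = strategies  # maps strategy name to a callable(reviews, k)
        self.size_threshold = size_threshold
        self.current = None

    def on_change(self, reviews, k):
        # Called after a review is added, deleted, or edited.
        self.current = choose_strategy(reviews, self.size_threshold)
        return self.strategies[self.current](reviews, k)


# Stand-in strategies so the example runs; real ones would implement the
# maximum-set-coverage and clustering-selection strategies.
selector = DynamicSelector(
    {
        "max_set_coverage": lambda reviews, k: reviews[:k],
        "clustering": lambda reviews, k: reviews[-k:],
    },
    size_threshold=3,
)
```

Calling `selector.on_change(reviews, k)` after every update keeps the representative set in step with the current characteristics of the reviews.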
