AUTOMATED METHOD AND SYSTEM FOR DISCOVERY AND IDENTIFICATION OF A COMPANY NAME FROM A PLURALITY OF DIFFERENT WEBSITES

    Publication No.: US20200242632A1

    Publication date: 2020-07-30

    Application No.: US16261312

    Filing date: 2019-01-29

    Abstract: Methods and systems are provided for automatically determining and selecting correct company names from websites based on HTML extracted from home webpages of different companies. An HTML source file is downloaded from a home webpage of a company, and many candidate company names are extracted from the HTML source file along with support indicators that are used as support for determining the company names. Each support indicator is an extracted name that has been determined to have similarities to the company name extracted from the home webpage of each company. A clustering algorithm clusters similar company names and supporters together into different clusters. A score is computed for each cluster using a heuristic formula, and a cluster having the highest score is selected. Selection rules are then applied to select a top ranked name from each of the selected clusters as a company name.
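
The cluster-score-select flow described in this abstract can be illustrated with a minimal Python sketch. The abstract does not disclose the clustering algorithm, the heuristic formula, or the selection rules, so everything concrete here is an assumption made for illustration: the greedy string-similarity clustering, the `SOURCE_WEIGHTS` table, the 0.8 threshold, and the function names are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical weights per HTML location; the patent's heuristic formula is not given.
SOURCE_WEIGHTS = {"title": 3.0, "og:site_name": 3.0, "copyright": 2.0, "logo_alt": 1.0}
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name):
    """Lowercase, drop punctuation and common legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def similar(a, b, threshold=0.8):
    """Assumed similarity test: character-level ratio on normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def cluster_names(candidates, threshold=0.8):
    """Greedy one-pass clustering of (name, source) pairs by similarity
    to the first member of each existing cluster."""
    clusters = []
    for name, source in candidates:
        for cluster in clusters:
            if similar(name, cluster[0][0], threshold):
                cluster.append((name, source))
                break
        else:
            clusters.append([(name, source)])
    return clusters


def score(cluster):
    """Assumed heuristic: sum of source weights of all members (supporters)."""
    return sum(SOURCE_WEIGHTS.get(src, 0.5) for _, src in cluster)


def pick_company_name(candidates):
    """Select the best cluster, then apply an assumed selection rule:
    prefer the highest-weight source, breaking ties with the shorter name."""
    best = max(cluster_names(candidates), key=score)
    name, _ = max(best, key=lambda p: (SOURCE_WEIGHTS.get(p[1], 0.5), -len(p[0])))
    return name


candidates = [
    ("Acme Corp", "title"),
    ("Acme Corp.", "copyright"),
    ("ACME", "logo_alt"),
    ("Contact Us", "h1"),
]
print(pick_company_name(candidates))
```

In this toy input the three Acme variants fall into one cluster that outscores the stray "Contact Us" heading, and the name taken from the page title is returned.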

    Method and system for automatically enriching collected seeds with information extracted from one or more websites

    Publication No.: US11126673B2

    Publication date: 2021-09-21

    Application No.: US16261335

    Filing date: 2019-01-29

    Abstract: Methods and systems are provided for automatically enriching collected seeds. Each website that is associated with each collected seed is processed via a web crawler that crawls a home webpage for the company associated with that collected seed to verify, based on similarity between company name and website name, that a website associated with that home page belongs to that company. When verification is successful, other webpages on the website are processed to fetch information using different extractor algorithms each being designed to fetch a specific attribute for that company. Search engine(s) and third-party APIs can also be used to collect additional company information that can be added to each collected seed. Each collected seed is then enriched by adding all of the additional company information to the original seed data.
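
The verification step in this abstract, which checks that a website belongs to a company based on similarity between the company name and the website name, might look roughly like the Python sketch below. The abstract does not say how similarity is measured, so the substring check, the `SequenceMatcher` ratio, the 0.6 threshold, and the helper names are assumptions for illustration only. The `website_name` helper is deliberately naive and takes the first host label after an optional `www`.

```python
import re
from difflib import SequenceMatcher
from urllib.parse import urlparse


def website_name(url):
    """Naive website name: first label of the host, skipping a leading 'www'."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    if labels[0] == "www":
        labels = labels[1:]
    return labels[0] if labels else ""


def compact(text):
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def belongs_to(company_name, url, threshold=0.6):
    """Assumed check: accept when one compacted name contains the other,
    or when their character-level similarity ratio reaches the threshold."""
    site = compact(website_name(url))
    company = compact(company_name)
    if not site or not company:
        return False
    if site in company or company in site:
        return True
    return SequenceMatcher(None, company, site).ratio() >= threshold


def enrich(seed, extracted):
    """Add extracted attributes to a copy of the seed only after verification."""
    if not belongs_to(seed["name"], seed["website"]):
        return dict(seed)
    return {**seed, **extracted}


seed = {"name": "Acme Widgets Inc.", "website": "https://www.acmewidgets.com/"}
print(enrich(seed, {"phone": "555-0100"}))
```

A seed whose website fails the check is returned unchanged, which mirrors the abstract's condition that further extraction happens only when verification is successful.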

    METHOD AND SYSTEM FOR AUTOMATICALLY ENRICHING COLLECTED SEEDS WITH INFORMATION EXTRACTED FROM ONE OR MORE WEBSITES

    Publication No.: US20200242170A1

    Publication date: 2020-07-30

    Application No.: US16261335

    Filing date: 2019-01-29

    Abstract: Methods and systems are provided for automatically enriching collected seeds. Each website that is associated with each collected seed is processed via a web crawler that crawls a home webpage for the company associated with that collected seed to verify, based on similarity between company name and website name, that a website associated with that home page belongs to that company. When verification is successful, other webpages on the website are processed to fetch information using different extractor algorithms each being designed to fetch a specific attribute for that company. Search engine(s) and third-party APIs can also be used to collect additional company information that can be added to each collected seed. Each collected seed is then enriched by adding all of the additional company information to the original seed data.
