Publication number: US08135728B2
Publication date: 2012-03-13
Application number: US11619230
Filing date: 2007-01-03
Applicant: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho
Inventor: Wen-tau Yih, Joshua T. Goodman, Vitor Rocha de Carvalho
CPC classification number: G06F17/241, G06F17/27, G06F17/30, G06F17/30616
Abstract: Extraction analysis techniques biased, in part, by query frequency information from a query log file and/or search engine cache are employed along with machine learning processes to determine candidate keywords and/or phrases of web documents. Web oriented features associated with the candidate keywords and/or phrases are also utilized to analyze the web documents. A keyword and/or phrase extraction mechanism can be utilized to score keywords and/or phrases in a web document and estimate a likelihood that the keywords and/or phrases are relevant, for example, in an advertising system and the like.
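The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the patented method: the feature set (term frequency, title membership, query-log frequency), the logistic form, the hand-set weights, and the names `candidate_phrases` and `score_phrases` are all assumptions chosen for illustration. In a real system the weights would be learned from labeled documents rather than fixed by hand.

```python
import math
import re
from collections import Counter

# Hypothetical hand-set weights; a trained model would learn these.
WEIGHTS = {"tf": 1.0, "in_title": 1.5, "query_freq": 0.5}
BIAS = -4.0


def candidate_phrases(text, max_len=3):
    """Return every word n-gram (1..max_len words) of the lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    ]


def score_phrases(title, body, query_freq, weights=WEIGHTS, bias=BIAS):
    """Map each candidate phrase to an estimated relevance in (0, 1).

    query_freq maps a phrase to how often it appeared in a query log
    (assumed input); phrases missing from the log count as zero.
    """
    title_set = set(candidate_phrases(title))
    # Count title and body separately so no n-gram spans the boundary.
    counts = Counter(candidate_phrases(title)) + Counter(candidate_phrases(body))
    scores = {}
    for phrase, count in counts.items():
        features = {
            "tf": math.log1p(count),
            "in_title": 1.0 if phrase in title_set else 0.0,  # web-oriented feature
            "query_freq": math.log1p(query_freq.get(phrase, 0)),  # query-log bias
        }
        z = bias + sum(weights[name] * value for name, value in features.items())
        scores[phrase] = 1.0 / (1.0 + math.exp(-z))  # logistic link
    return scores


# Usage: rank the candidates of a toy page against a toy query log.
page_scores = score_phrases(
    "Digital camera reviews",
    "We compare the latest digital camera models.",
    {"digital camera": 5000},
)
top_phrases = sorted(page_scores, key=page_scores.get, reverse=True)[:5]
```

Taking the logarithm of the raw counts keeps a very popular query from swamping the other features, which is one common way to apply a frequency-based bias; other scalings would work as well.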