A Method for Querying and Analyzing Massive Web Log Data

Granted invention patent

CN104298771B A Method for Querying and Analyzing Massive Web Log Data (status: lapsed, patent right terminated)


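Patent title: A Method for Querying and Analyzing Massive Web Log Data
Application number: CN201410596395.6

Filing date: 2014-10-30
Publication (announcement) number: CN104298771B

Publication (announcement) date: 2017-09-05
Inventors: Ma Tinghuai (马廷淮), Qu Jingjing (瞿晶晶), Tian Wei (田伟), Xue Yu (薛羽), Cao Jie (曹杰)
Applicant: Nanjing University of Information Science and Technology (南京信息工程大学)
Applicant address: No. 219 Ningliu Road, Nanjing, Jiangsu Province
Patentee: Nanjing University of Information Science and Technology (南京信息工程大学)
Current patentee: 北京智融时代信息技术有限公司 (Beijing Zhirong Shidai Information Technology Co., Ltd.)
Current patentee address: No. 219 Ningliu Road, Nanjing, Jiangsu Province
Patent agency: 南京众联专利代理有限公司 (Nanjing Zhonglian Patent Agency Co., Ltd.)
Agents: Gu Jin (顾进); Ye Juanjuan (叶涓涓)
Main classification: G06F17/30
IPC classification: G06F17/30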

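Abstract (translated from the Chinese):

The invention exploits the high reliability, scalability, efficiency and fault tolerance of the Hadoop/Hive distributed computing platform and discloses a method, based on Hadoop and Hive, for querying and analyzing massive web log data. The method comprises the following steps: parsing the data from each data source; loading the data into a data warehouse; receiving HiveQL statements; optimizing the received statements to obtain a preliminary map result; converting the received statements into MapReduce tasks, executing them and storing the query results; partitioning the data; analyzing and mining the data; and loading the data into a MySQL database. For massive web log data, the invention achieves accurate querying and data analysis. It provides scalable and efficient storage, querying and analysis of massive data, and it avoids the overall performance degradation that arises when data skew distributes jobs unevenly.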

Abstract (English):

By means of the high reliability, scalability, efficiency and fault tolerance of the Hadoop and Hive distributed computing platform, the invention discloses a method for querying and analyzing massive web log data based on Hadoop and Hive. The method includes the following steps: data from each data source are parsed; the data are loaded into a data warehouse; HiveQL statements are received; the received statements are optimized to obtain a preliminary map result; the received statements are converted into MapReduce tasks, the tasks are executed, and the query results are stored; the data are partitioned; the data are analyzed and mined; the data are loaded into a MySQL database. For massive web log data, precise querying and data analysis are achieved, scalability and efficiency of storage, querying and analysis of the massive data are achieved, and the reduction in overall performance caused by uneven job distribution under data skew is avoided.
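As a rough illustration of two steps named in the abstract (parsing raw log data, and producing a preliminary map result before the reduce phase), the sketch below parses web log lines and counts hits per URL within each input split before merging the partial counts. Everything specific here is an assumption made for illustration: the Apache Common Log Format input, the function names `parse_line`, `map_partial_counts` and `reduce_counts`, and the sample lines. The patent record does not specify any of them, and this is not the patented implementation.

```python
import re
from collections import Counter

# Assumed input format: Apache Common Log Format (not stated in the patent record).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A size of "-" means no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def map_partial_counts(lines):
    """Map-side partial aggregation: count hits per URL within one split."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:  # skip malformed lines
            counts[record["url"]] += 1
    return counts


def reduce_counts(partials):
    """Merge the partial counts produced by each split into global totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical sample data, split as two mappers might see it.
split_a = [
    '10.0.0.1 - - [30/Oct/2014:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 1024',
    '10.0.0.2 - - [30/Oct/2014:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 1024',
]
split_b = [
    '10.0.0.3 - - [30/Oct/2014:10:00:02 +0800] "GET /about.html HTTP/1.1" 404 -',
    'malformed line',
]

totals = reduce_counts([map_partial_counts(split_a), map_partial_counts(split_b)])
print(totals.most_common())
```

Aggregating inside each split means a very popular URL reaches the reduce step as one partial count per split rather than one record per request, which is a common way to soften the effect of skewed keys. Hive exposes comparable behaviour through settings such as `hive.map.aggr` and `hive.groupby.skewindata`; whether the patented method relies on them is not stated in this record.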

Publication/grant documents

CN104298771A A Method for Querying and Analyzing Massive Web Log Data. Publication/grant date: 2015-01-21
