A Sensitive Word Filtering Method for Text Information

Granted invention patent


Patent title: A Sensitive Word Filtering Method for Text Information
Application number: CN201510083247.9

Filing date: 2015-02-15
Publication (announcement) number: CN104850574B

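Publication (announcement) date: 2018-07-06
Inventor: Bai Chunling (白春玲)
Applicant: 博彦科技股份有限公司
Applicant address: Block A, Zone 3, Building 9, Zhongguancun Software Park, No. 8 Dongbeiwang West Road, Haidian District, Beijing
Patentee: 博彦科技股份有限公司
Current patentee: 易博互通企服科技有限公司
Current patentee address: Block A, Zone 3, Building 9, Zhongguancun Software Park, No. 8 Dongbeiwang West Road, Haidian District, Beijing
Agency: 北京汲智翼成知识产权代理事务所
Agents: Chen Xi (陈曦); Dong Yefei (董烨飞)
Main classification: G06F17/30
IPC classification: G06F17/30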

Abstract:

The invention discloses a sensitive word filtering method for text information, comprising the following steps: (1) receive a user's text information and verify that its data format is correct; if verification fails, return to step (1); if it passes, proceed to step (2); (2) perform semantic analysis on the text information: take a phrase from the text and match it against a semantic analysis library to obtain the phrase's term weight, reorder all phrases of the text by term weight, and convert the reordered text into an array format; (3) filter the array-format text for sensitive words: if any sensitive words are present, return the matched sensitive words to the user; otherwise, return an empty message to the user. The invention classifies sensitive words by term weight and further classifies the weighted groups by letter category, which effectively improves the speed of sensitive word filtering.
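The three steps in the abstract can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the lexicon contents, the whitespace tokenization, the "non-empty string" format check, the descending sort order, and all function names (`build_index`, `validate`, `analyze`, `filter_sensitive`, `check_text`) are assumptions made for the example. In particular, the abstract does not say how term weights tie into the lookup, so using one lexicon for both is a guess.

```python
from collections import defaultdict

# Hypothetical lexicon: sensitive word -> term weight (values are made up).
# Phrases that are not listed are treated as weight 0.
LEXICON = {"bomb": 3, "scam": 2, "spam": 1}


def build_index(lexicon):
    """Bucket sensitive words by term weight, then by first letter."""
    index = defaultdict(lambda: defaultdict(set))
    for word, weight in lexicon.items():
        index[weight][word[0]].add(word)
    return index


def validate(text):
    """Step 1: format check, assumed here to mean a non-empty string."""
    return isinstance(text, str) and bool(text.strip())


def analyze(text, lexicon):
    """Step 2: look up each phrase's weight and reorder by descending weight.

    Returns a list, standing in for the 'array format' of the abstract.
    """
    phrases = text.lower().split()  # whitespace tokenization is an assumption
    return sorted(phrases, key=lambda p: lexicon.get(p, 0), reverse=True)


def filter_sensitive(phrases, lexicon, index):
    """Step 3: return the matched sensitive words, or an empty list."""
    matches = []
    for phrase in phrases:
        weight = lexicon.get(phrase, 0)
        # Only the bucket for this weight and first letter is searched.
        bucket = index.get(weight, {}).get(phrase[0], ())
        if phrase in bucket:
            matches.append(phrase)
    return matches


def check_text(text, lexicon=LEXICON):
    """Run steps 1 to 3; raise ValueError so the caller can ask for new input."""
    if not validate(text):
        raise ValueError("invalid text format")
    index = build_index(lexicon)
    return filter_sensitive(analyze(text, lexicon), lexicon, index)


if __name__ == "__main__":
    print(check_text("hello scam and bomb"))
    print(check_text("hello world"))
```

Bucketing by weight and then by first letter means each lookup only touches a small subset of the sensitive word list, which is one plausible reading of the speed-up the abstract claims.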

Publication/grant documents

CN104850574A A Sensitive Word Filtering Method for Text Information. Publication/grant date: 2015-08-19
