Method for automatically identifying sentence boundaries in noisy conversational data
    12.
    Granted invention patent
    Status: In force

    Publication number: US08364485B2

    Publication date: 2013-01-29

    Application number: US11845462

    Filing date: 2007-08-27

    CPC classification number: G10L15/26

    Abstract: Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries.
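
The head/tail n-gram steps in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes unigrams only (n = 1), whitespace tokenization, and a hypothetical `max_mid_ratio` threshold standing in for the abstract's "significant number of times in the middle of sentences". Noise removal and the turn-based boundary rules are left out.

```python
from collections import Counter


def learn_boundary_ngrams(training_sentences, max_mid_ratio=0.5):
    """Return (heads, tails): words that reliably start or end a sentence."""
    head, tail, middle = Counter(), Counter(), Counter()
    for sentence in training_sentences:
        words = sentence.lower().split()
        if not words:
            continue
        head[words[0]] += 1
        tail[words[-1]] += 1
        middle.update(words[1:-1])
    # Filter out candidates seen too often mid-sentence relative to
    # how often they were seen at the sentence edge.
    heads = {w for w, c in head.items() if middle[w] / c <= max_mid_ratio}
    tails = {w for w, c in tail.items() if middle[w] / c <= max_mid_ratio}
    return heads, tails


def mark_boundaries(words, heads, tails):
    """Return sorted indices i such that a boundary falls before words[i]."""
    cuts = set()
    for i, w in enumerate(words):
        if w in heads and i > 0:
            cuts.add(i)          # boundary before a head word
        if w in tails and i + 1 < len(words):
            cuts.add(i + 1)      # boundary after a tail word
    return sorted(cuts)


training = [
    "okay let me check that",
    "thank you for calling",
    "okay i will check it now",
    "can you check it now",
    "you can check that now",
]
heads, tails = learn_boundary_ngrams(training)
stream = "okay i will check it now thank you for calling".split()
print(mark_boundaries(stream, heads, tails))
```

In this toy training set, "you", "can" and "that" occur at sentence edges but are filtered out because they appear at least as often mid-sentence, so only "okay"/"thank" (heads) and "now"/"calling" (tails) trigger boundaries.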

    Cleansing a Database System to Improve Data Quality
    13.
    Invention patent application
    Status: Under examination (published)

    Publication number: US20120150825A1

    Publication date: 2012-06-14

    Application number: US12966281

    Filing date: 2010-12-13

    CPC classification number: G06F16/217 G06F16/215 G06F16/2462

    Abstract: According to one embodiment of the present invention, a system controls cleansing of data within a database system, and comprises a computer system including at least one processor. The system receives a data set from the database system, and one or more features of the data set are selected for determining values for one or more characteristics of the selected features. The determined values are applied to a data quality estimation model to determine data quality estimates for the data set. Problematic data within the data set are identified based on the data quality estimates, where the cleansing is adjusted to accommodate the identified problematic data. Embodiments of the present invention further include a method and computer program product for controlling cleansing of data within a database system in substantially the same manner described above.

    METHOD FOR SEGMENTING COMMUNICATION TRANSCRIPTS USING UNSUPERVISED AND SEMI-SUPERVISED TECHNIQUES
    14.
    Invention patent application
    Status: In force

    Publication number: US20090112571A1

    Publication date: 2009-04-30

    Application number: US12060469

    Filing date: 2008-04-01

    CPC classification number: G06F17/3071 G10L15/04

    Abstract: A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a set of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence clusters within sequences of the collection.
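
The final merging step of this abstract can be sketched roughly as follows. The sketch assumes the earlier steps are already done, so each transcript arrives as a sequence of sentence-type labels. The abstract does not define its proximity-based measure; the count of direct adjacencies between two types is used here as a stand-in, and the function name `merge_by_adjacency` is hypothetical.

```python
from collections import Counter


def merge_by_adjacency(sequences, target):
    """Greedily merge the sentence types that most often sit next to each other."""
    seqs = [list(s) for s in sequences]
    groups = {t: {t} for s in seqs for t in s}
    while len(groups) > target:
        pairs = Counter()
        for s in seqs:
            for a, b in zip(s, s[1:]):
                if a != b:
                    pairs[tuple(sorted((a, b)))] += 1
        if not pairs:
            break  # nothing adjacent is left to merge
        # Highest adjacency count wins; ties go to the alphabetically first pair.
        keep, drop = min(pairs, key=lambda p: (-pairs[p], p))
        groups[keep] |= groups.pop(drop)
        seqs = [[keep if t == drop else t for t in s] for s in seqs]
    return sorted(sorted(g) for g in groups.values())


transcripts = [
    ["greet", "ask", "answer", "bye"],
    ["greet", "ask", "answer", "bye"],
    ["greet", "ask", "bye"],
    ["answer", "bye"],
    ["greet", "ask"],
]
print(merge_by_adjacency(transcripts, target=2))
```

With these toy transcripts, "greet" and "ask" merge first (four adjacencies), then "answer" and "bye" (three), leaving two segment clusters that resemble an opening and a closing phase of a call.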

    Customer service analysis
    16.
    Granted invention patent
    Status: In force

    Publication number: US09118759B2

    Publication date: 2015-08-25

    Application number: US12855944

    Filing date: 2010-08-13

    CPC classification number: H04M3/51 H04M3/5175

    Abstract: A method, a system and a computer program product for analyzing customer service quality is disclosed. A plurality of customer call service quality parameters is identified using historical data. The plurality of customer call service quality parameters is quantified and correlated. The customer service quality is analyzed using the plurality of customer call service quality parameters. A repository is generated using the historical data of a plurality of customer calls and a set of pre-defined customer call flow templates. A subset of service quality queries is identified using contextual information of the customer call from the repository of service quality queries. The subset of service quality queries is then interspersed in the customer call. The customer service quality is analyzed using responses to the subset of service quality queries.

    Rule set management
    17.
    Granted invention patent
    Status: Lapsed

    Publication number: US08700542B2

    Publication date: 2014-04-15

    Application number: US12969497

    Filing date: 2010-12-15

    CPC classification number: G06N5/025

    Abstract: Systems, methods, and computer products for optimally managing large rule sets are disclosed. Rule dependencies of rules within a set of rules may be determined as a function of rules execution frequency data generated from applying the rules over a data set. The rules within the set of rules may be clustered into rules clusters based on the determined rule dependencies, in which the rules clusters comprise disjoint subsets of the rules within the set of rules. Cluster frequency data for the rules clusters may be used to arrive at an optimal ordering. Each rule within the set of rules may be assigned a unique identification that may capture an execution order of the rules within the set of rules.
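
One possible reading of this abstract, sketched in Python under several assumptions: two rules count as dependent when they fire together on at least `min_cofire` records (a hypothetical threshold), disjoint clusters are the connected components of that dependency graph, and clusters are ordered by total firing frequency. The abstract does not specify any of these choices, and this heuristic ordering is not claimed to be optimal.

```python
from collections import Counter
from itertools import combinations


def order_rules(firings, min_cofire=2):
    """firings: list of sets, each holding the rules that fired on one record."""
    rule_freq = Counter(r for fired in firings for r in fired)
    cofire = Counter(p for fired in firings
                     for p in combinations(sorted(fired), 2))

    # Union-find: link two rules when they co-fire often enough.
    parent = {r: r for r in rule_freq}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for (a, b), n in cofire.items():
        if n >= min_cofire:
            parent[find(a)] = find(b)

    clusters = {}
    for r in rule_freq:
        clusters.setdefault(find(r), []).append(r)

    # Most frequently firing cluster first; inside a cluster, most frequent rule first.
    ordered = sorted(clusters.values(),
                     key=lambda c: (-sum(rule_freq[r] for r in c), sorted(c)))
    ids = {}
    for ci, cluster in enumerate(ordered):
        for ri, rule in enumerate(sorted(cluster, key=lambda r: (-rule_freq[r], r))):
            ids[rule] = f"{ci}.{ri}"  # identifier encodes cluster and position
    return ids


firings = [{"A", "B"}, {"A", "B"}, {"C"}, {"A", "B", "C"}, {"D"}]
print(order_rules(firings))
```

The identifier format `cluster.position` is one simple way to give each rule a unique ID that also records its place in the execution order.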

    SYSTEMS AND METHODS FOR EFFICIENT DEVELOPMENT OF A RULE-BASED SYSTEM USING CROWD-SOURCING
    20.
    Invention patent application
    Status: Lapsed

    Publication number: US20120221508A1

    Publication date: 2012-08-30

    Application number: US13036454

    Filing date: 2011-02-28

    CPC classification number: G06F17/00 G06F17/30 G06F17/30303

    Abstract: Described herein are methods, systems, apparatuses and products for efficient development of a rule-based system. An aspect provides a method including accessing data records; converting said data records to an intermediate form; utilizing intermediate forms to compute similarity scores for said data records; and selecting as an example to be provided for rule making at least one record of said data records having a maximum dissimilarity score indicative of dissimilarity to already considered examples.
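
The selection step in this abstract can be sketched as follows. The abstract does not say what the intermediate form or the similarity score is; this sketch assumes a lowercase token set as the intermediate form and Jaccard similarity as the score, and the helper names are hypothetical.

```python
def to_intermediate(record):
    """Assumed intermediate form: the set of lowercase tokens in the record."""
    return frozenset(record.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def next_example(records, considered):
    """Pick the unseen record that is least similar to every considered example."""
    seen = [to_intermediate(r) for r in considered]

    def dissimilarity(record):
        form = to_intermediate(record)
        # Distance to the closest already-considered example.
        return 1.0 - max((jaccard(form, s) for s in seen), default=0.0)

    candidates = [r for r in records if r not in considered]
    return max(candidates, key=dissimilarity)


records = [
    "12 main st springfield",
    "12 main street springfield",
    "po box 99 shelbyville",
    "14 main st springfield",
]
print(next_example(records, considered=["12 main st springfield"]))
```

Here the post-office-box address shares no tokens with the example already considered, so it is the one offered next for rule making. `next_example` raises `ValueError` when every record has already been considered.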
