Identifying entities in semi-structured content

    Publication number: US10353905B2

    Publication date: 2019-07-16

    Application number: US14695996

    Filing date: 2015-04-24

    Abstract: Identifying entities in semi-structured content is described. A system assigns a corresponding entity type based on a corresponding entity type score for each token in a sequence of tokens in semi-structured content, based on multiple entity types, wherein each token is a corresponding character set. The system assigns a corresponding boundary type based on a corresponding boundary type score for each token in the sequence of tokens, based on a begin boundary type or a continue boundary type. The system identifies an entity based on a corresponding entity type score and a corresponding boundary type for each token in the sequence of tokens. The system outputs the sequence of tokens as an identified set of entities based on the identified entity.
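
A minimal sketch of the decoding step described above. It assumes per-token entity-type scores and begin/continue boundary scores are already available as dictionaries; the abstract does not say how they are produced. Picking the highest-scoring type and boundary for each token is an illustrative assumption, not the patented method, and so are the `PERSON`/`ORG` labels. A token tagged "continue" extends the open entity only when its type matches; otherwise it starts a new entity.

```python
from dataclasses import dataclass, field

BEGIN, CONTINUE = "begin", "continue"


@dataclass
class Entity:
    entity_type: str
    tokens: list[str] = field(default_factory=list)


def identify_entities(
    tokens: list[str],
    entity_scores: list[dict[str, float]],
    boundary_scores: list[dict[str, float]],
) -> list[Entity]:
    """Group tokens into entities from per-token type scores and begin/continue scores."""
    entities: list[Entity] = []
    current = None  # the entity currently being extended, if any
    for token, e_scores, b_scores in zip(tokens, entity_scores, boundary_scores):
        entity_type = max(e_scores, key=e_scores.get)  # highest-scoring entity type
        boundary = max(b_scores, key=b_scores.get)     # "begin" or "continue"
        if current is not None and boundary == CONTINUE and entity_type == current.entity_type:
            current.tokens.append(token)  # extend the open entity
        else:
            current = Entity(entity_type, [token])  # start a new entity
            entities.append(current)
    return entities
```

For the tokens `Jane Doe Acme Corp` with boundaries begin, continue, begin, continue and matching type scores, this yields a `PERSON` entity `Jane Doe` and an `ORG` entity `Acme Corp`.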

    USER TRUST SCORES BASED ON REGISTRATION FEATURES
    Type: Invention application; status: pending (published)

    Publication number: US20160140355A1

    Publication date: 2016-05-19

    Application number: US14548027

    Filing date: 2014-11-19

    CPC classification number: G06F21/6218 G06F2221/2117

    Abstract: User trust scores based on registration features is described. A system identifies registration features associated with a user registered to interact with a database. The system calculates a registration trust score for the user based on a comparison of multiple registration features associated with the user to corresponding registration features associated with previous users who are restricted from interacting with the database and/or corresponding registration features associated with previous users who are enabled to interact with the database. The system restricts the user from interacting with the database if the registration trust score is above a registration threshold.
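
A minimal sketch of the scoring rule in this abstract. It assumes registration features are compared by exact equality, and that the score is the mean, over features, of the match rate against restricted users minus the match rate against enabled users. The feature names, that formula, and the threshold value are hypothetical. As in the abstract, a higher score triggers the restriction, so the score behaves like a risk measure.

```python
from typing import Mapping, Sequence

Registration = Mapping[str, str]


def match_rate(user: Registration, others: Sequence[Registration], feature: str) -> float:
    """Fraction of previous users whose value for `feature` equals the user's."""
    if not others:
        return 0.0
    hits = sum(1 for other in others if other.get(feature) == user.get(feature))
    return hits / len(others)


def registration_trust_score(
    user: Registration,
    restricted: Sequence[Registration],
    enabled: Sequence[Registration],
    features: Sequence[str],
) -> float:
    """Mean over features of (match rate vs. restricted) - (match rate vs. enabled), in [-1, 1]."""
    if not features:
        return 0.0
    total = sum(match_rate(user, restricted, f) - match_rate(user, enabled, f) for f in features)
    return total / len(features)


def should_restrict(user, restricted, enabled, features, threshold: float) -> bool:
    """Restrict the user when the score is above the registration threshold."""
    return registration_trust_score(user, restricted, enabled, features) > threshold
```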

    Systems and methods for out-of-distribution classification

    Publication number: US11481636B2

    Publication date: 2022-10-25

    Application number: US16877325

    Filing date: 2020-05-18

    Abstract: An embodiment provided herein preprocesses the input samples to the classification neural network, e.g., by adding Gaussian noise to word/sentence representations so that the function computed by the neural network satisfies the Lipschitz property: if the input sample is in-distribution, a small change in the input does not cause much change in the output. A second method induces properties in the feature representation of the neural network such that, for out-of-distribution examples, the feature representation magnitude is either close to zero or the feature representation is orthogonal to all class representations. A third method generates examples that are structurally similar to in-domain examples but semantically out-of-domain, for use in out-of-domain classification training. A fourth method prunes feature representation dimensions to mitigate the long-tail error of unused dimensions in out-of-domain classification. Using these techniques, the accuracy of both in-domain and out-of-distribution identification can be improved.
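
A minimal sketch of three of the ideas above. It uses the standard library only and assumes feature vectors are plain lists of floats. It covers Gaussian noise added to a representation, an out-of-distribution test that flags a feature whose norm is near zero or which is nearly orthogonal to every class vector, and pruning of dimensions that are essentially unused across training features. The thresholds `min_norm`, `max_cos`, and `eps` are hypothetical; the abstract gives no values. The sketch only tests for these properties and does not show how a network is trained to acquire them.

```python
import math
import random
from typing import Sequence

Vector = list[float]


def add_gaussian_noise(vec: Sequence[float], sigma: float, rng: random.Random) -> Vector:
    """Perturb each component with N(0, sigma^2) noise (input preprocessing)."""
    return [x + rng.gauss(0.0, sigma) for x in vec]


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def is_out_of_distribution(feature, class_vectors, min_norm=0.01, max_cos=0.2) -> bool:
    """OOD if the feature is near zero or nearly orthogonal to every class vector."""
    if norm(feature) < min_norm:
        return True
    return all(abs(cosine(feature, c)) < max_cos for c in class_vectors)


def used_dimensions(train_features: Sequence[Sequence[float]], eps: float = 1e-6) -> list[int]:
    """Indices whose mean absolute activation over training features exceeds eps."""
    n, dims = len(train_features), len(train_features[0])
    return [i for i in range(dims) if sum(abs(f[i]) for f in train_features) / n > eps]


def prune(vec: Sequence[float], keep: Sequence[int]) -> Vector:
    """Keep only the selected dimensions."""
    return [vec[i] for i in keep]
```

Because the orthogonality check uses the absolute cosine, a feature pointing strongly opposite a class vector counts as aligned with that class rather than orthogonal to it.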

    Systems and methods for named entity recognition

    Publication number: US11436481B2

    Publication date: 2022-09-06

    Application number: US16134957

    Filing date: 2018-09-18

    Abstract: A method for natural language processing includes receiving, by one or more processors, an unstructured text input. An entity classifier is used to identify entities in the unstructured text input. The identifying the entities includes generating, using a plurality of sub-classifiers of a hierarchical neural network classifier of the entity classifier, a plurality of lower-level entity identifications associated with the unstructured text input. The identifying the entities further includes generating, using a combiner of the hierarchical neural network classifier, a plurality of higher-level entity identifications associated with the unstructured text input based on the plurality of lower-level entity identifications. Identified entities are provided based on the plurality of higher-level entity identifications.
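
A minimal sketch of the two-level structure described above. It assumes each sub-classifier maps a token to a dictionary of label scores, and that the combiner is a fixed weighted sum followed by an argmax. In the patent both levels are parts of a hierarchical neural network. Here the hypothetical sub-classifiers are simple rules standing in for trained networks, and the whitespace tokenization and the `PER`/`ORG`/`O` labels are assumptions.

```python
from typing import Callable, Sequence

Scores = dict[str, float]
SubClassifier = Callable[[str], Scores]


def lower_level(tokens: Sequence[str], sub_classifiers: Sequence[SubClassifier]) -> list[list[Scores]]:
    """Lower-level identifications: one list of per-token scores per sub-classifier."""
    return [[clf(tok) for tok in tokens] for clf in sub_classifiers]


def combine(lower: list[list[Scores]], weights: Sequence[float], n_tokens: int) -> list[str]:
    """Higher-level identifications: weighted sum of sub-classifier scores, then argmax."""
    labels = []
    for i in range(n_tokens):
        totals: Scores = {}
        for w, outputs in zip(weights, lower):
            for label, score in outputs[i].items():
                totals[label] = totals.get(label, 0.0) + w * score
        labels.append(max(totals, key=totals.get) if totals else "O")  # "O" = not an entity
    return labels


def identify_entities(text: str, sub_classifiers, weights) -> list[tuple[str, str]]:
    tokens = text.split()
    labels = combine(lower_level(tokens, sub_classifiers), weights, len(tokens))
    return [(tok, lab) for tok, lab in zip(tokens, labels) if lab != "O"]


# Hypothetical rule-based stand-ins for trained neural sub-classifiers.
def capitalization_clf(tok: str) -> Scores:
    return {"PER": 0.6, "O": 0.4} if tok[:1].isupper() else {"O": 1.0}


KNOWN_ORGS = {"Acme"}


def gazetteer_clf(tok: str) -> Scores:
    return {"ORG": 0.9, "O": 0.1} if tok in KNOWN_ORGS else {}  # abstain otherwise
```

With equal weights, `identify_entities("Alice joined Acme", [capitalization_clf, gazetteer_clf], [1.0, 1.0])` labels `Alice` as `PER` and `Acme` as `ORG`. For `Acme`, the combiner lets the gazetteer's stronger `ORG` score outweigh the capitalization rule's `PER` guess.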

    Systems and Methods for Out-of-Distribution Classification

    Publication number: US20210150365A1

    Publication date: 2021-05-20

    Application number: US16877325

    Filing date: 2020-05-18

    Abstract: An embodiment provided herein preprocesses the input samples to the classification neural network, e.g., by adding Gaussian noise to word/sentence representations so that the function computed by the neural network satisfies the Lipschitz property: if the input sample is in-distribution, a small change in the input does not cause much change in the output. A second method induces properties in the feature representation of the neural network such that, for out-of-distribution examples, the feature representation magnitude is either close to zero or the feature representation is orthogonal to all class representations. A third method generates examples that are structurally similar to in-domain examples but semantically out-of-domain, for use in out-of-domain classification training. A fourth method prunes feature representation dimensions to mitigate the long-tail error of unused dimensions in out-of-domain classification. Using these techniques, the accuracy of both in-domain and out-of-distribution identification can be improved.

    SUBCOMPONENT MODEL TRAINING
    Type: Invention publication

    Publication number: US20230229957A1

    Publication date: 2023-07-20

    Application number: US17576724

    Filing date: 2022-01-14

    CPC classification number: G06N20/00

    Abstract: Methods, apparatuses, and computer-program products are disclosed. The method may include inputting one or more subcomponent training datasets into the plurality of subcomponent models of the machine learning model. The machine learning model may be configured to perform a final task, and the plurality of subcomponent models may be configured to perform sequential subtasks that result in the final task. The method may include computing one or more weights for data points of the one or more subcomponent training datasets, and the one or more weights may be based on a contribution of the data points to an end-to-end error loss measurement associated with performing the final task of the machine learning model. The method may include training the plurality of subcomponent models based on the one or more weights for the data points of the one or more subcomponent training datasets.
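
A minimal sketch of the weighting idea. It assumes a two-stage pipeline in which the only trainable subcomponent is a one-parameter linear model `z = a * x` feeding a fixed downstream step. The abstract does not say how a data point's contribution to the end-to-end loss is computed. Here each point's squared final-task error under the current pipeline serves as its contribution, normalized so the weights average 1, and the subcomponent is refit by weighted least squares. All names are hypothetical.

```python
from typing import Callable, Sequence


def contribution_weights(e2e_losses: Sequence[float]) -> list[float]:
    """Scale per-point end-to-end losses so the weights average 1 (uniform if all zero)."""
    n, total = len(e2e_losses), sum(e2e_losses)
    if total <= 0.0:
        return [1.0] * n
    return [loss * n / total for loss in e2e_losses]


def fit_weighted_slope(xs: Sequence[float], ys: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted least squares for z = a * x: a = sum(w*x*y) / sum(w*x*x)."""
    num = sum(w * x * y for x, y, w in zip(xs, ys, weights))
    den = sum(w * x * x for x, w in zip(xs, weights))
    if den == 0.0:
        raise ValueError("degenerate weighted design")
    return num / den


def train_subcomponent(
    xs: Sequence[float],
    sub_targets: Sequence[float],
    final_targets: Sequence[float],
    downstream: Callable[[float], float],
    a: float = 0.0,
    rounds: int = 3,
) -> float:
    """Alternate between weighting points by end-to-end error and refitting the subcomponent."""
    for _ in range(rounds):
        # End-to-end loss of each point when the current subcomponent feeds the downstream step.
        losses = [(downstream(a * x) - y) ** 2 for x, y in zip(xs, final_targets)]
        weights = contribution_weights(losses)
        a = fit_weighted_slope(xs, sub_targets, weights)
    return a
```

Points the current pipeline handles badly on the final task get larger weights in the next refit of the subtask model. That is the intent the abstract describes, though the specific weighting rule here is only one plausible choice.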

    IDENTIFYING ENTITIES IN SEMI-STRUCTURED CONTENT
    Type: Invention application; status: pending (published)

    Publication number: US20160314123A1

    Publication date: 2016-10-27

    Application number: US14695996

    Filing date: 2015-04-24

    Abstract: Identifying entities in semi-structured content is described. A system assigns a corresponding entity type based on a corresponding entity type score for each token in a sequence of tokens in semi-structured content, based on multiple entity types, wherein each token is a corresponding character set. The system assigns a corresponding boundary type based on a corresponding boundary type score for each token in the sequence of tokens, based on a begin boundary type or a continue boundary type. The system identifies an entity based on a corresponding entity type score and a corresponding boundary type for each token in the sequence of tokens. The system outputs the sequence of tokens as an identified set of entities based on the identified entity.
