SYSTEMS AND METHODS FOR ENTITY SET EXPANSION

    Publication No.: US20240338391A1

    Publication Date: 2024-10-10

    Application No.: US18295757

    Filing Date: 2023-04-04

    Applicant: Recruit Co., Ltd.

    IPC Classification: G06F16/31 G06F16/38

    CPC Classification: G06F16/313 G06F16/38

    Abstract: Disclosed embodiments relate to entity set expansion to associate with a text corpus. Techniques can include receiving unstructured data and a set of concepts associated with the data to determine, using a language model, a set of candidate entities in the data associated with the set of concepts, wherein the association is measured based on the relevancy of each candidate entity of the set of candidate entities to the context of the data. Techniques can then determine, using a plurality of methods, associations between each candidate entity in the set of candidate entities and each concept in the set of concepts, wherein each candidate entity is assigned a rank for each method of the plurality of methods. Techniques can use the assigned ranks to determine a combined rank of each candidate entity of the set of candidate entities, wherein the combined rank of each candidate entity is based on the rank assigned to that candidate entity by each method of the plurality of methods. Techniques can finally expand the entity set by determining a subset of entities of the set of candidate entities based on the combined rank of each candidate entity, wherein the subset of entities forms the expanded entity set associated with the data.
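
    Illustrative sketch (not part of the patent text): the abstract above describes ranking each candidate entity under several methods and then combining those ranks. The abstract does not say how the ranks are combined, so the mean-rank rule below is an assumption, and the method names and candidate entities are hypothetical.

```python
from statistics import mean


def combine_ranks(rankings: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Combine per-method rankings into one ordering by mean rank.

    `rankings` maps a (hypothetical) method name to candidates ordered best-first.
    Mean rank is an assumed combination rule; the abstract does not specify one.
    """
    candidates = set().union(*rankings.values())
    combined = {}
    for cand in candidates:
        # A candidate missing from a method's list gets that method's worst rank + 1.
        ranks = [
            order.index(cand) + 1 if cand in order else len(order) + 1
            for order in rankings.values()
        ]
        combined[cand] = mean(ranks)
    # Lower mean rank is better; ties are broken alphabetically for determinism.
    return sorted(combined.items(), key=lambda kv: (kv[1], kv[0]))


def expand_entity_set(rankings: dict[str, list[str]], top_k: int) -> list[str]:
    """Keep the top_k candidates by combined rank as the expanded entity set."""
    return [cand for cand, _ in combine_ranks(rankings)[:top_k]]


# Hypothetical rankings of four candidates for one concept.
rankings = {
    "embedding_similarity": ["python", "java", "excel", "coffee"],
    "prompt_probability": ["java", "python", "coffee", "excel"],
    "co_occurrence": ["python", "excel", "java", "coffee"],
}
expanded = expand_entity_set(rankings, top_k=2)
```

    Other fusion rules, such as reciprocal rank fusion, would fit the same interface; mean rank is used here only because it is the simplest to follow.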

    SYSTEMS AND METHODS FOR GENERALIZED ENTITY MATCHING

    Publication No.: US20230342558A1

    Publication Date: 2023-10-26

    Application No.: US17660813

    Filing Date: 2022-04-26

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to generalized entity matching. Techniques can include receiving a data pair of two entities that may be pre-processed to have parsable data structures, and serializing the data pair into a sequence of tokens based on the data structure of each entity in the data pair. Techniques can further include encoding the serialized data pair to include topic attributes, each of which may be mapped to data in the data pair whose topic matches the topic represented by that topic attribute, with the data in the data pair concatenated. Techniques can further include pooling attributes in the data pair based on contextualized attribute representations of each encoded entity in the data pair and the schema of each entity in the data pair, where the contextualized attribute representations are based on a first token of each encoded attribute in the sequence of tokens, and predicting matching labels between the data pairs based on the pooled attributes.
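
    Illustrative sketch (not part of the patent text): the serialization step described above can be pictured as flattening each structured entity into one token sequence. The `[COL]`, `[VAL]`, `[CLS]`, and `[SEP]` markers and the sample records are assumptions chosen for illustration; the abstract does not name the tokens it uses.

```python
def serialize_entity(entity: dict) -> str:
    """Flatten a (possibly nested) record into a marker-delimited string.

    The [COL]/[VAL] markers are hypothetical placeholders for attribute
    boundaries; the leading [COL] of each attribute plays the role of the
    "first token" that an encoder could later pool over.
    """
    parts = []
    for key, value in entity.items():
        if isinstance(value, dict):
            value = serialize_entity(value)          # nested structure
        elif isinstance(value, (list, tuple)):
            value = " ".join(str(v) for v in value)  # list-valued attribute
        parts.append(f"[COL] {key} [VAL] {value}")
    return " ".join(parts)


def serialize_pair(left: dict, right: dict) -> str:
    """Concatenate two serialized entities into one sequence for a pair classifier."""
    return f"[CLS] {serialize_entity(left)} [SEP] {serialize_entity(right)} [SEP]"


# Hypothetical pair with different schemas (structured record vs. free text).
job = {"title": "Data Engineer", "skills": ["SQL", "Python"]}
resume = {"text": "Built SQL pipelines in Python"}
sequence = serialize_pair(job, resume)
```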

    Simple Sperm Test Kit, System, and Method for Performing Simple Test on Sperm

    Publication No.: US20190265223A1

    Publication Date: 2019-08-29

    Application No.: US16083148

    Filing Date: 2017-03-08

    Applicant: Recruit Co., Ltd.

    Inventor: Ryo Irisawa

    Abstract: A simple sperm test kit 10 for performing a simple sperm test has a substrate 20 capable of being placed over a camera 30 of an information terminal 11, a recess 21 for containing semen A provided in the surface of the substrate 20, a cover 22 that covers the recess 21 for allowing external light to enter the recess 21, and a lens 23 provided within the substrate 20 on the lower side of the recess 21 for magnifying the semen A in the recess 21 and projecting an image on the back side of the substrate 20.

    Suspicious person detection system, suspicious person detection method

    Publication No.: US10176654B2

    Publication Date: 2019-01-08

    Application No.: US15769419

    Filing Date: 2016-10-19

    Applicant: Recruit Co., Ltd.

    Abstract: A suspicious person detection technology that is less likely to leave blind spots in the detection of a suspicious person is provided. A suspicious person detection system detects a suspicious person present in a predetermined area and includes a probe request detection terminal (100) configured to detect a probe request transmitted from a mobile terminal (400) and to generate probe information including first identification information specific to the mobile terminal that transmitted the probe request, and an analyzing apparatus (200) configured to acquire the probe information from the probe request detection terminal and, in the case where the first identification information included in the probe information matches none of one or more pieces of second identification information set in advance, to transmit suspicious person information indicating that a suspicious person is detected to a predetermined information processing apparatus (300).
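
    Illustrative sketch (not part of the patent text): the matching rule applied by the analyzing apparatus can be pictured as a set-membership check. Treating the first identification information as a MAC address is an assumption, and the `ProbeInfo` type, `normalize`, and `find_suspicious` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeInfo:
    terminal_id: str  # first identification information (assumed to be a MAC address)
    detector_id: str  # which detection terminal observed the probe request


def normalize(terminal_id: str) -> str:
    """Normalize identifier formatting so that equal addresses compare equal."""
    return terminal_id.strip().lower().replace("-", ":")


def find_suspicious(probes: list[ProbeInfo], registered_ids: list[str]) -> list[ProbeInfo]:
    """Return probes whose identifier matches none of the pre-registered identifiers."""
    known = {normalize(i) for i in registered_ids}
    return [p for p in probes if normalize(p.terminal_id) not in known]


# Hypothetical data: one registered device and one unknown device.
registered = ["aa:bb:cc:00:11:22"]
probes = [
    ProbeInfo("AA-BB-CC-00-11-22", "detector-1"),
    ProbeInfo("de:ad:be:ef:00:01", "detector-1"),
]
alerts = find_suspicious(probes, registered)
```

    In a deployed system, each entry in `alerts` would be sent on as suspicious person information; the transport is outside this sketch.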

    SYSTEMS AND METHODS FOR MULTI-PURPOSE DATA MANAGEMENT

    Publication No.: US20240289629A1

    Publication Date: 2024-08-29

    Application No.: US18305657

    Filing Date: 2023-04-24

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to data management of entity pairs. Techniques can include receiving a data management task request and at least two sets of data, each including a set of entities. Techniques can determine a location of each entity in the received data sets in a representative space by determining a representative structure of the set of entities. Techniques can then determine, for an entity, a set of representative entity pairs from each set of the at least two sets of data based on how close they are in the representative space. Techniques can then analyze the set of representative entity pairs to identify the most similar entity pairs to include in a set of candidate pairs by determining the closeness of the locations of the entities in each entity pair in the representative space. Techniques can then determine matched entity pairs of the candidate pairs using a first machine learning model that is trained using the candidate pairs by applying labels, and utilize the matched pairs to perform the requested data management task.
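
    Illustrative sketch (not part of the patent text): selecting candidate pairs by closeness in a representative space can be pictured as a nearest-neighbor search between two embedded data sets. The vectors are assumed to be precomputed, cosine similarity is an assumed closeness measure, and all names and values are hypothetical.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pairs(
    left: dict[str, list[float]],
    right: dict[str, list[float]],
    k: int = 1,
) -> list[tuple[str, str, float]]:
    """For each left entity, keep its k closest right entities as candidate pairs."""
    pairs = []
    for lid, lvec in left.items():
        scored = sorted(
            ((cosine(lvec, rvec), rid) for rid, rvec in right.items()),
            reverse=True,  # most similar first
        )
        pairs.extend((lid, rid, score) for score, rid in scored[:k])
    return pairs


# Hypothetical 2-D embeddings for two small data sets.
left = {"a1": [1.0, 0.0], "a2": [0.0, 1.0]}
right = {"b1": [0.9, 0.1], "b2": [0.1, 0.9]}
candidates = candidate_pairs(left, right, k=1)
```

    The resulting candidate pairs would then be labeled and passed to a trained matcher; that stage is not shown.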

    SYSTEMS AND METHODS FOR MULTILINGUAL SENTENCE EMBEDDINGS

    Publication No.: US20220067279A1

    Publication Date: 2022-03-03

    Application No.: US17008569

    Filing Date: 2020-08-31

    IPC Classification: G06F40/263

    Abstract: Disclosed embodiments relate to natural language processing. Techniques can include obtaining an encoding model, obtaining a first sentence in a first language and a label associated with the first sentence, obtaining a second sentence in a second language, encoding the first sentence and second sentence using the encoding model, determining the intent of the first encoded sentence, determining the language of the first encoded sentence and the language of the second encoded sentence, and updating the encoding model based on the determined intent of the first encoded sentence, the label, the determined language of the first encoded sentence, and the determined language of the second encoded sentence.

    Similarity Learning System and Similarity Learning Method

    Publication No.: US20190164109A1

    Publication Date: 2019-05-30

    Application No.: US16073447

    Filing Date: 2017-01-25

    Applicant: Recruit Co., Ltd.

    IPC Classification: G06Q10/06 G06Q10/10 G06F16/27

    Abstract: Appropriate similarity degree learning is performed even when the number of documents is insufficient relative to the number of words. An analysis method based on a topic model is used to learn a degree of similarity between recruitment information and resume information. By analyzing recruitment information registered with a recruitment card database DB 310 and resume information registered with a resume database DB 320 using a topic model, a characteristics extracting portion 330 collects, for each topic, words (keywords) extracted from documents constituting the recruitment information and the resume information; and a similarity degree learning portion 360 performs similarity degree learning for each topic.

    SYSTEMS AND METHODS FOR UNSUPERVISED PARAPHRASE MINING

    Publication No.: US20240020485A1

    Publication Date: 2024-01-18

    Application No.: US18366890

    Filing Date: 2023-08-08

    Applicant: Recruit Co., Ltd.

    Abstract: Disclosed embodiments relate to aligning pairs of sentences. Techniques can include receiving a plurality of sentences; generating a graph for each of at least two sentences of the plurality of sentences, wherein generating a graph for each sentence of the at least two sentences comprises: identifying one or more tokens for the sentence; and connecting via edges the one or more tokens; generating a combined graph for the at least two sentences wherein generating a combined graph comprises: aligning the identified tokens of the at least two sentences of the plurality of sentences; identifying matching and non-matching tokens between the at least two sentences based on the alignment; and merging matching tokens into a combined graph node.
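
    Illustrative sketch (not part of the patent text): the align, compare, and merge steps above can be approximated on whitespace tokens with Python's standard `difflib.SequenceMatcher`. This is a stand-in for whatever alignment the patent actually uses; the flat node list replaces a full graph with edges, and the `shared`, `only_a`, and `only_b` labels are hypothetical.

```python
from difflib import SequenceMatcher


def combined_graph(sent_a: str, sent_b: str) -> list[tuple[str, str]]:
    """Align two sentences token by token and merge matching tokens.

    Returns an ordered list of nodes. Matching tokens become a single
    ("shared", token) node; non-matching spans stay separate as
    ("only_a", span) and ("only_b", span) nodes at the same position.
    """
    a, b = sent_a.split(), sent_b.split()
    nodes = []
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching tokens are merged into one combined node each.
            nodes.extend(("shared", tok) for tok in a[i1:i2])
        else:
            # Non-matching spans are kept as parallel branches.
            if i1 < i2:
                nodes.append(("only_a", " ".join(a[i1:i2])))
            if j1 < j2:
                nodes.append(("only_b", " ".join(b[j1:j2])))
    return nodes


# Hypothetical sentence pair.
graph = combined_graph("the room was very clean", "the room was spotless")
```

    In this example the two non-matching spans occupy the same slot between shared tokens, which is the kind of position where a paraphrase candidate might be read off.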

    SYSTEMS AND METHODS FOR SEMI-SUPERVISED EXTRACTION OF TEXT CLASSIFICATION INFORMATION

    Publication No.: US20220229984A1

    Publication Date: 2022-07-21

    Application No.: US17151088

    Filing Date: 2021-01-15

    Abstract: Disclosed embodiments relate to extracting classification information from input text. Techniques can include obtaining input text, identifying a plurality of tokens in the input text, pre-training a machine learning model, determining tagging information of the plurality of tokens using a first classification layer of the machine learning model, pairing sequences of tokens using the tagging information associated with the plurality of tokens, wherein the paired sequences of tokens are determined by a second classification layer, determining one or more attribute classifiers to apply to the one or more paired sequences, wherein the attribute classifiers are determined by a third classification layer of the machine learning model, evaluating sentiments of the paired sequences, wherein the sentiments of the paired sequences are determined by a fourth classification layer of the machine learning model, aggregating sentiments of the paired sequences associated with an attribute classifier, and storing the aggregated sentiments.
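
    Illustrative sketch (not part of the patent text): the final aggregation step above can be pictured as counting sentiment labels per attribute. The input records stand in for the output of the four classification layers, which are not modeled here; the field names and sample values are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_sentiments(pairs: list[dict]) -> dict[str, dict[str, int]]:
    """Count sentiment labels of paired sequences, grouped by attribute.

    Each record is assumed to carry the attribute and sentiment already
    predicted upstream for one paired token sequence.
    """
    totals: dict[str, Counter] = defaultdict(Counter)
    for pair in pairs:
        totals[pair["attribute"]][pair["sentiment"]] += 1
    return {attr: dict(counts) for attr, counts in totals.items()}


# Hypothetical paired sequences extracted from review text.
pairs = [
    {"aspect": "room", "opinion": "spotless", "attribute": "cleanliness", "sentiment": "positive"},
    {"aspect": "bathroom", "opinion": "dirty", "attribute": "cleanliness", "sentiment": "negative"},
    {"aspect": "sheets", "opinion": "fresh", "attribute": "cleanliness", "sentiment": "positive"},
    {"aspect": "staff", "opinion": "friendly", "attribute": "staff", "sentiment": "positive"},
]
summary = aggregate_sentiments(pairs)
```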