-
Publication No.: US11983209B1
Publication date: 2024-05-14
Application No.: US18303943
Filing date: 2023-04-20
IPC classification: G06F16/33, G06F16/31, G06F16/338
CPC classification: G06F16/3347, G06F16/313, G06F16/338
Abstract: Operations of a search management system are disclosed. The operations may include: identifying a data corpus containing a plurality of documents, generating sets of feature vectors representing the plurality of documents, receiving a query to search the data corpus, generating a query vector for the query, identifying a target feature vector that meets a similarity threshold by comparing the query vector to the feature vectors, and presenting a query result that includes at least part of the document. The feature vectors may be generated by executing a multi-step partitioning process for partitioning a respective document into a plurality of document partitions, such that the sets of feature vectors that are generated correspond to the plurality of document partitions for the respective document. The query result may include a target partition from among the plurality of document partitions represented by the target feature vector.
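Illustrative sketch (not taken from the patent): the abstract describes partitioning each document, building one feature vector per partition, and returning the partitions whose vectors meet a similarity threshold against the query vector. The Python below is one minimal way to picture that flow. Every name in it (partition, embed, build_index, search), the paragraph-then-sentence splitting rule, the bag-of-words vectors, and the threshold value are assumptions made for illustration; the abstract does not say which partitioning steps or vector model are used.

```python
import math
import re
from collections import Counter


def partition(document: str, max_words: int = 12) -> list[str]:
    """Hypothetical two-step partitioning: split into paragraphs first,
    then split any paragraph longer than max_words into sentences."""
    parts = []
    for para in re.split(r"\n\s*\n", document.strip()):
        para = para.strip()
        if not para:
            continue
        if len(para.split()) <= max_words:
            parts.append(para)
        else:
            sentences = re.split(r"(?<=[.!?])\s+", para)
            parts.extend(s.strip() for s in sentences if s.strip())
    return parts


def embed(text: str) -> Counter:
    """Stand-in feature vector: lowercase word counts (an assumption,
    used here only so the example needs no external model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(corpus: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """One (document id, partition text, vector) entry per partition."""
    return [
        (doc_id, part, embed(part))
        for doc_id, text in corpus.items()
        for part in partition(text)
    ]


def search(index, query: str, threshold: float = 0.3):
    """Return (score, document id, partition) for every partition whose
    vector meets the threshold, best match first."""
    query_vector = embed(query)
    hits = [(cosine(query_vector, vec), doc_id, part) for doc_id, part, vec in index]
    return sorted((h for h in hits if h[0] >= threshold), reverse=True)
```

Because the index holds one vector per partition rather than one per document, a hit identifies the specific passage that matched, which is the behaviour the last sentence of the abstract describes.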
-
Publication No.: US20240354323A1
Publication date: 2024-10-24
Application No.: US18634293
Filing date: 2024-04-12
IPC classification: G06F16/33, G06F16/31, G06F16/338
CPC classification: G06F16/3347, G06F16/313, G06F16/338
Abstract: Operations of a search management system are disclosed. The operations may include: identifying a data corpus containing a plurality of documents, generating sets of feature vectors representing the plurality of documents, receiving a query to search the data corpus, generating a query vector for the query, identifying a target feature vector that meets a similarity threshold by comparing the query vector to the feature vectors, and presenting a query result that includes at least part of the document. The feature vectors may be generated by executing a multi-step partitioning process for partitioning a respective document into a plurality of document partitions, such that the sets of feature vectors that are generated correspond to the plurality of document partitions for the respective document. The query result may include a target partition from among the plurality of document partitions represented by the target feature vector.
-
Publication No.: US20230066143A1
Publication date: 2023-03-02
Application No.: US17464534
Filing date: 2021-09-01
Inventors: Liviu Sebastian Matei, Filip Trojan, Marc Michiel Bron, Andrew Kenneth Hind, Yingzhao Zhou, Maria-Monica Petrica, Rajesh Ashwinbhai Shah
Abstract: A document may be received as part of a request to identify similar documents in a collection of documents. However, the received document and the documents in the collection may have different schemas or formats. To provide semantic context to the search and allow similarity scores to be generated between different document types, a configuration may be accessed that defines how to generate queries from one schema into another schema. The configuration may map queries between different fields in both schemas. Results of the multiple queries can be combined to generate a weighted combination for each document that can be used as a similarity score between different document types.
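Illustrative sketch (not taken from the application): the abstract describes a configuration that maps fields of one schema onto fields of another, runs one query per mapping, and combines the per-query results into a weighted similarity score for each document. The Python below shows one possible reading of that idea. The field names (title, tags, summary, keywords), the weights, the CONFIG layout, and the Jaccard word-overlap scoring are all hypothetical and stand in for whatever query engine and configuration format the application actually contemplates.

```python
import re


def tokens(text) -> set[str]:
    """Lowercase word set of a field value."""
    return set(re.findall(r"[a-z0-9]+", str(text).lower()))


def field_score(query_text, field_text) -> float:
    """Stand-in for a per-field query score: Jaccard overlap of word sets."""
    q, f = tokens(query_text), tokens(field_text)
    return len(q & f) / len(q | f) if (q | f) else 0.0


# Hypothetical configuration: each rule maps a field of the received
# document's schema to a field of the collection's schema, with a weight.
CONFIG = [
    {"source": "title", "target": "summary", "weight": 0.7},
    {"source": "tags", "target": "keywords", "weight": 0.3},
]


def similar_documents(source_doc: dict, collection: dict, config=CONFIG):
    """Run one field-to-field query per rule and combine the results
    into a weighted score per collection document, best match first."""
    scores = {doc_id: 0.0 for doc_id in collection}
    for rule in config:
        query_text = source_doc.get(rule["source"], "")
        for doc_id, doc in collection.items():
            score = field_score(query_text, doc.get(rule["target"], ""))
            scores[doc_id] += rule["weight"] * score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Keeping the mapping in data rather than in code means a new pair of document types can be compared by supplying a different rule list, without touching the scoring function.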
-
-