SCALABLE PIPELINE FOR MACHINE LEARNING-BASED BASE-VARIANT GROUPING

    Publication No.: US20250029364A1

    Publication Date: 2025-01-23

    Application No.: US18908565

    Application Date: 2024-10-07

    Abstract: A system comprises one or more processors and non-transitory computer-readable media storing computing instructions that, when executed, perform operations comprising: generating an adjacency list for candidate items using a distance threshold with the maximum allowable neighbor distance equal to a Siamese model cut-off. The operations can also comprise loading data for the candidate items in the adjacency list and generating graphs of the candidate items in the adjacency list. The operations further can comprise determining, using breakdown logic, first graphs of the graphs that exceed a predetermined size, and building hierarchy dendrograms of nested subclusters of the first graphs. The operations additionally can comprise determining cut-off values based on p-th percentiles of density for the first graphs, and identifying recommended variant groups of the candidate item in the nested subclusters of the hierarchy dendrograms below the cut-off values. Other embodiments are disclosed.
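
    Illustrative sketch (not taken from the patent text): the abstract above describes building an adjacency list under a distance threshold and deriving a cut-off from a p-th percentile. The Python below shows one minimal way these two steps could look. The function names, the toy 2-D embeddings, the nearest-rank percentile, and the use of linked-pair distances as a stand-in for "density" are all assumptions made for illustration; the abstract does not define them.

```python
import math
from itertools import combinations


def build_adjacency(points, max_dist):
    """Map each item id to the ids of its neighbors within max_dist."""
    adjacency = {item_id: [] for item_id in points}
    for a, b in combinations(points, 2):
        if math.dist(points[a], points[b]) <= max_dist:
            adjacency[a].append(b)
            adjacency[b].append(a)
    return adjacency


def percentile(values, p):
    """Nearest-rank p-th percentile of a non-empty list (0 < p <= 100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical 2-D embeddings keyed by item id.
points = {"a": (0.0, 0.0), "b": (0.0, 1.0), "c": (0.0, 2.5), "d": (10.0, 10.0)}

# max_dist plays the role of the model cut-off named in the abstract.
adjacency = build_adjacency(points, max_dist=2.0)

# Assumption: distances between linked pairs stand in for "density".
edge_lengths = [
    math.dist(points[a], points[b])
    for a in adjacency
    for b in adjacency[a]
    if a < b  # count each undirected link once
]
cutoff = percentile(edge_lengths, p=50)
```

    Here item "d" has no neighbor within the threshold, so it stays isolated; the pairwise loop is quadratic and would need an approximate neighbor index at catalog scale.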

    MISMATCH DETECTION MODEL
    Type: Invention application

    Publication No.: US20210241076A1

    Publication Date: 2021-08-05

    Application No.: US16779510

    Application Date: 2020-01-31

    Abstract: A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform obtaining a set of items that have been grouped together as matching items in a group; performing an ensemble mismatch detection; performing multiple detection models on the set of items to generate respective outputs regarding mismatches; combining the respective outputs to determine whether a quantity of detected mismatches is at least a predetermined threshold; when the quantity of detected mismatches is at least the predetermined threshold, the acts also can include separating at least one of the set of items from the group; and when the quantity of detected mismatches is not at least the predetermined threshold, the acts additionally can include maintaining each item of the set of items in the group. Other embodiments are disclosed.
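
    Illustrative sketch (not taken from the patent text): the thresholded combination of detector outputs described above can be read as a vote count. The Python below is a minimal sketch under that reading. The two toy detectors, the item fields, and the per-item interpretation of "quantity of detected mismatches" are assumptions, not the detection models of the application.

```python
from collections import Counter
from statistics import median


def title_detector(items):
    """Hypothetical detector: flag items sharing no title token with the first item."""
    reference = set(items[0]["title"].split())
    return {it["id"] for it in items if not reference & set(it["title"].split())}


def price_detector(items):
    """Hypothetical detector: flag items priced above twice the group median."""
    mid = median(it["price"] for it in items)
    return {it["id"] for it in items if it["price"] > 2 * mid}


def ensemble_mismatch(items, detectors, threshold):
    """Separate items flagged by at least `threshold` detectors; keep the rest."""
    votes = Counter()
    for detect in detectors:
        votes.update(detect(items))  # one vote per detector per flagged item
    separated = [it["id"] for it in items if votes[it["id"]] >= threshold]
    kept = [it["id"] for it in items if votes[it["id"]] < threshold]
    return kept, separated


group = [
    {"id": "A", "title": "blue mug 12oz", "price": 9.0},
    {"id": "B", "title": "red mug 12oz", "price": 9.5},
    {"id": "C", "title": "garden hose", "price": 40.0},
]
kept, separated = ensemble_mismatch(group, [title_detector, price_detector], threshold=2)
```

    With a threshold of 2, only item "C" (flagged by both toy detectors) is separated; raising the threshold above the number of detectors keeps every item in the group.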

    AUTOMATICALLY DETERMINING ITEMS TO INCLUDE IN A VARIANT GROUP

    Publication No.: US20210240739A1

    Publication Date: 2021-08-05

    Application No.: US16779473

    Application Date: 2020-01-31

    Abstract: A method including obtaining image data and attribute information of a first item in an item catalog. The method also can include generating candidate variant items from the item catalog for the first item using a combination of (a) a k-nearest neighbors approach to search for first candidate variant items based on text embeddings for the attribute information of the first item, and (b) an elastic search approach to search for second candidate variant items based on image embeddings for the image data of the first item. The method additionally can include performing respective classifications based on respective pairs comprising the first item and each of the candidate variant items to filter the candidate variant items. The method further can include determining a respective distance between the first item and each of the candidate variant items, as filtered. The method additionally can include determining one or more items in the candidate variant items, as filtered, to include in a variant group for the first item, based on a decision function using a predetermined threshold and the respective distance for the each of the candidate variant items, as filtered. Other embodiments are described.
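
    Illustrative sketch (not taken from the patent text): the k-nearest-neighbors candidate search and the threshold-based decision function described above can be outlined as follows. This sketch covers only the text-embedding branch with Euclidean distance; the image-embedding search and the pairwise classification filter are omitted, and the function names, embeddings, and threshold value are hypothetical.

```python
import heapq
from math import dist


def knn_candidates(query_vec, catalog, k):
    """Return the ids of the k catalog items closest to query_vec."""
    return heapq.nsmallest(k, catalog, key=lambda item_id: dist(query_vec, catalog[item_id]))


def select_variants(query_vec, catalog, k, threshold):
    """Decision function: keep candidates whose distance is within threshold."""
    candidates = knn_candidates(query_vec, catalog, k)
    return [c for c in candidates if dist(query_vec, catalog[c]) <= threshold]


# Hypothetical 2-D text embeddings keyed by item id.
catalog = {
    "x1": (0.1, 0.0),
    "x2": (0.0, 0.3),
    "x3": (2.0, 2.0),
    "x4": (5.0, 5.0),
}
first_item = (0.0, 0.0)

candidates = knn_candidates(first_item, catalog, k=3)
variants = select_variants(first_item, catalog, k=3, threshold=0.5)
```

    The search returns three candidates, but only the two within distance 0.5 of the first item pass the decision function.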

    MISMATCH DETECTION MODEL
    Type: Invention publication

    Publication No.: US20240070438A1

    Publication Date: 2024-02-29

    Application No.: US18387159

    Application Date: 2023-11-06

    CPC classification number: G06N3/045 G06N3/043 G06Q30/0633 G06N3/08

    Abstract: A system comprising one or more processors and one or more non-transitory computer-readable media storing computing instructions that, when executed on the one or more processors, cause the one or more processors to perform operations comprising: obtaining a set of items that have been grouped together as matching items in a group; generating, using an ensemble learning model, a predictive indication of a mismatched item grouped together in error as part of the set of items, wherein the ensemble learning model comprises at least two detection models that are performed simultaneously with each other to output predictive indications comprising the predictive indication; and determining a final mismatch decision for an item of the set of items, wherein the final mismatch decision is based on the predictive indication, and wherein the item comprises the mismatched item. Other embodiments are disclosed.
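
    Illustrative sketch (not taken from the patent text): the abstract above has at least two detection models run simultaneously, with a final decision based on their predictive indications. One way to sketch that in Python is a thread pool that submits every model at once. The toy models, the item fields, and the averaging rule with a 0.5 cut-off are assumptions for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def title_model(item, group):
    """Hypothetical model: 1.0 if the title shares no token with any other item."""
    tokens = set(item["title"].split())
    others = [g for g in group if g is not item]
    shared = any(tokens & set(o["title"].split()) for o in others)
    return 0.0 if shared else 1.0


def price_model(item, group):
    """Hypothetical model: 1.0 if the price exceeds twice the group median."""
    mid = median([g["price"] for g in group])
    return 1.0 if item["price"] > 2 * mid else 0.0


def predictive_indications(item, group, models):
    """Run all models concurrently and collect their scores in model order."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = [pool.submit(model, item, group) for model in models]
        return [f.result() for f in futures]


def final_mismatch_decision(scores, cutoff=0.5):
    """Assumed combination rule: mean score at or above the cutoff."""
    return sum(scores) / len(scores) >= cutoff


group = [
    {"id": "A", "title": "blue mug 12oz", "price": 9.0},
    {"id": "B", "title": "red mug 12oz", "price": 9.5},
    {"id": "C", "title": "garden hose", "price": 40.0},
]
models = [title_model, price_model]
decisions = {
    item["id"]: final_mismatch_decision(predictive_indications(item, group, models))
    for item in group
}
```

    Threads suit models that wait on I/O or release the interpreter lock; CPU-bound Python models would more likely need separate processes.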

    SCALABLE PIPELINE FOR MACHINE LEARNING-BASED BASE-VARIANT GROUPING

    Publication No.: US20220222924A1

    Publication Date: 2022-07-14

    Application No.: US17589768

    Application Date: 2022-01-31

    Abstract: A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform: creating an adjacency list for candidate items using a distance threshold; generating graphs of the candidate items in the adjacency list, wherein nodes of the graphs represent the candidate items, and wherein edges of the graphs represent respective predicted variant neighbor links between pairs of the candidate items; determining, using breakdown logic, first graphs of the graphs that exceed a predetermined size; performing divisive hierarchical clustering on each of the first graphs; and identifying recommended variant groups of the candidate item in the nested subclusters of the hierarchy dendrogram below the respective cut-off value. Other embodiments are described.
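
    Illustrative sketch (not taken from the patent text): the graph-generation and breakdown steps described above can be approximated by treating each connected component of the adjacency list as one graph and then separating the components that exceed a size limit. The function names and the size limit are hypothetical, and the later clustering step is not shown.

```python
from collections import deque


def connected_components(adjacency):
    """Group item ids into connected components using breadth-first search."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def breakdown(components, max_size):
    """Split components into those within max_size and those exceeding it."""
    small = [c for c in components if len(c) <= max_size]
    large = [c for c in components if len(c) > max_size]
    return small, large


# Hypothetical adjacency list: each edge is a predicted variant-neighbor link.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b"],
    "d": ["e"],
    "e": ["d"],
    "f": [],
}
graphs = connected_components(adjacency)
small_graphs, first_graphs = breakdown(graphs, max_size=2)
```

    In this reading, `small_graphs` could be accepted as variant groups directly, while `first_graphs` would go on to hierarchical clustering.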

    Scalable pipeline for machine learning-based base-variant grouping

    Publication No.: US12112520B2

    Publication Date: 2024-10-08

    Application No.: US17589768

    Application Date: 2022-01-31

    CPC classification number: G06V10/7625 G06N3/045 G06V10/7747 G06V10/82

    Abstract: A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform: creating an adjacency list for candidate items using a distance threshold; generating graphs of the candidate items in the adjacency list, wherein nodes of the graphs represent the candidate items, and wherein edges of the graphs represent respective predicted variant neighbor links between pairs of the candidate items; determining, using breakdown logic, first graphs of the graphs that exceed a predetermined size; performing divisive hierarchical clustering on each of the first graphs; and identifying recommended variant groups of the candidate item in the nested subclusters of the hierarchy dendrogram below the respective cut-off value. Other embodiments are described.
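
    Illustrative sketch (not taken from the patent text): divisive hierarchical clustering with a cut-off, as named above, can be sketched as a recursive bisection that stops once a subcluster is tight enough. The farthest-pair seeding rule, the diameter-based stopping test, and all names below are assumptions; the patent's own splitting criterion and cut-off definition are not reproduced here.

```python
from itertools import combinations
from math import dist


def diameter(ids, points):
    """Largest pairwise distance within a cluster (0.0 for a single item)."""
    return max((dist(points[a], points[b]) for a, b in combinations(ids, 2)), default=0.0)


def divisive_groups(ids, points, cutoff):
    """Recursively bisect a cluster until every subcluster's diameter is <= cutoff.

    Assumes cutoff >= 0. Each recursive call corresponds to one split in the
    hierarchy; the returned lists are the subclusters below the cutoff.
    """
    if len(ids) < 2 or diameter(ids, points) <= cutoff:
        return [sorted(ids)]
    # Seed the split with the two items that are farthest apart.
    seed_a, seed_b = max(
        combinations(ids, 2), key=lambda pair: dist(points[pair[0]], points[pair[1]])
    )
    left = [i for i in ids if dist(points[i], points[seed_a]) <= dist(points[i], points[seed_b])]
    right = [i for i in ids if i not in left]
    return divisive_groups(left, points, cutoff) + divisive_groups(right, points, cutoff)


# Hypothetical 2-D embeddings for one oversized graph.
points = {
    "a": (0.0, 0.0),
    "b": (0.0, 1.0),
    "c": (5.0, 5.0),
    "d": (5.0, 6.0),
    "e": (20.0, 20.0),
}
variant_groups = divisive_groups(list(points), points, cutoff=2.0)
```

    Each seed always lands on its own side of the split, so both halves are non-empty and the recursion terminates.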

    Automatically determining items to include in a variant group

    Publication No.: US11977561B2

    Publication Date: 2024-05-07

    Application No.: US16779473

    Application Date: 2020-01-31

    Abstract: A method including obtaining image data and attribute information of a first item in an item catalog. The method also can include generating candidate variant items from the item catalog for the first item using a combination of (a) a k-nearest neighbors approach to search for first candidate variant items based on text embeddings for the attribute information of the first item, and (b) an elastic search approach to search for second candidate variant items based on image embeddings for the image data of the first item. The method additionally can include performing respective classifications based on respective pairs comprising the first item and each of the candidate variant items to filter the candidate variant items. The method further can include determining a respective distance between the first item and each of the candidate variant items, as filtered. The method additionally can include determining one or more items in the candidate variant items, as filtered, to include in a variant group for the first item, based on a decision function using a predetermined threshold and the respective distance for the each of the candidate variant items, as filtered. Other embodiments are described.

    Mismatch detection model
    Type: Invention grant

    Publication No.: US11809979B2

    Publication Date: 2023-11-07

    Application No.: US16779510

    Application Date: 2020-01-31

    CPC classification number: G06N3/045 G06N3/043 G06Q30/0633 G06N3/08

    Abstract: A system including one or more processors and one or more non-transitory computer-readable media storing computing instructions configured to run on the one or more processors and perform obtaining a set of items that have been grouped together as matching items in a group; performing an ensemble mismatch detection; performing multiple detection models on the set of items to generate respective outputs regarding mismatches; combining the respective outputs to determine whether a quantity of detected mismatches is at least a predetermined threshold; when the quantity of detected mismatches is at least the predetermined threshold, the acts also can include separating at least one of the set of items from the group; and when the quantity of detected mismatches is not at least the predetermined threshold, the acts additionally can include maintaining each item of the set of items in the group. Other embodiments are disclosed.
