USING UNSUPERVISED CLUSTERING AND LANGUAGE MODEL TO NORMALIZE ATTRIBUTE TUPLES OF ITEMS IN A DATABASE

    Publication No.: US20250005279A1

    Publication Date: 2025-01-02

    Application No.: US18215505

    Filing Date: 2023-06-28

    Abstract: A computer system uses clustering and a large language model (LLM) to normalize attribute tuples for items stored in a database of an online system. The online system collects attribute tuples, each attribute tuple comprising an attribute type and an attribute value for an item. The online system initially clusters the attribute tuples into a first plurality of clusters. The online system generates prompts for input into the LLM, each prompt including a subset of attribute tuples grouped into a respective cluster of the first plurality. Based on the prompts, the LLM generates a second plurality of clusters, each cluster including one or more attribute tuples that have a common attribute type and a common attribute value. The online system maps each attribute tuple to a respective normalized attribute tuple associated with each cluster. The online system rewrites each attribute tuple in the database to a corresponding normalized attribute tuple.
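The flow described in the abstract (initial clustering, one prompt per cluster, LLM refinement, mapping to a normalized tuple, database rewrite) can be illustrated with a minimal sketch. It is not the claimed implementation. The greedy string-similarity clustering, the prompt wording, the `stub_llm` function (a case-insensitive grouping that stands in for a real LLM call), the choice of the first tuple in each refined cluster as the normalized tuple, and the sample `database` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def initial_clusters(tuples, threshold=0.6):
    """Greedy string-similarity clustering (assumed stand-in for the
    unsupervised clustering step)."""
    clusters = []
    for t in tuples:
        key = f"{t[0]} {t[1]}".lower()
        for cluster in clusters:
            rep = f"{cluster[0][0]} {cluster[0][1]}".lower()
            if SequenceMatcher(None, key, rep).ratio() >= threshold:
                cluster.append(t)
                break
        else:
            # No existing cluster is similar enough, so start a new one.
            clusters.append([t])
    return clusters


def build_prompt(cluster):
    """Build one prompt per initial cluster (wording is hypothetical)."""
    header = "Group the attribute tuples below that share an attribute type and value:"
    return "\n".join([header] + [f"{a}: {v}" for a, v in cluster])


def stub_llm(prompt):
    """Hypothetical stand-in for the LLM: groups the listed tuples
    case-insensitively instead of calling a real model."""
    groups = {}
    for line in prompt.splitlines()[1:]:  # skip the instruction line
        attr_type, attr_value = line.split(": ", 1)
        canon = (attr_type.casefold(), attr_value.casefold())
        groups.setdefault(canon, []).append((attr_type, attr_value))
    return list(groups.values())


def normalize(tuples, llm=stub_llm):
    """Map every attribute tuple to the normalized tuple of its refined cluster."""
    mapping = {}
    for cluster in initial_clusters(tuples):
        for refined in llm(build_prompt(cluster)):
            normalized = refined[0]  # assumption: first member is the normalized form
            for t in refined:
                mapping[t] = normalized
    return mapping


# Sample data (invented for illustration): item id -> (attribute type, attribute value)
database = {
    "item-1": ("Color", "Red"),
    "item-2": ("color", "red"),
    "item-3": ("COLOR", "RED"),
    "item-4": ("Size", "Large"),
    "item-5": ("size", "large"),
}

mapping = normalize(list(database.values()))
rewritten = {item: mapping[t] for item, t in database.items()}
```

In a real system the `llm` argument would wrap a call to an actual model and parse its response. Sending one prompt per initial cluster keeps each prompt small, which is presumably why the abstract clusters before prompting.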
