System for nearest neighbor search of dataset

    公开(公告)号:US12166503B1

    公开(公告)日:2024-12-10

    申请号:US17655121

    申请日:2022-03-16

    Abstract: Low latency search for nearest neighbors in a dataset containing a large number of entries is improved using an error correction code (ECC) for partitioning data into clusters and retrieval. During initialization and preprocessing a d-dimensional space with clusters corresponding to ECC codewords is specified. Entries in the dataset are embedded into this space and associated with respective codewords, each codeword specifying a cluster. An index associates the codewords, clusters, and entries. During a query of the dataset, a query entry is processed to determine a query embedding in the d-dimensional space. The query embedding is used as input for a list decoder of the ECC. The list decoder provides a set of nearest codewords, with those codewords representing a set of candidate clusters that may contain nearest neighbors. The dataset entries associated with the candidate clusters may then be searched to determine query results comprising specific entries.

Patent Agency Ranking