-
公开(公告)号:US12166503B1
公开(公告)日:2024-12-10
申请号:US17655121
申请日:2022-03-16
Applicant: AMAZON TECHNOLOGIES, INC.
Inventor: Nissim Halabi , Noam Touitou
Abstract: Low latency search for nearest neighbors in a dataset containing a large number of entries is improved using an error correction code (ECC) for partitioning data into clusters and retrieval. During initialization and preprocessing a d-dimensional space with clusters corresponding to ECC codewords is specified. Entries in the dataset are embedded into this space and associated with respective codewords, each codeword specifying a cluster. An index associates the codewords, clusters, and entries. During a query of the dataset, a query entry is processed to determine a query embedding in the d-dimensional space. The query embedding is used as input for a list decoder of the ECC. The list decoder provides a set of nearest codewords, with those codewords representing a set of candidate clusters that may contain nearest neighbors. The dataset entries associated with the candidate clusters may then be searched to determine query results comprising specific entries.