DEPLOYING A VECTOR INDEX ON MULTIPLE NODES OF A CLUSTER

    Publication number: US20250094400A1

    Publication date: 2025-03-20

    Application number: US18885640

    Filing date: 2024-09-14

    Abstract: Techniques for deploying a vector index on multiple nodes of a cluster are provided. In one technique, an instruction is received to create a vector index on a set of vectors that is stored in a vector database that is connected to the multiple nodes. In response, an HNSW index is created based on the set of vectors and the HNSW index is stored on each node. In response to receiving a vector query, a node processes the vector query against its copy of the HNSW index. In another technique, each node retrieves, from a vector database, a respective subset of a set of vectors and generates, based on the respective subset, a respective HNSW index. A vector query is transmitted to each node, which traverses its HNSW index to generate results of the vector query. The results from each node are combined to generate final results.
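
    The second technique in the abstract (one index per node over a subset of the vectors, with per-node results combined) can be sketched as below. This is a minimal illustration, not the patented method: the `Node` class, the round-robin `partition` helper, and the Euclidean metric are assumptions, and an exact scan stands in for the HNSW traversal so the example stays self-contained.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """Hypothetical cluster node that holds one subset of the vectors."""

    def __init__(self, vectors):
        self.vectors = vectors  # dict: vector id -> vector

    def search(self, query, k):
        # An exact scan is used here in place of an HNSW traversal.
        scored = ((euclidean(query, v), vid) for vid, v in self.vectors.items())
        return heapq.nsmallest(k, scored)


def partition(vectors, num_nodes):
    """Assign vectors to nodes round-robin (an assumed partitioning scheme)."""
    shards = [{} for _ in range(num_nodes)]
    for i, (vid, vec) in enumerate(vectors.items()):
        shards[i % num_nodes][vid] = vec
    return shards


def distributed_search(nodes, query, k):
    """Send the query to every node and merge the per-node top-k lists."""
    partials = [node.search(query, k) for node in nodes]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


vectors = {0: [0.0, 0.0], 1: [1.0, 0.0], 2: [5.0, 5.0], 3: [0.1, 0.1], 4: [9.0, 9.0]}
nodes = [Node(shard) for shard in partition(vectors, 2)]
nearest_ids = [vid for _, vid in distributed_search(nodes, [0.0, 0.0], 2)]
```

    Because every node returns its own k best candidates, the merged list is the same as a search over the full set when the per-node search is exact; with an approximate index such as HNSW, the merged result is only as accurate as each node's traversal.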

    OPTIMIZE WORKLOAD PERFORMANCE BY AUTOMATICALLY DISCOVERING AND IMPLEMENTING IN-MEMORY PERFORMANCE FEATURES

    Publication number: US20240111772A1

    Publication date: 2024-04-04

    Application number: US18374852

    Filing date: 2023-09-29

    CPC classification number: G06F16/24561 G06F11/3414 G06F11/3419

    Abstract: Techniques are provided for optimizing workload performance by automatically discovering and implementing performance optimizations for in-memory units (IMUs). A system maintains a set of IMUs for processing database operations in a database. The system obtains database workload information for the database and filters it to identify database operations that may benefit from performance optimizations. The system analyzes the database operations to identify a set of performance optimizations and ranks the performance optimizations based on their potential benefit. The system selects a subset of the performance optimizations, based on their ranking, and generates new versions of IMUs that reflect the performance optimizations. The system performs verification tests on the new versions of IMUs and analyzes the tests to determine whether the new versions of IMUs yield expected performance benefits. The system then categorizes the new set of IMUs into a first set of IMUs to be retained and a second set of IMUs to be discarded. The system then makes the first set of IMUs available to the current workload and discards the second set of IMUs.

    EFFICIENT IN-MEMORY DB QUERY PROCESSING OVER ANY SEMI-STRUCTURED DATA FORMATS
    Invention application (under examination, published)

    Publication number: US20170060973A1

    Publication date: 2017-03-02

    Application number: US15162235

    Filing date: 2016-05-23

    Abstract: Techniques are described herein for maintaining two copies of the same semi-structured data, where each copy is organized in a different format. One copy is in a first-format that may be convenient for storage, but inefficient for query processing. For example, the first-format may be a textual format that needs to be parsed every time a query needs to access individual data items within a semi-structured object. The database system intelligently loads semi-structured first-format data into volatile memory and, while doing so, converts the semi-structured first-format data to a second-format. Because the data in volatile memory is in the second-format, processing queries against the second-format data both allows disk I/O to be avoided and increases the efficiency of the queries themselves. For example, the parsing that may be necessary to run a query against a cached copy of the first-format data is avoided.
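
    The convert-on-load idea can be illustrated with a short sketch. It assumes JSON text as the first-format and parsed Python objects as the second-format; the `InMemoryStore` class and its `parse_count` counter are hypothetical names used only to show that parsing happens once at load time and not once per query.

```python
import json


class InMemoryStore:
    """Hypothetical in-memory store that converts documents when loading."""

    def __init__(self):
        self._parsed = {}      # doc id -> second-format (parsed) object
        self.parse_count = 0   # how many times the text format was parsed

    def load(self, doc_id, text):
        # Convert the textual first-format to the parsed second-format once.
        self._parsed[doc_id] = json.loads(text)
        self.parse_count += 1

    def query(self, doc_id, path):
        # Walk the already-parsed object; no text parsing happens here.
        value = self._parsed[doc_id]
        for key in path:
            value = value[key]
        return value


store = InMemoryStore()
store.load(1, '{"order": {"items": [{"sku": "A1"}, {"sku": "B2"}]}}')
first_sku = store.query(1, ["order", "items", 0, "sku"])
second_sku = store.query(1, ["order", "items", 1, "sku"])
```

    Both queries read from the parsed copy, so `parse_count` stays at 1 however many queries run against the document.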

    AUTOMATIC INDEX SELECTION
    Invention application

    Publication number: US20250094399A1

    Publication date: 2025-03-20

    Application number: US18885639

    Filing date: 2024-09-14

    Abstract: Techniques for automatically selecting a type of vector index are provided. In one technique, in response to determining to generate a vector index based on a base table that stores a plurality of vectors, a number of the plurality of vectors is identified. Based at least on the number of the plurality of vectors, a particular type of vector index is identified from among a plurality of types of vector indexes. Examples of the plurality of types include an HNSW index and an IVF index. A vector index of the particular type is generated for the base table. Another criterion in identifying a type of vector index to generate is the number of neighbors that is a parameter in generating a certain type of vector index.
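
    A selection rule driven by the two criteria the abstract names (vector count and the neighbors parameter) might look like the sketch below. The abstract gives no thresholds and does not say which index type is preferred at which size, so the `max_graph_links` budget, its default value, and the choice of HNSW for the smaller case are all illustrative assumptions.

```python
from enum import Enum


class IndexType(Enum):
    HNSW = "HNSW"
    IVF = "IVF"


def choose_index_type(num_vectors, num_neighbors, max_graph_links=50_000_000):
    """Pick an index type from the vector count and the neighbors parameter.

    Assumption: a graph index is chosen while the estimated number of graph
    links (vectors x neighbors) fits a budget; otherwise IVF is chosen.
    """
    estimated_links = num_vectors * num_neighbors
    if estimated_links <= max_graph_links:
        return IndexType.HNSW
    return IndexType.IVF


small_table_choice = choose_index_type(num_vectors=100_000, num_neighbors=32)
large_table_choice = choose_index_type(num_vectors=10_000_000, num_neighbors=32)
```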

    PROCESSING TOP-K QUERIES ON DATA IN RELATIONAL DATABASE SYSTEMS

    Publication number: US20240126760A1

    Publication date: 2024-04-18

    Application number: US17966749

    Filing date: 2022-10-14

    CPC classification number: G06F16/24554 G06F16/23 G06F16/24537 G06F16/24542

    Abstract: Techniques for processing top-K queries are provided. In one technique, a database statement is received that requests top-K results related to a database object and that indicates two columns thereof: a first column by which to partition a result set and a second column by which to order the result set. A buffer is generated. For each of multiple rows in the database object: a first key value that is associated with a first value in the first column of said each row is identified; a second key value that is associated with a second value in the second column of said each entry is identified; a slot in the buffer is identified based on the first key value and the second key value; and the slot in the buffer may be updated based on the second key value. A response to the database statement is generated based on the buffer.
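
    The partition-then-order shape of such a query can be sketched with a buffer that keeps at most K rows per partition value. This is a simplified stand-in, not the slot-addressing scheme of the abstract: a bounded min-heap per partition replaces the key-based slot lookup, and the function name, the dictionary row format, and descending order are assumptions.

```python
import heapq
from collections import defaultdict


def top_k_per_partition(rows, partition_col, order_col, k):
    """Return the k rows with the largest order_col value per partition."""
    # Buffer: partition value -> min-heap of (order value, sequence, row).
    # The sequence number breaks ties so rows themselves are never compared.
    buffer = defaultdict(list)
    for seq, row in enumerate(rows):
        heap = buffer[row[partition_col]]
        entry = (row[order_col], seq, row)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new row beats the smallest retained row, so replace it.
            heapq.heapreplace(heap, entry)
    return {
        part: [row for _, _, row in sorted(heap, key=lambda e: (-e[0], e[1]))]
        for part, heap in buffer.items()
    }


rows = [
    {"dept": "a", "salary": 10},
    {"dept": "a", "salary": 30},
    {"dept": "a", "salary": 20},
    {"dept": "b", "salary": 5},
]
result = top_k_per_partition(rows, "dept", "salary", k=2)
```

    Each row is examined once and the buffer never holds more than K entries per partition, so memory depends on the number of partitions and K, not on the number of rows.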
