Indexed file system and a method and a mechanism for accessing data records from such a system
    Granted invention patent
    Legal status: Lapsed

    Publication No.: US06292795B1

    Publication date: 2001-09-18

    Application No.: US09250760

    Filing date: 1999-02-16

    IPC classification: G06F1202

    Abstract: A computer filing system includes a data access and allocation mechanism including a directory and a plurality of indexed data files or hash tables. The directory is preferably a radix tree including directory entries which contain pointers to respective ones of the hash tables. Using a plurality of hash tables avoids the whole database ever having to be re-hashed all at once. If a hash table exceeds a preset maximum size as data is added, it is replaced by two hash tables and the directory is updated to include two separate directory entries each containing a pointer to one of the new hash tables. The directory is locally extensible such that new levels are added to the directory only where necessary to distinguish between the hash tables. Local extensibility prevents unnecessary expansion of the size of the directory while also allowing the size of the hash tables to be controlled. This allows optimisation of the data access mechanism such that an optimal combination of directory-look-up and hashing processes is used. Additionally, if the number of keys mapped to an indexed data file is less than a threshold number (corresponding to the number of entries which can be held in a reasonable index), the index for the data file is built with a one-to-one relationship between keys and index entries such that each index entry identifies a data block holding data for only one key. This avoids the overhead of the collision detection of hashing when it ceases to be useful.
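
    The splitting behaviour described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class names (`Directory`, `Node`, `Leaf`), the binary radix, the SHA-256-derived key bits, the 32-bit depth cap and the tiny `MAX_TABLE_SIZE` are all assumptions chosen for illustration, and a plain `dict` stands in for each hash table. For simplicity the sketch splits a full table just before the insertion that would overflow it.

```python
import hashlib

MAX_TABLE_SIZE = 4   # assumed preset maximum; the abstract gives no value
KEY_BITS = 32        # assumed cap on directory depth


def key_bits(key: str) -> str:
    """Return a fixed-width bit string derived from the key (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return "".join(f"{byte:08b}" for byte in digest[:KEY_BITS // 8])


class Leaf:
    """Directory entry pointing at one hash table (a dict stands in for it)."""
    def __init__(self, depth):
        self.depth = depth   # number of key bits consumed to reach this leaf
        self.table = {}


class Node:
    """Internal radix-tree node with one child per bit value."""
    def __init__(self, zero, one):
        self.children = [zero, one]


class Directory:
    def __init__(self):
        self.root = Leaf(0)

    def _find(self, bits):
        """Walk the tree along the key bits; return (parent, slot, leaf)."""
        parent, slot, node, depth = None, 0, self.root, 0
        while isinstance(node, Node):
            parent, slot = node, int(bits[depth])
            node = node.children[slot]
            depth += 1
        return parent, slot, node

    def get(self, key):
        return self._find(key_bits(key))[2].table.get(key)

    def put(self, key, value):
        bits = key_bits(key)
        while True:
            parent, slot, leaf = self._find(bits)
            has_room = len(leaf.table) < MAX_TABLE_SIZE
            if key in leaf.table or has_room or leaf.depth >= KEY_BITS:
                leaf.table[key] = value
                return
            # Table is full: split it, then retry against the new leaves.
            self._split(parent, slot, leaf)

    def _split(self, parent, slot, leaf):
        """Replace one full table by two; only its own entries are moved."""
        zero, one = Leaf(leaf.depth + 1), Leaf(leaf.depth + 1)
        for k, v in leaf.table.items():
            target = one if key_bits(k)[leaf.depth] == "1" else zero
            target.table[k] = v
        node = Node(zero, one)   # a new level, added only at this position
        if parent is None:
            self.root = node
        else:
            parent.children[slot] = node

    def leaves(self):
        """Yield every leaf, i.e. every hash table currently in use."""
        stack = [self.root]
        while stack:
            node = stack.pop()
            if isinstance(node, Node):
                stack.extend(node.children)
            else:
                yield node


directory = Directory()
for i in range(50):
    directory.put(f"record-{i}", i)
```

    In this sketch a split touches only the entries of the overflowing table, and the tree deepens only along the path that leads to it, which is the sense in which the directory is locally extensible. The one-to-one index used for small key counts is not modelled here.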
