Star and snowflake schemas in extract, transform, load processes

    Publication No.: US09298787B2

    Publication date: 2016-03-29

    Application No.: US13292234

    Filing date: 2011-11-09

    IPC classification: G06F17/30

    Abstract: A computer-implemented method, computer program product and a system for supporting star and snowflake data schemas for use with an Extract, Transform, Load (ETL) process, comprising selecting a data source comprising dimensional data, where the dimensional data comprises at least one source table comprising at least one source column; importing a data model for the dimensional data into a data integration system; analyzing the imported data model to select a star or snowflake target data schema comprising target dimensions and target facts; generating a meta-model representation by mapping at least one source table or source column to each target fact and target dimension; automatically converting the meta-model representation into one or more ETL jobs; and executing the ETL jobs to extract the dimensional data from the data source and load it into the selected target data schema in a target data system.
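
    The Python sketch below is a rough illustration of the meta-model step. Each target fact or dimension is represented as a mapping from target columns to source columns, and each mapping is turned into an INSERT ... SELECT statement. Several details are assumptions made for illustration: the names `TargetTable` and `generate_etl_jobs`, the one-source-table-per-target restriction, and the use of SQL as the job format. The patent does not say how its ETL jobs are represented. Dimensions are emitted before facts, on the assumption that fact loads depend on dimension rows already being present.

```python
from dataclasses import dataclass, field


@dataclass
class TargetTable:
    """A target fact or dimension in a hypothetical meta-model."""
    name: str
    kind: str  # "dimension" or "fact"
    # Target column -> "source_table.source_column"
    mapping: dict[str, str] = field(default_factory=dict)


def generate_etl_jobs(meta_model: list[TargetTable]) -> list[str]:
    """Convert each mapped target into one INSERT ... SELECT statement.

    Dimensions come first (sorted() is stable, so the relative order within
    each kind is kept) so that fact loads can rely on loaded dimensions.
    """
    ordered = sorted(meta_model, key=lambda t: 0 if t.kind == "dimension" else 1)
    jobs = []
    for target in ordered:
        source_tables = sorted({src.split(".", 1)[0] for src in target.mapping.values()})
        if len(source_tables) != 1:
            raise ValueError(f"{target.name}: this sketch needs exactly one source table")
        columns = ", ".join(target.mapping)
        expressions = ", ".join(target.mapping.values())
        jobs.append(
            f"INSERT INTO {target.name} ({columns}) "
            f"SELECT {expressions} FROM {source_tables[0]}"
        )
    return jobs


if __name__ == "__main__":
    model = [
        TargetTable("fact_sales", "fact",
                    {"product_id": "orders.product_id", "amount": "orders.total"}),
        TargetTable("dim_product", "dimension",
                    {"product_id": "products.id", "name": "products.name"}),
    ]
    for job in generate_etl_jobs(model):
        print(job)
```

    A snowflake target would add more dimension entries, one per normalized level. Ordering those levels parent-before-child is not handled here.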

    3. Slowly Changing Dimension Attributes in Extract, Transform, Load Processes
    Type: Invention patent application
    Status: Pending (published)

    Publication No.: US20130124454A1

    Publication date: 2013-05-16

    Application No.: US13618158

    Filing date: 2012-09-14

    IPC classification: G06F7/00 G06F17/30

    CPC classification: G06F17/30563

    Abstract: A computer-implemented method, computer program product and a system for identifying and handling slowly changing dimension (SCD) attributes for use with an Extract, Transform, Load (ETL) process, comprising importing a data model for dimensional data into a data integration system, where the dimensional data comprises a plurality of attributes; identifying, via a data discovery analyzer, one or more attributes in the data model as SCD attributes; importing the identified SCD attributes into the data integration system; selecting a data source comprising dimensional data; automatically generating an ETL job for the dimensional data utilizing the imported SCD attributes; and executing the automatically generated ETL job to extract the dimensional data from the data source and load it into the imported SCD attributes in a target data system.

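    The abstract does not say how the generated ETL job treats an attribute once it has been identified as an SCD attribute. A common convention, assumed here and not taken from the patent, is Kimball-style handling:

    - A change to a Type 2 attribute closes the current row and appends a new version.
    - A change to any other attribute overwrites the current row in place (Type 1).

    The sketch below applies that convention to an in-memory dimension. `DimRow`, `apply_scd` and the `valid_from`/`valid_to` columns are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    key: str                         # natural (business) key
    attrs: dict                      # attribute name -> value
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_scd(dimension, incoming, scd2_attrs, load_date):
    """Load `incoming` (key -> attribute dict) into a copy of `dimension`.

    A change in any attribute listed in `scd2_attrs` closes the current
    version and appends a new one (Type 2); otherwise the current version
    is overwritten in place (Type 1). The input list is not modified.
    """
    result = [replace(row, attrs=dict(row.attrs)) for row in dimension]
    current = {row.key: row for row in result if row.valid_to is None}
    for key, new_attrs in incoming.items():
        row = current.get(key)
        if row is None:
            # New member: its first version starts at the load date.
            result.append(DimRow(key, dict(new_attrs), load_date))
        elif any(row.attrs.get(a) != new_attrs.get(a) for a in scd2_attrs):
            row.valid_to = load_date  # close the old version
            result.append(DimRow(key, dict(new_attrs), load_date))
        else:
            row.attrs.update(new_attrs)  # Type 1 overwrite of the current row
    return result
```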

    4. Speed selective table scan operation
    Type: Granted invention patent
    Status: Lapsed

    Publication No.: US07937541B2

    Publication date: 2011-05-03

    Application No.: US11548889

    Filing date: 2006-10-12

    IPC classification: G06F12/06

    Abstract: Disclosed are a method, information processing system, and computer readable medium for scanning a storage medium table. The method includes retrieving location information associated with at least one other storage medium table scan. A storage medium table scan is started at a location within a storage medium table based on at least the location of the other storage medium table scan. A weight is assigned to at least one storage medium block based on at least the current scanning location within the storage medium table relative to the location of the other table scan. The method determines whether the distance between the current scanning location and the location of the other table scan is greater than a first given threshold. In response to the distance being greater than the first threshold, the current scanning operation is delayed until the distance falls below a second given threshold.

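    The two thresholds in this abstract form a hysteresis band. A scan that runs too far ahead of another scan is held back, and it is released only once the gap drops below a second, smaller threshold, so it does not toggle on every block.

    The sketch below rests on three assumptions: block positions are integers, only the leading scan is delayed, and a higher weight means "keep this block in the buffer longer". `ScanThrottle` and `block_weight` are hypothetical names. The weighting formula is an illustrative choice, not the one claimed.

```python
class ScanThrottle:
    """Delay decision for one scan relative to another scan of the same table."""

    def __init__(self, first_threshold: int, second_threshold: int):
        if second_threshold > first_threshold:
            raise ValueError("second threshold must not exceed the first")
        self.first = first_threshold
        self.second = second_threshold
        self.delayed = False

    def should_delay(self, current: int, other: int) -> bool:
        # Signed distance: positive when this scan is ahead of the other one.
        distance = current - other
        if not self.delayed and distance > self.first:
            self.delayed = True   # too far ahead: start waiting
        elif self.delayed and distance < self.second:
            self.delayed = False  # the other scan has caught up: resume
        return self.delayed


def block_weight(block: int, current: int, other: int) -> float:
    """Blocks this scan has read but the trailing scan has not yet reached
    will be reused soon; the nearer the trailing scan, the higher the weight."""
    if other <= block < current:
        return 1.0 / (1 + block - other)
    return 0.0
```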

    5. Increasing buffer locality during multiple table access operations
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20080144128A1

    Publication date: 2008-06-19

    Application No.: US11548875

    Filing date: 2006-10-12

    IPC classification: H04N1/04

    Abstract: Disclosed are a method, information processing system, and computer readable medium for managing table scan processes. The method includes monitoring a plurality of storage medium table scan processes. Each of these scan processes is placed into one of a plurality of scan groups based on the storage medium pages it is to scan. The storage medium table scan processes in a scan group can share data within a storage medium page.

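    One simple way to form scan groups, assumed here rather than taken from the patent, is to treat each scan as the inclusive range of pages it has to read. Scans whose ranges overlap go into the same group, since those are the scans that could share a buffered page. `group_scans` is a hypothetical name.

```python
def group_scans(scans: dict[str, tuple[int, int]]) -> list[list[str]]:
    """Group scans whose page ranges overlap.

    `scans` maps a scan id to its (first_page, last_page) range, inclusive.
    Ranges are visited in order of first page, and a scan joins the current
    group when it starts at or before the group's furthest last page.
    """
    groups: list[list[str]] = []
    group_end = -1
    for scan_id, (first, last) in sorted(scans.items(), key=lambda kv: kv[1]):
        if groups and first <= group_end:
            groups[-1].append(scan_id)
            group_end = max(group_end, last)
        else:
            groups.append([scan_id])
            group_end = last
    return groups
```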

    6. Method for merging multiple ranked lists with bounded memory
    Type: Invention patent application
    Status: Pending (published)

    Publication No.: US20060190425A1

    Publication date: 2006-08-24

    Application No.: US11064605

    Filing date: 2005-02-24

    IPC classification: G06F17/30

    CPC classification: G06F16/5838 G06F16/24549

    Abstract: Systems and methods are provided for conducting attribute-based queries over a plurality of objects using bounded memory locations while minimizing costly input and output operations. A plurality of attributes are associated with each object, and a plurality of data groups is created, one for each identified attribute. The objects associated with the attributes are placed into the appropriate data groups, and the objects within each data group are sorted into blocks such that each block within a given attribute contains the objects having the same attribute value. Query results are created by loading blocks into a primary memory location in a middleware system and combining the loaded blocks to create the desired query results. Block combinations are created based on how well a given block combination fits the query as expressed in an aggregation function. A second dedicated memory location can also be provided to hold multiple block combinations, optimizing the order in which blocks are loaded and combined. Empty block buffers and external storage devices can also be provided to further enhance the generation of query results.

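    The sketch below assumes that every object falls into exactly one block per attribute and that the aggregation function is monotone (a sum is used here). Under those assumptions, block combinations can be visited best-first with a priority queue, and the objects common to all blocks of a combination come out in ranked order.

    This is a standard best-first enumeration, offered only to illustrate "combining blocks according to the aggregation function". It is not the patent's algorithm. It also keeps every block in memory instead of loading blocks under a memory bound. `ranked_objects` and the block layout are assumptions.

```python
import heapq


def ranked_objects(block_lists, k):
    """Return up to k (score, object) pairs in descending score order.

    block_lists holds one list per attribute; each list contains
    (value, set_of_object_ids) blocks sorted by value, highest first, and
    every object appears in exactly one block per attribute. An object's
    score is the sum of its attribute values.
    """
    def score(combo):
        return sum(block_lists[a][i][0] for a, i in enumerate(combo))

    start = (0,) * len(block_lists)
    heap = [(-score(start), start)]
    seen = {start}
    results = []
    while heap and len(results) < k:
        neg_score, combo = heapq.heappop(heap)
        # Objects matching this combination are those in every chosen block.
        blocks = [block_lists[a][i][1] for a, i in enumerate(combo)]
        for obj in sorted(set.intersection(*blocks)):
            results.append((-neg_score, obj))
        # Successors: move one attribute to its next, lower-valued block.
        for a in range(len(combo)):
            if combo[a] + 1 < len(block_lists[a]):
                nxt = combo[:a] + (combo[a] + 1,) + combo[a + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
    return results[:k]
```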

    7. Transparent edge-of-network data cache
    Type: Granted invention patent
    Status: In force

    Publication No.: US06950823B2

    Publication date: 2005-09-27

    Application No.: US10328229

    Filing date: 2002-12-23

    IPC classification: G06F7/00 G06F15/16 G06F17/30

    Abstract: A system, apparatus and method are provided for the dynamic caching of data based on queries performed by a local application. The system includes a remote server having a complete database; a local database on an edge server, including a subset of the complete database, the edge server being in communication with the remote server; and shared tables within the local database on the edge server for caching results from the complete database, receiving locally generated data, and adjusting the contents of the cache based on available storage requirements while ensuring consistency of the data between the local database and the remote database. The apparatus includes an edge data cache comprising a query evaluator, together with a cache index, cache repository, resource manager, containment checker, query parser and consistency manager, all in signal communication with the query evaluator. The method by which a local server satisfies a database query meant for at least one remote server includes dynamically caching results of previous database queries of the remote server, associating a local database with the local server, storing a plurality of the caching results in shared tables in the local database, and using the plurality of caching results to satisfy a new database query with the local server.

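    The containment checker is what lets the edge server answer a new query locally. If the new query's result is provably a subset of a cached result, the edge server filters the cached rows instead of contacting the remote server.

    The sketch below handles only single-table range predicates of the form `lo <= column <= hi`. This is an assumption that reduces containment to a simple interval test. `EdgeCache` and `fetch_remote` are hypothetical names. Consistency management and eviction by the resource manager are omitted.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Fetch = Callable[[str, str, int, int], List[Row]]


class EdgeCache:
    """Caches results of queries like: SELECT * FROM table WHERE lo <= column <= hi."""

    def __init__(self, fetch_remote: Fetch):
        self.fetch_remote = fetch_remote
        self.entries: List[Tuple[str, str, int, int, List[Row]]] = []
        self.remote_calls = 0

    def query(self, table: str, column: str, lo: int, hi: int) -> List[Row]:
        for t, c, cached_lo, cached_hi, rows in self.entries:
            # Containment check: the new range lies inside a cached range.
            if t == table and c == column and cached_lo <= lo and hi <= cached_hi:
                return [r for r in rows if lo <= r[column] <= hi]
        # Cache miss: forward the query to the remote server and keep the result.
        rows = self.fetch_remote(table, column, lo, hi)
        self.remote_calls += 1
        self.entries.append((table, column, lo, hi, rows))
        return rows
```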

    8. Method, computer program product, and system converting relational data into hierarchical data structure based upon tagging trees
    Type: Invention patent application
    Status: Lapsed

    Publication No.: US20050138052A1

    Publication date: 2005-06-23

    Application No.: US10788141

    Filing date: 2004-02-26

    IPC classification: G06F17/00 G06F17/30

    Abstract: Tagging trees are generated and used to facilitate transforming data from relational databases into hierarchical formats, such as XML documents. Tagging trees contain both XML hierarchical structure information and the query information needed to access different data sources, e.g., databases, and retrieve the information to be placed in the hierarchical structure. A designer optionally creates a mapping script that specifies the transformation from relational databases to the hierarchical format. A tagging tree is created either by parsing that mapping script or by other means. A runtime environment then processes the tagging tree by a depth-first traversal. The runtime environment can be configured to output a hierarchical data object, such as an XML document, or to pipeline its output to drive, for example, SAX processing.

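    A tagging tree pairs each XML tag with the query that supplies its content, and the runtime walks the tree depth first. The sketch below makes two assumptions:

    - Each node's SQL may reference the parent row's columns as named parameters.
    - Each result row becomes one `xml.etree.ElementTree` element, with its columns as attributes.

    `TagNode` and `build` are hypothetical names. Unlike the runtime described above, this version builds the whole document in memory rather than pipelining events to a SAX consumer.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagNode:
    """One tagging-tree node: an XML tag plus the query that feeds it."""
    tag: str
    sql: str
    children: List["TagNode"] = field(default_factory=list)


def build(conn: sqlite3.Connection, node: TagNode, parent: ET.Element, params: dict) -> None:
    """Depth-first traversal: one element per result row, then recurse
    into child nodes with that row's columns bound as query parameters."""
    for row in conn.execute(node.sql, params):
        values = {k: row[k] for k in row.keys()}  # requires sqlite3.Row rows
        element = ET.SubElement(parent, node.tag, {k: str(v) for k, v in values.items()})
        for child in node.children:
            build(conn, child, element, values)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.executescript("""
        CREATE TABLE dept (dept_id INTEGER, name TEXT);
        CREATE TABLE emp (emp_id INTEGER, dept_id INTEGER, name TEXT);
        INSERT INTO dept VALUES (1, 'R&D'), (2, 'Sales');
        INSERT INTO emp VALUES (10, 1, 'Ana'), (11, 1, 'Bo'), (12, 2, 'Cy');
    """)
    tree = TagNode("dept", "SELECT dept_id, name FROM dept ORDER BY dept_id", [
        TagNode("emp", "SELECT emp_id, name FROM emp WHERE dept_id = :dept_id ORDER BY emp_id"),
    ])
    root = ET.Element("company")
    build(conn, tree, root, {})
    print(ET.tostring(root, encoding="unicode"))
```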