摘要:
Disclosed are a method and system for finding an optimal data layout. The approach of the present invention is to try one of several data layouts in the memory, measure the impact of said one data layout on a performance of a program, and decide which of said several data layouts to try next. The trying and measuring steps are repeated, and one of said several data layouts is selected as best or optimal based on the measurings. The preferred embodiment of the invention provides layout auditing, a framework for picking the best data layout online without requiring any user input. Layout auditing optimizes data layouts with a try-measure-decide feedback loop: use a data reorganizer to try one of several data layouts, use a profiler to measure the impact of the data layout on performance, and use a controller to decide which data layout to try next.
摘要:
A temporal profiling framework useful for dynamic optimization with hot data stream prefetching provides profiling of longer bursts and lower overhead. For profiling longer bursts, the framework employs a profiling phase counter, as well as a checking phase counter, to control transitions to and from instrumented code for sampling bursts of a program execution trace. The temporal profiling framework further intelligently eliminates some checks at procedure entries and loop back-edges, while still avoiding unbounded execution without executing checks for transition to and from instrumented code. Fast hot data stream detection analyzes a grammar of a profiled data reference sequence, calculating a heat metric for recurring subsequences based on length and number of unique occurrences outside of other hot data streams in the sequence with sufficiently low-overhead to permit use in a dynamic optimization framework.
摘要:
A temporal profiling framework useful for dynamic optimization with hot data stream prefetching provides profiling of longer bursts and lower overhead. For profiling longer bursts, the framework employs a profiling phase counter, as well as a checking phase counter, to control transitions to and from instrumented code for sampling bursts of a program execution trace. The temporal profiling framework further intelligently eliminates some checks at procedure entries and loop back-edges, while still avoiding unbounded execution without executing checks for transition to and from instrumented code. Fast hot data stream detection analyzes a grammar of a profiled data reference sequence, calculating a heat metric for recurring subsequences based on length and number of unique occurrences outside of other hot data streams in the sequence with sufficiently low-overhead to permit use in a dynamic optimization framework.
摘要:
A garbage collection algorithm that achieves hierarchical copy order with parallel garbage collection threads. More specifically, the present invention provides a garbage collection method and system for copying objects from a from-space to a to-space. The method comprises the steps of (a) having multiple threads that simultaneously perform work for garbage collection (GC), (b) examining the placement of objects on blocks, and (c) changing the placement of objects on blocks based on step (b). Preferably, the method includes the additional step of calculating a placement of object(s) based on step (b), and using the result of the calculation for step (c). For example, the calculation may be used to increase the frequency of intra-block pointers and/or to increase the frequency of siblings on the same block.
摘要:
Disclosed is a garbage collection algorithm that achieves hierarchical copy order with parallel garbage collection threads. More specifically, the present invention provides a garbage collection method and system for copying objects from a from-space to a to-space. The method comprises the steps of (a) having multiple threads that simultaneously perform work for garbage collection (GC), (b) examining the placement of objects on blocks, and (c) changing the placement of objects on blocks based on step (b). Preferably, the method includes the additional step of calculating a placement of object(s) based on step (b), and using the result of the calculation for step (c). For example, the calculation may be used to increase the frequency of intra-block pointers and/or to increase the frequency of siblings on the same block.
摘要:
A method and system for creating and injecting code into a running program that identifies a hot data stream, and prefetching data elements in the stream so they are available when needed by the processor. The injected code identifies the first few elements in a hot data stream (i.e. the prefix), and prefetches the balance of the elements in the stream (i.e., the suffix). Since the hot data stream identification code and prefetch code is injected at run time, pointer related time-dependencies inherent in earlier prefetch systems are eliminated. A global deterministic finite state machine (DFSM) is used to help create conceptual logic used to generate the code injected into the program for prefix detection.
摘要:
Disclosed is a garbage collection algorithm that achieves hierarchical copy order with parallel garbage collection threads. More specifically, the present invention provides a garbage collection method and system for copying objects from a from-space to a to-space. The method comprises the steps of (a) having multiple threads that simultaneously perform work for garbage collection (GC), (b) examining the placement of objects on blocks, and (c) changing the placement of objects on blocks based on step (b). Preferably, the method includes the additional step of calculating a placement of object(s) based on step (b), and using the result of the calculation for step (c). For example, the calculation may be used to increase the frequency of intra-block pointers and/or to increase the frequency of siblings on the same block.
摘要:
A method and system for creating and injecting code into a running program that identifies a hot data stream, and prefetching data elements in the stream so they are available when needed by the processor. The injected code identifies the first few elements in a hot data stream (i.e. the prefix), and prefetches the balance of the elements in the stream (i.e., the suffix). Since the hot data stream identification code and prefetch code is injected at run time, pointer related time-dependencies inherent in earlier prefetch systems are eliminated. A global deterministic finite state machine (DFSM) is used to help create conceptual logic used to generate the code injected into the program for prefix detection.
摘要:
A garbage collection algorithm that achieves hierarchical copy order with parallel garbage collection threads. More specifically, the present invention provides a garbage collection method and system for copying objects from a from-space to a to-space. The method comprises the steps of (a) having multiple threads that simultaneously perform work for garbage collection (GC), (b) examining the placement of objects on blocks, and (c) changing the placement of objects on blocks based on step (b). Preferably, the method includes the additional step of calculating a placement of object(s) based on step (b), and using the result of the calculation for step (c). For example, the calculation may be used to increase the frequency of intra-block pointers and/or to increase the frequency of siblings on the same block.
摘要:
Disclosed is a garbage collection algorithm that achieves hierarchical copy order with parallel garbage collection threads. More specifically, the present invention provides a garbage collection method and system for copying objects from a from-space to a to-space. The method comprises the steps of (a) having multiple threads that simultaneously perform work for garbage collection (GC), (b) examining the placement of objects on blocks, and (c) changing the placement of objects on blocks based on step (b). Preferably, the method includes the additional step of calculating a placement of object(s) based on step (b), and using the result of the calculation for step (c). For example, the calculation may be used to increase the frequency of intra-block pointers and/or to increase the frequency of siblings on the same block.