Abstract:
A system and method of parallelizing programs assigns write tokens and read tokens to data objects accessed by computational operations. At run time, the write sets and read sets of the computational operations are resolved, and each computational operation is executed only after it has obtained the necessary tokens for the data objects in its resolved write and read sets. A data object may have unlimited read tokens but only a single write token, and the write token may be released only if no read tokens are outstanding. Each data object provides a wait list that serves as an ordered queue for computational operations waiting for tokens.
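The token rules described in this abstract can be illustrated with a minimal, single-threaded sketch. The class names `DataObject` and `Operation`, the `pending` counter, and the rule that a new request queues behind earlier waiters are illustrative assumptions, not details taken from the patent.

```python
from collections import deque


class DataObject:
    """Hypothetical data object: unlimited read tokens, one write token."""

    def __init__(self, name):
        self.name = name
        self.readers = 0           # number of outstanding read tokens
        self.writer = None         # operation currently holding the write token
        self.wait_list = deque()   # ordered queue of (operation, mode)

    def _can_grant(self, mode):
        if mode == "read":
            return self.writer is None
        # The write token is handed out only if no read tokens are outstanding.
        return self.writer is None and self.readers == 0

    def _grant(self, op, mode):
        if mode == "read":
            self.readers += 1
        else:
            self.writer = op
        op.pending -= 1

    def request(self, op, mode):
        # Assumption: a request never overtakes operations already waiting.
        if not self.wait_list and self._can_grant(mode):
            self._grant(op, mode)
        else:
            self.wait_list.append((op, mode))

    def release(self, mode):
        if mode == "read":
            self.readers -= 1
        else:
            self.writer = None
        # Serve waiters strictly from the head of the wait list.
        while self.wait_list and self._can_grant(self.wait_list[0][1]):
            waiter, waiter_mode = self.wait_list.popleft()
            self._grant(waiter, waiter_mode)


class Operation:
    """Hypothetical computational operation with disjoint read and write sets."""

    def __init__(self, name, read_set=(), write_set=()):
        self.name = name
        self.read_set = list(read_set)
        self.write_set = list(write_set)
        self.pending = 0           # tokens still to be obtained

    def acquire(self):
        self.pending = len(self.read_set) + len(self.write_set)
        for obj in self.write_set:
            obj.request(self, "write")
        for obj in self.read_set:
            obj.request(self, "read")

    @property
    def ready(self):
        return self.pending == 0

    def finish(self):
        for obj in self.write_set:
            obj.release("write")
        for obj in self.read_set:
            obj.release("read")


if __name__ == "__main__":
    a = DataObject("a")
    ops = [Operation("w1", write_set=[a]), Operation("r1", read_set=[a]),
           Operation("r2", read_set=[a]), Operation("w2", write_set=[a])]
    for op in ops:
        op.acquire()
    print([op.name for op in ops if op.ready])   # only the first writer may run
    ops[0].finish()
    print([op.name for op in ops if op.ready])   # both readers may now run together
```

In this sketch an operation would be dispatched to a processor once `ready` becomes true; the dispatching itself is left out.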
Abstract:
Execution of a computer program on a multiprocessor system is monitored to detect possible excess parallelism that causes resource contention and similar problems. In response, the number of processors applied to parallelize program components is controllably limited.
Abstract:
A system and method of parallelizing programs employs runtime instructions to identify data accessed by program portions and to assign those program portions to particular processors based on potential overlap between the accessed data. Data dependence between different program portions may be identified and used to look for pending “predicate” program portions that could create data dependencies, and to postpone program portions that may be dependent while permitting parallel execution of other program portions.
Abstract:
A computer architecture allowing reuse of previously determined instruction results indexes those results according to instruction addresses. The continued validity of operand values in registers or memory for the instructions is determined prior to the fetching of any given instruction by an invalidation system that detects an intervening register or memory write. Thus, the need to evaluate the operand values themselves, which would delay execution, is avoided. In one embodiment, dependencies for operands between instructions are recorded so as to avoid invalidating instructions having operand register or memory locations which are overwritten when the overwriting will be corrected by an intervening instruction immediately preceding the dependent instructions.
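A small software model may help make the reuse scheme concrete. It keeps results indexed by instruction address and invalidates an entry as soon as one of its source registers is written, so a later lookup never has to compare operand values. The names `ReuseBuffer` and `execute` are hypothetical. Only register operands are modeled, and the dependency-recording embodiment is not covered.

```python
class ReuseBuffer:
    """Hypothetical reuse buffer indexed by instruction address (pc)."""

    def __init__(self):
        self.entries = {}  # pc -> (result, frozenset of source registers)

    def record(self, pc, src_regs, result):
        self.entries[pc] = (result, frozenset(src_regs))

    def on_register_write(self, reg):
        # Invalidation: drop every entry that read the overwritten register.
        stale = [pc for pc, (_, srcs) in self.entries.items() if reg in srcs]
        for pc in stale:
            del self.entries[pc]

    def lookup(self, pc):
        """Return (hit, result); a hit means the stored result is still valid."""
        if pc in self.entries:
            return True, self.entries[pc][0]
        return False, None


def execute(buf, regs, pc, dst, srcs, fn):
    """Run one instruction `dst = fn(*srcs)`; return (result, reused)."""
    hit, result = buf.lookup(pc)
    if not hit:
        result = fn(*(regs[s] for s in srcs))
        buf.record(pc, srcs, result)
    regs[dst] = result
    # Writing dst invalidates entries that use dst as a source
    # (including this one, if dst is also one of its sources).
    buf.on_register_write(dst)
    return result, hit


if __name__ == "__main__":
    buf = ReuseBuffer()
    regs = {"r1": 2, "r2": 3, "r3": 0}

    def add(x, y):
        return x + y

    print(execute(buf, regs, 0x10, "r3", ("r1", "r2"), add))  # computed
    print(execute(buf, regs, 0x10, "r3", ("r1", "r2"), add))  # reused
    regs["r1"] = 10
    buf.on_register_write("r1")                               # intervening write
    print(execute(buf, regs, 0x10, "r3", ("r1", "r2"), add))  # recomputed
```

The point of the design is that validity is tracked by writes as they occur, so the lookup itself is a pure address match.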
Abstract:
An over-provisioned multicore processor employs more cores than can simultaneously run within the power envelope of the processor, enabling advanced processor control techniques for more efficient workload execution, despite significantly decreasing the duty cycle of the active cores so that on average a full core or more may not be operating.
Abstract:
A predictor circuit permits advanced execution of instructions that depend for their data on previous instructions by predicting such dependencies based on previous mis-speculations detected at the final stages of processing. Synchronization of dependent instructions is provided by a table that creates an entry for each instance of potential dependency. Table entries are created and deleted dynamically to limit total memory requirements.
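As a rough software analogy for the predictor and its synchronization table, the sketch below remembers store/load pairs that previously mis-speculated and creates a table entry only while one side of a predicted pair is waiting for the other. The class `DependencePredictor`, its method names, and the use of an `instance` number to distinguish dynamic instances are assumptions made for illustration.

```python
class DependencePredictor:
    """Hypothetical model: predict dependencies from past mis-speculations."""

    def __init__(self):
        self.pairs = set()  # (store_pc, load_pc) pairs that mis-speculated before
        self.sync = {}      # (store_pc, load_pc, instance) -> who arrived first

    def record_misspeculation(self, store_pc, load_pc):
        # Called when a late pipeline stage detects that the load ran too early.
        self.pairs.add((store_pc, load_pc))

    def load_ready(self, store_pc, load_pc, instance):
        """The load arrives. Return True if it may execute now."""
        if (store_pc, load_pc) not in self.pairs:
            return True  # no dependency predicted: execute speculatively
        key = (store_pc, load_pc, instance)
        if self.sync.pop(key, None) == "store_done":
            return True  # store already executed; entry deleted
        self.sync[key] = "load_waiting"  # entry created; the load must wait
        return False

    def store_done(self, store_pc, load_pc, instance):
        """The store executes. Return True if a waiting load is released."""
        if (store_pc, load_pc) not in self.pairs:
            return False
        key = (store_pc, load_pc, instance)
        if self.sync.pop(key, None) == "load_waiting":
            return True  # waiting load released; entry deleted
        self.sync[key] = "store_done"  # remember the store for a later load
        return False


if __name__ == "__main__":
    p = DependencePredictor()
    print(p.load_ready(0x20, 0x40, 0))   # nothing learned yet, load runs early
    p.record_misspeculation(0x20, 0x40)
    print(p.load_ready(0x20, 0x40, 1))   # now predicted dependent, load waits
    print(p.store_done(0x20, 0x40, 1))   # store releases the waiting load
    print(len(p.sync))                   # table is empty again
```

Because entries exist only between the arrival of the first and second instruction of a pair, the table stays small, which mirrors the dynamic creation and deletion the abstract mentions.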
Abstract:
Parallelization of a program is performed by creating a distilled version of the program having higher execution speed but with unverified execution. The distilled program is executed rapidly to create state snapshots of the program that may be forwarded to secondary processors for execution of the actual program in parallel with other secondary processors similarly allocated. Each state snapshot is verified as the task is executed on a secondary processor by the preceding processor. The degree of parallelization is limited only by the speedup of the distilled program.
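The snapshot-and-verify idea can be sketched on a toy program. Here the "actual program" is a running total that resets on negative input, and the "distilled program" is an assumed shortcut that ignores the rare reset, so its snapshots are fast but unverified. All function names, the toy workload, and the use of a thread pool to stand in for secondary processors are illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor


def real_step(state, x):
    """Actual program: running total that resets on a (rare) negative input."""
    return 0 if x < 0 else state + x


def run_task(start_state, chunk):
    """Execute the actual program over one chunk from a given start state."""
    state = start_state
    for x in chunk:
        state = real_step(state, x)
    return state


def distilled_snapshots(initial, chunks):
    """Distilled program: skips the rare reset, so its results are unverified."""
    snaps = [initial]
    for chunk in chunks[:-1]:
        snaps.append(snaps[-1] + sum(chunk))
    return snaps


def run_parallel(initial, chunks):
    """Return (final_state, number_of_tasks_reexecuted)."""
    snaps = distilled_snapshots(initial, chunks)
    with ThreadPoolExecutor() as pool:
        # Each task runs the actual program from its predicted snapshot.
        ends = list(pool.map(run_task, snaps, chunks))
    state, reexecuted = initial, 0
    for i, chunk in enumerate(chunks):
        if snaps[i] == state:
            state = ends[i]  # snapshot verified by the preceding task's result
        else:
            state = run_task(state, chunk)  # bad snapshot: redo from true state
            reexecuted += 1
    return state, reexecuted


if __name__ == "__main__":
    print(run_parallel(0, [[1, 2], [3, 4], [5, 6]]))    # all snapshots verify
    print(run_parallel(0, [[1, 2], [3, -1], [5, 6]]))   # one snapshot is wrong
```

The verified end state of each task is compared with the snapshot handed to the next task, so a wrong prediction costs a re-execution but never a wrong result.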
Abstract:
A data dependence prediction technique is used to establish a linkage between two instructions that use the same data so that accessing the data from memory may be bypassed. Instead, the data retrieved by the first data-using instruction is temporarily stored in a local register to be used by the second data-using instruction. Squashing techniques from parallel processing are used in the event that the prediction is erroneous.