William Stallings, Computer Organization and Architecture, 6th Edition - Chapter 14: Instruction Level Parallelism and Superscalar Processors


What is Superscalar?
  - Common instructions (arithmetic, load/store, conditional branch) can be initiated and executed independently
  - Equally applicable to RISC & CISC
  - In practice usually RISC

Why Superscalar?
  - Most operations are on scalar quantities (see RISC notes)
  - Improve these operations to get an overall improvement

General Superscalar Organization (Diagram)

Superpipelined
  - Many pipeline stages need less than half a clock cycle
  - Double internal clock speed gets two tasks per external clock cycle
  - Superscalar allows parallel fetch and execute

Superscalar v Superpipeline (Diagram)

Limitations
  - Instruction level parallelism
  - Compiler based optimisation
  - Hardware techniques
  - Limited by
      - True data dependency
      - Procedural dependency
      - Resource conflicts
      - Output dependency
      - Antidependency

True Data Dependency
  - ADD r1, r2 (r1 := r1+r2;)
  - MOVE r3, r1 (r3 := r1;)
  - Can fetch and decode second instruction in parallel with first
  - Can NOT execute second instruction until first is finished

Procedural Dependency
  - Cannot execute instructions after a branch in parallel with instructions before a branch
  - Also, if instruction length is not fixed, instructions have to be decoded to find out how many fetches are needed
  - This prevents simultaneous fetches

Resource Conflict
  - Two or more instructions requiring access to the same resource at the same time
      - e.g. two arithmetic instructions
  - Can duplicate resources
      - e.g. have two arithmetic units

Effect of Dependencies (Diagram)

Design Issues
  - Instruction level parallelism
      - Instructions in a sequence are independent
      - Execution can be overlapped
      - Governed by data and procedural dependency
  - Machine parallelism
      - Ability to take advantage of instruction level parallelism
      - Governed by number of parallel pipelines

Instruction Issue Policy
  - Order in which instructions are fetched
  - Order in which instructions are executed
  - Order in which instructions change registers and memory

In-Order Issue In-Order Completion
  - Issue instructions in the order they occur
  - Not very efficient
  - May fetch >1 instruction
  - Instructions must stall if necessary

In-Order Issue In-Order Completion (Diagram)

In-Order Issue Out-of-Order Completion
  - Output dependency
      - R3 := R3 + R5; (I1)
      - R4 := R3 + 1; (I2)
      - R3 := R5 + 1; (I3)
      - I2 depends on result of I1 - true data dependency
      - If I3 completes before I1, I1's later write leaves the wrong value in R3 - output (write-after-write) dependency

In-Order Issue Out-of-Order Completion (Diagram)

Out-of-Order Issue Out-of-Order Completion
  - Decouple decode pipeline from execution pipeline
  - Can continue to fetch and decode until this pipeline is full
  - When a functional unit becomes available an instruction can be executed
  - Since instructions have been decoded, processor can look ahead

Out-of-Order Issue Out-of-Order Completion (Diagram)

Antidependency
  - Write-after-read dependency
      - R3 := R3 + R5; (I1)
      - R4 := R3 + 1; (I2)
      - R3 := R5 + 1; (I3)
      - R7 := R3 + R4; (I4)
  - I3 cannot complete before I2 starts, as I2 needs a value in R3 and I3 changes R3

Register Renaming
  - Output and antidependencies occur because register contents may not reflect the correct ordering from the program
  - May result in a pipeline stall
  - Registers allocated dynamically
      - i.e. registers are not specifically named

Register Renaming Example
  - R3b := R3a + R5a (I1)
  - R4b := R3b + 1 (I2)
  - R3c := R5a + 1 (I3)
  - R7b := R3c + R4b (I4)
  - A name without a subscript refers to the logical register in the instruction
  - A name with a subscript is the hardware register allocated
  - Note R3a, R3b, R3c: three hardware registers stand in for the one logical register R3

Machine Parallelism
  - Duplication of resources
  - Out-of-order issue
  - Renaming
  - Not worth duplicating functions without register renaming
  - Need instruction window large enough (more than 8)

Branch Prediction
  - 80486 fetches both next sequential instruction after branch and branch target instruction
  - Gives two cycle delay if branch taken

RISC - Delayed Branch
  - Calculate result of branch before unusable instructions pre-fetched
  - Always execute single instruction immediately following branch
  - Keeps pipeline full while fetching new instruction stream
  - Not as good for superscalar
      - Multiple instructions need to execute in delay slot
      - Instruction dependence problems
  - Revert to branch prediction

Superscalar Execution (Diagram)

Superscalar Implementation
  - Simultaneously fetch multiple instructions
  - Logic to determine true dependencies involving register values
  - Mechanisms to communicate these values
  - Mechanisms to initiate multiple instructions in parallel
  - Resources for parallel execution of multiple instructions
  - Mechanisms for committing process state in correct order

Pentium 4
  - 80486 - CISC
  - Pentium - some superscalar components
      - Two separate integer execution units
  - Pentium Pro - full-blown superscalar
  - Subsequent models refine & enhance superscalar design

Pentium 4 Block Diagram

Pentium 4 Operation
  - Fetch instructions from memory in order of static program
  - Translate each instruction into one or more fixed length RISC instructions (micro-operations)
  - Execute micro-ops on superscalar pipeline
      - micro-ops may be executed out of order
  - Commit results of micro-ops to register set in original program flow order
  - Outer CISC shell with inner RISC core
  - Inner RISC core pipeline at least 20 stages
      - Some micro-ops require multiple execution stages
          - Longer pipeline
      - cf. five stage pipeline on x86 up to Pentium

Pentium 4 Pipeline (Diagram)

Pentium 4 Pipeline Operation (1)-(6) (Diagrams)

PowerPC
  - Direct descendant of IBM 801, RT PC and RS/6000
  - All are RISC
  - RS/6000 first superscalar
  - PowerPC 601 superscalar design similar to RS/6000
  - Later versions extend superscalar concept

PowerPC 601 General View (Diagram)

PowerPC 601 Pipeline Structure (Diagram)

PowerPC 601 Pipeline (Diagram)

Required Reading
  - Stallings chapter 14
  - Manufacturers' web sites
  - IMPACT web site
      - research on predicated execution
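
Worked sketch: dependencies and register renaming

As a rough illustration of the Antidependency and Register Renaming Example slides, the Python sketch below classifies the register dependencies in the four-instruction sequence I1-I4 and then renames the registers. The names Instr, find_dependencies and rename are made up for this example and do not come from Stallings. The renamer assumes that every logical register starts in hardware copy "a" and that each write allocates the next letter, mirroring the slide's R3a/R3b/R3c notation. It ignores memory operands and the reclaiming of hardware registers, so it is a teaching aid under those assumptions rather than a model of any real processor.

```python
from dataclasses import dataclass
from string import ascii_lowercase


@dataclass(frozen=True)
class Instr:
    """Hypothetical record for one instruction (illustration only)."""
    name: str     # label such as "I1"
    dest: str     # register written
    srcs: tuple   # registers read


# The sequence from the Antidependency slide.
PROGRAM = [
    Instr("I1", "R3", ("R3", "R5")),   # R3 := R3 + R5
    Instr("I2", "R4", ("R3",)),        # R4 := R3 + 1
    Instr("I3", "R3", ("R5",)),        # R3 := R5 + 1
    Instr("I4", "R7", ("R3", "R4")),   # R7 := R3 + R4
]


def find_dependencies(program):
    """Return (earlier, later, kind, register) tuples for register hazards."""
    deps = []
    for i, early in enumerate(program):
        for j in range(i + 1, len(program)):
            late = program[j]
            # Registers overwritten strictly between the two instructions;
            # such a write hides the earlier value from the later instruction.
            between = {p.dest for p in program[i + 1:j]}
            # True data dependency: later reads what earlier wrote.
            if early.dest in late.srcs and early.dest not in between:
                deps.append((early.name, late.name, "true", early.dest))
            # Output dependency: both write the same register.
            if early.dest == late.dest and early.dest not in between:
                deps.append((early.name, late.name, "output", early.dest))
            # Antidependency: later overwrites what earlier still reads.
            if late.dest in early.srcs and late.dest not in between:
                deps.append((early.name, late.name, "anti", late.dest))
    return deps


def rename(program):
    """Give every write a fresh hardware register (assumes < 26 writes each)."""
    version = {}   # logical register -> index of its current hardware copy

    def current(reg):
        return reg + ascii_lowercase[version.setdefault(reg, 0)]

    renamed = []
    for ins in program:
        # Sources must be looked up before the destination is reallocated.
        srcs = tuple(current(r) for r in ins.srcs)
        version[ins.dest] = version.get(ins.dest, 0) + 1
        renamed.append(Instr(ins.name, current(ins.dest), srcs))
    return renamed


if __name__ == "__main__":
    print("before renaming:")
    for dep in find_dependencies(PROGRAM):
        print("  ", dep)
    print("after renaming:")
    for ins in rename(PROGRAM):
        print("  ", ins.name, ins.dest, ":=", ins.srcs)
    for dep in find_dependencies(rename(PROGRAM)):
        print("  ", dep)
```

On the original sequence the sketch reports the I1 to I2 true dependency, the I1 to I3 output dependency and the I2 to I3 antidependency discussed on the slides. It also flags I1 to I3 as an antidependency, because I1 reads R3 as well as writing it. After renaming, the destinations are R3b, R4b, R3c and R7b as on the slide, and only true dependencies remain.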
