Hardware - Chapter 8: Multiprocessors - Shared Memory Architectures

Computer Architecture
Chapter 8: Multiprocessors - Shared Memory Architectures
Prof. Jerry Breecher, CSCI 240, Fall 2003

Chapter Overview
We're going to do only one section from this chapter, the part related to how caches from multiple processors interact with each other.
- 8.1 Introduction - the big picture
- 8.3 Centralized Shared Memory Architectures

Introduction: The Big Picture - Where Are We Now?
The major issue is this: we've taken copies of the contents of main memory and put them in caches closer to the processors. But what happens to those copies if someone else wants to use the main memory data? How do we keep all copies of the data in sync with each other?

The Multiprocessor Picture
[Figure: example Pentium system organization - processor/memory bus, PCI bus, and I/O busses.]

Shared Memory Multiprocessor
[Figure: several processors, each with registers and caches, connected through a chipset to memory, disk, and other I/O.]
- Memory is centralized, with Uniform Memory Access time ("UMA"), and the bus interconnect carries both memory and I/O traffic.
- Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro.

Shared Memory Multiprocessor: Conceptual Model
- Several processors share one address space: conceptually a shared memory, though it is often implemented just like a multicomputer, with the address space distributed over private memories.
- Communication is implicit: read and write accesses to shared memory locations.
- Synchronization is via shared memory locations: spin waiting for non-zero, and barriers (a short spin-wait sketch follows the definitions table below).

Message Passing Multicomputers
- Computers (nodes) connected by a network, with a fast network interface offering send, receive, and barrier operations.
- Nodes are no different from a regular PC or workstation.
- Clustering conventional workstations or PCs with a fast network gives cluster computing, e.g. Berkeley NOW and the IBM SP2.

Large-Scale MP Designs
- Memory is distributed, with nonuniform memory access time ("NUMA") and a scalable interconnect (distributed memory).
[Figure: access latencies marked at 1 cycle, 40 cycles, and 100 cycles depending on how far the data lives; the goals noted are low latency and high reliability.]

Shared Memory Architectures
In this section we will understand the issues around:
- Sharing one memory space among several processors.
- Maintaining coherence among several copies of a data item.

The Problem of Cache Coherency
[Figure: a CPU whose cache holds copies A' and B' of memory locations A and B, plus an I/O device reading and writing memory.]
a) Cache and memory coherent: A' = A and B' = B.
b) The CPU writes 550 into A' while memory still holds 100, so an output of A (by the I/O device) gives 100: cache and memory are incoherent, A' != A.
c) The I/O device inputs 440 into B in memory while the cache still holds 200: cache and memory are incoherent, B' != B.

Some Simple Definitions
Write Back - How it works: write modified data from the cache to memory only when necessary. Performance: good, because it doesn't tie up memory bandwidth. Coherency issues: different copies can end up containing different values.
Write Through - How it works: write modified data from the cache to memory immediately. Performance: not so good - it uses a lot of memory bandwidth. Coherency issues: modified values are always written to memory, so the data always matches.
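To make the "communication is implicit, synchronization by spin waiting for non-zero" idea concrete, here is a minimal sketch in C11 with pthreads. It is not from the slides: the names (producer, consumer, ready, shared_data) are invented for illustration, and C11 atomics are assumed so that the consumer is guaranteed to see the producer's store once it sees the flag.

```c
/* Minimal sketch (not from the slides): two threads communicate implicitly
 * through a shared variable and synchronize by spin waiting on a flag. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int shared_data;                  /* ordinary shared location   */
static atomic_int ready = 0;             /* flag used for spin waiting */

static void *producer(void *arg)
{
    shared_data = 42;                                   /* implicit communication: a store */
    atomic_store_explicit(&ready, 1,                    /* publish: release pairs with the */
                          memory_order_release);        /* consumer's acquire load below   */
    return NULL;
}

static void *consumer(void *arg)
{
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                                               /* spin waiting for non-zero      */
    printf("consumer read %d\n", shared_data);          /* implicit communication: a load */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

Compile with something like `cc -std=c11 -pthread spin.c`. On a shared memory multiprocessor both threads simply load and store the same addresses; the coherence machinery discussed in the rest of the chapter is what keeps their cached copies consistent.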
What Does Coherency Mean?
- Informally: "any read must return the most recent write." This is too strict and too difficult to implement.
- Better: "any write must eventually be seen by a read," and all writes are seen in proper order ("serialization").
- Two rules ensure this:
  - If P writes x and P1 reads it, P's write will be seen by P1 if the read and write are sufficiently far apart.
  - Writes to a single location are serialized: they are seen in one order, and the latest write will be seen. Otherwise a processor could see writes in an illogical order (an older value after a newer one).

There Are Different Types of Memory in the Cache
What kinds of memory are there in the cache? Consider a lock-protected update (a runnable sketch of this pattern follows the state-machine slides below):
    Test_and_set(lock);
    shared_data = xyz;
    Clear(lock);

Type            Shared?     Writable?   How Kept Coherent
Code            Shared      No          No need.
Private Data    Exclusive   Yes         Write back.
Shared Data     Shared      Yes         Write back. *
Interlock Data  Shared      Yes         Write through. **

* Write back gives good performance; if you used write through here, there would be performance degradation.
** Write through means the lock state is seen immediately; you want write through here so the lock value is flushed out of the cache right away.

Potential HW Coherency Solutions
- Snooping solution (snoopy bus):
  - Send all requests for data to all processors; processors snoop to see if they have a copy and respond accordingly.
  - Requires broadcast, since the caching information is at the processors; works well with a bus (a natural broadcast medium).
  - Dominates for small-scale machines (most of the market).
- Directory-based schemes:
  - Keep track of what is being shared in one centralized place; with distributed memory, the directory is also distributed for scalability (avoids bottlenecks).
  - Send point-to-point requests to processors via the network; scales better than snooping.
  - Actually existed BEFORE snooping-based schemes.

An Example Snoopy Protocol (Maintained by Hardware)
- Invalidation protocol with write-back caches.
- Each block of memory is in one state: clean in all caches and up-to-date in memory (Shared), OR dirty in exactly one cache (Exclusive), OR not in any cache.
- Each cache block is in one state (this is what the hardware tracks): Shared (the block can be read), OR Exclusive (this cache has the only copy; it is writeable and dirty), OR Invalid (the block contains no data).
- Read misses cause all caches to snoop the bus; writes to a clean line are treated as misses.

Snoopy-Cache State Machine I (CPU requests, for each cache block; applies to write-back data)
[State diagram over Invalid, Shared (read only), and Exclusive (read/write):]
- Invalid: a CPU read places a read miss on the bus and moves to Shared; a CPU write places a write miss on the bus and moves to Exclusive.
- Shared: a CPU read hit stays in Shared; a CPU read miss places a read miss on the bus and stays in Shared; a CPU write places a write miss on the bus and moves to Exclusive.
- Exclusive: CPU read hits and write hits stay in Exclusive; a CPU read miss writes back the block, places a read miss on the bus, and moves to Shared; a CPU write miss writes back the cache block, places a write miss on the bus, and stays in Exclusive.

Snoopy-Cache State Machine II (bus requests, for each cache block; Appendix E gives details of the bus requests)
[State diagram over the same three states:]
- Shared: a write miss for this block moves it to Invalid.
- Exclusive: a write miss for this block writes the block back (aborting the memory access) and moves to Invalid; a read miss for this block writes the block back (aborting the memory access) and moves to Shared.
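The Test_and_set(lock) / shared_data = xyz / Clear(lock) fragment above can be written out as a runnable sketch. The slides do not name an implementation, so this assumes C11's atomic_flag as the interlock datum; the point is that the lock word plays the "Interlock Data" role in the table, while shared_data stays an ordinary write-back line.

```c
/* Sketch of the slide's Test_and_set(lock) ... Clear(lock) pattern,
 * assuming C11 atomic_flag as the lock (the slide does not specify one). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static atomic_flag lock = ATOMIC_FLAG_INIT;   /* interlock data                */
static int shared_data;                       /* shared data (write-back line) */

static void *worker(void *arg)
{
    /* Test_and_set(lock): spin until we observe the flag clear and set it. */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;
    shared_data += 1;                         /* shared_data = xyz; (critical section) */
    atomic_flag_clear_explicit(&lock, memory_order_release);   /* Clear(lock) */
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("shared_data = %d\n", shared_data);   /* expect 4 */
    return 0;
}
```

Each test_and_set is a write to the lock's cache line, which is why the table treats interlock data differently from ordinary shared data: every other processor needs to notice the new lock state promptly instead of leaving it sitting dirty in one cache.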
Example (this is the cache for P1)
Assumes the initial cache state is Invalid, and that A1 and A2 map to the same cache block, but A1 != A2.
[Figure: the Invalid/Shared/Exclusive state diagram again, annotated with both the CPU-side events (CPU read hit, CPU write hit, read miss on bus, write miss on bus, CPU write placing a write miss on the bus) and the bus-side events (a remote write or miss invalidates the block; a remote read forces a write back).]

Example: Steps 1-5
[A sequence of slides stepping through the states of Processor 1, Processor 2, the bus, and memory for accesses to A1 and A2 under the same assumptions; the step-by-step tables do not survive in this extract. A toy simulation of such a sequence appears after the summary below.]

Summary
- 8.1 Introduction - the big picture
- 8.3 Centralized Shared Memory Architectures
We've looked at what happens to caches when we have multiple processors or devices looking at memory.
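To make the P1/P2 example concrete, here is a toy single-block simulation of the write-back invalidation protocol described by the two state machines above. It is an illustrative sketch, not the book's code: the exact access sequence of the original Steps 1-5 is not legible in this extract, so main() simply runs a plausible sequence of reads and writes by two processors to one address.

```c
/* Toy two-cache, single-block MSI-style simulation (illustrative only). */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE } State;
static const char *state_name[] = { "Invalid", "Shared", "Exclusive" };

static State cache[2] = { INVALID, INVALID };   /* block state in P1 and P2 */

/* Processor p (0 or 1) issues a read or a write; the other cache snoops the bus. */
static void cpu_access(int p, int is_write)
{
    int other = 1 - p;
    printf("P%d %s:", p + 1, is_write ? "write" : "read ");

    if (is_write) {
        if (cache[p] != EXCLUSIVE) {                     /* a write to a non-exclusive block is a miss */
            printf(" place write miss on bus;");
            if (cache[other] == EXCLUSIVE)
                printf(" P%d writes back block;", other + 1);
            cache[other] = INVALID;                      /* remote write miss -> Invalid */
        }
        cache[p] = EXCLUSIVE;
    } else if (cache[p] == INVALID) {
        printf(" place read miss on bus;");
        if (cache[other] == EXCLUSIVE) {
            printf(" P%d writes back block;", other + 1);
            cache[other] = SHARED;                       /* remote read miss -> Shared */
        }
        cache[p] = SHARED;
    }                                                    /* otherwise it is a read hit */

    printf(" P1=%s, P2=%s\n", state_name[cache[0]], state_name[cache[1]]);
}

int main(void)
{
    cpu_access(0, 1);   /* P1 writes the block: P1 becomes Exclusive             */
    cpu_access(0, 0);   /* P1 reads it again: read hit, no bus traffic           */
    cpu_access(1, 0);   /* P2 reads it: P1 writes back, both become Shared       */
    cpu_access(1, 1);   /* P2 writes it: P1 is invalidated, P2 becomes Exclusive */
    return 0;
}
```

Because only one address is modeled, the transitions for a miss from Exclusive to a different address in the same block (which force a write back of the victim) are not exercised here; those are what the A1 versus A2 assumption in the original example is about.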
