Khoa học máy tính - Chapter 10: Synchronization and scheduling in multiprocessor operating systems

Multiprocessor OS algorithms must be scalable Use of special kinds of locks: Spin locks and sleep locks Important scheduling concepts in multiprocessor OSs: Affinity scheduling Coscheduling Process shuffling

30 trang | Chia sẻ: nguyenlam99 | Lượt xem: 1078 | Lượt tải: 0

Bạn đang xem trước 20 trang tài liệu Khoa học máy tính - Chapter 10: Synchronization and scheduling in multiprocessor operating systems, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên

Chapter 10Synchronization and Scheduling in Multiprocessor Operating SystemsCopyright © 2008Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*IntroductionArchitecture of Multiprocessor SystemsIssues in Multiprocessor Operating SystemsKernel StructureProcess SynchronizationProcess SchedulingCase StudiesOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Architecture of Multiprocessor SystemsPerformance of uniprocessor systems depends on CPU and memory performance, and CachesFurther improvements in system performance can be obtained only by using multiple CPUsOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Architecture of Multiprocessor Systems (continued)Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Architecture of Multiprocessor Systems (continued)Use of a cache coherence protocol is crucial to ensure that caches do not contain stale copies of dataSnooping-based approach (bus interconnection)CPU snoops on the bus to analyze traffic and eliminate stale copiesWrite-invalidate variantAt a write, CPU updates memory and invalidates copies in other cachesDirectory-based approachDirectory contains information about copies in cachesTLB coherence is an analogous problemSolution: TLB shootdown actionOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Architecture of Multiprocessor Systems (continued)Multiprocessor Systems are classified according to the manner of associating CPUs and memory unitsUniform memory access (UMA) architecturePreviously called tightly coupled multiprocessorAlso called symmetrical multiprocessor (SMP)Examples: Balance system and VAX 8800Nonuniform memory access (NUMA) architectureExamples: HP AlphaServer and IBMNUMA-QNo-remote-memory-access (NORMA) architectureExample: Hypercube system by IntelIs actually a distributed system (discussed later)Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Architecture of Multiprocessor Systems (continued)Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*SMP ArchitecturePopularly use a bus or a cross-bar switch as the interconnection networkOnly one conversation can be in progress over the bus at any time; other conversations are delayedCPUs face unpredictable delays in accessing memoryBus may become a bottleneckWith a cross-bar switch, performance is betterSwitch delays are also more predictableCache coherence protocols add to the delaysSMP systems do not scale well beyond a small number of CPUsOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*NUMA ArchitectureActual performance of a NUMA system depends on the nonlocal memory accesses made by processesOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Issues in Multiprocessor Operating Systems Synchronization and scheduling algorithms should be scalable, so that system performance does not degrade with a growth in its sizeOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Kernel StructureKernel of a multiprocessor OS (SMP architecture) is called an SMP kernelAny CPU can execute code in the kernel, and many CPUs could do so in parallelBased on two fundamental provisions:Kernel is reentrantCPUs coordinate their activities through synchronization and interprocessor interruptsOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Kernel Structure: SynchronizationMutex locks for synchronizationLocking can be coarse-grained or fine-grainedTradeoffs: simplicity vs. loss of parallelismDeadlocks are an issue in fine-grained locking Parallelism can be ensured without substantial locking overhead:Use of separate locks for kernel functionalitiesPartitioning of the data structures of a kernel functionalityOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Kernel Structure: Heap ManagementParallelism in heap management can be provided by maintaining several free listsLocking is unnecessary if each CPU has its own free listWould degrade performanceAllocation decisions would not be optimalAlternative: separate free lists to hold free memory areas of different sizesCPU locks an appropriate free listOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Kernel Structure: SchedulingSuffers from heavy contention for mutex locks Lrq and Lawt because every CPU needs to set/release these locks while schedulingAlternative: Partition processes into subsets and entrust each subset to a CPU for schedulingFast scheduling but suboptimal performanceAn SMP kernel provides graceful degradationOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Kernel Structure: NUMA KernelCPUs in NUMA systems have different memory access times for local and nonlocal memoryEach node in a NUMA system has its own separate kernelExclusively schedules processes whose address spaces are in local memory of the nodeConcept can be generalized: An application region ensures good performance of an application. It hasA resource partition with one or more CPUs An instance of the kernelOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Process SynchronizationOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Process Synchronization (continued)Queued locks may not be scalableIn NUMA, spin locks may lead to lock starvationSleep locks may be preferred to spin locks if the memory or network traffic densities are highOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Special Hardware for Process SynchronizationThe Sequent Balance system uses a special bus called system link and interface controller (SLIC) for synchronizationSpecial 64-bit register in each CPU in the systemEach bit implements a spin lock using SLICSpinning doesn’t generate memory/network trafficOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*A Scalable Software Scheme for Process SynchronizationScheme for process synchronizationNUMA and NORMA architecturesScalable performanceMinimizes synchronization traffic to nonlocal memory units (NUMA) and over network (NORMA)Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Process Synchronization (continued)Scheduling aware synchronizationAdaptive lockA process waiting for this lock spins if holder of the lock is scheduled to run in parallelOtherwise, the process is preempted and queued as in a queued lockOperating Systems, by Dhananjay Dhamdhere*Operating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Process SchedulingCPU scheduling decisions affect performanceHow, when and where to schedule processesAffinity scheduling: schedule a process on a CPU where it has executed in the pastGood cache hit ratioInterferes with load balancing across CPUsIn SMP kernel CPUs can perform own schedulingPrevents kernel from becoming bottleneckLeads to scheduling anomaliesCorrecting requires shuffling of processesOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Example: Process Shuffling in an SMP KernelProcess shuffling can be implemented by using the assigned workload table AWT and the interprocessor interrupt (IPI)However, it leads to high scheduling overheadEffect is more pronounced in a system containing a large number of CPUsOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Process Scheduling (continued)Processes of an application should be scheduled on different CPUs at the same time if they use spin locks for synchronizationCalled coscheduling or gang schedulingA different approach is required when processes exchange messages by using a blocking protocolIn some situations, special efforts should be made not to schedule such processes in same time sliceOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Case StudiesMachLinuxSMP Support in WindowsOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*MachMach OS implements scheduling hintsThread issues hint to influence processor schedulingFor example, a hands-off hint to relinquish CPU in favor of a specific threadOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*LinuxMultiprocessing support introduced in 2.0 kernelCoarse-grained locking was employedGranularity of locks was made finer in later releasesKernel was still nonpreemptible until 2.6 kernelKernel provides:Spin locks for locking of data structuresReader–writer spin locksSequence lockPer-CPU data structures to reduce lock contentionOther features: hard and soft affinity, load balancingOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*SMP Support in WindowsA hyperthreaded CPU is considered to be several logical processorsSpin locks provide mutual exclusion over kernel data A thread holding a spinlock is never preemptedQueued spinlock uses a scalable software implementation schemeUses many free lists of memory for parallel accessProcess default processor affinity and thread processor affinity together define thread affinity setIdeal processor defines hard affinity for a threadUses both hard and soft affinityOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*SummaryMultiprocessor OS exploits multiple CPUs in computer to provide high throughput (system), computation speedup (application), and graceful degradation (of OS, when faults occur)Classification of uniprocessorsUniform memory architecture (UMA)Also called Symmetrical multiprocessor (SMP)Nonuniform memory architecture (NUMA)OS efficiently schedules user processes in parallelIssues: kernel structure and synchronization delaysOperating Systems, by Dhananjay Dhamdhere Copyright © 200810.*Operating Systems, by Dhananjay Dhamdhere*Summary (continued)Multiprocessor OS algorithms must be scalableUse of special kinds of locks:Spin locks and sleep locksImportant scheduling concepts in multiprocessor OSs:Affinity schedulingCoschedulingProcess shuffling

Các file đính kèm theo tài liệu này:

chapter_10_735.ppt