Khoa học máy tính - Chapter 18: Distributed control algorithms

Actions of a distributed control algorithms are performed in many nodes of the system Two aspects of correctness are liveness and safety Distributed system models: physical and logical Examples of distributed control algorithms: Distributed mutual exclusion: e.g., token based Distributed deadlock detection: diffusion computation Distributed scheduling (to balance load) Distributed termination: e.g., credit-based Election (highest priority process wins)

40 trang | Chia sẻ: nguyenlam99 | Lượt xem: 815 | Lượt tải: 0

Bạn đang xem trước 20 trang tài liệu Khoa học máy tính - Chapter 18: Distributed control algorithms, để xem tài liệu hoàn chỉnh bạn click vào nút DOWNLOAD ở trên

Chapter 18Distributed Control AlgorithmsCopyright © 20081Operating Systems, by Dhananjay Dhamdhere*IntroductionOperation of Distributed Control AlgorithmsCorrectness of Distributed Control AlgorithmsDistributed Mutual ExclusionDistributed Deadlock HandlingDistributed Scheduling AlgorithmsDistributed Termination DetectionElection AlgorithmsPractical Issues in Using Distributed Control Algorithms2OS Control Functions in a Distributed EnvironmentSpecial features of distributed OS control functionsMutual exclusionInvolves synchronization of processes in different computersDeadlock handlingDeadlocks may involve use of resources in different hostsSchedulingPerform load balancing for comparable loading of computersTermination detectionCheck whether all processes of a computation, which may operate in different computers, have completedElectionElect coordinator for a privileged functionOperating Systems, by Dhananjay Dhamdhere*3Operating Systems, by Dhananjay Dhamdhere*Operation of Distributed Control Algorithms (continued)A distributed control algorithm operates in parallel with its clients, so that it can respond readily to events related to its serviceEach process has a control part and a basic part4Operating Systems, by Dhananjay Dhamdhere*Correctness of Distributed Control AlgorithmsAlgorithm correctness has two facets:Liveness: eventually performs correct actionsi.e., without indefinite delaysSafety: does not perform wrong actionsProving correctness of a distributed algorithm is a complex task5Operating Systems, by Dhananjay Dhamdhere*Correctness of Distributed Control Algorithms (continued)6Operating Systems, by Dhananjay Dhamdhere*Distributed Mutual ExclusionA Permission-Based AlgorithmToken-Based Algorithms 7Ricart and Agrawala algorithm: CS entry in FCFS orderRequires 2 x (n – 1 ) messages per CS entryOperating Systems, by Dhananjay Dhamdhere*A Permission-Based Algorithm8Operating Systems, by Dhananjay Dhamdhere*9Maekawa AlgorithmEach process has a request set of processes; it seeks the permission of only processes in the request set (Ri represents the request set of process Pi)Correctness is ensured through the following rules:For all Pi : Pi is included in RiFor all Pi, Pj: Ri ∩ Rj is non-nullThe algorithm requires 2 x √n messages per CS entryOperating Systems, by Dhananjay Dhamdhere*10Operating Systems, by Dhananjay Dhamdhere*Token-Based Algorithms for Mutual ExclusionA token represents the privilege to use a CSOnly a process possessing token can enter CSSafety of algorithm follows from this ruleLiveness: must ensure token eventually reaches a (requesting) processThese algorithms use abstract system modelsEdges represent the paths used to pass control messagesFor example:Abstract ring and tree topologies11Operating Systems, by Dhananjay Dhamdhere*An Algorithm Employing the Ring Topology12Raymond’s Token-Based AlgorithmFeatures of the algorithmThe algorithm uses an abstract inverted tree to reduce the number of messages. It has three invariantsPholder, the process holding the token, is at root of the treeEach process in the system belongs to the treeEach process other than the Pholder has only one edge, which points to its parent in the tree Thus, each process has a path that ends on PholderEach process has a local request queue When it receives a request, it puts the requestor’s id in the queueWhen it makes a request, it puts its own id in the queue Operating Systems, by Dhananjay Dhamdhere*13Uses an abstract inverted tree as the system modelOperating Systems, by Dhananjay Dhamdhere*Raymond’s AlgorithmToken is transferred to P314Operating Systems, by Dhananjay Dhamdhere*15Operating Systems, by Dhananjay Dhamdhere*Distributed Deadlock HandlingDeadlock detection, prevention, and avoidance approaches studied earlier use state informationProblems arise in extending these approaches to a distributed systemApproaches in distributed systems:Detection: centralized and distributedPreventionNo special techniques for distributed deadlock avoidance have been discussed in OS literatureModel used in this section:SISR model of resource allocation16Operating Systems, by Dhananjay Dhamdhere*Problems in Centralized Deadlock DetectionSteps in centralized deadlock detection:Collect WFGs of all nodes at a central node Superimpose them to form a merged WFGUse a conventional deadlock detection algorithmProblem: can lead to phantom deadlocksViolates safety property in deadlock detection17Operating Systems, by Dhananjay Dhamdhere*Distributed Deadlock DetectionKey issues in deadlock detection:A cycle indicates a deadlock in an SISR systemA knot indicates a deadlock in an MISR systemDistributed deadlock detection approach:Cycles and knots detected through joint actions of nodes in systemEvery node can detect and declare a deadlockTwo such algorithms:Diffusion computation-based algorithmEdge chasing algorithm18Diffusion Computation-Based Deadlock DetectionDiffusion computation: used to collect info about nodesDiffusion phaseComputation that has originated in one node, spreads to other nodesA control message called a query is sent along each edgeThe first query received by a node is called an engaging query. On receiving it, the node sends queries along all its out-edgesInformation collection phaseEach node sends information in response to each queryIt sends a dummy reply for a non-engaging queryIt collects information from all replies it received, adds its own information, and sends the result as the reply to the engaging query. We call it an engaging replyOperating Systems, by Dhananjay Dhamdhere*19Operating Systems, by Dhananjay Dhamdhere*Diffusion Computation-Based AlgorithmAlgorithm 18.4 was proposed by Chandy, Misra, and Haas (1983)Works for SISR and MISR systems P2, P3 are blocked. P1 becomes blocked and sends a query but does not receive a reply because P4 is not blocked P4 requests a resource held by P1, becomes blocked and sends a query. It would receive a reply and declare a deadlock20Operating Systems, by Dhananjay Dhamdhere*Diffusion Computation-Based Algorithm (continued)21Mitchell–Merritt Algorithmfor Distributed Deadlock DetectionIt is an edge chasing algorithm—control messages are sent over WFG edges to detect cyclesA provision is made to ensure that the cycle has not been broken before it was detectedEach process is assigned a public label and a private labelThe labels are identical when a process is createdThe public label of a process changes when it gets blocked on a resource requestIt also changes when it waits for a process having a larger public labelA wait-for edge with a specific relation between public and private labels of its source and destination processes indicates presence of a deadlockOperating Systems, by Dhananjay Dhamdhere*22Rules of the algorithm:Operating Systems, by Dhananjay Dhamdhere*Mitchell-Merritt Algorithmpublic labelprivate label23Distributed Deadlock PreventionCycles are prevented as follows:A pair (local time, node id) is used to time-stamp creation of a processWhen process Pi requests a resource allocated to Pj, time-stamps of Pi, Pj are used to decide whether Pi can wait for PjTwo approachesWait-or-diePi is allowed to wait if older than Pj; otherwise, it is killedWould-or-waitPi is allowed to wait if younger than Pj; otherwise Pj is killedA killed process retains original timestamp if restartedOperating Systems, by Dhananjay Dhamdhere*24Operating Systems, by Dhananjay Dhamdhere*Distributed Scheduling AlgorithmsA distributed scheduling algorithm balances computational loads in the nodesUses process migrationApply a threshold to decide if a node is heavily or lightly loaded25Operating Systems, by Dhananjay Dhamdhere*Distributed Scheduling Algorithms (continued)Migration may be preemptive or nonpreemptiveStability is an important issueAn unstable algorithm may lead to a situation similar to thrashingA process is transferred very often; does not make progressAlgorithms can be sender- or receiver-initiatedA heavily loaded node is a senderA lightly loaded node is a receiverA symmetrically initiated algorithm contains features of both26Operating Systems, by Dhananjay Dhamdhere*Distributed Scheduling Algorithms (continued)27Operating Systems, by Dhananjay Dhamdhere*Distributed Scheduling Algorithms (continued)28Operating Systems, by Dhananjay Dhamdhere*Distributed Termination DetectionWhen a process terminates, OS frees its resourcesThis approach is not adequate for distributed systemsProcesses of a distributed computation execute in different nodes of a distributed system. They should be terminated when all of them have completed their tasksThese processes perform work assigned to themA process is active when it is performing work, and passive when it has no work Work is assigned to a process through a messagePassive process becomes active on receiving a messageDistributed termination condition (DTC):All processes of a distributed computation are passiveNo basic messages are in transit29Operating Systems, by Dhananjay Dhamdhere*Distributed Termination Detection (continued)Credit-distribution-based termination detectionEvery process is assigned a numerical creditIt sends some of its credit in each messageWhen a process becomes passive, it sends its credit to collector processThe distributed computation is known to have terminated when credit accumulated by collector is CDiffusion computation-based termination detectionEach process that becomes passive initiates a diffusion computation to determine if DTC holds30Operating Systems, by Dhananjay Dhamdhere*Distributed Termination Detection (continued)31Operating Systems, by Dhananjay Dhamdhere*Election AlgorithmsCritical system functions are assigned to a coordinator (for the function)E.g., replacing lost token in a token-based algorithmCoordinator is typically the highest-priority process in setA process that finds that coordinator is not responding assumes it has failedInitiates election algorithmAlgorithm chooses highest-priority nonfailed process as new coordinatorThen, announces its id to all nonfailed processes32Operating Systems, by Dhananjay Dhamdhere*Election Algorithms for Unidirectional Ring TopologiesLinks in ring are assumed to be FIFO channelsAssumption: control part of failed process continues to function and forwards received messages along out-edgeElection performed by obtaining ids of all nonfailed processes and electing highest-priority processTwo types of messages:“elect me” and “new coordinator”O(n2) messages or 3n−1 (worst case) in refined version33Election AlgorithmsAlgorithm for the ring topologyProcess Pi initiates by sending (“elect me”, Pi) messageProcess Pj receiving an (“elect me”, Pi) message sends an (“elect me”, Pj) message and then forwards Pi’s messagePi receives back its own message after receiving message of every other process; it elects the highest priority process as leaderIt sends a “new coordinator” message to inform othersAlgorithm 2: Refinement of algorithm 1In Step 2, Pj sends its own message if its priority is higher than Pi’s; otherwise, it sends Pi’s messageOnly highest priority process would get back its own messageOperating Systems, by Dhananjay Dhamdhere*34Election AlgorithmBully algorithmInitiator Pi sends (“elect me”, Pi) messages to all higher priority processes and starts a time-out interval T1If a time-out occurs, it sends a “new coordinator” message to lower priority processesIf it receives a “don’t you dare” message from a higher priority process Pj, it starts another time-out interval T2If a time-out occurs, it starts a new electionIf a process Pj receives an “elect me” message from a lower priority processIt sends a “don’t you dare” message to its senderStarts a new election by sending (“elect me”, Pj) messagesRequires O(n2) messages per electionOperating Systems, by Dhananjay Dhamdhere*35Operating Systems, by Dhananjay Dhamdhere*Practical Issues in Using Distributed Control AlgorithmsResource Management Process Migration36Operating Systems, by Dhananjay Dhamdhere*Resource ManagementName server in each node must be updated when resources are addedSolution: use an arrangement of name servers as in the domain name service (DNS)Only the name server of a domain needs to be updated when a resource is added37Operating Systems, by Dhananjay Dhamdhere*Process MigrationReasons for process migration:To achieve load balancingTo reduce network traffic involved in utilizing a remote resourceTo provide availability of services when a node has to be shut down for maintenance38Process MigrationDifficultiesProcess state is distributed in various data structures of the OSProcess id’s may change due to migrationProcess id’s are used in interprocess communicationSolution: Use global process ids as in Sun clusterDelivery of messages requires a special provisionA node receiving a message would redirect it if the destination process has migrated out of itThis residual state causes poor reliabilityAlternatively, all processes may be informed when a process is migratedRequires a complex protocolOperating Systems, by Dhananjay Dhamdhere*39Operating Systems, by Dhananjay Dhamdhere*SummaryActions of a distributed control algorithms are performed in many nodes of the systemTwo aspects of correctness are liveness and safetyDistributed system models: physical and logicalExamples of distributed control algorithms:Distributed mutual exclusion: e.g., token basedDistributed deadlock detection: diffusion computationDistributed scheduling (to balance load)Distributed termination: e.g., credit-basedElection (highest priority process wins)40

Các file đính kèm theo tài liệu này:

chapter_18_7517.ppt