Computer Science - Chapter 20: Distributed File Systems
Chapter 20
Distributed File Systems
Operating Systems, by Dhananjay Dhamdhere
Copyright © 2008

Introduction
- Design Issues in Distributed File Systems
- Transparency
- Semantics of File Sharing
- Fault Tolerance
- DFS Performance
- Case Studies

Design Issues in Distributed File Systems

Overview of DFS Operation
- Remote file processing model
- File server agent and client agent are analogous to RPC’s stub processes
- For efficiency, the client agent and the cache manager are typically rolled into a single unit

Transparency
- In a conventional file system, a user identifies a file through a path name
- User is aware that file belongs in a specific directory, but is not aware of its location in the system
- Location info field of the file’s directory entry indicates the file’s location on disk
- Location transparency can be provided in a DFS through a similar mechanism
- Location info: (node id, location)
- Location independence requires information in location info field to vary dynamically

Semantics of File Sharing
- Semantics determine manner in which effect of file manipulations performed by concurrent users of a file are visible to one another

Semantics of File Sharing (continued)
- A session consists of some clients of a file that are located in the same node of a system
- Problem with session semantics: poor portability
- Session semantics are easy to implement in a DFS employing file caching
- File changes are not visible to clients in other nodes

Fault Tolerance
- File system reliability has several facets:
  - A file must be robust, recoverable, available
- Robustness is achieved using techniques for reliable storage of data
- Robustness and recoverability depend on how files are stored and backed up, respectively
- Availability depends on how files are opened and accessed
- Only defense against client node crashes is use of transaction semantics in file server

Fault Tolerance (continued)

Availability
- File is available if a copy can be opened and accessed by client
- Ability to open file depends on path name resolution
- Access requires functional client and server nodes
- An anomalous situation may arise when path names span many nodes
- If a node in path crashes, file operation will fail even if the node that contains the file has not crashed
- Solution: cached directories
- File replication is transparent to clients
- Updating techniques: 2PC, use of primary copies

Client and Server Node Failures
- File server can maintain FCBs and OFT in memory
- Stateful design
- Good performance
- Problems in event of client and server crashes
- Solution: client and file server share a virtual circuit
- Virtual circuit “owns” the file processing actions and resources like file server metadata
- Actions and resources become orphans after crash
- Actions are rolled back and metadata destroyed
- Client–server protocol implementing transaction semantics may be used to ensure this

Stateless File Servers
- File server does not maintain state information about file processing activity
- Client must:
  - Keep state information about file processing activity
  - Provide all relevant information in a file system call
  - read (“alpha”, , );
- Many actions traditionally performed only at file open time are repeated at every file operation
- If file server crashes, time-outs and retransmissions occur in client
- Cannot employ file caching

DFS Performance
- DFS design is scalable if DFS performance doesn’t degrade with increase in size of distributed system

Efficient File Access
- Inherent efficiency of file access depends on how the operation of a file server is structured
- Two server structures that provide efficient file access:
  - Multithreaded file server
  - Hint-based file server
- State information is used as a hint
- Server operation is stateless if hint is not available

File Caching
- File cache and copy of file on disk in server node form a memory hierarchy
- Operation of the file cache and its benefits are analogous to those of a CPU cache
- Chunks of file data are loaded from the file server into the file cache
- Studies of file size distributions indicate small average file size
- Whole-file caching is feasible
- File server may use a separate attributes cache

File Caching (continued)
- Key issues:
  - Location of the file cache: memory or disk
  - File updating policy: write-through or delayed write
  - Cache validation policy: client- or server-initiated
  - Chunk size: large or small? Fixed or variable?

Scalability
- DFS scalability achieved through techniques that localize most data traffic generated by file processing activities within clusters
- Clusters typically represent subnets like high-speed LANs
- An increase in the number of clusters does not lead to degradation of performance
- It does not add much network traffic

Case Studies
- Sun Network File System
- Andrew and Coda File Systems
- GPFS
- Windows

Sun Network File System
- VFS implements mount protocol and creates a system-wide unique vnode for each file
- NFS layer interacts with remote node containing file through NFS protocol

Sun Network File System (continued)
- Several techniques to improve performance
- A directory names cache is used in each client node
- A file attributes cache caches inode information
- Cached attributes are discarded after 3 seconds for files and after 30 seconds for directories
- File blocks cache is the conventional file cache
- Server uses large (8 Kbytes) data blocks
- Cache validation performed through timestamps associated with each file, and cache block
- File server is stateless
- Neither Unix semantics nor session semantics

Andrew and Coda File Systems
- Targeted at gigantic distributed systems
- All clients have an identical shared name space
- Is location transparent in nature
- Implemented by dedicated servers (Vice)
- Clusters localize file processing activities
- Traffic within cluster reduced by caching entire file on local disk
- A volume typically contains files of a single user
- 64 KB chunks (size adapted on a per-client basis)
- User process called Venus performs open/close

Andrew and Coda File Systems (continued)
- Server-initiated cache validation using callbacks
- Path name resolution performed on a component-by-component basis
- Venus maintains a mapping cache
- File servers are multithreaded
- Client–server communication uses RPCs
- Two features to achieve high availability:
  - Replication and disconnected operation
- Read one, write all policy
- Supports hoarding of files

GPFS
- General parallel file system: high-performance shared-disk file system
- For large computing clusters operating under Linux
- Uses data striping across all disks in cluster
- A large-size block (strip) used to minimize seek overhead during a file read/write
- A smaller subblock is used for small files
- Locking used to maintain consistency of file data
- Lock granularity is as coarse as possible, but as fine as necessary
- Centralized lock manager and few distributed lock managers

GPFS (continued)
- Notion of lock tokens to reduce latency and overhead of locking
- Race conditions may arise over metadata of a file
- Solution: one of the nodes is designated as the metanode for the file; it performs file updates
- Central allocation manager partitions free space map and gives one partition to each node
- Each node writes a separate journal for recovery
- If network is partitioned, only nodes in the majority partition can perform file processing at any time

Windows
- Windows Server 2003 provides two features for data replication and data distribution:
  - Remote differential compression (RDC)
  - DFS namespaces
- Replication organized using notion of a replication group
- DFS namespace is created by a system administrator
- Other key concepts: referrals and hot standbys

Summary
- Transparency concerns association between path name of a file and location of the file
- File sharing semantics may differ between DFSs:
  - Unix semantics
  - Session semantics
  - Transaction semantics (atomic transactions)
- Stateless server design provides high availability
- Notion of a hint used to improve performance
- DFS uses file caching to improve performance
- Cache coherence techniques are needed
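
Illustration for "Stateless File Servers": a minimal Python sketch of a server that keeps no open-file state, so the client must track its own position and send everything needed with each call. The slide's read call has its last two arguments elided; the offset and nbytes parameters here, and the class names StatelessFileServer and Client, are assumptions made for illustration only.

```python
import os
import tempfile


class StatelessFileServer:
    """Hypothetical server: keeps no open-file table between calls."""

    def __init__(self, root):
        self.root = root

    def read(self, name, offset, nbytes):
        # The file is located and opened on every request; this is the
        # "actions repeated at every file operation" cost from the slide.
        path = os.path.join(self.root, name)
        with open(path, "rb") as f:
            f.seek(offset)
            return f.read(nbytes)


class Client:
    """Hypothetical client: it, not the server, remembers the file position."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.offset = 0

    def read(self, nbytes):
        data = self.server.read(self.name, self.offset, nbytes)
        self.offset += len(data)
        return data


def demo():
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "alpha"), "wb") as f:
            f.write(b"0123456789")
        client = Client(StatelessFileServer(root), "alpha")
        first = client.read(4)
        # Simulated server restart: a fresh instance can serve the next
        # request because the request itself carries all needed state.
        client.server = StatelessFileServer(root)
        second = client.read(4)
        return first, second
```

Because each request is self-describing, replacing the server object between the two reads does not disturb the client, which is the availability argument made in the Summary.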
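
Illustration for "Sun Network File System (continued)": a sketch of a client-side attributes cache that discards entries after 3 seconds for files and 30 seconds for directories, as the slide states. Only the two time limits come from the slides; the AttributeCache class, the fetch callable, and the "is_dir" key are hypothetical, and this is not the NFS client implementation.

```python
import time

FILE_TTL = 3.0        # seconds, from the slide
DIRECTORY_TTL = 30.0  # seconds, from the slide


class AttributeCache:
    """Hypothetical attributes cache with per-type expiry."""

    def __init__(self, fetch, clock=time.monotonic):
        self._fetch = fetch    # assumed callable: path -> dict with "is_dir"
        self._clock = clock    # injectable so expiry can be tested
        self._entries = {}     # path -> (attributes, expiry time)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]    # still fresh: no request to the server
        attrs = self._fetch(path)
        ttl = DIRECTORY_TTL if attrs["is_dir"] else FILE_TTL
        self._entries[path] = (attrs, now + ttl)
        return attrs
```

The short lifetime for files bounds how stale cached attributes can become without the stateless server having to notify clients.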
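
Illustration for "Andrew and Coda File Systems (continued)": a sketch of server-initiated cache validation using callbacks. The server remembers which clients hold a cached copy of a file and notifies them when another client stores a new version. All class and method names are hypothetical, and whole-file transfer on open and close is a simplifying assumption rather than the actual Vice/Venus protocol.

```python
class CallbackServer:
    """Hypothetical file server that tracks who has cached each file."""

    def __init__(self):
        self.files = {}       # name -> bytes
        self.callbacks = {}   # name -> set of clients holding a cached copy

    def fetch(self, client, name):
        # Register a callback for this client, then hand out the file.
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Notify every other caching client that its copy is now stale.
        for client in self.callbacks.get(name, set()) - {writer}:
            client.invalidate(name)
        self.callbacks[name] = {writer}


class CachingClient:
    """Hypothetical client-side cache manager."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close(self, name, data):
        self.cache[name] = data
        self.server.store(self, name, data)

    def invalidate(self, name):
        self.cache.pop(name, None)
```

A client whose cached copy has not been invalidated can open the file without contacting the server, which is how this style of validation keeps traffic low.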
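
Illustration for the GPFS slides: a sketch of two ideas they mention, data striping across disks and the majority-partition rule. The slides do not say how strips are assigned to disks, so the round-robin placement here is an assumption, and all function names are hypothetical.

```python
def locate_block(block_index, num_disks):
    """Return (disk, block on that disk) under assumed round-robin striping."""
    return block_index % num_disks, block_index // num_disks


def stripe(data, strip_size, num_disks):
    """Split data into strips and distribute them over the disks."""
    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data), strip_size):
        disk, _ = locate_block(start // strip_size, num_disks)
        disks[disk].append(data[start:start + strip_size])
    return disks


def in_majority_partition(reachable_nodes, total_nodes):
    """A partition may process files only if it holds more than half the nodes."""
    return 2 * reachable_nodes > total_nodes
```

With consecutive strips on different disks, a large sequential read can proceed on several disks in parallel. The strict inequality in the majority test ensures that at most one partition can continue after a network split.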