Like a uniprocessor operating system
Manages multiple CPUs transparently to the user
Each processor has its own hardware cache
– Maintain consistency of cached data
24 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1243 | Downloads: 0
The processes are organized as a ring
– Step 1: Initially, each process is given 1 row of the matrix A and 1 column of the matrix B
– Step 2: Each process uses vector multiplication to get 1 element of the product matrix C
– Step 3: After a process has used its column of matrix B, it fetches the next column of B from its successor in the ri...
23 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1456 | Downloads: 0
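The ring scheme in the entry above can be sketched as a sequential simulation. This is only an illustration under stated assumptions: the excerpt is truncated after Step 3, so the sketch assumes square n x n matrices, one process per row, and that Steps 2 and 3 simply repeat n times. The name `ring_matmul` is hypothetical and does not come from the slides.

```python
def ring_matmul(A, B):
    """Simulate ring-based matrix multiplication for square n x n matrices.

    Assumption: process i permanently holds row i of A and, at any moment,
    exactly one column of B. Steps 2 and 3 are assumed to repeat n times.
    """
    n = len(A)
    # Step 1: process i gets row i of A and column i of B.
    cols = [[B[k][j] for k in range(n)] for j in range(n)]
    held = list(range(n))  # held[i] = index of the B column at process i
    C = [[0] * n for _ in range(n)]

    for _ in range(n):
        # Step 2: each process computes one element of C as a dot product.
        for i in range(n):
            j = held[i]
            C[i][j] = sum(a * b for a, b in zip(A[i], cols[j]))
        # Step 3: each process fetches the next column from its successor.
        held = [held[(i + 1) % n] for i in range(n)]
    return C
```

After n rounds every process has seen every column of B once, so row i of C is complete at process i.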
Step (a) – Each processor is allocated its share of the values
Step (b) – Each processor computes the sum of its local elements
Step (c) – The prefix sums of the local sums are computed and distributed to all processors
30 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1412 | Downloads: 0
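Steps (a) to (c) in the entry above can be simulated sequentially as follows. The excerpt stops at step (c), so the final step here (each processor adds the total of all preceding blocks to its own local prefix sums) is an assumption about how the algorithm finishes, not something the listing states. The function name and the contiguous block split are likewise illustrative choices.

```python
from itertools import accumulate


def parallel_prefix_sums(values, p):
    """Simulate block-wise prefix sums on p processors (p >= 1)."""
    n = len(values)
    size = -(-n // p)  # ceiling division: elements per processor

    # Step (a): each processor gets a contiguous share of the values.
    blocks = [values[r * size:(r + 1) * size] for r in range(p)]

    # Step (b): each processor sums its local elements.
    local_sums = [sum(block) for block in blocks]

    # Step (c): prefix sums of the local sums, visible to all processors.
    block_prefix = list(accumulate(local_sums))

    # Assumed final step: offset local prefix sums by the preceding total.
    result = []
    for r, block in enumerate(blocks):
        offset = block_prefix[r - 1] if r > 0 else 0
        result.extend(offset + s for s in accumulate(block))
    return result
```

For example, `parallel_prefix_sums([1, 2, 3, 4, 5, 6, 7, 8], 3)` splits the input into blocks of sizes 3, 3 and 2 and returns the same list as a plain sequential prefix sum.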
Message-passing model
– More flexible than the data-parallel model
– Lacks support for the work pool paradigm and applications that need to manage a global data structure
– Widely accepted
– Exploits large-grain parallelism and can be executed on machines with native shared-variable model (multiprocessors: DSMs, PVPs, SMPs)
Shared-varia...
28 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1417 | Downloads: 0
Tasks can be formed into groups
Tasks in a group can be scheduled in any of the following ways:
– A task can be scheduled or preempted in the normal manner
– All the tasks in a group are scheduled or preempted simultaneously
– Tasks in a group are never preempted
In addition, a task can prevent its preemption irrespective of the sched...
26 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1154 | Downloads: 0
2^k nodes form a k-dimensional hypercube
Nodes are labeled 0, 1, 2, ..., 2^k − 1
Two nodes are adjacent if their labels differ in exactly one bit position
Diameter = k
Bisection width = 2^(k−1)
Number of edges per node is k
Length of the longest edge: increasing
21 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1204 | Downloads: 0
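The adjacency rule in the hypercube entry above (labels differ in exactly one bit) maps directly onto bit operations. A minimal sketch, with hypothetical helper names:

```python
def hypercube_neighbors(node, k):
    """Return the k neighbors of `node` in a k-dimensional hypercube.

    Flipping each of the k label bits in turn gives every label that
    differs from `node` in exactly one bit position.
    """
    return [node ^ (1 << bit) for bit in range(k)]


def hop_distance(a, b):
    """Shortest path length between a and b: the number of differing bits."""
    return bin(a ^ b).count("1")
```

In a 3-dimensional hypercube, `hypercube_neighbors(5, 3)` flips each bit of `101` and yields nodes 4, 7 and 1. The largest possible `hop_distance` is k, reached between a label and its bitwise complement, which matches the stated diameter.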
Parallelizing a code does not always result in a speedup; sometimes it actually slows the code down!
This can be due to a poor choice of algorithm or to poor coding.
The best possible speedup is linear, i.e. it is proportional to the number of processors:
T(N) = T(1)/N
where N = number of processors, T(1) = time for serial run.
A code that...
19 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1183 | Downloads: 0
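The relation T(N) = T(1)/N in the entry above can be checked against measured run times. The sketch below is illustrative: `speedup` is the ratio T(1)/T(N), and `efficiency` (speedup divided by N) is a commonly used companion measure that the excerpt itself does not mention. All timings in the usage note are made-up numbers, not measurements.

```python
def speedup(t_serial, t_parallel):
    """Ratio T(1) / T(N); equals N when the speedup is perfectly linear."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Speedup divided by the processor count; 1.0 means linear speedup."""
    return speedup(t_serial, t_parallel) / n_procs
```

With a hypothetical serial time of 100 s, a 4-processor run of 25 s gives a speedup of 4.0 (linear), while a run of 125 s gives 0.8, i.e. the parallel version is slower than the serial one.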
MPP (Massively Parallel Processing) – Total number of processors > 1000
Cluster – Each node in the system has fewer than 16 processors
Constellation – Each node in the system has more than 16 processors
37 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1241 | Downloads: 0
Proposed by Kai Hwang & Zhiwei Xu
Similar to the BSP:
– A parallel program: sequence of phases
– Next phase cannot begin until all operations in the current phase have finished
– Three types of phases:
» Parallelism phase: the overhead work involved in process management, such as process creation and grouping for parallel processing
» Co...
22 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1191 | Downloads: 0
Description
– Applies a reduction operation to the vector sendbuf over the set of processes specified by communicator and places the result in recvbuf on root
– Both the input and output buffers have the same number of elements with the same type
– Users may define their own operations or use the predefined operations provided by MPI
Pred...
63 pages | Shared by: nguyenlam99 | Date: 07/01/2019 | Views: 1400 | Downloads: 0
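The reduction semantics described in the entry above can be illustrated without an MPI installation. The following is a pure-Python simulation, not a call into any MPI library: `simulate_reduce` and its parameters are hypothetical names, each inner list stands in for one process's sendbuf, and the returned list stands in for each process's recvbuf.

```python
from functools import reduce
import operator


def simulate_reduce(sendbufs, op, root=0):
    """Combine per-process vectors element-wise and deliver to root only.

    sendbufs[r] is the vector contributed by rank r. The result has one
    entry per rank: the reduced vector at `root`, None everywhere else.
    """
    count = len(sendbufs[0])
    if any(len(buf) != count for buf in sendbufs):
        raise ValueError("all send buffers must have the same length")

    # Element i of the result combines element i of every rank's vector.
    combined = [reduce(op, column) for column in zip(*sendbufs)]
    return [combined if rank == root else None
            for rank in range(len(sendbufs))]
```

Passing `operator.add` plays the role of a sum reduction and the built-in `max` that of a maximum reduction; any two-argument function models a user-defined operation. Note that the output vector has the same length as each input vector, as the description requires.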