## M.TECH/CSE/1<sup>ST</sup> SEM /CSEN 5105/2015 2015 # Advanced Computer Architecture (CSEN 5105) 20 Time Allotted: 3 hrs Full Marks: 70 Figures out of the right margin indicate full marks. Candidates are required to answer Group A and any 5 (five) from Group B to E, taking at least one from each group. Candidates are required to give answer in their own words as far as practicable. | ·a | indidates are required to g | ive answer in | their own words as ta | r as practicable. | | | | |----|---------------------------------------------------------------------------------|--------------------|-----------------------------------------|--------------------|--|--|--| | | | | oup – A | M. Donald and | | | | | | (1 | Multiple Choice | e Type Questions) | | | | | | 1. | <ul> <li>Choose the correct alternat</li> </ul> | ives for the foll | owing: | 10 x 1=10 | | | | | | (i) CC-NUMA stands for | | | | | | | | | (a) Cache coherent N (c) Cache Co-ordinate | | (b) Cyclical Co-or<br>(d) None of the a | | | | | | | (ii) Which of the following | is false regardi | ng parallel architectures | ? | | | | | | | | uniformly shared by pro- | | | | | | | (b) In NUMA, memo | ry access time | varies with the location of | of the memory word | | | | | | (c) In SuperScalar machines, multiple pipelined functional units may be present | | | | | | | | | (d) Multicomputers | can have share | d memory systems. | | | | | | | (iii) On a dynamic instruct | ion pipeline | | | | | | | | (a) throughput is alw | yays 1 instruction | on per cycle | | | | | | | (b) throughput is alw | ays > 1 instruc | tion per cycle | | | | | | | (c) throughput is alw | ays < 1 instruc | tion per cycle | | | | | | | (d) two initiations ca | n happen at the | same time. | | | | | | | (iv) An initial collision vec<br>table has three rows with 3 | | | | | | | | | pipeline is | 0.2.2.2 | (1) 22 | (1) 2.4 | | | | | | (a) 2,2 | (b) 2,3 | (c) 3,3 | (d) 3,4. | | | | | | (v) Which if the following i | s not a feature | of Vector Processor? | | | | | | | (a) Register-to-register | | (b) Memory-to-memory | | | | | | | (c) Memory-to-regist | er | (d) None of the a | above. | | | | | | (vi) One example of an MIS | SD architecture | is: | | | | | | | (a) Message passing | | (b) Superpipeline | ed | | | | | | (c) VLIW | | (d) Systolic Array | | | | | | | | | | 15 5 - 35 - 120 | | | | | M.TECH/CSE | INST SE | MICSEN | 5105/2015 | |----------------|---------|-----------|-----------| | IVI. I ECH/COL | // SE | INI ICSEN | 2102/2012 | - (vii) Which of the following is true? - (a) RAW,WAR and WAW hazards are possible on any pipeline - (b) WAR and WAW hazards are more natural as these refer to flow dependencies - (c) Use of multiple functional units can reduce the possibility of structural hazards - (d) WAR and WAW hazards can happen even if the pipeline is not a multiple issue. - (viii) Assume an 8-stage instruction pipeline. The probability of a branch instruction in an instruction stream is 0.2. There is a 50% chance of a branch being taken. What is the degradation in pipeline performance? (a) 40% (b) 41% (c) 45% (d) 46%. (ix) Suppose the architects of a new machine are considering the options of either using the Snooping protocol or Directory Based protocol while maintaining Cache Coherency in the system. One critical decision parameter is the number of messages generated in the system while a read or write memory request is initiated by various processors for Snooping (S) and Directory Based (D) protocols. Which of the following is always true regarding S and D? (a) S > D (b) S < D (c) S = D (d) No definite relationship between S and D exists. (x) Suppose an algorithm has 30% component which can be parallelized. The remaining has to be executed sequentially. Then what would be the speed up if 2 parallel processors are employed? (a) 1 (b) 1.2 (c) 2 (d) Cannot be determined. #### Group - B 2.(a) For the following Reservation Table find: | | | X | | X | | | |---|---|---|---|---|---|---| | | X | | X | | X | | | X | | X | | | | X | i) Permissible latencies ii) Forbidden latencies iii) ICV iv) Then draw the state transition table and find all the permissible latency cycles. v) What is the Minimum Average Latency? (b) For any pipeline, prove that the lower bound for the minimum average latency is the maximum number of checkmarks in any row of the corresponding Reservation Table. (1.5 + 1.5 + 2 + 3 + 1) + 3 = 12 - 3.(a) For the latency cycle (1,5), find Gc, Hc, Gc mod p and Hc mod p, where the symbols have their usual meaning. - (b) Suppose you have $Hc = \{0,1,2,4,6,9,11,13,14\}$ . Find some maximally compatible sets from the above and describe the corresponding algorithms also. You may assume p=15. [Hint: Do not try to enumerate all, two to three should suffice.] ### M.TECH/CSE/1ST SEM /CSEN 5105/2015 (c) You are given two compatible sets {0,2,4} and {0,2,6}. Can you recreate the Reservation Table shown in Q 2(a)? $(1 \times 4) + 5 + 3 = 12$ #### Group - C 4.(a) Multiply the following two 3 X 3 matrices on a mesh network using an efficient algorithm. The complexity of your algorithm should be O(n). $$\begin{bmatrix} -2 & 3 & 4 \\ 4 & -1 & 7 \\ 3 & 6 & 8 \end{bmatrix} \qquad x \qquad \begin{bmatrix} 5 & -2 & 3 \\ 7 & 4 & -3 \\ 5 & 4 & 1 \end{bmatrix}$$ (b) For a Boolean cube consisting of 8 processing elements show how the nodes are numbered and how data can be routed between these nodes. (c) From two 3-cubes show how can you configure a 4-cube? 5 + 4 + 3 = 12 5.(a) Draw the diagram of an 8 x 8 Omega NW built with 2 x 2 switching elements. (b) What do you mean by the perfect shuffle operation? How is the Omega NW configured to implement the perfect shuffle operation? (c) Show the switching setting for routing a message from node 001 to node 100 and from node 011 to node101 simultaneously. Does blocking exist in this case? 6+3+3=12 #### Group - D 6.(a) Write the algorithm for Vertical\_Merge() using the procedures Row\_Merge() and Column\_Merge(). You do not have to write the code for Row\_Merge() and Column\_Merge() and you can assume these are supplied to you. Then show how the Vertical\_Merge() algorithm proceeds step by step for the given 4 x 4 matrix: Clearly write down any assumption you are making about the structure of the matrix. (b) Write the algorithm for Horizontal\_Merge() using the procedures Row\_Merge() and Column\_Merge() or any suitable variations of these procedures. You do not have to write the code for these procedures and you can assume these are supplied to you as in Q6 (a). Then show how Horizontal\_Merge() algorithm proceeds step by step for the given 4 x 8 matrix: ## M.TECH/CSE/1ST SEM /CSEN 5105/2015 $6 \div 6 = 12$ - 7.(a) Show that if you perform n x n matrix multiplication on a UMA multiprocessor with n number of computing elements working in parallel using a naive aigorithm, then you can achieve almost linear time improvement in speedup. - (b) Multiply the following two matrices using the Block-matrix approach to parallelmatrix multiplication approach: $\begin{bmatrix} 1 & 0 & 2 & 3 \\ 4 & -1 & 1 & 5 \\ -2 & -3 & -4 & 2 \\ -1 & 2 & 0 & 0 \end{bmatrix} \qquad x \qquad \begin{bmatrix} -1 & 1 & 2 & -3 \\ -5 & -4 & 2 & -2 \\ 3 & -1 & 0 & 2 \\ 1 & 0 & 4 & 5 \end{bmatrix}$ (c) Suppose you have nr processors at your disposal. You have an algorithm like the Matrix multiplication which has O(n3) time complexity. Can you get time complexities O(nr-3) for r > 3? What interesting event would have occurred had this been achievable? 4+6+2=12 #### Group - E 8.(a) Draw the data flow graph for the following set of instructions: 1. P = X + Y 2. Q=P/Y 3. R=P\*X 4. S=R-0 5. T=R\*P 6. U=S/T - (b) Explain the differences between a data-flow and a control-flow architecture. - (c) How can you generate the initial collision matrix in a dynamic pipleline? Explain with the example given in the ReservationTable in Q. No. 2(a) and another one given below: | Y | Y | | | | 1200 | Y | Y | |---|---|---|---|-----|------|---|---| | | | Y | | Y | | | | | | | | Y | 100 | Y | | | | | | | | | | | | 5 + 3 + 4 = 12 - 9.(a) Explain what problem will occur in the following situation of a multiprocessor machine consisting of two CPUs A and B each with its own cache: - i) A reads from memory location X then ii) B reads from X then iii) A updates value of X. (Please state any assumption you made for analysing the above situation.) - (b) Explain the following terms formally in the context of Cache Coherence: i) Coherence ii) Consistency iii) Write serialization. (c) Briefly explain the role of the Snooping protocol to enforce cache coherency. Also mention its main differences with the Directory-based protocol. 4+3+(3+2)=12 **CSEN 5105**