Driller for pipelining & ILP !!

wood · November 5, 2003

Hey... some of them are damn easy... but that's what I'm afraid of: getting all the difficult ones right and then making hasty decisions on the easy ones. So, let's go.

Q1) What are the limits on how much a processor's performance can be improved using pipelining??

Q2) Given an unpipelined processor with a 10ns cycle time and pipeline latches with 0.5ns latency:

(a) what are the cycle times of pipelined versions of the processor with 2, 4, 8, and 16 stages

(b) how many stages of pipelining are required to achieve a cycle time of 2ns? 1ns?

Q3) Consider this instruction sequence:

DIV r2, r5, r8
SUB r9, r2, r7
ASH r5, r14, r6
MUL r11, r9, r5
BEQ r10, #0, r12
OR  r8, r15, r2

(a) Identify all of the RAW hazards

(b) Identify all of the WAR hazards

Q4) Suppose the branch frequencies (as percentages of all instructions) are as follows:

15% - Conditional branches

1% - Jumps and calls

60% - Taken conditional branches

We are examining a four-deep pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assuming that only the first pipe stage can always be done independent of whether the branch goes and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?
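A rough way to set up Q4, under one common reading of the problem. The stall counts are an interpretation and are not stated anywhere in the thread: a jump or call resolved at the end of cycle 2 is assumed to cost 1 stall cycle, a taken conditional branch resolved at the end of cycle 3 is assumed to cost 2, and a not-taken conditional branch is assumed to cost 1, because the following instruction can only complete its first stage before the outcome is known. A base CPI of 1 is also assumed.

```python
# Sketch for Q4. The stall counts are assumptions read out of the problem
# statement, not values given in the thread.
COND_FREQ = 0.15        # conditional branches, fraction of all instructions
JUMP_FREQ = 0.01        # jumps and calls, fraction of all instructions
TAKEN_FRACTION = 0.60   # fraction of conditional branches that are taken

STALL_JUMP = 1          # assumed: resolved at end of cycle 2
STALL_TAKEN = 2         # assumed: resolved at end of cycle 3, then fetch target
STALL_NOT_TAKEN = 1     # assumed: fall-through instruction already did stage 1


def branch_stall_cpi():
    """Average stall cycles per instruction caused by branches."""
    return (JUMP_FREQ * STALL_JUMP
            + COND_FREQ * TAKEN_FRACTION * STALL_TAKEN
            + COND_FREQ * (1 - TAKEN_FRACTION) * STALL_NOT_TAKEN)


def speedup_without_branch_hazards(base_cpi=1.0):
    """CPI with branch stalls divided by the CPI without them."""
    return (base_cpi + branch_stall_cpi()) / base_cpi


print(round(speedup_without_branch_hazards(), 2))  # 1.25
```

With different stall assumptions only the three STALL_ constants need to change.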

Q5) Here is an unusual loop. First, list the dependences and then rewrite the loop so that it is parallel.

for( i=1; i<...; ... ) {
   a[i] = b[i] + c[i];     /* S1 */
   b[i] = a[i] + d[i];     /* S2 */
   a[i+1] = a[i] + e[i];   /* S3 */
}
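One possible way to approach Q5, offered as a sketch rather than the official answer. Within an iteration, S2 and S3 both read the a[i] that S1 writes (true dependences), and S2 writes the b[i] that S1 reads (an antidependence). Across iterations, S3 writes a[i+1] and S1 of the next iteration overwrites it before anything reads it (an output dependence), so only the final S3 write survives and it can be hoisted out of the loop. The loop bound `n` and the function names below are hypothetical, since the original bound is not visible in the post.

```python
import random


def original_loop(a, b, c, d, e, n):
    """The loop as posted, with a hypothetical bound n (arrays have n+1 slots)."""
    a, b = list(a), list(b)
    for i in range(1, n):
        a[i] = b[i] + c[i]       # S1
        b[i] = a[i] + d[i]       # S2
        a[i + 1] = a[i] + e[i]   # S3
    return a, b


def parallel_loop(a, b, c, d, e, n):
    """Rewritten so that no iteration depends on another one."""
    a, b = list(a), list(b)
    for i in range(1, n):        # iterations touch only index i
        a[i] = b[i] + c[i]       # S1
        b[i] = a[i] + d[i]       # S2
    if n > 1:
        a[n] = a[n - 1] + e[n - 1]   # the only S3 write that is never overwritten
    return a, b


rng = random.Random(0)
n = 8
arrays = [[rng.randint(-9, 9) for _ in range(n + 1)] for _ in range(5)]
print(original_loop(*arrays, n) == parallel_loop(*arrays, n))  # True
```

The comparison on random inputs is only a sanity check of the rewrite, not a proof.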

AlbaLed · November 6, 2003

Q1) What are the limits on how much a processor's performance can be improved using pipelining??

Hey, if latches were free (i.e. no latency at all), we could have a processor with a googol-stage pipeline, but that is not the case on earth, so here is the equation:

Cycle time = longest stage time + latch time

Hazards and branches are the other barriers; those are logical rather than physical.
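To make the limit concrete, here is a small sketch using the 10ns and 0.5ns figures from Q2. It assumes perfectly balanced stages and no hazards, which is the best case; the function names are made up for illustration. As the stage count grows, the speedup creeps toward (unpipelined time)/(latch time) and never reaches it.

```python
def pipelined_cycle_time(unpipelined_ns, stages, latch_ns):
    """Cycle time with the logic split evenly across `stages` stages."""
    return unpipelined_ns / stages + latch_ns


def speedup(unpipelined_ns, stages, latch_ns):
    """Ideal throughput gain over the unpipelined design (no hazards)."""
    return unpipelined_ns / pipelined_cycle_time(unpipelined_ns, stages, latch_ns)


def speedup_limit(unpipelined_ns, latch_ns):
    """Upper bound on speedup as the number of stages grows without limit."""
    return unpipelined_ns / latch_ns


for n in (2, 16, 128, 1024):
    print(n, round(speedup(10.0, n, 0.5), 2))
print("limit:", speedup_limit(10.0, 0.5))
```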

Q2) Given an unpipelined processor with a 10ns cycle time and pipeline latches with 0.5ns latency:

(a) what are the cycle times of pipelined versions of the processor with 2, 4, 8, and 16 stages

2 stages

Cycle time = (unpipelined cycle time)/(number of stages) + latch time = 10/2 + 0.5 = 5.5ns

Similarly

4 stage pipe. cycle time = 3ns

8 stage pipe. cycle time = 1.75ns

16 stage pipe. cycle time = 1.125ns
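The four results above can be double-checked with a few lines of code. This is only a sketch of the same formula, assuming the work divides evenly across the stages; the names are illustrative.

```python
UNPIPELINED_NS = 10.0   # unpipelined cycle time from Q2
LATCH_NS = 0.5          # latch latency from Q2


def cycle_time(stages):
    """Cycle time in ns for an evenly divided pipeline with `stages` stages."""
    return UNPIPELINED_NS / stages + LATCH_NS


for stages in (2, 4, 8, 16):
    print(f"{stages:2d} stages -> {cycle_time(stages)} ns")
```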

(b) how many stages of pipelining are required to achieve a cycle time of 2ns? 1ns?

cycle time = 2ns

Cycle time = (unpipelined cycle time)/(number of stages) + latch time

2 = 10/stages + 0.5

stages = 10/1.5 = 6.67, so 7 stages (I think we should round up)

7 stages -> 1.93 ns cycle time

For 1ns I tried it this way:

minimum cycle time = (total time through pipeline)/(number of stages)

min cycle = [10 + (latches * latch latency)]/stages = 1

e.g. 4 stages = 3 latches ( stage(-), latch(|) -|-|-|-)

min cycle = 11.5/4 = 2.875 ns

Not really sure tho!!!

Q3) Consider this instruction sequence:

DIV r2, r5, r8

SUB r9, r2, r7

ASH r5, r14, r6

MUL r11, r9, r5

BEQ r10, #0, r12

OR r8, r15, r2

(a) Identify all of the RAW hazards

(b) Identify all of the WAR hazards

assuming

operation result, operand1, operand2

DIV r2, r5, r8

SUB r9, r2, r7

RAW hazard

ASH r5, r14, r6

MUL r11, r9, r5

RAW hazard

SUB r9, r2, r7

ASH r5, r14, r6

MUL r11, r9, r5

RAW hazard (depending on how deep the pipeline is)

DIV r2, r5, r8

SUB r9, r2, r7

ASH r5, r14, r6

WAR hazard

I don't think there are any WAW hazards

BEQ r10, #0, r12

OR r8, r15, r2

Control hazard (if BEQ = branch on equal)

wood · November 6, 2003

Good job AlbaLed. All of them are right !! Yes, you should round up.

On Q2) b), you'll get 20 stages for 1ns.
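Both answers to Q2 (b) come from solving the same cycle-time formula for the number of stages and rounding up. A minimal sketch, assuming evenly divided stages (the function name is made up for illustration):

```python
import math


def stages_needed(unpipelined_ns, latch_ns, target_ns):
    """Smallest stage count whose cycle time is at most target_ns."""
    if target_ns <= latch_ns:
        # The latch alone already uses up the whole cycle.
        raise ValueError("target cycle time must exceed the latch latency")
    # target = unpipelined/stages + latch  =>  stages = unpipelined/(target - latch)
    return math.ceil(unpipelined_ns / (target_ns - latch_ns))


print(stages_needed(10.0, 0.5, 2.0))  # 7
print(stages_needed(10.0, 0.5, 1.0))  # 20
```

The guard also shows the hard floor: no stage count gets the cycle time down to 0.5ns or below.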

I just posted the missing instruction sequence! :)

Wood

nonevent99 · November 6, 2003

Wow, you guys know a lot. What's a latch frequency, and how does it figure in?

AlbaLed · November 6, 2003

latch frequency ?? where did you see that ???

Latch latency, maybe ??

Between each pair of stages in the pipeline there are latches (small buffers) that hold the output of stage n-1 for a fraction of time until stage n reads it in. Reading and writing these latches takes time, and that time is pure overhead. (no free lunch !!!! eeeeveeeeeerrrrr!!!!!!!! )

wood · November 6, 2003

I believe you meant latch latency, right?

Between each stage of the pipeline, there are latches, which generate latency. These latencies are usually small, but can become significant as the number of stages grows.

wood · November 6, 2003

Originally posted by AlbaLed
Q3) Consider this instruction sequence:

DIV r2, r5, r8
SUB r9, r2, r7
ASH r5, r14, r6
MUL r11, r9, r5
BEQ r10, #0, r12
OR r8, r15, r2

(a) Identify all of the RAW hazards
(b) Identify all of the WAR hazards
(c) Identify all of the WAW hazards
(d) Identify all of the control hazards

assuming
operation result, operand1, operand2

DIV r2, r5, r8
SUB r9, r2, r7
RAW hazard

ASH r5, r14, r6
MUL r11, r9, r5
RAW hazard

SUB r9, r2, r7
ASH r5, r14, r6
MUL r11, r9, r5
RAW hazard (depending on how deep the pipeline is)

DIV r2, r5, r8
SUB r9, r2, r7
ASH r5, r14, r6
WAR hazard

I don't think there are any WAW hazards

BEQ r10, #0, r12
OR r8, r15, r2
Control hazard (if BEQ = branch on equal)

Right !!

You missed these though:

DIV r2, r5, r8

OR r8, r15, r2

RAW ! You don't know how deep the pipeline is...

DIV r2, r5, r8

OR r8, r15, r2

WAR !
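The full list can be cross-checked mechanically. The sketch below compares every ordered pair of instructions by register name, assuming the "operation result, operand1, operand2" format used above and assuming BEQ writes no register and reads r10 and r12. It reports every potential hazard regardless of pipeline depth and does not account for an intervening write to the same register (there is none in this sequence).

```python
# Each entry: (mnemonic, destination register or None, source registers).
# Treating BEQ as "no destination, reads r10 and r12" is an assumption.
PROGRAM = [
    ("DIV", "r2", ("r5", "r8")),
    ("SUB", "r9", ("r2", "r7")),
    ("ASH", "r5", ("r14", "r6")),
    ("MUL", "r11", ("r9", "r5")),
    ("BEQ", None, ("r10", "r12")),
    ("OR", "r8", ("r15", "r2")),
]


def find_hazards(program):
    """Return potential RAW/WAR/WAW hazards as (earlier, later, register)."""
    hazards = {"RAW": [], "WAR": [], "WAW": []}
    for i, (op_i, dst_i, srcs_i) in enumerate(program):
        for op_j, dst_j, srcs_j in program[i + 1:]:
            if dst_i is not None and dst_i in srcs_j:
                hazards["RAW"].append((op_i, op_j, dst_i))   # later reads earlier result
            if dst_j is not None and dst_j in srcs_i:
                hazards["WAR"].append((op_i, op_j, dst_j))   # later overwrites a source
            if dst_i is not None and dst_i == dst_j:
                hazards["WAW"].append((op_i, op_j, dst_i))   # both write same register
    return hazards


for kind, found in find_hazards(PROGRAM).items():
    print(kind, found)
```

On this sequence it finds four RAW pairs, two WAR pairs and no WAW pairs, matching the combined answers in the thread.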

AlbaLed · November 6, 2003

Very much correct wood, thank you !!!!!!!!!!

nonevent99 · November 6, 2003

Oh, those things are called "latches". Ok. Now you guys' earlier posts make a lot more sense.

Originally posted by AlbaLed

latch frequency ?? where did you see that ???

Latch latency, maybe ??

Between each pair of stages in the pipeline there are latches (small buffers) that hold the output of stage n-1 for a fraction of time until stage n reads it in. Reading and writing these latches takes time, and that time is pure overhead. (no free lunch !!!! eeeeveeeeeerrrrr!!!!!!!! )

ic10503 · April 7, 2010

Can anyone post the answers to problems 4 and 5 please?

eng muna · May 11, 2011

[ATTACH]6065[/ATTACH]

Good luck

eng muna · May 11, 2011

Solution for Q5:

The solution is in the picture, and I have uploaded the file that contains it. You can find it here:

[ATTACH=CONFIG]6066[/ATTACH]

Good luck

eng muna · May 11, 2011

Please, I need a solution for this question:

Determine the improvement from branch folding for unconditional branches. Assume a 90% hit rate, a base CPI without unconditional branch stalls of 1, and an unconditional branch frequency of 5%. How much improvement is gained by this enhancement versus a processor whose effective CPI is 1.1?

eng muna · May 12, 2011

Please, I need a solution for this problem:

Determine the improvement from branch folding for unconditional branches. Assume a 90% hit rate, a base CPI without unconditional branch stalls of 1, and an unconditional branch frequency of 5%. How much improvement is gained by this enhancement versus a processor whose effective CPI is 1.1?
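One hedged way to set this problem up, since the thread never answers it. Two assumptions go beyond the problem statement: the baseline CPI of 1.1 is taken to mean each unconditional branch costs 2 stall cycles (1 + 0.05 * 2 = 1.1), and a branch-folding hit is taken to remove the branch's own cycle entirely, i.e. a penalty of -1 cycle, while a miss still pays the 2-cycle penalty.

```python
BASE_CPI = 1.0          # CPI without unconditional branch stalls (given)
UNCOND_FREQ = 0.05      # unconditional branch frequency (given)
HIT_RATE = 0.90         # branch-folding hit rate (given)
BASELINE_CPI = 1.1      # effective CPI of the comparison processor (given)

MISS_PENALTY = 2.0      # assumption: inferred from 1.1 = 1 + 0.05 * 2
HIT_PENALTY = -1.0      # assumption: a folded branch takes zero cycles


def folded_cpi():
    """Effective CPI with branch folding under the assumptions above."""
    avg_penalty = HIT_RATE * HIT_PENALTY + (1 - HIT_RATE) * MISS_PENALTY
    return BASE_CPI + UNCOND_FREQ * avg_penalty


def improvement():
    """How many times faster the folded design is than the 1.1-CPI baseline."""
    return BASELINE_CPI / folded_cpi()


print(round(folded_cpi(), 3))    # 0.965
print(round(improvement(), 2))   # 1.14
```

If the miss penalty or the benefit of a hit is defined differently in your course material, change the two penalty constants accordingly.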
