Jump to content
Urch Forums

Driller for pipelining & ILP !!


wood

Recommended Posts

Hey... some of them are damn easy... but that's what I'm afraid of... get all the difcicult ones and make hasty decision on easy ones. So, let's go.

 

Q1) What are the limits on how much a processor's performance can be improved using pipelining??

 

Q2) Given an unpipelined processor with a 10ns cycle time and pipeline latches with 0.5ns latency:

 

(a) what are the cycle times of pipelined versions of the processor with 2, 4, 8, and 16 stages

(b) how many stages of pipelining are required to achieve a cycle time of 2ns? 1ns?

© what is the minimum cycle time achievable with a 4-stage pipeline if additional logic is assigned to the final stage to balance the additional latency of the pipeline latches in the other stages?

 

Q3) Consider this instruction sequence:

 

DIV r2, r5, r8
SUB r9, r2, r7
ASH r5, r14, r6
MUL r11, r9, r5
BEQ r10, #0, r12
OR  r8, r15, r2

(a) Identy all of the RAW hazards

(b) Identify all of the WAR hazards

© Identify all of the WAW hazards

© Identify all of the control hazards

 

Q4) Suppose the branch frequencies (as percentages of all instructions) are as follows:

 

15% - Conditional branches

1% - Jumps and calls

60% - Taken condiitional branches

 

We are examining a four-deep pipeline where the branch is resolved at the end of the second cycle for unconditional branches and at the end of the third cycle for conditional branches. Assuming that only the first pipe stage can always be done independent of whether the branch goes and ignoring other pipeline stalls, how much faster would the machine be without any branch hazards?

 

Q5) Here is an unusual loop. First, list the dependences and then rewrite the loop so that it is parallel.

 

for( i=1; i    a[i] = b[i] + c[i];     /* S1 */
   b[i] = a[i] + d[i];     /* S2 */
   a[i+1] = a[i] + e[i];   /* S3 */
}

 

Link to comment
Share on other sites

Q1) What are the limits on how much a processor's performance can be improved using pipelining??

 

Hey, if latches were free (i.e. no latency at all) we could have a processor with googol stage pipeline, but that is not the case on earth so here is the equation

 

Cycle time = longest stage time + latch time

 

Hazards, branches are other barriers, not physical though, logical.

 

Q2) Given an unpipelined processor with a 10ns cycle time and pipeline latches with 0.5ns latency:

 

(a) what are the cycle times of pipelined versions of the processor with 2, 4, 8, and 16 stages

 

2 stages

Cycle time = (unpiped cycl. tyme)/stage # + latch time = 10/2 + 0.5 = 5.5ns

 

Similarly

4 stage pipe. cycle time = 3ns

8 stage pipe. cycle time = 1.75ns

16 stage pipe. cycle time = 1.125ns

 

 

(b) how many stages of pipelining are required to achieve a cycle time of 2ns? 1ns?

cycle time = 2ns

Cycle time = (unpiped cycl. tyme)/stage # + latch time

2 = 10/stages # + 0.5

stages = 10/1.5 = 7 stages (I think we should round up)

7 stages -> 1.9 ns cycle time

 

© what is the minimum cycle time achievable with a 4-stage pipeline if additional logic is assigned to the final stage to balance the additional latency of the pipeline latches in the other stages?

 

minumum cycle time = (total time thru pipeline)/(stage number)

min cycl. = [10 + (latches * latch latency)]/4 = 1

4 stages = 3 latches ( stage(-), latch(|) -|-|-|-)

min cycl. = 11.5/4 = 2.87 ns

 

Not really sure tho!!!

 

Q3) Consider this instruction sequence:

 

DIV r2, r5, r8

SUB r9, r2, r7

ASH r5, r14, r6

MUL r11, r9, r5

BEQ r10, #0, r12

OR r8, r15, r2

 

(a) Identy all of the RAW hazards

(b) Identify all of the WAR hazards

© Identify all of the WAW hazards

© Identify all of the control hazards

 

assumming

operation result, operand1, operand2

 

DIV r2, r5, r8

SUB r9, r2, r7

RAW hazard

 

ASH r5, r14, r6

MUL r11, r9, r5

RAW hazard

 

SUB r9, r2, r7

ASH r5, r14, r6

MUL r11, r9, r5

RAW hazard (depending on how deep is the pipeline)

 

 

DIV r2, r5, r8

SUB r9, r2, r7

ASH r5, r14, r6

WAR hazard

 

Don't think there is any WAW hazards

 

BEQ r10, #0, r12

OR r8, r15, r2

Control hazard (if BEQ = branch on equal)

 

Link to comment
Share on other sites

latch frequency ?? where did you see that ???

 

Latch latency, maybe ??

 

Between each stage in the pipeline there is latches (small buffers) that hold the output of stage n-1 for a fraction of time untill stage n reads it in. The reading and writing of these latches consumes time pure overhead. (no free lunch !!!! eeeeveeeeeerrrrr!!!!!!!! )

 

 

Link to comment
Share on other sites

Originally posted by AlbaLed

Q3) Consider this instruction sequence:

 

DIV r2, r5, r8

SUB r9, r2, r7

ASH r5, r14, r6

MUL r11, r9, r5

BEQ r10, #0, r12

OR r8, r15, r2

 

(a) Identy all of the RAW hazards

(b) Identify all of the WAR hazards

© Identify all of the WAW hazards

© Identify all of the control hazards

 

assumming

operation result, operand1, operand2

 

DIV r2, r5, r8

SUB r9, r2, r7

RAW hazard

 

ASH r5, r14, r6

MUL r11, r9, r5

RAW hazard

 

SUB r9, r2, r7

ASH r5, r14, r6

MUL r11, r9, r5

RAW hazard (depending on how deep is the pipeline)

 

 

DIV r2, r5, r8

SUB r9, r2, r7

ASH r5, r14, r6

WAR hazard

 

Don't think there is any WAW hazards

 

BEQ r10, #0, r12

OR r8, r15, r2

Control hazard (if BEQ = branch on equal)

Right !!

 

You missed these though:

 

DIV r2, r5, r8

OR r8, r15, r2

RAW ! You don't know how deep the pipeline is...

 

DIV r2, r5, r8

OR r8, r15, r2

WAR !

 

 

Link to comment
Share on other sites

Oh, those things are called "latches". Ok. Now your guys earlier posts make a lot more sense.

 

Originally posted by AlbaLed

 

latch frequency ?? where did you see that ???

 

Latch latency, maybe ??

 

Between each stage in the pipeline there is latches (small buffers) that hold the output of stage n-1 for a fraction of time untill stage n reads it in. The reading and writing of these latches consumes time pure overhead. (no free lunch !!!! eeeeveeeeeerrrrr!!!!!!!! )

 

 

Link to comment
Share on other sites

  • 6 years later...
  • 1 year later...

solution for Q5

the solution in the picture

and i upload the file that contain the solution

you can find it

 

file:///C:/Users/user/AppData/Local/Temp/moz-screenshot-6.pngfile:///C:/Users/user/AppData/Local/Temp/moz-screenshot-7.pngfile:///C:/Users/user/AppData/Local/Temp/moz-screenshot-8.pngfile:///C:/Users/user/AppData/Local/Temp/moz-screenshot-9.pngfile:///C:/Users/user/AppData/Local/Temp/moz-screenshot-10.pngfile:///C:/Users/user/AppData/Local/Temp/moz-screenshot-11.pngfile:///C:/Users/user/AppData/Local/Temp/moz-screenshot-12.png[ATTACH=CONFIG]6066[/ATTACH]

 

good luck

Link to comment
Share on other sites

please I need a solution for this question

Determine the improvement from branch folding for unconditional branches.

Assume a 90% hit rate, a base CPI without unconditional branch stalls of 1, and an unconditional

branch frequency of 5%. How much improvement is gained by this

enhancement versus a processor whose effective CPI is 1.1?

Link to comment
Share on other sites

please I need a solution for this problem

Determine the improvement from branch folding for unconditional branches.

Assume a 90% hit rate, a base CPI without unconditional branch stalls of 1, and an unconditional

branch frequency of 5%. How much improvement is gained by this

enhancement versus a processor whose effective CPI is 1.1?

 

 

Link to comment
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
  • Create New...