driller for software systems

nonevent99 · November 3, 2003

Incoming! Sorry in advance if these aren't pure "software systems and methodology" problems.

Q1) Define "expression compatible", "name compatible", and "structural compatability"

Q2) List the four key ingredients in object-oriented programming.

Q3) In a sequential file of 512B records, how many bytes must be read from the drive to acquire the contents of the 111th record?

Q4) Suppose an indexed file has records of 512B each, and each record pointer requires 4B. The index is implemented as a B-tree, wherein each node can store up to 512B of pointers to child nodes. If the B-tree contained exactly two levels (including the root), then what is the maximal number of records it can contain? What would the file's size be?

Q5) Can recursion be used to implement any iterative function? Can iteration be used to implement any recursive function? If not, give counterexamples.

Q6) What is the value of f(n) in big-O notation?

function f(n) {
 if (n   return f(n/2) + 2*f(n/4);
}

Q7) A specification requires the implementation of the following functionality: test-and-set memory, semaphores, monitors, critical regions, and mutexes. Which of these functions is most primitive (ie: cannot be implemented "on top of" the other functions)? Which is the least primitive? What is the hierarchy of primitiveness among these functions?

Q8) List the conditions which automatically lead to deadlock.

Q9) Is wound-wait an effective solution to deadlock? How about wait-die? Is either preemptive?

Q10) How do you know when an object is safe to garbage collect?

Q11) If a system stores arrays in row-major order, and an array of 4-byte words A[100][200] starts at location 0x8000, give a formula for the location of element A[x][y]. How about if it's in column-major order?

Q12) What does the following program print?

int x = 1;
function f(i) {
   int x = 2;
   I *= 3;
   for (int j = 1; j         g(j);
}

function g(j) {
   print j+x;
}
f(x);

Consider four scenarios: dynamic scope with pass-by-value, dynamic scope with pass-by-reference, static scope with pass-by-value, static scope with pass-by-reference.

Q13) Convert to postfix and prefix: 3*4+3-7/(8+2)

Q14) Evaluate the expression in Q13, assuming integer division. Then do the same, assuming that the system always performs floating point division.

Q15) What printout does the following code snippet generate, assuming >> is a logical right shift operator on 8-bit 2's complement integers, and & is a bit-wise AND operator?

int x = -23;
int m = 0x1;
for (int I = 0; I     if (m & x == 1) then 
       print i
   x = x >> 1
}

Q16) Why might a compiler make two passes through a source file?

Q17) What "independent" code compilation?

Q18) What is a "capability"?

Q19) What is the difference between "protection" and "security"?

Q20) Suppose a software system uses access control lists to control access to files. By default, each file is inaccessible to all users. If Bob is a member of the "vsusers" group, how many files below can he access? Assume that lines later in an ACL override earlier lines.

file1
   allow vsusers
file2
   allow vsusers
   deny Bob
file3
   allow Bob
   deny vsusers
file4
   deny Bob
file5
   allow all
   deny vsusers
   allow Bob

Q21) What is a profiler good for?

Q22) What is a linker good for?

Q23) What is the difference between a distributed file system and a networked file system?

Q24) What are the tradeoffs between user-level threading and kernel-level threading?

Q25) What are the tradeoffs between state-ful networking protocols and state-less networking protocols?

Q26) What is a one-way hash good for?

AlbaLed · November 3, 2003

[/size]Q1) Define "expression compatible", "name compatible", and "structural compatability"
Name compatible = two variable are compatible if they are declared from the same type
Structural compatability= two variables are compatible if they have same structure, (two struct can be compatible if the contain same things in them)
Expression compatible = No idea,

Q2) List the four key ingredients in object-oriented programming.

1. Abstract data types
2. Inheritances
3. Polymorphism
4. Dynamic Binding

Q3) In a sequential file of 512B records, how many bytes must be read from the drive to acquire the contents of the 111th record?

assuming read has to sequential as in tape-drive we'll need to read 512*111B
Assuming read from a random access drive
512B have to be read

Q4) Suppose an indexed file has records of 512B each, and each record pointer requires 4B. The index is implemented as a B-tree, wherein each node can store up to 512B of pointers to child nodes. If the B-tree contained exactly two levels (including the root), then what is the maximal number of records it can contain? What would the file's size be?

With 512B worth of pointers to other children nodes, we can have 455 pointers to other children using 9 BITS to identify a child, which means each node has 454 pointers to data records, so
with two levels we have 454 (root) + 455*454 = 207024 records can that be identified

Q5) Can recursion be used to implement any iterative function? Can iteration be used to implement any recursive function? If not, give counterexamples.

Recursion can be used to implement iteration (many functional languages do) but the reverse in not necessarily true or at least very hard.

Q6) What is the value of f(n) in big-O notation?
function f(n) {
 if (n   return f(n/2) + 2*f(n/4);
}
f(n) = O(n^(lg(3))

Q7) A specification requires the implementation of the following functionality: test-and-set memory, semaphores, monitors, critical regions, and mutexes. Which of these functions is most primitive (ie: cannot be implemented "on top of" the other functions)? Which is the least primitive? What is the hierarchy of primitiveness among these functions?

test-and-set is the most primitive
monitors are build on top of almost everything else

Q8) List the conditions which automatically lead to deadlock.

Mutual Exclusion
Hold and wait
No preemtion
Circular Wait

Q9) Is wound-wait an effective solution to deadlock? How about wait-die? Is either preemptive?

Yes they are, but lead to unnecessary deaths
wound-wait is preemtive, because older guys kill the young ones effectively preemting them
wait-die on the other hand process instead of waiting they kill themseves, so they willingly release their resources

Q10) How do you know when an object is safe to garbage collect?

If it cannot be reached from a root i.e. it cannot be reached from any of the references in the program

Q11) If a system stores arrays in row-major order, and an array of 4-byte words A[100][200] starts at location 0x8000, give a formula for the location of element A[x][y]. How about if it's in column-major order?

row major
A[x][y] = 0x8000 + x*4*100 + y*4
column major
A[x][y] = 0x8000 + y*4*200 + x*4

Q12) What does the following program print?
int x = 1;
function f(i) {
   int x = 2;
   i*= 3;
   for (int j = 1; j         g(j);
}

function g(j) {
   print j+x;
}
f(x);
Consider four scenarios: dynamic scope with pass-by-value, dynamic scope with pass-by-reference, static scope with pass-by-value, static scope with pass-by-reference.

Dynamic scope
Pass by value: 3 4 5
Pass by reference: 3 4 5
Static scope
Pass by value: 2 3 4
Pass by reference: 4 5 6

Q13) Convert to postfix and prefix: 3*4+3-7/(8+2)

Postfix
3 4 * 3 + 7 8 2 + / -
Prefix
- + * 3 4 3 / 7 + 8 2

Q14) Evaluate the expression in Q13, assuming integer division. Then do the same, assuming that the system always performs floating point division.

Integer arith. value = 15
Floating point arith. value = 15 - 0.7 (0.7 cannot be represented accurately in the computer) so the answer is going to be ~ 14.3
[/size]

nonevent99 · November 3, 2003

Dude, Alba, you're fast. So fast that I haven't even created an answer key yet!

Q4 Is that 512 the capacity of a node, or specifically for pointers to OTHER nodes??

Each node has 512B worth of pointers to children, as well as a certain number of 512B records. So our nodes are big. This is a B tree.

samlai · November 3, 2003

Q5) Can recursion be used to implement any iterative function? Can iteration be used to implement any recursive function? If not, give counterexamples.

I believe recursive function can always be implemented with an iterative function and a stack. Imagine how recursion is done in the background. Push stack frame (activation record) as you call the function recursively, then pop it when reaching the base case. All that could be simulated with a stack of some ADT for the stack frame.

AlbaLed · November 3, 2003

Q15) What printout does the following code snippet generate, assuming >> is a logical right shift operator on 8-bit 2's complement integers, and & is a bit-wise AND operator?

int x = -23;
int m = 0x1;
for (int i= 0; i    if (m & x == 1) then 
       print i;
   x = x >> 1
}

The code above will print the positions of 1s in the representation of -23 in 2s complement, which is 1110 1001

therefore the code will print (counting from left to right) 0 3 5 6 7

Q16) Why might a compiler make two passes through a source file?

To identify variables that are used before their declaration but are still legal. For example when writing a class in C++ one can put the declarations of the variables after the functions are defined. There might be more tho??

Q17) What is "independent" code compilation?

Independent compilation is compilation that may happen in different points in time, each part being compiled does not need any information of other parts being compiles (say in a big project).

Q18) What is a "capability"?

Ability to do 'something' :p. In what context

More to come soooooooon

samlai · November 3, 2003

Q16) Why might a compiler make two passes through a source file?

First pass - create the symbol table (figure the addresses)

Second pass - compile the program with help of the symbol table (fill in the addresses)

AlbaLed · November 3, 2003

samlai

That is a really nice approach indeed, but I don't think that it effectively uses iteration to implement recursion, it merely mimics the way recursion works. What you're basically doing is implementing recursion. For example

fact(x)
{
   if(x == 1) return 1
   return x*fact(x-1);
}

coulde be nicely transformed in the following code
fact(x) 
  int temp= 1;
  for(int I = 2 to I == x; I++)
     temp = temp * I;
  return temp;
}

I think this is what is required when using iteration instead of recursion. Two main reasons why iteration is prefered over recursion are:the recusion will run so deep that will get out of stack memory error, or performance.

AlbaLed · November 4, 2003

Q19) What is the difference between "protection" and "security"?

I think protection is the ability to deny or grant access to a specific resource such as keyboard. A policy + protection = security. Correct ???

Q20) Suppose a software system uses access control lists to control access to files. By default, each file is inaccessible to all users. If Bob is a member of the "vsusers" group, how many files below can he access? Assume that lines later in an ACL override earlier lines.

file1
   allow vsusers     [color=Red]file2
   allow vsusers  [color=Red]    deny Bob
file3
   allow Bob       [color=Red]    deny vsusers
file4
   deny Bob      [color=Red]file5
   allow all      [color=Red]    deny vsusers
   allow Bob

This depends on the method that the access list is processed, i.e. if scanning stops as soon as permission is granted than that case is shown above (this method is the most commonly used), another method is scan all the list and then decide, grant access if no deny-s for this user where found (slower). That's y someone has to be carefull when implementing ACL-s because the effect that you introduce might not be the one you want.

Q21) What is a profiler good for?

Anyone ???

Q22) What is a linker good for?

The linker brings together into one object file all object code that the current program does not implement but are provided by either the language or his/her own or libraries

Q23) What is the difference between a distributed file system and a networked file system?

Distributed file systems abstract many independed storage devices resident on different machines to one file system. The user does not see the detail, especially location where a file is stored, duplication. Networked file system is a network of independent file systems, use knows where a file is stored and details s/he does not need to know

Q24) What are the tradeoffs between user-level threading and kernel-level threading?

User level threads reside within a process, OS has not the least clue of them, usually implemented by some library. OS is fully aware of kernel level threads and ..... someone has to explain me the rest !!!!

Q25) What are the tradeoffs between state-ful networking protocols and state-less networking protocols?

statefull networking protocols are reliable at the cost of overhead, stateless protocols on the other hand don't remember a thing and are not reliable, but are faster and cheaper. Needs more elaboration, why don;t you add something if you know

Q26) What is a one-way hash good for?

One way hashing is usually used for storing data that needs to be verified not retrieved, such as paswords. In one way hashing the conversion from A(data) -> B(cypher) is efficient and the B -> A is considered hard or impossible. MD5 is a one way hashing algorithm

samlai · November 4, 2003

Q21) What is a profiler good for?

Profiler tells you what parts of the program are executed most often. It is very important to pin point where the bottle-neck of the program is when trying to optimize it.

wood · November 4, 2003

Can someone elaborate on Q4?

AlbaLed · November 4, 2003

Q4

512 bytes available for children. If we use 1 Byte to represent a child we only get 256 children, if we add another bit to the child address we get 512 children possible to be addressed, but we only have 512 bytes, so we can only fit 512*8/9 = 455 addresses of children pointers. In a B tree we have 1 more pointer than actual keys in the node, so 454 keys per node (this is key data + pointer to the data record in file). At level 1 there is 1 node (root) i.e. 454 pointers to data records, in second level we have 455 nodes each 454 pointers to data records we get

454 + 454*455

The actual size of the record pointer does not matter, neither does the size of the key, because we are told that we have 512B for children pointers (B-tree of index) and therefore we are assuming that we can fit 454 key+record pointers in a node

does that make sense???

wood · November 4, 2003

Alba, I believe the question mentioned the algo it is using in the blue part. The resource shouldn't be granted once you find a first match, but you should go through the whole list to make sure no lines override earlier ones.

Q20) Suppose a software system uses access control lists to control access to files. By default, each file is inaccessible to all users. If Bob is a member of the "vsusers" group, how many files below can he access? Assume that lines later in an ACL override earlier lines.
file1
   allow vsusers     [color=Red]file2
   allow vsusers  [color=Red]    deny Bob
file3
   allow Bob       [color=Red]    deny vsusers
file4
   deny Bob      [color=Red]file5
   allow all      [color=Red]    deny vsusers
   allow Bob
This depends on the method that the access list is processed, i.e. if scanning stops as soon as permission is granted than that case is shown above (this method is the most commonly used), another method is scan all the list and then decide, grant access if no deny-s for this user where found (slower). That's y someone has to be carefull when implementing ACL-s because the effect that you introduce might not be the one you want.

AlbaLed · November 4, 2003

Great, then read them in reverse :D

nonevent99 · November 4, 2003

Here is an answer key, compiled from all your answers, and my own. (In some cases, your answers were so much better than anything I could come up with that I just plagiarized your work. Good job, guys!)

Items that could really use more discussion will be highlighted in orange. I will update this entry until no orange items remain.

Q1) Define "expression compatible", "name compatible", and "structural compatability"

Name compatible = two variable are compatible if they are declared from the same type

Structural compatibility= two variables are compatible if they have same structure

Expression compatibility = two variables can be used together in a certain binary operator

Q2) List the four key ingredients in object-oriented programming.

Abstraction

Inheritance

Polymorphism

Encapsulation

What’s dynamic binding? Is it the same thing as encapsulation?

cf http://www.developer.com/vb/article.php/882111

Q3) In a sequential file of 512B records, how many bytes must be read from the drive to acquire the contents of the 111th record?

assuming read has to sequential as in tape-drive we'll need to read 512*111B

Assuming read from a random access drive

512B have to be read

Q4) Suppose an indexed file has records of 512B each, and each record pointer requires 4B. The index is implemented as a B-tree, wherein each node can store up to 512B of pointers to child nodes. If the B-tree contained exactly two levels (including the root), then what is the maximal number of records it can contain? What would the file's size be?

Each record pointer requires 4B, and each B-tree node contains 512B of pointers, so that means there are 512/4 = 128 pointers to children in each B-tree node.

If a B-tree node contains t pointers to children, then it can contain up to t-1 records within the node (interleaved with the pointers). So each node in this tree has 127 records.

In a 2-level tree, you get to have a root and t = 128 children for a total of 129 nodes. Each node contains 127 records, so our 2-level tree can contain up to 127 * 129 = 16383 records.

Each node contains 127 records at 512B each, plus 512B of pointers, for a total of 512*128B = 64KB. We have 129 nodes. So the tree/file uses a total of 129*64 KB, approximately 8GB.

Q5) Can recursion be used to implement any iterative function? Can iteration be used to implement any recursive function? If not, give counterexamples.

Recursion and iteration are equally powerful, according to Brookshear.

Q6) What is the value of f(n) in big-O notation?

function f(n) {
 if (n   return f(n/2) + 2*f(n/4);
}

Consider

f(n) is very close to twice f(n/2). So f(n) is O(n).

Q7) A specification requires the implementation of the following functionality: test-and-set memory, semaphores, monitors, critical regions, and mutexes. Which of these functions is most primitive (ie: cannot be implemented "on top of" the other functions)? Which is the least primitive? What is the hierarchy of primitiveness among these functions?

test-and-set memory can be used to implement mutexes

mutexes can be used to implement semaphores

semaphores can be used to implement critical regions

critical regions can be used to implement monitors

Q8) List the conditions which automatically lead to deadlock.

Mutual Exclusion

Hold and wait

No preemtion

Circular Wait

Q9) Is wound-wait an effective solution to deadlock? How about wait-die? Is either preemptive?

Yes they are, but lead to unnecessary deaths

wound-wait is preemtive, because older guys kill the young ones effectively preemting them

wait-die on the other hand process instead of waiting they kill themseves, so they willingly release their resources

Q10) How do you know when an object is safe to garbage collect?

If it cannot be reached from a root i.e. it cannot be reached from any of the references in the program

Q11) If a system stores arrays in row-major order, and an array of 4-byte words A[100][200] starts at location 0x8000, give a formula for the location of element A[x][y]. How about if it's in column-major order?

[orange]

In row-major order, each row contains 200 elements. Each element is 4B. So each row is 800 bytes.

I should have mentioned that these arrays are indexed from 0, as in C and Java (but not old-style Basic!)

With that assumption, the location of A[x][y] would be 0x8000 + 800*rownum + 4*colnum, which is 0x8000 + 800*y + 4*x.

In column major order, each column contains 100 elements, or 400 bytes per column. So the location is 0x8000 + 400*colnum + 4*rownum = 0x8000 + 400*x + 4*y

Q12) What does the following program print?

int x = 1;
function f(i) {
   int x = 2;
   I *= 3;
   for (int j = 1; j         g(j);
}

function g(j) {
   print j+x;
}
f(x);

Consider four scenarios: dynamic scope with pass-by-value, dynamic scope with pass-by-reference, static scope with pass-by-value, static scope with pass-by-reference.

Dynamic scope

Pass by value: 3 4 5

Pass by reference: 3 4 5

Static scope

Pass by value: 2 3 4

Pass by reference: 4 5 6

Q13) Convert to postfix and prefix: 3*4+3-7/(8+2)

Postfix

3 4 * 3 + 7 8 2 + / -

Prefix

- + * 3 4 3 / 7 + 8 2

Q14) Evaluate the expression in Q13, assuming integer division. Then do the same, assuming that the system always performs floating point division.

Integer arith. value = 15

Floating point arith. value = 15 - 0.7 (0.7 cannot be represented accurately in the computer) so the answer is going to be ~ 14.3

Q15) What printout does the following code snippet generate, assuming >> is a logical right shift operator on 8-bit 2's complement integers, and & is a bit-wise AND operator?

int x = -23;
int m = 0x1;
for (int I = 0; I     if (m & x == 1) then 
       print i
   x = x >> 1
}

This essentially tracks the 1 bits of -23 in order from least to most significant order.

23 = 16 + 7, so 23(dec) = 00010111(bin). Flip the bits and add one to get -23: 11101001.

Then we read out the positions of I which correspond to the 1 bits:

0 3 5 6 7

The >> is a red herring

Q16) Why might a compiler make two passes through a source file?

Because of forward references, it’s sometimes easier to loop through once, generating a symbol table, and then loop through again, actually generating output. For example when writing a class in C++ one can put the declarations of the variables after the functions are defined.

Q17) What "independent" code compilation?

Independent compilation is compilation that may happen in different points in time, each part being compiled does not need any information of other parts being compiles (say in a big project).

Q18) What is a "capability"?

A capability is a special key that an operating system might hand to a program when that program wants to access a resource. Without the key, the resource is unattainable. Capability management is often handled by the kernel.

Q19) What is the difference between "protection" and "security"?

According to an OS book I read (Tanenbaum or Silberschatz?), protection comprises restricting access against internal threats—protecting one process from another, protecting kernel state from user-level code, and the like. Security is a superset of protection, in that it comprises restricting access against internal as well as external threats, including users on other systems, worms and viruses, and the like.

Q20) Suppose a software system uses access control lists to control access to files. By default, each file is inaccessible to all users. If Bob is a member of the "vsusers" group, how many files below can he access? Assume that lines later in an ACL override earlier lines.

file1
   allow vsusers  
file2
   allow vsusers
   deny Bob
file3
   allow Bob
   deny vsusers
file4
   deny Bob
file5
   allow all
   deny vsusers
   allow Bob

Statements later in the ACL override earlier statements. That means Bob can get into files 1 and 5. Sorry, Bob.

Q21) What is a profiler good for?

Profiler tells you what parts of the program are executed most often. It is very important to pin point where the bottle-neck of the program is when trying to optimize it. You could also use a profiler in order to figure out unecessary steps. change function calls to inline functions/macros, etc. Most good profilers also allow the programmer to view what objects currently reside in memory, which is handy for tracking down memory leaks and other memory hogs.

Q22) What is a linker good for?

The linker brings together into one object file all object code that the current program does not implement but are provided by either the language or his/her own or libraries

Q23) What is the difference between a distributed file system and a networked file system?

Distributed file systems abstract many independed storage devices resident on different machines to one file system. The user does not see the detail, especially location where a file is stored, duplication. Networked file system is a network of independent file systems, use knows where a file is stored and details s/he does not need to know

Q24) What are the tradeoffs between user-level threading and kernel-level threading?

In kernel-level threading, the kernel is “knows” that the process contains multiple threads. The benefit of this is that each thread can be independently scheduled. For example, if one thread blocks on I/O, the entire process need not block. Moreover, in a multi-processor, the scheduler can decide to run one thread on CPU1 and simultaneously run another thread on another CPU.

In user-level threading, the kernel is unaware of the distinct threads within a process, and so is not responsible for managing the threads. As a result, you lose both of the benefits listed above. On the other hand, because threads cannot be scheduled independently, there is no need for a full context-switch when one thread leaves the CPU and another thread goes onto the CPU.

Q25) What are the tradeoffs between state-ful networking protocols and state-less networking protocols?

A protocol that is stateless is more suitable for load balancing; since each message carries with it any information needed to satisfy the request, any server in the cluster can satisfy the request. Moreover, stateless protocols can have lower overhead.

Stateful protocols can be more reliable (e.g.: tcp vs udp), at the cost of more overhead. Stateful protocols are usually also session-based, which offers a clear place to implement authentication (at the beginning of the session), so stateful protocols can be more secure (e.g.: https vs http).

Consequently, to some degree, the tradeoff is one of performance vs reliability.

Q26) What is a one-way hash good for?

One way hashing is usually used for storing data that needs to be verified not retrieved, such as passwords. You may also want to one-way hash a message so that the recipient can verify that the message was not altered in transit.

In one way hashing the conversion from A(data) -> B(cypher) is efficient and the B -> A is considered hard or impossible. MD5 is a one way hashing algorithm

wood · November 4, 2003

Great everybody! That was a heck lot of fun to practice. Kudos to nonevent99 for posting and AlbaLed for quickly fillling out the blanks before anybody can finish reading :) ...

Finally I was able to understand Q4. You wouldn't believe that I was trying to calculate using 48 bytes and not 4B !!! Bad thing trying to study after a long and exhaustive day at work with your eyes swelling...

Q4) Suppose an indexed file has records of 512B each, and each record pointer requires 4B. The index is implemented as a B-tree, wherein each node can store up to 512B of pointers to child nodes. If the B-tree contained exactly two levels (including the root), then what is the maximal number of records it can contain? What would the file's size be?

Each record pointer requires 4B, and each B-tree node contains 512B of pointers, so that means there are 512/4 = 128 pointers to children in each B-tree node.

If a B-tree node contains t pointers to children, then it can contain up to t-1 records within the node (interleaved with the pointers). So each node in this tree has 127 records.

In a 2-level tree, you get to have a root and t = 128 children for a total of 129 nodes. Each node contains 127 records, so our 2-level tree can contain up to 127 * 129 = 16383 records.

Each node contains 127 records at 512B each, plus 512B of pointers, for a total of 512*128B = 64KB. We have 129 nodes. So the tree/file uses a total of 129*64 KB, approximately 8GB.

Makes perfect sense.

Q5) Can recursion be used to implement any iterative function? Can iteration be used to implement any recursive function? If not, give counterexamples.

Recursion and iteration are equally powerful, according to Brookshear.

I agree. I have read that somewhere. Iteration could be very difficult (and undesired?) to obtain, but you can always get one.

Q6) What is the value of f(n) in big-O notation?
function f(n) {
 if (n   return f(n/2) + 2*f(n/4);
}
Consider
n   f(n)
1    1
2    3
4    5
8    11
16   21
32   43 
64   85
f(n) is very close to twice f(n/2). So f(n) is O(n).

Could someone come up with an expansion?

Q7) A specification requires the implementation of the following functionality: test-and-set memory, semaphores, monitors, critical regions, and mutexes. Which of these functions is most primitive (ie: cannot be implemented "on top of" the other functions)? Which is the least primitive? What is the hierarchy of primitiveness among these functions?

test-and-set memory can be used to implement mutexes
mutexes can be used to implement semaphores
semaphores can be used to implement critical regions
critical regions can be used to implement monitors

Agreed!

Q21) What is a profiler good for?
Profiler tells you what parts of the program are executed most often. It is very important to pin point where the bottle-neck of the program is when trying to optimize it. Most good profilers also allow the programmer to view what objects currently reside in memory, which is handy for tracking down memory leaks and other memory hogs.

True... you could also use a profiler in order to figure out unecessary steps. change function calls to inline functions/macros, etc.

Good job guys!! Keep it up.

Wood

AlbaLed · November 4, 2003

Dynamic binding makes possible the call of the correct method. For example we have a superclass called Shape (has a method Area())and another whole bunch of subclasses Circle, Square ...., now we create a method

test(Shape a)

print a.Area()

if we pass a Circle, the circle's Area is going to be called

if we pass a Square the square's Area is going to be called

I don;t know how much the above would make sense, is one of those things you know what it is but hard to explain.

Q4) Suppose an indexed file has records of 512B each, and each record pointer requires 4B. The index is implemented as a B-tree, wherein each node can store up to 512B of pointers to child nodes. If the B-tree contained exactly two levels (including the root), then what is the maximal number of records it can contain? What would the file's size be?

Guys, I am missing something in this question.

This is how I understand it. We have a file of records(indexed file), each record is 512B, now we want to index this file (i.e. create an index file), and we want to implement the index as a B-tree, nothing states how big the B-tree nodes are, except how much space for pointers to children in the B-tree. A pointer to the a record in the file (not the nodes in the B-tree) is 4B long. Where am I going wrong ???

wood · November 4, 2003

Here's how I understood nonevent99 explanation: if each node can store up to 512B of pointers, it must have 128 pointers (since each one is 4B). In a B-tree, if a node has 128 pointers, you have therefore 127 records per node. The first sentence mentioned that the file has records of 512B each. So you have 127 * 512B per node. You have 129 nodes (root+128 children).

Wood

nonevent99 · November 4, 2003

Alba--

Is "dynamic binding" the same thing as supporting delegates? Maybe I should put down "inheritance w/ ability to override and extend".

Re Q4, I guess I did a lousy job of writing the question. Wood is right that I meant each pointer from a node to a child node should require 4B of space. The reason is that if you've got a big enough disk drive, and your nodes are scattered all over the drive, then you will need more than a few bits to tell where on the drive the node is located. In this case, the drive's address space is so big that it requires 2^32 bits

This is a pretty large disk even by today's standards. If each block is indeed 64KB large (which is somewhat large for today's drives) and there are 2^32 such blocks, then you get a 256 TB disk. That's honking large.

(A more reasonable problem would probably have only 32B of pointers to child nodes, and more like 2B per pointer, to give 16 pointers to child nodes. That's 15 records per node. At 512B per record, we then get 15*512 + 32 = 7712B per block (still large, but not as ridiculous). In a two-layer tree, we have 1 + 16 = 17 nodes, for a total of 17*15 = 255 records. The file size is 7712B * 17 = 128 KB or so. The drive would be 2^16 blocks, each at 7712 B -- round this up to 8192 because drives don't have 7712 B blocks -- and you get a drive of 2^16 * 8192B = 512 MB, which is now a little on the small side.)

AlbaLed · November 4, 2003

I don't get this

We have

1. Data File

2. Index file

Records in data file are 512 Bytes each

Records in index file containt 512 Bytes for pointers to other nodes (nothing said about how long they are)

A pointer to the records in data file is 4 Bytes

Wood, the way you're explaining it sounds like a B-tree organized file, which is different from indexing.

nonevent99 · November 4, 2003

Alba, could you please expand on the differences between an indexed file and B-tree file?

Maybe we need more practice problems on indexed files! (Particularly since it's explicitly mentioned in the GRE syllabus from ETS.)

AlbaLed · November 4, 2003

Indexed files are basically the same thing you see at "Index" section of a book, what wood said and I am assuming he is understanding you correctly you are implying multilevel (B-tree) file. Index files are easy to maintain when they are small because you can load them in RAM and do whatever, but when they get large there is a need to organize them in levels (B-trees). Why do you want to do that instead of B-tree-ing a file? The primary reason is that you can locate more records faster. Say you're accessing records via a key, call it K which is 4 B long, with K you can uniqely identify approx 4x10^9 records, now it is better to use multilevels on the keys only than on the records which say are 256B. If we use index and B-tree the nodes will contain only data we need (keys + record pointer) to choose and access a record, in the other case nodes contain 'grabage' too. Yeahhh indexes are the way to go !!! Not always, they are only good when you're accessing records via a predefined number of keys, the more keys you need to access a record the more overhead you loose on indexing.

Makes sense???

wood · November 4, 2003

Thanks Alba! Yeah, B-trees are meant as the data file and not index file. Cormen et al initially presents B-tree for such use, and I believe ETS would do the same.

Wood

nonevent99 · November 4, 2003

Ok, I think what you're saying is that we need to know something about the keys to do anything useful with indexed files.

Example: Our N records are big, maybe 256B, but the keys are only 4B each. Moreover, our records are scattered all over the disk drive, and we need (let's say) 2B to indicate where on the drive they are.

Then our index is a map from key to record location. That is, it maps a 10B key to a 2B record location. If we have N such entries, then we need N*(4B + 2B) = 6N Bytes for our index. This, you are saying, conveniently fits into memory in many circumstances (unlike trying to fit all 256*N Bytes of records into memory).

If we order the entries by key, then we can do a binary search on the key to come up with the record location in O(lg N) time, where the O is a bound on the number of memory, not file, operations needed to find the record location. Once we have the record location, then boom, we just seek to the record location and load it into our application.

Is that what you're saying?

If so, then I understand my confusion. I had read about multilevel indexed files, which are when the index is too big to fit into memory. In that case, I suppose we'd just treat the index as a big file, and then index it (sparsely, if keys are unique, or densely with reference lists if they're not) with another index. That's what Silberschatz described in his db book.

AlbaLed · November 4, 2003

Is that what you're saying?

Exactly !!!!

"That is, it maps a 10B key to a 2B record location." Why 10B key, you said keys were 4 bytes ??

If so, then I understand my confusion. I had read about multilevel indexed files, which are when the index is too big to fit into memory. In that case, I suppose we'd just treat the index as a big file, and then index it (sparsely, if keys are unique, or densely with reference lists if they're not) with another index. That's what Silberschatz described in his db book.

Here I think you're going wrong, eventhough it might be approximately the same thing. Multilevel indexed does not mean the index of the data file is indexed again (eventhough it might be possible). I understand it as, the data file is indexed (so we have an index file), and the index file is organized in a multilevel structure, which is a B-tree. If the index file is multilevel then we don't need to have it all in the memory, but we proceed as usual with B-trees. The advantage of having multilevel index file is that we can find a record with less io or same io but much more records.

Example

Pointer for child 1B

Node-size = 1024B (for data) + 256 B-pointers

Record size = 256 B

Key size = 4 Bytes

Record Ptr = 2B

with 3 level B-tree data file (i.e. 3 disk accesses)

Records/node = 1024/256

Level 1: Records = 4, children 5

Level 2: Records = 4*5, children 5*5

Level 3: Records = 4*5*5, children 0

Total record = 4 + 4*5 + 4*5*5 = 124 records

with 2 level B-tree indexed file (i.e. 3 disk access, 2 in index + 1 record)

Records/node = 1024/6 (6 = key size + rec. ptr. size)

Level 1: Records = 170, children 171

Level 2: Records = 170*170, children 0

Total record =170 + 170*171 = 29241 records

Same I/O, same block size we can address more records. Does this make sense ??

nonevent99 · November 4, 2003

Thanks so much Alba. It makes sense, now.

Originally posted by AlbaLed

Is that what you're saying?

Exactly !!!!
"That is, it maps a 10B key to a 2B record location." Why 10B key, you said keys were 4 bytes ??

Yeah, typo.

Here I think you're going wrong, eventhough it might be approximately the same thing. Multilevel indexed does not mean the index of the data file is indexed again (eventhough it might be possible). I understand it as, the data file is indexed (so we have an index file), and the index file is organized in a multilevel structure, which is a B-tree. If the index file is multilevel then we don't need to have it all in the memory, but we proceed as usual with B-trees. The advantage of having multilevel index file is that we can find a record with less io or same io but much more records.

Example

Pointer for child 1B
Node-size = 1024B (for data) + 256 B-pointers
Record size = 256 B
Key size = 4 Bytes
Record Ptr = 2B

with 3 level B-tree data file (i.e. 3 disk accesses)
Records/node = 1024/256 = 4 records per node
Level 1: Records = 4, children 5
Level 2: Records = 4*5, children 5*5
Level 3: Records = 4*5*5, children 0

Total record = 4 + 4*5 + 4*5*5 = 124 records

So there's 1024B of data in each node, plus 5 children pointers * 1B per child pointer = 1029B per node. But I guess you indicated that you want to set aside space for "256 B-pointers" even though only 5 will actually be used ? That means 1024B+256B=1280B per node? Times 31 nodes = 39680B file

with 2 level B-tree indexed file (i.e. 3 disk access, 2 in index + 1 record)
Records/node = 1024/6 (6 = key size + rec. ptr. size) = 170.x
Level 1: Records = 170, children 171
Level 2: Records = 170*170, children 0 170*171, right?

Total record =170 + 170*171 = 29241 records 29240

We have 1024B of data plus the obligatory 256B for child pointers (tho we only use 171B), so still 1280B per node. We have 1 root node and 171 child nodes, 172*1280B = 220160B, plus the actual file contents.

Same I/O, same block size we can address more records. Does this make sense ??

[green]It does make sense. We have the same block size, but each record is only a key and a record pointer, so we can fit many more records in each node. That means there are many more children per node. Thus, in a certain number of levels, we can reference more records, and thereby provide efficient access to a substantially larger file.

Thank you so much, Alba.

To generalize,

Suppose the block is b bytes (of data, ignoring bytes for child pointers), the actual data record is r, the key plus record pointer is k, and the depth of the pure B-tree is d (the indexed tree is d-1, as you showed above, because we need to "waste" a file access actually reading the record off disk after figuring out its location).

Then the pure B-tree can store up to b/r records per node, so the branching factor is b/r+1. A tree with d levels can therefore store

[(b/r +1)^(d) -1]/(b/r)

nodes using the usual summation of E(i=0,i=N, x^i). Multiplying by b/r to convert to records, there must be

(b/r + 1)^(d) - 1 records.

In contrast, using your indexing scheme with a B-tree on the index, we can fit b/k key-location pairs in each node, leading to branching factor of b/k+1. A tree with d-1 levels can therefore store (treating b/k as floor(b/k))

[(b/k +1)^(d-1) - 1]/(b/k)

which is (b/k + 1)^(d-1)-1 records.

The ratio of the two is the interesting quantity. It is

(b/k+1)^(d-1) - 1 divided by (b/r+1)^d -1

approx equal to

(b/k)^(d-1) divided by (b/r)^d

which equals (r/k)^d divided by (b/k)

driller for software systems

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Join the conversation