Although the set of notes you have requested is presented below, it has not been maintained since January 2003. All of the information in these notes has been included in an on-line text titled Computer Organization and Design Fundamentals. The book is available in three formats, two of which are free electronic downloads.
Thank you for your interest in this textbook. Please feel free to e-mail me at tarnoff etsu.edu if you have any questions or comments.
-Dave Tarnoff
As we discussed in the last section, a cache memory is a very fast RAM that is placed between the processor and the main memory (the large DRAM provided by the DIMMs you installed on the motherboard).
It speeds up the processor by providing quicker access to the most recently used data or the most frequently used data.
Remember that the address decoder circuit slows the operation of retrieving data from memory. The larger the memory is, the more complex the address decoder circuit. The more complex the address decoder circuit is, the longer it takes to select a memory location based on the address it received. Therefore, making a memory smaller makes it faster.
It is possible to have multiple layers of caches too. Sometimes a very small (and very fast) cache is added between the processor and the first cache. This new cache is contained inside of the processor. By placing the new cache inside the processor, the wires that connect the two become shorter and the circuitry becomes more closely integrated. Both of these, along with the smaller decoder circuit, result in even faster data access. When two caches are present, the one inside the processor is referred to as a level 1 or L1 cache while the one between the L1 cache and memory is referred to as a level 2 or L2 cache.
The success of a cache is based on a principle referred to as locality of reference. Due to the nature of programming, instructions and data tend to cluster together. This is due to constructs of the language such as loops and subroutines. Over a short period of time, a cluster will execute over and over again. Over a long period of time, clusters will change moving in and out of focus. As long as the processor is accessing a cluster that is contained in the cache, the overall performance of the processor will improve.
Since caches owe their success to the principle of locality of reference, it follows that when data is loaded into the cache, the memory values close to it should be loaded too. For example, when a program enters a loop, the sequence of instructions that makes up the loop is executed over and over. Therefore, when we load the first instruction of the loop into the cache, it might save some time if we load some of the instructions around it too. That way the processor won't have to go back to main memory for subsequent instructions.
Because of this, caches are typically organized so that when one piece of data is loaded, the block of neighboring data is loaded too. Therefore, caches are organized so that they can store blocks of data.
Each block of data is identified with a number called a tag that can be used to figure out what the original main memory addresses of the data were before they were loaded into the cache. This way, when the processor is looking for a piece of data or code (hereafter referred to as a word), it only needs to look at the tags to see if the word is contained in the cache.
The tags and the blocks of words are combined in the cache as rows, each row pairing a tag with its block of words.
Remember that each tag uniquely identifies the block of words in its row so that the processor knows what the original addresses of the data and code were.
The combination of a tag and the block of words associated with that tag is referred to as a line.
When the processor needs a word from memory, the first place it checks is the cache. The circuits used to access data from main memory may also be activated in parallel just in case the data is not found in the cache, an event often referred to as a "miss". If the search in the cache is successful, the processor simply uses the cache's word and disregards the results coming back from main memory.
If, however, the cache did not contain the requested word, then one of two operations will happen depending on the processor's design: either the word is passed to the processor as soon as it arrives from main memory while the block containing it is copied into the cache, or the entire block is loaded into the cache first and the word is then read from the cache.
The typical memory simply stores all of its words in sequential addresses. Its interaction with the cache, however, dictates that a block is defined using the addresses going to main memory. Note that this has no effect on the operation or storage of words in main memory; it is only a method for using the main memory addresses to organize words within the cache.
The full main memory address defines a specific memory location within memory. For example, a unique twenty-bit address such as 3E9D1₁₆ = 0011 1110 1001 1101 0001₂ points to exactly one memory location within the 1 Meg memory space.
If we "hide" the last bit of the address, i.e., it could be a one or a zero, than the resulting address could refer to one of two possible locations, 3E9D116 (0011 1110 1001 1101 00012) or 3E9D016 (0011 1110 1001 1101 00002). If we "hide" the last two bits, then the last two bits could be 00, 01, 10, or 11. Therefore, the address could be referring to one of 4 possible sequential locations:
3E9D0₁₆ = 0011 1110 1001 1101 0000₂
3E9D1₁₆ = 0011 1110 1001 1101 0001₂
3E9D2₁₆ = 0011 1110 1001 1101 0010₂
3E9D3₁₆ = 0011 1110 1001 1101 0011₂
This is how a block is defined. By removing a small group of bits at the end of an address, the resulting block identifier points to a group of memory locations. Every additional bit that is removed doubles the size of the group. This group of memory locations is referred to as a block.
The number of words in a block is defined by the number of bits removed from the end of the address to create the block identifier. For example, when one bit is removed, a block contains two memory locations. When two bits are removed, a block contains four memory locations. In the end, the size of a block, K, is defined by:
K = 2^w, where w represents the number of bits "removed"
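As a quick illustration, here is a minimal Python sketch of this idea; the function name block_of and the loop are ours and are not part of any particular architecture.

# Form a block identifier by "removing" the last w bits of an address.
def block_of(address, w):
    """Return (block identifier, word offset within the block)."""
    return address >> w, address & ((1 << w) - 1)

# With w = 2, a block holds K = 2^2 = 4 words, so the four addresses
# 3E9D0 through 3E9D3 above share a single block identifier.
for addr in (0x3E9D0, 0x3E9D1, 0x3E9D2, 0x3E9D3):
    block, word = block_of(addr, 2)
    print(f"{addr:05X}: block {block:05X}, word {word:02b}")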
Remember from our discussion on the organization of a cache that words are stored in a cache as a block, and that the combination of a block with its tag defines a line. Therefore, the length of a line in a cache is the size of the tag (which, depending on how data is stored in the cache, may be the same size as the block identifier) plus the number of words in a block multiplied by the word length.
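For example (with figures chosen purely for illustration), a cache with a 5-bit tag, 4-word blocks, and 8-bit words would have lines that are 5 + (4 × 8) = 37 bits long.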
For this simple discussion on caches, only a few of the characteristics of RAM caches will be discussed in detail. The majority of the characteristics will only be described briefly.
The cache system used by a processor is defined by six traits: the size of the cache, the size of a block, the mapping function used to assign memory blocks to cache lines, the block replacement algorithm, the write policy, and the number of caches.
As far as the size of the cache is concerned, there is a balancing act that needs to be performed to come up with the best cache for the system. First of all, larger caches are more expensive. When it comes to performance, however, larger caches are not necessarily better.
The larger a cache is, the more likely it is that the processor will find a piece of recently used data in the cache. The problem is that as a cache gets larger, the address decoding circuits also get larger and therefore slower. In addition, more complicated logic is required to search a cache because of the seemingly random way that data is stored in it.
There is also a relationship between the size of a block and the performance of the cache. As the block size goes up, the possibility of getting a successful "hit" when looking for data could go up due to more words being available. For a fixed cache size, however, as the block size increases, the number of blocks that can be stored in a cache goes down, which can cause the number of hits to go down. A typical block size is fairly small, between 4 and 8 words.
As was mentioned earlier, a processor system might have two caches: one located in the processor (level 1) and one on the motherboard (level 2). There is a second reason why two caches might appear in a system.
In some cases, a processor will use one cache to store code/instructions only and a second cache to store only data. This is used in architectures where the processor needs to keep instructions separate from data because it treats the two kinds of values differently. This method of cache organization is called a split cache.
There are three main methods of determining where to store blocks in the cache so that the processor can quickly find them when it is looking for a word: direct, fully associative, and set associative. A detailed description of all of them is beyond the scope of this discussion, but we will introduce the simplest of the three methods, direct.
Assume main memory is divided up into n blocks and the cache has room to contain exactly m blocks. m is much smaller than n because the cache is much smaller than main memory.
If we divide m into n, we should get an integer which represents the number of times that the main memory could fill the cache with different blocks from its contents. For example, if main memory is 128 Meg (2^27), the cache is 256 K (2^18), and a block size is four words (2^2), then the main memory can contain n = 2^(27-2) = 2^25 blocks and the cache can contain m = 2^(18-2) = 2^16 blocks. Therefore, the main memory could fill the cache n/m = 2^25/2^16 = 2^9 = 512 times!
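This arithmetic is easy to verify; here is a quick Python check using the example's sizes (the variable names are just ours):

# Reproducing the example: 128 Meg main memory, 256 K cache, 4-word blocks.
main_memory_words = 2**27   # 128 Meg
cache_words       = 2**18   # 256 K
words_per_block   = 2**2    # 4 words per block

n = main_memory_words // words_per_block   # blocks in main memory
m = cache_words // words_per_block         # blocks the cache can hold
print(n, m, n // m)                        # 33554432 65536 512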
Okay, if you didn't keep up with that discussion, just understand this: the memory is much larger than a cache, so each line in the cache is responsible for storing one of many blocks from main memory. In the case of our example above, each line of the cache is responsible for storing one of 512 different blocks from main memory.
Direct mapping assigns each memory block in main memory to a specific line in the cache. If a line is already taken up by a memory block when a new block needs to be loaded, the old block is trashed. As a result, many different blocks of main memory map to the same line in the cache.
As for how this is implemented, remember that a group of bits from the least significant end of an address represent the words in a block. A second group of bits is taken from the least significant end of the remaining bits. This provides the line number from within the cache where the block is to be stored.
For example, if a cache had 2^10 = 1024 lines, then a line would need 10 bits to be uniquely identified. Therefore, the next ten bits of the address would indicate where in the cache the data is to be stored. The remaining bits will be the tag.
tag | bits identifying line in cache | word id bits
Breakdown of a Memory Address into its Components
Once the block is stored in the line of the cache, the tag is copied to the tag location of the line.
In summary, direct mapping breaks an address into three parts: tag bits, line bits, and word bits. The word bits are the least significant bits, and they identify the specific word within a block of memory. The line bits are the next least significant, and they identify the line in the cache where the block will be stored. The remaining most significant bits are stored with the block as the tag and identify where the block of memory is located in main memory.
Assume a cache system has been designed such that a block contains 4 words and the cache has 512 lines (i.e., it can hold up to 512 blocks). What line of the cache is supposed to hold the block that contains the word from the sixteen-bit address 3456₁₆? In addition, what is the tag number that will be stored with the block?
Answer: To begin with, we need to divide the address into the word bits, the line bits, and the tag bits. Since 4 = 2^2, the least significant two bits identify the word within the block. Since the cache has 512 = 2^9 lines, the next 9 bits identify the line number where the data is supposed to be stored in the cache. The remaining bits are the tag bits.
By the way, since the address is sixteen bits long, that leaves 16 - 9 - 2 = 5 bits for the tag:
5 tag bits | 9 bits identifying line in cache | 2 word id bits
Converting the address to binary gives us 3456₁₆ = 0011 0100 0101 0110₂. That means that the address breaks into the following components:
5 tag bits | 9 bits identifying line in cache | 2 word id bits
00110 | 100010101 | 10
Therefore, out of the 512 lines of the cache, this block will be stored at line 100010101₂ = 277₁₀. The tag that will be associated with this block is 00110₂.
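We can check this result with a short Python sketch of the breakdown just described (the function name split_address is ours):

def split_address(address, line_bits, word_bits):
    """Break an address into its (tag, line, word) fields."""
    word = address & ((1 << word_bits) - 1)
    line = (address >> word_bits) & ((1 << line_bits) - 1)
    tag  = address >> (word_bits + line_bits)
    return tag, line, word

# Exercise values: 512 lines (9 line bits), 4-word blocks (2 word bits).
tag, line, word = split_address(0x3456, line_bits=9, word_bits=2)
print(f"tag={tag:05b} line={line} word={word:02b}")
# prints: tag=00110 line=277 word=10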
The first 10 lines of a 256 line cache are shown in the table below. Identify the address of the data item that contains d8. For this cache, a block contains 4 words. (Note: The tags are given in binary in the table.)
Line # | Tag    | word 00 | word 01 | word 10 | word 11
0      | 110101 | 12      | 34      | 56      | 78
1      | 010101 | 54      | 32      | 6a      | d3
2      | 000111 | 29      | 8c      | ed      | f3
3      | 001100 | 33      | a2      | 2c      | c8
4      | 110011 | 9a      | bc      | d8      | f0
5      | 001101 | 33      | 44      | 55      | 66
6      | 010100 | 92      | 84      | 76      | 68
7      | 000100 | fe      | ed      | 00      | ed
8      | 100000 | 00      | 11      | 22      | 33
9      | 101000 | 99      | 88      | 77      | 66
Answer: First, we need to do the same thing that we did in the first problem: find the number of bits that represent each part (word, line #, and tag) of the address. From the table, we can see that 2 bits represent the word and 6 bits represent the tag. But what about the line number?
Since the cache has 256 = 2^8 lines, 8 bits represent the line number in the cache. Therefore, the address is divided up as follows:
6 tag bits | 8 bits identifying line in cache | 2 word id bits
The cell containing d8 is in line 4 of the table, and that line has a tag number of 110011. The line number 4 in 8 bits of binary is 00000100₂. Last of all, d8 is in the third word column, which means that its word bits are 10₂. (Remember to start counting from 0.) Putting all of this information together gives us:
6 tag bits | 8 bits identifying line in cache | 2 word id bits
110011 | 00000100 | 10
Therefore, the address represented by the cell containing d8 is 1100 1100 0001 0010₂ = CC12₁₆.
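Reassembling an address from its pieces is just the reverse shift-and-combine; here is a small Python sketch (rebuild_address is our own name) using this exercise's field widths:

def rebuild_address(tag, line, word, line_bits=8, word_bits=2):
    """Combine (tag, line, word) fields back into a full address."""
    return (tag << (line_bits + word_bits)) | (line << word_bits) | word

addr = rebuild_address(tag=0b110011, line=4, word=0b10)
print(f"{addr:04X}")   # prints: CC12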
Using the table from Exercise 3, retrieve the data at address 101C₁₆.
Answer: First convert the hexadecimal address to binary.
101C₁₆ = 0001 0000 0001 1100₂
Next, using the breakdown of bits for the tag, line number, and word, break down the binary address.
6 tag bits | 8 bits identifying line in cache | 2 word id bits
000100 | 00000111 | 00
From this we see that the tag is 000100, the line number is 00000111₂ = 7₁₀, and the word number is 00₂. Looking at line number 7 we see that the tag is in fact 000100. (Note: If the tag number did not match the tag we calculated, then we know that the cache did not contain our address!) From line number 7, find the word represented by 00 and we get fe.
Using the table from Exercise 3, determine if the data from address 9827₁₆ is in the cache.
Answer: First convert the hexadecimal address to binary.
9827₁₆ = 1001 1000 0010 0111₂
Next, using the breakdown of bits for the tag, line number, and word, break down the binary address.
6 tag bits | 8 bits identifying line in cache | 2 word id bits
100110 | 00001001 | 11
From this we see that the tag is 100110, the line number is 00001001₂ = 9₁₀, and the word number is 11₂. Looking at line number 9 we see that the tag is equal to 101000 which does not equal 100110. Therefore, the data from that address is not contained in this cache and we will have to get it from the main memory.
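The hit/miss check in the last two exercises can be summarized in a few lines of Python. The sketch below models only the ten lines of the exercise table as (tag, block data) pairs; all names are ours, not part of any real cache controller.

LINE_BITS, WORD_BITS = 8, 2

# First ten lines of the exercise's 256-line cache: (tag, block data).
cache = [
    (0b110101, ["12", "34", "56", "78"]),
    (0b010101, ["54", "32", "6a", "d3"]),
    (0b000111, ["29", "8c", "ed", "f3"]),
    (0b001100, ["33", "a2", "2c", "c8"]),
    (0b110011, ["9a", "bc", "d8", "f0"]),
    (0b001101, ["33", "44", "55", "66"]),
    (0b010100, ["92", "84", "76", "68"]),
    (0b000100, ["fe", "ed", "00", "ed"]),
    (0b100000, ["00", "11", "22", "33"]),
    (0b101000, ["99", "88", "77", "66"]),
]

def lookup(address):
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag  = address >> (WORD_BITS + LINE_BITS)
    stored_tag, words = cache[line]   # only lines 0-9 are modeled here
    if stored_tag == tag:
        return words[word]            # hit: the word is in the cache
    return None                       # miss: go to main memory

print(lookup(0x101C))   # prints: fe   (hit at line 7)
print(lookup(0x9827))   # prints: None (tags differ at line 9)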
Created by David L. Tarnoff for exclusive use by sections of CSCI 2150 at ETSU.