CPU pipelining

Hi guys,

So it's been a while since I was at college. In our first year we learned about computer architecture, and admittedly I didn't pay much attention in that class; in fact, I barely showed up and only scraped a pass. Unfortunately for me, comp arch is arguably the most foundational and important subject in a computer science degree. So I decided to pick up a few books, not only to revive the little knowledge I gathered in my first-year comp arch classes but also to learn in depth how computers actually work.

I picked up three books: "The Elements of Computing Systems" (the Nand to Tetris book), "Code" by Charles Petzold, and finally "But How Do It Know?".

I just finished the first of the three, "But How Do It Know?". I'll start off by saying I enjoyed it and found it a relatively light and easy read. It does lack a bit of depth, but it goes into enough detail to give a decent understanding of how a simple processor functions. I'm not going to detail everything I learned in the book, but I'll give a quick summary, which will lead on to my question about CPU pipelining.

As mentioned, the book lacked a little depth and unfortunately left me with some questions. The author decided to keep it simple and use a one-byte (8-bit) CPU. The book's CPU consisted of an ALU, a control unit, flags and registers. The computer in the book also had main memory to store instructions, and a bus to transfer data between the CPU and memory in both directions.

This is how the computer in the book functioned. The instructions, which were one byte long, were stored in main memory, which was also addressed with only one byte (256 locations, so we can only run very small programs). The CPU first reads in the first instruction. The Instruction Address Register (IAR) starts off with the value 0, which means we want to fetch address 0 from memory (the start of the program). To do this, we turn on the IAR's "enable" line, putting its value onto the bus, and turn on the "set" line of the MAR (Memory Address Register), thus setting the MAR to 0. Since that value is already on the bus, it is also the ALU's first input (input A, with the value 0). We also turn on "Bus 1", which forces the ALU's second input to 1, so the ALU adds 0 + 1. We then turn on the Accumulator's set line, so the Accumulator now holds the result of the ALU's addition: the value 1.

On the next clock cycle, the MAR holds the value 0, so we turn on the set wire of the IR (Instruction Register). We also turn on the "enable" wire of RAM, which puts the contents of address 0 onto the bus, and hence the contents of address 0 are now in the IR.

On the next clock cycle, we turn on the "enable" wire of the Accumulator and the set wire of the IAR, copying the Accumulator's value (1) into the IAR and so effectively incrementing the IAR by 1.
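To make the single-bus constraint concrete, here is a minimal Python sketch of those three fetch steps. It is only a model under my own assumptions: registers are plain integers, the names `fetch`, `ram` and `bus_log` are made up for illustration, and RAM is assumed to have 256 one-byte locations. What it shows is that each step drives exactly one value onto the bus.

```python
# Hypothetical model of the book's three fetch steps on a single 8-bit bus.
# Register names follow the post (IAR, MAR, IR, ACC); the function and
# variable names are illustrative assumptions, not from the book.

def fetch(ram, iar):
    """Run fetch steps 1-3; return (iar, ir, acc, bus_log)."""
    bus_log = []  # the single value driven onto the bus in each step

    # Step 1: IAR enable puts IAR on the bus; MAR set latches it.
    # Bus 1 forces the ALU's second input to 1, the ALU adds,
    # and ACC set latches IAR + 1.
    bus = iar
    mar = bus
    acc = (bus + 1) & 0xFF  # keep the result to 8 bits
    bus_log.append(bus)

    # Step 2: RAM enable puts the byte at address MAR on the bus; IR set.
    bus = ram[mar]
    ir = bus
    bus_log.append(bus)

    # Step 3: ACC enable puts IAR + 1 on the bus; IAR set latches it.
    bus = acc
    iar = bus
    bus_log.append(bus)

    return iar, ir, acc, bus_log


if __name__ == "__main__":
    ram = [0] * 256          # 256 one-byte locations (8-bit addresses)
    ram[0] = 0b1000_0001     # arbitrary example instruction byte
    iar, ir, acc, log = fetch(ram, 0)
    print(iar, bin(ir), acc, log)  # 1 0b10000001 1 [0, 129, 1]
```

Three clock cycles pass before the instruction is even in the IR, and in each of them the one bus carries a single byte.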

So now that we have our instruction in the IR, we can execute it. The IR is wired so that if bit 0 (the leftmost bit) of the IR is 1, we are going to execute an ALU instruction; if bit 0 is 0, we may have a LOAD or STORE (move to RAM) instruction or something similar. The ALU takes 3 bits to select the type of operation it will do, such as binary addition. For an ALU instruction such as ADD regA,regB, bit 0 is 1 as mentioned, bits 1, 2 and 3 select the operation, bits 4 and 5 select the first register, and bits 6 and 7 select the second register. The 3 ALU operation inputs are each wired to an AND gate, one per operation bit. The first input to each AND gate is bit 0 of the IR, which is 1 for an ALU instruction. The second input comes from the stepper, which makes sure the ALU operation happens on the correct clock cycle. The third input comes from IR bit 1, 2 or 3 respectively, the bits that encode the ALU operation.
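The bit layout and the AND-gate gating described above can be sketched in Python. This is an illustration only: `bit` and `decode_alu` are hypothetical names, bit 0 is taken to be the leftmost (most significant) bit, and `step_active` stands in for the stepper output that enables the ALU operation.

```python
# Sketch of decoding a one-byte ALU instruction with the layout described:
# bit 0 (leftmost) = ALU flag, bits 1-3 = operation,
# bits 4-5 = first register, bits 6-7 = second register.
# Function and field names are hypothetical.

def bit(byte, i):
    """Return IR bit i, counting the leftmost (most significant) bit as 0."""
    return (byte >> (7 - i)) & 1


def decode_alu(ir, step_active):
    """Return (op_lines, reg_a, reg_b) for an instruction byte.

    op_lines mimics the three AND gates: each ALU operation line is
    IR bit 0 AND the stepper signal AND one of IR bits 1-3.
    """
    is_alu = bit(ir, 0)
    op_lines = tuple(is_alu & step_active & bit(ir, i) for i in (1, 2, 3))
    reg_a = (ir >> 2) & 0b11  # bits 4-5
    reg_b = ir & 0b11         # bits 6-7
    return op_lines, reg_a, reg_b


if __name__ == "__main__":
    print(decode_alu(0b1_011_01_10, 1))  # ((0, 1, 1), 1, 2)
    print(decode_alu(0b1_011_01_10, 0))  # ((0, 0, 0), 1, 2)
```

With `step_active` at 0 all three operation lines stay off even though the IR holds an ALU instruction, which is how the stepper keeps the operation on its proper cycle.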

So if, on the correct clock cycle, an ALU instruction is selected, then bits 1, 2 and 3 of the IR select the correct ALU operation. All of this happens in sync, which is done using a clock in the CPU. In the book, the clock is just a NOT gate with its output fed back into its input. The clock signal is then fed to a "clock enable" circuit and a "clock set" circuit. I mentioned earlier that each register has an enable wire and a set wire. The enable wire is connected to clock e and the set wire of each register is connected to clock s. With the help of some gates, clock s ensures the set line is on for a shorter time than the enable wires.
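One way those gates can produce the two signals is sketched below in Python. The construction is an assumption on my part (my understanding of the book's design): a delayed copy of the clock, `clk_d`, is combined so that clock e = clk OR clk_d and clock s = clk AND clk_d. Time is sampled four times per cycle, and the function name `clock_signals` is hypothetical.

```python
# Sketch of deriving "clock enable" and "clock set" from one clock signal.
# Assumption: clk_d is clk delayed by a quarter cycle, and
# clk_e = clk OR clk_d, clk_s = clk AND clk_d.

def clock_signals(cycles):
    """Return lists (clk, clk_d, clk_e, clk_s) sampled 4 times per cycle."""
    clk = [1 if (t % 4) < 2 else 0 for t in range(4 * cycles)]
    clk_d = [0] + clk[:-1]  # clk delayed by one quarter-cycle sample
    clk_e = [a | b for a, b in zip(clk, clk_d)]
    clk_s = [a & b for a, b in zip(clk, clk_d)]
    return clk, clk_d, clk_e, clk_s


if __name__ == "__main__":
    names = ("clk", "clk d", "clk e", "clk s")
    for name, wave in zip(names, clock_signals(2)):
        print(f"{name:5} " + "".join("#" if v else "_" for v in wave))
```

In the printed waveforms clock s is high for only one sample, in the middle of each clock e pulse, so a register's set wire is only pulsed while the enabled value is already stable on the bus.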

All of the steps I detailed above are driven by a component called a stepper. As its name suggests, the stepper makes sure the CPU performs the steps of each instruction in order. Each step lasts one whole clock cycle. The example CPU in the book has 7 steps; step 7 just resets the stepper back to step 1.
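The stepper's behaviour can be modelled as a Python generator that yields the active step once per clock cycle. This is a behavioural sketch only (the name `stepper` is hypothetical and no gates are modelled): steps 1 to 6 do work, and reaching step 7 immediately resets to step 1.

```python
from itertools import islice


def stepper():
    """Yield the active step once per clock cycle: 1..6, then 7 resets to 1."""
    step = 1
    while True:
        if step == 7:
            step = 1  # step 7 does no work; it only resets the stepper
        yield step
        step += 1


if __name__ == "__main__":
    print(list(islice(stepper(), 8)))  # [1, 2, 3, 4, 5, 6, 1, 2]
```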

OK, so that is a lot to read, and if you read all of that you have admirable patience. This is an overly simplified processor, but now to my question about pipelining. The book never mentions pipelining, so how would it work for my simple processor? The book's computer has only one bus, so only one byte of data can be on the bus during each step (clock cycle). Is this true of a real CPU? If not, I'm guessing this has something to do with pipelining. Maybe the circuitry in a real CPU can put multiple values on the bus at the same time, or maybe there are more buses?

Since only one byte can be on the bus in each clock cycle, how can pipelining work? Wouldn't it mean that multiple values have to be on the bus at once to do a fetch, decode and execute within one clock cycle?
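A rough way to picture pipelining: each stage is assumed to have its own hardware and data paths (with registers between stages), so in any cycle different instructions occupy different stages instead of competing for one bus. The Python sketch below only prints a timing table under that assumption. The stage names are the generic textbook fetch/decode/execute, `pipeline_table` is a hypothetical helper, and it ignores hazards such as two stages needing the same bus or memory, which on the book's one-bus design would block every overlap.

```python
# Sketch of a three-stage pipeline timing table. Assumes each stage has its
# own hardware, so one instruction can be fetched while the previous one is
# decoded and the one before that is executed. Hazards are ignored.

STAGES = ("F", "D", "E")  # fetch, decode, execute


def pipeline_table(n_instr, pipelined=True):
    """Return one dict per clock cycle mapping instruction index -> stage."""
    k = len(STAGES)
    total = n_instr + k - 1 if pipelined else n_instr * k
    rows = []
    for cycle in range(total):
        row = {}
        for i in range(n_instr):
            # Pipelined instructions start one cycle apart; unpipelined ones
            # wait until the previous instruction has finished all stages.
            start = i if pipelined else i * k
            s = cycle - start
            if 0 <= s < k:
                row[i] = STAGES[s]
        rows.append(row)
    return rows


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_table(4)):
        busy = "  ".join(f"I{i}:{st}" for i, st in row.items())
        print(f"cycle {cycle + 1}: {busy}")
    print(len(pipeline_table(4)), "cycles pipelined vs",
          len(pipeline_table(4, pipelined=False)), "unpipelined")
```

For 4 instructions this reports 6 cycles pipelined versus 12 unpipelined; in general, n instructions through k ideal stages take n + k - 1 cycles instead of n * k.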

I also hear a lot about the fetch-decode-execute cycle when it comes to CPUs, but the book doesn't mention it. How would it relate to my simple CPU?
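In terms of the design summarised above, one reading (my assumption, not wording taken from the book) is: steps 1 to 3 are the same for every instruction and form the fetch; decode is not a separate step but the combinational wiring from the IR bits to the control gates; and steps 4 to 6 are the execute phase. The Python mapping below just encodes that reading; `PHASES` and `phase_of` are hypothetical names.

```python
# Hypothetical mapping of the stepper's working steps onto fetch/execute.
# Assumption: decode happens continuously through the IR's wiring, so it
# has no step of its own.

PHASES = {1: "fetch", 2: "fetch", 3: "fetch",
          4: "execute", 5: "execute", 6: "execute"}


def phase_of(step):
    """Return the phase for a stepper step (1-6)."""
    if step not in PHASES:
        raise ValueError(f"stepper step must be 1-6, got {step}")
    return PHASES[step]


if __name__ == "__main__":
    for step in range(1, 7):
        print(step, phase_of(step))
```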

Thanks for sticking with me, and if you read all that mumbo jumbo, hats off to you.
This is an overly simplified processor

Not only is it simplified, it is ancient. What ISA is it designed for? The computer model is obsolete.
but now to my question that relates to pipelining
It doesn't. This CPU, as you describe it, does not support pipelining.
The book only has one bus, so only one byte of data can be on the bus through each step(clock cycle) is this true in a real CPU??
No. Modern CDBs (common data buses) are very wide.
Since on each clock cycle only one byte can be on the bus how can pipelining work? because wouldn't this mean multiple values would have to be on the bus to do a fetch,decode and execute with one clock cycle?

The bus can be pipelined.

Some intro books on computer systems start simply by introducing a hypothetical machine with a vastly simplified architecture to explain the basics of how things work. This gives a grounding that can then be expanded to consider more complex architectures once the basics have been mastered. It sounds like this book is one such intro book, one that doesn't go very far past the simplistic basics. However, it sounds like it's too simplistic, as I would have expected the fetch/decode/execute cycle to be discussed in terms of this hypothetical machine.