The Microprocessor Behind The Personal Computer
Very basically, a microprocessor combines the functions of a CPU (Central
Processing Unit) within one chip. It includes an ALU (Arithmetic Logic Unit),
internal registers and a CU (Control Unit) for sequencing the system. The
processor has three buses: a bi-directional data bus, a uni-directional address
bus and a control bus. The data bus carries data between the various components
of the system, typically between memory and the processor or an input/output
controller. The address bus carries an address generated by the processor, which
selects one internal register within one of the chips attached to the system and
so specifies the source or destination of the data carried on the data bus. The
control bus carries various synchronization signals. The processor also needs a
clock to provide a precise timing reference for the system. The 8086 processor
model is still intact in the latest Pentium processors. That design includes a
Bus Unit, an ALU, an Execution Unit (EU) and an instruction queue. Later Pentium
designs add cache, a page unit, a Floating Point Unit, a branch target buffer
and RISC (Reduced Instruction Set Computer) concepts in the execution unit.
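The way an address on the address bus picks out one register in one chip can be sketched in a few lines of Python. This is only an illustration: the memory map, chip names and address ranges below are invented for the example and do not describe any real PC.

```python
# Hypothetical memory map: names and ranges are made up for illustration.
MEMORY_MAP = [
    ("ROM", 0x0000, 0x3FFF),
    ("RAM", 0x4000, 0xBFFF),
    ("I/O controller", 0xC000, 0xC00F),
]

def decode(address):
    """Return (chip name, register offset inside that chip) for an address."""
    for name, start, end in MEMORY_MAP:
        if start <= address <= end:
            return name, address - start
    raise ValueError(f"no chip responds to address {address:#06x}")

print(decode(0x4002))   # ('RAM', 2)
print(decode(0xC00A))   # ('I/O controller', 10)
```

The upper part of the address effectively chooses the chip, and the remainder chooses a location inside it; the data bus then carries the value to or from that location.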
To understand the PC's capabilities and performance, a brief history of the
microprocessors follows.
Intel Microprocessors
Intel had introduced the 8086 processor three years before the announcement of
the IBM PC. However, because of the cost of designing the personal computer
around this microprocessor, IBM chose the 8088 microprocessor, also released by
Intel. The 8088 has a 16 bit internal bus but only an 8 bit external bus, which
made it easier to use the standard 8 bit peripheral chips that were already
around and allowed a smaller entry level of system memory. Using the 8088 also
paved the way for easy migration to the 8086 and 286 microprocessors in later
machines. The 8088 accomplished 1 MB addressing by using a technique known as
segmentation, a two step process for addressing memory. First a segment register
was loaded with a pointer to a 64 KB block of memory, then normal 16 bit
registers could address data within that 64 KB segment. The segment register
needed to be reloaded to access memory outside that 64 KB. (It was not until the
80386 processor that a memory addressing mode allowed full linear mapping.)
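As a rough sketch of the segmentation arithmetic: in 8086/8088 real mode the 16 bit segment value is shifted left by four bits and a 16 bit offset is added, giving a 20 bit (1 MB) address. The function name below is made up for the example.

```python
def physical_address(segment, offset):
    """Combine a 16 bit segment and a 16 bit offset into a 20 bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must each fit in 16 bits")
    # Shift the segment left by 4 bits (multiply by 16), add the offset,
    # and wrap at 1 MB because there are only 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF

print(hex(physical_address(0xB800, 0x0000)))  # 0xb8000
print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
```

Because the offset is only 16 bits, one segment value reaches 64 KB; anything further away needs a new value in the segment register.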
The PC AT was announced by IBM in 1984 and used the 16 bit Intel 286
processor, which supported 16 bit bus transfers, 24 bit memory addressing and
protected mode memory management. Protected mode allows programs to be written
so that one portion of the system cannot affect another, which is one
requirement of multitasking. Increasing the expansion bus to 16 bits while
remaining backward compatible allowed existing expansion cards to work in the
new architecture. All PC designs still support the 16 bit (AT) bus, so cards
that were designed for the original IBM PC should still work properly in a
modern motherboard (although some manufacturers will stop supporting this bus
soon).
The 80386 microprocessor was announced by Intel in 1985. The processor could
process 32 bit data and access memory on a 32 bit bus. Chips were added to the
motherboard to allow the AT bus to run asynchronously to the processor's clock,
so that the two could run at independent speeds. The memory was also moved
from the external bus to the microprocessor's local bus, so it no longer
depended on the speed of the external bus. Adding cache memory (much faster,
smaller and more expensive than main memory) to the local bus speeded up the
whole system and freed it from some of the constraints of the external bus,
which had been a bottleneck in the PC. Windows applications also pushed the PC
to its limits, and soon graphics card performance became the bottleneck in PC
system performance. The 386 introduced linear addressing along with demand
paging. Demand paging automatically detects when a block of memory is not in
system memory and must be retrieved from the hard disk, which is the basis of
virtual addressing. The 386 also provided Virtual 8086 mode, in which each user
or task can operate as though it had the entire system to itself. 32 bit
operating systems can use the full features of the 386 protected modes and
offer 32 bit support. Bank interleaving speeded up memory access by partitioning
memory into multiple banks that could be accessed simultaneously. The 386 was
later shipped with a 16 bit external data bus, which lowered system costs in a
competitive market, and was named the 386SX. The original 386 was renamed the
386DX.
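Demand paging can be illustrated with a toy model. The sketch below assumes the 386's 4 KB page size; the class name, the tiny number of page frames and the first-in first-out eviction rule are assumptions made for the example, not a description of any particular operating system.

```python
PAGE_SIZE = 4096  # the 386 pages memory in 4 KB blocks

class DemandPager:
    """Toy model: a page is fetched from 'disk' only when first touched."""

    def __init__(self, frames):
        self.frames = frames      # number of physical page frames available
        self.resident = []        # page numbers in memory, oldest first
        self.faults = 0           # how many times the disk had to be read

    def access(self, virtual_address):
        page = virtual_address // PAGE_SIZE
        if page not in self.resident:
            self.faults += 1                  # page fault: not in memory
            if len(self.resident) == self.frames:
                self.resident.pop(0)          # evict oldest page (FIFO, an assumption)
            self.resident.append(page)
        return page

pager = DemandPager(frames=2)
for addr in (0x0000, 0x0FFF, 0x1000, 0x2000, 0x0000):
    pager.access(addr)
print(pager.faults)  # 4
```

The program simply uses addresses; only the pager knows which pages are really in memory, which is what lets each task behave as though it had the whole machine.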
In April 1989 Intel announced the 486 microprocessor. Apart from performance
gains, not much changed in the architecture; however, the new chip took
advantage of advances in transistor size by adding a math coprocessor and a
small amount of cache on the chip. The processor bus changed somewhat from
the 386 to allow burst transfers: only one pointer, the start address,
needs to be supplied to transfer a block of memory. When the graphics
card became the major bottleneck in the PC system, VESA (Video Electronics
Standards Association) extended the 486 local bus to include VESA
local bus slots, adding them behind the AT bus slots so that one card could use
both buses. This provided high speed peripheral performance without replacing
the function of the AT bus. Every time Intel introduced a new microprocessor
they changed the processor architecture, so the chipset had to change. To solve
this problem Intel introduced a new bus called PCI (Peripheral Component
Interconnect) that attaches to the microprocessor bus via a local bus to PCI
bridge chipset. Only the bridge chip needs to change if the microprocessor
and local bus design change, which Intel does to improve functionality and
speed and to take advantage of new technology. External bus speeds were also
reaching a limit. Incorporating on chip cache meant it was possible to run the
processor at much higher clock rates internally while the external bus runs at
a lower speed. A PLL (Phase Locked Loop) accepts an input from a reference clock
and multiplies or divides it so that the processor runs faster internally than
the external bus. The 486 processor introduced a new System Management Mode,
which is totally hidden from the other modes but can be entered from them. It
was developed for notebook computers, to allow power management functions to
run transparently to the operating system and applications. The 486 processor
was eventually released in an SX version which had no math coprocessor. The
original 486 was renamed the 486DX.
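The clock multiplication done by the PLL is simple arithmetic, sketched below. The function name and the example figures (a 33 MHz bus with x2 and x3 multipliers) are chosen for illustration only.

```python
def core_clock_mhz(bus_clock_mhz, multiplier):
    """Internal clock produced by a PLL that multiplies the bus clock."""
    if bus_clock_mhz <= 0 or multiplier <= 0:
        raise ValueError("clock and multiplier must be positive")
    return bus_clock_mhz * multiplier

# Example values only: the same 33 MHz external bus, two multipliers.
print(core_clock_mhz(33, 2))  # 66
print(core_clock_mhz(33, 3))  # 99
```

The external bus and everything attached to it keep running at the lower figure; only the processor core, fed from its on chip cache, runs at the multiplied rate.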
The Intel Pentium processor (P5) was introduced in March 1993. The design
greatly enhanced math coprocessor performance and increased the size of the on
chip cache. The Pentium's local data bus is 64 bits wide and, under certain
conditions, the processor can execute two instructions in a single clock cycle.
The Pentium also includes advanced system integrity features such as parity
checking on each byte of data transferred on the external bus and parity
generated on the address bus. Internal parity checking is done on the
instruction and data caches, nearly all internal registers, and the internal
ROM instructions and data. The Pentium will also shut down if internal errors
are detected. Did you notice that Intel stopped using x86 numbers to describe
the model of processor and favored a name instead?
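Parity checking on each byte can be sketched as follows. The example assumes even parity (the parity bit makes the total number of 1 bits even); the function names are invented for the illustration and the code models the idea, not the Pentium's actual circuitry.

```python
def even_parity_bit(byte):
    """Parity bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2

def parity_bits(data):
    """One parity bit per byte, as on a bus that checks each byte lane."""
    return [even_parity_bit(b) for b in data]

def check(data, bits):
    """True if every byte still matches the parity bit sent with it."""
    return parity_bits(data) == list(bits)

word = bytes([0x00, 0x01, 0x03, 0xFF, 0x80, 0x7F, 0x55, 0xAA])  # 64 bits
bits = parity_bits(word)
print(bits)                      # [0, 1, 0, 0, 1, 1, 0, 0]

corrupted = bytes([0x00, 0x01, 0x02]) + word[3:]   # one bit flipped in byte 2
print(check(word, bits), check(corrupted, bits))   # True False
```

A single flipped bit in any byte changes that byte's parity and is caught; two flipped bits in the same byte cancel out, which is why stronger schemes such as ECC exist.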
The Intel Pentium Pro processor was introduced in November 1995. It is a
superpipelined superscalar processor that supports ECC (Error Correcting Code),
fault analysis and recovery, functional redundancy checking, multi-branch
prediction, data flow analysis and multiple processors. It is supplied
with 16 KB of L1 cache and an L2 cache (256 KB in the first versions) mounted in
the same package and running at the processor core speed. The processor can
address 64 GB of main memory through the addition of four more address lines.
Internally this is essentially a RISC chip with hardware that translates x86
instructions into simpler RISC-like operations. Several techniques are used by
this chip to produce more performance.
Processing is divided into pipeline stages, and three instructions can be decoded per clock cycle, as opposed to two for the Pentium.
In addition, instruction decoding and execution are decoupled, so instructions can still be executed if one pipeline stalls.
The Pentium Pro was first aimed at the server market and optimised to run 32 bit
code.
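The memory limits quoted in this guide follow directly from the number of address lines, since each extra line doubles the addressable memory. A small sketch (the helper names are made up for the example):

```python
def addressable_bytes(address_lines):
    """Each address line doubles the number of bytes that can be selected."""
    return 2 ** address_lines

def human(n):
    """Express a power-of-two byte count in the largest whole unit up to GB."""
    for unit in ("bytes", "KB", "MB", "GB"):
        if n < 1024 or unit == "GB":
            return f"{n} {unit}"
        n //= 1024

for name, lines in [("8088", 20), ("286", 24), ("386", 32), ("Pentium Pro", 36)]:
    print(f"{name}: {lines} address lines -> {human(addressable_bytes(lines))}")
# 8088: 20 address lines -> 1 MB
# 286: 24 address lines -> 16 MB
# 386: 32 address lines -> 4 GB
# Pentium Pro: 36 address lines -> 64 GB
```

Going from 32 to 36 lines multiplies the limit by 16, which is how four extra lines take the Pentium Pro from 4 GB to 64 GB.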
The Intel Pentium MMX (P55C) processor was announced (quietly) in January 1997,
followed by an uproar from consumers whose pre-Christmas PCs had not been
supplied with the new processor. The MMX (Matrix Math Extensions or Multimedia
Extensions) chip incorporates a lot of RISC (Reduced Instruction Set Computer)
architecture as opposed to CISC (Complex Instruction Set Computer), and will be
the subject of another guide. Multimedia extensions enhance audio, video
playback and graphics performance. All subsequent Intel processors support the
MMX extensions. The MMX processor was also the last in the line to be mounted
in a ZIF socket on the motherboard; Intel then moved to a slot design
(presumably so that they could patent the design and stop AMD from taking over
the market as most popular desktop processor).
The Pentium II processor from Intel includes MMX instructions (which enhance
multimedia performance) and has 32 KB of on chip L1 cache. The L2 cache is
mounted on a riser card (the Single Edge Contact cartridge) along with the CPU,
interconnected by the DIB (Dual Independent Bus), and the card fits into a slot
on the motherboard. The Pentium II processor includes SMP (Symmetric
Multiprocessing) support for two CPUs through the GTL+ bus and uses two MMX
execution units. Both execution units and the secondary cache are supplied with
ECC. Pentium Pro and Pentium II processors contain a bug in the FPU (Floating
Point Unit): the conversion of certain large negative numbers into integers
sometimes fails to detect an overflow. Software solutions are available.
In February 1999 Intel unveiled its latest processor, the Pentium III. In
addition to being faster than the Pentium Pro and Pentium II processors, the
Pentium III has many new features, including a unique processor ID and new
processor instructions called Streaming SIMD Extensions, or SSE. SIMD stands for
Single Instruction Multiple Data, the capability to process more than one data
element in one instruction. Though SSE adds new features, existing applications
are not affected. These new instructions do for the Pentium II what MMX did for
the Pentium.
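The SIMD idea can be modelled in a few lines. This is a sketch only: a Python list stands in for a packed register, and the function name is invented. SSE registers hold four single precision values, so four elements are used here.

```python
def simd_add(a, b):
    """Add two packed operands element by element, as one SIMD 'instruction'."""
    if len(a) != len(b):
        raise ValueError("packed operands must have the same number of elements")
    return [x + y for x, y in zip(a, b)]

# One call processes four data elements, instead of four separate additions.
print(simd_add([1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]))
# [1.5, 2.5, 3.5, 4.5]
```

On real hardware the four additions happen side by side in one instruction, which is where the gain for audio, video and 3D work comes from.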
The Competition
To further complicate processor options, several manufacturers introduced
clone processors. AMD (Advanced Micro Devices) produced 286 processors under
license from Intel and then claimed the license covered 386 and 486 designs. Up
until the introduction of the K5 (Pentium equivalent), there were no real
performance or functionality gains over Intel's processors. The K5 is not a
clone of the Intel Pentium and claims performance gains from superscalar design
features such as dual pipelines, branch prediction and execution in
anticipation of a branch. AMD introduced the K6 in 1997; like the Intel Pentium
MMX it supports MMX instructions, and it has a 64 KB L1 cache. The chip fits
into existing processor sockets on the motherboard, unlike the Intel Pentium II
design, which needed a new motherboard. The AMD K6-2 is similar to the K6
except that it offers 3DNow! technology, AMD's extension of MMX that adds
floating point SIMD instructions. The K6-2 has been reported to outperform a
Pentium II machine of equivalent clock speed. The K6-2 also introduced a
100 MHz FSB (front side bus) to the Socket 7 platform. The K6-2 should work in
any Socket 7 system that takes a K6, although the K6-2 requires a lower
voltage. The K6-2 is among the best performing members of the
Pentium-compatible family of Socket 7 processors. The K6-III is a higher
performance version of the K6-2, thanks to its tri-level cache design and an
improved manufacturing process. The K6-III is roughly comparable in performance
to the Pentium II. I do not have a lot of details about the AMD processors at
the time of writing.
Cyrix designed their processors from the ground up, using none of Intel's
technology. The initial 386 and 486 designs are not actual clones of Intel's
processors, but hybrids. All the designs use a 486 like processor with a five
stage pipeline, which allows several instructions to be in progress at once so
that one can complete in nearly every clock cycle. However, a smaller amount of
data and instruction cache has been added to the chip. Cyrix also licensed its
processor design to IBM, SGS-Thomson and Texas Instruments. Texas Instruments
also developed its own version of the 486 processor with larger caches and PCI
bus interfaces built in.
The Maths Coprocessor
This processor goes by several names: the coprocessor, the math coprocessor, the
floating point processor and the NPX (Numerical Processor Extension). The
main processor can only work directly with whole integer numbers. Math functions
perform calculations on numbers in non-integer format, so Intel introduced the
math coprocessor, capable of performing numeric operations 20 to 100 times
faster than equivalent software routines running on the integer processor.
The trend is to have the math coprocessor integrated on the same chip as the
integer processor. In the past Intel based computers were slow compared to RISC
workstations, but with the release of the Pentium processor Intel redesigned
the structure and functions, so performance is 5 to 10 times that of 486
processors and competitive with RISC workstations. The math coprocessor is also
capable of handling integers and packed decimal (BCD) numeric data. The math
coprocessor can output data in several formats, but internally all data is
represented as temporary real numbers, a standard 80 bit format. To software,
the coprocessor appears as additional registers, data types and instructions.
The coprocessor has a number of embedded constants such as pi, and functions
such as sine, cosine and tangent in addition to add and subtract.
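The 80 bit temporary real format is laid out as 1 sign bit, a 15 bit exponent (biased by 16383) and a 64 bit significand whose top bit is an explicit integer bit. The sketch below splits an ordinary Python float into those three fields. The function name is made up, and infinities and NaNs are deliberately left out to keep the example short.

```python
import math

EXPONENT_BIAS = 16383  # bias of the 15 bit exponent field

def to_extended(value):
    """Split a finite float into (sign, exponent, significand) fields."""
    if math.isnan(value) or math.isinf(value):
        raise ValueError("only finite values are handled in this sketch")
    sign = 1 if math.copysign(1.0, value) < 0 else 0
    if value == 0:
        return sign, 0, 0
    # abs(value) == mantissa * 2**exponent, with mantissa in [0.5, 1)
    mantissa, exponent = math.frexp(abs(value))
    # Scaling by 2**64 puts the leading 1 into bit 63, the explicit integer bit.
    significand = int(mantissa * 2 ** 64)
    return sign, exponent - 1 + EXPONENT_BIAS, significand

sign, exp, sig = to_extended(1.0)
print(sign, hex(exp), hex(sig))   # 0 0x3fff 0x8000000000000000
sign, exp, sig = to_extended(-2.0)
print(sign, hex(exp), hex(sig))   # 1 0x4000 0x8000000000000000
```

The 64 bit significand is wider than the 53 bits of an ordinary double, which is why intermediate results held inside the coprocessor lose less precision.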
The Future
VLIW (Very Long Instruction Word) processors receive several instructions
packed by the compiler into a single long instruction word, which is executed
across a set of parallel execution units for simultaneous processing. Most
programs process blocks of instructions between branches, typically small
blocks. If the code is compiled so that each very long instruction word
contains no branches, the pipelines do not stall. Keeping the pipelines from
stalling is also the aim behind caching. To be continued...
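As an illustration of the VLIW packing described in this section, here is a minimal sketch of a compiler step that groups instructions into long words. Everything in it is hypothetical: the (destination, sources) instruction format, the register names, the bundle width of three and the greedy in-order strategy are assumptions for the example, not how any real VLIW compiler works.

```python
def pack_bundles(instructions, width=3):
    """Greedily pack (dest, sources) instructions, in order, into bundles.

    An instruction joins the current bundle only if the bundle has room and
    it neither reads nor writes a register written earlier in that bundle.
    """
    bundles, current, written = [], [], set()
    for dest, sources in instructions:
        conflict = dest in written or any(s in written for s in sources)
        if current and (conflict or len(current) == width):
            bundles.append(current)          # close the bundle, start a new one
            current, written = [], set()
        current.append((dest, sources))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles

program = [
    ("r1", ("r0",)),        # r1 depends only on r0
    ("r2", ("r0",)),        # independent of r1, shares the first bundle
    ("r3", ("r1", "r2")),   # needs r1 and r2, so it starts a new bundle
    ("r4", ("r0",)),        # independent, shares the second bundle
]
print([len(bundle) for bundle in pack_bundles(program)])  # [2, 2]
```

Instructions inside one bundle do not depend on each other, so the parallel execution units can run them in the same cycle without waiting.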