Firstborn: the Intel 4004
Intel sold its first microprocessor in 1971: a 4-bit chip code-named 4004. It was designed to work together with three other chips, the 4001 ROM, the 4002 RAM and the 4003 shift register. The 4004 performed the calculations itself, while the supporting components made the processor usable in a complete system. The 4004 was mainly used in calculators and similar devices and was never intended for computers. Its maximum clock speed was 740 kHz.
The 4004 was followed by the similar 4040, which was essentially an improved 4004 with an extended instruction set and higher performance.
8008 and 8080
The 4004 established Intel in the microprocessor market, and to capitalize on the situation the company introduced a new series of 8-bit processors. The 8008 appeared in 1972, followed by the 8080 in 1974 and the 8085 in 1975. Although the 8008 was Intel's first 8-bit microprocessor, it never became as well known as its predecessor or its successor, the 8080. Thanks to its ability to process data in 8-bit blocks, the 8008 was faster than the 4004, but its rather modest clock speed of 200-800 kHz did not particularly attract system designers. The 8008 was produced on a 10-micron process.
The Intel 8080 was much more successful. The 8008 design was reworked with new instructions and a move to 6-micron transistors. This allowed Intel to more than double the clock speed: the fastest 8080 processors of 1974 ran at 2 MHz. The 8080 was used in countless devices, and several software developers, including the newly formed Microsoft, focused on software for Intel processors.
Ultimately, the later 8086 shared architectural roots with the 8080 to preserve compatibility with software written for it. As a result, key hardware blocks of the 8080 have been present in every x86 processor ever produced, and software written for the 8080 could, at least in principle, be translated to run on any x86 processor.
The 8085 was essentially a cheaper version of the 8080 with a higher clock speed. It was very successful, although it left a comparatively small mark on history.
8086: the beginning of the x86 era
The first 16-bit Intel processor was the 8086, and it delivered significantly higher performance than the 8080. In addition to a higher clock speed, it had a 16-bit data bus and extra execution hardware that allowed the 8086 to execute two 8-bit instructions simultaneously. It could also perform more complex 16-bit operations, but most programs of the time were written for 8-bit processors, so 16-bit support was initially of limited relevance. The address bus was widened to 20 bits, giving the 8086 access to 1 MB of memory and further raising performance.
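The relationship between address-bus width and addressable memory is plain binary arithmetic: each extra address line doubles the reachable range. A minimal sketch (illustrative only, ignoring the 8086's segment:offset addressing scheme):

```python
# Each address line doubles the number of distinct byte addresses,
# so an n-bit address bus reaches 2**n bytes.
def addressable_bytes(bus_width_bits: int) -> int:
    """Bytes reachable with a byte-addressed bus of the given width."""
    return 2 ** bus_width_bits

# The 8086's 20-bit address bus reaches exactly 1 MB:
print(addressable_bytes(20))           # 1048576 bytes
print(addressable_bytes(20) // 2**20)  # 1 (MB)
```

The same arithmetic explains the later bus widths in this story: 24 bits gives 16 MB, 32 bits gives 4 GB, and 36 bits gives 64 GB.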
The 8086 was also the first processor of the x86 architecture. It used the first version of the x86 instruction set, on which almost all AMD and Intel processors have been based ever since.
Around the same time, Intel released the 8088. It was based on the 8086 but had half of the data bus disabled, limiting it to 8-bit operations. It still had access to 1 MB of RAM and ran at higher frequencies, making it faster than Intel's previous 8-bit processors.
80186 and 80188
After the 8086, Intel introduced several other processors that used the same 16-bit architecture. The first was the 80186, designed to simplify the construction of complete systems. Intel moved onto the CPU several hardware elements that usually sat on the motherboard, including the clock generator, the interrupt controller and the timer. Integrating these components made the 80186 many times faster than the 8086, and Intel also raised the clock speed to further improve performance.
The 80188 also had these hardware components integrated on the chip, but it used an 8-bit data bus, like the 8088, and was offered as a budget solution.
80286: more memory, more performance
The 80286 appeared in the same year as the 80186 and had nearly identical characteristics, except for an address bus extended to 24 bits, which in the processor's so-called protected mode allowed it to work with up to 16 MB of RAM.
iAPX 432
The iAPX 432 was an early attempt by Intel to move away from the x86 architecture in a completely different direction. According to Intel's projections, the iAPX 432 was supposed to be several times faster than the company's other products. In the end, however, the processor failed because of significant architectural flaws. Although x86 processors were considered relatively complex, the iAPX 432 took CISC complexity to a whole new level. The design was so cumbersome that Intel had to spread the CPU across two separate dies. The processor was also designed for heavy workloads and performed poorly when starved of bus bandwidth or input data. The iAPX 432 could outrun the 8080 and 8086, but it was quickly eclipsed by newer x86 processors and was eventually abandoned.
i960: Intel's first RISC processor
In 1984 Intel created its first RISC processor. It was not a direct competitor to the x86 line, since it was intended for secure embedded systems. These chips used a 32-bit superscalar architecture based on the Berkeley RISC design concept. The first i960 processors had relatively low clock speeds (the entry model ran at 10 MHz), but over time the architecture was refined and moved to finer process nodes, raising frequencies to 100 MHz. The chips also supported 4 GB of protected memory.
The i960 was widely used in military systems as well as in the corporate segment.
80386: x86 goes 32-bit
Intel's first 32-bit x86 processor was the 80386, which appeared in 1985. Its key advantage was a 32-bit address bus that allowed up to 4 GB of system memory to be addressed. Although almost no one used that much memory at the time, RAM limitations had often hurt the performance of earlier x86 processors and competing CPUs. Unlike with modern CPUs, in the 80386 era adding RAM almost always meant a gain in performance. Intel also implemented a number of architectural improvements that lifted performance above the 80286 even when both systems used the same amount of RAM.
To add more affordable models to the line, Intel introduced the 80386SX. It was almost identical to the 32-bit 80386 but was limited to a 16-bit data bus and supported only up to 16 MB of RAM.
i860
In 1989, Intel made another attempt to move away from x86, creating a new RISC CPU called the i860. Unlike the i960, it was designed as a high-performance model for the desktop market, but the design had serious drawbacks. The main one was that, to achieve high performance, the processor relied entirely on software compilers, which had to arrange instructions in execution order when the executable was built. This saved die area and reduced the complexity of the i860, but at compile time it was almost impossible to order every instruction correctly from start to finish. This forced the CPU to spend extra time on data handling, which drastically reduced its performance.
80486: FPU integration
The 80486 was Intel's next big step in performance, and the key to its success was denser integration of components into the CPU. The 80486 was the first x86 processor with an L1 (first-level) cache. The first 80486 samples had 8 KB of on-chip cache and were manufactured on a 1000 nm process; with the move to 600 nm, the L1 cache grew to 16 KB.
Intel also brought the FPU, previously a separate data-processing chip, into the CPU. Moving these components onto the processor die noticeably reduced the latency between them. The 80486 also used a faster FSB interface to increase throughput, and many improvements in the core and other components sped up the handling of external data. Together, these changes significantly increased the performance of the 80486, which at times far outran the older 80386.
The first 80486 processors reached 50 MHz, and later models built on the 600 nm process could run at up to 100 MHz. For customers on a smaller budget, Intel released the 80486SX, in which the FPU was disabled.
P5: The first Pentium processor
The Pentium appeared in 1993 and was the first Intel x86 processor that did not follow the 80x86 numbering scheme. It used the P5 architecture, Intel's first superscalar x86 microarchitecture. Although the Pentium was generally faster than the 80486, its headline feature was a significantly improved FPU, more than ten times faster than the unit in the 80486. The significance of this improvement only grew when Intel released the Pentium MMX. Microarchitecturally identical to the first Pentium, it added support for Intel's MMX SIMD instruction set, which could significantly speed up certain operations.
Compared with the 80486, Intel increased the L1 cache in the new Pentium processors. The first Pentium models had 16 KB of first-level cache, and the Pentium MMX received 32 KB. Naturally, these chips also ran at higher clock speeds: the first Pentiums, built on an 800 nm process, reached only 60 MHz, while later versions on Intel's 250 nm process reached 300 MHz (the Tillamook core).
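The idea behind MMX, one instruction operating on several packed values at once, can be sketched with plain integer arithmetic. The following is an illustrative SWAR (SIMD-within-a-register) model, not actual MMX code: eight 8-bit lanes packed into one 64-bit word and added lane-wise in a single conceptual wide operation.

```python
# Illustrative SWAR model of the MMX concept: pack eight 8-bit values
# into one 64-bit word, then add all lanes with "one" wide operation.
LANES, WIDTH = 8, 8
MASK = (1 << WIDTH) - 1  # 0xFF, the bits of a single lane

def pack(values):
    """Pack eight 8-bit values into a single 64-bit integer."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & MASK) << (i * WIDTH)
    return word

def lanewise_add(a, b):
    """Lane-wise 8-bit add with wraparound, unpacked back to a list."""
    return [((a >> (i * WIDTH)) + (b >> (i * WIDTH))) & MASK
            for i in range(LANES)]

print(lanewise_add(pack([10, 20, 30, 40, 50, 60, 70, 80]),
                   pack([1, 2, 3, 4, 5, 6, 7, 8])))
# → [11, 22, 33, 44, 55, 66, 77, 88]
```

Real MMX registers behaved similarly: one 64-bit register held eight bytes, four 16-bit words or two 32-bit dwords, and one instruction updated every lane at once.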
P6: Pentium Pro
Soon after the first Pentium, Intel planned to release the Pentium Pro, based on the P6 architecture, but ran into technical difficulties. The Pentium Pro performed 32-bit operations much faster than the original Pentium thanks to out-of-order execution. These processors had a heavily redesigned internal architecture that decoded instructions into micro-operations, which were then executed on general-purpose units. To accommodate the extra decoding hardware, the Pentium Pro also used a significantly longer 14-stage pipeline.
Since the first Pentium Pro processors were aimed at the server market, Intel again widened the address bus, this time to 36 bits, and added PAE (Physical Address Extension), allowing up to 64 GB of RAM to be addressed. This was far more than the average user needed, but support for large amounts of RAM was extremely important to server customers.
The processor's cache system was also reworked. The L1 cache was split into two 8 KB segments, one for instructions and one for data. To make up for the 16 KB deficit relative to the Pentium MMX, Intel added 256 KB to 1 MB of L2 cache on a separate die mounted in the CPU package, connected to the CPU via an internal back-side bus (BSB).
Intel initially planned to sell the Pentium Pro to ordinary users but ultimately limited it to server systems. The Pentium Pro had several revolutionary features, yet it struggled to compete with the Pentium and Pentium MMX on performance: the two older Pentium processors were significantly faster at 16-bit operations, and 16-bit software still predominated. The processor also lacked MMX support, so the Pentium MMX beat the Pentium Pro in MMX-optimized programs.
The Pentium Pro might have had a chance in the consumer market, but the separate L2 cache die made it expensive to produce. The fastest Pentium Pro reached 200 MHz and was produced on 500 and 350 nm processes.
P6: Pentium II
Intel did not abandon the P6 architecture, and in 1997 it introduced the Pentium II, which corrected almost all of the Pentium Pro's shortcomings. The underlying architecture was similar: it used the same 14-stage pipeline, with some core improvements that increased instruction throughput. The L1 cache also grew, to 16 KB for data plus 16 KB for instructions.
To cut production costs, Intel moved to cheaper cache chips mounted in a larger processor cartridge. This was an effective way to make the Pentium II cheaper, but the memory chips could not run at full CPU speed, so the L2 cache ran at only half the processor's clock. That was still enough to raise the performance of early models.
Intel also added the MMX instruction set. The Pentium II cores, code-named "Klamath" and "Deschutes", were also sold under the server-oriented Xeon brand and as the Pentium II Overdrive. The fastest models had 512 KB of L2 cache and clock speeds of up to 450 MHz.
P6: Pentium III and the 1 GHz battle
After the Pentium II, Intel planned a processor based on the Netburst architecture, but it was not yet ready. The Pentium III therefore again used the P6 architecture.
The first Pentium III core, code-named "Katmai", was very similar to the Pentium II: it used the same off-die L2 cache running at only half the CPU speed. The underlying architecture, however, saw significant changes; in particular, several stages of the 14-stage pipeline were merged, shortening it to 10 stages. Thanks to the updated pipeline and higher clock speeds, the first Pentium III processors generally outperformed the Pentium II slightly.
Katmai was produced on a 250 nm process, but after the move to 180 nm Intel was able to raise the Pentium III's performance significantly. In the updated core, code-named "Coppermine", the L2 cache moved onto the CPU die and its size was halved, to 256 KB. But because it now ran at the full processor frequency, performance still improved.
Coppermine raced AMD's Athlon to 1 GHz and got there. Intel later tried to release a 1.13 GHz model, but it was eventually recalled after Dr. Thomas Pabst of Tom's Hardware discovered instability in its operation. As a result, the 1 GHz chip remained the fastest Coppermine-based Pentium III.
The final version of the Pentium III core was called "Tualatin". Built on a 130 nm process, it reached a clock frequency of 1.4 GHz, and its L2 cache was increased to 512 KB, which improved performance a bit further.
P5 and P6: Celeron and Xeon
Alongside the Pentium II, Intel also introduced the Celeron and Xeon processors. They used a Pentium II or Pentium III core but with different cache sizes. The first Celeron models, based on the Pentium II, had no L2 cache at all, and their performance was terrible. Later models based on the Pentium III had half of its L2 cache: Celerons using the Coppermine core had only 128 KB of L2, and later Tualatin-based models had 256 KB.
These half-cache versions were also known as Coppermine-128 and Tualatin-256. Their frequencies were comparable to the Pentium III, which let them compete with AMD's Duron processors. Microsoft used a 733 MHz Celeron Coppermine-128 in the Xbox game console.
The first Xeon processors were also based on the Pentium II but had more L2 cache: entry-level models carried 512 KB, while their bigger brothers could have up to 2 MB.
The long pipeline: pros and cons
Before discussing Intel's Netburst architecture and the Pentium 4, it is important to understand the advantages and disadvantages of a long pipeline. A pipeline describes how instructions move through the core: each stage performs a set of tasks, sometimes only a single function. A pipeline can be lengthened by adding hardware blocks or splitting one stage into several, and shortened by removing blocks or merging several processing steps into one.
The length, or depth, of the pipeline directly affects latency, IPC, clock speed and throughput. Longer pipelines usually demand more bandwidth from other subsystems; as long as the pipeline keeps receiving the data it needs, no stage sits idle. Processors with longer pipelines can also usually run at higher clock speeds.
The drawback of a long pipeline is increased execution latency, since data passing through the pipeline must "stop" at each stage for a certain number of cycles. Processors with a long pipeline can also have lower IPC, so they rely on higher clock speeds to deliver performance. Over time, processors taking a balanced approach have proven effective without significant drawbacks.
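The trade-off can be made concrete with a toy model: splitting a fixed amount of per-instruction work across more stages shortens each stage (allowing a higher clock), while a fixed per-stage overhead accumulates (raising total latency). All the numbers below are illustrative assumptions, not measurements of any real CPU.

```python
# Toy model of the pipeline-depth trade-off.
# Assumptions (illustrative only): a fixed 10 ns of logic work per
# instruction, plus 0.5 ns of latch overhead added by every stage.
def pipeline_metrics(stages: int, work_ns: float = 10.0, latch_ns: float = 0.5):
    """Return (clock_mhz, latency_ns) for a pipeline of the given depth."""
    stage_time = work_ns / stages + latch_ns  # critical path of one stage
    clock_mhz = 1000.0 / stage_time           # shorter stage -> faster clock
    latency_ns = stage_time * stages          # total time for one instruction
    return clock_mhz, latency_ns

for depth in (10, 14, 20, 31):  # depths mentioned in this article
    mhz, lat = pipeline_metrics(depth)
    print(f"{depth:2d} stages: {mhz:7.1f} MHz clock, {lat:5.1f} ns latency")
```

Under these assumptions the 31-stage design clocks almost twice as fast as the 10-stage one but takes noticeably longer to finish any single instruction, which is exactly the tension the paragraphs above describe.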
Netburst: Pentium 4 Willamette and Northwood
In 2000, Intel's Netburst architecture was finally ready, debuting in the Pentium 4 processors, where it remained for the next six years. The first core was called "Willamette", and Netburst and the Pentium 4 lived with it for two years. This was a difficult time for Intel: the new processor barely outperformed the Pentium III. The Netburst microarchitecture allowed higher frequencies, and Willamette-based processors could reach 2 GHz, yet in some tasks a 1.4 GHz Pentium III proved faster. During this period, AMD's Athlon processors held the greater performance advantage.
Willamette's problem was that Intel had stretched the pipeline to 20 stages and planned to pass the 2 GHz mark, but power-consumption and heat limits kept it from that goal. The situation improved with the "Northwood" core and a new 130 nm process, which allowed clock speeds up to 3.2 GHz and doubled the L2 cache from 256 KB to 512 KB. The power and heat problems of the Netburst architecture did not go away, but Northwood's performance was significantly higher, and it could compete with AMD's new chips.
In its high-end processors, Intel implemented Hyper-Threading, which improves the utilization of core resources under multitasking loads. The benefit of Hyper-Threading in Northwood chips was not as great as in modern Core i7 processors: the gain was only a few percent.
The Willamette and Northwood cores were also used in Celeron and Xeon processors. As in previous Celeron and Xeon generations, Intel respectively shrank or enlarged the L2 cache to differentiate them in performance.
P6: Pentium M
The Netburst microarchitecture was developed for Intel's high-performance processors, so it was power-hungry and unsuitable for mobile systems. In 2003, Intel therefore created its first architecture designed exclusively for laptops. The Pentium M processors were based on the P6 architecture, but with a longer 12-to-14-stage pipeline. It was also the first variable-length pipeline: if the data an instruction needed was already in the cache, the instruction could complete after 12 stages; otherwise, it had to pass through two additional stages to fetch the data.
The first of these processors was built on a 130 nm process, carried 1 MB of L2 cache, and reached 1.8 GHz while consuming only 24.5 W. A later version named "Dothan", with 90 nm transistors, was released in 2004. The finer process allowed Intel to increase the L2 cache to 2 MB, which, combined with some core improvements, markedly increased per-clock performance. The maximum frequency also rose to 2.27 GHz, with power consumption climbing slightly to 27 W.
The Pentium M architecture was later used in the Stealey A100 mobile chips, which were eventually replaced by the Intel Atom processors.
Netburst: Prescott
The Netburst-based Northwood core lasted on the market from 2002 to 2004, after which Intel introduced the Prescott core with numerous improvements. It was produced on a 90 nm process, which allowed Intel to increase the L2 cache to 1 MB. Intel also introduced the new LGA 775 socket, which brought support for DDR2 memory and a quad-pumped FSB. These changes gave Prescott more bandwidth than Northwood, which Netburst needed to perform well. On the Prescott base, Intel also shipped its first 64-bit x86 processor, with access to larger amounts of RAM.
Intel expected Prescott to be the most successful of the Netburst chips, but instead it failed. Intel had again lengthened the execution pipeline, this time to 31 stages, hoping that higher clock speeds would compensate, but it managed only 3.8 GHz. Prescott processors ran too hot and consumed too much power. Intel had calculated that the move to 90 nm would eliminate the problem, but the higher transistor density only made the processors harder to cool. Higher frequencies proved unreachable, and the changes in the Prescott core hurt overall performance.
Even with all its enhancements and extra cache, Prescott at best only matched Northwood in per-clock performance. Meanwhile, AMD's K8 processors also moved to a finer process, allowing higher frequencies, and AMD dominated the desktop CPU market for some time.
Netburst: Pentium D
In 2005, the two major manufacturers raced to announce the first dual-core processor for the consumer market. AMD announced the dual-core Athlon 64 first, but it was long out of stock. Intel sought to beat AMD by using a multi-chip module (MCM) containing two Prescott dies. The company christened its dual-core processor the Pentium D, and the first model was code-named "Smithfield".
However, the Pentium D drew criticism because it shared the problems of the original Prescott chips. The heat output and power consumption of two Netburst cores limited it to 3.2 GHz at best. And since the architecture's efficiency depended heavily on keeping the pipeline fed with data, Smithfield's IPC dropped significantly, because the bus bandwidth was shared between the two cores. The physical implementation was also inelegant, effectively two dies under one lid, and AMD's approach of two cores on a single die was considered the more advanced solution.
Smithfield was followed by "Presler", which moved to a 65 nm process. Its multi-chip module contained two "Cedar Mill" dies. This reduced the processor's heat output and power consumption and allowed the frequency to rise to 3.8 GHz.
There were two main versions of Presler: the first had a higher TDP of 125 W, while the later model was limited to 95 W. Thanks to the smaller die, Intel also doubled the L2 cache, so each die carried 2 MB. Some enthusiast models also supported Hyper-Threading, allowing the CPU to run four threads simultaneously.
All Pentium D processors supported 64-bit software and more than 4 GB of RAM.