Linux from scratch 01: How does the CPU execute an instruction?

Author: Dao Ge, a veteran with 10+ years of embedded development experience.

Official account: [IOT Internet of Things Town], focusing on C/C++, the Linux operating system, application design, the Internet of Things, microcontrollers, and embedded development. Reply [Books] to the official account to get classic books in the Linux and embedded fields.

Reprint: Reprinting is welcome, provided the source is credited.


What is [Linux from scratch]?

Over the past two years, my focus has been on x86 Linux systems, from drivers through the middle layer to application-layer development.

As the scope keeps expanding, I find that I have almost forgotten many of the basics, such as the following table (page 47 of "Understanding the Linux Kernel"):

This table describes several segment descriptors in the Linux system.

For the data segments and code segments, a closer look at the relevant books will tell you what these descriptors mean, but:

Why are the Base addresses of these segments all 0x00000000?

Why are the Limits all 0xfffff?

Why do their Type and privilege level (DPL) differ?

Without some basic knowledge of the x86 platform, it is really laborious to get through this book!

What's worse, the Linux kernel code keeps growing, and the compressed archive of the latest 5.13 release is already more than one hundred megabytes:

With such a behemoth, where should one start in order to really learn Linux well?!

Even in Linux version 0.11, much of the code is laborious to read!

While sorting through some dusty books this weekend, I came across several good ones I had read before: Wang Shuang's "Assembly Language", Li Zhong's "From Real Mode to Protected Mode", Ma Zhaohui's "Assembly Language Programming", and so on.

They are all very old books. Leafing through them again, I was struck by how good the content really is!

The description of some concepts, principles, and design ideas is clear and thorough.

Many of the design decisions in Linux related to segmentation, memory, and registers can be traced back to these books.

So I had an idea: could I re-read these books together with the related Linux material and organize it all again, as something more than a simple relay of knowledge?

After thinking it over, I settled on roughly the following approach:

  1. First, fix the ultimate goal: learning the Linux operating system;

  2. These books cover assembly language and fairly basic low-level knowledge. We will play down the assembly language and focus on the principles relevant to the Linux operating system;

  3. The articles will not strictly follow the content and order of any one book; instead, related material from several books will be brought together for study and discussion;

  4. Some content will be compared with the corresponding parts of Linux 2.6, so that when you study the kernel later you can find the low-level support underneath it;

  5. Finally, I hope I can stick with this series myself; it also serves as a way of sorting things out for myself.

In a word: build on the fundamentals!

As the opening article of the series, this one describes the execution steps shown in the following picture:

Start now!

The old Intel 8086 processor

The 8086 is Intel's first 16-bit processor. It was born in 1978, so it is probably a bit older than most of you.

Among all of Intel's processors it holds a very important position: it is the ancestor of the entire Intel 32-bit architecture (IA-32).

So, the question is, what is a 16-bit processor?

Some people confuse the bit width of a processor with the width of its address bus!

We know that when the CPU accesses memory, the physical address is transmitted over the address bus.

The 8086 CPU has 20 address lines, so it can transmit a 20-bit address.

Each address line carries one bit, so 20 bits can express 2 to the 20th power different addresses.

In other words, it can address at most 1 MB of memory. This is called the CPU's addressing capability.

However, the 8086 is a 16-bit processor because:

  1. The arithmetic unit can process up to 16 bits of data at a time;

  2. The maximum width of the register is 16 bits;

  3. The path between the register and the arithmetic unit is 16 bits;

In other words: inside the 8086 processor, the largest amount of data that can be processed, transferred, and temporarily stored at one time is 16 bits. Therefore, we say that it is a 16-bit CPU.
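To keep the two widths apart, here is a minimal sketch in Python (the constant and variable names are my own, chosen only for illustration). It computes what 20 address lines and 16-bit registers each give you:

```python
ADDRESS_LINES = 20   # width of the 8086 address bus
REGISTER_BITS = 16   # width of the 8086 registers and arithmetic unit

# 20 address lines select one of 2**20 byte addresses.
addressable_bytes = 2 ** ADDRESS_LINES
highest_address = addressable_bytes - 1

# A 16-bit register can hold values from 0 up to 2**16 - 1.
largest_register_value = 2 ** REGISTER_BITS - 1

print(f"addressable memory  : {addressable_bytes} bytes = {addressable_bytes // 1024} KB")
print(f"highest address     : {highest_address:05X}H")
print(f"largest 16-bit value: {largest_register_value:04X}H")
```

The highest address needs five hexadecimal digits while a register holds only four, which is exactly the gap that the address translation discussed later has to bridge.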

What is main memory?

The essence of a computer is the storage and processing of data, so where does the data involved in a computation come from? It comes from a physical device called storage (or memory).

Broadly speaking, any device that can store data can be called a memory, such as a hard disk, a USB flash drive, and so on.

However, inside the computer there is a memory connected directly to the CPU that holds the programs and data currently being executed. It is generally called internal memory or main memory, or simply memory.

Memory is organized in bytes, and the smallest unit of a single access is 1 byte. This is the most basic storage unit.

Each storage unit, that is, a byte, corresponds to an address, as shown in the following figure:

The CPU uses the address bus to select which storage unit in memory to access.

The address of the first byte is 0000H, the address of the second byte is 0001H, and so on.

In the memory shown in the figure, the address of the last storage unit is FFFFH, which is 65535 in decimal, so the capacity of this memory is 65536 bytes, that is, 64 KB.
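The memory in the figure can be modeled as a flat array of bytes indexed by address. This is only a sketch, and the values written into it are arbitrary:

```python
MEMORY_SIZE = 0xFFFF + 1          # addresses 0000H through FFFFH
memory = bytearray(MEMORY_SIZE)   # every storage unit starts out as 00H

memory[0x0000] = 0x12   # the first byte
memory[0x0001] = 0x34   # the second byte
memory[0xFFFF] = 0x56   # the last byte

capacity_kb = len(memory) // 1024
print(len(memory), "bytes =", capacity_kb, "KB")
```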

There is also the issue of atomic operations to consider.

In the Linux kernel code, atomic operations are used in many places, for example in the implementation of mutexes.

Why do atomic operations restrict the variable type to int? This has to do with how memory is read and written.

Although the smallest unit of memory is a byte, with careful design and arrangement, CPUs of different bit widths can access memory by byte, word, or double word.

In other words, in a single access a 16-bit processor can handle a 16-bit binary number, and a 32-bit processor can handle a 32-bit binary number.
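As an illustration of byte, word, and double-word access, the sketch below reads the same four bytes at three different widths. The helper name `read` is hypothetical, and the little-endian byte order (low byte at the lower address) is the x86 convention rather than something stated above:

```python
# Four consecutive bytes, starting at address 0.
memory = bytes([0x78, 0x56, 0x34, 0x12])

def read(address, size):
    """Read `size` bytes at `address` as one little-endian unsigned number."""
    return int.from_bytes(memory[address:address + size], "little")

byte_value = read(0, 1)    # 8 bits
word_value = read(0, 2)    # 16 bits
dword_value = read(0, 4)   # 32 bits

print(f"{byte_value:02X}H {word_value:04X}H {dword_value:08X}H")
```

The program only models the result of each access. Whether the hardware completes it in a single bus transfer depends on the processor's data bus width and on alignment.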

What is a register?

Inside the CPU, everything is electrical signals representing 0 or 1. A group of such signals appears on the processor's internal circuits as a combination of high and low levels, each level representing one bit of a binary number.

Inside the processor, a circuit called a register must be used to latch these data.

Registers are therefore essentially a kind of memory. The difference is that they are located inside the processor, so the CPU can access registers faster than it can access memory.

The processor is always very busy. During its operation, all data can only temporarily exist in the register for a short time, and then be sent elsewhere. This is why it is called a "register".

The registers in the 8086 are 16 bits wide and can store 2 bytes, or 1 word. The high byte comes first (bit8 ~ bit15), and the low byte comes last (bit0 ~ bit7).

The 8086 has the following registers:

As mentioned earlier, these registers are 16 bits wide. For compatibility with earlier processors, 4 of them (AX, BX, CX, DX) can each also be used as two 8-bit registers.

For example, AX denotes a 16-bit register, while AH and AL each denote an 8-bit register.

mov AX, 5DH    ; send 005DH to the AX register (16 bits)
mov AL, 5DH    ; send 5DH to the AL register (8 bits)
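The relationship between AX and its two halves can be sketched in Python as follows. The class `Register16` and its attribute names are hypothetical; the point is only that writing one half leaves the other half untouched:

```python
class Register16:
    """A 16-bit register whose halves can be used as two 8-bit registers."""

    def __init__(self):
        self.value = 0

    @property
    def high(self):
        return (self.value >> 8) & 0xFF

    @high.setter
    def high(self, byte):
        # Replace bits 8..15, keep bits 0..7.
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    @property
    def low(self):
        return self.value & 0xFF

    @low.setter
    def low(self, byte):
        # Replace bits 0..7, keep bits 8..15.
        self.value = (self.value & 0xFF00) | (byte & 0xFF)


ax = Register16()
ax.value = 0x005D          # like: mov AX, 5DH
after_mov_ax = ax.value
ax.high = 0x12             # like: mov AH, 12H (AL is untouched)
after_mov_ah = ax.value
ax.low = 0x34              # like: mov AL, 34H (AH is untouched)

print(f"{after_mov_ax:04X}H {after_mov_ah:04X}H {ax.value:04X}H")
```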

The three buses

When we start an application, the code and data of this program are loaded into physical memory.

Whether the CPU is fetching instructions or manipulating data, it needs to interact with memory:

  1. Determine the address of the storage unit (address information);

  2. Device selection, read or write command (control information);

  3. Data read or written (data information);

In a computer, there are dedicated wires that connect the CPU to other chips; these are called buses.

Logically, there are the following 3 kinds of bus:

Address bus: determines the address of the storage unit;
Control bus: lets the CPU control external devices;
Data bus: transfers data between the CPU and memory or other devices;

The 8086 has 20 address lines; this number is called the width of the address bus. It can address 2 to the 20th power memory cells.

Likewise, the width of the 8086 data bus is 16, which means it can transfer 16 bits of data at a time.

The control bus determines how many kinds of external control the CPU can perform, and therefore its ability to control external devices.

How does the CPU address the memory?

In the Linux 2.6 kernel code, the address generated by the compiler is called a virtual address (also known as a logical address). After segment translation, this logical address becomes a linear address. After the linear address goes through paging translation, the final physical address in physical memory is obtained.

Remember the table of segment descriptors at the beginning of the article?

The base addresses in the code segment and data segment descriptors are all 0x00000000, which means the virtual address and the translated linear address are numerically equal (we will see why this is the case later).
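A small sketch of why a zero base makes the two addresses equal. The class `SegmentDescriptor`, the function `to_linear`, and the sample address are all made up for illustration. Only the Base field is modeled, and limit checking and paging are left out:

```python
from dataclasses import dataclass


@dataclass
class SegmentDescriptor:
    base: int   # the Base field of the descriptor


def to_linear(descriptor, offset):
    """Segment translation: linear address = segment base + offset (32-bit)."""
    return (descriptor.base + offset) & 0xFFFFFFFF


flat_code_segment = SegmentDescriptor(base=0x00000000)
virtual_address = 0x08048000   # an arbitrary example value
linear_address = to_linear(flat_code_segment, virtual_address)

print(hex(virtual_address), "->", hex(linear_address))
```

With any nonzero base the two values would differ, for example a base of 0x1000 would turn offset 0x20 into linear address 0x1020.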

Now let's look at the simpler address translation of the 8086.

As mentioned earlier, memory is a linear storage device, and the CPU relies on addresses to locate each storage unit.

The 8086 CPU has 20 address lines and can send a 20-bit address, which gives it 1 MB of addressing capability.

But the 8086 is also a 16-bit design: the addresses it processes, transfers, and temporarily stores internally are only 16 bits wide.

From the point of view of the internal structure, if an address were simply sent from inside the chip to the address bus, only a 16-bit address could be sent, and the addressing capability would be just 64 KB.

So how can all 20 address lines be put to full use?

The 8086 CPU combines two 16-bit addresses internally to form one 20-bit physical address, as follows:

The first 16-bit address is called the segment address, and the second 16-bit address is called the offset address.

The address adder "synthesizes" a 20-bit physical address using the following formula:

Physical address = segment address x 16 + offset address
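The formula can be tried out with a few lines of Python. This is a sketch: the function name and the sample segment:offset pairs are my own, and the final mask models the fact that the 8086 has only 20 address lines:

```python
def physical_address(segment, offset):
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    segment &= 0xFFFF
    offset &= 0xFFFF
    # Multiplying by 16 is the same as shifting left by 4 bits.
    # With only 20 address lines, a carry out of bit 19 is lost.
    return ((segment << 4) + offset) & 0xFFFFF


print(f"{physical_address(0x2000, 0x0000):05X}H")
print(f"{physical_address(0x1F00, 0x1000):05X}H")
print(f"{physical_address(0xFFFF, 0x0010):05X}H")
```

The first two calls show that different segment:offset pairs can name the same physical address (20000H), and the third shows the sum wrapping around to 00000H once it exceeds 20 bits.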

For example, the program we write occupies a region of memory after it is loaded.

When the CPU executes these instructions, it treats the CS register as the segment register and the IP register as the offset register, and then computes CS x 16 + IP to obtain the physical address of the instruction.

From the above description, it seems that the 8086 CPU resorts to this address synthesis only because a register cannot directly output a 20-bit physical address.

In fact, the more fundamental reason is that the 8086 CPU simply wants to address memory through base address + offset (the base address here being the segment address shifted left by 4 bits).

That is, even if the CPU were able to output a 20-bit address directly, it might still use base address + offset for memory addressing.

Think about it: when we compile a library on a Linux system, we usually add the -fPIC option, which indicates that the compiled dynamic library is position-independent, so it can be placed at whatever address is chosen when it is loaded into memory.

The base address + offset addressing mode provides the bottom layer support for relocation.

How do we control the CPU?

The CPU is actually a very pure and rigid thing. The only thing it does is fetch an instruction from the memory unit specified by the two registers CS:IP, and then execute it:

Of course, an instruction set needs to be defined in advance. Everything stored in the instruction area of memory must be a legal instruction, otherwise the CPU will not recognize it.

Each instruction uses a specific number (the instruction code) to tell the CPU to perform a specific operation.

The CPU recognizes these instruction codes. When it sees one, it knows how many bytes of operands follow the instruction code and what kind of operation needs to be performed.

For example, the instruction code F4H means halt the processor: when the CPU executes this instruction, it stops working.

(Strictly speaking, saying "the CPU" here is a bit inaccurate, because the CPU is a whole made up of many components. It would be more accurate to say "the execution unit in the CPU".)

Another point is worth making in advance: everything in memory is data. Which part of the data is executed as instructions, and which part is treated as "variables" operated on by those instructions, is entirely up to the designer of the operating system.

At the 8086 processor level, any memory area pointed to by CS:IP is executed as an instruction.

As can be seen from the above description, registers are the only components in the CPU that programmers can read and write with instructions. We can control the CPU by changing the contents of the registers.

To put it more bluntly: by changing the contents of the CS and IP registers, we can make the CPU execute the instructions we want.

As a qualified embedded developer, you have presumably configured registers in a microcontroller to select functions or multiplex ports. These operations can in fact be regarded as our control of the CPU.

If you compare the CPU to a puppet, then the register is the rope that controls the puppet.

Let's draw an analogy between the CPU and PLC programming in the field of industrial control.

When we get a brand-new PLC device, it contains only a runtime, and the job this runtime performs is:

  1. Scan all input ports and latch them into the input image area;

  2. Execute an operation and control logic to obtain a series of output signals, which are latched into the output image area;

  3. Refresh the signal in the output image area to the output port;

In a brand-new PLC, the calculation and control logic required in the second step may not exist.

So with only a runtime, the PLC cannot accomplish any meaningful work.

For the PLC to achieve a specific control goal, we also need to use the host-computer programming software provided by the PLC manufacturer to develop an operation and control logic program. The programming language is generally ladder diagram.

Once this program is downloaded into the PLC, it can direct the runtime to do some meaningful work.

We can simply think of it this way: the ladder diagram is used to control the PLC's runtime.
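The three-step job of the runtime can be sketched as one pass of a scan cycle. Everything here is hypothetical (the function names, the start/stop buttons, the motor output), and a real runtime would repeat the pass forever instead of running it once:

```python
def scan_cycle(read_inputs, control_logic, write_outputs):
    """One pass of the PLC runtime."""
    input_image = dict(read_inputs())            # 1. latch the input ports
    output_image = control_logic(input_image)    # 2. run the user logic
    write_outputs(output_image)                  # 3. refresh the output ports
    return output_image


physical_outputs = {}   # stands in for the real output ports


def read_inputs():
    return {"start_button": True, "stop_button": False}


def empty_logic(inputs):
    # A brand-new PLC has no user program, so nothing is driven.
    return {}


def motor_logic(inputs):
    # Ladder-style rung: the motor runs when start is pressed and stop is not.
    return {"motor": inputs["start_button"] and not inputs["stop_button"]}


scan_cycle(read_inputs, empty_logic, physical_outputs.update)
outputs_without_program = dict(physical_outputs)

scan_cycle(read_inputs, motor_logic, physical_outputs.update)
print(outputs_without_program, physical_outputs)
```

The runtime is the same in both calls. Only the downloaded logic decides whether any output changes.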

As for the CPU, if you want it to execute the instruction in a certain memory unit, you only need to modify the registers CS and IP.

In other words: as long as you know a program's memory layout well enough, you hold the CPU in the palm of your hand and can make it execute any code.

How the CPU executes an instruction

Now that we understand address translation and memory addressing, only two pieces are missing from the minimal set of parts the CPU needs to execute an instruction: the instruction buffer and the control circuit.

Simply put: the instruction buffer caches instructions read from memory, and the control circuit coordinates the use of resources such as the buses by the various components.

The picture below contains 4 instructions in total:

Taking the first instruction as an example, it goes through 5 steps:

  1. Send the content of CS:IP to the address adder, and calculate the 20-bit physical address 20000H;

  2. The control circuit sends the 20-bit address to the address bus;

  3. The instruction B8 23 01 at unit 20000H in the memory is sent to the instruction buffer via the data bus;

  4. The value of the instruction offset register IP is increased by 3 so that it points to the offset of the next instruction to be executed (the instruction code B8 indicates that the current instruction is 3 bytes long);

  5. Execute the instructions in the instruction buffer: send the value 0123H into the register AX;

These are the most basic steps in executing an instruction. Of course, the instruction execution flow of modern processors is far more complicated than this.
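The five steps can be replayed with a toy fetch-and-execute loop. This is a sketch under several assumptions of mine: CS = 2000H and IP = 0000H (one pair that yields 20000H), only the two instruction codes mentioned in this article (B8 and F4) are understood, and an F4 is placed after the first instruction simply so that the loop stops:

```python
def physical_address(segment, offset):
    """Address adder: segment x 16 + offset, truncated to 20 bits."""
    return (((segment & 0xFFFF) << 4) + (offset & 0xFFFF)) & 0xFFFFF


memory = bytearray(1 << 20)                            # 1 MB of main memory
memory[0x20000:0x20003] = bytes([0xB8, 0x23, 0x01])    # B8 23 01: send 0123H to AX
memory[0x20003] = 0xF4                                 # F4: halt (added for the demo)

regs = {"CS": 0x2000, "IP": 0x0000, "AX": 0x0000}
INSTRUCTION_LENGTH = {0xB8: 3, 0xF4: 1}   # only the opcodes used in this sketch

halted = False
while not halted:
    # Steps 1 and 2: compute the physical address and put it on the address bus.
    address = physical_address(regs["CS"], regs["IP"])
    # Step 3: the instruction bytes travel to the instruction buffer.
    opcode = memory[address]
    length = INSTRUCTION_LENGTH[opcode]
    instruction_buffer = bytes(memory[address:address + length])
    # Step 4: IP moves on to the next instruction.
    regs["IP"] = (regs["IP"] + length) & 0xFFFF
    # Step 5: execute what is in the instruction buffer.
    if opcode == 0xB8:
        # The 16-bit operand is stored low byte first: 23 01 means 0123H.
        regs["AX"] = instruction_buffer[1] | (instruction_buffer[2] << 8)
    elif opcode == 0xF4:
        halted = True

print(f"AX={regs['AX']:04X}H IP={regs['IP']:04X}H")
```

Note that IP is advanced before the instruction is executed, which matches the order of steps 4 and 5 above.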


------ End ------

High-rise buildings rise from the ground!

This article only describes the minimum knowledge needed for the CPU to execute an instruction.

In the next article, we will continue to take a closer look at the memory segmentation mechanism.
