Key Concepts
Computer Parts
- 控制部件
- 执行部件
Machine Representation
Floating-Points
(IEEE 754 based)
- 32-bit single precision
-
1:S 8:Exponent 23:Significand
-
- 64-bit double precision
-
1:S 11:Exponent 52:Significand
-
Norms
- 32-bit single precision
- \[(-1)^{\text{S}} \times (\overline{1. \text{Significand}}) \times 2^{\text{Exponent} - 127}\]
- 64-bit double precision
- \[(-1)^{\text{S}} \times (\overline{1. \text{Significand}}) \times 2^{\text{Exponent} - 1023}\]
Denorms
Exponent is zero, Significand non-zero.
- 32-bit single precision
- \[(-1)^{\text{S}} \times (\overline{0. \text{Significand}}) \times 2^{-126}\]
- 64-bit double precision
- \[(-1)^{\text{S}} \times (\overline{0. \text{Significand}}) \times 2^{-1022}\]
Zero
Exponent is zero, Significand is zero.
- Positive:
0 000... 000...
- Negative:
1 000... 000...
Infinity
Exponent all
1
, Significand is zero.
- Positive:
0 111... 000...
- Negative:
1 111... 000...
NaN
Exponent all
1
, Significand is non-zero.
- Quiet:
? 111... 1??...
- Signaling:
? 111... 0??...
Instructions
微指令
微指令是微程序级命令,属于 硬件范畴。
控制部件 通过 控制线路 向 执行部件 发出各种控制指令(即 微指令)。
机器指令
机器指令处于 硬件与软件的交界面。
若干条 微指令 可以构成一个 微程序,而一个微程序就对应了一个 机器指令。
微程序固化在 CPU 内部的控制存储器中,机器运行时只读不写。微程序规定了电路级的工作方式。
-
操作码 寻址方式 寄存器编号 立即数(位移量)
伪指令 / 宏指令
伪指令是由若干条 机器指令 组成的指令序列,属于 软件范畴。
Compilation
- C source file (
hello.c
) to preprocessor, get preprocessed source (hello.i
)gcc -E
flag
- Preprocessed source (
hello.i
) to compiler, get assembly source (hello.s
)gcc -S
flag
- Assembly source (
hello.s
) to assembler, get relocatable object file (hello.o
) - Object file (
hello.o
) and precompiled libraries (e.g.printf.o
) to linker, get binary executable (hello
)
Binary File Conventions
ELF (Executable and Linkable File)
- ELF header section
- 包含文件结构说明信息
.text
: code section.rodata
: read-only data section.data
: initialized data section.bss
: uninitialized data section.symtab
: symbol table section.rel.txt
: code relocation info section.rel.dat
: data relocation info section.debug
: debug info sectiongcc -g
flag
.strtab
.line
- Section header table
- 由若干个表项组成,每个节都有一个表项与之对应,每个表项描述对应的一个节的信息
Executable
- ELF header section
- Program header section
.init
: initialization section.text
: code section.rodata
: read-only data section.data
: initialized data section.bss
: uninitialized data section.symtab
: symbol table section.rel.txt
: code relocation info section.rel.dat
: data relocation info section.debug
: debug info section.strtab
.line
- Section header table
In which, 0~5 are read-only, 6~7 are read-write, rest are not loaded during runtime.
Symbols
Classification
- Global Symbols
- Non-static function and non-static global variables.
- Local Symbols
- Static functions and global variables.
Note
Local variables are defined during runtime, stored temporarily in stack. They are NOT local symbols, and NOT linker’s concern.
- Static functions and global variables.
- External Symbols
- Global symbols defined externally, i.e. defined in other modules but referenced by this module.
Symbol Resolution
也称 符号绑定,是将 引用符号 与 定义符号 建立关联的过程。
全局符号的解析
强 / 弱特性
- 强符号定义
- 函数名和已初始化的全局变量名
- 弱符号定义
- 未初始化的全局变量名
Note
局部符号无强弱特性
链接器对多重定义符号的处理规则
- 强符号不能多次定义
- 若一个符号被定义为一次强符号和多次弱符号,则按强定义为准
- 若有多个弱符号定义,则任选其中一个
Procedure
Linker maintains three sets, namely
- E: target files that will be merged into the final executable
- U: unresoluted references of symbols
- D: resoluted symbols
Example
gcc -static -o myfunc func.o libx.a liby.a libz.a
Note
Order counts! First define, then reference. Otherwise, there will be symbols left in the U set, causing linker error.
Relocation / 重定位
\[\begin{align} \text{Relocation Value} &= \text{Target Addr.} - \text{PC} \,, \\ \\ \text{where} \> \text{PC} &= \text{Addr. of the next instruction} \\ &= \text{addr. of section .text} + \text{addr. of current instruction} - \text{init} \,. \\ \end{align}\]Example
Assuming <main>
is at 0x08048380
, <swap>
at 0x08048394
.
00000000 <main>:
...
6: e8 fc ff ff ff call <main+0x7>
...
After linking,
08048380 <main>:
...
8048386: e8 09 00 00 00 call 8048394
...
Methodology
Integer Division
Example
\[-14 / 4 = -3\]Solution
\[\begin{align} &\because 4_{10} = 100_{2} \,, \\ &\therefore K := 2 \,, \\ &\therefore (-14 + (2^K - 1)) \gg 4 = -3 \,. \end{align}\]I.e.
1111_0010 + 0000_0011 = 1111_0101
1111_0101 >> 2 = 1111_1101 = -3
Integer Multiplication
- Break down to additions
Floating-Point Addition
- 求阶差
- 对阶
- 小阶向大阶看齐
- 为保证精度,保留部分尾数为 附加位
- 尾数加减
- [IEEE 754] 定点源码小数加减,带附加位
- 规格化
- \[\pm 0.\text{bbb}\dots \> \text{and} \> \pm 1\text{b}.\text{bbb}\dots \Rightarrow \pm 1.\text{bbb}\dots\]
- 尾数舍入
- 若运算结果尾数为 0,则需要将阶码也置 0
Floating-Point Multiplication / Division
- 尾数相乘 / 除,阶码相加减
Special Notes
- In the C programming language…
- when using the
<<
and>>
operators, logical shift is applied tounsigned
-s, arithmetic shift is applied tosigned
-s. - when casting between types of different bit length, bit extension / cut-off will take place.
- sign-bit extension is applied to
unsigned
-s, whilst 0 extension is applied tosigned
-s.
- sign-bit extension is applied to
- when using the