原名叫: Computer Systems, a programmer’s perspective
应该在译名前加一个 “从程序员的角度”
http://book.douban.com/subject/1230413/
前言节选 3
preface 7
about the authors 21
1 a tour of computer systems 35
1.1 information is bits + context 37
1.2 programs are translated by other programs into different forms 38
1.3 it pays to understand how compilation systems work 40
1.4 processors read and interpret instructions stored in memory 41
1.4.1 hardware organization of a system 41
1.4.2 running the hello program 44
1.5 caches matter 46
1.6 storage devices form a hierarchy 47
1.7 the operating system manages the hardware 48
1.7.1 processes 50
1.7.2 threads 51
1.7.3 virtual memory 51
1.7.4 files 53
1.8 systems communicate with other systems using networks 54
1.9 important themes 55
1.9.1 concurrency and parallelism 55
.1.9.2 the importance of abstractions in computer systems 58
1.10 summary 59
bibliographic notes 60
part i program structure and execution
2 representing and manipulating information 63
2.1 information storage 67
2.1.1 hexadecimal notation 68
2.1.2 words 72
2.1.3 data sizes 72
computer systems contents
a programmer’s perspective, 2e
24 contents
2.1.4 addressing and byte ordering 73
2.1.5 representing strings 80
2.1.6 representing code 81
2.1.7 introduction to boolean algebra 82
2.1.8 bit-level operations in c 85
2.1.9 logical operations in c 88
2.1.10 shift operations in c 88
2.2 integer representations 90
2.2.1 integral data types 91
2.2.2 unsigned encodings 92
2.2.3 two’s-complement encodings 94
2.2.4 conversions between signed and unsigned 99
2.2.5 signed vs. unsigned in c 103
2.2.6 expanding the bit representation of a number 105
2.2.7 truncating numbers 109
2.2.8 advice on signed vs. unsigned 110
2.3 integer arithmetic 113
2.3.1 unsigned addition 113
2.3.2 two’s-complement addition 117
2.3.3 two’s-complement negation 121
2.3.4 unsigned multiplication 122
2.3.5 two’s-complement multiplication 123
2.3.6 multiplying by constants 126
2.3.7 dividing by powers of two 129
2.3.8 final thoughts on integer arithmetic 132
2.4 floating point 133
2.4.1 fractional binary numbers 134
2.4.2 ieee floating-point representation 137
2.4.3 example numbers 139
2.4.4 rounding 144
2.4.5 floating-point operations 147
2.4.6 floating point in c 148
2.5 summary 152
bibliographic notes 153
homework problems 153
solutions to practice problems 168
3 machine-level representation of programs 187
3.1 a historical perspective 190
3.2 program encodings 193
contents 25
3.2.1 machine-level code 194
3.2.2 code examples 196
3.2.3 notes on formatting 199
3.3 data formats 201
3.4 accessing information 202
3.4.1 operand specifiers 203
3.4.2 data movement instructions 205
3.4.3 data movement example 208
3.5 arithmetic and logical operations 211
3.5.1 load effective address 211
3.5.2 unary and binary operations 212
3.5.3 shift operations 213
3.5.4 discussion 214
3.5.5 special arithmetic operations 216
3.6 control 219
3.6.1 condition codes 219
3.6.2 accessing the condition codes 221
3.6.3 jump instructions and their encodings 223
3.6.4 translating conditional branches 227
3.6.5 loops 231
3.6.6 conditional move instructions 240
3.6.7 switch statements 247
3.7 procedures 253
3.7.1 stack frame structure 253
3.7.2 transferring control 255
3.7.3 register usage conventions 257
3.7.4 procedure example 258
3.7.5 recursive procedures 263
3.8 array allocation and access 266
3.8.1 basic principles 266
3.8.2 pointer arithmetic 267
3.8.3 nested arrays 269
3.8.4 fixed-size arrays 271
3.8.5 variable-size arrays 272
3.9 heterogeneous data structures 275
3.9.1 structures 275
3.9.2 unions 278
3.9.3 data alignment 282
3.10 putting it together: understanding pointers 286
3.11 life in the realworld: using the gdb debugger 288
3.12 out-of-bounds memory references and buffer overflow 290
3.12.1 thwarting buffer overflow attacks 295
26 contents
3.13 x86-64: extending ia32 to 64 bits 301
3.13.1 history and motivation for x86-64 302
3.13.2 an overview of x86-64 304
3.13.3 accessing information 307
3.13.4 control 313
3.13.5 data structures 324
3.13.6 concluding observations about x86-64 325
3.14 machine-level representations of floating-point programs 326
3.15 summary 327
bibliographic notes 328
homework problems 328
solutions to practice problems 342
4 processor architecture 367
4.1 the y86 instruction set architecture 370
4.1.1 programmer-visible state 370
4.1.2 y86 instructions 371
4.1.3 instruction encoding 373
4.1.4 y86 exceptions 378
4.1.5 y86 programs 379
4.1.6 some y86 instruction details 384
4.2 logic design and the hardware control language hcl 386
4.2.1 logic gates 387
4.2.2 combinational circuits and hcl boolean expressions 388
4.2.3 word-level combinational circuits and hcl integer
expressions 389
4.2.4 set membership 394
4.2.5 memory and clocking 395
4.3 sequential y86 implementations 398
4.3.1 organizing processing into stages 398
4.3.2 seq hardware structure 409
4.3.3 seq timing 413
4.3.4 seq stage implementations 417
4.4 general principles of pipelining 425
4.4.1 computational pipelines 426
4.4.2 a detailed look at pipeline operation 427
4.4.3 limitations of pipelining 428
4.4.4 pipelining a system with feedback 432
4.5 pipelined y86 implementations 434
4.5.1 seq+: rearranging the computation stages 434
contents 27
4.5.2 inserting pipeline registers 435
4.5.3 rearranging and relabeling signals 439
4.5.4 next pc prediction 440
4.5.5 pipeline hazards 442
4.5.6 avoiding data hazards by stalling 447
4.5.7 avoiding data hazards by forwarding 449
4.5.8 load/use data hazards 452
4.5.9 exception handling 454
4.5.10 pipe stage implementations 457
4.5.11 pipeline control logic 465
4.5.12 performance analysis 478
4.5.13 unfinished business 480
4.6 summary 483
4.6.1 y86 simulators 484
bibliographic notes 485
homework problems 485
solutions to practice problems 491
5 optimizing program performance 507
5.1 capabilities and limitations of optimizing compilers 510
5.2 expressing program performance 514
5.3 program example 516
5.4 eliminating loop inefficiencies 520
5.5 reducing procedure calls 524
5.6 eliminating unneeded memory references 525
5.7 understanding modern processors 530
5.7.1 overall operation 531
5.7.2 functional unit performance 534
5.7.3 an abstract model of processor operation 536
5.8 loop unrolling 543
5.9 enhancing parallelism 547
5.9.1 multiple accumulators 548
5.9.2 reassociation transformation 552
5.10 summary of results for optimizing combining code 558
5.11 some limiting factors 559
5.11.1 register spilling 559
5.11.2 branch prediction and misprediction penalties 560
5.12 understanding memory performance 565
5.12.1 load performance 565
5.12.2 store performance 566
28 contents
5.13 life in the real world: performance improvement techniques 573
5.14 identifying and eliminating performance bottlenecks 574
5.14.1 program profiling 574
5.14.2 using a profiler to guide optimization 576
5.14.3 amdahl’s law 579
5.15 summary 581
bibliographic notes 582
homework problems 583
solutions to practice problems 586
6 the memory hierarchy 593
6.1 storage technologies 595
6.1.1 random-access memory 595
6.1.2 disk storage 604
6.1.3 solid state disks 615
6.1.4 storage technology trends 617
6.2 locality 620
6.2.1 locality of references to program data 621
6.2.2 locality of instruction fetches 622
6.2.3 summary of locality 623
6.3 the memory hierarchy 625
6.3.1 caching in the memory hierarchy 626
6.3.2 summary of memory hierarchy concepts 629
6.4 cache memories 630
6.4.1 generic cache memory organization 631
6.4.2 direct-mapped caches 633
6.4.3 set associative caches 640
6.4.4 fully associative caches 642
6.4.5 issues with writes 645
6.4.6 anatomy of a real cache hierarchy 646
6.4.7 performance impact of cache parameters 648
6.5 writing cache-friendly code 649
6.6 putting it together: the impact of caches on program performance 654
6.6.1 the memory mountain 655
6.6.2 rearranging loops to increase spatial locality 659
6.6.3 exploiting locality in your programs 663
6.7 summary 663
bibliographic notes 664
homework problems 665
solutions to practice problems 676
contents 29
part ii running programs on a system
7 linking 687
7.1 compiler drivers 689
7.2 static linking 691
7.3 object files 691
7.4 relocatable object files 692
7.5 symbols and symbol tables 694
7.6 symbol resolution 697
7.6.1 how linkers resolve multiply defined global symbols 698
7.6.2 linking with static libraries 701
7.6.3 how linkers use static libraries to resolve references 704
7.7 relocation 706
7.7.1 relocation entries 706
7.7.2 relocating symbol references 707
7.8 executable object files 712
7.9 loading executable object files 713
7.10 dynamic linking with shared libraries 715
7.11 loading and linking shared libraries from applications 717
7.12 position-independent code (pic) 721
7.13 tools for manipulating object files 724
7.14 summary 725
bibliographic notes 725
homework problems 726
solutions to practice problems 732
8 exceptional control flow 735
8.1 exceptions 737
8.1.1 exception handling 738
8.1.2 classes of exceptions 740
8.1.3 exceptions in linux/ia32 systems 742
8.2 processes 746
8.2.1 logical control flow 746
8.2.2 concurrent flows 747
8.2.3 private address space 748
8.2.4 user and kernel modes 748
8.2.5 context switches 750
30 contents
8.3 system call error handling 751
8.4 process control 752
8.4.1 obtaining process ids 753
8.4.2 creating and terminating processes 753
8.4.3 reaping child processes 757
8.4.4 putting processes to sleep 763
8.4.5 loading and running programs 764
8.4.6 using fork and execve to run programs 767
8.5 signals 770
8.5.1 signal terminology 772
8.5.2 sending signals 773
8.5.3 receiving signals 776
8.5.4 signal handling issues 779
8.5.5 portable signal handling 786
8.5.6 explicitly blocking and unblocking signals 787
8.5.7 synchronizing flows to avoid nasty concurrency bugs 789
8.6 nonlocal jumps 793
8.7 tools for manipulating processes 796
8.8 summary 797
bibliographic notes 797
homework problems 798
solutions to practice problems 805
9 virtual memory 809
9.1 physical and virtual addressing 811
9.2 address spaces 812
9.3 vm as a tool for caching 813
9.3.1 dram cache organization 814
9.3.2 page tables 814
9.3.3 page hits 816
9.3.4 page faults 816
9.3.5 allocating pages 817
9.3.6 locality to the rescue again 818
9.4 vm as a tool for memory management 819
9.5 vm as a tool for memory protection 820
9.6 address translation 821
9.6.1 integrating caches and vm 825
9.6.2 speeding up address translation with a tlb 825
9.6.3 multi-level page tables 826
9.6.4 putting it together: end-to-end address translation 828
9.7 case study: the intel core i7/linux memory system 833
contents 31
9.7.1 core i7 address translation 834
9.7.2 linux virtual memory system 837
9.8 memory mapping 841
9.8.1 shared objects revisited 841
9.8.2 the fork function revisited 843
9.8.3 the execve function revisited 844
9.8.4 user-level memory mapping with the mmap function 844
9.9 dynamic memory allocation 846
9.9.1 the malloc and free functions 848
9.9.2 why dynamic memory allocation? 850
9.9.3 allocator requirements and goals 851
9.9.4 fragmentation 853
9.9.5 implementation issues 854
9.9.6 implicit free lists 854
9.9.7 placing allocated blocks 856
9.9.8 splitting free blocks 857
9.9.9 getting additional heap memory 857
9.9.10 coalescing free blocks 858
9.9.11 coalescing with boundary tags 858
9.9.12 putting it together: implementing a simple allocator 861
9.9.13 explicit free lists 869
9.9.14 segregated free lists 870
9.10 garbage collection 872
9.10.1 garbage collector basics 873
9.10.2 mark&sweep garbage collectors 874
9.10.3 conservative mark&sweep for c programs 876
9.11 common memory-related bugs in c programs 877
9.11.1 dereferencing bad pointers 877
9.11.2 reading uninitialized memory 877
9.11.3 allowing stack buffer overflows 878
9.11.4 assuming that pointers and the objects they point to are the same size 878
9.11.5 making off-by-one errors 879
9.11.6 referencing a pointer instead of the object it points to 879
9.11.7 misunderstanding pointer arithmetic 880
9.11.8 referencing nonexistent variables 880
9.11.9 referencing data in free heap blocks 881
9.11.10 introducing memory leaks 881
9.12 summary 882
bibliographic notes 882
homework problems 883
solutions to practice problems 887
32 contents
part iii interaction and communication between
programs
10 system-level i/o 895
10.1 unix i/o 896
10.2 opening and closing files 897
10.3 reading and writing files 899
10.4 robust reading and writing with the rio package 901
10.4.1 rio unbuffered input and output functions 901
10.4.2 rio buffered input functions 902
10.5 reading file metadata 907
10.6 sharing files 909
10.7 i/o redirection 911
10.8 standard i/o 913
10.9 putting it together: which i/o functions should i use? 914
10.10 summary 915
bibliographic notes 916
homework problems 916
solutions to practice problems 917
11 network programming 919
11.1 the client-server programming model 920
11.2 networks 921
11.3 the global ip internet 925
11.3.1 ip addresses 927
11.3.2 internet domain names 929
11.3.3 internet connections 933
11.4 the sockets interface 934
11.4.1 socket address structures 935
11.4.2 the socket function 936
11.4.3 the connect function 937
11.4.4 the open_clientfd function 937
11.4.5 the bind function 938
11.4.6 the listen function 939
11.4.7 the open_listenfd function 939
11.4.8 the accept function 941
11.4.9 example echo client and server 942
contents 33
11.5 web servers 945
11.5.1 web basics 945
11.5.2 web content 946
11.5.3 http transactions 948
11.5.4 serving dynamic content 950
11.6 putting it together: the tiny web server 953
11.7 summary 961
bibliographic notes 962
homework problems 962
solutions to practice problems 963
12 concurrent programming 967
12.1 concurrent programming with processes 969
12.1.1 a concurrent server based on processes 970
12.1.2 pros and cons of processes 971
12.2 concurrent programming with i/o multiplexing 973
12.2.1 a concurrent event-driven server based on i/omultiplexing 976
12.2.2 pros and cons of i/o multiplexing 980
12.3 concurrent programming with threads 981
12.3.1 thread execution model 982
12.3.2 posix threads 982
12.3.3 creating threads 984
12.3.4 terminating threads 984
12.3.5 reaping terminated threads 985
12.3.6 detaching threads 985
12.3.7 initializing threads 986
12.3.8 a concurrent server based on threads 986
12.4 shared variables in threaded programs 988
12.4.1 threads memory model 989
12.4.2 mapping variables to memory 990
12.4.3 shared variables 990
12.5 synchronizing threads with semaphores 991
12.5.1 progress graphs 994
12.5.2 semaphores 997
12.5.3 using semaphores for mutual exclusion 998
12.5.4 using semaphores to schedule shared resources 1000
12.5.5 putting it together: a concurrent server based on prethreading 1004
12.6 using threads for parallelism 1008
34 contents
12.7 other concurrency issues 1013
12.7.1 thread safety 1013
12.7.2 reentrancy 1014
12.7.3 using existing library functions in threaded programs 1016
12.7.4 races 1017
12.7.5 deadlocks 1019
12.8 summary 1022
bibliographic notes 1023
homework problems 1023
solutions to practice problems 1028
a error handling 1033
a.1 error handling in unix systems 1034
a.2 error-handling wrappers 1035
references 1039
index 1045