Musings on kernel 2.6.6 init/main.c, lines 208-214

gradetwo · posted 2005-7-19 23:25:09

Version 2.6.6, init/main.c lines 208-214

while (lps_precision-- && (loopbit >>= 1)) {
  loops_per_jiffy |= loopbit;
  ticks = jiffies;
  while (ticks == jiffies);
  ticks = jiffies;
  __delay(loops_per_jiffy);

  if (jiffies != ticks)                      /* longer than 1 tick */
    loops_per_jiffy &= ~loopbit;
}

When I saw loops_per_jiffy |= loopbit; and loops_per_jiffy &= ~loopbit;,
I became curious how much more efficient they are than loops_per_jiffy += loopbit; and loops_per_jiffy -= loopbit;,
so I wrote similar code to test it.
Here are my examples:

Code 1

crow@gradetwo:~$ cat test.c
int main(void)
{
    unsigned long a = 10000;
    unsigned long b = 1542;
    unsigned long i, j;

    for (i = 0; i < 65534; i++) {
        for (j = 0; j < 65534; j++) {
            a += b;
            a -= b;
        }
    }
    return 0;
}

Code 2
crow@gradetwo:~$ cat hhh.c
int main(void)
{
    unsigned long a = 10000;
    unsigned long b = 1542;
    unsigned long i, j;

    for (i = 0; i < 65534; i++) {
        for (j = 0; j < 65534; j++) {
            a |= b;
            a &= ~b;
        }
    }
    return 0;
}


Generated assembly (diff of test.s and hhh.s):
crow@gradetwo:~$ diff test.s hhh.s
1c1
<    .file "test.c"
---
>    .file "hhh.c"
28,29c28,31
<    addl %edx, (%eax)
<    movl -8(%ebp), %edx
---
>    orl    %edx, (%eax)
>    movl -8(%ebp), %eax
>    movl %eax, %edx
>    notl %edx
31c33
<    subl %edx, (%eax)
---
>    andl %edx, (%eax)

Run times:
crow@gradetwo:~$ time ./test

real 0m29.030s
user 0m28.910s
sys    0m0.000s
crow@gradetwo:~$ time ./hhh

real 0m23.869s
user 0m23.820s
sys    0m0.000s
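Going by the theoretical clock counts of the instructions (listed below), test, whose inner loop has fewer instructions, should be faster than hhh.
The measured result is exactly the opposite: hhh finished in 23.9 s versus 29.0 s for test, about 18% less time, which demonstrates the efficiency of the kernel's version.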
For reference, here are the entries for each instruction involved:
ADD - Arithmetic Addition

      Usage:  ADD    dest,src
      Modifies flags: AF CF OF PF SF ZF

      Adds "src" to "dest", replacing the original contents of "dest".
      Both operands are binary.

                      Clocks                 Size
      Operands        808x   286  386  486   Bytes

      reg,reg         3      2    2    1     2
      mem,reg         16+EA  7    7    3     2-4  (W88=24+EA)
      reg,mem         9+EA   7    6    2     2-4  (W88=13+EA)
      reg,immed       4      3    2    1     3-4
      mem,immed       17+EA  7    7    3     3-6  (W88=23+EA)
      accum,immed     4      3    2    1     2-3

AND - Logical And

      Usage:  AND    dest,src
      Modifies flags: CF OF PF SF ZF (AF undefined)

      Performs a logical AND of the two operands replacing the destination
      with the result.

                      Clocks                 Size
      Operands        808x   286  386  486   Bytes

      reg,reg         3      2    2    1     2
      mem,reg         16+EA  7    7    3     2-4  (W88=24+EA)
      reg,mem         9+EA   7    6    1     2-4  (W88=13+EA)
      reg,immed       4      3    2    1     3-4
      mem,immed       17+EA  7    7    3     3-6  (W88=23+EA)
      accum,immed     4      3    2    1     2-3

MOV - Move Byte or Word

      Usage:  MOV    dest,src
      Modifies flags: None

      Copies byte or word from the source operand to the destination
      operand.  If the destination is SS, interrupts are disabled except
      on early buggy 808x CPUs.  Some CPUs disable interrupts if the
      destination is any of the segment registers.

                            Clocks                 Size
      Operands              808x   286  386  486   Bytes

      reg,reg               2      2    2    1     2
      mem,reg               9+EA   3    2    1     2-4  (W88=13+EA)
      reg,mem               8+EA   5    4    1     2-4  (W88=12+EA)
      mem,immed             10+EA  3    2    1     3-6  (W88=14+EA)
      reg,immed             4      2    2    1     2-3
      mem,accum             10     3    2    1     3    (W88=14)
      accum,mem             10     5    4    1     3    (W88=14)
      segreg,reg16          2      2    2    3     2
      segreg,mem16          8+EA   5    5    9     2-4  (W88=12+EA)
      reg16,segreg          2      2    2    3     2
      mem16,segreg          9+EA   3    2    3     2-4  (W88=13+EA)
      reg32,CR0/CR2/CR3     -      -    6    4
      CR0,reg32             -      -    10   16
      CR2,reg32             -      -    4    4     3
      CR3,reg32             -      -    5    4     3
      reg32,DR0/DR1/DR2/DR3 -      -    22   10    3
      reg32,DR6/DR7         -      -    22   10    3
      DR0/DR1/DR2/DR3,reg32 -      -    22   11    3
      DR6/DR7,reg32         -      -    16   11    3
      reg32,TR6/TR7         -      -    12   4     3
      TR6/TR7,reg32         -      -    12   4     3
      reg32,TR3                         3
      TR3,reg32                         6

      - when the 386 special registers are used, all operands are 32 bits

NOT - One's Complement Negation (Logical NOT)

      Usage:  NOT    dest
      Modifies flags: None

      Inverts the bits of the "dest" operand forming the 1s complement.

                      Clocks                 Size
      Operands        808x   286  386  486   Bytes

      reg             3      2    2    1     2
      mem             16+EA  7    6    3     2-4  (W88=24+EA)

OR - Inclusive Logical OR

      Usage:  OR    dest,src
      Modifies flags: CF OF PF SF ZF (AF undefined)

      Logical inclusive OR of the two operands returning the result in
      the destination.  Any bit set in either operand will be set in the
      destination.

                      Clocks                 Size
      Operands        808x   286  386  486   Bytes

      reg,reg         3      2    2    1     2
      mem,reg         16+EA  7    7    3     2-4  (W88=24+EA)
      reg,mem         9+EA   7    6    2     2-4  (W88=13+EA)
      reg,immed       4      3    2    1     3-4
      mem8,immed8     17+EA  7    7    3     3-6
      mem16,immed16   25+EA  7    7    3     3-6
      accum,immed     4      3    2    1     2-3

SUB - Subtract

      Usage:  SUB    dest,src
      Modifies flags: AF CF OF PF SF ZF

      The source is subtracted from the destination and the result is
      stored in the destination.

                      Clocks                 Size
      Operands        808x   286  386  486   Bytes

      reg,reg         3      2    2    1     2
      mem,reg         16+EA  7    6    3     2-4  (W88=24+EA)
      reg,mem         9+EA   7    7    2     2-4  (W88=13+EA)
      reg,immed       4      3    2    1     3-4
      mem,immed       17+EA  7    7    3     3-6  (W88=25+EA)
      accum,immed     4      3    2    1     2-3

daemeon · posted 2005-7-20 03:20:18

Is this the code that computes BogoMIPS?
Impressive work, OP!

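homesp · posted 2005-8-22 22:06:36

gradetwo:
Do
loops_per_jiffy |= loopbit;
and
loops_per_jiffy += loopbit;
really do the same thing?
If both variables have a 1 in the same bit position, the first still gives 1 there, but the second turns it into 0 and carries into the next bit, doesn't it?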

homesp · posted 2005-8-23 12:42:26

Or is it impossible for these two variables to have a 1 in the same bit position?

kakuyou · posted 2005-9-1 14:50:28

I agree with homesp.
loops_per_jiffy |= loopbit
is not the same as
loops_per_jiffy += loopbit

loops_per_jiffy &= ~loopbit
is not the same as
loops_per_jiffy -= loopbit

gradetwo · posted 2005-9-2 22:05:03

Note that in a real system loops_per_jiffy can never overflow or produce a carry at these steps, so the substitutions above are valid.

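homesp · posted 2005-9-3 19:04:29

So loops_per_jiffy and loopbit never have a 1 in the same bit position? Then what do these two variables represent, why does this odd property hold, and how are their values formed?
Haha, that's a lot of questions. Could the OP explain these two variables in detail? Thanks!!!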

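kakuyou · posted 2005-9-6 12:24:17

I've read the code and it's true: there is no carry, it just walks through the bits one at a time. The question is whether the code is really meant as addition and subtraction at all, since these are all bit operations.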

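However,
Intel's manuals don't clearly state whether add and or take the same number of clocks, but the OP's program has shown that they don't.
At the circuit level, an adder is built by combining OR and AND gates, so the two operations are not in the same class of complexity.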

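Finally, anyone who writes programs knows that, for efficiency, we often replace an operation the algorithm calls for with a faster instruction that gives the same result.
The classic example is using a shift instead of multiplying by 2.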
