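I have two questions about the implementation of strlen in glibc.
The implementation uses a magic number with "holes". I don't understand how this works. Can someone help me understand this code: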
size_t
strlen (const char *str)
{
  const char *char_ptr;
  const unsigned long int *longword_ptr;
  unsigned long int longword, himagic, lomagic;

  /* Handle the first few characters by reading one character at a time.
     Do this until CHAR_PTR is aligned on a longword boundary.  */
  for (char_ptr = str; ((unsigned long int) char_ptr
                        & (sizeof (longword) - 1)) != 0;
       ++char_ptr)
    if (*char_ptr == '\0')
      return char_ptr - str;

  /* All these elucidatory comments refer to 4-byte longwords,
     but the theory applies equally well to 8-byte longwords.  */
  longword_ptr = (unsigned long int *) char_ptr;

  /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
     the "holes."  Note that there is a hole just to the left of
     each byte, with an extra at the end:

     bits:  01111110 11111110 11111110 11111111
     bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

     The 1-bits make sure that carries propagate to the next 0-bit.
     The 0-bits provide holes for carries to fall into.  */
  himagic = 0x80808080L;
  lomagic = 0x01010101L;
  if (sizeof (longword) > 4)
    {
      /* 64-bit version of the magic.  */
      /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
      himagic = ((himagic << 16) << 16) | himagic;
      lomagic = ((lomagic << 16) << 16) | lomagic;
    }
  if (sizeof (longword) > 8)
    abort ();

  /* Instead of the traditional loop which tests each character,
     we will test a longword at a time.  The tricky part is testing
     if *any of the four* bytes in the longword in question are zero.  */
  for (;;)
    {
      longword = *longword_ptr++;
      if (((longword - lomagic) & ~longword & himagic) != 0)
        {
          /* Which of the bytes was the zero?  If none of them were, it was
             a misfire; continue the search.  */
          const char *cp = (const char *) (longword_ptr - 1);
          if (cp[0] == 0)
            return cp - str;
          if (cp[1] == 0)
            return cp - str + 1;
          if (cp[2] == 0)
            return cp - str + 2;
          if (cp[3] == 0)
            return cp - str + 3;
          if (sizeof (longword) > 4)
            {
              if (cp[4] == 0)
                return cp - str + 4;
              if (cp[5] == 0)
                return cp - str + 5;
              if (cp[6] == 0)
                return cp - str + 6;
              if (cp[7] == 0)
                return cp - str + 7;
            }
        }
    }
}
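What is this magic number used for?
Why not simply increment a pointer until the null character is reached and return the count? Is this approach faster? If so, why?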
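Best answer
The magic numbers let the code look at 4 bytes (32 bits) or even 8 bytes (64 bits) at once and check whether any of them is zero (the end of the string), instead of checking each byte individually.
Here is an example that checks a word for a null byte: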
unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
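To make the one-liner above easier to follow, here is a minimal sketch that wraps it in a function and spells out each step in comments. The name has_zero_byte is made up for this illustration, and the sketch assumes a 32-bit word (uint32_t).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if any of the four bytes of v is 0x00.
   For each byte b of v:
     (b & 0x7F) + 0x7F   sets bit 7 iff the low seven bits of b are nonzero
                         (the sum is at most 0xFE, so no carry leaves the byte);
     ... | v             also sets bit 7 when b itself has bit 7 set;
     ... | 0x7F7F7F7F    fills the low seven bits, so a nonzero byte becomes
                         0xFF and a zero byte becomes 0x7F;
     ~ ...               leaves 0x80 exactly in the bytes that were zero.  */
static bool has_zero_byte(uint32_t v)
{
    uint32_t flags = ~((((v & 0x7F7F7F7Fu) + 0x7F7F7F7Fu) | v) | 0x7F7F7F7Fu);
    return flags != 0;
}
```

For instance, has_zero_byte(0x41420043u) is true because the second-lowest byte is zero, while has_zero_byte(0x80808080u) is false even though the low seven bits of every byte are zero.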
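For more information, see Bit Twiddling Hacks.
The variant used here (32-bit example), quoted from that page:
There is yet a faster method: use hasless(v, 1), which is defined
below; it works in 4 operations and requires no subsequent
verification. It simplifies to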
#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
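The subexpression (v - 0x01010101UL) evaluates to a high bit set in any
byte whenever the corresponding byte in v is zero or greater than
0x80. The subexpression ~v & 0x80808080UL evaluates to high bits set
in bytes where the byte of v doesn't have its high bit set (so the
byte was less than 0x80). Finally, by ANDing these two subexpressions,
the result is the high bits set where the bytes in v were zero, since
the high bits set due to a value greater than 0x80 in the first
subexpression are masked off by the second.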
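A few worked values may help here. The sketch below evaluates the same expression as haszero for a 32-bit word and returns the raw flag bits; zero_byte_flags is a name invented for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper: same expression as the haszero macro, 32-bit word.
   The result is nonzero iff at least one byte of v is zero.

   Worked values:
     v = 0x41424344 -> 0x00000000  (no zero byte)
     v = 0x41004344 -> 0x00800000  (byte 2 is zero)
     v = 0x80818283 -> 0x00000000  (bytes >= 0x80 are masked off by ~v)
     v = 0x41420100 -> 0x00008080  (byte 0 is zero; byte 1, value 0x01, is
                                    also flagged because the subtraction
                                    borrows from it)  */
static uint32_t zero_byte_flags(uint32_t v)
{
    return (v - 0x01010101u) & ~v & 0x80808080u;
}
```

The last value shows a subtlety: as a yes/no test on the whole word the expression is exact, but a borrow out of a zero byte can also set the flag of a 0x01 byte directly above it. Only the lowest flagged byte is guaranteed to be zero, which is one reason to confirm the position byte by byte afterwards, as the glibc code in the question does.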
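Looking at one byte at a time costs at least as many CPU cycles as looking at a full integer value (one register width). This algorithm checks whether a full integer contains a zero byte. If it does not, only a few instructions have been spent and the loop can move on to the next full integer. If it does contain a zero byte, a further check determines exactly where that byte is.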
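To compare the two approaches side by side, here is a simplified sketch, not glibc's actual code. The names bytewise_strlen and wordwise_strlen are invented for this illustration. It assumes 4-byte words and, importantly, that the buffer's allocated size is a multiple of 4 bytes so that every 4-byte read stays in bounds; glibc instead aligns the pointer first, as shown in the question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte at a time: one load, one compare and one branch per character. */
static size_t bytewise_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* Word at a time (hypothetical sketch).
   ASSUMPTION: buf contains a NUL terminator and its allocated size is a
   multiple of 4 bytes, so reading whole 4-byte words never leaves the buffer. */
static size_t wordwise_strlen(const char *buf)
{
    const char *p = buf;
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);  /* alignment- and aliasing-safe 4-byte load */
        if (((w - 0x01010101u) & ~w & 0x80808080u) != 0) {
            /* Some byte of this word is zero: find the first one. */
            for (size_t i = 0; i < sizeof w; ++i)
                if (p[i] == '\0')
                    return (size_t)(p - buf) + i;
        }
        p += sizeof w;  /* no zero byte: skip four characters at once */
    }
}
```

With a padded buffer such as char buf[16] = "hello, world", both functions return 12. The word-wise loop needs only four iterations to get there, and only the last one falls into the byte-by-byte search.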