C-模数上按位运算的算法(非2的幂)

本文介绍了C-模数上按位运算的算法(非2的幂)的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我知道可以使用按位运算符来计算2的幂的模数

I know that modulo of power of 2 can be calculated using bitwise operator

  x % 2^n == x & (2^n - 1).

但是我想知道是否存在任何广义的按位算法来发现任何数字的模数都不是2的幂.例如，

But I am wondering is there any generalized bitwise algorithm exists to find the modulus of any number is not a power of 2. For example,

7%5

先谢谢您.

推荐答案

有两种特殊情况，包括5.

There are a couple, for special cases, including 5.

从16≡1(模5)开始，您可以做的一个技巧是将变量拆分为4位半字节，在表中查找每个半字节的模数，然后将这些值相加以获得原始模数.数字.

Since 16 ≡ 1 (mod 5), a trick you could do is split your variable into 4-bit nibbles, look up the modulus of each nibble in a table, and add the values together to get the modulus of the original number.

该程序使用位域，表查找和加法.它也适用于模数为3或15的模数，并且可以通过更大的查找表扩展到更大的块.

This program uses bitfields, table lookups, and addition. It would also work for modulo 3 or 15 and could be extended to larger chunks with a bigger lookup table.

#include <assert.h>
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct bitfield64_t {
  uint64_t b0 : 4;
  uint64_t b1 : 4;
  uint64_t b2 : 4;
  uint64_t b3 : 4;
  uint64_t b4 : 4;
  uint64_t b5 : 4;
  uint64_t b6 : 4;
  uint64_t b7 : 4;
  uint64_t b8 : 4;
  uint64_t b9 : 4;
  uint64_t b10 : 4;
  uint64_t b11 : 4;
  uint64_t b12 : 4;
  uint64_t b13 : 4;
  uint64_t b14 : 4;
  uint64_t b15 : 4;
} bitfield64_t;

typedef union pun64_t {
  uint64_t u;
  bitfield64_t b;
} pun64_t;

/* i%5 for i in [0,19].  The upper bound guarantees that nibble_mod5[a+b] is
 * valid whenever a<16 and b<5.
 */
const unsigned nibble_mod5[20] = {
  0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4
};

unsigned add_mod5( const unsigned a, const unsigned b )
/* Returns (a + b) % 5, where
 *   a < 16
 *   b < 5
 */
{
  assert(a < 16);
  assert(b < 5);
  return nibble_mod5[a + b];
}

int main( const int argc, const char* argv[] )
{
  int64_t n;

  if ( argc != 2 ) {
    fprintf( stderr,
             "Call this program with an unsigned number as its argument.\n" );
    return EXIT_FAILURE;
  }

  if ( 1 != sscanf( argv[1], "%lld", &n ) || n < 0 ) {
    fprintf( stderr,
             "The argument must be an unsigned number.\n" );
    return EXIT_FAILURE;
  }

  const pun64_t p = { .u = (uint64_t)n };
  const unsigned result =
    add_mod5( p.b.b15,
    add_mod5( p.b.b14,
    add_mod5( p.b.b13,
    add_mod5( p.b.b12,
    add_mod5( p.b.b11,
    add_mod5( p.b.b10,
    add_mod5( p.b.b9,
    add_mod5( p.b.b8,
    add_mod5( p.b.b7,
    add_mod5( p.b.b6,
    add_mod5( p.b.b5,
    add_mod5( p.b.b4,
    add_mod5( p.b.b3,
    add_mod5( p.b.b2,
    add_mod5( p.b.b1,
    nibble_mod5[p.b.b0] )))))))))))))));

   printf( "%u\n", result );
   assert( result == n % 5 );
   return EXIT_SUCCESS;
}

要求大数的模数，您可以利用16的幂等于1模5的事实.因此，您的单词大小 w 是否为2⁸，2ⁱ⁶，2³²或2⁶⁴，您可以将您的大数写为a⁰w⁰+a¹w¹+a2w²+ ...≅a₀1⁰+a¹1¹+a²1²+ ...≡a₀+a₁+ a2 + ...(mod 5).这也是为什么任何数字的十进制数字之和与原始数字模3或9:10≡1(mod 3)一致的原因.

For finding the modulus of a bignum, you can take advantage of the fact that any power of 16 is congruent to 1 modulo 5. Therefore, whether your word size w is 2⁸, 2ⁱ⁶, 2³² or 2⁶⁴, you can write your bignum as a₀w⁰ + a₁w¹ + a₂w² + ... ≅ a₀1⁰ + a₁1¹ + a₂1² + ... ≡ a₀ + a₁ + a₂ + ... (mod 5). This is also why the sum of the decimal digits of any number is congruent to the original number modulo 3 or 9: 10 ≡ 1 (mod 3).

这也适用于字节的3、5、15和17，16位字的因子为255和257，32位字的因子为65,535和65,537.如果您注意到该模式，那是因为b²ⁿ=(bⁿ+ 1)(bⁿ-1)+1，其中b = 2且n = 2、4、8或16.

This also works for 3, 5, 15 and 17 on bytes, factors of 255 and 257 on 16-bit words and factors of 65,535 and 65,537 on 32-bit words. If you notice the pattern, it's because b²ⁿ = (bⁿ+1)(bⁿ-1) + 1, where b = 2 and n = 2, 4, 8 or 16.

您可以将此方法的变体应用于任何n，以使您的块大小等于-1(mod n):交替加减.之所以起作用是因为a₀w⁰+a₁w¹+a2w²+ ...≡a₀(-1)⁰+a₁(-1)¹+a²(-1)²+ ...≡a₀-a₁+ a2-...(mod n )，但用处不大，因为n的许多这样的值都是梅森素数.这类似于您可以通过从右到左并加，减，加和减数字来获取任意小数的mod 11的方式，例如144≅4-4 +1≡1(mod 11).就像数字一样，您可以对五位块进行相同的操作，因为32(如10)也等于-1模11.

You can apply a variant of this method to any n such that your chunk size is congruent to -1 (mod n): alternate addition and subtraction. It works because a₀w⁰ + a₁w¹ + a₂w² + ... ≡ a₀(-1)⁰ + a₁(-1)¹ + a₂(-1)² + ... ≡ a₀ - a₁ + a₂ - ... (mod n), but is less useful because many such values of n are Mersenne primes. It’s similar to how you can take mod 11 of any decimal by going right to left and adding, subtracting, adding and subtracting the digits, e.g. 144 ≅ 4 - 4 + 1 ≡ 1 (mod 11). Just like with digits, you could do the same trick with five-bit chunks, since 32, like 10, is also congruent to -1 modulo 11.

当 w ≡ w ²c(mod b)时，会发生另一种有用的特殊情况.然后，您有aww + aww + a2w2 + ... aa·1 + acc + a2c + ... aa + c(a₁+ a2 + ...)(mod b).这类似于10≡100≡1000≡...≡4(mod 6)的模数，因此任何数字都等于其最后一位加上模数为6的其余位数之和的四倍.每个字节加一个，然后乘以一个小常数即可乘以一个或两个位移.例如，要获取mod 20，可以将除最低顺序字节之外的所有字节都添加到mod 20中，将总和乘以256 mod 20 = 16(这只是左移4)，然后添加最后一个字节.这可能非常方便:不对余数为1或0的数字进行计数，这适用于以6、10和12为模的半字节，以及以这些值与20、24、30、34、40、48、60、68为模的字节，80、96、102、120、136、160、170、192、204和240.

Another useful special case occurs when w ≡ w² ≡ c (mod b). Then you have a₀w⁰ + a₁w¹ + a₂w² + ... ≡ a₀·1 + a₁c + a₂c + ... ≡ a₀ + c(a₁ + a₂ + ...) (mod b). This is analogous to how 10 ≡ 100 ≡ 1000 ≡ ... ≡ 4 (mod 6), so any number is congruent to its last digit plus four times the sum of its remaining digits, modulo 6. The computation can be a lookup and an addition per byte, and one multiplication by a small constant that you can do with a bit shift or two. For example, to take mod 20, you can add all but the lowest-order bytes mod 20, multiply the sum by 256 mod 20 = 16, which is just a left shift of 4, then add the final byte. This can be very convenient: not counting numbers that give remainders of 1 or 0, this works with nibbles modulo 6, 10 and 12, and with bytes modulo those values and 20, 24, 30, 34, 40, 48, 60, 68, 80, 96, 102, 120, 136, 160, 170, 192, 204 and 240.

如果数字可以表示为特殊情况的乘积，则可以使用中国余数定理来求解.例如，77 = 11×7、32 -1模11和8 -1模7，因此您可以找到除以11和7的余数，确定除以77的余数.大多数小的素数都落入一个前面讨论过的特殊情况.

If a number can be expressed as the product of special cases, you can solve for it using the Chinese Remainder Theorem. For example, 77 = 11×7, 32 ≡ -1 mod 11, and 8 ≡ 1 mod 7, so you could find the remainders divided by 11 and 7, which determine the remainder divided by 77. Most small prime numbers fall into one of the special cases previously discussed.

许多后来的RISC体系结构都有硬件划分，但没有模数，并告诉程序员通过计算a-(a/b)*b来计算a%b. ARM A64是当今使用最多的一种.如果您也没有硬件部门，请查看此答案.在此处中给出了另一种方法的示例，该示例的基数是一个小常数.并且在CISC架构上得到了广泛使用.

Many later RISC architectures had hardware divide but not modulus, and told programmers to compute a%b by computing a-(a/b)*b. ARM A64 is the one most in use today. If you don’t have hardware division either, check out this answer. An example of another approach when the base is a small constant is given here, and is widely-used on CISC architectures.

还有一种算法由肖恩·安德森(Sean Anderson)于2001年编写，但可能更早被发现以小于2的幂的数字来计算模量.它与我上面使用的技术类似，但是依赖于位移，并且可以扩展为任何(1<<s)-1因子.这几乎就是您要寻找的！

There is also an algorithm written by Sean Anderson in 2001 but probably discovered earlier to compute the modulus by a number one less than a power of 2. It’s similar to the technique I used above, but relies on bit shifts and could be extended to any factor of (1<<s)-1. That’s almost what you’re looking for!

通常，您的优化编译器应已使用最有效的方法在硬件上实现%.在您的示例中，任何不错的编译器都将折叠常量并将7%5优化为2.

Generally, your optimizing compiler should be using the most efficient method to implement % on your hardware already. In your example, any decent compiler will just fold the constants and optimize 7%5 to 2.

这篇关于C-模数上按位运算的算法(非2的幂)的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！

MOD