Now, I have nothing against lookup tables, except for the fact that you actually have to do a lookup, which will make use of the computer's cache, or even (*gasp*) main memory.. And why do that, when there is a perfectly reasonable, constant-time (more accurately, log(word_length)-time - it's constant time, relative the number of 1-bits in the word.) method that uses only registers! (well, on a RISCy CPU that has loads of them lying around, anyhow... Almost nothing runs entirely in registers on CISC (Intel's, anyway), alas.)

Anyway, onward.. the code, in C, of course, for machines with 32-bit unsigneds:

/*
MASK1  = 01010101010101010101010101010101
MASK2  = 00110011001100110011001100110011
MASK4  = 00001111000011110000111100001111
MASK8  = 00000000111111110000000011111111
MASK16 = 00000000000000001111111111111111
*/

unsigned numbits(unsigned i)
{
    const unsigned MASK1  = 0x55555555;
    const unsigned MASK2  = 0x33333333;
    const unsigned MASK4  = 0x0f0f0f0f;
    const unsigned MASK8  = 0x00ff00ff;
    const unsigned MASK16 = 0x0000ffff;

    i = (i&MASK1 ) + (i>>1 &MASK1 );
    i = (i&MASK2 ) + (i>>2 &MASK2 );
    i = (i&MASK4 ) + (i>>4 &MASK4 );
    i = (i&MASK8 ) + (i>>8 &MASK8 );
    i = (i&MASK16) + (i>>16&MASK16);

    return i;
}
Guaranteed to amaze...

How does it work?

Simple.

It starts off treating the word as 32 1-bit counters. (ie, 32 entities that each can count from 0 to 1.) It then pairs off adjacent counters and adds each pair, the result being 16 pairs of two-bit counters. (This is ok, because the maximum value in each counter is 1, 1+1=2, and 2 fits comfortably in the range of a 2-bit counter, 0 to 3.) Continuing, it takes these 16 2-bit counters, and pairs them again, again adding each pair, to produce 8 4-bit counters, and again, we are ok, because each 2-bit counter has a max of 2, 2+2=4, and 4 fits in the range 0 to 16. We keep going pairing counters and adding, until there is only a single, 32-bit counter remaining, and it, of course, contains the result.

An example:

(My apologies to those of you using a console-mode browser like Lynx; the first step's columns are wider than 80 characters.. not much, but wider: 82 characters)

Find the number of bits in 10010111011111010101101110101111. (the answer's 22, for those of you keeping score.)

  1. break the word into 32 1-bit counters, pair, and add, to get 2-bit counters
     1 0 0 1 0 1 1 0 0 0 1 1 1 1 1 1 =  1  0  0  1  0  1  1  0  0  0  1  1  1  1  1  1
    +0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 1 =  0  1  1  1  1  1  1  1  1  1  0  1  0  0  1  1
     -------------------------------
     1 1 1 2 1 2 2 1 1 1 1 2 1 1 2 2 = 01 01 01 10 01 10 10 01 01 01 01 10 01 01 10 10
    
  2. break the word into 16 2-bit counters, pair, and add, to get 4-bit counters
     1 1 1 2 1 1 1 2 =   01   01   01   10   01   01   01   10
    +1 2 2 1 1 2 1 2 =   01   10   10   01   01   10   01   10
     ---------------   ---------------------------------------
     2 3 3 3 2 3 2 4 = 0010 0011 0011 0011 0010 0011 0010 0100
    
  3. break the word into 8 4-bit counters, pair, and add, to get 8-bit counters
     3 3 3 4 =     0010     0011     0010     0010
    +2 3 2 2 =     0011     0011     0011     0100
     -------   -----------------------------------
     5 6 5 6 = 00000101 00000110 00000101 00000110
    
  4. break the word into 4 8-bit counters, pair, and add, to get 16-bit counters
      5  5 =         00000101         00000101
    + 6  6 =         00000110         00000110
     -----   ---------------------------------
     11 11 = 0000000000001011 0000000000001011
    
  5. And finally, break the word into 2 16-bit counters, pair, and add, to get the answer, 22, in a single 32-bit counter.
     11 =                 0000000000001011
    +11 =                 0000000000001011
     --   --------------------------------
     22 = 00000000000000000000000000010110
    

I hope you had as much fun reading this, as I had writing it!