The "branchful" version is not only slower for the processor, it's also slower for a human - this one, at least, would never write code like that; the verbosity and repetition just scream "you're doing it wrong". When I see such duplication, it slows me down because I have to inspect each case to confirm that none of them is subtly different. I would at least use a loop.
Also, shifting right by n and picking off the least significant bit (&1) may save an instruction or two, depending on the processor.
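Something like this is what I have in mind - a rough sketch only, assuming the job is unpacking the eight bits of a byte into an array of bools. The name `unpack_bits` and the LSB-first ordering are my own assumptions, not taken from the original code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: unpack the 8 bits of `byte` into out[0..7],
   least significant bit first (ordering is an assumption). */
static void unpack_bits(uint8_t byte, bool out[8])
{
    for (int n = 0; n < 8; n++) {
        /* Shift bit n down to position 0, then mask off everything else. */
        out[n] = (byte >> n) & 1;
    }
}
```

A decent compiler will probably unroll this anyway, so you get the readable loop without necessarily paying for it at runtime.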
Finally, an array of bools is itself intrinsically wasteful[1], as the processor can easily test whether a certain bit is set, or set and clear bits, with a single instruction if you keep them packed together into bytes. There's another comment here about memory layout and cache usage.
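For illustration, here is a minimal sketch of keeping the flags packed, assuming flag i lives in bit (i % 8) of byte (i / 8). The helper names `bit_test`, `bit_set` and `bit_clear` are made up for this example:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: flag i is stored in bit (i % 8) of bits[i / 8]. */
static inline bool bit_test(const uint8_t *bits, size_t i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}

static inline void bit_set(uint8_t *bits, size_t i)
{
    bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

static inline void bit_clear(uint8_t *bits, size_t i)
{
    bits[i / 8] &= (uint8_t)~(1u << (i % 8));
}
```

With this layout 64 flags fit in 8 bytes instead of the 64 that a `bool[64]` typically takes. Whether each helper really compiles down to a single instruction depends on the compiler and the target.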
[1] It reminds me of the questions "what's the fastest way to generate/parse <bloated text format>?" in an application where you control both ends and a human almost never needs to see the data.