I don't think I've posted anything in years, but I just couldn't resist adding to this one because I played with it for some time.
As the previous posters said -- it depends on the device. But I'd add (in some detail) that it also depends on context even within one device.
To answer the question most directly, an 8:1 mux requires two slices in Virtex IV or 2 Adaptive Logic Modules (ALMs) in Stratix II. But whether you actually get that in a full system depends on the structure of your design.
The Virtex IV version is easy to see because it's just the output of the F6 mux provided as dedicated hardware. Spartan III is a cost-reduced Virtex IV, so it should behave identically.
In Stratix II we can do it without the need for dedicated hardware but it's a bit trickier to synthesize:
For Z = mux(d0,d1,d2,d3,d4,d5,d6,d7; s0,s1,s2) synthesis will give you:

    y0 = mux(d0,d1,d2; s0,s1)
    y1 = mux(d4,d5,d6; s0,s1)

which are two 5-input functions that pack into a single ALM. In the second ALM

    z0 = (s0 & s1 & d3) # !(s0 & s1) & y0
    z1 = (s0 & s1 & d7) # !(s0 & s1) & y1
    Z = mux(z0,z1; s2)

will be generated using 7-LUT mode.
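If you want to convince yourself that the two-ALM decomposition above really is an 8:1 mux, here is a minimal brute-force sketch in Python. It is only a truth-table model, not what Quartus emits. The function names are made up for illustration, and I'm assuming s0 is the least significant select bit and that the unused s0=s1=1 case of the 3:1 muxes is a don't-care (modelled as 0).

```python
from itertools import product

def mux3(a, b, c, s0, s1):
    # 3:1 mux on select (s1, s0). The s0=s1=1 case is a don't-care;
    # it is modelled as 0 here (an assumption, any value would work).
    return (a, b, c, 0)[s0 + 2 * s1]

def mux8_two_alm(d, s0, s1, s2):
    # First ALM: two 5-input functions that share s0 and s1.
    y0 = mux3(d[0], d[1], d[2], s0, s1)
    y1 = mux3(d[4], d[5], d[6], s0, s1)
    # Second ALM: one 7-input function of (s0, s1, s2, d3, d7, y0, y1).
    both = s0 & s1
    z0 = (both & d[3]) | ((1 - both) & y0)
    z1 = (both & d[7]) | ((1 - both) & y1)
    return z1 if s2 else z0

def mux8_reference(d, s0, s1, s2):
    # Plain 8:1 mux with s0 as the least significant select bit.
    return d[s0 + 2 * s1 + 4 * s2]

def check_all():
    # Exhaustive check over all 2^11 combinations of data and select bits.
    for bits in product((0, 1), repeat=11):
        d, (s0, s1, s2) = bits[:8], bits[8:]
        if mux8_two_alm(d, s0, s1, s2) != mux8_reference(d, s0, s1, s2):
            return False
    return True
```

Calling check_all() walks every input combination and compares the decomposition against the reference mux.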
I attached Verilog at the end if you want to run it through Quartus; look at the result in the equation file and you will see what I just described. Note that depending on what else is in the design, the 5-LUTs might get packed or synthesized differently, i.e. Quartus may prefer to pack the two 5-LUTs with two unrelated 2- or 3-LUTs to make two 7-input ALMs rather than one 8-input ALM and a second 6-input ALM, or it may synthesize differently at the cost of area to hit a delay constraint.
On older devices (Altera Stratix, Cyclone; Xilinx Spartan I, 4000) and on MAX II and Cyclone II, you can basically use "4-LUT" in the discussion below, though it will depend on other issues in practice. I haven't thought about PTERM devices like MAX 7000.
But this brings me to the bigger discussion. I would stress that in practice it makes a big difference what the surrounding context is, and also if you have more than one mux in your design, because in a mux system like a barrel shifter or crossbar the amortized cost of k muxes in Stratix II is less than k times the cost of one (which is a benefit over Virtex IV).
In a generic 4-LUT architecture with no dedicated hardware, a simple 2:1 mux is a 3-input function and takes one LUT (with one input going unused). A 4:1 mux would take two LUTs (not three -- exercise to the reader; it's easier than the 8:1 above). An 8:1 mux requires five vanilla 4-LUTs because it's two 4:1 muxes and one 2:1. But it's arguably something like 4.5 LUTs (see two paragraphs down).
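For anyone who doesn't want to do the exercise, here is one possible two-LUT decomposition of the 4:1 mux, checked by brute force in Python. This is a sketch of one solution, not necessarily the one a given synthesis tool picks. The names lut_a and lut_b are made up, and I'm assuming s0 is the least significant select bit. The trick is that the first LUT forwards s0 itself whenever s1 is set.

```python
from itertools import product

def lut_a(s0, s1, d0, d1):
    # First 4-LUT: mux(d0,d1; s0) when s1=0, or s0 passed through when s1=1.
    return s0 if s1 else (d1 if s0 else d0)

def lut_b(x, s1, d2, d3):
    # Second 4-LUT: pass x when s1=0; when s1=1, x carries s0 and picks d2/d3.
    return (d3 if x else d2) if s1 else x

def mux4_two_luts(d, s0, s1):
    return lut_b(lut_a(s0, s1, d[0], d[1]), s1, d[2], d[3])

def check_mux4():
    # Exhaustive check over all 2^6 combinations of data and select bits.
    return all(
        mux4_two_luts(b[:4], b[4], b[5]) == b[:4][b[4] + 2 * b[5]]
        for b in product((0, 1), repeat=6)
    )

# LUT counts for the vanilla 4-LUT architecture described above.
LUTS_MUX2 = 1
LUTS_MUX4 = 2
LUTS_MUX8 = 2 * LUTS_MUX4 + LUTS_MUX2  # two 4:1 muxes feeding one 2:1
```

Both LUTs use all four inputs, which is why the spare input only shows up on the final 2:1 stage of the 8:1 mux.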
I already mentioned the Virtex IV hardware. Stratix and some earlier Altera architectures have hardware that facilitates other special cases, e.g. a set of mux(a,b,c,0; s0,s1) can be implemented in a LAB cluster by stealing functionality from the LAB-wide SLOAD hardware before the DFF. So you can get a restricted 4:1 mux in one LE instead of 2. (That's the "basically" in the above.)
When I said context I meant this: if an 8:1 mux is followed by an AND gate (e.g. Z = mux(d0,d1,d2,d3,d4,d5,d6,d7; s0,s1,s2) & e), then the AND gate would be a "free" addition to the 5 4-LUT implementation in the vanilla architecture (because there's a leftover input on the last LE), but would cost a new LE using the Virtex IV hardware. So the F5/F6 hardware gives a maximal 20% savings (4 LUTs instead of 5) for a lone 8:1 mux, but depending on the surrounding logic the relative benefit could disappear. That's not a deficiency; you just can't count on getting the benefit in all cases. Note that if it's a 3-input AND gate, the situation reverses and the dedicated hardware is again ahead by one LE.
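To make the counting concrete, here is a rough tally in Python. It is a simplified cost model of my own, not vendor data. I'm assuming a "3-input AND gate" means the mux output plus two other signals, that the vanilla implementation's last LUT has exactly one spare input, that the dedicated-mux output always needs a fresh LUT for any following logic, and that each extra LUT takes the previous result plus up to three new signals.

```python
def ceil_div(n, k):
    # Integer ceiling division for non-negative n.
    return -(-n // k)

def vanilla_luts(extra_and_inputs):
    # 5 LUTs for the mux; the last LUT absorbs one AND input for free.
    leftover = max(0, extra_and_inputs - 1)
    return 5 + ceil_div(leftover, 3)

def dedicated_luts(extra_and_inputs):
    # 4 LUTs plus dedicated muxes; any AND after the mux needs new LUTs.
    return 4 + ceil_div(extra_and_inputs, 3)

def savings(extra_and_inputs):
    # Fraction of LUTs saved by the dedicated hardware versus vanilla.
    v = vanilla_luts(extra_and_inputs)
    return (v - dedicated_luts(extra_and_inputs)) / v
```

Under those assumptions, savings(0) is the 20% for a lone mux, savings(1) is zero for the 2-input AND case, and with two extra AND inputs the dedicated hardware is ahead by one LUT again.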
In reality, though, you probably don't care about one simple mux; you care about systems of muxes that consume huge numbers of LUTs. For example, a simple 16-bit barrel shifter
out[15:0] = in[15:0]