Gaming might be a special case, but perhaps the way round that is to adjust the clock tick period to get the required response time? I doubt that 2 or 3 ms would be noticeable, but there are various other issues, such as how long a key must be down to be positively identified. A polling approach may not be good enough, depending on the polling interval and/or the system load elsewhere. For absolutely instant response, the only way is to add hardware that posts an interrupt for any keypress, with the interrupt handler deciding what to do with it. Whatever the solution, the keypress has to get to the consumer process, either through polling or perhaps a signal and task switch, so overall system architecture matters a lot, and it needs settling early rather than later in the design when problems surface :-).
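To put the "how long must a key be down" point in concrete terms, here is a minimal sketch of a debounce routine called once per timer tick. The names (key_debounce_t, key_debounce_tick) and the figure of 3 ticks are my own assumptions for illustration, not taken from any particular system; the raw key reading would come from whatever port read the hardware needs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: the raw reading must differ from the accepted state
   for this many consecutive ticks before the change is believed. */
#define DEBOUNCE_TICKS 3u

typedef struct {
    bool    stable;  /* last debounced (accepted) key state */
    uint8_t count;   /* consecutive ticks the raw reading has differed */
} key_debounce_t;

/* Call once per timer tick with the raw key reading.
   Returns true only on the tick where the debounced state changes. */
bool key_debounce_tick(key_debounce_t *k, bool raw)
{
    if (raw == k->stable) {
        k->count = 0;            /* glitch or no change: start again */
        return false;
    }
    if (++k->count < DEBOUNCE_TICKS) {
        return false;            /* not held long enough yet */
    }
    k->stable = raw;             /* accept the new state */
    k->count = 0;
    return true;
}
```

With a tick of T ms, the worst-case latency from keypress to acceptance is roughly DEBOUNCE_TICKS + 1 ticks, which is the trade-off between tick period and response time mentioned above.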
The reason I use such techniques is that a lot of the work done here is on non-RTOS platforms, often with inadequate hardware, and running things like keyscanners in the background reduces complexity and allows more flexibility in the design of mainline code. The same applies to serial comms, for example, where an upper (mainline) and lower (interrupt) layer, with queues on tx and rx to connect the two, can offload a whole wedge of functionality...
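For what it's worth, the queue between the two layers can be very small. The following is only a sketch under stated assumptions: a single-core micro, exactly one producer and one consumer per queue (e.g. the rx interrupt puts, the mainline gets), and byte-wide index accesses that are atomic on the target. The names byte_queue_t, queue_put and queue_get are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed size: a power of two so the index wraps with a mask.
   One slot is kept empty, so the usable capacity is QUEUE_SIZE - 1. */
#define QUEUE_SIZE 16u

typedef struct {
    volatile uint8_t head;      /* written only by the producer */
    volatile uint8_t tail;      /* written only by the consumer */
    uint8_t buf[QUEUE_SIZE];
} byte_queue_t;

/* Producer side (e.g. the rx interrupt). Returns false if the queue is full. */
bool queue_put(byte_queue_t *q, uint8_t b)
{
    uint8_t next = (uint8_t)((q->head + 1u) & (QUEUE_SIZE - 1u));
    if (next == q->tail) {
        return false;           /* full: caller decides whether to drop */
    }
    q->buf[q->head] = b;
    q->head = next;             /* publish only after the data is stored */
    return true;
}

/* Consumer side (e.g. the mainline). Returns false if the queue is empty. */
bool queue_get(byte_queue_t *q, uint8_t *out)
{
    if (q->tail == q->head) {
        return false;           /* empty */
    }
    *out = q->buf[q->tail];
    q->tail = (uint8_t)((q->tail + 1u) & (QUEUE_SIZE - 1u));
    return true;
}
```

Because each index is written by one side only, no interrupt disabling is needed under those assumptions; on a multi-core part or with wider indices, memory barriers or a critical section would be required.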
Chris