Not exactly periodic, but I did 2 Mb/s interrupt-driven bidirectional serial communication. That is about 5 µs between characters, and there were 2 interrupts per character (one to receive, the other to transmit the answer). In other words, an interrupt rate of about 400 kHz. That was on an STM32F103 running at 72 MHz (that is a Cortex-M3). I also tried 3 Mb/s, but apparently that was too much for the USB bus in the PC (standard 12 Mb/s port).
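To make the arithmetic behind those numbers explicit, here is a small Python sketch. It assumes 10 bits on the wire per character (8N1 framing: start bit, 8 data bits, stop bit), which is not stated above but matches the quoted 5 µs. The function name is made up for illustration.

```python
def uart_interrupt_budget(baud, cpu_hz, bits_per_char=10, irqs_per_char=2):
    """Return (char_time_us, irq_rate_hz, cycles_per_irq).

    bits_per_char=10 assumes 8N1 framing (an assumption, not stated in the post).
    irqs_per_char=2 is one receive plus one transmit interrupt per character.
    """
    chars_per_s = baud / bits_per_char
    char_time_us = 1e6 / chars_per_s
    irq_rate_hz = chars_per_s * irqs_per_char
    cycles_per_irq = cpu_hz / irq_rate_hz
    return char_time_us, irq_rate_hz, cycles_per_irq


char_us, irq_hz, cycles = uart_interrupt_budget(2_000_000, 72_000_000)
print(f"{char_us:.1f} us per character, {irq_hz / 1e3:.0f} kHz, "
      f"{cycles:.0f} clocks per interrupt")
```

Under these assumptions each interrupt, entry and exit included, has a budget of roughly 180 clocks at 72 MHz.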
Concerning interrupt overhead, for an STM32F030 running code from RAM the overhead seems to be 26-28 clocks. More precisely, I had a very simple interrupt handler that just increments a variable (a millisecond counter). The "work" part of the interrupt handler should execute in 7 clocks. When I timed a busy loop, each interrupt increased the execution time of the loop by 33-35 clocks. That agrees reasonably well with the cycle counts for Cortex-M0 published in ARM forums: 16 clocks to enter the interrupt handler and 12 clocks to get back to the main program. The processor in the Pi Pico is a Cortex-M0+, which is supposed to take 15 clocks to enter the interrupt handler. So you can expect 1 clock less overhead than for Cortex-M0.
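The subtraction behind the 26-28 figure, and the comparison with the published entry/exit counts, can be written out as a short Python sketch. All numbers come from the paragraph above; the only assumption is that the Cortex-M0+ exit also takes 12 clocks, as on Cortex-M0. The function name is illustrative.

```python
def overhead_range(measured_clocks, work_clocks):
    """Subtract the handler's own work from the measured slowdown per interrupt."""
    lo, hi = measured_clocks
    return lo - work_clocks, hi - work_clocks


measured = (33, 35)   # extra clocks per interrupt seen in the busy loop
work = 7              # clocks for the increment itself

print(overhead_range(measured, work))   # (26, 28)

m0_entry, m0_exit = 16, 12              # Cortex-M0 figures quoted above
m0plus_entry = 15                       # Cortex-M0+ entry figure quoted above

print(m0_entry + m0_exit)               # 28
print(m0plus_entry + m0_exit)           # 27, assuming exit is also 12 clocks on M0+
```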
Concerning useful procedures, there are a lot of things which can slow down the code. For example, a read-modify-write cycle on an I/O port is likely to insert some extra wait states. Most MCUs execute code from flash, and usually flash cannot run at max CPU speed, so there are extra wait states. For example, a Cortex-M4 running from one RAM bank and having the stack in a separate RAM bank can do an interrupt like the one above in 27-28 clocks, so the overhead is probably 20-21 cycles (I write "probably" because Cortex-M4 has complex rules concerning instruction timing, so I am not sure that the interrupt handler takes 7 clocks). But a different configuration can bring the time up to 42-48 clocks. A Cortex-M3 (which should have timings very close to Cortex-M4) running from flash with 0 wait states (8 MHz clock) needs 24 clocks to execute the interrupt handler, but with 2 wait states (needed to run at 72 MHz) it needs 29 to 31 clocks, and more when there are more wait states.
The RP2040 in the Pi Pico normally runs from RAM, so it should be free from slowdown due to flash. But with two cores and several DMA channels there may be bus contention. Still, interrupt rates on the order of 1M/s should not be a problem.
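As a rough sanity check of the 1M/s estimate, here is a minimal Python sketch of the CPU load. It assumes a 125 MHz system clock (a common RP2040 setting, not stated above), 27 clocks of entry/exit overhead (the Cortex-M0+ estimate from earlier) and a 7-clock handler. It ignores bus contention and any other slowdown, so treat the result as a lower bound. The function name is made up for illustration.

```python
def interrupt_load(irq_rate_hz, cpu_hz, overhead_clocks, work_clocks):
    """Fraction of CPU time spent on interrupts (entry/exit overhead plus handler work)."""
    return irq_rate_hz * (overhead_clocks + work_clocks) / cpu_hz


# Assumed values: 125 MHz clock, 27 clocks overhead, 7 clocks of handler work.
load = interrupt_load(1_000_000, 125_000_000, 27, 7)
print(f"{load:.0%} of the CPU")
```

With these assumed numbers a trivial handler at 1M interrupts per second uses roughly a quarter of one core, which leaves room for a somewhat heavier handler.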