It really comes down to simple math. Let's say you've been gaming at Full HD until now, playing your favourite game on Ultra settings at ~30 fps. That's possible with a decent mid-range GPU, so why is the Rift so different? I'll try to explain:
The Rift screen "only" has Full HD resolution, but the Rift's internal render target is actually 2364 x 1461 pixels, due to the lens warping. So basically every Rift title needs to render roughly 67% more pixels (about 3.45 million instead of 2.07 million) to get the most out of that Full HD screen. Those 30 fps are now already down to roughly 17-18 fps (naively assuming performance scales linearly with pixel count).
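If you want to check that math yourself, here's the pixel-count arithmetic in a few lines of Python, using the resolutions quoted above (the exact overhead works out to about 67%, and the linear-scaling assumption is of course naive):

```python
# Rough pixel-count math: DK2 render target vs. a Full HD screen.
# Resolutions are the ones quoted in the post.
screen_px = 1920 * 1080           # Full HD panel: 2,073,600 px
target_px = 2364 * 1461           # DK2 render target: 3,453,804 px

overhead = target_px / screen_px  # ~1.67, i.e. ~67% more pixels
fps_flat = 30                     # baseline framerate on a flat screen
fps_rift = fps_flat / overhead    # ~18 fps, assuming linear scaling

print(f"{overhead:.2f}x pixels -> {fps_rift:.1f} fps")
```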
You need two different perspectives, one for each eye. Some parts of the scene can be shared between eyes, but most of the work has to be done twice, which almost doubles the GPU load per frame. (This is not a fixed number: some games will see a less dramatic drop, and game engines will likely optimize for this over time, but right now they're pretty inefficient at stereo rendering, so let's go with the worst case.) That roughly halves our framerate again, to about 8-9 fps, which is little more than ~10% of our desired framerate of 75.
For reasons others have explained before, you actually need to at least match the screen's refresh rate to have a good VR experience. The DK2 runs at 75 Hz; the consumer version will most likely run at 90 Hz AND a higher resolution. Taking the figures above, we can conclude that to run that same game at identical detail settings, you'd need a GPU roughly 8-10x as powerful to have a perfect DK2 experience, and a lot more for the consumer version. Ouch!
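Chaining all three factors together gives the full back-of-envelope estimate. The exact multiplier depends on how you round each step, but it lands in the same ballpark either way:

```python
# Back-of-envelope estimate of the GPU headroom the DK2 needs,
# chaining the factors from the post (all naively assumed linear).
baseline_fps   = 30    # flat-screen framerate at Ultra settings
pixel_overhead = 1.67  # ~67% more pixels in the warped render target
stereo_factor  = 2.0   # worst case: render everything once per eye
target_fps     = 75    # DK2 refresh rate; the consumer version will be higher

rift_fps = baseline_fps / (pixel_overhead * stereo_factor)  # ~9 fps
gpu_multiplier = target_fps / rift_fps                      # ~8-10x

print(f"Rift fps: {rift_fps:.1f}, GPU needed: {gpu_multiplier:.1f}x")
```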
For now, all we can do is drop detail levels and find a good compromise between visual quality and performance. I know it's hard; I've always been an eye-candy-over-framerate kind of gamer myself. But in the Rift it's an entirely different scenario, and framerate is king.
There is some help on the horizon that might make this a bit less dramatic. Asynchronous timewarp (not to be confused with regular timewarp, which is already in the SDK) is a feature Oculus is working on, but it isn't ready on PC yet (only on Gear VR). What it does is basically synthesize intermediate frames from the last rendered frame, a depth buffer, and the new tracking information. It is supposed to be quite effective under optimal circumstances, and it makes a couple of dropped frames much less destructive to the experience than they currently are. Hopefully we'll see this feature added to the runtime in the coming months.
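To give a feel for the idea (this is a toy model, NOT Oculus's actual algorithm, and the field-of-view value is an assumption): when the renderer misses vsync, you can re-present the last frame shifted to match the newest head rotation, instead of showing a stale image. A 1-D small-angle sketch:

```python
# Toy model of the timewarp idea: if a new frame isn't ready at vsync,
# re-present the last frame, shifted to match the latest head rotation.
# This is a simplified 1-D illustration, not Oculus's implementation.
FOV_DEG = 100.0   # assumed horizontal field of view (hypothetical value)
WIDTH_PX = 2364   # render-target width from the post

def timewarp_shift(yaw_at_render, yaw_now):
    """Horizontal pixel shift approximating the view rotation since the
    last frame was rendered (small-angle approximation, degrees in)."""
    delta = yaw_now - yaw_at_render
    px_per_degree = WIDTH_PX / FOV_DEG
    return delta * px_per_degree

# The renderer drew the frame at yaw 10.0 deg; the head has since turned
# to 10.5 deg, so warp the old frame ~12 px instead of dropping a frame.
print(timewarp_shift(10.0, 10.5))
```

The real feature does this reprojection on the GPU per pixel (using the depth buffer for positional correction), but the principle is the same: a cheap warp of an old frame beats a dropped frame.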