You confused visual complexity with math complexity and completely missed the point - they're doing things in PS which should be done in VS or even deeper in the CPU. Which has nothing to do with final look but performance.
I confused nothing. You did not give any specifics. Well, here are some specifics for you.
Odyssey should be using deferred rendering (if it's not, oh boy, there's a big optimization right there), thus...
- depth pass: there should be no fragment shader at all
- g-buffer pass: the fragment shader should indeed be simple: just texture look-ups (albedo etc) and blasting the data out to the buffers, with minimal if any calculation (there's a sketch of such a shader after this list)
- lighting pass: this is where things get interesting. Can't do anything interesting in the vertex shader as there's only a single full-screen quad (some use a triangle that covers the screen and let the GPU clip the triangle, but I don't see how that's an optimization; there's a sketch of that version after this list, too).
- translucency pass: this is generally a forward renderer and so quite nasty, but it does give opportunities for doing some lighting calculations in the vertex shader and letting the GPU take care of the interpolation (see the vertex-lighting sketch after this list). However, it's very limited in the number of lights it can support before it bogs down. I know of order-independent translucency and depth peeling, but that's about all I know about it, so no further comment on that (other than I have more study to do).
- reflection and refraction: do the above six times
- final composition (single or multi-stage): another full-screen quad. Again, not a lot of opportunities to move stuff into the vertex shader or onto the CPU
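To illustrate how thin the g-buffer fragment shader can be, here's a minimal GLSL sketch. The attachment layout, binding numbers, and names are all made up for illustration, not anyone's actual pipeline:

```glsl
#version 450
// Minimal g-buffer fragment shader: sample the material textures and
// write them straight to the MRT attachments. No lighting math here.
layout (location = 0) in vec2 uv;
layout (location = 1) in vec3 normal;       // interpolated from the vertex shader

layout (binding = 0) uniform sampler2D albedo_map;
layout (binding = 1) uniform sampler2D spec_map;

layout (location = 0) out vec4 g_albedo;    // rgb = albedo
layout (location = 1) out vec4 g_normal;    // xyz = world-space normal
layout (location = 2) out vec4 g_spec;      // specular parameters

void main (void)
{
    g_albedo = texture (albedo_map, uv);
    g_normal = vec4 (normalize (normal), 1.0);
    g_spec   = texture (spec_map, uv);
}
```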
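And for reference, the screen-covering triangle variant needs no vertex buffer at all. A GLSL sketch (assuming Vulkan-style gl_VertexIndex; plain OpenGL would use gl_VertexID):

```glsl
#version 450
// Full-screen triangle: three vertices generated from gl_VertexIndex,
// no vertex buffer bound. The triangle overhangs the screen and gets
// clipped; uv covers [0,1] over the visible area.
layout (location = 0) out vec2 uv;

void main (void)
{
    uv = vec2 ((gl_VertexIndex << 1) & 2, gl_VertexIndex & 2);
    gl_Position = vec4 (uv * 2.0 - 1.0, 0.0, 1.0);
}
```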
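As for pushing some of the translucency-pass lighting into the vertex shader: the idea is to sum the diffuse contribution per vertex and let the rasterizer interpolate it, so the fragment shader only has to do texture * light. A GLSL sketch, with made-up uniform names and an arbitrary light cap:

```glsl
#version 450
// Forward (translucency) pass vertex shader: per-vertex diffuse lighting.
// The summed light term is interpolated across the triangle.
#define MAX_LIGHTS 8                    // arbitrary cap for this sketch

layout (location = 0) in vec3 position;
layout (location = 1) in vec3 normal;
layout (location = 2) in vec2 uv;

layout (binding = 0) uniform Matrices {
    mat4 mvp;
    mat4 model;
};
layout (binding = 1) uniform Lights {
    vec4 light_pos[MAX_LIGHTS];         // xyz = position, w = 1/radius
    vec4 light_color[MAX_LIGHTS];
    int  light_count;
};

layout (location = 0) out vec2 frag_uv;
layout (location = 1) out vec3 vert_light;

void main (void)
{
    vec3 wpos = (model * vec4 (position, 1.0)).xyz;
    vec3 wnorm = normalize (mat3 (model) * normal);

    vec3 light = vec3 (0.0);
    for (int i = 0; i < light_count; i++) {
        vec3 dir = light_pos[i].xyz - wpos;
        float att = max (1.0 - length (dir) * light_pos[i].w, 0.0);
        light += light_color[i].rgb * att
                 * max (dot (wnorm, normalize (dir)), 0.0);
    }

    frag_uv = uv;
    vert_light = light;
    gl_Position = mvp * vec4 (position, 1.0);
}
```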
Now, that said...
- Messing with quaternions in a fragment shader is probably a bad idea. Matrix-vector multiplication is significantly faster than quaternion-vector multiplication, and gives you translation for free (with quaternions, it's a separate step), and you often need to do perspective calculations anyway (see the sketch after this list). Quaternion-to-matrix conversion should be done on the CPU, of course.
- Unless you have a very good reason, the above probably applies to the vertex shader, too. It probably depends on how volatile the data is, and thus how much transfer bandwidth is an issue compared to the calculations.
- If inter-vertex interpolation can be used (1: you have vertices, 2: the results are acceptable), then certainly the relevant calculations should be moved to the vertex shader, or even passed in as vertex attributes (ie, done on the CPU or off-line).
- For full-screen quad rendering (composition, deferred lighting, etc), anything that's effectively constant should be calculated on the CPU and passed in to the shader (sketch after this list).
- Any GPU memory-management tricks (I'm thinking of sparse buffer tricks, etc) should, and probably must, be done on the CPU: I don't think I've seen any GPU-side mechanism for memory management.
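On the quaternion point, for comparison: rotating by a unit quaternion is usually done as v' = v + 2*cross(q.xyz, cross(q.xyz, v) + q.w*v), and the translation is still a separate add, whereas the matrix form is a single mat4 multiply with translation (and perspective, if the projection is folded in) included. A GLSL sketch of shader-side helper functions, purely illustrative:

```glsl
// Rotating a point by a unit quaternion: two cross products plus the
// adds, and the translation is still a separate step.
vec3 quat_rotate (vec4 q, vec3 v)
{
    return v + 2.0 * cross (q.xyz, cross (q.xyz, v) + q.w * v);
}

vec3 quat_transform (vec4 q, vec3 t, vec3 v)
{
    return quat_rotate (q, v) + t;
}

// The same transform as a matrix: one mat4 * vec4, with the translation
// (and, if the projection is folded in, the perspective) included.
vec4 mat_transform (mat4 m, vec3 v)
{
    return m * vec4 (v, 1.0);
}
```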
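And on the "effectively constant" point, the usual shape of it is a per-frame uniform block that the CPU fills once, so the full-screen pass never recomputes things like the inverse projection. A GLSL sketch with invented names and layout:

```glsl
#version 450
// Per-frame constants for a deferred lighting / composition pass.
// All of these are computed once on the CPU and uploaded; inverting
// the projection matrix per fragment would be pure waste.
layout (binding = 0) uniform FrameConstants {
    mat4 inv_proj;       // reconstruct view-space position from depth
    mat4 inv_view;
    vec2 screen_size;
    vec2 depth_range;    // near/far, pre-massaged as needed
};

layout (binding = 1) uniform sampler2D depth_buffer;
layout (location = 0) in vec2 uv;
layout (location = 0) out vec4 frag_color;

void main (void)
{
    // Vulkan-style [0,1] depth assumed; OpenGL would remap z as well.
    float depth = texture (depth_buffer, uv).r;
    vec4 ndc = vec4 (uv * 2.0 - 1.0, depth, 1.0);
    vec4 view_pos = inv_proj * ndc;
    view_pos /= view_pos.w;
    frag_color = vec4 (view_pos.xyz, 1.0);   // placeholder output
}
```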
I'm sure there's much much more.
These days, I often laugh to myself because years ago, I was told to ditch PVS and BSP from QuakeForge's renderer because "GPUs are fast enough, just throw the whole map at the GPU and be done with it", but that applies only to Quake's extremely basic lighting model (light-maps). Putting "proper" lighting (ie, real-time, per-pixel) in requires doing more work on the CPU, especially when using the map-provided lights (an average of over 100 lights visible just in the demos). I took one look at those numbers and simply noped out of doing it in a forward renderer. Around 400fps in a deferred renderer (no shadows yet; worried about that, though).
And for reference:
https://github.com/quakeforge/quakeforge (and yes, that's the "-qf" in my name)