The same warning that applied to my last behemoth applies to this monstrosity, but doubly so; you have been warned. If you do decide to take up the challenge, then there will be a multiple choice exam in the morning!
No worries; if I am not playing I am here reading

My reply is also quite long, sorry about that.
No need to apologise, as I said earlier I am enjoying this discussion.
I need to double check, but I believe I am running at max settings, all on max, every single option on, at 110 FPS, on a 50Hz refresh rate (that's what my 1080p LCD uses).
Is this internal to a station or out in deep space, (not in an asteroid field)? Because if this is what you're getting in deep space, then that is not enough, especially at 50Hz.
Vsync is not doing much, since I use a 60Hz display... if you output 120 FPS, your monitor can still only display at 60Hz, which is about 60 screens per second; so 60 FPS. Locking your card to the refresh rate of the monitor helps to avoid tearing, because your image output from the graphics card is higher, which forces the monitor to do blending, since it can't go faster than that.
Many times people forget that the monitor limits the number of frames that you can display (refresh rate: how many times per second the raster goes from the top-left pixel to the bottom-right; FPS: frames drawn per second by the video card).
Yup, that's pretty much how it works; but the point of turning VSync off is to get a rough idea of what the raw rendering performance of your system is with respect to ED's graphics renderer.
The monitor doesn't do any blending when your game outputs more frames than the monitor can handle per second; remember, the GPU's synchronisation circuits and pixel shift registers are all perfectly matched to the monitor's input characteristics, i.e. pixel dot clock, HSync, VSync, HBlank, VBlank, etc. What actually happens is that your game renders to a section of the GPU's memory, (backbuffer, rendertarget, etc), at say 120Hz, and then the GPU's display controller reads the contents of the frontbuffer, (after a backbuffer to frontbuffer swap), out at whatever frequency it is locked in with the monitor, e.g. 60Hz. The tearing that occurs on the monitor in this situation is actually happening in the local memory of the GPU, as the ROPs write the final pixel values to memory at one rate, while the display controller reads them out at a lower rate.
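To make the VSync point concrete, here is a minimal sketch, (using GLFW and OpenGL purely as an example, this is obviously not ED's renderer), of the kind of test loop I mean: with the swap interval set to 0 the game renders into the backbuffer as fast as the GPU allows, while the display controller still scans the frontbuffer out at the monitor's fixed refresh rate, so the printed FPS tells you the raw rendering performance.

    // Minimal VSync test loop (GLFW assumed, not ED's code).
    #include <GLFW/glfw3.h>
    #include <cstdio>

    int main()
    {
        if (!glfwInit()) return 1;
        GLFWwindow* window = glfwCreateWindow(1280, 720, "vsync test", nullptr, nullptr);
        if (!window) { glfwTerminate(); return 1; }
        glfwMakeContextCurrent(window);

        glfwSwapInterval(0);                // 0 = uncapped (raw render rate), 1 = locked to the refresh rate

        int frames = 0;
        double last = glfwGetTime();
        while (!glfwWindowShouldClose(window))
        {
            glClear(GL_COLOR_BUFFER_BIT);   // stand-in for the real scene render
            glfwSwapBuffers(window);        // backbuffer <-> frontbuffer swap, NOT a context switch
            glfwPollEvents();

            ++frames;
            double now = glfwGetTime();
            if (now - last >= 1.0)          // print the achieved FPS once per second
            {
                std::printf("FPS: %d\n", frames);
                frames = 0;
                last = now;
            }
        }
        glfwTerminate();
    }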
Yup, referring to the Oculus SDK. I made two demos with Unity 3D and one in pure C++; I am no expert with it, but I know my way around it. Like any API that is still a work in progress, it has limitations to overcome; some you can circumvent with your coding skills, but for others you can't use anything that is not exposed by the public interface. I am waiting for the final release of the SDK before marking it as a bad SDK. It is usable, if that's what you mean... after all, many people can drag and drop the Rift controller into Unity and 70% of the work is done for them.
All my work is C/C++ and as I said I find the Rift SDK more than adequate for a beta release; I have in fact used much worse fully released APIs, *coughrenderwarecough*, in the past.
If there were such a thing as a 20% increase in performance from the engine, that MAY improve my performance, but it is hard to say, since we are dealing with a complex system that has three main variables: the SDK from Oculus, the 3D engine from the game and the OS.
Agreed, but there is one more crucial fundamental that you have already mentioned in a roundabout way, and that is your personal experience. One of the issues with the DK2 is that it is a much more subjective experience than a normal 2D display for a whole myriad of technical, biological and psychological reasons. Have you actually tried the DK2 with ED on a top of the line system? Maybe VR in its current form just doesn't work for you, no matter how good the performance is? Maybe you are overly sensitive to even the most minute amount of stutter, as ED is not stutter free even in deep space on my system, (it's close but not 100% smooth), though those other games you mention are 100% smooth and locked at 75Hz.
OK, never mind, you answered these questions in your next paragraph, (apart from the stutter sensitivity), but I will still leave the previous paragraph intact as I think it is an important, but mostly undefined metric; it may never be measurable and may always remain a completely subjective and personal experience.
The only high level analysis that I can make is that other games work fine with no stutter and no ghosting, and they are just as intensive (graphics wise; physics is handled mostly by the CPU, not the GPU, unless you tell the GPU to use spare cycles), so if the other variables are the same (same OS, same SDK), it means that the variable influencing the overall outcome is the game engine.
The games you have already mentioned work fantastically on my system as well, but I'd be curious to see how you go with something like Euro Truck Simulator 2, as that is notorious for stuttering even on high end systems, (when you max the graphics options).
True, my bad; if you live on Windows, yes.
Well this discussion is about ED and yeah it currently lives on Windows

At work I write GLSL shaders all day for Linux/OSG/OpenGL and at home I still tinker with my DirectX 3D/HLSL engine and Android/OpenGL ES engine and DK2.
There is no such thing as DX on OSX; there the SDK uses OGL. I just moved back to Windows after 14 years of OSX; I am a bit rusty on DX so I need to read more. I recall that DX also does context switching like OGL, but it hides it from the user, so you don't have to switch contexts and just send draw commands once done. I will look into it once I have a moment.
Not sure how you can do it otherwise; these are the basics of how OGL works; if you don't declare each context and load the framebuffer, render the scene and switch to display it, there is nothing on screen.
I think there is some miscommunication going on here; in a normal game/application there is only ever one context, (forget Equalizer for OpenGL and deferred contexts for DirectX); each game/application/window has its own singular context, (sure, for some special situations an app can create multiple contexts, but let's just disregard that for now). Here is a link to OpenGL's description of a context,
https://www.opengl.org/wiki/FAQ, as you can see you only need to create it once at application initialisation and then use it to interact with the OpenGL stack.
I think you are getting confused between the terms context and rendertarget/FBO/backbuffer; backbuffers are the surfaces that are swapped after a present/flush in a normal game's render loop. Of course if the system swaps your application out for another one, (that has its own context), then the system flushes the driver's state/caches/memory manager and the GPU's pipelines/state/caches and gives control to the new application and its context; this is the only time contexts are being swapped, (this is not strictly true, but for this discussion it is accurate enough).
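To put that in code form, here is a minimal sketch, (GLFW and GLEW assumed purely for illustration, none of this is ED's code), of the terminology I'm using: one context is created once at startup, and inside the render loop the only things that ever change are which rendertarget/FBO is bound and which buffer becomes the frontbuffer after the swap; the context itself is never touched again.

    // One context for the lifetime of the app; the loop only switches rendertargets and swaps buffers.
    #include <GL/glew.h>
    #include <GLFW/glfw3.h>

    int main()
    {
        glfwInit();
        GLFWwindow* window = glfwCreateWindow(1280, 720, "one context", nullptr, nullptr);
        glfwMakeContextCurrent(window);     // the ONLY context work the app ever does
        glewInit();

        GLuint fbo = 0, colourTex = 0;
        glGenFramebuffers(1, &fbo);         // an off-screen rendertarget, NOT a context
        glGenTextures(1, &colourTex);
        glBindTexture(GL_TEXTURE_2D, colourTex);
        glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 1280, 720, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
        glBindFramebuffer(GL_FRAMEBUFFER, fbo);
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, colourTex, 0);

        while (!glfwWindowShouldClose(window))
        {
            glBindFramebuffer(GL_FRAMEBUFFER, fbo);     // switch rendertarget (cheap)
            glClear(GL_COLOR_BUFFER_BIT);               // ... draw the scene here ...

            glBindFramebuffer(GL_FRAMEBUFFER, 0);       // back to the window's backbuffer
            glClear(GL_COLOR_BUFFER_BIT);               // ... composite/post-process here ...

            glfwSwapBuffers(window);                    // back/front buffer swap, not a context switch
            glfwPollEvents();
        }
        glfwTerminate();
    }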
This is actually what happens every frame... load, process, draw, switch context, display and flush. Some info is cached, like light raycasts, mesh data (if relevant to the culling area) and such.
There should never be any loading happening on a frame by frame basis, maybe streaming of resources into dynamic resources in an advanced engine, (like ED). But once you get to the top of the render loop, all your static assets, (vertex buffers, index buffers, textures, shaders, etc), had better all be in a defined and hardware optimised format in local memory, (GPU memory), or you are going to be in for a world of performance hurt. Dynamic assets are obviously different; they have completely separate access methods and code paths within the driver as to their usage and control, but again, if you overuse them your engine will suffer, as your GPU will start stalling as it waits on CPU synchronisation of these dynamic resources.
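As a sketch of that distinction, (plain OpenGL calls, assuming a context and extension loader are already set up; the function names are mine, not from any engine), static geometry is uploaded once before the render loop ever starts, while genuinely dynamic data goes through a separate per-frame path written to avoid CPU/GPU synchronisation stalls.

    #include <GL/glew.h>
    #include <vector>

    // Called once at initialisation: after this the application never touches the data again.
    GLuint createStaticVertexBuffer(const std::vector<float>& vertices)
    {
        GLuint vbo = 0;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(float),
                     vertices.data(), GL_STATIC_DRAW);   // hint: upload once, draw many times
        return vbo;
    }

    // Called every frame for genuinely dynamic data (particles, UI, etc):
    // orphaning the old storage lets the driver avoid stalling on the GPU still reading it.
    void updateDynamicVertexBuffer(GLuint vbo, const std::vector<float>& vertices)
    {
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(float),
                     nullptr, GL_DYNAMIC_DRAW);          // orphan the previous storage
        glBufferSubData(GL_ARRAY_BUFFER, 0,
                        vertices.size() * sizeof(float), vertices.data());
    }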
In fact one of the major ways of decreasing performance is in not optimising your resources for the API/GPU you are currently working with; some simple examples:
- Using non hardware based, non compressed textures; this can cause 2x-8x more GPU memory being used and a proportional increase in fill rate and TMU, (Texture Mapping Unit), requirements.
- Shader patching; changing resource formats on the fly for shaders, (such as texture formats), can cause the vendor specific driver to have to recompile the shader on the fly and store it as a new entry in the shader cache. Not only is the compiling of shaders a heavyweight operation, but it places additional strain on the shader caching mechanism.
- Too many programs/shaders being switched per frame, (this can be devastating), as it can stall the GPU; shader switching and constant re-loading are not lightweight operations and cause shader cache thrashing, (see the sketch below).
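To illustrate that last point, (DrawItem and the function name are hypothetical, this is just the general technique and not ED's renderer), sorting the visible draw list by shader program means glUseProgram is only called when the program actually changes, instead of potentially once per object.

    #include <GL/glew.h>
    #include <algorithm>
    #include <vector>

    struct DrawItem            // hypothetical per-object record
    {
        GLuint program;        // shader program used by this object
        GLuint vao;            // its geometry
        GLsizei indexCount;
    };

    void drawSortedByProgram(std::vector<DrawItem>& items)
    {
        // Sort so that all items sharing a program are adjacent.
        std::sort(items.begin(), items.end(),
                  [](const DrawItem& a, const DrawItem& b) { return a.program < b.program; });

        GLuint current = 0;
        for (const DrawItem& item : items)
        {
            if (item.program != current)      // switch programs only when we really have to
            {
                glUseProgram(item.program);
                current = item.program;
            }
            glBindVertexArray(item.vao);
            glDrawElements(GL_TRIANGLES, item.indexCount, GL_UNSIGNED_INT, nullptr);
        }
    }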
Here is the real issue we have been discussing: "switch context". The render loop does not switch contexts, (a heavyweight operation); it swaps a backbuffer with the frontbuffer, (a lightweight operation), which is at best a pointer swap in the backbuffer swap chain for fullscreen DirectX applications, or at worst a GPU fast blit for windowed applications in both DirectX and OpenGL. This swapping of backbuffers is orders of magnitude more efficient than a worst case context switch.
There are in fact wait cycles; the CPU sends a semaphore to the GPU; once the GPU is done and switches context, the CPU resumes operations, releasing the lock, and sends new data from main memory to GPU memory. This can be done in chunks or multiple frames at a time (depending on how many threads you have). You would be surprised at how many cycles are wasted when the CPU and GPU communicate with each other. This is the nature of any IC communication protocol: you cannot send the next batch of instructions unless you get a confirmation from the "slave" that the previous operation has been completed. This is true for network protocols, for the internal bus between core and cache memory, or CPU and memory... and obviously between the CPU and GPU.
Semaphores aren't really being sent anywhere; they are just variables, (usually implemented with non interruptible CPU instructions), that act as arbitrators and counters for resources that are accessible by multiple threads/processes, (I know I am being picky here, but I think it's important to make sure we are on the same page).
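For what it's worth, here is a tiny sketch of what I mean by a semaphore being just a counter in memory, (C++20 standard library, nothing to do with any GPU driver's internals; the command buffer idea is only an illustration):

    #include <semaphore>
    #include <thread>
    #include <cstdio>

    // One "slot" that a producer fills and a worker consumes -- the semaphore is
    // just an atomically updated counter living in ordinary memory.
    std::counting_semaphore<8> filledSlots(0);   // how many command buffers are ready

    int main()
    {
        std::thread worker([]
        {
            filledSlots.acquire();               // atomically wait until a buffer is ready
            std::printf("worker: consumed one command buffer\n");
        });

        // ... pretend we filled a command buffer here ...
        filledSlots.release();                   // atomically signal: one buffer is ready
        worker.join();
    }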
Locking can happen at multiple stages of the entire rendering process:
- Within your game.
- Within the API, OpenGL and DirectX, (at user and Kernel levels).
- Within the GPU vendor specific driver at the kernel level.
Let's use an extremely simple example of a game that is single threaded and where all resources are static, (there are plenty of games that are still written this way by the way, even AAA titles): the static assets are loaded during the initialisation phase, converted to appropriate hardware formats and uploaded into the GPU's local memory via the PCIE bus, before even one iteration of the main rendering loop has commenced.
- There is no locking required at the application level whatsoever, because the application is not using multiple threads and not using dynamic GPU data structures, (i.e. lockable and modifiable GPU resources). This application has effectively signed an unwavering contract that it will never attempt to modify the static resources it has uploaded to the GPU during its initialisation phase. The GPU can then make assumptions about where to store the assets in its local memory, in whatever formats are most optimal, (i.e. usually hardware optimised formats), knowing that the application will never attempt to access them again, (apart from freeing them at application destruction).
- Locking within the API will be minimal to non-existent; internal data structures such as queues and command buffers are filled until a present/flush, and these data structures are double buffered, so that while the vendor specific driver, (which is multi-threaded), starts converting them into vendor specific operations, the application switches to a discarded queue and begins filling it with new GPU commands, (see the sketch after this list). This pipeline can be increased in size even further with the Flip Queue Size parameter, which allows the API to collect multiple presents/flushes before passing them onto the vendor specific driver. This makes the huge pipeline even bigger and more efficient, (in reducing GPU stalls), but adds more latency to the user input of the application.
- Locking at the kernel level within the vendor specific driver is mostly an unknown, (well to me anyway), but this is where traditionally a poorly written application will waste a lot of time, as shaders are recompiled, shader cache thrashing rears its ugly head, resource formats are converted into hardware specific formats, application specific hacks are applied, (i.e. you would be surprised how many shaders are replaced within the driver by vendor specific "optimised" versions for popular games, to get better FPS in benchmarks and reviews), and excessive memory management of GPU resources occurs. Any locking that does occur is at the kernel thread level within the vendor specific driver and does not affect the application rendering thread.
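Here is the double buffered queue idea from the second point in sketch form, (the Command/DoubleBufferedQueue names are invented for illustration and bear no relation to the real D3D/OpenGL internals): the application always records into the "fill" queue, and a present/flush simply swaps the two queues, so a worker thread can drain last frame's commands while the next frame is being built, with no locking on the application's rendering thread.

    #include <cstdio>
    #include <utility>
    #include <vector>

    struct Command { int id; };                 // stand-in for a real GPU command

    struct DoubleBufferedQueue
    {
        std::vector<Command> fill;              // written by the application thread
        std::vector<Command> submit;            // drained by the driver/worker thread

        void push(const Command& c) { fill.push_back(c); }

        void present()                          // called once per frame at the flush/present
        {
            std::swap(fill, submit);            // hand last frame's commands to the worker...
            fill.clear();                       // ...and start the next frame with an empty queue
        }
    };

    int main()
    {
        DoubleBufferedQueue q;
        for (int frame = 0; frame < 3; ++frame)
        {
            q.push({frame});                    // application records draw calls
            q.present();                        // swap queues instead of waiting for the GPU
            std::printf("frame %d: submitting %zu command(s)\n", frame, q.submit.size());
        }
    }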
The major point here is that there is minimal locking in a simple and well written application, if any, (I'm not talking about initialisation and creation, but obviously about the rendering loop itself), and what locking there is only locks the particular thread involved and does not stall the GPU, as long as the CPU is still filling the massive pipeline with draw calls, which in a balanced system with no configuration issues running a well behaved application, (i.e. a fast GPU with a CPU capable of feeding it), is happening 100% of the time.
Even if locking does occur at the kernel level in the multi-threaded vendor specific driver, this doesn't stop the application's thread(s) and the API, (DirectX/OpenGL), threads from continuing their work.
In ED I can't get this all the time, even at minimum detail. I understand that ED has more objects to move, compared to the tinderbox map size of SC, but the models in SC are much more complex, polygon and texture wise, than ED's; and it uses one of the most power hungry engines ever made. The expectation was that ED would run smoothly, while SC would stutter and give ghosting.
I would expect the opposite given that ED is about to release with a fully featured galaxy, multi-player support and real time generation of procedural content; SC really still is in the prototyping and demoing stage of small isolated and controlled "rooms".
Maybe you should check out the VR multi-stuttering thread, there are some useful experiments going on over there and being able to show FRAPS graphs of what you are experiencing may allow others to offer suggestions. At the very least you could compare your FRAPS graphs to the DK2 graphs that I and others posted.
Yes, that's obvious

Each instance is a defined size; you can tell since you can't go from system A to system B just with supercruise; and the number of players per instance is limited.
I can't put any numbers on what I see, but I suspect that each instance is pretty big (although mostly empty... there is not much besides planetoids and space stations); but the data to draw each instance, I suspect, is quite small. Traveling through systems is done with instance changing (covered by the hyperdrive animation); not different from what SC does when you land on a planet.
Yup I see it the same way, with the additional detail that it seems that the current displayable area per user is limited to a solar system and that multiplayer islands are probably much smaller still.
True, each camera draws on its own, which is the same effect that you may experience in any racing game that uses rear-view mirrors. When you look from cameraA, you are in the cockpit, while cameraB is behind the car mesh, rendering to a surface (usually a texture, which is what the Rift SDK does too), so you can see 2 different camera outputs at the same time. This technology has been used for 25 years or so; nothing new under the sun.
No, it's nothing new under the sun in terms of implementation, but I feel you are missing the point. Rear view mirrors, shadow casting cameras, remote view drones, etc, usually take advantage of huge optimisation possibilities so that they can be rendered in a small percentage of the total render time. e.g. You are never going to code rear view mirrors that double your render time; you would use a small render target resolution, small FOVs, low detail shaders, low detail LODs, etc, so that the rear view mirror might increase your render time by 10%-20%. The DK2 by necessity requires a doubling of your scene graph traversal/render list traversal/culling and draw primitive generation times, and then the actual vertex processing by the GPU. It also requires nearly a doubling of fill rate requirements because of the extra burden of the enlarged eye render targets for optics correction.
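As a rough back of the envelope comparison, (the numbers are illustrative guesses, not measured ED or DK2 figures), the pixel counts alone show why a mirror is cheap while the DK2's enlarged per-eye rendertargets are not:

    #include <cstdio>

    int main()
    {
        const double mainView  = 1920.0 * 1080.0;   // ~2.07 Mpixels for a normal 1080p view
        const double mirror    = 256.0  * 128.0;    // a small, low-LOD mirror rendertarget (guess)
        const double eyeBuffer = 1182.0 * 1461.0;   // one oversized eye rendertarget (guess)
        const double dk2Total  = 2.0 * eyeBuffer;   // both eyes, before the distortion pass

        std::printf("mirror adds roughly %.1f%% of the main view's pixels\n",
                    100.0 * mirror / mainView);
        std::printf("DK2 needs roughly %.0f%% of the main view's pixels\n",
                    100.0 * dk2Total / mainView);
    }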
As you pointed out, there are 2 cameras, which means you need to render the same scene twice; and since you switch context each time, your wait cycles double. The only saving grace is that our eyes are tied together, so you have a defined space between their fields of vision, and most of it is also overlapping. This helps you avoid drawing 2 distinct scenes, since you can interpolate what A sees, transfer it to B and just recalculate what is different between the 2. It is less expensive than having A and B pointing at 2 different spots (like in a car game, which is why the mirrors are usually so small), but it still takes a toll on the system.
You aren't switching contexts, you are switching FBOs/render targets for the two cameras; this is an extremely lightweight operation, (on modern hardware), and there are no wait cycles, not on the GPU or on the CPU. Switching the FBO/render target is just another GPU call; it's not free or instant, but it is measured in nanoseconds.
You also can't cheat by interpolating, (well you can, by using the depth buffer of the original non DK2 scene), but the results are horrible, (I've tried it). The scale is non-linear, so it doesn't look or feel right, and there are quantisation errors due to the screen pixel fidelity which just don't work when projecting back into world/camera space. You have to render both scenes separately and uniquely as if they are two separate cameras, because that is what they are. So it is not "less expensive than having A and B pointing at 2 different spots"; it is exactly the same as having A and B pointing at 2 different spots.
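In sketch form, (the eyeFbo, setViewMatrix and drawScene names are placeholders of mine, not the Oculus SDK's API), each eye is a full, independent render of the same scene into its own rendertarget; the only things switched between the two passes are the FBO binding and the view matrix, neither of which is a context switch.

    #include <GL/glew.h>

    static GLuint eyeFbo[2];                      // per-eye rendertargets (created at init)
    static void setViewMatrix(int /*eye*/) {}     // placeholder: offset the camera by +/- IPD/2
    static void drawScene() {}                    // placeholder: the same scene, traversed per eye

    void renderBothEyes()
    {
        for (int eye = 0; eye < 2; ++eye)
        {
            glBindFramebuffer(GL_FRAMEBUFFER, eyeFbo[eye]);     // cheap rendertarget switch,
                                                                // NOT a context switch
            glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
            setViewMatrix(eye);                                 // unique camera for this eye
            drawScene();                                        // full scene render, no interpolation
        }
        // ... the distortion/present pass reads both eye textures afterwards ...
    }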
On top of that, the monitor pipeline has to deal with twice the traffic from 2 cameras, and the Rift does not use the most advanced LCD screen. If you have 2 monitors, the data runs in parallel on 2 different pipelines. This just adds on top of everything else.
It doesn't matter if you use two monitors or one, the actual work done by the GPU is effectively the same; the only things that run in parallel with two monitors are the display controllers built into the GPU, (which I mentioned earlier), and the electrical signals driving the monitors themselves. Now if you had two GPUs, (and the appropriate VR SLI/Crossfire driver), each driving one half of the display or driving two individual displays, then yes, you would get work done in parallel and a corresponding performance increase.
BTW you could just run the config setup and change the IPD there, no need to rebuild.
This is incorrect; the IPD is the distance between your pupils when your eyes are focused at infinity. The HMDInfo.LensSeparationInMeters parameter is a value returned by the DK2 itself as a measurement of the distance between its lenses. The SDK uses these values in different ways, (check the source code, it's all there). IPD is adjusted for world scale and user comfort; HMDInfo.LensSeparationInMeters has to be the physical distance between the centres of the lenses.
The issue, as you mentioned, is that there is no physical way to change the IPD, because in binoculars there are 2 lenses... here you deal with 1 monitor, and we already lose part of the center area behind the contraption of the Rift. Another reason why I would rather have 2 smaller 1080p monitors, one for each eye, instead of one big monitor divided in 2.
Two displays, one for each eye, is definitely a viable alternative, but it has its cons apart from its obvious pros.
- It's more expensive than one display; obviously you need two.
- It's a LOT more expensive than one standard mobile phone display, because now you have two non standard displays that have to be manufactured purely for the Rift, rather than using mass produced consumer technology that is already available.
- Synchronising two displays EXACTLY is, I would guess, more problematic than it first seems; I would assume that the eyes are extremely sensitive to differences in flickering images presented to them individually. You can imagine the beat frequencies that would present themselves even if the differences were only minor, (this is just conjecture).
Interesting, I will take a look at this VRGear Interceptor. BTW how do you resolve the issue with tilted planes and aliasing? Once you rotate a plane, the lines show aliasing at the current resolution; you can't really do much, unless you are able to blur the lines (which affects the readability of text).
There is no issue with tilting and aliasing; what happens is that once the lenses match the actual distance between your pupils, your eyes start using the optically accurate part of the lenses. So the image attains the maximum focus and clarity possible; it's the same as if you used the DK2 stock, but physically moved your eyes to be 63.5mm apart.
This is actually what I was trying to accomplish with my post... I was not aiming to cause any kind of animosity.
I may have over-reacted in my first post to your frustration, but as I keep saying, this is a thoughtful and logical discussion worth having; there is no issue with any perceived animosity on my end.
On the contrary! I am not saying that they did a bad job, but that there is that extra mile that is not taken, because "it is ok as is". I remember those days; I was involved in a game with a proprietary engine (at the time of Dark Age of Camelot, to give an idea), which was built by us from the ground up. I also remember the time when there was no 3D, and the programming was done in ASMOne, moving blitter and copper data in assembly; not sure if it was more tedious than dealing with vertices nowadays, since everything is 3D.
I had an Amiga 500, 1200 and 3000. I remember coding in asm for the blitter, (in Agnus), and the display coprocessor, (the Copper; from memory it had 3 instructions: move, wait and skip, right? It's been a long time). I never released any commercial games on it though, but I did complete my masters in pattern recognition on TMS32010 fixed point DSP and TMS34010 display processor boards that I designed, built and wrote the software for, which hooked up to my Amiga 3000 as the controlling UI.
I just mourn the loss of that attitude, where the code was over optimized, because even if there were graphics accelerators and memory expansions, the average machine was either an Amiga 1200 with a 14MHz 68EC020 and 2 MB of RAM, or a PC AT with a 386 DX and at most 4 MB of RAM. You wouldn't write code and be sloppy on optimization, hoping that people would upgrade; that was not the mentality at that time, and I am sure you remember it, since you are older than me.
There is no doubt that software engineering has changed in that regard; back in the day you were forced to optimise because of the severe hardware limitations, (and to some extent software limitations), you were faced with. What inevitably separated the great games from the rest, (apart from great gameplay), was the extent that the developer went to in wringing out every last CPU, Blitter, Copper and DMA cycle to perform what seemed like miracles at the time.
Now we live in a world where there are so many layers of software between you and the hardware, (with a corresponding exponential increase in bugs), that it is much more difficult to hit the metal, let alone know how it works. Hopefully new APIs like Mantle and DX12 will change that to some extent, and the difference between games developed by gurus and games developed in APIs like Unity will be like the difference between chalk and cheese.
P.S. Sorry I took so long to post this; I had some other matters I had to attend to, and of course this took a not insubstantial amount of time to write as well.
- - - - - Additional Content Posted / Auto Merge - - - - -
I do not know it for a fact. But the only thing that makes sense to me is that the point data of the ca. 170,000 known star systems are in the installed package, and in the upgrade patches (for any corrected or new astronomical telemetry). I do believe the only data sent server-to-client is updates to the economy and politics of that system. Any solar system entered for the first time (not among the 170K known stars) has data that is generated by the seeds. I suspect each planet's surface texture is generated from a set of base types (Terran, Desert, Volcanic etc); the location and the orbits are all generated based on the seeds. I suspect these are nested seeds (seeds within seeds). Again, I do not know it, but I can take an educated guess since I know a little about procedural things.
Aaaah ok, nice, I didn't realise that the known part of the galaxy was actually within the client's resources.
I think your educated guess is spot on.
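Just to illustrate the "seeds within seeds" idea, (this is purely speculative on my part, a guess at the general technique and definitely not Frontier's actual code), each level can derive a child seed by hashing the parent seed with an index, so any procedurally generated system or planet can be regenerated identically on every client without shipping or sending its data:

    #include <cstdint>
    #include <cstdio>

    // A small deterministic mixing function used as a stand-in hash for deriving child seeds.
    static uint64_t mix(uint64_t seed, uint64_t index)
    {
        uint64_t z = seed + 0x9E3779B97F4A7C15ull * (index + 1);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ull;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBull;
        return z ^ (z >> 31);
    }

    int main()
    {
        const uint64_t galaxySeed = 1984;                     // hypothetically shipped with the client
        const uint64_t systemSeed = mix(galaxySeed, 42);      // e.g. the 42nd procedural system
        const uint64_t planetSeed = mix(systemSeed, 3);       // e.g. the 3rd body in that system

        // Everything about the planet (base type, orbit, texture parameters) can now be
        // drawn deterministically from planetSeed, identically on every client.
        std::printf("planet seed: %llu -> base type %llu\n",
                    (unsigned long long)planetSeed, (unsigned long long)(planetSeed % 6));
    }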