This post explains it very well.
Their testing is sub-par. They clearly only do manual play-testing, and retrofitting automated testing into the project at this stage would take a lot of time and money.
Automated testing of some of the features and bugs of Elite Dangerous would be extremely difficult just because of what it *is*.
Unit tests are easy to write if the code is structured to support them. They help to stop bugs in the behaviour of individual functions. Given the sort of bugs we don't see in live builds, Frontier probably have a ton of these.
Functional tests are harder to write. They help stop bugs in the behaviour of individual features. In this context, a feature would be (at its biggest) "a pulse laser" or "the ability for a ship to roll", rather than anything a player might think of as a feature. They're basically a check that you've put the unit-tested functions together in a sensible way, so that calling a high-level function results in the right set of calls to various low-level functions and the job gets done. We don't often see bugs make the live builds that these would prevent, but there are a few now and then.
Acceptance tests are the hardest to write. They help stop bugs in the behaviour of the application as a whole. These check that a particular design goal or use case is met - e.g. "it is possible to undock a ship from a station". In the apps I write, most of my test-writing time is spent on these. They're very powerful, because they can test the behaviour of the application as its state changes.
So, for example:
- unitTest "testShipExplode" confirms that the ship-explosion function sets ship->exploding to true.
- functionalTest "firePulseLaser" confirms that the weapon can fire, various objects like the laser beam graphic get instantiated correctly, capacitor levels reduce, etc.
- acceptanceTest "weaponsTest17" confirms that firing a pulse laser on a target until it runs out of hull causes the target to explode (and not, say, the firing ship) ... and that the third time in a row the pulse laser fires you don't suddenly get massive recoil ... and so on
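None of us outside Frontier know their codebase or test framework, so purely as a sketch - every name and type below is invented - the first two levels might look something like this in C++:

```cpp
#include <cassert>
#include <string>
#include <vector>

// --- Invented stand-ins for game types, purely for illustration ---
struct Ship {
    bool exploding = false;
    float capacitor = 100.0f;
};

struct World {
    std::vector<std::string> spawnedEffects;  // e.g. beam graphics
};

// Low-level function covered by the unit test.
void shipExplode(Ship& ship) { ship.exploding = true; }

// Higher-level feature covered by the functional test: it wires
// several low-level pieces together.
void firePulseLaser(Ship& shooter, World& world) {
    shooter.capacitor -= 5.0f;               // drain the weapon capacitor
    world.spawnedEffects.push_back("beam");  // instantiate the beam graphic
}

// unitTest "testShipExplode": one function, one behaviour.
void testShipExplode() {
    Ship ship;
    shipExplode(ship);
    assert(ship.exploding == true);
}

// functionalTest "firePulseLaser": one feature, several functions
// composed -- check the observable side effects, not the internals.
void testFirePulseLaser() {
    Ship shooter;
    World world;
    const float before = shooter.capacitor;
    firePulseLaser(shooter, world);
    assert(shooter.capacitor < before);             // capacitor levels reduce
    assert(world.spawnedEffects.back() == "beam");  // beam got instantiated
}

int main() {
    testShipExplode();
    testFirePulseLaser();
}
```

Note that the functional test deliberately checks observable side effects (capacitor drained, beam spawned) rather than internals - that's what separates it from the unit test above it.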
Acceptance tests are extremely difficult to automate for a game like Elite Dangerous because acceptance tests care about (non-minimal) state. So to write them you have to set up a known application state, then carry out a known series of steps, then see if the resulting state matches the desired state in the important aspects.
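To make that shape concrete, here's roughly what something like "weaponsTest17" boils down to, reduced to a skeleton (again, everything here is invented and hugely simplified - a real one would drive the actual application):

```cpp
#include <cassert>

// Invented, hugely simplified whole-application state.
struct GameState {
    float targetHull = 100.0f;
    bool  targetExploded = false;
    bool  firingShipExploded = false;
};

// Stand-in for driving the real application by one step.
void firePulseLaserAt(GameState& s) {
    s.targetHull -= 25.0f;                    // each shot strips some hull
    if (s.targetHull <= 0.0f) s.targetExploded = true;
}

// acceptanceTest "weaponsTest17", as a skeleton.
int main() {
    GameState state;                          // 1. known initial state
    while (!state.targetExploded)             // 2. known series of steps
        firePulseLaserAt(state);
    assert(state.targetExploded);             // 3. the important aspects of
    assert(!state.firingShipExploded);        //    the resulting state match
}
```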
Let's take the recent comms menu bug that was introduced in 3.0.1 and fixed in 3.0.3 as an example.
All the unit tests for "displayBitOfCommsMenu" will have passed. So will the functional ones - there were lots of situations in which it worked. The bug occurred only after a particular - fairly common in real gameplay, as it happened - sequence of state manipulations. Most theoretical acceptance tests would also have passed. To catch this bug automatically you'd need an acceptance test which:
- added two player ships to an instance
- had one of them leave the instance
- had that one then check its comms panel and perform various actions on it
- had the test interpret the result of that sequence of actions against a known good result
...in a test environment that's running most if not all of the code, so effectively running (perhaps in a headless mode) two game clients and a set of game servers.
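Purely as an outline - in reality each stand-in below would have to wrap a whole running game client or set of servers, which is where the difficulty lives - the harness would do something like:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Invented stand-ins: "HeadlessClient" for a full headless game
// client, "Instance" for the game servers backing an instance.
struct HeadlessClient;

struct Instance {
    std::vector<HeadlessClient*> members;
    void join(HeadlessClient& c)  { members.push_back(&c); }
    void leave(HeadlessClient& c) {
        members.erase(std::find(members.begin(), members.end(), &c));
    }
};

struct HeadlessClient {
    std::vector<std::string> commsPanelLog;
    void commsAction(const std::string& action) {
        commsPanelLog.push_back(action);   // drive the comms panel UI
    }
};

int main() {
    Instance instance;
    HeadlessClient shipA, shipB;

    instance.join(shipA);                  // two player ships in an instance
    instance.join(shipB);
    instance.leave(shipB);                 // one of them leaves

    shipB.commsAction("open-panel");       // the leaver works its comms panel
    shipB.commsAction("select-channel");

    // Interpret the outcome against a known good result.
    std::vector<std::string> expected = {"open-panel", "select-channel"};
    assert(shipB.commsPanelLog == expected);
}
```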
That's not impossible to automate, but it's really difficult. And that's an *easy* acceptance test. Consider what a proper automated test that a massacre mission can be completed would need to do:
- find mission (the mission board can be pre-seeded to have one, since other tests will theoretically be checking generation, but it can't have an artificially low count)
- take the mission
- launch, fly to the right system, kill a bunch of stuff, fly back, dock
- hand in the mission
...remember that absolutely everything that occurs during this run must be either completely non-random, or have randomness that doesn't break the test, over hundreds of thousands of game-loop cycles.
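One standard way of taming that randomness (a general technique - I've no idea what Frontier actually do) is to have every system draw from a single seedable RNG that the test pins, so the whole run becomes reproducible:

```cpp
#include <cassert>
#include <random>

// If every system draws its randomness from one injectable engine,
// a test can fix the seed and the whole run becomes reproducible:
// NPC spawns, damage rolls, mission boards, all of it.
struct GameRng {
    std::mt19937 engine;
    explicit GameRng(unsigned seed) : engine(seed) {}
    int roll(int lo, int hi) {
        return std::uniform_int_distribution<int>(lo, hi)(engine);
    }
};

int main() {
    GameRng run1(12345);   // the acceptance test pins the seed...
    GameRng run2(12345);

    // ...so hundreds of thousands of game-loop cycles of "random"
    // events play out identically on every run of the test.
    for (int tick = 0; tick < 100000; ++tick)
        assert(run1.roll(0, 99) == run2.roll(0, 99));
}
```

Even then, you've only made the randomness repeatable - the test still has to drive the whole game through every one of those steps.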
So most acceptance tests - for any application - are not automated. And that means that instead of each test completing in seconds, someone has to sit down, set up the initial situation, run through the steps of the test, and assess the result. That rapidly gets both expensive and time-consuming the more such tests you have ... would the players be happy with months between the end of beta and live while the full test suite is run through?