Distant Worlds II Event Servers offline after the first jump of Distant Worlds 2. Come on Frontier!

Livyathan · Jan 14, 2019

Brett C said:
Hi folks, our servers are slowly returning to normal operations. We've seen many commanders make their way back into the galaxy.

If you can't get back in, give it a little while so that the servers can continue to spin up as needed.

Brett, is anything being done about this long-term?

Waiting for servers to spin-up isn't a reasonable expectation just because people are actually playing the game. Especially considering Frontier were well aware of what was happening last night at 8pm and have been for months.

I can't see any excuse for this tbh.

Hyde · Jan 14, 2019

I'm actually a specialist in cloud test engineering, performance testing, virtualising traffic to see how performant stacks are. Spin up times, even automatic systems can take a while, an eternity in real-time gaming context. Even AI ones would have a hard time with this scenario.

The smart move would have had organisers stagger launches (like in a rally) to meet server stack performance. Different groups at different times. Easy to say this after the fact but something for the 'lessons learned' session at the end of the sprint.

thistle · Jan 14, 2019

Surely it's NOT that there's any problem with so many people playing the game, as some are suggesting. It's so many playing in one system, all doing the exact same thing at the same time - jumping to the next system.

It's not normal you know

Livyathan · Jan 14, 2019

thistle said:
Surely it's NOT that there's any problem with so many people playing the game, as some are suggesting. It's so many playing in one system, all doing the exact same thing at the same time - jumping to the next system.

It's not normal you know

Frontier knew this was going to happen but didn't prepare.

Also, why should commanders not participating in this event be expected to put up with this? I'm taking part in it but if I wasn't, I expect I'd be a tad more peeved than I actually was last night (and I was mightily peeved). I had an early night instead.

Quite simply, it wasn't good enough.

Deadlock-3000 · Jan 14, 2019

thistle said:
Surely it's NOT that there's any problem with so many people playing the game, as some are suggesting. It's so many playing in one system, all doing the exact same thing at the same time - jumping to the next system.

It's not normal you know

Wow, people jumping from one system to another.

No-one saw that coming.

Valorin · Jan 14, 2019

Livyathan said:
Frontier knew this was going to happen but didn't prepare.

Also, why should commanders not participating in this event be expected to put up with this? I'm taking part in it but if I wasn't, I expect I'd be a tad more peeved than I actually was last night (and I was mightily peeved). I had an early night instead.

Quite simply, it wasn't good enough.

I am pretty sure you can't prepare to something like this. Stresstesting for thousands of people doing one specific thing at the same time is kinda impossible I can imagine.
Also, it was also pretty epic! That event brought the Universe to its knees! Why don't you stop blaming people and being annoyed for a while and smile at this epic occurence!
This is a game, it's all data on computers. It's not a god machine changing the quantum rules of the known offline Universe or something.

Also, again, thanks for working on Sundays, Frontier.

B1rdy · Jan 14, 2019

Alec Turner said:
I was just listening to FD's DW2 livestream from last week. On the subject of servers handling the load, Ed used phrases like "they've got it covered" and "there's no need to worry".

How can he say that, even though the game regularly freezes, due to, most likely, server problems?

Valorin said:
I am pretty sure you can't prepare to something like this. Stresstesting for thousands of people doing one specific thing at the same time is kinda impossible I can imagine.

Sure, especially this is the first time...wait. It is not, it happened before. They should already have data.

jonesskill · Jan 14, 2019

Not trying to poke or trigger anyone, but on gnosis event we heard that the instability problems are caused because the mission board refresh (Board hopping).

Now we don't have the hop, and the issues seems to be worse than ever.

danteuk · Jan 14, 2019

Valorin said:
I am pretty sure you can't prepare to something like this. Stress testing for thousands of people doing one specific thing at the same time is kinda impossible I can imagine.

Actually you can and you should and it's was know well in advance by Frontier. They should have had more servers ready to handle the load.

Anyone that tried to play on the release day for chapter 4 knows that Frontier never run enough servers to handle high load events. They seem to be prefer taking the flames here rather than paying for more servers for a limited time.

Also they just showed the world again the Elite Dangerous is not an MMO - because you can never get more than about 20 people in the same instance, so the massive event of thousands of CMDRs jumping out of the system at the start of DW2 is a joke, instead if you watch all the videos of the event, you'll see about 20 players trying to jump, shortly followed by a connection error!

danteuk · Jan 14, 2019

jonesskill said:
Not trying to poke or trigger anyone, but on gnosis event we heard that the instability problems are caused because the mission board refresh (Board hopping).

Now we don't have the hop, and the issues seems to be worse than ever.

I actually thought they had moved the chat and the missions stuff to a different server process, so a failure on the mission or chat service doesn't log you out.
Of course if those server processes are still on the same physical/virtual machine as the main 'transaction server that handles cmdrs instances' then a massive load will bring all of the services down.

ZeeWolf · Jan 14, 2019

danteuk said:
Actually you can and you should and it's was know well in advance by Frontier. They should have had more servers ready to handle the load.

Anyone that tried to play on the release day for chapter 4 knows that Frontier never run enough servers to handle high load events. They seem to be prefer taking the flames here rather than paying for more servers for a limited time.

Also they just showed the world again the Elite Dangerous is not an MMO - because you can never get more than about 20 people in the same instance, so the massive event of thousands of CMDRs jumping out of the system at the start of DW2 is a joke, instead if you watch all the videos of the event, you'll see about 20 players trying to jump, shortly followed by a connection error!

Which official bodies exist to define standards for the number of concurrent connections in a single instance to be classed as an MMO? Is it the European Union? Is that what the Brexit vote was all about?

Ian Doncaster · Jan 14, 2019

danteuk said:
I wonder if additional server load is caused by the NPC's too, since the instance I was in had about 12 players and about 30 NPC's!! local chat was just spammed by NPC's and the number wake signals on my radar from all the NPC's jumping out a random made it look even more busy. FDev should have done something to remove the NPCs from that system.

If DWE wanted to start from a system without NPCs, then there are plenty they could have chosen without requiring Frontier to poke stuff. Fortunately it shouldn't be a problem at most future meetups. It'll be interesting to see how the CGs go, though.

I don't think the NPCs will have caused much more server load directly, but what they do cause is much more P2P traffic to keep them in sync within the instance, and therefore smaller instance sizes, and therefore more instances total ... and that does seem to cause problems.

Rob At Work · Jan 14, 2019

When it comes to servers, past performance (numbers) is a guide to the future (in fact it's the only guide). If you keep the numbers up it will become stable. Then as they go down it seems to improve. BUT! from small to large (numbers) is always a problem!

VR SenseiMatty · Jan 14, 2019

Valorin said:
Also, again, thanks for working on Sundays, Frontier.

Did they?
I mean, someone for sure did something, but it is my understanding that the servers came back online only after thousands of CMDR's gave up and switched off ED relieving the servers load.

thistle · Jan 14, 2019

SenseiMatty said:
Did they?

Ed and Will confirmed today that the server team prepared, presumably with extra capacity, the server team I think were in (if I remember what they said right), but regardless people were indeed working on Sunday to support the start. Why would you think they weren't working on Sunday? Are you suggesting they were all at home watching Vera on Sunday night, when someone got a text saying btw the servers collapsed, can you have a wee look when you've finished your tea?

This place kills me. Even the people who apparently like the game and arent here just to troll for sh@ts and giggles ... I don't get it.

Change the name from Dangerous Discussions to Entitled Talk, and be done with it FD.

thistle · Jan 14, 2019

Deadlock-3000 said:
Wow, people jumping from one system to another.

No-one saw that coming.

That's very clever, I imagine you thought.

People jumping from one system to another is the definition of normal.

... so many playing in one system, all doing the exact same thing at the same time - jumping to the next system.

that is NOT normal.

danteuk · Jan 14, 2019

thistle said:
Ed and Will confirmed today that the server team prepared, presumably with extra capacity, the server team I think were in (if I remember what they said right), but regardless people were indeed working on Sunday to support the start. Why would you think they weren't working on Sunday? Are you suggesting they were all at home watching Vera on Sunday night, when someone got a text saying btw the servers collapsed, can you have a wee look when you've finished your tea?

This place kills me. Even the people who apparently like the game and arent here just to troll for sh@ts and giggles ... I don't get it.

Change the name from Dangerous Discussions to Entitled Talk, and be done with it FD.

I'm sure there had a couple of people on stand-by to monitor for a serious crash - but actually I think the issue was purely load based, so their cloud load balancing *should* have just fired up more servers to cope with the increased load, but that takes a little time, maybe only 2-3 mins on fast boot, but still time required to replicate the data to the new servers I guess. Anyway the point is the server team probably did nothing at all, they just sit and watch while the new server spawn and watch as players give up and do something else, so the load drops, the combination of the load dropping and new servers being spawned means after an hour or so everything is fine.
The fact that it look over an hour is a bigger worry and implies they set the limit on the max number of servers to launch too low.( it's normal to have a limit, stops a DDOS attack from causing 1000s of servers to run ).

Ian Doncaster · Jan 14, 2019

danteuk said:
The fact that it look over an hour is a bigger worry and implies they set the limit on the max number of servers to launch too low.( it's normal to have a limit, stops a DDOS attack from causing 1000s of servers to run ).

I'm just guessing here, but I wonder if there's a problem with certain types of load going up faster than linearly with the number of players, when those players are all in the same "place".

Wild guess: when you try to enter a location, it looks at all possible instances in that location, and then selects the "best" based on the various instancing factors, or a new one if all the existing ones are below some threshold.

So every new person entering a busy location first has to check all the existing instances, and then may sometimes create a new instance. As more people enter, the number of instances rises, so each new person takes even longer to add (and if lots of people are arriving at once, the instancing calculations may be invalidated by the time they're complete).

That means that the complexity of the instancing problem may be proportional to the *square* of the number of players, and that can lead to the load increasing very rapidly as the number of players in the same place increases. (So a few thousand people in one place will crash it, while a few tens of thousands spread out it'd barely notice)

Lead-up: servers are busy, but people are only joining the instances one or two at a time as they arrive. The load is high, and it takes a little while for them to sync in to the instances, but the server team are monitoring and spinning up servers as needed to cope.

Mass-jump: there's now a few thousand people all jumping to the same target system supercruise simultaneously. The instancing system has to cope with a load possibly hundreds of times higher than it was dealing with during the lead-up. Unsurprisingly, it falls over.

Recovery: those few thousand people all try to log in again. As they crashed out in hyperspace, they get recovered by the game to the "default" position in the new system. But there's a few thousand people now trying to log in at the same time to the same location, so this is basically the same instancing problem as the supercruise one that just killed the servers.

It may not even be a problem you can "just add more servers" to fix at this point - any individual player's instancing calculation may now be too complex and involve so much conversation between servers to work out that at some point adding extra servers might start making it worse by spreading out the data too much.

If I'm right about this, the solution is for people on mass jumps not to all jump to the same destination - pick a bunch of different systems in roughly the same direction.

Maro-Val · Jan 14, 2019

Yes they probably just sit and wait it out until many people stop playing due to the disconnections and then servers at some point recover. We are always told that they know the issue but never from a tech guy. They wont tell us the technical details ofc as there arent any.
It is just old plain simple "dont wanna spend any more money on it", to get this mess fixed. And this is going on for years. We just get the "mmo" advertisement and the fancy trailers, where the network is completely pants when 3 cmdrs get in the same instance.
Last night was embarassing..
Many sundays this has happened so dw wasn't the main problem.

VR SenseiMatty · Jan 14, 2019

thistle said:
Ed and Will confirmed today that the server team prepared, presumably with extra capacity, the server team I think were in (if I remember what they said right), but regardless people were indeed working on Sunday to support the start. Why would you think they weren't working on Sunday? Are you suggesting they were all at home watching Vera on Sunday night, when someone got a text saying btw the servers collapsed, can you have a wee look when you've finished your tea?

This place kills me. Even the people who apparently like the game and arent here just to troll for sh@ts and giggles ... I don't get it.

Change the name from Dangerous Discussions to Entitled Talk, and be done with it FD.

I'm suggesting they were at work but the issued solved only when players gave up trying to connect to the servers reducing the traffic load.
If the system is weak people can't make miracles.
The only solution to this problem is David Braben to open up the safe and spend some money on it.