Distant Worlds II Event Servers offline after the first jump of Distant Worlds 2. Come on Frontier!

Bryski · Jan 13, 2019

Ok I blame my instance

[video=youtube_share;D-tTyGhy7gc]https://youtu.be/D-tTyGhy7gc[/video]

EUS · Jan 13, 2019

Blackcompany said:
You know, for once, I have to take FDEV's side.

If our users told us they needed to host a special event with 10k logins to our servers at EXACTLY THE SAME TIME, we would most likely politely tell them "this is a REALLY bad idea. One that is very likely to slow the network dramatically, or even crash the servers, and will almost certainly have a negative impact on your event." It's so far outside the boundaries of normal user authentication or network traffic as to be something you simply do not test or prepare for.

FDEV should have been frank with the users. They should have requested staggered jumps. Should have warned people. Should never have pretended this was ok, or even possible. On the other hand...the players KNOW how this is going to end before they do it. But they do it anyway. And then they come here and complain when it ends exactly the way they knew it would.

Willfully submitting 10K simultaneous transactions to the game servers isn't an event. It's a DDOS attack.

There are 3 launches taking place, so it's not 10K people. This launch was for EUR people, US is tonight at 8 EST, and AUS a few hours after that.
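
To put rough numbers on the "one big jump" versus "several launches" point, here is a minimal back-of-the-envelope sketch. The player count, the jump windows, and the function name are all hypothetical, chosen only for illustration; none of them are Frontier's real figures.

```python
def peak_requests_per_second(total_players, waves, window_seconds):
    """Peak request rate if players are split evenly across `waves`
    launches and each wave's jumps are spread over `window_seconds`."""
    if waves < 1 or window_seconds <= 0:
        raise ValueError("waves must be >= 1 and window_seconds > 0")
    players_per_wave = -(-total_players // waves)  # ceiling division
    return players_per_wave / window_seconds


# Hypothetical: 10,000 commanders all jumping within about 5 seconds.
single = peak_requests_per_second(10_000, waves=1, window_seconds=5)
# Same crowd split across three regional launches (EUR, US, AUS).
regional = peak_requests_per_second(10_000, waves=3, window_seconds=5)
# Three launches, each asked to stagger its jumps over a full minute.
staggered = peak_requests_per_second(10_000, waves=3, window_seconds=60)

print(single, regional, staggered)
```

Under these assumed numbers, splitting into three launches cuts the peak to roughly a third, and staggering each launch over a minute cuts it by more than an order of magnitude again.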

MuttsNuts · Jan 13, 2019

As predicted, right on time.

Olivia Vespera · Jan 13, 2019

I think that while Frontier had ample warning, I don't believe any of us could have predicted that there would be 11,000+ CMDRs who signed up for the expedition.

CMDR Karrde Sun · Jan 13, 2019

I guessed this would happen. There were issues all weekend.

jasonbarron · Jan 13, 2019

I can't wait for the US jump tonight.

Cryopod · Jan 13, 2019

Thank you for the swift response.

amadeus1171 · Jan 13, 2019

Brett C said:
server turbulence

Sounds like nVidia's excuse of Crypto-mining hangover. lmao

And btw, "It just works!"

Vedmo · Jan 13, 2019

A player created event was so wildly successful the game couldn't handle it and had some hiccups, I'd say this is a good thing! Take your wins where you can get them.

Han Zulu · Jan 13, 2019

I'm surprised it wasn't worse! The game didn't crash while I was in the instance, and for me, the jump worked. It wasn't until I started to plot my route in the other system that the game hung up on me. But the game didn't crash, which is a first for me. All the other mass jumps I've been part of crashed the game or froze it up. This time I was just kicked out to the menu. A slight improvement, I think.

I decided to fly back to Panini, Pallanin, Palpaltini, or whatever it's called, and join the next jump event (unless our dinner guests have arrived by then).

TheoJones · Jan 13, 2019

Wizz15 said:
Do you know how AWS works? If so you probably know not every piece of AWS infrastructure will scale on the fly. Things such as databases (RDS) won't scale based on traffic. If those transaction servers are running on RDS, then this is probably what failed.

If it's about server (EC2) instances, they probably hit some maximum limit they have set up. AWS doesn't let you scale indefinitely, as this would probably make you bankrupt before an attack was over. And this was basically a DDOS attack.

Too many transactions at once (the jumps) and after that fell over it became a DDOS on the login system. I honestly don't think they could have done much about this, there are always limits somewhere you can hit somehow even with the best planning.

Respect to the people fixing this on the weekend.

Reasonably well, although I am more of an Azure guy.

However, whilst I appreciate your comments, it's not me that made claims about using AWS to scale like this; it's Amazon and FDev themselves. Amazon even use Frontier as a case study:

https://aws.amazon.com/solutions/case-studies/frontier-games/ and they say "By using AWS, Frontier Games can scale compute resources easily to handle large spikes in user traffic"

Here's the presentation: https://www.slideshare.net/AmazonWe...realtime-commodities-market-aws-reinvent-2015 - Have a look at Slide 18. They make bold claims about the autoscaling infrastructure, using ElastiCache and Auto Scaling groups in front of the RDS you mention. Notably, they still refer to this environment as "massively multiplayer".

You mentioned on another post about the rate of change being the thing - and that's why I am annoyed. I have been involved in the architecture of two major systems that are massively scalable - one which had to cope with a massive influx (150k+ users) all at pretty much exactly 8.30am, and one which responded to the "stick the kettle on" ad break in major soaps.

In both cases, the admin team pre-warmed FE instances to cope with demand rather than letting the auto-scaling work completely on its own. This event was well communicated ahead of time. FDev knew pretty much exactly within a few seconds when the spike was coming.

This should not have happened - and given the coverage of the event in the gaming press (hell, there has even been coverage in the non-gaming press), this absolutely should not have happened.
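
As a rough illustration of why pre-warming matters when the spike time is known, here is a toy simulation. It is only a sketch under made-up assumptions: the per-instance capacity, boot delay, instance cap, and demand curve are hypothetical, and `simulate` is not any real AWS API or a description of FDev's setup.

```python
import math


def simulate(demand, start_instances, per_instance=100, boot_delay=3,
             max_instances=50):
    """Return the number of requests dropped on each tick.

    demand: list of requests arriving per tick.
    The scaler is purely reactive: it orders instances once demand
    exceeds capacity, they come online `boot_delay` ticks later, and
    the fleet never grows beyond `max_instances`.
    """
    online = start_instances
    booting = []  # (ready_tick, count) pairs still starting up
    dropped = []
    for tick, load in enumerate(demand):
        # Bring online any instances whose boot has finished.
        online += sum(n for ready, n in booting if ready <= tick)
        booting = [(ready, n) for ready, n in booting if ready > tick]

        capacity = online * per_instance
        dropped.append(max(0, load - capacity))

        # Reactive scaling: order whatever is missing, up to the cap.
        wanted = min(max_instances, math.ceil(load / per_instance))
        pending = sum(n for _, n in booting)
        shortfall = wanted - online - pending
        if shortfall > 0:
            booting.append((tick + boot_delay, shortfall))
    return dropped


# Hypothetical load: quiet, then a sudden mass jump, then quiet again.
demand = [200, 200, 3000, 3000, 3000, 3000, 3000, 200]
reactive = simulate(demand, start_instances=2)    # scale only after the spike
prewarmed = simulate(demand, start_instances=30)  # capacity ready in advance

print("reactive dropped:  ", sum(reactive))
print("pre-warmed dropped:", sum(prewarmed))
```

With these assumed numbers the reactive fleet drops requests for the three ticks it takes new instances to boot, while the pre-warmed fleet drops none. The `max_instances` cap models the scaling limit mentioned in the quoted post: if demand exceeds what the capped fleet can serve, requests keep being dropped no matter how long you wait.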

xxThe_Dice_manxx · Jan 13, 2019

Seeing as things seem to have settled down, I think it was a success. I was expecting a complete meltdown of the servers and the community, which turned out to be a server crash, some brief moaning, and then business as usual.

Well done all o7

Jdude1 · Jan 13, 2019

2017 was Salome, 2018 Gnosis and the UA bombing of Dove Enigma. F-dev is the same old F-dev.

Roybe · Jan 13, 2019

Wow...Player explorers griefed the game better than any PVP or Open advocate group has ever done! Well done you lone explorers! Well done!

JesusRocks1988 · Jan 14, 2019

I think that, in honour of this occasion, Frontier should add some of those Q or P type (or other) Anomalies to float around the DW 3302 tourist beacon. Because that many thousands of hyperdrives all going off in the same place at the same time surely must have done something weird with space-time - maybe unleashed some anomalous life forms into the area.

Nice little marker of not only where DW 3302 started from, but also where DW2 broke reality.

IndigoWyrd · Jan 14, 2019

Simpleye said:
Typical Sunday night.
Laughed so hard a little pee came out.

So you usually let a little pee out on Sunday nights?

Kaltern · Jan 14, 2019

Bryski said:
Ok I blame my instance [video=youtube_share;D-tTyGhy7gc]https://youtu.be/D-tTyGhy7gc[/video]

...and they were never seen again.

Adept Geraden · Jan 14, 2019

AFAIK, US launch went off with little drama.

If that is in fact the case, well done FD.

Now keep it all together for the Oceania launch, because that's where I'm launching.

Zeeman · Jan 14, 2019

So.. All 2000+ CMDRs got into the same instance and jumped at once? I seem to remember it only took about 100 CMDRs back in DW!

Z...

Surefoot · Jan 14, 2019

TheoJones said:
This should not have happened - and given the coverage of the event in the gaming press (hell, there has even been coverage in the non-gaming press), this absolutely should not have happened.

Agreed. There were between 5K and 10K concurrent connections at the time of the jump (if we look at DW2 registration numbers, optimistically); that's absolutely nothing compared to some production servers running for 100K+ users connecting at the same instant. AWS by itself can handle that perfectly. I suspect the network code of Elite to be quite unhealthy.. (also, distributed write caches are a thing..)

Zeeman said:
So.. All 2000+ CMDRs got into the same instance and jumped at once? I seem to remember it only took about 100 CMDRs back in DW!

Z...

No. Instances were the usual ~20 max (around there).