Viajero
Volunteer Moderator
And once he does buy it, CIG will have started selling the Bengal. He needs to keep up to have a good experience!
He flew there in his mere Polaris. Rookie mistake!
Bearded-CIG said: Hello! For those that don't know me: I'm one of the server admin and observability engineers at CIG.
The 'what' isn't something we have an answer for, yet. But I do have a bug in for the fact that it's happening and it's being investigated.
What I do know is that the shards are getting into some kind of bad state, and when that happens, it stops processing the join queue for them. We have some graphs that help us identify them after they get stuck in this state. When they do, we manually isolate them, stow them, and see if they recover. If they recover, great! If not, we replace them with a working shard and permanently remove the bad one. We've also gathered debug info such as a Linux coredump, server logs, etc. and added it to the bug to aid with the investigation, so removing the bad shard won't cause any needed debug info to go missing.
Of course the next logical question that comes after this is: Why are people put into a join queue for a shard that is broken? Why does a person have to manually isolate these shards and replace them with a working one in order for people to play the game? The answer is: we have to know what is wrong before we can programmatically detect it and avoid matching people into it (notice earlier I said that our graphs are currently able to detect after the shard gets into this state, but not when). We also need to make sure that this detection is observable to a human engineer. Observability is important because if we do programmatically isolate an affected shard but that isolation isn't observable, it becomes possible for multiple shards to be broken without us knowing it, which in turn would make us run out of servers for players to join. Realistically, that scenario shouldn't ever happen because our matchmaker already has logic built into it that shows us when a shard is isolated from matchmaking, so the observability work is already done and any new isolation logic just has to be hooked into it. So really, we just need to figure out why they're getting into this bad state before we can isolate them.
Since we aren't yet able to observe exactly when the issue happens and can only currently identify the issue after it's affected players, it's possible for a bunch of shards to break and then only become obvious that they're having issues when the matchmaker tries to put players in them. If only a few shards are affected, it doesn't take all that long for us to help the environment recover but the more there are that have been affected, the longer it will take for things to work again.
Another question that I could see someone wondering is: Why can this issue only be detected after it happens rather than when it happens? This is pretty normal in video games. This is the third MMO I've worked on, and I've lost count of how many games in total. While developers do try their best to predict the future and add logging events to help investigate issues that arise, there is inevitably an issue that needs additional debug information added before we can figure out how to fix it.
So how is the investigation going? Well, we don't see this issue happen on the Public Test Universe (at least, we haven't yet), so it may only be an issue that shows up at higher player counts. That's going to make the process of iterative fixes slower, because we need to be safer with the kinds of changes we make on the public environment. New debugging info has been added to aid in the investigation of this issue, but it has not been hotfixed into the public environment. That hasn't been ruled out, but we have to gauge the risk carefully before doing so. If we decide to hotfix the extra debug into the public environment, that will give us a faster turnaround for knowing whether we need more debug, or whether there are fixes we can make based on the results of that debug info. If not, then we'll have to wait for the debug code to go through the PTU (which it currently is) and make its way to the public environment before we can continue with the investigation.
Source: Spectrum - v7.46.5 (robertsspaceindustries.com)
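To make the mechanics in that post a bit more concrete, here is a reader's sketch of the "detect after the fact, isolate, keep it observable" loop he describes. This is not CIG's code: the names (ShardSnapshot, looks_stuck, Matchmaker.isolate) and the stuck-shard heuristic (join queue growing while nobody is admitted) are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ShardSnapshot:
    """One scrape of a shard's matchmaking metrics (hypothetical names)."""
    shard_id: str
    join_queue_depth: int      # players currently waiting to join this shard
    players_admitted_1m: int   # players actually admitted in the last minute


def looks_stuck(history: list[ShardSnapshot]) -> bool:
    """Heuristic for the 'bad state': the queue keeps growing but nobody gets in.

    Like the graphs described above, this only fires after players have
    already piled up -- it detects the state after the fact, not its cause.
    """
    if len(history) < 3:
        return False
    queue_growing = all(a.join_queue_depth < b.join_queue_depth
                        for a, b in zip(history, history[1:]))
    nobody_admitted = all(s.players_admitted_1m == 0 for s in history)
    return queue_growing and nobody_admitted


@dataclass
class Matchmaker:
    """Toy matchmaker that skips isolated shards and keeps the isolation visible."""
    isolated: set[str] = field(default_factory=set)

    def isolate(self, shard_id: str) -> None:
        # Hooking into the existing isolation path means the same metric/graph
        # that already shows isolated shards picks this up, so engineers can
        # see how many shards are out of rotation at any time.
        self.isolated.add(shard_id)
        print(f"[observability] isolated shards: {len(self.isolated)} (+{shard_id})")

    def pick_shard(self, candidates: list[str]) -> str | None:
        # Never queue players into a shard already known to be bad.
        healthy = [s for s in candidates if s not in self.isolated]
        return healthy[0] if healthy else None


if __name__ == "__main__":
    mm = Matchmaker()
    history = [ShardSnapshot("shard-17", depth, 0) for depth in (40, 55, 70)]
    if looks_stuck(history):
        mm.isolate("shard-17")
    print(mm.pick_shard(["shard-17", "shard-03"]))  # -> shard-03
```

In practice something like this would run as a periodic job feeding each shard's recent metrics into looks_stuck() and isolating any positive hit; the hard part the post describes is that nobody yet knows which signal fires before the queue backs up, which is why the manual step still exists.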
We have some graphs that help us identify them after they get stuck in this state.
Since we aren't yet able to observe exactly when the issue happens and can only currently identify the issue after it's affected players, it's possible for a bunch of shards to break and then only become obvious that they're having issues when the matchmaker tries to put players in them.
He really could have stopped at:
'What I do know is that the shards are getting into some kind of bad state...'
Think of it like a car. Stability is how likely it is to flip over and crash. Performance is how well it can accelerate, corner without flipping over and crashing, etc. Usability is how well the driver can interact with the car's various features.
What is the player supposed to have to do to get the eggs? (PS: you can totally just yoink the egg, which is the heart of the mission puzzle. Looks like we're on track for 4 borked patch missions in a row, then?)
This is a subject totally incomprehensible for 2 dozen guys chatting on the internet... i.e., not generally a subject discussed at a sausage party where the main focus is space games.
What is the player supposed to have to do to get the eggs?
Import them from Canada?
What is the player supposed to have to do to get the eggs?
Guys, GUYS! They have graphs!
We have some graphs that help us identify them after they get stuck in this state.
How about having some logging going on to work out what's causing the crashes, and, erm, well, you know, fixing it?
Oh, and in minor deprecation news, they're no longer selling the Aurora starter packs.
The ship so old it couldn't be fixed...
(The first venerable name on a very long list)