Notice: Fleet Carrier Update - Known Issues

Ah ok, I thought I read somewhere that it came without any Tritium. So why are all these people leaving their FCs in the source system blocking others from buying? I mean, I guess it doesn't matter with all of the bugs/glitches, but still.

It takes at least 1 hr to set up your FC. You have to transfer your ship fleet and spare modules to it (they may not have already been sent to the buying station ahead of time), you need to set up and buy modules, set commodity markets, set security, etc. etc. And for some, as it took 6+ hrs just to be able to buy, they now want to go to bed for 8-10 hrs and sort all that out tomorrow. If they are still there tomorrow they are blocking, IMHO. It took me about 4 hrs from hitting "update" at 6pm BST to be able to buy, set up and then make my first jump, and I had transferred all my stuff to the selling station days before so it was "on the doorstep" for FC transfers.
 
There seems to be a double (rather than triple) LTD Hotspot at Borann now. Is that worth using? Have never done LTD mining so this will be my first foray. Or is the hunt on for a new Triple Hotspot?

If you own an FC already, fine, carry on, give it a try. If not, LTD mining is not your job now; you need to slave away mining Tritium and sell it to us FC owners cheap. Get working now.
 
Hi team! It seems that things are getting better - planetary sites have begun to appear. I just saw the Jameson Crash Site marker in supercruise again. I did not manage to land - the server asked me to leave for 30 minutes so it could reboot. Still, there is hope that all the problems will be eliminated with time.
Thanks for your work!
 
good, since gp seems to have some experience in deploys, unlike you. let's see: first of all, consider that any such update has the following requirements in descending order of priority:
  1. not to corrupt existing data
  2. not to disrupt crucial services
  3. reduce the downtime to a minimum
also, some systems do hot deploys which reduce downtime to zero. this is cool but requires a very sophisticated process and is sometimes not possible, and anyway the integrity of the system and its continuing operation is far more important than a short downtime. this is why in most cases a downtime is due, and in any case there must be a contingency plan in place to restore the system to the previous working state if the update cannot go through as planned for whatever reason.
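
to make that concrete, here's a minimal sketch of what such a contingency plan boils down to, in python. every path and script name below is made up purely for illustration, it's not anyone's real tooling:

import shutil
import subprocess
import sys

# hypothetical locations, stand-ins for whatever the real system uses
DATA_DIR = "/srv/game/data"
BACKUP_DIR = "/srv/game/data.bak"

def backup():
    # requirement #1: never risk existing data -- snapshot before touching anything
    shutil.copytree(DATA_DIR, BACKUP_DIR, dirs_exist_ok=True)

def health_checks_pass():
    # requirement #2: crucial services must still respond after the swap
    return subprocess.run(["./smoke_tests.sh"]).returncode == 0

def rollback():
    # the contingency plan: restore the pre-update state
    shutil.rmtree(DATA_DIR)
    shutil.copytree(BACKUP_DIR, DATA_DIR)

def deploy(new_version):
    backup()
    try:
        subprocess.run(["./install.sh", new_version], check=True)
        ok = health_checks_pass()
    except subprocess.CalledProcessError:
        ok = False
    if not ok:
        rollback()
        sys.exit("update failed its checks -- restored previous state, reschedule it")
    print("update live; keep the backup handy for a while, just in case")

the details vary wildly between systems, but the shape (snapshot, apply, verify, restore-or-open) stays the same.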

now let's analyze what you consider flaws in his assertion:

this one has no relation whatsoever with the issue.

correct. this is simply the cost of a failed update. how is it a flaw in his assertion? note that a short downtime is always less important than loss of integrity. so if an update fails you have to reverse it, delay it and wait. that's what backups are for.

that's the provider violating requirements #1 and #2 for the sake of #3. technically, that's not a failed update, that's a completely arxed up one which failed to preserve integrity on all levels. may be fixable (maybe not) but likely at a huge cost of effort, time and user disruption.

these are just speculation and subjective opinions that are completely beside the point gp is making. he did clearly allude to a point where the update is deemed not successful. that's a complex decision neither he nor you get to make, and neither will i. although we're all entitled to our opinions! you might disagree but that still doesn't make your disagreement a flaw in gp's assertion. i've had a quick glance over the failures and it's pretty impressive by my standards (even by frontier standards) and could merit a rollback but, again, that's just my opinion. however, you saying "hey, it will just go away with a few patches and waiting for server warm up" doesn't really float ... in a professional sense.

well, this comes down to work ethics. it's admittedly a bad situation, very bad, a typical arxe up. granted, it's not the same to stall a business, a hospital, a national election or a game. the response will vary with the perceived importance, possible consequences and the resources available. but note that this is already beyond the point of admissible failure, where every due process has failed. yes, it is not uncommon for tech professionals to be sleep deprived on these occasions. comes with the job and, if these occasions are anything but very rare exceptions, it's time to look for another job.

look, it's not really rocket science. you simply:

1. test the new version in parallel in an environment as similar as possible to production.
2. you back up, and close shop
3. you swap versions and test live internally
4. if all goes well you have succeeded, you open the gates for users and keep the backup handy just in case
5. if not, and if you can't fix it immediately, you restore to previous state and reschedule
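
and purely as an illustration of steps 2-5 (python again, with every script name and the symlink layout entirely hypothetical):

import os
import subprocess

def swap_to(build_dir):
    # repoint the "current" symlink at a build directory, atomically
    tmp = "current.tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    os.symlink(build_dir, tmp)
    os.replace(tmp, "current")

def release(new_build, old_build):
    subprocess.run(["./maintenance_mode.sh", "on"], check=True)          # step 2: close shop (backup already taken)
    swap_to(new_build)                                                   # step 3: swap versions...
    ok = subprocess.run(["./internal_smoke_tests.sh"]).returncode == 0   # ...and test live internally
    if ok:
        subprocess.run(["./maintenance_mode.sh", "off"], check=True)     # step 4: open the gates for users
    else:
        swap_to(old_build)                                               # step 5: restore the previous state,
        subprocess.run(["./maintenance_mode.sh", "off"], check=True)     # reopen on the old build and reschedule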

it's just that frontier either doesn't know and observe this basic professional behavior that would be expected from any serious software developer, or doesn't give a damn about ... you, the customer. you filthy gamer you! if you're fine with that, that's perfect (i guess), but trying to lecture people who do have a background in these procedures with your ignorance is just ... embarrassing.
Ok, before replying to the rest, let me just say that I've re-read the post I was replying to, and I'd taken it as them saying that the rollback should happen. It's the decision making on that that I was commenting on, not the concept or principle of having rollback plans.

For reference, I've previously suggested an approach to try to avoid issues with releases when they hit the player base. See below. I think you'll see the similarity to what you posted, but please note that in addition to the steps you outlined it also attempted to account for any potential issues in the veracity of FD's testing due to co-location of the FD testers:

Or there are differences between the QA environment and the live environment which mean that some issues don’t occur when testing. (Which IMHO is the most likely scenario, certainly more likely than them never even firing things up on some of the supported platforms.)

Big question in that case is whether those differences are addressable in practice.

Also if not, then what can be done? (Will come back to this.)

Another factor involved may be the co-location of all testers.

In terms of solutions, if it were all just client-side stuff it wouldn’t be too bad, but the server changes complicate things.

My suggestion would be to follow a process along these lines:

Prelim:

1. Identify general external test needs, inc platforms, key OS’s, hardware types, internet bandwidth, geographical region, etc.

2. Identify/confirm/engage external testers (potentially players)

3. Agree Test Plan, responsibilities and schedule.

Main:

1. Complete final internal testing (inc. go/no-go decision) and packaging.

2. Commence server upgrade.

3. Release client package to External Testers.

4. Notify External Testers when server upgrade complete, and commence external testing.

5. External testers complete external testing and submit results.

6. Internal review of results and go/no-go decisions.

Testing passes & go decision:

1. Final prep for huge load on servers.

2. Prep comms on any known issues that have been accepted internally.

3. Comms to players that the client upgrade is ready to download inc notification of any known issues.

4. Release client package to all customers.

Immediately post release:

1. Monitor server load, deal with server side issues etc.

2. Troubleshoot any new player issues.

Fully post-release:

1. Review, inc.: was the right decision made with regard to any accepted issues; were there issues that weren’t picked up by the external testers but which occurred in the full release, and if so, what were the causes and the future mitigation; etc.

2. General lessons learnt and update of strategy for future releases.
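
(Purely for illustration, the go/no-go at Main step 6 boils down to something like the sketch below - Python, with a completely hypothetical platform/region matrix, and no claim that this is FD's actual process:)

from dataclasses import dataclass

@dataclass
class TestResult:
    tester: str
    platform: str          # e.g. "PC", "PS4", "Xbox"
    region: str            # this is what catches the tester co-location blind spot
    critical_failures: int

def go_no_go(results, expected_matrix, max_critical=0):
    # "Go" only if every expected platform/region combination actually reported in,
    # and nobody saw more critical failures than was agreed to be acceptable.
    reported = {(r.platform, r.region) for r in results}
    if not expected_matrix <= reported:
        return False
    return all(r.critical_failures <= max_critical for r in results)

# e.g. go_no_go(results, expected_matrix={("PC", "EU"), ("PC", "US"), ("Xbox", "US")})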

In terms of negative impact, I reckon it would cause a fair bit of delay on launch day on the first attempt, dropping to around an extra hour of downtime on launch day after a few goes and some refinement of the process.


Having said all that, I don’t know that FD don’t already do something similar.

I obviously haven’t covered risk management and approach in the event of critical issues and/or a no-go decision, but that’s enough for now. 😀

Also apologies to anyone reading for who the level of detail is too high or too low - it’s an unknown/mixed audience, so I’ve gone for a middle-ish level. 😀

Anyhoo, to the specific points:

this one has no relation whatsoever with the issue.
It absolutely does have a relation to the matter of whether to roll back or not. Tolerance of downtime vs tolerance of issues is a key factor in the decision.

correct. this is simply the cost of a failed update. how is it a flaw in his assertion? note that a short downtime is always less important than loss of integrity. so if an update fails you have to reverse it, delay it and wait. that's what backups are for.
Sure, if the update has failed, roll it back. But what are the criteria for declaring it failed? The game being universally inaccessible would be one, but the game itself is generally accessible, and the level of issues in actually connecting and logging in seems to be a lot lower (based purely on forum posts) than on a normal patch day. So as it has not absolutely failed, it's not an 'absolutely must roll back' scenario; it's a weighing up of different options. Hence the point above about tolerance of downtime.

that's the provider violating requirements #1 and #2 for the sake of #3. technically, that's not a failed update, that's a completely arxed up one which failed to preserve integrity on all levels. may be fixable (maybe not) but likely at a huge cost of effort, time and user disruption.
Well strictly speaking those criteria:

  1. not to corrupt existing data
  2. not to disrupt crucial services
  3. reduce the downtime to a minimum
While solid guiding principles, they are not absolutes written in stone, and can be changed at discretion according to specific contextual priorities. I'm sure from what you've said that you already know that, so I'm just making the point, as it's heavily related to the decision making as to whether to roll back or not. Anyway, by my evaluation:

1. Corruption of existing data? - Not seen any reports of any. Or certainly not many reports anyway. (some are having trouble with visited stars but the data hasn't gone)
2. Disruption of crucial services? - N/A. It's a game. (though I'm sure FD's financial folks would argue that the store is a crucial service, but that's a different point.)
3. Reducing downtime to a minimum. - Here it becomes a question of downtimes: full downtime for a rollback, then full downtime for the next attempt at the update, for all players, vs there being issues with some in-game activities, which don't affect everyone.

Given that a rollback (and associated downtime) is being argued for, would it functionally make any difference in terms of missed game time for people to just wait for a patch?
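
(As a rough sketch of that weigh-up - Python purely to make the shape of it explicit, and every input is a guess you'd have to supply yourself, none of these are real figures:)

def lost_player_hours_rollback(players, rollback_hours, redeploy_hours):
    # A rollback hits everyone twice: once for the rollback downtime,
    # then again for the downtime of the next attempt at the update.
    return players * (rollback_hours + redeploy_hours)

def lost_player_hours_wait_for_patch(players, affected_fraction, hours_until_patch):
    # Waiting for a patch only really costs the players whose activities are broken.
    return players * affected_fraction * hours_until_patch

Whichever number comes out smaller is, crudely, the lower-disruption option, and the honest answer is that the affected fraction and the time-to-patch are the big unknowns.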

these are just speculation and subjective opinions that are completely beside the point gp is making. he did clearly allude to a point where the update is deemed not successful. that's a complex decision neither he nor you get to make, and neither will i. although we're all entitled to our opinions! you might disagree but that still doesn't make your disagreement a flaw in gp's assertion. i've had a quick glance over the failures and it's pretty impressive by my standards (even by frontier standards) and could merit a rollback but, again, that's just my opinion. however, you saying "hey, it will just go away with a few patches and waiting for server warm up" doesn't really float ... in a professional sense.
Well, as per the first line of the reply, I may have misunderstood the post as being an argument that the rollback decision should be made, as opposed to a post saying that rollback plans should be standard.

The point I was making is that there are a lot of things to be considered and weighed up, and that the rollback itself is not something which would have zero negative impact.

And sure, it could be said that there was speculation in those items I posted, but it's speculation of the form 'what happens every time will probably also apply this time'. It might not apply this time, but it probably will.

well, this comes down to work ethics. it's admittedly a bad situation, very bad, a typical arxe up. granted, it's not the same to stall a business, a hospital, a national election or a game. the response will vary with the perceived importance, possible consequences and the resources available. but note that this is already beyond the point of admissible failure, where every due process has failed. yes, it is not uncommon for tech professionals to be sleep deprived on these occasions. comes with the job and, if these occasions are anything but very rare exceptions, it's time to look for another job.
It's not really that bad a situation though, is it? It's a game. That doesn't mean it couldn't all be done a lot better, but is it really that bad? Even if it is, it's still a matter of the optimum path to resolution, and there are good reasons to say that 'no sleep for devs until everything is fixed' is not the optimum path.

1. test the new version in parallel in an environment as similar as possible to production.
2. you back up, and close shop
3. you swap versions and test live internally
4. if all goes well you have succeeded, you open the gates for users and keep the backup handy just in case
5. if not, and if you can't fix it immediately, you restore to previous state and reschedule

it's just that frontier either doesn't know and observe this basic professional behavior that would be expected from any serious software developer, or doesn't give a damn about ... you, the customer. you filthy gamer you! if you're fine with that, that's perfect (i guess),
Or there is some systemic issue that means that things still go wrong despite doing all that. Some factors:
  • Prior to this, FD were all working from a single office, so testing is from a single location (possibly through a single ISP, high-bandwidth connection, etc.), and therefore misses various connectivity-related things. Are things on the FD side running through dedicated links to the AWS sites? etc. (The current work-from-home situation may mean this is potentially not actually a significant factor.)
  • Very high server loads due to large numbers of people downloading the update at the same time
  • Very high server loads due to large numbers of people trying to log in and play at the same time
  • The above two points in combo
I'm sure there's more.

As regards being fine with it: well, I'm not paying for a gold-star service. In fact I'm not paying FD any ongoing service charge whatsoever. So yeah, a few days' disruption a few times a year really ain't that big a deal. Say 16 days of downtime in a year; that's a >95% availability level (1 - 16/365 ≈ 95.6%). If I was paying a service charge with a 99% uptime guarantee, I might be annoyed, and would be expecting contractually due compensation. As it is, like I say, I'm not paying for the service.

Anyway, cast your eye over the process I suggested in the old post I linked to, and let me know your thoughts. (I'm already aware that I've also presented arguments against it in this post and the previous post in terms of tolerance of downtime vs tolerance of issues.)

;)

(Oh and for what it's worth, I also suggested a one day mini-beta prior to release as a final check that all issue fixing following the 2nd Beta worked, and no new issues had been introduced. ;) )
 
This looks bad to me. I believe there are hundreds of issues right now, but they are not all independent. Basic function of the commodities market is messed up, for one. There are stations that are fine and others that are blank marketplaces.

From what I am seeing, I'm thinking they may have to put a fork in this pig until a later date. These don't look like one-off issues. There are some, if not many, comprehensive structural programming incompatibilities and overall inherent high-level logic issues that can't just be flipped into obedience. The fundamental design looks wrong here, as it appears to be confounded - not just the subroutine modules. But hell, what do I know about programming... :)
 
Not sure if this has been reported yet.

Seems as though your selected target changes when you jump into supercruise or if you get interdicted (drop out of supercruise). For example, you come out of a station and select another location in the system. Jumping to supercruise it selects a different location and you have to change it back.
 
Not sure if this has been reported yet.

Seems as though your selected target changes when you jump into supercruise or if you get interdicted (drop out of supercruise). For example, you come out of a station and select another location in the system. Jumping to supercruise it selects a different location and you have to change it back.

Yeah, I've had this a few times this afternoon. I put it down to the vast number of fleet carriers in the same system (Cupinook).
 
By the way, if the FC fix takes a month or so, do I have to pay the upkeep that's growing by the day?

"You've got nothing! You need to put gold in your hold"
 
I'm still trying to even get into the game since the update; it's just telling me state=WaitingForConnection failCount=x
I found out why my game was stuck on WaitingForConnection since yesterday. The explanation: I went into my network settings and for some reason my network adapter was set to Hamachi and not to Ethernet. Like, the f*ck, how could that happen... Now it's working just fine.
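
If anyone else gets stuck on WaitingForConnection, here's a quick way to see which network adapters your OS actually has up (Python with the third-party psutil package - just a diagnostic sketch, not a fix):

import psutil  # pip install psutil

# List every network adapter and whether it's up, so a stray virtual one
# (Hamachi, VPN clients, VirtualBox, etc.) sitting above your real Ethernet stands out.
for name, stats in psutil.net_if_stats().items():
    state = "UP" if stats.isup else "down"
    print(f"{name:30} {state:5} speed={stats.speed} Mbps")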
 