Now you have me curious: why would FD not be able to use AWS to provide a service AWS is actually designed for, i.e. the quick spinning up and down of instances depending on demand? I understand from your posts that you have a background in DB engineering (or use), but I'd hazard a guess that AWS is exactly the right tool for this type of use case, short of building your own infrastructure. I'm not saying FD got this right, or that their load prediction algos don't need a bit of tweaking, etc., but I'm honestly curious as to why you think AWS is the wrong tool for this job. PM is fine if this gets off topic.
Sorry for the delay, I went off for an Armored Warfare session and then bed. This will be a bit of a TLDR as I'm headed off for a bit, so I'll do the quick and dirty now:
AWS does two specific things well: big data and big processing. Big processing is where you've got something that needs a lot of processing power over an extended period of time, and the system can hunker down and get on with it. Long-term simulations are the classic case (financial modelling is a really good example). Big data is where you've got a huge amount of information and you need to drill down through it and pull specific bits out to examine and analyse. You'll see a lot of big data projects wherever analytics are a thing.
What AWS really does NOT do well is "realtime" work. AWS is not designed to handle interactive loads. Remember when I mentioned big processing? That's where you basically spin up a VM and leave it to cook for a while. That's great, AWS can do that with no problems at all. When you've got very spiky, uneven loads, though, AWS has real problems dealing with them, because ultimately VMs are only as good as the weakest link in the chain.
AWS is this big fuzzy cloud of hardware distributed all over the world. When you have a very interactive, realtime load, you want everything that VM depends on to be running on hardware that's pretty much *in the same building* as the VM. That rarely happens when AWS is involved, so there are delays. Usually they're only milliseconds, but milliseconds are enough, and they add up fast when you're dealing with realtime loads. When you're dealing with thousands or tens of thousands of people with realtime loads, you multiply those milliseconds, you don't add them; it's an exponential problem. That's why, if you're dealing with an interactive workload in AWS, you have to assume a factor of 10x whatever it is you're planning for, because otherwise the moment it's put under load, the milliseconds become noticeable.
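To put rough numbers on the "milliseconds add up" point, here's a back-of-the-envelope sketch in Python. Everything in it is an assumption made up for illustration (the 2 ms of extra delay per hop, the hop counts, the 30 updates per second, the function names); none of it is a measurement of AWS or of FD's setup. It only shows how a small extra delay per hop eats into a realtime update interval once a request has to make several hops one after another.

```python
def added_latency_ms(extra_ms_per_hop: float, sequential_hops: int) -> float:
    """Extra delay on one request when every sequential hop costs a bit more."""
    return extra_ms_per_hop * sequential_hops


def budget_used(extra_ms_per_hop: float, sequential_hops: int,
                updates_per_second: int) -> float:
    """Fraction of one update interval consumed by the extra delay."""
    interval_ms = 1000.0 / updates_per_second
    return added_latency_ms(extra_ms_per_hop, sequential_hops) / interval_ms


if __name__ == "__main__":
    # Hypothetical figures: 2 ms extra per hop, server updating 30 times a second.
    for hops in (1, 5, 10):
        extra = added_latency_ms(2.0, hops)
        share = budget_used(2.0, hops, updates_per_second=30)
        print(f"{hops:>2} hops: +{extra:.0f} ms, "
              f"{share:.0%} of a 30 Hz update interval")
```

With those made-up figures, ten sequential hops already use up more than half of a single update interval before any real work gets done.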
That's why AWS is a farm tractor. You don't use it for WRC rallies.