if you're opposed to caching

I'm NOT opposed to caching. I'm arguing it won't be a factor.
(And yes, I know that caching expires, else it wouldn't be very useful.)
You brought it up in the GET/POST debate, and I'm just trying to point out that in reality caching won't matter in our case, and that it's irrelevant as an argument for or against GET/POST.
---------
I can't really accept the "maybe we'll need structured queries" argument unless you come up with a plausible example, and one that can't be done with a query string in the URL
That's an unfair request (pun intended), as you yourself pointed out in the JSON(/XML) vs CSV argument that structured data can indeed always be flattened; it just won't necessarily be very pretty(/convenient).
Thus, just because you can doesn't mean that you should.
Do you really see POST as being an impossible way of doing things?
----
One other thing I'd like to bring up however:
the "verified" list is unlikely to change frequently and the reference list won't change at all
You seem to indicate there is a need to keep two different lists (that can be requested independently)?
I don't see that need - To me "a system is a system" once it's in the TGC.
If a list of systems is requested, that's what they should get (all systems); they shouldn't need to request two different lists (two different URLs?) to get all the data.
Most tools will (should) operate along the lines of
Step 1: Request a full pull of all systems
Step 2 to n: Request a list of systems that have changed since date xxx (date xxx being when they last requested a list)
That really (initially) should be the only thing needed.
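A minimal sketch of that flow in Python; the endpoint URL and the "changed_since" parameter are made up purely for illustration:

```python
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name - the real TGC API may differ.
TGC_URL = "http://example.com/tgc/systems"

def fetch_systems(since=None):
    """Step 1 (since=None): full pull. Steps 2 to n: only what changed since 'since'."""
    url = TGC_URL
    if since is not None:
        url += "?" + urlencode({"changed_since": since})
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Step 1: full pull; remember when we asked.
last_sync = datetime.now(timezone.utc).isoformat()
all_systems = fetch_systems()

# Steps 2 to n: later runs only ask for what changed since the last request.
changes = fetch_systems(since=last_sync)
```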
A returned system could then have a "veracity field" or similar, indicating how sure TGC is about that bit of data (which would then be the highest possible for one of the canonical systems supplied by FD)
That keeps things simple.
That same "veracity field" - will work for distance data as well.
Keeping things nice and simple.
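To illustrate, a returned system record might then look roughly like this (field names and values are hypothetical, not a spec):

```python
# Hypothetical shape of a returned system record (illustrative only).
system = {
    "name": "Sol",
    "coords": {"x": 0.0, "y": 0.0, "z": 0.0},
    "veracity": 1.0,  # 1.0 = canonical (e.g. supplied by FD); lower = less certain
    "updated": "2015-01-01T00:00:00Z",
}
```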
------
GET vs POST
According to the HTTP standard POST is simply the wrong method to use in this case.

And everyone is abiding by the standards all the time, ofc (where's the sarcasm tag?).
FWIW - In principle I agree, but in practice it has just turned out, in my experience, to be simpler to have every call be POST - for a multitude of reasons.
Set it up once, and forget about it (and it's not exactly rocket science)
And all your code can use the same function to fetch data.
Whether it uses structured data or not.
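As a sketch of what I mean (illustrative only - not the actual TGC interface; URLs and payload fields are made up):

```python
import json
import urllib.request

def api_call(url, payload):
    """A single POST-based entry point for every request, flat or structured."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# A flat query and a structured one go through the exact same function:
api_call("http://example.com/tgc/systems", {"changed_since": "2015-01-01"})
api_call("http://example.com/tgc/distances", {"systems": ["Sol", "Alpha"]})
```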
And as I've mentioned several times, POST is just a personal preference for me - But it's based on 10+ years in the business of working with both.
Neither is however a deal breaker.
(Except in one rare case (I've had it happen perhaps twice ever) where GET can't be used because the request data is too large - GET URLs are usually restricted to ~1024 chars iirc, could be even lower on mobile? That's a practical browser limitation, not a standards limitation.)
-----
I don't think it needs to be autonomous from day 1. I'd rather have human verification of data until we're comfortable that the system is not going to get polluted with bad data.
Should the system get polluted with bad data (which to me seems nigh impossible, given the ideas I have for the system), then we would simply have to fix that by hand. No biggie.
Tools requesting "data since date xxx" would automatically get the correct new data.
If we don't build a system from the outset with the intention of it being autonomous, then there's a serious risk it never will be.
I've seen way too many "we'll fix this later"s that never got fixed... (and by many I actually mean almost 100%... Sad but true)
Let's aim to do it right the first time - so we don't have to "patch it up later".
Not to mention, it's in fact extra work having to build in a system for human intervention. I'd rather avoid that (not minor) complication.
---------
Another aspect to this is that any redundant information has the potential to be out of sync. If coordinates are included with the distances then the client has to either ignore them (defeating the purpose), take it on faith that they are always consistent, or check them and handle inconsistencies somehow. I'd rather see a single canonical value in the output.
I did not mean that the backing store itself should save coordinates with the distances - just that when the TGC provides the output, it provides the canonical coords along with the distances.
With that model there are no sync/consistency issues.
I would personally appreciate having the coordinates at hand, supplied by the TGC (ie. it's trusted data), along with the distance data.
Makes life easier.
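For illustration, a distance response carrying the canonical coords might look roughly like this (shape, names and values are hypothetical):

```python
# Hypothetical distance record: coords are the canonical TGC values,
# echoed into the output alongside the distance (placeholder values).
distance_record = {
    "from": {"name": "Sol", "coords": {"x": 0.0, "y": 0.0, "z": 0.0}},
    "to": {"name": "Alpha", "coords": {"x": 1.0, "y": 2.0, "z": 3.0}},
    "distance": 3.74,
    "veracity": 1.0,  # same confidence indicator as for system records
}
```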
------
are we expecting tool makers using this data to synchronise with TGC and populate users' instances of the tool with data from their copy (effectively acting as a cache for the data), or are we expecting the users' instances to grab data directly? The former would be a lot friendlier to TGC in terms of bandwidth but the latter is easier from the point of view of the tool makers.
I'm expecting tool makers to make tools that will work according to the two steps I described above.
Step 1: Request a full pull of all systems
Step 2 to n: Request a list of systems that have changed since date xxx (date xxx being when they last requested a list)
Anything else (like requesting a full list all the time) would be Bad(tm).
Requiring tool makers to provide a proxy would, I think, be unreasonable - and many probably can't offer hosting. It would be a shame to cut off a potentially great tool because of that.