My suggestion is "The Great Collector" (TGC), which runs on a webserver somewhere. Basically it does the following. A volunteer enters distances for a star system using ONE of the tools created by a commander, and that tool calculates the coords. After that the tool sends distances and coords to The Great Collector. TGC then updates The One Reference (TOR). TOR could be in multiple formats: for example The Reference Format (TRF) of RedWizzard, a Trade Dangerous compatible CSV format, or the CSV format of wtbw. TOR is used by all tools to show a user that a system they wish to enter distances for has already been done.
Sounds good. Here's how I think the webservice (TGC) should work at a slightly higher level of detail (all of this IMHO, of course):
1. The main input should be distances between two systems. At least initially our frontend tools should only submit distances that are known to locate systems without error, i.e. at least 5 distances from the unknown system to reference stars with coordinates provided by FD that together generate correct coordinates (correct coordinates being ones that regenerate the same distances to 3 dp using the single precision calculation; there's a rough sketch of this check after the list). I do think it is worth collecting any distance data people are willing to submit (such as insufficient numbers of distances and distances to non-reference systems), but let's focus on what we need to actually locate systems first.
2. Submitting coordinates with the distances should be optional, and if they are submitted they should be checked by the TGC. Since it can check coordinates, it should. We're all getting the same results, so it doesn't really matter whose algorithm the TGC uses; the implementor can choose.
3. Input format should be a JSON block with the same structure as the output (there's an example after the list). It seems like the simplest structured format to use; XML, for example, would be overkill.
4. TGC should store two lists of systems: verified systems and unverified systems. At least initially, I think at least one of us should be confirming that a star exists before it is considered verified. This would simply involve opening the galaxy map, searching for the unverified system, and checking that the distance to it (from wherever you are right now) is consistent with the calculated coordinates. This process should give us a high level of confidence that the verified systems are accurate. How the TGC actually stores the data (DB/files/whatever) is fairly irrelevant and whoever implements it can choose.
5. Multiple output datasets should be offered: verified systems with coordinates, unverified systems with coordinates, verified systems with coordinates and distances, unverified systems with coordinates and distances, unattached distance data if we capture it, and perhaps static lists of reference systems. Tools that are only interested in the results don't need the distances, but those of us who've written tools to locate systems will want to be able to check that the TGC's output is correct, and for that we need the distances.
6. Output formats: JSON with the same structure as the input. The implementor can add anything else they want to, of course, e.g. TD CSV, but tool-specific formats can just as easily be generated by the tool itself, so the implementor shouldn't feel compelled to.
7. As the data volume increases it is likely that we will need to be able to fetch data by region: this should be considered in the design. Probably just allowing the fetcher to specify a system and a distance would be enough (e.g. "give me coordinates for all verified systems within 50 Ly of Sol"). I think it should be designed for caching as well: data requests should be URL-based (either query or hierarchy, it doesn't really matter) and the TGC should correctly identify unchanged data (both ideas are sketched after the list). If the bandwidth use gets high then we can do things like reducing the frequency of updates to the verified list (once a day or whatever).
8. Duplicate data: TGC definitely shouldn't throw any data away. If a duplicate system is submitted, the distances should be merged and the coordinates rechecked (see the merge sketch after the list). It might be worth keeping a count of how many times each distance has been reported too. If Sol - Ross 128 has been reported 3 times identically then we have a higher degree of confidence about that distance.
9. Bad data: any bad data should be logged and reported somehow. I've seen too many cases where data that looked bad actually turned out to be correct to just throw stuff away.
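To make points 1 and 2 concrete, here's a rough Python sketch of the "coordinates are correct" check. It's purely illustrative: the function name is made up, and numpy's float32 is standing in for the game's single-precision maths.

```python
# Hypothetical check for points 1 and 2: do the candidate coordinates
# regenerate every submitted distance to 3 dp in single precision?
import numpy as np

def regenerates_distances(candidate, references):
    """candidate: (x, y, z) of the unknown system.
    references: list of ((x, y, z), reported_distance) pairs, distances as
    read off the galaxy map (3 dp)."""
    cx, cy, cz = (np.float32(v) for v in candidate)
    for (rx, ry, rz), reported in references:
        dx = np.float32(rx) - cx
        dy = np.float32(ry) - cy
        dz = np.float32(rz) - cz
        calculated = np.float32(np.sqrt(dx * dx + dy * dy + dz * dz))
        # The galaxy map shows 3 dp, so compare at that precision.
        if round(float(calculated), 3) != round(float(reported), 3):
            return False
    return True
```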
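For points 3 and 6, this is roughly the shape of JSON block I have in mind; the same structure would be used for input and output. All field names and the URL are placeholders I made up, not a proposed final spec:

```python
# Hypothetical submission payload and POST helper for points 3 and 6.
import json
import urllib.request

payload = {
    "commander": "Example Cmdr",        # optional attribution
    "system": "Some Unknown System",
    "coordinates": {"x": -12.34375, "y": 5.6875, "z": 78.90625},  # optional
    "distances": [
        {"to": "Sol", "distance": 42.123},        # placeholder values
        {"to": "Ross 128", "distance": 39.456},
        # ... at least 5 distances to FD-supplied reference systems
    ],
}

def submit(data, url="http://tgc.example.com/api/submit"):
    """POST one submission to TGC and return its JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```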
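For point 7, a sketch of how a frontend might fetch a region and let TGC report unchanged data. The endpoint, query parameters, and the use of ETag / If-None-Match are all assumptions on my part:

```python
# Hypothetical region fetch with caching for point 7.
import json
import urllib.error
import urllib.parse
import urllib.request

def fetch_verified_near(system, radius_ly, etag=None,
                        base="http://tgc.example.com/api/verified"):
    """Return (data, etag); data is None if the cached copy is still valid."""
    query = urllib.parse.urlencode({"near": system, "radius": radius_ly})
    req = urllib.request.Request(base + "?" + query)
    if etag:
        # Ask TGC to answer 304 Not Modified if nothing has changed since
        # the last fetch, so the caller can reuse its cached copy.
        req.add_header("If-None-Match", etag)
    try:
        with urllib.request.urlopen(req) as resp:
            return json.load(resp), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None, etag
        raise

# e.g. "give me coordinates for all verified systems within 50 Ly of Sol":
# data, etag = fetch_verified_near("Sol", 50)
```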
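And for point 8, a toy data model for merging duplicates while keeping a report count per distance (again, purely illustrative):

```python
# Toy merge-on-duplicate model for point 8: never discard a submission,
# fold it into what is already stored and count how often each distance
# has been reported.
from collections import defaultdict

class StoredSystem:
    def __init__(self, name):
        self.name = name
        self.coordinates = None
        # (other system, distance rounded to 3 dp) -> times reported
        self.distance_reports = defaultdict(int)

    def merge_submission(self, distances):
        """distances: iterable of (other_system, distance) pairs."""
        for other, dist in distances:
            self.distance_reports[(other, round(dist, 3))] += 1
        # After merging, the coordinates should be recomputed and rechecked
        # against the full distance set (not shown here).

# Three identical reports of Sol - Ross 128 give that distance more weight:
entry = StoredSystem("Ross 128")
for _ in range(3):
    entry.merge_submission([("Sol", 10.919)])   # placeholder distance value
```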
Additional considerations:
I think that additional data such as distance to stations, allegiance, economy, etc. should be stage 2. We'd want that in place by release (and ideally by gamma) though.
Do we want to capture unknown system names without distances or coordinates (which could then be used to prompt volunteers to get distance data)? I was thinking of a logbook tool which kept a record of where a commander flew. It could easily submit system names with no additional effort from the user. But is the value of that data worth bothering with?
TGC could provide some support for Snuble's plan to partition the space for searching, e.g. tracking who is searching which box, generating lists of known stars in each box, etc.