if you're opposed to caching

I'm NOT opposed to caching. I'm arguing it won't be a factor.
(And yes, I know that caching expires, else it wouldn't be very useful.)
You brought it up in the GET/POST debate, and I'm just trying to point out that in reality caching won't matter in our case, and that it's irrelevant as an argument for or against GET/POST.
---------
I can't really accept the "maybe we'll need structured queries" argument unless you come up with a plausible example, and one that can't be done with a query string in the URL
That's an unfair request (pun intended), as you yourself pointed out in the JSON(/XML) vs CSV argument that structured data can indeed always be flattened; it just won't necessarily be very pretty(/convenient).
Thus, just because you can doesn't mean that you should.
Do you really see POST as being an impossible way of doing things?
----
One other thing I'd like to bring up however:
the "verified" list is unlikely to change frequently and the reference list won't change at all
You seem to indicate there is a need to keep two different lists (that can be requested independently)?
I don't see that need - To me "a system is a system" once it's in the TGC.
If a list of systems is requested, that's what they should get (all systems); they shouldn't need to request two different lists (two different URLs?) to get all the data.
Most tools will (should) operate along the lines of
Step 1: Request a full pull of all systems
Step 2 to n: Request a list of systems that have changed since date xxx (date xxx being when they last requested a list)
That really (initially) should be the only thing needed.
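A minimal sketch of that flow in Python; the endpoint URL and the "changed_since" parameter are made up purely for illustration:

```python
import json
import urllib.request
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical endpoint and parameter name - the real TGC API may differ.
TGC_URL = "http://example.com/tgc/systems"

def fetch_systems(since=None):
    """Step 1 (since=None): full pull. Steps 2 to n: only what changed since 'since'."""
    url = TGC_URL
    if since is not None:
        url += "?" + urlencode({"changed_since": since})
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Step 1: full pull; remember when we asked.
last_sync = datetime.now(timezone.utc).isoformat()
all_systems = fetch_systems()

# Steps 2 to n: later runs only ask for what changed since the last request.
changes = fetch_systems(since=last_sync)
```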
A returned system could then have a "veracity field" or similar, indicating how sure TGC is about that bit of data (which would then be the highest possible for one of the canonical systems supplied by FD)
That keeps things simple.
That same "veracity field" - will work for distance data as well.
Keeping things nice and simple.
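To illustrate, a returned system record might then look roughly like this (field names and values are hypothetical, not a spec):

```python
# Hypothetical shape of a returned system record (illustrative only).
system = {
    "name": "Sol",
    "coords": {"x": 0.0, "y": 0.0, "z": 0.0},
    "veracity": 1.0,  # 1.0 = canonical (e.g. supplied by FD); lower = less certain
    "updated": "2015-01-01T00:00:00Z",
}
```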
------
GET vs POST
According to the HTTP standard POST is simply the wrong method to use in this case.

And everyone is abiding by the standards all the time, ofc (where's the sarcasm tag?).
FWIW - In principle I agree, but in practice it has just turned out, in my experience, to be simpler to have every call be POST - for a multitude of reasons.
Set it up once, and forget about it (and it's not exactly rocket science)
And all your code can use the same function to fetch data.
Whether it uses structured data or not.
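As a sketch of what I mean (illustrative only - not the actual TGC interface; URLs and payload fields are made up):

```python
import json
import urllib.request

def api_call(url, payload):
    """A single POST-based entry point for every request, flat or structured."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# A flat query and a structured one go through the exact same function:
api_call("http://example.com/tgc/systems", {"changed_since": "2015-01-01"})
api_call("http://example.com/tgc/distances", {"systems": ["Sol", "Alpha"]})
```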
And as I've mentioned several times, POST is just a personal preference for me - But it's based on 10+ years in the business of working with both.
Neither is however a deal breaker.
(Except in one rare case (I've had it happen perhaps twice ever) where GET can't be used because the request data is too large - GET URLs are usually restricted to ~1024 chars iirc, could be even lower on mobile? That's a practical browser limitation, not a standards limitation.)
-----
I don't think it needs to be autonomous from day 1. I'd rather have human verification of data until we're comfortable that the system is not going to get polluted with bad data.
Should the system get polluted with bad data (which to me seems nigh impossible, given the ideas I have for the system), then we would simply have to fix that by hand. No biggie.
Tools requesting "data since date xxx" would automatically get the correct new data.
If we don't build a system from the outset with the intention of it being autonomous, then there's a serious risk it never will be.
I've seen way too many "we'll fix this later"s that never got fixed... (and by many I actually mean almost 100%... Sad but true)
Let's aim to do it right the first time - so we don't have to "patch it up later".
Not to mention, it's in fact extra work having to build in a system for human intervention. I'd rather avoid that (not minor) complication.
---------
Another aspect to this is that any redundant information has the potential to be out of sync. If coordinates are included with the distances then the client has to either ignore them (defeating the purpose), take it on faith that they are always consistent, or check them and handle inconsistencies somehow. I'd rather see a single canonical value in the output.
I did not mean that the backing store itself should save coordinates with the distances - just that when the TGC provides the output, it provides the canonical coords along with the distances.
With that model there are no sync/consistency issues.
I would personally appreciate having the coordinates at hand, supplied by the TGC (ie. it's trusted data), along with the distance data.
Makes life easier.
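For illustration, a distance response carrying the canonical coords might look roughly like this (shape, names and values are hypothetical):

```python
# Hypothetical distance record: coords are the canonical TGC values,
# echoed into the output alongside the distance (placeholder values).
distance_record = {
    "from": {"name": "Sol", "coords": {"x": 0.0, "y": 0.0, "z": 0.0}},
    "to": {"name": "Alpha", "coords": {"x": 1.0, "y": 2.0, "z": 3.0}},
    "distance": 3.74,
    "veracity": 1.0,  # same confidence indicator as for system records
}
```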
------
are we expecting tool makers using this data to synchronise with TGC and populate users' instances of the tool with data from their copy (effectively acting as a cache for the data), or are we expecting the users' instances to grab data directly? The former would be a lot friendlier to TGC in terms of bandwidth but the latter is easier from the point of view of the tool makers.
I'm expecting tool makers to make tools that will work according to the two steps I described above.
Step 1: Request a full pull of all systems
Step 2 to n: Request a list of systems that have changed since date xxx (date xxx being when they last requested a list)
Anything else (like requesting a full list all the time) would be Bad(tm).
Requiring tool makers to provide a proxy would, I think, be unreasonable - and many probably can't offer hosting. It would be a shame to cut off a potentially great tool because of that.