Discussion What is the most efficient way to crowdsource the 3D system coordinates

KevinMillican · Oct 25, 2014

Harbinger said:
@TornSoul, whilst I can't argue that having just one JSON format is optimal. Once you know the structure of someone else's JSON output it literally takes 2 minutes to figure out how to only grab what you need from it from there on out. A JSON output is essentially just an array in string form.

It could just as easily be done with conditionals and a flag within the json output, identifying the type used.

Personally I think JSON is a poor choice as it is relatively new and structured like a programming language rather than a data exchange format. Implementations vary widely; you only need to look at the examples on json.org site and the output from PHPadmin. I also doubt whether the nature of the data really warrants anything more than a simple delimited file.

CSV or XML would be a more sensible choice, or even Excel - that would help deal with less obvious routes.

TornSoul · Oct 25, 2014

@harbinger
Yes it's about the easiest thing there is - for a human...
But it requires that human to do it every time (to update the TGC parser)

Which means latency for a new tool to go public (which could be a very long latency if - 2-3-4-5 years from now the TGC maintainers have gone)

Your argument goes the other way around as well - Seeing as just how easy it is to "mutate" a JSON - The tool maker can just as well change his to suit that of the TGC in the first place. No latency. Tool working right away.

--

And if we go academical on it...
The proper way to build a stack, is not to let the consumers(tools) dictate the service(TGC), but the other way around.
(TGC is a service even though it's receiving data)

If your tool want to use, I don't know the Twitter API (I assume there is one), good luck asking the Twitter guys to change their API to accommodate your tool... That's just not how it works - and for good reasons.

TornSoul · Oct 25, 2014

KevinMillican said:
Personally I think JSON is a poor choice as it is relatively new and structured like a programming language rather than a data exchange format. Implementations vary widely; you only need to look at the examples on json.org site and the output from PHPadmin. I also doubt whether the nature of the data really warrants anything more than a simple delimited file.

CSV or XML would be a more sensible choice, or even Excel - that would help deal with less obvious routes.

Uhm JSON has been around for about a decade or more - Granted only gaining mainstream popularity within the last.. oh... 5-6-7 years, mainly due to the re-emergence of the web 2.0 (or is it 3.0 by now).

A couple of the main advantages of JSON are:
-It's very ligthweight (compared to xml. CSV ties on this one)
-It's extremely easy to parse by web applications (and for the last 3-4 or so years also .net stacks - not sure about others).
In fact for web apps (JavaScript) it's pretty much automatic (as the format itself is a subset of JavaScript - ie. no parsing needed)

Excel...

Not touching that one.

In this day and age, JSON is the way to go if you wan't to live on the web (web services).

Athan · Oct 25, 2014

My vote would be with a JSON format. Even if someone must parse it by-eye, it's easy enough to have a browser extension to pretty-format it[1]. Else it's just code parsing it anyway. Given the simplicity of the format, whilst also allowing for nesting, I can't see it being difficult to write a parser for a language where there's no existing JSON support.

Also, what TornSoul says about having one, well defined format for the central server/repository just makes sense. Anyone wanting a different format gets to write a to/from converter for their needs.

Has everyone so far been making sure their tools will handle UTF-8 for the system/station names? Whilst FD appears to have stuck to pretty much plain ASCII so far there's no telling what they might allow for player named systems and stations.

[1] - For instance if I query my stardata thing: http://www.miggy.org/games/elite-dangerous/stardata/get-edstars.pl?x=0&y=0&z=0&range=10 then the JSON View extension I have in Chrome formats the output very nicely.

Biteketkergetek · Oct 25, 2014

KevinMillican said:
Personally I think JSON is a poor choice as it is relatively new and structured like a programming language rather than a data exchange format. Implementations vary widely; you only need to look at the examples on json.org site and the output from PHPadmin. I also doubt whether the nature of the data really warrants anything more than a simple delimited file.

CSV or XML would be a more sensible choice, or even Excel - that would help deal with less obvious routes.

I find JSON a better data format than XML because it can represent both arrays and maps easily and explicitly. I don't think it is structured like a programming language, perhaps this is just your personal opinion. XML is unnecessarily complex, and easy to mess up (should the data be an attribute or inside a tag?). There are valid reasons why people hate it. Where did you get the information that implementations vary widely? I didn't have the slightest problem dealing with it in the languages I have tried so far.

I wonder how and for what purpose you would use the data for if you thought even Excel is even better. It would be terrible for developers using and updating the data possibly from multiple sources through modern version control systems.

Biteketkergetek · Oct 25, 2014

TornSoul said:
A couple of the main advantages of JSON are:
-It's very ligthweight (compared to xml. CSV ties on this one)
<snip>

I just happened to write some code using CSV instead of JSON. Once there is some structure in the data (like stations) or optional fields (like government, allegiance and economy that makes sense for populated systems only) CSV runs out of steam. Also not well defined, separator could be ';' (since comma is the decimal separator in many languages) strings could use single or double quotes. I can't think of a reason why would it be better, except maybe if the data has really no structure and one needs to edit it frequently manually.

TornSoul · Oct 25, 2014

I agree with everything you listed Biteketkergetek - I just wanted to keep it short in my first post is all

I've worked with all of the above mentioned formats professionally over the years - Each had it's heydays, only to be replaced by something better (as in easier to work with - aka being more productive) - Currently JSON is king.

Perhaps in 5 years time it'll be something else - Although I doubt it... As I don't see that happening until JavaScript is replaced with something else - And that is going to take quite awhile...

Biteketkergetek · Oct 25, 2014

Well, I have worked with CSV almost all day long, it took me much more time to deal with than JSON ever did. Then CMDR Millican suggested XML and Excel, and I couldn't keep a straight face any longer.

I excuse myself. I just felt the conversation was somewhat opinionated, and missed the more factual (theoretical and practical) points why people tend to chose JSON over XML and CSV formats.

RedWizzard · Oct 26, 2014

KevinMillican said:
Personally I think JSON is a poor choice as it is relatively new and structured like a programming language rather than a data exchange format. Implementations vary widely; you only need to look at the examples on json.org site and the output from PHPadmin. I also doubt whether the nature of the data really warrants anything more than a simple delimited file.

CSV or XML would be a more sensible choice, or even Excel - that would help deal with less obvious routes.

Any JSON implementation that does not conform to the ridiculously simple specification on json.org is broken. From what I could find with a two minute search on Google phpmyadmin's output looks ok though.

CSV is just as bad in terms of implementation variation, in fact a lot worse. I don't see how you can complain about JSON in that regard and also advocate CSV.

The data is structured enough that CSV or any other flat format would require two files or a lot of repetition. Systems have some directly attached data (e.g. name, coordinates) and a set of distances (which have e.g. target system, distance). To do that as a single CSV would require repeating the system specific data for each distance, which is inefficient and requires the TGC parser to handle inconsistencies. It's also inflexible: if we want to add data for stations with JSON it's completely trivial and backwards compatible (old systems will just automatically ignore the new field). If we want to add that for CSV then we can only really add it as a separate file - shoehorning it into one file would be horrible.

XML is well suited to the data, and has the advantage that we could write a schema for validation. But it's heavyweight and more onerous to implement and that is a big negative. It generally requires parsing twice to get it into a native representation in a language: XML -> DOM -> native (though for something this simple a streaming parser would probably be the way to go if available).

XML and CSV are really polar opposites with JSON in the middle. An argument for CSV over JSON is also argument for CSV over XML, and an argument for XML over JSON is also an argument for XML over CSV. I can't really see how you could oppose JSON and suggest both as alternatives.

Excel is simply a terrible idea. It's complicated, fragile, platform-dependent, version-dependent, and requires software that not everyone has (and that probably won't even be available for most hosting solutions). It has no advantages.

TornSoul · Oct 26, 2014

RedWizzard said:
XML is well suited to the data, and has the advantage that we could write a schema for validation.

JSON has as well http://json-schema.org

I've never tried using it - So no idea how much of a bother it is. I've just stumbled over the fact that it exists.

TornSoul · Oct 26, 2014

A challenge (I hope)

5 distances to known systems are submitted.
Unfortunately one of the distances has a typo (is wrong)

The challenge:
Can you determine p0
Can you determine which is the wrong distance

Data

Code:

	       x 	    y 	              z 	dist
p1	  -6.00000 	 36.28125 	-19.87500 	109.337
p2	 -51.87500 	 16.78125 	 -0.37500 	 67.447
p3	 -45.46875 	 18.56250 	 12.59375 	 79.204
p4	  -1.62500 	 16.81250 	  4.00000 	116.216
p5	 -20.12500 	 19.28125 	 19.00000 	104.498

RedWizzard · Oct 26, 2014

TornSoul said:
A challenge (I hope)

5 distances to known systems are submitted.
Unfortunately one of the distances has a typo (is wrong)

The challenge:
Can you determine p0
Can you determine which is the wrong distance

Data

Code:

x y z dist p1 -6.00000 36.28125 -19.87500 109.337 p2 -51.87500 16.78125 -0.37500 67.447 p3 -45.46875 18.56250 12.59375 79.204 p4 -1.62500 16.81250 4.00000 116.216 p5 -20.12500 19.28125 19.00000 104.498

Easy

p0 is [-113.25, 16.78125, -28.34375] aka BD+63 1764.
My entry page shows a 0.001 error in the distance to p5, it should be 104.499.

RedWizzard · Oct 26, 2014

TornSoul said:
JSON has as well http://json-schema.org

I've never tried using it - So no idea how much of a bother it is. I've just stumbled over the fact that it exists.

Yeah, I've heard of that but haven't tried it either. To me tacking schemas onto JSON seems like going too far down the route of trying to make JSON fit XML's problem domain. If you need schemas why not just use XML?

TornSoul · Oct 26, 2014

RW here is one specifically for you (I got more)

Code:

Jurua             75.449
LHS 246          116.532
Nang Ta-khian    120.834
LHS 3006         118.445
Hepa             120.902

Your entry form reports: Coordinates: (-134.625, 33.25, -38.0625)
While the correct coords are : Bhotepa (-134.625, 33.21875, -38.0625)

This is more of a question than a test - As I've already seen what happens

In my case I would throw it all away as unreliable data. But I know you have a different view on this.

My question is - How would you handle a case like this one?
- With regards to the optained coords (which are wrong)
- The submitted distances (which apparently have errors - They don't)

Unlike in the previous example there is error!=0 on more than one distance (so you can't pinpoint one and say that one must be wrong)

Snuble · Oct 26, 2014

Well, I understand developers that want to stick to what they know. I am one of them. However, I've also learned that things change over time. I've had to adapt from html dom to flash and now back to jquery/html5. I've prefered xml, but its always been somewhat of a dinosaur to work with, mostly because it can be hard for untrained eyes to read it even if it presents clear advantages for me in a maintenance role once it is working. That is why csv often is prefered in data exchange by me. Now, I'm a bit new to this json thing. More then a few 3rd party tools have forced me to learn it. And the old fart that I am have to admit I start liking it. Enough that it start to work its way into my own code.

Also we have to understand that once the game is released, and hopefully become a waste of time for millions of people, a few, or even one tool will become what "everyone" use, and as such also be what decides data formats. Or if formats is even shared anymore.

Right now, it is best if those that want to gather data personaly do so, but that they also share their data. Right now, what is needed is A) a way to download coords B) a way to download distances C) a way to UPLOAD multiple distances (if tool accept them in a form...). D) a way to UPLOAD multiple coords (if tool have a way of quality check that data). This is what any tool exposing themself to the public trough this effort should provide.

RedWizzard · Oct 26, 2014

TornSoul said:
RW here is one specifically for you (I got more)

Code:

Jurua 75.449 LHS 246 116.532 Nang Ta-khian 120.834 LHS 3006 118.445 Hepa 120.902

Your entry form reports: Coordinates: (-134.625, 33.25, -38.0625)
While the correct coords are : Bhotepa (-134.625, 33.21875, -38.0625)

This is more of a question than a test - As I've already seen what happens

In my case I would throw it all away as unreliable data. But I know you have a different view on this.

My question is - How would you handle a case like this one?
- With regards to the optained coords (which are wrong)
- The submitted distances (which apparently have errors - They don't)

Unlike in the previous example there is error!=0 on more than one distance (so you can't pinpoint one and say that one must be wrong)

If this was data I'd gathered I'd start by checking the distance that is out by the most. If the problem is a typo then the largest error is usually on the incorrect value and a typo in one distance can cause errors in other distances if that distance is one of the three used for trilateration so I wouldn't immediately assume this wasn't a typo.

In terms of the webservice I'd keep the distances but not accept the submitted coordinates (or generate any itself if it were using my algorithm). I would have higher confidence in a subsequent submission that generates coordinates that are consistent with these distances than the subsequent submission alone. So in terms of what I suggested about a webservice before, this system would be in the unverified list (or preferably the unattached distance list if that was implemented) and whoever was doing the verifying could get the additional distances necessary to get a good fix (it often only takes one). That's pretty much how I've handled it in beta 2. Or they could just leave it on the unverified list until another submission is made. For my own frontend, at least initially, I wouldn't allow a set of distances with errors to be submitted. But of course the webservice can't rely on that.

Multiple ship ownership could make getting a couple of extra distances in this sort of case very easy. Just park a few sidewinders at well spread reference systems and switch between them to get the distances you need. That's assuming the costs of owning and switching between multiple ships is low enough and that switching takes you to the stored ship and not the other way around.

JesusFreke · Oct 26, 2014

TornSoul said:
A challenge (I hope)

5 distances to known systems are submitted.
Unfortunately one of the distances has a typo (is wrong)

The challenge:
Can you determine p0
Can you determine which is the wrong distance

Here's what I get

p5 is wrong

p0 coord is -113.250000, 16.781250, -28.343750

KevinMillican · Oct 26, 2014

Biteketkergetek said:
I find JSON a better data format than XML because it can represent both arrays and maps easily and explicitly. I don't think it is structured like a programming language, perhaps this is just your personal opinion. XML is unnecessarily complex, and easy to mess up (should the data be an attribute or inside a tag?). There are valid reasons why people hate it. Where did you get the information that implementations vary widely? I didn't have the slightest problem dealing with it in the languages I have tried so far.

I wonder how and for what purpose you would use the data for if you thought even Excel is even better. It would be terrible for developers using and updating the data possibly from multiple sources through modern version control systems.

I did not mean that I thought Excel was better; I just think, as with CSV or other delimited files, it's more universal than JSON or XML. Personally, JSON is not much of an issue for me, as I am a programmer, and have been writing conversion routines for all sorts of formats for about 35 years now.

My viewpoint is more user-orientated; not everyone who wants to access (and process) data will necessarily have the tools to use this format, and I think it's good practice to aim for the simplest structure that accomplishes the task.

Kicks · Oct 26, 2014

KevinMillican said:
I did not mean that I thought Excel was better; I just think, as with CSV or other delimited files, it's more universal than JSON or XML. Personally, JSON is not much of an issue for me, as I am a programmer, and have been writing conversion routines for all sorts of formats for about 35 years now.

My viewpoint is more user-orientated; not everyone who wants to access (and process) data will necessarily have the tools to use this format, and I think it's good practice to aim for the simplest structure that accomplishes the task.

To say that CSV is more universal than JSON is somewhat short-sighted. CSV is good for flat-file type operations, whereas JSON/XML can include a lot more hierarchal information in a far-easier to consume form (given that JSON parsers are available for any language that you'd be serious in using). Given that systems have multiple stations in (some have black markets, some do not), CSV quickly becomes irrelevant at this juncture as a JSON file can not only easily list the name and co-ordinates, but also stations, and other points of interest.

What we, as a community, could do - is settle on what that format is. I'm not a big XML fan, but modern webservices can return the data in a variety of formats based on what you place in the "accept" header. It's trivial to serialize data in JSON or XML.

Snuble · Oct 26, 2014

Quick and dirty as promised, without any attempt at formating, http://travel-eliteadvisor.rhcloud.com/index.php

I think this offers a structured and efficient way to traverse the current universe. It also opens itself for basic sharing of the space between participants. Also it can be a good way to visit all current systems. Start at bottom slice, snake your way down the table, then go up one slice and continue back.

Basically, I'll open a new thread where you "tell where you go", and then post a short after action report.
Keep the tools you need close at hand. A router planner to save fuel. Pen and paper to note down unknown systems. Make sure you get all the new systems before you leave the box. Be systematic.

There will be some delays in the system. For the boxes to update, I'm relying on other to share their data, and then generate the new set of boxes based on that data. The only other source of new data will be my own exploring.