Discussion: What is the most efficient way to crowdsource the 3D system coordinates

Biteketkergetek · Dec 2, 2014

TornSoul said:
That's exactly one of those "red flags" I was fishing for.
If order is important a bitmask is not suitable.

I've never even considered that to be an issue - and would have to code for it regardless of the data model used, if true (would need to keep track of the order)

I would really like to have this backed up with some concrete examples though, rather than anecdotes (no offense intended).
Seeing is believing I suppose

The data structure aside it just seems odd to me, so I'd like to be convinced otherwise.
I realize this might not be easy to dig up... But I'd really like to see it.

It's not a coding issue if you keep both station and system economies in the db.

What is more important: storing data as it is provided (even if order turns out to be irrelevant), or being smart and losing potentially relevant information because it cannot be stored?

Biteketkergetek · Dec 2, 2014

TornSoul said:
True if an array is used.
Not true if individual fields were used (which is what is being done in existing formats that I've seen)

So if bitmask is dropped, array would definitely be the way to go.

I don't understand this part of your message. Are you talking about the EDSC db, or the API?

I need to update my DB structure when I add a field to the JSON, but it would be possible to store the JSON as a whole, or just part of it, in the DB if querying those fields in SELECT statements is not necessary.

JesusFreke · Dec 2, 2014

TornSoul said:
"5 distances, no errors"
Yeah, I thought I remembered you were less stringent than I am, as I require all distances to have no error.
And I definitely need to change that (as I've pointed out a few times already),
as one erroneous distance value in the TGC DB means it won't be able to calc the coordinates - which is obviously bad.

"2 distances better"
I don't understand what this means?

What are you comparing?

I think he means he has to have at least 5 distances without an error (but e.g. there could be a 6th that had an error).

With regards to the "2 distances better" thing, that's the redundancy thing we had talked about earlier in the thread. Basically, every grid point near the correct point (for some definition of "near") should have at least 2 distances that invalidate it. Or, if one of the distances is wrong, then you'll end up with the correct point having 1 error, and then all the other points nearby should have at least 3 errors.

I use the same redundancy threshold in my script/tools before I submit the distances for a system to edsc. And even after I initially submit a system with redundancy 1, I'll keep trying to collect distances for it until it has a redundancy of 2 (i.e. at least 3 distances that invalidate every point).
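
A minimal Python sketch of that redundancy check, for illustration only. It assumes coordinates lie on a 1/32 ly grid and that reported distances are rounded to two decimals; the names `mismatches`, `neighbors` and `redundancy`, and the `radius` parameter, are hypothetical and not taken from any of the tools discussed in this thread.

```python
import math
from itertools import product

GRID = 1 / 32  # assumed grid spacing in light years


def mismatches(point, refs):
    """Count reported distances that `point` fails to reproduce at 2 decimals."""
    return sum(
        1
        for coords, reported in refs
        if round(math.dist(point, coords), 2) != round(reported, 2)
    )


def neighbors(point, radius=2):
    """Yield the grid points within `radius` grid steps of `point`, excluding it."""
    offsets = range(-radius, radius + 1)
    for dx, dy, dz in product(offsets, repeat=3):
        if (dx, dy, dz) != (0, 0, 0):
            yield (point[0] + dx * GRID, point[1] + dy * GRID, point[2] + dz * GRID)


def redundancy(candidate, refs, radius=2):
    """Fewest mismatches among the neighbors, minus the candidate's own, minus one."""
    base = mismatches(candidate, refs)
    return min(mismatches(p, refs) for p in neighbors(candidate, radius)) - base - 1
```

Here `refs` is a list of `((x, y, z), reported_distance)` pairs. With this definition, an error-free candidate whose neighbors are each invalidated by at least 2 distances has redundancy 1, and at least 3 gives redundancy 2. A candidate with 1 error whose neighbors all have at least 3 errors also comes out at redundancy 1.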

TornSoul · Dec 2, 2014

JesusFreke said:
With regards to the "2 distances better" thing, that's the redundancy thing we had talked about earlier in the thread. Basically, every grid point near the correct point (for some definition of "near") should have at least 2 distances that invalidate it. Or, if one of the distances is wrong, then you'll end up with the correct point having 1 error, and then all the other points nearby should have at least 3 errors.

Dime just dropped - Thank you.

TornSoul · Dec 2, 2014

Biteketkergetek said:
What is more important: storing data as it is provided (even if order turns out to be irrelevant), or being smart and losing potentially relevant information because it cannot be stored?

Very good point.
I think I'm convinced - To go for a structure that records the order of economies - ie. array.

And if going with array for economy, it would be consistent to do so for facilities as well (just without the order bit)

I would still keep black market separate.

Biteketkergetek said:
I don't understand this part of your message. Are you talking about the EDSC db, or the API?

I need to update my DB structure when I add a field to the JSON, but it would be possible to store the JSON as a whole, or just part of it, in the DB if querying those fields in SELECT statements is not necessary.

You lost me here I'm afraid

Biteketkergetek · Dec 2, 2014

TornSoul said:
You lost me here I'm afraid

I meant the structure of the DB schema does not have to follow the API closely.

This is why I asked if we're discussing the DB structure or the API itself.

The same JSON API could be done with any of these table schema variants:

Version with details as fields, schema needs to be updated as fields are changed:

Code:

CREATE TABLE Station
 (
   station_id INTEGER PRIMARY KEY,
   name VARCHAR(40) COLLATE nocase,
   system_id INTEGER NOT NULL,
   ls_from_star DOUBLE,

   -- details start
   allegiance VARCHAR(40),
   government VARCHAR(40),
   economy VARCHAR(40),

   -- NULL → unknown
   commodity_market BOOL,
   black_market BOOL,
   shipyard BOOL,
   outfitting BOOL,
   repair BOOL,
   rearm BOOL,
   -- details end

   valid_id INTEGER NOT NULL,

   UNIQUE (system_id, name),

   FOREIGN KEY (system_id) REFERENCES System(system_id)
   	ON UPDATE CASCADE
   	ON DELETE CASCADE,
   FOREIGN KEY (valid_id) REFERENCES Valid(valid_id)
   	ON UPDATE CASCADE
   	ON DELETE CASCADE
 );

Version with details as a json encoded object:

Code:

CREATE TABLE Station
 (
   station_id INTEGER PRIMARY KEY,
   name VARCHAR(40) COLLATE nocase,
   system_id INTEGER NOT NULL,
   ls_from_star DOUBLE,

   -- json details
   detail BLOB,

   valid_id INTEGER NOT NULL,

   UNIQUE (system_id, name),

   FOREIGN KEY (system_id) REFERENCES System(system_id)
   	ON UPDATE CASCADE
   	ON DELETE CASCADE,
   FOREIGN KEY (valid_id) REFERENCES Valid(valid_id)
   	ON UPDATE CASCADE
   	ON DELETE CASCADE
 );

CMDR Generic EventHandler · Dec 2, 2014

Bad idea to store a json blob like that. What if you wanted to find all the stations that are linked to a certain faction? Or you wanted to find the closest repair station to where you are now? You would have to download all the json blobs and parse them outside of the database, instead of doing a simple sql query.
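
A rough illustration of the difference, sketched in Python with the standard `sqlite3` module. The table names `StationFields` and `StationBlob`, the cut-down columns and the station names are made up for the example and are not the actual EDSC schema.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StationFields (
  station_id INTEGER PRIMARY KEY,
  name VARCHAR(40) COLLATE nocase,
  repair BOOL              -- NULL means unknown
);
CREATE TABLE StationBlob (
  station_id INTEGER PRIMARY KEY,
  name VARCHAR(40) COLLATE nocase,
  detail BLOB              -- json encoded details
);
""")

# Hypothetical sample data: (name, repair) with None for unknown.
for name, repair in [("Alpha Dock", True), ("Beta Port", False), ("Gamma Hub", None)]:
    conn.execute("INSERT INTO StationFields (name, repair) VALUES (?, ?)", (name, repair))
    conn.execute(
        "INSERT INTO StationBlob (name, detail) VALUES (?, ?)",
        (name, json.dumps({"repair": repair})),
    )


def repair_stations_fields(conn):
    """Field version: the database does the filtering."""
    rows = conn.execute("SELECT name FROM StationFields WHERE repair = 1 ORDER BY name")
    return [name for (name,) in rows]


def repair_stations_blob(conn):
    """Blob version: fetch every row and filter after parsing the JSON."""
    rows = conn.execute("SELECT name, detail FROM StationBlob ORDER BY name")
    return [name for name, detail in rows if json.loads(detail).get("repair") is True]
```

Both functions return the same names, but the blob version has to read and parse every row to get there.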

CMDRKNac · Dec 2, 2014

CMDR Generic EventHandler said:
Bad idea to store a json blob like that. What if you wanted to find all the stations that are linked to a certain faction? Or you wanted to find the closest repair station to where you are now? You would have to download all the json blobs and parse them outside of the database, instead of doing a simple sql query.

This is the best argument against bitmasks yet IMO, good point. Being able to do queries effectively is probably more important than keeping the JSON output uncluttered and reducing verbosity. And as Biteketkergetek is saying, are we talking about the API or the database schema you will be using? I guess it is the first; in that case it is more important to have good ways to do queries than to reduce the number of fields or whatever.

TornSoul · Dec 2, 2014

Biteketkergetek said:
I meant the structure of the DB schema does not have to follow the API closely.

This is why I asked if we're discussing the DB structure or the API itself.

Gotcha - And I was referring to the API/JSON

CMDRKNac said:
This is the best argument against bitmasks yet IMO, good point. Being able to do queries effectively is probably more important than keeping the JSON output uncluttered and reducing verbosity.<snip> more important to have good ways to do queries than to reduce the number of fields or whatever.

Purely academically (and not related to what to use for the API) :

Bitmasks are actually very very fast, in fact as efficient as you can get (for the specific scenarios where bitmasks make sense).

Compare
- if (value==x or value==y or value==z) then {something}
with
- if (bitmask & 14) then {something} (you'd have a mask variable instead of a hardcoded 14 ofc)

Nevermind the specific syntax which will differ from language to language, but the structure above will be the same.

With a bitmask you have *one* comparison (and a bitwise one at that - which doesn't get faster), while with the other you have 3 different comparisons.
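
For illustration, the two forms could look like this in Python. The `Facility` flags and their bit values are hypothetical, chosen only so that the mask comes out as the 14 used above; this is a sketch of the structure, not a benchmark.

```python
from enum import IntFlag


class Facility(IntFlag):
    """Hypothetical one-bit-per-facility flags."""
    COMMODITY_MARKET = 1
    SHIPYARD = 2
    OUTFITTING = 4
    REPAIR = 8


# The mask of interest: 2 | 4 | 8 == 14.
WANTED = Facility.SHIPYARD | Facility.OUTFITTING | Facility.REPAIR


def matches_by_equality(value):
    """Three separate equality comparisons against a single value."""
    return (
        value == Facility.SHIPYARD
        or value == Facility.OUTFITTING
        or value == Facility.REPAIR
    )


def matches_by_mask(bits):
    """One bitwise AND: true if any of the wanted bits is set."""
    return (bits & WANTED) != 0
```

Note that the test uses a single bitwise `&`. In many languages `&&` is the logical AND, which would be true for any two non-zero operands. The mask form also accepts a value with several bits set, which the equality form does not.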

Biteketkergetek · Dec 2, 2014

CMDR Generic EventHandler said:
Bad idea to store a json blob like that. What if you wanted to find all the stations that are linked to a certain faction? Or you wanted to find the closest repair station to where you are now? You would have to download all the json blobs and parse them outside of the database, instead of doing a simple sql query.

I'm talking about the EDSC DB that is used to coordinate the mapping effort. It won't work for all cases, but it certainly works for my map. It is an implementation detail, though: 99% of my code doesn't care about it, and it can be changed at any time.

Consider it a workaround until after release, then it won't be a moving target anymore. The structure can be figured out, and a proper table may be set up. That's the plan for my own project.

CMDRKNac · Dec 2, 2014

TornSoul said:
With a bitmask you have *one* comparison (and a bitwise one at that - which doesn't get faster), while with the other you have 3 different comparisons.

Well, that's right, but that's why I said 'effectively' and not 'efficiently'

Computationally on a local level, it would be cheaper to use bitmasks. But for doing db queries...

The client would first have to build a custom query locally for the comparison (mapping each binary value to the criteria wanted for that value). In most cases that means writing several lines of code, compared to the couple of lines it would take to send a simple query that meets a criterion (if bitmask '001101' == true then ... get data).

Other way would be to request the values from every candidate possible (for example, every 'station' that meets a secondary criteria) then process locally, then send a request for more data, but this seems even worse.

In both cases the number of queries you would run is the same as or more than with a more 'normal' query, at the expense of added complexity when constructing the queries, for some marginal efficiency gain in the internal database processing (not sure how much, because I'm not a db internals specialist, but I'm pretty sure the order of magnitude of that gain is quite low). Not sure the trade-off is worth it IMO. So yes, it is more efficient, but I don't think it's effective.

But this is always the case with higher abstraction vs. lower abstraction (bitmasks in this case), an argument of efficiency vs. effectiveness.

RedWizzard · Dec 2, 2014

TornSoul said:
That's exactly one of those "red flags" I was fishing for.
If order is important a bitmask is not suitable.

I've never even considered that to be an issue - and would have to code for it regardless of the data model used, if true (would need to keep track of the order)

I would really like to have this backed up with some concrete examples though, rather than anecdotes (no offense intended).
Seeing is believing I suppose

The data structure aside it just seems odd to me, so I'd like to be convinced otherwise.
I realize this might not be easy to dig up... But I'd really like to see it.

Took a while to find a single system with different orders, but I found plenty of other anomalies on the way.
The vast majority of systems I checked have consistent order: Extraction before Refinery before Industrial before High Tech. Other types never seem to be paired (Service, Agriculture, Tourism). Interestingly the order shown for the system is the reverse of the order shown for the stations (e.g. an "Extraction, Refinery" system will have "Refinery, Extraction" stations).

The anomalies I found are:
Kuk: Eudoxus Dock is "Industrial, Refinery", all other stations in that system are "Refinery, Industrial".
LHS 3836 has three "Industrial, Refinery" stations while the vast majority of other stations in other systems with those two types are "Refinery, Industrial".
LHS 449 and LHS 3531 have "Refinery, Extraction" stations while the vast majority of other stations in other systems with those two types are "Extraction, Refinery".
There are systems which have a system wide economy type that never appears on a station, e.g. Alpha Centauri, Leesti, Zaonce.
There are stations with "Extraction, Extraction", e.g. in Luyten 347-14, and in La Rochelle. I think this could be a bug, though there are also stations which are just "Extraction" (e.g. in Beta Hydri).

I think we should be storing data that allows what's seen in ED to be exactly reproduced. So I think an array would be best and we should allow the duplicate "Extraction, Extraction" economy types. We should also store the system economies (not just combine the stations' economies).
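
A possible shape for such a record, sketched in Python. The class and field names are illustrative only, the bit values in `ECONOMY_BITS` are invented, and the system-level economy given for Kuk is a placeholder that has not been checked in game. Black market is kept as a separate field, as suggested earlier in the thread.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Station:
    name: str
    economies: List[str] = field(default_factory=list)   # order kept, duplicates allowed
    facilities: List[str] = field(default_factory=list)  # order not meaningful
    black_market: Optional[bool] = None                  # None means unknown


@dataclass
class System:
    name: str
    economies: List[str]  # stored separately, not derived from the stations
    stations: List[Station] = field(default_factory=list)


# Hypothetical bit assignment, used only to show what a bitmask would lose.
ECONOMY_BITS = {"Extraction": 1, "Refinery": 2, "Industrial": 4, "High Tech": 8}


def to_bitmask(economies):
    """Fold a list of economies into a bitmask, losing order and duplicates."""
    mask = 0
    for economy in economies:
        mask |= ECONOMY_BITS[economy]
    return mask


kuk = System(
    name="Kuk",
    economies=["Industrial", "Refinery"],  # placeholder value, not checked in game
    stations=[Station("Eudoxus Dock", ["Industrial", "Refinery"])],
)
restored = json.loads(json.dumps(asdict(kuk)))
```

The arrays survive a JSON round trip with their order intact, whereas `to_bitmask` gives the same value for "Industrial, Refinery" and "Refinery, Industrial", and for "Extraction, Extraction" and plain "Extraction".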

TornSoul said:
Not true if individual fields were used (which is what is being done in existing formats that I've seen)

So if bitmask is dropped, array would definitely be the way to go.

Yes, individual fields would be a fragile implementation in terms of later changes. I wouldn't recommend that.

Edit: (question removed as you've already answered it)

RedWizzard · Dec 2, 2014

Snuble said:

From EDSC

Code:

name: "LP 211-22",
      refs: [
        {
          name: "16 c Ursae Majoris",
          dist: 24.7
        },
        {
          name: "LHS 266",
          dist: 17.12
        },
        {
          name: "LP 167-71",
          dist: 17.49
        },
        {
          name: "Maya",
          dist: 16.35
        },
        {
          name: "Regulus",
          dist: 39.44
        },
        {
          name: "Sol",
          dist: 69.4
        },
        {
          name: "Aulis",
          dist: 39.83
        },
        {
          name: "G 146-5",
          dist: 9.42
        },
        {
          name: "Olwain",
          dist: 43.25
        },
        {
          name: "Tukualang",
          dist: 22.07
        },
        {
          name: "NLTT 21440",
          dist: 5.96
        },
        {
          name: "G 116-72",
          dist: 7.19
        },
        {
          name: "Chandra",
          dist: 15.06
        },
        {
          name: "LP 211-12",
          dist: 9.19
        },
        {
          name: "BD+40 2208",
          dist: 5.43
        }
      ]
    }

I get (1.09375, 51.59375, -46.40625). That location satisfies 7 more distances than any other location I checked. Two of those distances are apparently incorrect:
G 146-5 should be 9.41
LP 211-12 should be 9.18
Be good if someone could check these in the nav panel and galaxy map. If the nav panel is unreliable that's very annoying.

RedWizzard · Dec 2, 2014

JesusFreke said:
I think he means he has to have at least 5 distances without an error (but e.g. there could be a 6th that had an error).

Yes. I think if there are bad distances but I have good confidence in the location (based on at least 5 good distances) I would allow the system to be submitted (with a warning and confirmation) but I wouldn't submit the bad distances.

JesusFreke said:
With regards to the "2 distances better" thing, that's the redundancy thing we had talked about earlier in the thread. Basically, every grid point near the correct point (for some definition of "near") should have at least 2 distances that invalidate it. Or, if one of the distances is wrong, then you'll end up with the correct point having 1 error, and then all the other points nearby should have at least 3 errors.

Yup, that's it. "Near" is currently a cube of 64 grid points (so side length 4) around the coordinates generated by trilateration. It's a pretty small volume for performance reasons. I am checking that volume around each pair of candidates generated by trilateration of each combination of three distances though so it's still very thorough.
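
A simplified Python sketch of that search, assuming a 1/32 ly grid and standard three-sphere trilateration. The function names and the exact placement of the 4x4x4 cube relative to the trilaterated point are assumptions for illustration, not the actual implementation.

```python
import math
from itertools import combinations, product

GRID = 1 / 32  # assumed grid spacing in light years


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def scale(a, k):
    return tuple(x * k for x in a)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Return the (up to two) points at distances r1, r2, r3 from p1, p2, p3."""
    d = math.dist(p1, p2)
    if d == 0:
        return []  # first two references coincide
    ex = scale(sub(p2, p1), 1 / d)
    v = sub(p3, p1)
    i = dot(ex, v)
    ey_raw = sub(v, scale(ex, i))
    j = math.sqrt(dot(ey_raw, ey_raw))
    if j == 0:
        return []  # the three references are collinear
    ey = scale(ey_raw, 1 / j)
    ez = cross(ex, ey)
    x = (r1 * r1 - r2 * r2 + d * d) / (2 * d)
    y = (r1 * r1 - r3 * r3 + i * i + j * j) / (2 * j) - i * x / j
    z2 = r1 * r1 - x * x - y * y
    if z2 < 0:
        return []  # the spheres do not intersect
    z = math.sqrt(z2)
    base = add(p1, add(scale(ex, x), scale(ey, y)))
    return [add(base, scale(ez, z)), sub(base, scale(ez, z))]


def cube_around(point, side=4):
    """Return the side**3 grid points surrounding `point` (64 for side 4)."""
    lo = 1 - side // 2
    cells = [math.floor(c / GRID) for c in point]
    return [
        tuple((cell + off) * GRID for cell, off in zip(cells, offs))
        for offs in product(range(lo, lo + side), repeat=3)
    ]


def candidate_points(refs):
    """Collect the cubes around both trilateration solutions of every triple."""
    points = set()
    for (p1, r1), (p2, r2), (p3, r3) in combinations(refs, 3):
        for solution in trilaterate(p1, r1, p2, r2, p3, r3):
            points.update(cube_around(solution))
    return points
```

Here `refs` is a list of `((x, y, z), distance)` pairs; each point in the resulting set would then be scored against all of the known distances.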

donpost · Dec 2, 2014

Hi Guys, is there anything your Average Joe like me can be doing to help collect distance data?

Biteketkergetek · Dec 2, 2014

donpost said:
Hi Guys, is there anything your Average Joe like me can be doing to help collect distance data?

See OP. Use http://edstarcoordinator.com/default.html to enter distance data.

You could try my map as well to enter station and other data. It is not linked to EDSC (because I don't yet know how to do it right, and need the data in my DB), but the data is regularly posted to bitbucket.

Biteketkergetek · Dec 2, 2014

How about using ZeroMQ to publish/subscribe for updates?

EDSC would have a fixed subscriber for other projects like mine to push updates to, and a publisher that would publish any changes to the DB.

TornSoul · Dec 2, 2014

RedWizzard said:
Took a while to find a single system with different orders, but I found plenty of other anomalies on the way.
The vast majority of systems I checked have consistent order: Extraction before Refinery before Industrial before High Tech. Other types never seem to be paired (Service, Agriculture, Tourism). Interestingly the order shown for the system is the reverse of the order shown for the stations (e.g. an "Extraction, Refinery" system will have "Refinery, Extraction" stations).

The anomalies I found are:
Kuk: Eudoxus Dock is "Industrial, Refinery", all other stations in that system are "Refinery, Industrial".
LHS 3836 has three "Industrial, Refinery" stations while the vast majority of other stations in other systems with those two types are "Refinery, Industrial".
LHS 449 and LHS 3531 have "Refinery, Extraction" stations while the vast majority of other stations in other systems with those two types are "Extraction, Refinery".
There are systems which have a system wide economy type that never appears on a station, e.g. Alpha Centauri, Leesti, Zaonce.
There are stations with "Extraction, Extraction", e.g. in Luyten 347-14, and in La Rochelle. I think this could be a bug, though there are also stations which are just "Extraction" (e.g. in Beta Hydri).

Just wow.... good work there rw.

That definitely blows a couple of my assumptions out of the water.

This one in particular annoys me
"There are systems which have a system wide economy type that never appears on a station, e.g. Alpha Centauri, Leesti, Zaonce."

RedWizzard said:
I think we should be storing data that allows what's seen in ED to be exactly reproduced. So I think an array would be best and we should allow the duplicate "Extraction, Extraction" economy types. We should also store the system economies (not just combine the stations' economies).

I'm sadly forced to agree, given your findings above.

What a mess imo... :-(

TornSoul · Dec 2, 2014

Biteketkergetek said:
How about using ZeroMQ to publish/subscribe for updates?

EDSC would have a fixed subscriber for other projects like mine to push updates to, and a publisher that would publish any changes to the DB.

Not familiar with the tech.

So for now - No

Smacker · Dec 2, 2014

TornSoul said:
Just wow.... good work there rw.

That definitely blows a couple of my assumptions out of the water.

This one in particular annoys me
"There are systems which have a system wide economy type that never appears on a station, e.g. Alpha Centauri, Leesti, Zaonce."

I'm sadly forced to agree, given your findings above.

What a mess imo... :-(

Presumably this is all stuff FD can fix server side. All ticketed, I hope.