Discussion What is the most efficient way to crowdsource the 3D system coordinates

TornSoul · Dec 1, 2014

I don't have soft quota caps.

There are some hard ones; I think my router is mainly the bottleneck if you submit a ton at the same time (and don't wait for each to complete), as JesusFreke found out.

My router is starting to get a bit ancient... I really do need an excuse for getting a new one

It starts to get a bit slow around 200 concurrent (open/half-open) connections.

Apart from that - What do you mean by "remap"?

TGC automatically updates the coordinates of a system if just a single new distance is added to that system (as part of a submission to something else) - As such, all systems for which it's possible to calculate coords from the distances in the DB have them.

If you want any "newly calculated" coords, simply use the date filter to pull any changes.

-------

EDIT: Hah, as I was writing this post I got a mail enlightening me to something I don't do.

"Cascade effect":
"i.e. adding some distance gives enough information for the coordinates of system1 to be calculated. And then knowing the coordinates of system1 gives enough information for some other system that had a ref to system1.. etc."

I don't currently do that, but probably ought to. Maybe that is what you meant?
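A minimal sketch of how such a cascade could be driven, assuming a system becomes solvable once it has at least `min_refs` distances to systems with known coordinates. The function name, the threshold of 4 and the system names are assumptions for illustration, not how TGC works; the sketch only works out which systems become solvable and in what order, it does not compute coordinates.

```python
from collections import defaultdict, deque


def cascade_order(known, distance_pairs, min_refs=4):
    """Return the systems that become solvable, in the order they unlock.

    known          -- iterable of system names that already have coordinates
    distance_pairs -- iterable of (system_a, system_b) pairs with a stored distance
    min_refs       -- assumed number of located references needed to solve a system
    """
    neighbours = defaultdict(set)
    for a, b in distance_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    located = set(known)
    order = []
    queue = deque(located)
    while queue:
        current = queue.popleft()
        for other in neighbours[current]:
            if other in located:
                continue
            refs = sum(1 for n in neighbours[other] if n in located)
            if refs >= min_refs:
                # Newly solvable: it can now act as a reference for others.
                located.add(other)
                order.append(other)
                queue.append(other)
    return order


# Hypothetical data: Y only becomes solvable after X has been located.
pairs = [("X", r) for r in "ABCD"] + [("Y", r) for r in "ABC"] + [("Y", "X")]
print(cascade_order("ABCD", pairs))
```

Each newly located system is pushed back onto the queue, which is what lets one new distance ripple through to systems several steps away.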

filth · Dec 1, 2014

TornSoul said:
Apart from that - What do you mean by "remap"?

My client collects the systems commanders visit and sends them to my backend. When the backend receives a new system that I don't have coordinates for yet, it queries your API to find out if the system is known.
If your API doesn't have coordinates either, the system is stored without any.

Another job starts once a day, collects all systems without coordinate data from the DB and queries your API to see if there are coordinates. This is done to pick up any coordinates that have been entered in the meantime.

TornSoul · Dec 1, 2014

filth said:
Another job starts once a day, collects all systems without coordinate data from the DB and queries your API to see if there are coordinates. This is done to pick up any coordinates that have been entered in the meantime.

Uh that's a very very bad idea.

Right now it might be manageable - But consider in 2-3-4 years' time - That would end up being A LOT of requests.

Instead - Turn the problem around:

Instead of asking for each system if it has changed (coords have been found since last time), simply ask for any/all systems that have changed since you last asked, and then use that list to update your own systems.

That's just one request - and you get all systems that have changed in one go (and you don't have to needlessly pull the tens of thousands (in a few years' time) that have no change)

That's exactly what the date filter is intended for.

---------

As a matter of fact let me suggest some general changes to the workflow of your app:

- Once, and only once, pull all system data from the TGC - Store that locally for your app.
- Once an hour (every 5 hours, once a day, whatever) request all changes since you last asked (via the date filter), and update your local list with the changelist you got back.

This means that you *always* have an updated list.
(well it could be one hour out of sync - But the data seriously doesn't change that often, so mostly it will be 100% up to date)

This also means that there is no need to query TGC each and every time one of your commanders submits a new system for which you don't have coordinates.
Because if you don't have the coordinates, neither does TGC (as you just updated at most an hour ago)

So your app will be more responsive (as you don't have to query TGC each and every time), and TGC doesn't end up getting congested if 10K commanders use your app

Happiness all around
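A minimal sketch of that workflow, assuming a callable `fetch_changes(since)` that returns every system changed since the given timestamp (or everything when `since` is `None`). The class, the callable and the record layout are hypothetical; the real TGC request format is not shown here.

```python
from datetime import datetime, timezone


class LocalSystemCache:
    """Keeps a local copy of system coordinates, refreshed via change lists."""

    def __init__(self, fetch_changes):
        # fetch_changes is a hypothetical callable: since -> list of
        # {"name": str, "coords": (x, y, z)} records.
        self._fetch_changes = fetch_changes
        self.systems = {}
        self.last_sync = None  # None means "never synced": pull everything

    def sync(self, now=None):
        """One request: pull all changes since the last sync and apply them."""
        now = now or datetime.now(timezone.utc)
        changes = self._fetch_changes(self.last_sync)
        for record in changes:
            self.systems[record["name"]] = record["coords"]
        self.last_sync = now
        return len(changes)

    def coords(self, name):
        """Answer from the local copy only; no remote query per lookup."""
        return self.systems.get(name)


# Stand-in for the remote service, for illustration only.
def fake_fetch(since):
    if since is None:
        return [{"name": "Sol", "coords": (0.0, 0.0, 0.0)}]
    return []


cache = LocalSystemCache(fake_fetch)
cache.sync()  # initial full pull
cache.sync()  # later: only the (empty) change list
print(cache.coords("Sol"), cache.coords("Some Unknown System"))
```

Run `sync()` from a scheduled job (hourly, daily, whatever fits) and serve every commander lookup from `coords()`.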

filth · Dec 1, 2014

Yep, this way, or just pull the coordinates for revisited systems. I'll fix that.

Snuble · Dec 1, 2014

I would prefer a human-readable station format. Storage-wise it would not matter as it's 4 or 5 bools. Streaming data would take a bit more bandwidth, but again not much to talk about.

For economies it makes sense to have it encoded somehow. Question is, what happens if FD throws a new economy our way? For that reason it might be better to use an array for the economies. Note that each station also most often has a combination of economies.

Also, allegiance might be very important information to have for each station, maybe even more so than for the system.

TornSoul · Dec 1, 2014

Snuble said:
For economies it makes sense to have it encoded somehow. Question is, what happens if FD throws a new economy our way? For that reason it might be better to use an array for the economies. Note that each station also most often has a combination of economies.

The possibility of more than one economy is the main reason for the bitmask in the first place...

That's exactly why a bitmask is better than a field for each.
(In fact each position in the bitmask *is* a field)

With a bitmask, it doesn't matter if FD suddenly decides that a station can have, say, 4 economies - It will fit in the bitmask as well. And zero DB or code changes will be needed!

And if they add another economy - That will simply mean that the bitmask can end up as a larger value. It simply gets tagged on at the end. (Unless we hit 64+ economies we are good)
Again zero changes needed.

Without a bitmask any of the above would mean every DB table/view/stored procedure and every code class would have to be updated with a new economy field. Huge annoyance.

Snuble said:
Also, allegiance might be very important information to have for each station, maybe even more so than for the system.

It's there, it's "inherited" from the faction controlling the station.

And there's another reason for doing it like that - What if FD decides that stations can change hands?
With my scheme, the only update needed is that of the faction name for the station - and presto, everything else is updated.

If that data was stored directly as station properties, it would mean every single property would have to be updated.

And furthermore - What if FD decides to fluctuate that "influence" property of factions, so that another faction is suddenly the controlling one (which could, say, change the government of the system)
With my scheme, simply change one property of a system (controlling faction) and you're done - everything else falls into place by itself.
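A minimal sketch of the "inherited" scheme, assuming stations store only the name of their controlling faction and look everything else up from it. All class, field and faction names here are hypothetical and not taken from the TGC schema.

```python
from dataclasses import dataclass


@dataclass
class Faction:
    name: str
    allegiance: str
    government: str


@dataclass
class Station:
    name: str
    controlling_faction: str  # the only faction-related value stored on the station


def station_allegiance(station, factions):
    """Derive the allegiance from the controlling faction instead of storing it."""
    return factions[station.controlling_faction].allegiance


factions = {
    "Faction A": Faction("Faction A", "Federation", "Democracy"),
    "Faction B": Faction("Faction B", "Independent", "Anarchy"),
}
station = Station("Example Station", "Faction A")
print(station_allegiance(station, factions))

# The station changes hands: one field is updated, the derived values follow.
station.controlling_faction = "Faction B"
print(station_allegiance(station, factions))
```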

TornSoul · Dec 1, 2014

Bitmasks are such wonderfully useful things (given the correct situation)

I've seen the following example:
A system with two stations.
Station 1: Has economy "eco1"
Station 2: Has economy "eco1 and eco2"

The system as such then has economy "eco1 and eco2"

If I supplied the economies of the station as arrays - You would have to merge those two arrays, and remove duplicates (eco1) to show the system economy.
(and keep in mind there could be many more stations, so you'd have a lot of duplicates)

With a bitmask - You don't care, you just "or" together all the bitmasks for each station and you have the result. Duplicates are taken care of automatically.

eco1 = 1 (01)
eco2 = 2 (10)

System economy =
1 | 2 = 3 = eco1 and eco2 ("|" = bitwise "or")
or in binary
01 | 10 = 11 = eco1 and eco2

It's perfect for this situation where we have a fixed set of options - and where we need to "merge" them to show the overall system economy.

And the same goes for facilities - when tools want to show all facilities available in a system.
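The eco1/eco2 example can be sketched with Python's `enum.IntFlag`. The `Economy` members and the `system_economy` helper are placeholders for illustration, not real game data or an existing API.

```python
from enum import IntFlag
from functools import reduce
from operator import or_


class Economy(IntFlag):
    # Placeholder names; each economy occupies one bit.
    ECO1 = 1  # binary 01
    ECO2 = 2  # binary 10


def system_economy(station_masks):
    """Bitwise-or all station masks; duplicates collapse automatically."""
    return reduce(or_, station_masks, Economy(0))


station1 = Economy.ECO1
station2 = Economy.ECO1 | Economy.ECO2

combined = system_economy([station1, station2])
print(int(combined))             # 3, i.e. binary 11
print(Economy.ECO2 in combined)  # True
```

Testing for a single economy is a membership check (or `mask & bit` on plain integers), so no array scan is needed.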

codersparks · Dec 1, 2014

TornSoul said:
Bitmasks are such wonderfully useful things (given the correct situation)

I've seen the following example:
A system with two stations.
Station 1: Has economy "eco1"
Station 2: Has economy "eco1 and eco2"

The system as such then has economy "eco1 and eco2"

If I supplied the economies of the station as arrays - You would have to merge those two arrays, and remove duplicates (eco1) to show the system economy.
(and keep in mind there could be many more stations, so you'd have a lot of duplicates)

What you have described is set union (sometimes loosely called set addition); in your example:

station1 + station2 = [eco1] + [eco1, eco2] = [eco1, eco2]

or a more complicated example:

stationA + stationB + stationC + stationD = [eco1] + [eco2, eco3] + [eco1, eco2,eco4] + [eco3] = [eco1, eco2, eco3, eco4]

Now, I don't claim to have checked all systems, but in those that I have seen, the system economy is the combination of all the unique values for the stations - perfect for set manipulation (and very fast in modern languages).

As a station can only have a maximum of one instance of each economy in its list (i.e. [eco1,eco2,eco2] is invalid), the array can be treated as a set.
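The same merge written as a set union, as a minimal sketch; the economy names are the placeholders from the example above and `system_economies` is a hypothetical helper.

```python
def system_economies(stations):
    """Union of the economy lists of all stations; duplicates are dropped."""
    return set().union(*stations)


station_a = ["eco1"]
station_b = ["eco2", "eco3"]
station_c = ["eco1", "eco2", "eco4"]
station_d = ["eco3"]

print(sorted(system_economies([station_a, station_b, station_c, station_d])))
# ['eco1', 'eco2', 'eco3', 'eco4']
```

Note that a set carries no ordering, so this only fits if the order of economies does not matter.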

CMDRKNac · Dec 2, 2014

Hi guys, I've deployed the first version of my webapp. It's mostly backend stuff, but it includes a useful out-of-game route planner which will return a list of the route. Main forum thread here: https://forums.frontier.co.uk/showthread.php?t=68403 and website is:

http://www.elitedangerouscentral.com/

I plan to add much more stuff. The database is synced with EDSC (I will have a script to query for new additions hourly/daily tomorrow; I populated the local DB manually today). As for the new format, I don't really care: if the output is a bitmask, it can be easily processed by each application. But strictly from my personal POV, I don't see any benefit to this (maybe keeping the data less verbose, but also less readable); that's a very personal thing depending on the tools each dev is using and the architecture of their app and DB. Actually, for me it would be easier to parse the JSON output as single fields (or arrays/lists for efficiency), and if they are removed/added it's not a problem at all, as I'm using a NoSQL DB and all the manipulation is done with an OOP language, so the usage of fields and data structures like lists is more natural and manipulating the DB structure is not a big deal (I understand, though, that in the case of relational DBs that approach would be more comfortable).

It would take more time to parse the bitmask and output something I can use locally, but not really that much. Performance, imo, is not an issue unless we are expecting thousands of concurrent users (but then I would be more worried about server costs than the publication itself lol!).

But in the end I think you should implement what is better for you (as long as it's not more complicated than it needs to be) and what you think is more efficient. Bitmasks would keep the data cleaner, that's for sure.

Biteketkergetek · Dec 2, 2014

TornSoul said:
Bitmasks are such wonderfully useful things (given the correct situation)

I've seen the following example:
A system with two stations.
Station 1: Has economy "eco1"
Station 2: Has economy "eco1 and eco2"

The system as such then has economy "eco1 and eco2"

If I supplied the economies of the station as arrays - You would have to merge those two arrays, and remove duplicates (eco1) to show the system economy.
(and keep in mind there could be many more stations, so you'd have a lot of duplicates)

With a bitmask - You don't care, you just "or" together all the bitmasks for each station and you have the result. Duplicates are taken care of automatically.

eco1 = 1 (01)
eco2 = 2 (10)

System economy =
1 | 2 = 3 = eco1 and eco2 ("|" = bitwise "or")
or in binary
01 | 10 = 11 = eco1 and eco2

It's perfect for this situation where we have a fixed set of options - and where we need to "merge" them to show the overall system economy.

And the same goes for facilities - when tools want to show all facilities available in a system.

I like the bitwise idea a lot. I think it's great for the backend or sqlite/mysql/postgresql storage. For JSON I'm not yet convinced this would be ideal, unless there is an API to query what each bit means.

Unfortunately the order of the economies may be relevant, and this can't be represented with a set. I believe I've seen systems with (refinery, extraction), with ones having extraction as primary having all minerals in stock, and ones having refinery as primary economy, having mostly refined metals in stock instead.

RedWizzard · Dec 2, 2014

TornSoul said:
The possibility of more than one economy is the main reason for the bitmask in the first place...

That's exactly why a bitmask is better than a field for each.
(In fact each position in the bitmask *is* a field)

With a bitmask, it doesn't matter if FD suddenly decides that a station can have, say, 4 economies - It will fit in the bitmask as well. And zero DB or code changes will be needed!

And if they add another economy - That will simply mean that the bitmask can end up as a larger value. It simply gets tagged on at the end. (Unless we hit 64+ economies we are good)
Again zero changes needed.

Without a bitmask any of the above would mean every DB table/view/stored procedure and every code class would have to be updated with a new economy field. Huge annoyance.

My personal preference is either a comma separated string or an array of strings. Aside from readability, bitmasks have another problem: they require a definition outside the data. Every tool needs to know which bit represents which value. If a new economy type is added then every tool needs to be updated. With a string or array of strings many tools would not need to be updated at all - they can simply output whatever is in the field. Another minor issue is that bitmasks are not well suited when you need special case values. E.g. what goes in the bitmask to represent "unknown" vs "none"?

I don't see why any db structure would need changing if economies were stored as a string or array of strings (subtable in the db). I also don't see how you can avoid code changes when the definition of the bitmask changes as the definition is generally hard coded (while there are certainly cases where you could avoid code changes with a string or array of strings).
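A minimal sketch of the two points above, with hypothetical names throughout: decoding a bitmask needs a lookup table that lives outside the data, while a list of strings can be output as-is and can also distinguish "unknown" (no value) from "none" (empty list).

```python
# Hypothetical definition table; every consuming tool would need its own copy.
ECONOMY_BITS = {1: "eco1", 2: "eco2"}


def decode_mask(mask):
    """Translate a bitmask to names; bits missing from the table stay opaque."""
    names = []
    bit = 1
    while bit <= mask:
        if mask & bit:
            names.append(ECONOMY_BITS.get(bit, f"unknown bit {bit}"))
        bit <<= 1
    return names


def describe(economies):
    """Render a list of economy strings without knowing the valid values."""
    if economies is None:
        return "unknown"
    if not economies:
        return "none"
    return ", ".join(economies)


print(decode_mask(3))                # ['eco1', 'eco2']
print(decode_mask(4))                # ['unknown bit 4'] until the table is updated
print(describe(["eco3", "eco1"]))    # new value and original order pass through
print(describe(None), describe([]))  # unknown none
```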

Biteketkergetek said:
I like the bitwise idea a lot. I think it's great for the backend or sqlite/mysql/postgresql storage. For JSON I'm not yet convinced this would be ideal, unless there is an API to query what each bit means.

Unfortunately the order of the economies may be relevant (I haven't checked), and this can't be represented with a set.

Right. Bitmasks may well be a good internal representation. They're not a good data transfer representation.

Yes, order could be important: I've definitely seen cases where the system has economy "X, Y", but a station has economy "Y, X". This doesn't matter for facilities though.

Biteketkergetek · Dec 2, 2014

RedWizzard said:
Right. Bitmasks may well be a good internal representation. They're not a good data transfer representation.

Yes, order could be important: I've definitely seen cases where the system has economy "X, Y", but a station has economy "Y, X". This doesn't matter for facilities though.

Well, the bitmask is good only if the order is not relevant. Perhaps I'll stick with the string list both in the DB and JSON after all.

RedWizzard · Dec 2, 2014

Finwen said:
Name        Distance   Error
HIP 1573    333.9      0.01
Sol         383.31     0.00
Shotel      488.84     0.00
Ottia       313.67     0.00
34 Pegasi   379.53     0.00

Calculated

Coordinates: (-81.78125, -149.4375, -343.375)
Better than next candidate(s) by 1 distances
Inconsistent distances: 1

I guess the distance error made the problem for me?

It's saying that the distance to HIP 1573 should be either 333.91 or 333.89 (333.9 +/- 0.01). Manually calculating the distance it looks like it should be 333.91. You should check it in the galaxy map.

Personally my standard for considering the data good enough to submit is: minimum 5 distances, no errors, and the calculated coordinates should be at least 2 distances better than the next candidate(s). When I set it up to submit to TGC directly I'll be requiring this standard (hopefully get that done today).
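A minimal sketch of that standard as a check, under two assumptions that are not confirmed above: a distance counts as consistent when the candidate reproduces it after rounding to two decimals, and "N distances better" means the best candidate matches N more distances than the runner-up. The function names are hypothetical, and Sol is assumed to sit at the origin (0, 0, 0).

```python
import math


def count_inconsistent(candidate, refs):
    """Count submitted distances the candidate fails to reproduce at 2 decimals.

    refs -- list of (reference_coords, submitted_distance) pairs
    """
    return sum(
        1
        for coords, submitted in refs
        if abs(round(math.dist(candidate, coords), 2) - submitted) > 1e-6
    )


def meets_standard(n_distances, n_inconsistent, margin_over_next,
                   min_distances=5, min_margin=2):
    """Minimum 5 distances, no errors, at least 2 distances better than the next."""
    return (
        n_distances >= min_distances
        and n_inconsistent == 0
        and margin_over_next >= min_margin
    )


candidate = (-81.78125, -149.4375, -343.375)
sol = (0.0, 0.0, 0.0)  # assumption: Sol is the origin of the coordinate system

print(count_inconsistent(candidate, [(sol, 383.31)]))  # 0: Sol's distance matches
print(meets_standard(5, 1, 1))  # False: one inconsistent distance, margin of 1
```

The numbers in the last call are the ones from the quoted result: 5 distances, 1 inconsistent, better than the next candidate by 1.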

Snuble · Dec 2, 2014

Just added LP 211-22 to EDSC, but it fails to calculate coordinates. Submitted a fair number of distances distributed around the system. One thing I notice is that the distance in the Nav panel can vary by 0.01 compared to the distance in the Galaxy Map. E.g. G116-72 is listed with 7.19 in the Nav panel, but 7.18 in the Galaxy Map.

RedWizzard · Dec 2, 2014

Snuble said:
Just added LP 211-22 to EDSC, but it fails to calculate coordinates. Submitted a fair number of distances distributed around the system. One thing I notice is that the distance in the Nav panel can vary by 0.01 compared to the distance in the Galaxy Map. E.g. G116-72 is listed with 7.19 in the Nav panel, but 7.18 in the Galaxy Map.

What are the distances you've got? I'll try my algorithm...

filth · Dec 2, 2014

CMDRKNac said:
Hi guys, I've deployed the first version of my webapp. It's mostly backend stuff, but it includes a useful out-of-game route planner which will return a list of the route. Main forum thread here: https://forums.frontier.co.uk/showthread.php?t=68403 and website is:

http://www.elitedangerouscentral.com/

I plan to add much more stuff. The database is synced with EDSC (I will have a script to query for new additions hourly/daily tomorrow; I populated the local DB manually today). As for the new format, I don't really care: if the output is a bitmask, it can be easily processed by each application. But strictly from my personal POV, I don't see any benefit to this (maybe keeping the data less verbose, but also less readable); that's a very personal thing depending on the tools each dev is using and the architecture of their app and DB. Actually, for me it would be easier to parse the JSON output as single fields (or arrays/lists for efficiency), and if they are removed/added it's not a problem at all, as I'm using a NoSQL DB and all the manipulation is done with an OOP language, so the usage of fields and data structures like lists is more natural and manipulating the DB structure is not a big deal (I understand, though, that in the case of relational DBs that approach would be more comfortable).

It would take more time to parse the bitmask and output something I can use locally, but not really that much. Performance, imo, is not an issue unless we are expecting thousands of concurrent users (but then I would be more worried about server costs than the publication itself lol!).

But in the end I think you should implement what is better for you (as long as it's not more complicated than it needs to be) and what you think is more efficient. Bitmasks would keep the data cleaner, that's for sure.

Hehe, it's the same style template I am using.

TornSoul · Dec 2, 2014

Biteketkergetek said:
Unfortunately the order of the economies may be relevant, and this can't be represented with a set. I believe I've seen systems with (refinery, extraction), with ones having extraction as primary having all minerals in stock, and ones having refinery as primary economy, having mostly refined metals in stock instead.

RedWizzard said:
Yes, order could be important: I've definitely seen cases where the system has economy "X, Y", but a station has economy "Y, X". This doesn't matter for facilities though.

That's exactly one of those "red flags" I was fishing for.
If order is important a bitmask is not suitable.

I've never even considered that to be an issue - and would have to code for it regardless of the data model used, if true (would need to keep track of the order)

I would really like to have this backed up with some concrete examples though, rather than anecdotes (no offense intended).
Seeing is believing I suppose

The data structure aside it just seems odd to me, so I'd like to be convinced otherwise.
I realize this might not be easy to dig up... But I'd really like to see it.

RedWizzard said:
Another minor issue is that bitmasks are not well suited when you need special case values. E.g. what goes in the bitmask to represent "unknown" vs "none"?

Yup - That's why I kept "black-market" out of the facilities list.
The "standard facilities" can all be updated in one go (all info is available), while black-market info might be in limbo for who knows how long.

RedWizzard said:
I don't see why any db structure would need changing if economies were stored as a string or array of strings (subtable in the db).

True if an array is used.
Not true if individual fields were used (which is what is being done in existing formats that I've seen)

So if bitmask is dropped, array would definitely be the way to go.

Snuble · Dec 2, 2014

RedWizzard said:
What are the distances you've got? I'll try my algorithm...

From EDSC

Code:

    {
      name: "LP 211-22",
      refs: [
        {
          name: "16 c Ursae Majoris",
          dist: 24.7
        },
        {
          name: "LHS 266",
          dist: 17.12
        },
        {
          name: "LP 167-71",
          dist: 17.49
        },
        {
          name: "Maya",
          dist: 16.35
        },
        {
          name: "Regulus",
          dist: 39.44
        },
        {
          name: "Sol",
          dist: 69.4
        },
        {
          name: "Aulis",
          dist: 39.83
        },
        {
          name: "G 146-5",
          dist: 9.42
        },
        {
          name: "Olwain",
          dist: 43.25
        },
        {
          name: "Tukualang",
          dist: 22.07
        },
        {
          name: "NLTT 21440",
          dist: 5.96
        },
        {
          name: "G 116-72",
          dist: 7.19
        },
        {
          name: "Chandra",
          dist: 15.06
        },
        {
          name: "LP 211-12",
          dist: 9.19
        },
        {
          name: "BD+40 2208",
          dist: 5.43
        }
      ]
    }

TornSoul · Dec 2, 2014

Snuble said:
One thing I notice is that the distance in the Nav panel can vary by 0.01 compared to the distance in the Galaxy Map. E.g. G116-72 is listed with 7.19 in the Nav panel, but 7.18 in the Galaxy Map.

Yikes - That sucks really really bad.

You should definitely ticket that.

Are they perhaps using different datatypes (single vs double) when calculating the distances in the two different views?
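A minimal sketch of how that could happen, assuming one view does its arithmetic in single precision and the other in double precision. This only illustrates the mechanism; nothing here is known about how the game actually computes either value, and the helper names are hypothetical.

```python
import math
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dist_double(a, b):
    """Euclidean distance computed entirely in double precision."""
    return math.dist(a, b)


def dist_single(a, b):
    """Same distance with every intermediate result rounded to single precision."""
    total = 0.0
    for p, q in zip(a, b):
        diff = to_single(to_single(p) - to_single(q))
        total = to_single(total + to_single(diff * diff))
    return to_single(math.sqrt(total))


a = (0.0, 0.0, 0.0)
b = (-81.78125, -149.4375, -343.375)
print(dist_double(a, b), dist_single(a, b))
print(to_single(0.1) == 0.1)  # False: 0.1 is stored differently in the two types
```

The two results differ only far beyond the second decimal, but when the true distance falls very close to a rounding boundary that is enough for the displayed two-decimal values to land 0.01 apart.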

Having inconsistent distances is just bad all around.

TornSoul · Dec 2, 2014

RedWizzard said:
Personally my standard for considering the data good enough to submit is: minimum 5 distances, no errors, and the calculated coordinates should be at least 2 distances better than the next candidate(s). When I set it up to submit to TGC directly I'll be requiring this standard (hopefully get that done today).

"5 distances, no errors"
Yeah, I thought I remembered you were less stringent than I am, as I require all distances to have no error.
And I definitely need to change that (as I've pointed out a few times already).
With just one erroneous distance value in the TGC DB, it won't be able to calc the coordinates - Which is obviously bad.

"2 distances better"
I don't understand what this means?

What are you comparing?