Discussion: What is the most efficient way to crowdsource the 3D system coordinates?

My System.csv is at http://www.davek.com.au/td/ . Yes, it is directly usable in TD (I was amazed and relieved it worked first time).

Where did the systems come from? Eh - our previous data: RedWizzard's systems.json plus some additions I had from my fork of ed-systems, and from that I made the System.csv we were using. Probably best to explain the method I used (rough code sketch below):
1. Take MB's list, put all the systems into the right columns, add 'Gamma1' source and timestamp columns.
2. Convert the coords to SOL-centric.
3. Append previous System.csv (that was 1130 systems).
4. Find duplicate star names and delete the wrong one. This accounts for systems we knew about before, or systems that have moved.
5. Find duplicate coordinates and delete the wrong one - EXCEPT for HIP xxx19/20, which are real. This accounts for systems that have been renamed.

That gave me 20179 systems. I'm pretty sure there are no duplicates in there.
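
In case it helps anyone replicate the merge, here's a rough sketch of steps 3-5 in Python. The column names ('System', 'X', 'Y', 'Z') and the "first occurrence wins" tie-breaking rule are illustrative assumptions, not the exact script that built the posted System.csv (disputed entries were still checked against the galaxy map by hand):

```python
# Rough sketch of steps 3-5: append, de-dupe by name, de-dupe by coordinates.
# Column names and tie-breaking are assumptions for illustration only.
import csv

def load(path):
    with open(path, newline='') as f:
        return list(csv.DictReader(f))

def merge(gamma_rows, old_rows, coord_exceptions=()):
    merged = gamma_rows + old_rows        # step 3: Gamma1 rows first, old System.csv appended

    # Step 4: duplicate names - keep the first (i.e. Gamma1) copy.
    by_name = {}
    for row in merged:
        by_name.setdefault(row['System'].upper(), row)

    # Step 5: duplicate coordinates are usually renames - keep one copy,
    # except for names in coord_exceptions (e.g. the HIP pair mentioned above).
    seen, result = set(), []
    for row in by_name.values():
        key = tuple(round(float(row[c]), 3) for c in ('X', 'Y', 'Z'))
        if key in seen and row['System'] not in coord_exceptions:
            continue
        seen.add(key)
        result.append(row)
    return result
```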

Hi Maddavo, maybe I'm missing something here? I understand how you cleaned and de-duped the data, etc, but my question is: isn't it possible that the old data from Beta3 that made it into your new list has out-of-date coordinates? I.e., those systems might have moved between B3 and G1?
 
Just my own little toybox I decided to share - with test/commented-out code and all.
My first venture out into playing with THREE.js.
What a coincidence, I've been doing the same :)


(Click for interactive thingy, stole some orbit controls from somewhere)

Stars ~35 ly around Sol, with connections plotted for stars <= 12 ly apart.
Look at the code at your own risk - it might just be terrible!
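
For anyone curious how the connection lines might be computed: the rendering itself is THREE.js, but the pair-finding boils down to something like this sketch (assuming a list of (name, x, y, z) tuples with Sol-centric coords in light years - not the actual page code):

```python
# Sketch of the "connections for stars <= 12 ly apart" selection.
import math

def connections(stars, centre=(0.0, 0.0, 0.0), radius=35.0, link=12.0):
    """stars: list of (name, x, y, z) tuples; coords in light years, Sol-centric."""
    near = [s for s in stars if math.dist(s[1:], centre) <= radius]
    pairs = []
    for i, a in enumerate(near):
        for b in near[i + 1:]:
            if math.dist(a[1:], b[1:]) <= link:
                pairs.append((a[0], b[0]))
    return near, pairs
```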


I'll check it out, and take it under consideration.
Thanks for the pointer.
That'd be awesome. I have no hosting, so I'm (ab)using Dropbox - and they seem to enforce https, so I can't get in touch with your API when accessed through there.
What you see above is a static snapshot, unless your browser doesn't mind mixed protocols.
 
Hi Maddavo, maybe I'm missing something here? I understand how you cleaned and de-duped the data, etc, but my question is: isn't it possible that the old data from Beta3 that made it into your new list has out-of-date coordinates? I.e., those systems might have moved between B3 and G1?

If a system moved from B3 to G1 then after merging it should have been picked up in step 4. For those systems I checked the GM to see which was right and kept it.
Although I've posted my System.csv, I wouldn't necessarily trust it. I'd suggest it only as an untrusted WIP. If anyone finds invalid data in there then I'd like to know. Now that TGC is populated, please treat TGC as THE authority.

I'm trying to pull the JSON from TGC so I can compare the data and work out the differences. I'm just getting my head around this API thingy and JSON/recordset conversions & dereferencing. I've managed to create an HTML table from some filtered data, which was pretty cool. Unfortunately the verbose JSON with coords tends to put a bit of overhead on the data transfer, so when I ask for the whole thing my code times out. When I sort that out and get some spreadsheets into Excel to compare, I'll be more at home and I'll post the results.

I pwn Excel, I am a JSON noob.
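
If it helps with the Excel side, flattening a systems JSON dump into a CSV is only a few lines. A minimal sketch - the 'name'/'coord' field names here are assumptions about the payload shape, not the actual TGC schema:

```python
# Minimal sketch: flatten a systems JSON dump to CSV so it can be compared in Excel.
# Field names ('name', 'coord') are assumptions, not the actual TGC schema.
import csv, json

def json_to_csv(json_path, csv_path):
    with open(json_path) as f:
        systems = json.load(f)            # expected: a list of system objects
    with open(csv_path, 'w', newline='') as f:
        w = csv.writer(f)
        w.writerow(['System', 'X', 'Y', 'Z'])
        for s in systems:
            c = s.get('coord') or {}
            w.writerow([s.get('name'), c.get('x'), c.get('y'), c.get('z')])
```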
 
If a system moved from B3 to G1 then after merging it should have been picked up in step 4. For those systems I checked the GM to see which was right and kept it.

What I mean, though, is the 346 systems you have extra to the cleaned list from MB. Presumably those 346 are nowhere at all in MB's latest list, and therefore would have had to get their coords from previous B.x data - and those coords aren't relevant in Gamma...
 
What I mean, though, is the 346 systems you have extra to the cleaned list from MB. Presumably those 346 are nowhere at all in MB's latest list, and therefore would have had to get their coords from previous B.x data - and those coords aren't relevant in Gamma...

OK, I see what you mean. The 346 extra systems I expect are non-market systems in Gamma. We were navigating around and working out coords for systems. I don't think they're irrelevant as they are useful for navigation - travelling between market systems or mining in the non-market/non-station systems. We didn't throw out non-station systems between B1 -> B2 or B2 -> B3 unless we knew they were deleted. Once I get TGC data to compare it shouldn't be too hard to get a list of those extra systems and then use the GM to see if they still exist. At least then we wouldn't have to visit all those systems again to get distances for the coords.
 
Hi Maddavo, no, I didn't mean irrelevant because they didn't have market data. What I meant is that the coordinates will no longer be applicable. From a couple of tests I did in gamma, most of the coords from B3 changed in gamma. So the concern is the 346 systems may well be in gamma, but in your list they may have out-of-date Beta coords.
 
These things are data you will be cleaning out while exploring the galaxy. If a ghost system is on your "should be there" list when you enter a system (the tool you use should ideally try to simulate the "1" panel of systems), well, mark it for deletion. I'm keeping my old data for just this reason. It might be right... saving me from doing anything but a name change, or a slight coords update.
 
Snuble, yep, I agree - data should never be completely thrown out. But if we are giving data to third parties to use in their trading, route-planning, etc. tools, then data for gamma (and release, assuming gamma coords = release coords) should either come from MB himself or be verified with the system-collection tools developed in this thread before coords are handed out. Otherwise data for some systems from the old B3 pill may be incorrect.
 
Ah nice! The redundancy thing you mention sounds like a great way to go. And yeah, merging the regions would definitely help a lot. What is the volume/size of the cube that you search around each trilateration result? One random idea would be to use an approximate sphere instead of a cube, although that would really only be an incremental improvement.

I don't have any timing info for mine related to how many distances there are, but I think there would be 2 variables. As the number of distances goes up, it likely means the candidate region is smaller and more tightly bounded, which would reduce the number of locations that need to be evaluated. And on the flip side, there are O(n) calculations required for each location based on the number of distances. So I suspect on average, mine would probably go fastest with around 5-6 distances, and be somewhat slower on either side of that.

With 4 distances per star, my script took ~4.5 minutes to run over ~300 stars, so roughly 1s per star. I'll have to go grab another system or two's worth of distances to all of them and see if my theory is correct.

I can probably still tighten the search area a fair bit. It basically tries to evaluate all locations with a sum-of-squares error less than some value -- so by reducing that value I can reduce the volume that needs to be searched. And reducing the volume gives a cubic reduction in time, so that should help quite a bit. I just have to run some tests to make sure I don't tighten it *too* much.
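
For the curious, a brute-force pass over a candidate cube amounts to something like the sketch below (not the actual script). It assumes coordinates snap to a 1/32 ly grid (the grid size is an assumption here) and takes a list of (reference position, measured distance) pairs:

```python
# Sketch: score every grid point in a candidate cube by its sum-of-squares error
# against the measured distances, and keep the points under a threshold.
import itertools, math

GRID = 1.0 / 32.0   # assumed coordinate grid spacing in ly

def candidates(centre, half_width, references, max_sq_error):
    """references: list of ((x, y, z) reference position, measured distance) pairs."""
    steps = range(-half_width, half_width + 1)
    cx, cy, cz = (round(v / GRID) for v in centre)
    best = []
    for dx, dy, dz in itertools.product(steps, repeat=3):
        p = ((cx + dx) * GRID, (cy + dy) * GRID, (cz + dz) * GRID)
        err = sum((math.dist(p, ref) - d) ** 2 for ref, d in references)
        if err <= max_sq_error:
            best.append((err, p))
    return sorted(best)   # best candidates first
```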

I've settled on cubes with a volume of 64 grid locations. This is based on testing about 2400 random points within a cube 1000 Ly across centered on Sol. Larger candidate cubes found about 1% more systems successfully (i.e. with 10 or fewer distances), but the time taken goes up considerably. With the 64 point cube it took about a minute to find the 2400 points to the level of redundancy I wanted, using an average of 8.6 distances. This is a good example of how only having the beta 3 references would have hurt (me, at least): stars in the beta 3 region only require about 5.5 distances to find on average (5 distances would be the minimum for that level of redundancy).

Using a sphere would cut down the number of points to test by nearly 50% compared to cubes (a sphere inscribed in a cube fills only π/6 ≈ 52% of its volume), so it might be worth pursuing. It's just easy iterating over cubes ;)

I'm not making any attempt to refine the candidate regions, so more distances always means more points to test for me. Each combination of 3 distances creates two candidate cubes, so each new distance results in a lot more cubes (in fact, adding the nth distance multiplies the number of candidate cubes by n/(n-3), which tends towards 1 as n increases, but for the number of distances we're likely to use it's typically between 1.3 and 2). But if the distances are all correct then one of the cubes from each combination should be close to identical. These are the ones that usually get merged. The other candidate cubes will typically contain points that only satisfy a maximum of 3 distances, so a good optimization would be to throw out those cubes. The problem is figuring out when they are unneeded in light of the possibility of some of the distances being incorrect. This is on my list of enhancements but I haven't got it done yet. I also still plan to look at your code for determining the candidate region at some point.
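
For anyone wondering where the two cubes per combination come from: three spheres intersect in at most two points, mirrored either side of the plane through the three reference stars. A standard trilateration sketch (numpy; not my actual code) that returns both candidate positions:

```python
# Standard three-sphere trilateration: returns the (up to) two points at the
# given distances r1, r2, r3 from reference positions p1, p2, p3.
import numpy as np

def trilaterate(p1, p2, p3, r1, r2, r3):
    p1, p2, p3 = map(np.asarray, (p1, p2, p3))
    ex = (p2 - p1) / np.linalg.norm(p2 - p1)
    i = np.dot(ex, p3 - p1)
    ey = p3 - p1 - i * ex
    ey /= np.linalg.norm(ey)
    ez = np.cross(ex, ey)
    d = np.linalg.norm(p2 - p1)
    j = np.dot(ey, p3 - p1)
    x = (r1**2 - r2**2 + d**2) / (2 * d)
    y = (r1**2 - r3**2 + i**2 + j**2) / (2 * j) - (i / j) * x
    z_sq = r1**2 - x**2 - y**2
    if z_sq < 0:
        return []                     # distances inconsistent: no intersection
    z = np.sqrt(z_sq)
    base = p1 + x * ex + y * ey
    return [base + z * ez, base - z * ez]
```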

Another optimisation route is to take advantage of the way distances are added incrementally. Currently I'm simply recalculating everything whenever the input data changes but for adding a distance I should be able to only fully calculate the new candidate cubes and only test the existing candidate cubes with the new distance. I think that if the new distance confirms the existing leading candidate point then I can almost skip all other work (maybe just test the second best candidates to see if the new distance increases the redundancy). The problem with that will be if subsequent distances don't confirm the leading candidate I'll need to be careful about where I'm up to (or I'll have to recalculate from scratch).

I also want to do some testing with incorrect distances thrown into the mix. The limited testing I've done manually is very encouraging - I've had a case where it was able to correctly locate a system using only 5 distances with two of them incorrect, though that was probably a lucky combination of good references and the right errors (and wouldn't satisfy my redundancy criteria). I've also got the gamma dump to sort out and hooking up to TGC so not quite sure when I'll actually get to this testing :/
 
OK, I see what you mean. The 346 extra systems I expect are non-market systems in Gamma. We were navigating around and working out coords for systems. I don't think they're irrelevant as they are useful for navigation - travelling between market systems or mining in the non-market/non-station systems. We didn't throw out non-station systems between B1 -> B2 or B2 -> B3 unless we knew they were deleted. Once I get TGC data to compare it shouldn't be too hard to get a list of those extra systems and then use the GM to see if they still exist. At least then we wouldn't have to visit all those systems again to get distances for the coords.

In beta 3 I identified and checked all systems we'd located that weren't on the beta 3 reference list. I'm probably going to do the same with gamma - checking ~400 systems will take a few hours I expect. At the very least I'll mark them as unverified in systems.json. It's not necessary to visit each one; I just checked the distance to each unverified system from two reference stars.
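
In case it helps anyone doing the same check, the two-reference verification boils down to something like this sketch (assuming the in-game distances are read to 2 decimal places, hence the default tolerance):

```python
# Sketch of the "check distance from two references" verification step:
# accept a stored position if its computed distance to each reference matches
# the in-game reading (assumed rounded to 2 dp, so allow half an ulp of slack).
import math

def verify(system_pos, measurements, tolerance=0.005):
    """measurements: list of ((x, y, z) reference position, in-game distance)."""
    return all(abs(math.dist(system_pos, ref) - measured) <= tolerance
               for ref, measured in measurements)
```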
 

In beta 3 I identified and checked all systems we'd located that weren't on the beta 3 reference list. I'm probably going to do the same with gamma - checking ~400 systems will take a few hours I expect. At the very least I'll mark them as unverified in systems.json. It's not necessary to visit each one; I just checked the distance to each unverified system from two reference stars.

I'll wait until your new systems.json is ready before doing any further mapping.
 
@melak47: Nice work!

I've started making a 3D map as well for my routing chart, but decided to focus my efforts on the 2D version instead.

It still uses the old data, but I found a method to label systems based on their importance when zoomed out, in case anyone is interested.
 
Unfortunately the verbose JSON with coords tends to put a bit of overhead on the data transfer, so when I ask for the whole thing my code times out.

Put in a cubefilter - and pull only a bit of the data at a time (50 LY cube or whatever works)
The data is only going to grow - So everyone will need to do so sooner or later :)
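
Just to illustrate the idea (the endpoint URL and filter field names below are placeholders, NOT the actual TGC request format - check the API docs for the real shape): tile the volume into cubes, request them one at a time, and merge the results by name.

```python
# Illustration of pulling the data one cube at a time instead of in a single request.
# The URL and filter field names are placeholders, not the real TGC API.
import json
import urllib.request

API_URL = "http://example.invalid/api/systems"   # placeholder endpoint

def fetch_cube(centre, size):
    payload = json.dumps({"filter": {"cube": {"centre": centre, "size": size}}}).encode()
    req = urllib.request.Request(API_URL, data=payload,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)            # assume: a JSON list of system objects

def fetch_all(extent=250, size=50):
    systems = {}
    rng = range(-extent, extent, size)
    for x in rng:
        for y in rng:
            for z in rng:
                centre = [x + size / 2, y + size / 2, z + size / 2]
                for s in fetch_cube(centre, size):
                    systems[s["name"]] = s    # de-dupe across cube boundaries
    return list(systems.values())
```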
 
Put in a cubefilter - and pull only a bit of the data at a time (50 LY cube or whatever works)
The data is only going to grow - So everyone will need to do so sooner or later :)

Are you planning to include support for other static info about each system, like the stations and services they provide?
 
Yes.

It's delayed a bit due to
1: Procrastinating
2: Can't bloody decide on a format (that I'm happy with - Nothing really "feels" right...)
3: Procrastinating because of 2

So.. It'll get there.. Eventually...
 
2: Can't bloody decide on a format (that I'm happy with - Nothing really "feels" right...)

With potentially more and more info about each system, maybe it would be nice to be able to pass something with the filter that tells the API what info we care about.
Something like ['coord', 'stations', 'celestial'] to tell it we want the coordinates, a list of stations, and a list/tree/??? of celestial bodies.
Or even more detailed, like { 'stations': ['orbiting', 'services'], 'celestial': ['stars', 'planets', 'moons', 'asteroids'] }, if we want to know which body a station is orbiting, what services are offered, etc...
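
Purely illustrative, since the format is exactly what's still being decided above - none of these field names exist in the API - but a combined request payload could look something like:

```python
# Hypothetical request shape: a spatial filter plus a "select" saying which info to return.
# None of these keys are real API fields; just sketching the idea.
import json

request = {
    "filter": {"cube": {"centre": [0, 0, 0], "size": 50}},
    "select": {
        "coord": True,
        "stations": ["orbiting", "services"],
        "celestial": ["stars", "planets", "moons", "asteroids"],
    },
}
print(json.dumps(request, indent=2))
```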
 