Discussion: What is the most efficient way to crowdsource the 3D system coordinates?

And I just had an idea, which may or may not have been proposed earlier.

Snuble/Biteketkergetek have already proposed a method of extracting coordinates from the galaxy map (if I understand correctly, it involves some graphics processing and/or OCR), but what about this:

- Set the cursor at a known star
- Set the zoom at a certain value
- Use joystick controls programmatically to move the cursor
- The speed at which the cursor moves can be used to determine distance, and hence position
- A further indication is that the stars above and below the horizontal (XZ?) plane have a visual indicator

One snag is that the cursor will "snap" to star positions, and also I do not know how precise it would be (although 1/32 ly grid and all...)

I will mull this over... but an API would be best :D
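
For what it's worth, a minimal dead-reckoning sketch of the idea (Python). Everything in it is an assumption for illustration: the calibrated cursor speed for one fixed zoom level, and the snapping helper that rounds to the 1/32 ly grid mentioned above.

```python
CURSOR_SPEED_LY_PER_S = 4.0  # assumed calibration; would need measuring in-game

def snap_to_grid(v: float) -> float:
    """Snap a coordinate to the 1/32 ly grid the map appears to use."""
    return round(v * 32) / 32

def dead_reckon(start, hold_times):
    """start: (x, y, z) of a known star; hold_times: signed seconds per axis."""
    return tuple(snap_to_grid(s + t * CURSOR_SPEED_LY_PER_S)
                 for s, t in zip(start, hold_times))

# Holding +X for 2.5 s and -Z for 1.25 s from a star at the origin:
print(dead_reckon((0.0, 0.0, 0.0), (2.5, 0.0, -1.25)))  # (10.0, 0.0, -5.0)
```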
 
I've got a report of a bad star coordinate for either Agastani or Delkar.

In order to check the math on them ... I need the math.

What we need to store and disseminate is the raw multilateration data, not cooked 3D coordinates.

I've read a lot of posts from people who don't understand enough of how it works to do it correctly. I'm not sure I do.

If a mistake is/was made in the multilateration calculations and all we have is 3D coordinates, then we're sunk. The whole dataset is now in question.

If we have the raw data then we could re-run algorithms and recreate the dataset.
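
For reference, here is a minimal multilateration sketch (Python/NumPy) showing the usual linearization: subtract one sphere equation from the rest and solve the resulting linear system by least squares. It illustrates the general technique, not the exact algorithm any of the tools in this thread use.

```python
import numpy as np

def multilaterate(refs, dists):
    """Estimate a 3D position from reference positions and measured distances.

    Each |x - p_i|^2 = d_i^2 is a sphere; subtracting the first equation
    cancels |x|^2 and leaves the linear system 2(p_i - p_0) . x = b_i,
    which needs at least four non-coplanar references to pin down x.
    """
    p = np.asarray(refs, dtype=float)
    d = np.asarray(dists, dtype=float)
    A = 2.0 * (p[1:] - p[0])
    b = (np.sum(p[1:] ** 2, axis=1) - np.sum(p[0] ** 2)
         - d[1:] ** 2 + d[0] ** 2)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.round(x * 32) / 32  # ED coordinates appear to sit on a 1/32 ly grid

# Self-check with a made-up target and exact distances:
refs = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10), (10, 10, 10)]
target = np.array([22.0, -18.0, -8.0])
dists = [float(np.linalg.norm(target - r)) for r in np.asarray(refs, float)]
print(multilaterate(refs, dists))  # -> [ 22. -18.  -8.]
```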
 
> I've got a report of a bad star coordinate for either Agastani or Delkar.
>
> In order to check the math on them ... I need the math.
>
> What we need to store and disseminate is the raw multilateration data, not cooked 3D coordinates.
>
> I've read a lot of posts from people who don't understand enough of how it works to do it correctly. I'm not sure I do.
>
> If a mistake is/was made in the multilateration calculations and all we have is 3D coordinates, then we're sunk. The whole dataset is now in question.
>
> If we have the raw data then we could re-run algorithms and recreate the dataset.

Don't worry too much about it; some systems will move around as part of the Stellar Forge thingy. The best thing to do is resubmit distances for that system, and if it's an FD-submitted system we might have to wait for the project owner to come back online.

Agastani is in the wrong place; no distances have been submitted to EDSC as far as I can see, and it's an FD-submitted system. If we're lucky the system will recalculate the coords once someone visits the place and starts submitting distances for it. As far as I can see no one has submitted market data for the place, so it looks like it is a bit "outside" where "everyone" is. Approx. coords are 22, -18, -8.

Delkar seems to have the correct coordinates when I look at it in game.
 
> Well, seemingly all 400 billion systems have at least their position already determined, since you can browse them in the galaxy map. Having the ability to query their position should be trivial, be it for the game client or via a web API.
>
> EDIT: Or if not all 400 billion, at least those that have been generated by player presence.

I'm sure the systems in the galaxy map are calculated on demand based on the visible area. It can't be using a lookup table or similar database as it's simply too much data (just for the coordinates in single-precision floating point or integers you'd need 12 × 400,000,000,000 bytes ≈ 4.8 TB).

There's no evidence that visiting a system has any lasting effect other than the exploration status (which appears to be stored separately in one of the transaction servers), so it can simply regenerate systems from scratch each time they are required. Although the procedural generation is essentially a matter of generating a bunch of random numbers, the seed for the pseudorandom generator is fixed so the result is deterministic.

AFAICT it's fundamentally the same as how it worked in FE2 and FFE: the galaxy is divided into sectors of some size and a density function is defined to specify the star density in each sector. In the previous games this was a simple bitmap image. The sector coordinates and density are the seeds for the procedural generation system, which spits out a list of system names and coordinates (and probably the other data you get on the info tab). In ED it sounds like the density function specifies available mass and the procedural generation system then coalesces that mass into systems. Once it has the basic details of a system it can procedurally generate the contents when required. For the real-life systems there will just be a big list - those will be indexed by sector and simply looked up as required. Michael mentioned somewhere that there are about 144,000 of those objects.
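
A toy sketch of that sector-seeding idea (Python); the spatial-hash seed, the count formula and the name format are all invented for illustration, not FD's actual scheme:

```python
import random

def systems_in_sector(sx, sy, sz, density):
    """Deterministic toy generator: the same sector always yields the
    same systems because the seed depends only on the sector coords."""
    seed = (sx * 73856093) ^ (sy * 19349663) ^ (sz * 83492791)  # spatial hash
    rng = random.Random(seed)
    count = rng.randint(0, max(0, int(density * 10)))  # density drives star count
    return [(f"Sector {sx},{sy},{sz} #{i}",
             (sx + rng.random(), sy + rng.random(), sz + rng.random()))
            for i in range(count)]

# Re-running with the same inputs reproduces identical names and coordinates:
assert systems_in_sector(3, -1, 7, 0.8) == systems_in_sector(3, -1, 7, 0.8)
```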

What this means for the API is that there will be limits on how we can specify the systems we want. It's easy to retrieve a system by name or generate all systems in a specified volume, but anything else would require generating each system to see if it qualifies. I.e. the only way to get a list of all systems containing black holes would be to generate every system in the galaxy and check them. That's not practical from a performance standpoint. This is why Frontier have never been able to give us an exact count of systems in the galaxy. Even relatively small volumes could be problematic in terms of system generation and the volume of data required. I estimate a 1000 Ly radius sphere might contain 2-5 million systems (based on there being somewhere between 20,000 and 50,000 systems within 200 Ly of Sol).
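
The 2-5 million figure is consistent with simple cube-law scaling of the 200 Ly counts (the real number would be somewhat lower because density thins out above and below the galactic plane):

```python
# Scale the 200 Ly counts to a 1000 Ly radius: volume grows as r^3, so x125.
for n_200ly in (20_000, 50_000):
    print(f"{n_200ly:>6} within 200 Ly -> ~{n_200ly * (1000 / 200) ** 3:,.0f} within 1000 Ly")
```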
 
> I've got a report of a bad star coordinate for either Agastani or Delkar.
>
> In order to check the math on them ... I need the math.
>
> What we need to store and disseminate is the raw multilateration data, not cooked 3D coordinates.
>
> I've read a lot of posts from people who don't understand enough of how it works to do it correctly. I'm not sure I do.
>
> If a mistake is/was made in the multilateration calculations and all we have is 3D coordinates, then we're sunk. The whole dataset is now in question.
>
> If we have the raw data then we could re-run algorithms and recreate the dataset.

The raw data is available. You can get it from TGC using the GetDistances API, or download my cached copy from https://github.com/SteveHodge/ed-systems/raw/master/tgcdistances.json (it's about 8 MB).
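
A minimal fetch-and-inspect sketch (Python) for that cached file. I'm not assuming anything about the JSON schema beyond it being valid JSON (it matches a TGC GetDistances response), so the script just reports the top-level shape rather than hard-coding field names:

```python
import json
import urllib.request

URL = "https://github.com/SteveHodge/ed-systems/raw/master/tgcdistances.json"

# ~8 MB download; inspect the parsed structure before relying on field names.
with urllib.request.urlopen(URL) as resp:
    data = json.load(resp)

print(type(data).__name__)
print(list(data)[:5] if isinstance(data, dict) else data[:1])
```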

Most of the submitted systems reference systems whose coordinates were supplied by FD; there aren't many systems that rely on previous results yet. Unfortunately we know the FD data is not 100% reliable, and Agastani and Delkar are both FD systems.

I've actually been looking at the data quality in TGC myself lately and there is a lot of bad data: misspelled names, typos in distances, test systems, and even a few cases where distances for two systems have been combined under one name. Plus my trilateration algorithm generates coordinates for about 200 systems that TGC can't locate. Since TornSoul has gone AWOL, my plan is to correct these issues in the cached data files I've been uploading to GitHub, so anyone who wants the fixes can load those files as if they were a response from TGC (they're in exactly the same format) and then contact TGC for any new data since. I should get the corrected data uploaded in a couple of days.
 
> I'm sure the systems in the galaxy map are calculated on demand based on the visible area. It can't be using a lookup table or similar database as it's simply too much data (just for the coordinates in single-precision floating point or integers you'd need 12 × 400,000,000,000 bytes ≈ 4.8 TB).

TBH, in the corporate world that is not very large at all. :)


> There's no evidence that visiting a system has any lasting effect other than the exploration status (which appears to be stored separately in one of the transaction servers), so it can simply regenerate systems from scratch each time they are required. Although the procedural generation is essentially a matter of generating a bunch of random numbers, the seed for the pseudorandom generator is fixed so the result is deterministic.

You may be correct: they might just send the seed to the client for it to generate, updating the display for visited systems. Any player exploration is really only seen at the system map level or info panel, which can be pulled on demand. The client would have the algorithm for calculating precession as well. On the other hand, I'm still not excluding a lookup table with IDs and positions, since it's not really that big, and precession would be batch calculated.


> What this means for the API is that there will be limits on how we can specify the systems we want. It's easy to retrieve a system by name or generate all systems in a specified volume, but anything else would require generating each system to see if it qualifies. I.e. the only way to get a list of all systems containing black holes would be to generate every system in the galaxy and check them. That's not practical from a performance standpoint. This is why Frontier have never been able to give us an exact count of systems in the galaxy. Even relatively small volumes could be problematic in terms of system generation and the volume of data required. I estimate a 1000 Ly radius sphere might contain 2-5 million systems (based on there being somewhere between 20,000 and 50,000 systems within 200 Ly of Sol).

Well, I don't totally disagree. However, since you can browse the galmap, the capability of determining the position and basic information of each star system is clearly available in the client, be it by pseudorandom generation or query. Hopefully the API will let us query at least that, since position is required for the routing apps. As for specific system information (e.g. the bodies inside), it may be available via the API for systems you have either visited or purchased.

We shall see. :)
 
> TBH, in the corporate world that is not very large at all. :)

> You may be correct: they might just send the seed to the client for it to generate, updating the display for visited systems. Any player exploration is really only seen at the system map level or info panel, which can be pulled on demand. The client would have the algorithm for calculating precession as well. On the other hand, I'm still not excluding a lookup table with IDs and positions, since it's not really that big, and precession would be batch calculated.

Yes, I'm aware that 4.8 TB is not that much to store, though note that's only the coordinates. You can do the same calculation for what would be required to store names, star types and numbers. It's certainly over 10 TB.

But storing that data is not the issue; the issue is transmitting it. This is a problem both in terms of what they can send to the client and in terms of what the API can reasonably do, even if it is a client-side API. It's obvious that they are not sending that volume of data to the client - you'd notice it in your data usage if they were, and the galaxy map responds too quickly for it to be sent on demand. So it must either be looked up or generated in the client, as you've also concluded. Again, it's obviously not stored as part of the client: the installation is less than 4 GB, so they'd have to have invented a truly magical compression algorithm to fit 10+ TB of galaxy data into that (though it can be argued that procedural generation is a form of compression). But the point is that however the client has the data, it still cannot supply a third-party app with 10+ TB of data; that is clearly not practical.

You mentioned precession. I assume you mean the change of position of systems within the galaxy over time? Do you have any evidence that is actually implemented? My understanding is that the system positions are static. In fact I think I've seen Michael say that is the case, though I don't have a reference.

> Well, I don't totally disagree. However, since you can browse the galmap, the capability of determining the position and basic information of each star system is clearly available in the client, be it by pseudorandom generation or query. Hopefully the API will let us query at least that, since position is required for the routing apps. As for specific system information (e.g. the bodies inside), it may be available via the API for systems you have either visited or purchased.

It doesn't matter whether the galaxy is generated server side or client side, the API will not be able to provide that volume of data. So there will clearly be limitations in how that data can be queried. That is my point.

> In 2001 we had 7.5 TB across 132 SCSI drives. I now have more in my 6-bay enclosure at home.

I have about 6 TB of storage at home as well, but you've also missed the point: it's not storing that amount of data that is the problem, it's communicating it. How long would it take to simply read all 7.5 TB off your server?
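
To put numbers on that rhetorical question (the rates are illustrative assumptions, not measurements):

```python
# Time to move 7.5 TB at a few illustrative transfer rates.
SIZE_BYTES = 7.5e12
for label, bytes_per_s in [("100 Mbit/s link", 100e6 / 8),
                           ("1 Gbit/s link", 1e9 / 8),
                           ("500 MB/s disk array", 500e6)]:
    print(f"{label}: {SIZE_BYTES / bytes_per_s / 3600:.1f} hours")
```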
 
I think the discussion is getting derailed; no one is arguing that transmitting 10 TB over a DSL line is feasible. The exact mechanics are not really relevant, and only FD really knows.

> It doesn't matter whether the galaxy is generated server side or client side, the API will not be able to provide that volume of data. So there will clearly be limitations in how that data can be queried. That is my point.

The question is: will the API be able to provide stellar information? You are saying two things here: that the API can't give all the stellar data at once, and that it can give some data in a limited fashion. Again, not disagreeing.

The point of my galmap example is that the client is already giving you some of that limited data. When you browse the map, you can see the stars in the area you are viewing, whatever the method used by FD. You can also plan routes up to 100 Ly, and soon up to 1000 Ly, which means the coordinates are there or you are pulling them. In either case the star coordinates are available, along with some basic info such as type.

Maybe you are thinking about the API being strictly web-based vs client-based? I still think it should be both. You are already pulling your exploration data from the server, so that it is overlaid on the galaxy map when you are viewing it. Why couldn't you access that info?

Anyway, not sure where the discussion is going; we'll see what will be provided.
 
> The question is: will the API be able to provide stellar information? You are saying two things here: that the API can't give all the stellar data at once, and that it can give some data in a limited fashion. Again, not disagreeing.

Yes, that's what I've been saying. Quoting from my original message on this: "I expect it will provide coordinates but it will have to be limited in some way ... I guess it'll either be limited to systems within a certain distance of your ship or it'll return coordinates within a requested volume (as TGC can do)."

> The point of my galmap example is that the client is already giving you some of that limited data. When you browse the map, you can see the stars in the area you are viewing, whatever the method used by FD. You can also plan routes up to 100 Ly, and soon up to 1000 Ly, which means the coordinates are there or you are pulling them. In either case the star coordinates are available, along with some basic info such as type.
>
> Maybe you are thinking about the API being strictly web-based vs client-based? I still think it should be both. You are already pulling your exploration data from the server, so that it is overlaid on the galaxy map when you are viewing it. Why couldn't you access that info?
>
> Anyway, not sure where the discussion is going; we'll see what will be provided.

No, I'm not saying the API will be strictly web-based or client-based; I agree they'll probably need both. What I am saying is that you won't be able to say "give me all the systems in the galaxy". It must be limited in some way ("give me all the systems within X Ly of the ship", or "give me all the systems within X Ly of system Y"). I suspect the volume limit might be quite low: as I said, I estimate a 1000 Ly sphere would contain millions of systems, perhaps as many as 5 million. That would already be enough data to challenge practicality (it'd be more than 50 MB).
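
As a rough check on that 50 MB figure (the per-record sizes are assumptions):

```python
# 5 million systems: 12 bytes of single-precision coordinates each is 60 MB
# before you add names or any JSON/XML framing, which easily multiplies that.
N = 5_000_000
print(N * 12 / 1e6, "MB, coordinates only")
print(N * (12 + 20) / 1e6, "MB, with ~20-byte names")
```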

So getting back to my original point, depending on exactly how the API is limited, there may still be a place for a collection service for coordinate data. Specifically if the API only provides data on systems within a certain range of the ship.
 
And that's why I stated that I think it would be sensible of FD to use their own API to pre-generate a compressed file of the static information (so not controlling factions or anything else that can change due to player action/story) for the inhabited volume plus some buffer. Otherwise they'll just get many of 'us' each making all the requests to assemble that data ourselves. It's one very obvious use case of the API.
 
Here is one error in the dataset:

Hyades Sector WF-M b8-3 is registered at 39.65625, 21.40625, -97.09375, submitted by a commander on the 16th of January. This is, however, an alias for Gatonese. The correct name at those coordinates is Hyades Sector XF-M b8-3. It's a simple typo, but it highlights why some sort of trusted admins are needed for the dataset.
 
> Here is one error in the dataset:
>
> Hyades Sector WF-M b8-3 is registered at 39.65625, 21.40625, -97.09375, submitted by a commander on the 16th of January. This is, however, an alias for Gatonese. The correct name at those coordinates is Hyades Sector XF-M b8-3. It's a simple typo, but it highlights why some sort of trusted admins are needed for the dataset.

I'm working on a corrected version of the data; I should hopefully have something in the next day or two. I completely agree it needs administration. I think TornSoul was hoping it would be self-correcting if people submitted the same systems/distances, but that doesn't really work: you can log enough correct distances to get good coordinates despite bad distance data, but there is no way to correct a typo in a name.

I was really surprised at how much bad data there is (though it's still only a small fraction of the total). One of the worst cases I've seen is LHS 1375. It has over 120 distances recorded, but about half of them are actually for LHS 1326, which is a couple of Ly away. There are a few other cases where it looks like distances for two different systems have been recorded under one system. There are also plenty of typos (in names and distances), distances with zero or one dp (probably taken from the nav panel), test systems people have (presumably accidentally) loaded into the live system (e.g. '1', '11', 'a', 'aa')... it's going to take a bit of work to clean up.
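
One way to surface cases like LHS 1375 mechanically is to check each submitted distance against the trilaterated position and flag large residuals. A sketch (the 0.05 ly tolerance is an assumption based on distances being shown to 2 dp):

```python
import numpy as np

def flag_bad_distances(est_pos, refs, dists, tol=0.05):
    """Return submissions whose distance disagrees with the estimated position.

    A cluster of large residuals that all fit some *other* nearby position is
    the signature of two systems' data merged under one name.
    """
    est = np.asarray(est_pos, dtype=float)
    flagged = []
    for p, d in zip(np.asarray(refs, dtype=float), dists):
        residual = abs(float(np.linalg.norm(est - p)) - d)
        if residual > tol:
            flagged.append((tuple(p), d, round(residual, 3)))
    return flagged
```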
 
Another error: 'Rathamas' is not at 50.59375 | -147.28125 | 46.59375 as per the FD data dump in gamma; instead it's somewhere near 46.5 | -153.5 | 50. I've deleted it from my DB.
 
Interesting tidbits from Michael on how the galaxy is rendered in the galmap vs exploration data:

> 2. Can we get a filter on the galaxy map that would show visited / explored systems? I think this is very important for explorers, especially when you see a lot of systems with generic names, names which are difficult to remember.
>
> 2. Probably not, that data doesn't exist on that level of the map and it would slow the map down to retrieve that data.
>
> At the top level the galaxy map doesn't need to connect to the server.
 
> Interesting tidbits from Michael on how the galaxy is rendered in the galmap vs exploration data:

Ninja'd ;)

So the galaxy is definitely generated on the client rather than communicated from the server. That implies the system names/coordinates can only change with updated client executables - I had a plan to keep a "last seen date" for each system in TGC, but I think "last seen version" will be better.

For the API it kind of confirms what we were thinking: having both client-side and server-side API will probably be necessary. I think some people are going to be disappointed about the limitations the API is likely to have.
 

wolverine2710 (Tutorial & Guide Writer):
I also think the API will be limited. But hopefully it will contain enough to make our mapping easier.

I have to admit I'm not that confident that the FD API will be extensive; hopefully it's not going the DDF route. That said, commanders have spoken about their wishes/use cases, and hopefully we'll see quite a large portion of that back in the FD API at some point. Coordinates and prices are IMHO the bare minimum of what the API should provide. EDDN and others can then distribute the data.
 