TGC should collect all useful data for third-party tools.
And no, for this Beta there is no reason to collect a massive amount of data beyond fine-tuning the tools to be good enough for release.
The real question for FD is whether the galaxy will be redone AFTER Gamma. For example, Gamma could be just another beta where we keep our ships and progress, but where the systems themselves get regenerated.
My understanding is that Gamma is just early release with bug fixing still ongoing. There shouldn't be any changes to system generation after Gamma starts unless some major bug requires it. I'd be willing to take the risk and start gathering data then.
We're aiming not to have to regenerate the galaxy after gamma release. Individual systems might be changed for fixes, but not the system as a whole. It depends on what comes up though.
May I ask a really dumb question? Why are we doing this? For what purpose are we trying to determine the star coordinates?
If it is just for the intellectual exercise and for fun, then fine. But given that FD seem to have shown a willingness to enhance the GM into a major tool (as witnessed by the recent changes), including providing a route-planning capability (albeit embryonic at this stage), what do we need these coordinates for?
Don't want to rain on any parade - but just wondering.
I started writing my app after buying a shedload of items that nearly wiped me out in cash: the first jump was OK, but the second jump was outside my range. I had to do three jumps between system 1 and system 2 before I had burnt enough fuel to make the long jump back to where I wanted to sell the items.
So the goal I'm going for is route planning with fuel consumption, and profit with fuel prices included.
When I have a good set of coordinates, I run a calculation on the system to find nearby systems and their ranges.
It would work without the coordinates, but I wouldn't be able to find the marginal cases where a system is just at the edge of my range when I'm on 50% fuel, shortcutting the original path.
My next task is to send a message to my Pebble watch with the route on it, with up and down buttons so I can step through the route.
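For anyone building something similar, here's a minimal sketch of the fuel-aware range check. The effective_jump_range model below is a pure placeholder (the real in-game relationship between fuel mass and jump range isn't published), and all names are made up for illustration:

```python
import math

def distance(a, b):
    """Straight-line distance in LY between two (x, y, z) positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Placeholder fuel model: a fuller (heavier) tank shortens the jump range.
# Swap in the real formula once it is known.
def effective_jump_range(base_range, fuel_fraction, mass_penalty=0.1):
    return base_range * (1.0 - mass_penalty * fuel_fraction)

def reachable(current, systems, base_range, fuel_fraction):
    """Names of systems within the fuel-dependent jump range of current."""
    r = effective_jump_range(base_range, fuel_fraction)
    return [name for name, pos in systems.items()
            if distance(current, pos) <= r]
```

This is exactly where accurate coordinates matter: the marginal systems sit right at the boundary of that distance comparison.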
To be honest, for me it's mostly just for the fun of it. I know the data is useful for various tools and all that, but I don't really use those tools much.
And apparently I just like to fool around with numbers.
I've done a buttload more testing (more to come) to try to figure out how to proceed - what we need, and what we don't really need, to get "good enough" data.
From earlier: 100K runs, each selecting random reference systems and a p0 (and calculating float32 distances rounded to 2dp).
Running the subset combinations of all ref systems (C(numref, 3) * (numref - 3) tests).
Picking the p0 candidate with the smallest RMS gives the following.
Much better than I initially thought.
And the diminishing returns beyond 6 ref systems are obvious (but honestly, even with 5 ref systems the error rate is IMO surprisingly low).
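For clarity, a minimal sketch of the scoring I'm describing. The exact float32 behaviour is an assumption about how the game computes distances, and the RMS residuals are taken against the raw (unrounded) recomputed distances - which is also why a correct coordinate can never exceed an RMS of about 0.005:

```python
import math
import numpy as np

def f32_distance(a, b):
    """Distance computed in float32, matching (we believe) the game's maths."""
    s = np.float32(0.0)
    for x, y in zip(a, b):
        d = np.float32(x) - np.float32(y)
        s += d * d
    return float(np.sqrt(s))

def rounded_distance(a, b):
    """The same distance rounded to 2dp, as the game displays it."""
    return round(f32_distance(a, b), 2)

def rms_error(candidate, refs, given):
    """RMS of residuals between raw recomputed distances and the given
    2dp distances. For a correct candidate each residual is bounded by
    the 0.005 rounding error."""
    res = [f32_distance(candidate, r) - d for r, d in zip(refs, given)]
    return math.sqrt(sum(e * e for e in res) / len(res))
```

The candidate with the smallest rms_error over all the subset trilaterations is the one that gets picked.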
I wanted to see the relation/correlation between the RMS values and whether we got a correct or incorrect p0 value - "Just because"
Here's a picture of that (not posting all the numbers).
This was done for a 10 million run btw (instead of 100K) with 5 ref systems.
That takes a while... It was a one-off - "Just because".
Click it to be able to read it: View attachment 526
The graph on the right is simply a zoomed image of the one on the left.
A couple (of perhaps obvious) observations:
- If RMS ever gets above 0.0049 (which happened in only 5 out of 10M runs!) then we can be sure we don't have the correct coordinate.
- Unfortunately, an RMS below that value doesn't really tell us anything - except that there's a 96.7% chance it's correct.
-----
I then set out to see if we could improve this a bit.
By doing a cube search around the found p0, using lowest RMS as the criterion for picking the final p0 value.
I only did this for the incorrect values.
So keeping the RMS graphs above in mind, it's definitely possible that a cube search on a p0 that is actually correct could end up returning an incorrect p0 (as RMS is used as the criterion, and not a reverse distance check).
But for the incorrect values the cube search was indeed able to find a few more correct values.
It's not really impressive though...
Here's another picture (can't be arsed to format all those numbers to look nice here).
I did the cube search for various cube radii, 1 to 8.
A radius of 8 means a cube 2*8+1 = 17 grid steps (each 1/32 LY) across - a bit over half a LY on each side - i.e. 17^3 = 4913 coordinates checked.
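For completeness, a sketch of that cube search, reusing rms_error from the earlier sketch (the 1/32 LY grid spacing is as described above; p0 is assumed to already sit on the grid):

```python
import itertools

GRID = 1.0 / 32.0  # coordinates appear to sit on a 1/32 LY grid

def cube_search(p0, refs, given, radius=4):
    """Test every grid point in a cube of the given radius (in grid steps)
    around p0; return the candidate with the lowest RMS."""
    best, best_rms = p0, rms_error(p0, refs, given)
    for di, dj, dk in itertools.product(range(-radius, radius + 1), repeat=3):
        cand = (p0[0] + di * GRID, p0[1] + dj * GRID, p0[2] + dk * GRID)
        rms = rms_error(cand, refs, given)
        if rms < best_rms:
            best, best_rms = cand, rms
    return best
```

With radius=8 that's the 4913-coordinate search described above.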
Observations:
For 5 ref systems the cube search is able to find a candidate with a better RMS value in 53.8% to 68.1% of cases.
Of those better candidates, only 26.5% to 41.7% were actually correct. The rest still didn't match the p0 which we were trying to find.
Also: a cube radius of more than 3 (maybe 4) is nearly pointless.
Which to me indicates that while the initial incorrect value is, well, incorrect, it's not wrong by much (in the overwhelming majority of cases).
So yeah, a cube search does eke out a bit more mileage - but definitely not as much as I expected (coming from the mindset that it ought to be able to nail it, with a radius of 8 for sure...).
It gets "worse" with 6 ref systems.
The cube search improves even less here: of the coords found with a better RMS, only 10.3% to 20.8% are actually correct.
In hindsight that's actually not too surprising (but it's always good to see the numbers confirming it), as the p0 cases we are trying to improve on get more and more "degenerate" as our number of ref systems increases.
The ref systems themselves (via trilateration) are able to nail almost all of them - and the ones they can't are precisely those where the coords/distances are aligned such that it's very hard to do. So the cube search is handed worse and worse "leftovers" from the regular trilateration, and thus improves less and less.
-----------
Next up:
Instead of using RMS to pick the best candidate from the cube search, I'm going to do the reverse distance check instead.
I'm going to record how many coords in the cube pass that reverse distance check (I expect multiple often will, due to only having 2dp).
I'll then use RMS to pick one p0 among just those that pass the reverse distance check, and then check whether that value is indeed correct.
I honestly don't expect to see much of a difference in the results (error percentage), but I'm quite interested in seeing just how many candidates in a cube search will actually pass the reverse distance check.
That will put some kind of number on how impossible (literally) it is for some sets of distances to determine the correct p0.
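A sketch of that reverse distance check, reusing the rounded_distance and rms_error helpers from the earlier sketches:

```python
def passes_reverse_check(candidate, refs, given):
    """A candidate passes only if recomputing and rounding each distance
    reproduces every given 2dp value exactly."""
    return all(rounded_distance(candidate, r) == d
               for r, d in zip(refs, given))

def best_verified(candidates, refs, given):
    """Filter the cube-search candidates by the reverse check, then use
    RMS (on the unrounded residuals) to pick among the survivors."""
    ok = [c for c in candidates if passes_reverse_check(c, refs, given)]
    return min(ok, key=lambda c: rms_error(c, refs, given)) if ok else None
```

If len(ok) regularly comes out greater than 1, that is exactly the "literally impossible" situation being quantified: several grid points reproduce the same set of 2dp distances.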
---
PS: Is anyone actually interested in all these numbers?
I run them for my own sake, to convince myself which variables need tweaking to get the best result.
And I share them here to save someone else the trouble - or to let you catch me making a mistaken conclusion.
But am I wasting my time with that? (It takes a while to make these posts.)
You are CERTAINLY NOT wasting your time with that (posting). Perhaps it gives the other math gods (OK, commanders) some ideas. I totally like all the work you are putting into this. Keep up the good work.
Note: I haven't received a CSV file from Michael Brookes at this point. Some more patience is needed, I'm afraid ;-)
One constraint I can think of is that we should ensure that we get enough distances for a given star that if any one distance is removed, it still results in the same single matching coordinate after searching the candidate region. I think this would ensure there is enough redundancy in the data to detect one incorrect distance. I'm not sure how onerous a requirement that would be though.
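As a sketch, that constraint could be checked mechanically like this. solve() here is a hypothetical stand-in for whatever trilateration-plus-candidate-region search is used, returning the set of matching grid coordinates:

```python
def has_redundancy(refs, given, solve):
    """Leave-one-out check: with any single distance removed, the solver
    must still find exactly one match, and the same one."""
    full = solve(refs, given)
    if len(full) != 1:
        return False
    for i in range(len(refs)):
        r = refs[:i] + refs[i + 1:]
        d = given[:i] + given[i + 1:]
        if solve(r, d) != full:
            return False
    return True
```

TGC could then require this to return True before accepting a distance set for a star.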
The erroneous system is included when calculating the RMS - D'oh.
Disregard the below
This actually happens automatically with the test runs I've been doing (see post above).
Due to running all subset combinations, several of those won't have the erroneous distance included,
and as such will give a better candidate (based on RMS).
And thus, when comparing candidates from all combinations afterwards, one with a high RMS (due to a wrong distance) will not be selected.
In fact, all (possible) combinations of removing distances are run, so it will catch any number of wrong distances.
The possible exception: if the wrong distance is off by just 0.01, it *could* happen that it gives a better RMS value (that would be rare though) - but in that case the candidate would be really good anyhow, and a subsequent cube search could improve it (and perhaps even identify the wrong distance(s), situation permitting).
It does point out where one could insert that particular test though.
Calculate the RMS for each combination of ref systems (those not involved in the trilateration calculation) and see if any abnormal values pop out, pointing at one or more distances being wrong.
This gets computationally very expensive if the number of ref systems is high though...
(running all possible combinations...)
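A sketch of where that test could slot in. trilaterate() is a hypothetical solver returning a p0 from a 3-subset of refs/distances, and f32_distance is the helper from the earlier sketch; each distance accumulates its residual whenever it is held out, so a consistently wrong distance collects a conspicuously large total:

```python
import itertools

def suspect_distances(refs, given, k=3):
    """Heuristic: tally each distance's absolute residual over all the
    subset combinations in which it is held out of the trilateration."""
    n = len(refs)
    totals = [0.0] * n
    for subset in itertools.combinations(range(n), k):
        p0 = trilaterate([refs[i] for i in subset],
                         [given[i] for i in subset])
        for i in set(range(n)) - set(subset):
            totals[i] += abs(f32_distance(p0, refs[i]) - given[i])
    return totals  # an outlier total points at a wrong distance
```

As noted, this is C(n, k) solver runs, so it blows up quickly as the number of ref systems grows.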
There are a number of fairly obvious algorithmic tricks you could use to improve that:
1) Pre-calculate all of the possible routes between pairs of populated systems, and store them all somewhere. Or just store the route costs, whichever you prefer. This means that you just do a lookup when you need to know such a value.
2) Use Dijkstra's pathfinding algorithm to calculate *all* paths to *all* populated systems from each given starting point, rather than using A* to calculate one path at a time. This is more efficient in aggregate for completing point 1 above (see the sketch after this list).
3) Pre-calculate the fuel-efficient link set (using the simple algorithm in my Python scripts) before starting the pathfinding. This makes pathfinding easier, and therefore quicker.
4) Use a min-heap instead of explicitly sorting lists by distance. This is something I *don't* do in my Python scripts, but I would if I rewrote them in a more efficient language (where manually manipulating data can be faster than just calling a generic sort function).
5) If you're searching for trades from a given starting point, you can (as an alternative to pre-calculating all routes) use Dijkstra to define the order in which you evaluate candidate stations. This will reveal the closer stations' potential more quickly, giving the user the solution they want sooner (even though the code may still be working).
6) Further to point 5, you can extend the idea by performing one pathfinding step for each possible starting point in turn, keeping the data structures in place. This provides useful solutions sooner when doing a global trade-route search.
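To make points 2 and 4 concrete, here's a minimal single-source Dijkstra using a min-heap. It's a generic sketch, not taken from the Python scripts mentioned above: neighbours(sys) is a hypothetical callback yielding (next_system, edge_cost) pairs - e.g. over the pre-computed fuel-efficient link set from point 3 - and system identifiers are assumed to be strings so heap ties compare cleanly:

```python
import heapq

def dijkstra_all(start, neighbours):
    """Cheapest known cost from start to every reachable system."""
    dist = {start: 0.0}
    heap = [(0.0, start)]  # the min-heap replaces explicit list sorting
    while heap:
        d, sys = heapq.heappop(heap)
        if d > dist.get(sys, float("inf")):
            continue  # stale heap entry; a cheaper path was already found
        for nxt, cost in neighbours(sys):
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist
```

Popping in cost order is also what enables point 5: the first time a station comes off the heap, its cheapest cost is already final.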
Still, it's even more efficient to just bypass Dijkstra, find good candidates, and then run Dijkstra for the details. It's NP-hard to solve - effectively a "find all routes" problem - and whether brute-forcing 6 stations or 50, it was beginning to crack with B2. Currently I'm ruling out routes as early as possible by first finding the good 2-station routes, then just dropping anything that falls under a set threshold based on that average. Finding the best 2-station routes is trivial and will scale nicely with increasing numbers; 3-station, 4-station and longer routes will not. Dijkstra really only matters when an accurate time estimate is needed.
That said, I don't expect to find any gems in that data. B2 had the good trades pretty close together, and in the cases where a significant profit difference could be achieved, the time it took to complete the route offset the profit down to biowaste levels.
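A sketch of that pruning step, under assumptions: profit(a, b) is a hypothetical one-leg profit estimate, and the threshold is expressed as a fraction of the best pair found:

```python
import itertools

def two_station_seeds(stations, profit, keep_fraction=0.8):
    """Score every station pair on round-trip profit, then keep only the
    pairs above keep_fraction of the best; longer (3-, 4-station) routes
    are built by extending these survivors only."""
    pairs = [((a, b), profit(a, b) + profit(b, a))
             for a, b in itertools.combinations(stations, 2)]
    best = max(p for _, p in pairs)
    return [pair for pair, p in pairs if p >= keep_fraction * best]
```

The point of the cheap pairwise pass is to shrink the search space before the combinatorial explosion of longer routes kicks in.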
Good news: received a partial SB3 list from Michael Brookes.
Just received a PM from Michael Brookes. I haven't CSV-ed it yet; I wanted to share it directly. Shown in the spoiler tag. It contains 755 coordinates. As far as I can see they are in the new 2dp format.
Of course I share it here, you guys are doing the hard work ;-)
Double hail to Michael, as he has sent the list again. It was good that I PM-ed him again this morning. He had already PM-ed the list to a cmdr and expected it would be shared. That commander seems to have forgotten to post it ;-( At least I haven't seen anything.
Gonna send a PM to double-check if the following is still true for the partial SB3 list. Yesterday I made a mistake about the list, and he posted the following here:
Michael Brookes: "That's not correct - the list was systems that have had economies attached to them."
I've run some tests with my calculator/verifier on an oldish systems.json (with 551 systems), and I believe the 2-digit distances will not prevent finding exact system coordinates. Unfortunately it uses a huge amount of memory: memory use is proportional to the square of the max distance value used, and reducing the precision seemingly increased the memory use by a factor of 4-8 or so. It is also very slow. Therefore it is not a very practical method, but I think it shows that distances with just 2 fractional digits can still be useful.
Don't shoot the messenger ;-)
I double-checked the PM, and indeed the entries you mentioned have no star system name.
About converting: I didn't have time to check the list, but do you mean we have to convert them so that Sol is again the centre?
Should you be able to do that, could you please post the converted list in perhaps TD CSV or TOR format here?
I've just produced (but not yet uploaded) maps based on that coordinate set. I first had to convert it from an ill-formed space-separated file into a CSV by extracting the *last* three fields and treating the remainder as the system name.
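In case anyone wants to repeat the conversion, here's roughly what that looks like (a sketch, not my exact script; the optional sol offset shows how the list could be re-centred on Sol, assuming Sol's coordinates in the current origin are known):

```python
import csv

def convert(infile, outfile, sol=(0.0, 0.0, 0.0)):
    """Last three whitespace-separated fields are x, y, z; everything
    before them is the (possibly multi-word) system name. Subtracting
    sol re-centres the coordinates if a non-zero offset is supplied."""
    with open(infile) as src, open(outfile, "w", newline="") as dst:
        out = csv.writer(dst)
        out.writerow(["name", "x", "y", "z"])
        for line in src:
            fields = line.split()
            if len(fields) < 4:
                continue  # skip blank or malformed lines
            name = " ".join(fields[:-3])
            x, y, z = (float(f) - s for f, s in zip(fields[-3:], sol))
            out.writerow([name, x, y, z])
```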
Jata and Chara are rather a long way from the main pill; it's extremely unlikely that we'll be able to visit them, but we can use them as reliable external references.
There is a system called "Test2", which probably isn't meant to be there.
Ignoring Jata and Chara, the remaining systems can all be reached with a 22.35 ly jump range. As the intermediate systems are discovered, that will probably come down.
I'm happy to use whatever coordinate system ED does - it honestly doesn't matter where the centre is.