Discussion What is the most efficient way to crowdsource the 3D system coordinates

wolverine2710 · Oct 26, 2014

Snuble said:
Quick and dirty as promised, without any attempt at formating, http://travel-eliteadvisor.rhcloud.com/index.php

I think this offers a structured and efficient way to traverse the current universe. It also opens itself for basic sharing of the space between participants. Also it can be a good way to visit all current systems. Start at bottom slice, snake your way down the table, then go up one slice and continue back.

Basically, I'll open a new thread where you "tell where you go", and then post a short after action report.
Keep the tools you need close at hand. A router planner to save fuel. Pen and paper to note down unknown systems. Make sure you get all the new systems before you leave the box. Be systematic.

There will be some delays in the system. For the boxes to update, I'm relying on other to share their data, and then generate the new set of boxes based on that data. The only other source of new data will be my own exploring.

Thanks Snuble, nice work. Looks quite and functional. Have briefly checked it and need to put it on the rack later on. What I like is the extra info like stations/platform etc. As it is stored at your site it can be made available in some form/format later on.

wolverine2710 · Oct 26, 2014

I had my little nephew over this weekend and he of course wanted/demanded attention ;-)

I've looked at the posts about TGC and will reread them carefully. I'm in the process of writing a LENGTHY post in which I make a SUGGESTION of how we COULD proceed taking into account what has been written. Like I said before its crucial we get this puppy automated as soon as possible. We've all seen how much of a struggle it was to get all data combined in The Reference format. Now with a much bigger bubble this becomes even more important.

wolverine2710 · Oct 26, 2014

After the discussion about automating the process I want to present my SUGGESTIONS of how to proceed with it. As I'm not "The Great Dictator" but more of a "The Humble Coordinator" (Yes there is something fishy about the acronym) please lets discuss this further and see if we can work it out so work on TGC can start.

I don't see The Great Collector as something which will exist for ever or will hold all information about ED. I'm sure better tools/platform will emerge. My intention is to have it set up quickly so SB3 crowd sourcing becomes easier to coordinate and with limited functionality (as in data stored in it).

You will notice that my suggestions are heavily based on the ones posted by RedWizzard. Please reread his proposal.

Data in TGC
Lets in the first incarnation store ONLY coordinates and distances for star systems in it. Other info like distance to stations/platform and anarchy,federation is great, very useful but lets first do coordinate and distances. Storing other info in TGC would mean the input format has to change quite a bit. If I have the choice between simple and fast (to implement) and multi purpose and more time consuming to implement. That does NOT mean that when setting up TGC we should not think about it - see internal format. Also commanders like KevinMillican with edstarcoordinator already are doing stuff like that AND he has an export function.

Important rule: Consider uploaded coordinates and distances incorrect UNTIL proven to be correct.
Verifying coordinates and distances. A number of tools have various ways of checking if entered distances are correct. Other (newer) tools might not. Hence coordinates/distances uploaded to TGC can be wrong. Coordinates and distances should be flagged 'unverified' and double checked, after confirming they are OK they get the flag "verified".

Checking coordinates.
1) Biteketkergetek has a verifier to verify if coordinates are 1/32 LY grid compliant. It is written in GO. The algorithm could be rewritten in a language which can ran on the site of the hoster. Another possibility: Biteketkergetek has his own site. He could turn the verifier in a webservice which can called by TGC.
2) As suggested by RedWizzard. A commander or normal volunteer can verify coordinates by using the Galaxymap (see point 4 of RW). TGC can have a webpage where unverified coordinates are shown and normal volunteer can then enter disances there.
3) TGC should have a Trilateration inside. This can calculate the coords and check with the uploaded ones. In fact as RW suggested TGC should be able to work with ONLY distances being supplied.

Checking distances
Distances uploaded should be untrusted by default. They need to be verified by normal users. A webpage can be used for this. If two or more normal volunteers enter the same distance the distance is promoted to verified.

Input format TGC
I suggest using a JSON structure - without a scheme. Concerning one or multiple JSON formats for that. As suggested by I think TornSouls it best to have ONE input format BUT I don't want to force that and alienate commanders who have a tool now. Iirc atm there are 4 tools for a normal volunteer to enter distances: Spreadsheet of TunaMage, website of Harbinger, website of TornSouls and the pages (tool) which run locally in a web browser by RedWizzard. The last three generate JSON. If Harbinger and TornSouls can change their output to the reference format of RedWizzard that would be great. In that case his format would need to be enhanced with entries like: Author, name_of_tool_ version_number_of_tool, url_of_tool etc. TGC in that case can run on ONE port. If the authors feel it would removed certain functionality from their tools and limit them we can have TGC be running on multiple ports. One for each tool.

Internal format
In the first incarnation of TGC this could be the input reference format. For things like flagging unverified and verified coordinates and distances it has to be extended. Better would be to store it in an open source database like MySql (often used), Postgress or even sqllite. This would mean a convertor to put the JSON in the database. If the database can be set up in such a way that it is portable between (open source) databases that would be a plus.

Output format TGC
The reference format of RedWizzard aka TOR - The One Reference. This can be the same as the input format IF coordinates are uploaded also.
TD CSV format compatible with Trade Dangerous (TD).
Simple CSV. Compatible with what wtbw used. Link here.
XML. Optional. Of course a schema for this has to be constructed. Some commanders prefer XML and we should not exclude them.

How to proceed - hosting wise.
Determine where TGC can be hosted - as in which commander.
Determine which languages and databases can be used at that site.
Determine who is able/willing to write software for TGC.
That can of course be multiple commanders. If they have access to the site it would be easier.

wolverine2710 · Oct 26, 2014

RedWizzard said:
Sounds good. Here's how I think the webservice (TGC) should work at a slightly higher level of detail (all of this IMHO, of course):
1. The main input should be distances between two systems. At least initially our frontend tools should only submit distances that are known to locate systems without error, i.e. at least 5 distances from the unknown system to reference stars with coordinates provided by FD that together generate correct coordinates (correct coordinates beings ones that generate the same distances to 3 dp using the single precision calculation).

That would be ideal. Not all tools do that atm and we should not force them to. All uploaded distances (and coordinates) should be treated as unverified/untrusted by default till they have been verified. Can be a normal volunteer. This can be a webpage where all unverified distances and coordinates are shown.

RedWizzard said:
I do think it is worth collecting any distance data people are willing to submit (such as insufficient numbers of distances and distances to non-reference systems), but let's focus on what we need to actually locate systems first.

Agree. Not all tools operate on 3D coordinates. Chromatix for example uses distances but as he is missing data for 60 star systems he is now using the coordinates.

RedWizzard said:
2. Submitting coordinates with the distances should be optional, and if they are submitted they should be checked by the TGC. Since it can check coordinates it should. We're all getting the same results so it doesn't really matter whose algorithm is used by the TGC, the implementor can choose.

Agree.

RedWizzard said:
3. Input format should be a JSON block with the same structure as the output. It seems like the simplest structured format to use. XML, for example, would be overkill.

Agree.

RedWizzard said:
4. TGC should store two lists of systems: verified systems and unverified systems. At least initially, I think at least one of us should be confirming that a star exists before it is considered verified. This would simply involve opening the galaxy map, searching for the unverified system, and checking that the distance to it (from wherever you are right now) is consistent with the calculated coordinates. This process should give us a high level of confidence that the verified systems are accurate. How the TGC actually stores the data (DB/files/whatever) is fairly irrelevant and whoever implements it can choose.

Not 100% sure. But didn't you check all coordinates and distances in TOR (The One Reference)? In a later stage this verification should be done by a normal volunteer. Show on a webpage the unverified system. Let a user enter a distance from current system to the unverified system. Then calculate distance and compare. If they match the system can be called verified and removed from the unverified list. Instead of one verification it could be made so that it is removed when say 2 or 3 times the distance is verified by a normal volunteer.

RedWizzard said:
5. Multiple output datasets should be offered: verified systems with coordinates, unverified systems with coordinates, verified systems with coordinates and distances, unverified systems with coordinates and distances, unattached distance data if we capture it, and perhaps static lists of reference systems. Tools that are only interested in the results don't need the distances but those of us who've written tools to locate systems will want to be able to check the output of TGC is correct and will need distances.

It would give authors a choice between a smaller list of verified systems and a larger list but with a risk involved. Output datasets should not be limited to TOR but also in TD compatible CSV format and a simple CSV format like wtbw used.

RedWizzard said:
6. Output formats: JSON with the same structure as the input. The implementor can do anything else they want to, of course, e.g. TD CSV, but tool specific formats can just as easily be done by the tool so the implementor shouldn't feel compelled.

Not sure about the static lists of reference systems. Is the current list valid for SB3? Or are perhaps new reference points needed for SB3?
What do you mean with "unattached distance data if we capture it"?
If uploaded coordinates are optional the input and output format can't be the same.
All data sets should be available as a simple download using a fixed url.

RedWizzard said:
7. As the data volume increases it is likely that we will need to be able to fetch data by region: this should be considered in the design. Probably just allowing the fetcher to specify a system and a distance would be enough (i.e. "give me coordinates for all verified systems within 50 Ly of Sol"). I think it should be designed for caching as well: data requests should be URL based (either query or hierarchy, it doesn't really matter) and the TGC should correctly identify unchanged data. If the bandwidth use gets high then we can do things like reducing the frequency of updates to the verified list (once a day or whatever).

Agree. But lets start simple.
Not sure what you mean with "data requests should be URL based (either query or hierarchy."?

RedWizzard said:
8. Duplicate data: TGC definitely shouldn't throw any data away. If a duplicate system is submitted the distances should be merged and the coordinates rechecked. It might be worth keeping a count of how many times each distance has been reported too. If Sol - Ross 128 has been reported 3 times identically then we have a higher degree of confidence about that distance.

Agree.

RedWizzard said:
9. Bad data: any bad data should be logged and reported somehow. I've seen too many cases where data that looked bad actually turned out to be correct to just throw stuff away.

RedWizzard said:
Additional considerations:

I think that additional data such as distance to stations, allegiance, economy, etc should be stage 2. We'd want that in place by release (and ideally by gamma) though.

Agree.

RedWizzard said:
Do we want to capture unknown system names without distances or coordinates (which could be then used to prompt volunteers to get distance data)? I was thinking of a logbook tool which kept a record of where a commander flew. It could easily submit system names with no additional effort from the user. But is it the value of that data worth bothering with?

The file netlog.log could be used for that. Normal volunteers could passively support TGC. They don't have to calculate distances but by just flying around they would give TGC (in the end) a complete list of what star systems are there. Useful for when the universe really opens up. But if the universe really opens up it will impossible for TGC to be up to date. Hoping for some kind of web-api or other support from FD for that.

RedWizzard said:
TGC could provide some support for Snuble's plan to partition up the space for searching. E.g. tracking who is searching which box, generating lists of known stars in each box, etc.

Nice touch.

Commander_Sonak · Oct 27, 2014

Than you

I used your program to find my way back because a 3D map is too hard to follow for someone in my advanced age.

RedWizzard · Oct 27, 2014

wolverine2710 said:
Not sure about the static lists of reference systems. Is the current list valid for SB3? Or are perhaps new reference points needed for SB3?

I expect that the SB2 reference systems (by which I mean all 300+ that Michael gave us the coordinates for) will be able to locate all the SB3 systems. Those references have been able to accurately locate several stars well outside the Pill (e.g. Rigel, Enif, Mirphak, Polaris). In any case we're expecting a list of reference systems from Michael for SB3, aren't we?

What I was meaning about reference system lists in terms of outputs though was that TGC doesn't really need to provide those systems as they are well known and static - any tools should have them builtin (or provided from the tool's website or whatever). If I were running it I wouldn't want those 300+ stars being downloaded from TGC every time someone opens their tool. But thinking further on it, I think TGC should provide them for completeness.

wolverine2710 said:
What do you mean with "unattached distance data if we capture it"?

Distance data where someone has not entered enough data to locate a system. I don't think we should bother with this initially (and maybe not ever).

wolverine2710 said:
If uploaded coordinates are optional the input and output format can't be the same.

What I mean is that the input and output formats should be identical in structure and any common fields are identical in naming and data format.

wolverine2710 said:
All data sets should be available as a simple download using a fixed url.

Not sure what you mean with "data requests should be URL based (either query or hierarchy."?

Basically I meant what you said: the output should be available via URL (not requiring a POST or other method). There are two ways to a set up a URL for this. The traditional way has been to a query fragment, e.g.:
http://tgc.org/stars?type=unverified <- get all unverified stars
http://tgc.org/stars?type=verified&since=2014-10-30 12:00:00&distance=20&origin=Sol <- get all verified stars entered since 2014-10-30 12:00:00 within 20 Ly of Sol
It's also possible to make the query parameters part of the URL hierarchy, e.g.:
http://tgc.org/stars/unverified <- get all unverified stars
http://tgc.org/stars/verified/since/2014-10-30 12:00:00/location/Sol/20 <- get all verified stars entered since 2014-10-30 12:00:00 within 20 Ly of Sol
It's really an implementation detail. My main point is that the outputs should be available via a simple GET as this is more accessible and allows caching.

TornSoul · Oct 27, 2014

Snuble said:
Quick and dirty as promised, without any attempt at formating, http://travel-eliteadvisor.rhcloud.com/index.php

One request:

Please align the cubes on 0,0,0 (Sol) - So that Sol is at a corner rather than in the center of the 0,0,0 cube.

It gets really annoying having to do the *20-10 conversion (as opposed to just *20) all the time to figure out what block/coordinates belong together.
You also have to remember; do I add 10 or subtract 10 to make it fit etc.

It's very minor, but it's confusing - And it just rubs me the wrong way.

TornSoul · Oct 27, 2014

KevinMillican said:
My viewpoint is more user-orientated; not everyone who wants to access (and process) data will necessarily have the tools to use this format

I can't envision anyone not being a tool maker of some variety accessing this data in the first place. Why on earth would you?
As that's the case, the tool makers will be programmers, who have the tools and understanding necessary to consume JSON.

The tools are the ones that need to be user-oriented, not the raw data.

The raw data needs to be accessible in as simple and efficient a format as possible.

That said, I have no objections to the TGC *outputting* data in several formats.

Input however needs to be one single predefined format.

Snuble · Oct 27, 2014

I wouldnt mind several input formats either. As long as what goes where is well defined.

As for Sol in center or Sol in corner, no big deal. I'm currently entering economy+population for the known systems and will most likely change this when I do. Once SOL is visitable however, SOL will be present in all 4 boxes surrounding it.
The important bit now is to go about the data gathering a bit more systematic as a group. Beta 2 was most about getting the tools. Now we have the tools. I would be surprised if we where not hunting the last systems by sunday.

I do hope someone with a bit more time on their hands will make the grid their own. What they should think about is stuff like printing a slice on paper (6x8 fits on A4...). And maybe when the grid grows try to keep it printable by implementing another grid layer on top of the 20^3.

Does the Elite lore say anything about how far human settlements have spread?

TornSoul · Oct 27, 2014

RedWizzard said:
Basically I meant what you said: the output should be available via URL (not requiring a POST or other method). There are two ways to a set up a URL for this. The traditional way has been to a query fragment, e.g.:
http://tgc.org/stars?type=unverified <- get all unverified stars
http://tgc.org/stars?type=verified&since=2014-10-30 12:00:00&distance=20&origin=Sol <- get all verified stars entered since 2014-10-30 12:00:00 within 20 Ly of Sol
It's also possible to make the query parameters part of the URL hierarchy, e.g.:
http://tgc.org/stars/unverified <- get all unverified stars
http://tgc.org/stars/verified/since/2014-10-30 12:00:00/location/Sol/20 <- get all verified stars entered since 2014-10-30 12:00:00 within 20 Ly of Sol
It's really an implementation detail. My main point is that the outputs should be available via a simple GET as this is more accessible and allows caching.

Caching:
First, there are two "kinds" of caching.
There is client side cashing (what your browser does), and then there is server side caching (A server can cache some data internally to prevent pulling it from the DB repeatedly)

The first saves bandwidth (and load times), the second saves processing time on the server (and can thus output data faster)

Caching makes sense if the data "never" changes (like the stylesheets and JavaScript files your browser downloads visiting a given page).

We are however making an app whose purpose it is to continually update (add to) it's data.
So I question the caching argument.

If I do a http://tgc.com/systems - I expect to get data including the newest available, and not some cached version - that might be missing the last 10 entered systems.

Note here again the difference in caching.
The server is free to cache anything it likes (saving DB hits).
But the browser will (by design) not even bother asking the server, if the cache time has not expired (it will simply use a locally stored copy)

So I'm not totally happy about the idea of using GET, using caching as an argument.

Secondly, and this is a personal preference, I prefer POST (over GET) as it allows for structured query parameters.
Granted, currently the need for that really isn't there. But I prefer being prepared.
It would really suck if we suddenly had the need, and had to tell everyone to change their tools to use POST instead of GET.
Better everyone is using POST from the get go.

The downside with POST is of course that you cant just type in an URL and see the result in the browser.

But honestly who would want to see that output in a browser anyhow?

The purpose of the TGC is not to be and end-point for endusers - That's the job of the tools.

The job of the TGC is to be a depository of "canonical" information - That the tools can then use for whatever purposes they need, and present it in a way that makes sense in the context of the tool.
Which I suspect in most cases won't involve showing system coordinates (the main canonical data of the TGC) anyhow.

So the argument for using GET, of being able to just type it in a browser and thus see the data in the browser is moot imo as well.

There is even a (weak) argument for not allowing people to simply enter an URL and see the result in the browser.
It's too damn easy, and many will try it "just to see", which will mean more bandwidth used, and more work for the server.
I don't expect this would be much of a burden, percentage wise, but for completeness it bears mentioning.

--

"make the query parameters part of the URL hierarchy,"
This is usually referred to as a RESTful API.

I'll admit this is all the rave these days and considered "best practice" in many circles.

I'm not a super great fan though (again, personal preference) - As it requires to keep track of a great many URL endpoints, and (if you are sane) forces you to use a specific way of programming (MVC), as it naturally lends itself towards this type of URL creating/consumption (and makes it more natural to use/consume all these different URLs)

Basically I find it a bit "confusing" if you will. In it's simplicity (which is what it is all about) it makes other things more complicated.
I find it much simpler to keep track of a few request parameters - and using the same URL.
And it doesn't "force" me to program in a certain way. (and no it doesn't absolutely force me - But it makes my life more complicated if I don't "conform")

---

To summarize:
- I prefer POST over GET (I think there are valid arguments for this)
- I prefer not going with a REST API (This is purely a personal preference)

- - - - - Additional Content Posted / Auto Merge - - - - -

Snuble said:
As for Sol in center or Sol in corner, no big deal.

Cool!

Snuble said:
SOL will be present in all 4 boxes surrounding it.

Why would you do that?
It should only be present in one cube - Having it in more than one is in-consistent.

Basically cube 0,0,0 - contains the coordinates 0 <= x,y,z < 20 ly (note the "<=" vs "<") thus insuring no system is present in more than one cube.

------

EDIT: OK that auto-merge crap is getting really annoying...

graham.reeds · Oct 27, 2014

As long as I can say 'all new beyond this date' and get updated coordinates + stations I will be happy.

TornSoul · Oct 27, 2014

wolverine2710 said:
Data in TGC
Lets in the first incarnation store ONLY coordinates and distances for star systems in it.

Agreed

wolverine2710 said:
Important rule: Consider uploaded coordinates and distances incorrect UNTIL proven to be correct.Coordinates and distances should be flagged 'unverified' and double checked, after confirming they are OK they get the flag "verified".

Agreed

wolverine2710 said:
Checking coordinates.
1) Biteketkergetek has a verifier to verify if coordinates are 1/32 LY grid compliant.<snip>
2) A commander or normal volunteer can verify coordinates by using the Galaxymap<snip>
3) TGC should have a Trilateration inside. This can calculate the coords and check with the uploaded ones.

1) I think all tools does - But see #3

3) (yes 3 not 2) TGC should run all the checks we can possible think up.

2) IMPORTANT: Needs it's own discussion maybe?.
I'm very opposed to a TGC that needs "admin input" to function.
This will NOT work in the long run (3-4-5 years from now).
I speak from experience of several similar projects over the last 10+ years.
TGC needs to run completely autonomous - Not requiring any kind of intervention to do it's job (baring fixing of bugs and additional functionality as we deem necessary)
Tools can come and go (as they should) and they can require whatever intervention the author seems fit - But the TGC needs to be able to "just work" without intervention.
And this is very doable - As it's job is so restricted as it is.

wolverine2710 said:
Checking distances

Already covered in Important rule: Consider uploaded coordinates and distances incorrect UNTIL proven to be correct.
Verification - See point 2 above.

wolverine2710 said:
Input format TGC
I suggest using a JSON structure - without a scheme. Concerning one or multiple JSON formats for that. As suggested by I think TornSouls it best to have ONE input format BUT I don't want to force that and alienate commanders who have a tool now.

Agreed on format.

But we DO need to force tool makers.

We are in the process of pinning down the requirements for TGC, and every tool maker have the opportunity to provide input.

We have to figure out something that will work, not something that will "work for everyone".

And changing the input/output of your tool is extremely minor - That's not where the bulk of the work in making a tool is.
It's not worth considering in this context at all it's so minor.

wolverine2710 said:
Iirc atm there are 4 tools for a normal volunteer to enter distances: Spreadsheet of TunaMage, website of Harbinger, website of TornSouls and the pages (tool) which run locally in a web browser by RedWizzard. The last three generate JSON. If Harbinger and TornSouls can change their output to the reference format of RedWizzard that would be great. In that case his format would need to be enhanced with entries like: Author, name_of_tool_ version_number_of_tool, url_of_tool etc. TGC in that case can run on ONE port.

Not sure I follow how this is related to input to TGC?

wolverine2710 said:
If the authors feel it would removed certain functionality from their tools and limit them we can have TGC be running on multiple ports. One for each tool.

Again - No

The tool makers will have to change.
And it's so minor it's not worth making a fuzz about.

wolverine2710 said:
Internal format
In the first incarnation of TGC this could be the input reference format. For things like flagging unverified and verified coordinates and distances it has to be extended. Better would be to store it in an open source database like MySql (often used), Postgress or even sqllite. This would mean a convertor to put the JSON in the database. If the database can be set up in such a way that it is portable between (open source) databases that would be a plus.

Internal format is completely up to whoever ends up making an accepted version.
Remember, that data can be pulled via the web API at any time, and thus be ported to something else if needed, so no point in restricting an author to what he can use or not.
Let him use whatever he is familiar with - It doesn't matter.
The only thing we need to proscribe is what data, as a minimum, is in the storage. That's all.

wolverine2710 said:
Output format TGC
The reference format of RedWizzard aka TOR - The One Reference. This can be the same as the input format IF coordinates are uploaded also.
TD CSV format compatible with Trade Dangerous (TD).
Simple CSV. Compatible with what wtbw used. Link here.
XML. Optional. Of course a schema for this has to be constructed. Some commanders prefer XML and we should not exclude them.

I have some minor quibbles with RW's format.
For one, every time a system name is present, it should be accompanied by it's coordinates (if known, not possible for p0 of course).
Reason being it saves a bunch of look-ups for coordinates all the time.
There's a few other things - But lets leave that for another post dedicated to just that.

My point being - RW's current format is not necessarily optimal (and yes changing it will require tool makers to adjust - Myself included. And again, it's so minor it's not worth making a fuzz about

)

Output format - I'm cool with TGC providing both JSON and XML.
If CSV, then it will have to be restricted/remodeled in some way. It simply doesn't lend itself to the structured data that we have. I would not consider this a priority at first.

Get TGC up and running with JSON at first.
Then when time permits, we can add the rest.

wolverine2710 said:
How to proceed - hosting wise.
Determine where TGC can be hosted - as in which commander.
Determine which languages and databases can be used at that site.
Determine who is able/willing to write software for TGC.
That can of course be multiple commanders. If they have access to the site it would be easier.

I can host - np.
If I host - It'll be the .NET/SQL stack.
I'm interested in writing the TGC. If not me, and I'm hosting, I'll require access to the source, and compile it myself (don't want any surprises in there...)

- - - - - Additional Content Posted / Auto Merge - - - - -
EDIT - Grrrr this auto merge crap again... Anyone know if it's possible to disable it - or who to pester to get it changed.

graham.reeds said:
As long as I can say 'all new beyond this date' and get updated coordinates + stations I will be happy.

Station data is not considered for the TGC (at least initially).

You'll have to go to one of the tools out there for that (which in turn will rely on the TGC).

But apart from that - Agreed a 'all new beyond this date' feature should be part of the TGC.

If you can live without the station data - You can already to that with the EDSC api

graham.reeds · Oct 27, 2014

I need system + station for my market tool. I hope that there have been substantial changes to the commodities screen to make it easier to OCR and work out where it is.

Snuble · Oct 27, 2014

Re why SOL in 4 boxes, simply because ot will not be issue for other systems. I would rather have SOL in 4 boxes then miss a system somewhere due to rounding or simply a forgotten =. Too much attention to detail distracts from the goal. Get as much data gathered as quickly as possible.

When game is up wednesday (no way they will get it running on patch day), the order of business should be to get a ship with 20LY jump range. Open galaxy map, and start working the grid boxes finding new systems.
I would apreciate if the first thing done was to get the extremities of the new "pill/sphere/spiral" or whatever form the new sandbox have. That way a new grid could be calculated and be ready for population.
I'm also thinking about if the first scan should be done without leaving Eranin at all. I think I'll use half an hour and look into screenshot/html5 to get an aproximate value for x/y to populate the grid.
It will be more tedious then actually visiting the systems in a ship, but at the same time, will make good data for actually exploring the systems.

We should gather System name, Allegiance, Economy, Government and Population.
In each system I would suggest Stations, Resource extraction (if they are still points of interest that is) and distance to main star. Until we get in system microjumps, this is important for time calculations.
For each system, services they provide, what ships can dock, what ships they sell, blackmarket, modules on sale, and of course the commodity details.

- - - - - Additional Content Posted / Auto Merge - - - - -

TornSoul said:
Agreed
2) IMPORTANT: Needs it's own discussion maybe?.
I'm very opposed to a TGC that needs "admin input" to function.
This will NOT work in the long run (3-4-5 years from now).
I speak from experience of several similar projects over the last 10+ years.
TGC needs to run completely autonomous - Not requiring any kind of intervention to do it's job (baring fixing of bugs and additional functionality as we deem necessary)
Tools can come and go (as they should) and they can require whatever intervention the author seems fit - But the TGC needs to be able to "just work" without intervention.
And this is very doable - As it's job is so restricted as it is.

Maybe the "correct" way is for everyone that want to be a TGC. Participate in the project with a public tool, you should not just import data, but also share the same data the same way you imported it. This way when someone get tired of the game and that database disappears, its just part of a network so the data live on and continue to be shared.

Snuble · Oct 27, 2014

graham.reeds said:
I need system + station for my market tool. I hope that there have been substantial changes to the commodities screen to make it easier to OCR and work out where it is.

http://elitetradingtool.co.uk/ got the right idea. But I would go one step further. To be able to query your tool, a set of systems with old data have to be updated first. And of course give a penalty for providing dirty data (some clever dude(tte) will figure out that to get search priviliges they just have to go to the update page, click some and not actually visit the system, so a penalty is needed).

RedWizzard · Oct 27, 2014

TornSoul said:
Caching:
First, there are two "kinds" of caching.
There is client side cashing (what your browser does), and then there is server side caching (A server can cache some data internally to prevent pulling it from the DB repeatedly)

The first saves bandwidth (and load times), the second saves processing time on the server (and can thus output data faster)

Caching makes sense if the data "never" changes (like the stylesheets and JavaScript files your browser downloads visiting a given page).

We are however making an app whose purpose it is to continually update (add to) it's data.
So I question the caching argument.

If I do a http://tgc.com/systems - I expect to get data including the newest available, and not some cached version - that might be missing the last 10 entered systems.

Note here again the difference in caching.
The server is free to cache anything it likes (saving DB hits).
But the browser will (by design) not even bother asking the server, if the cache time has not expired (it will simply use a locally stored copy)

Even stylesheets and JavaScript files change. If you couldn't use caching with resources that change you couldn't use it at all, ever. This case is actually a pretty good one for caching because the "verified" list is unlikely to change frequently and the reference list won't change at all. And there is no reason why caching should prevent you getting the newest available list. Caching can trivially be setup so that the browser asks the server if the resource is out of date so the server doesn't need to resend it unless it has been updated. This can even be made to work with POST though it is not so easy.

TornSoul said:
Secondly, and this is a personal preference, I prefer POST (over GET) as it allows for structured query parameters.
Granted, currently the need for that really isn't there. But I prefer being prepared.
It would really suck if we suddenly had the need, and had to tell everyone to change their tools to use POST instead of GET.
Better everyone is using POST from the get go.

The downside with POST is of course that you cant just type in an URL and see the result in the browser.

I can't really accept the "maybe we'll need structured queries" argument unless you come up with a plausible example, and one that can't be done with a query string in the URL. And even if there is one, it's probably still worth using the easier GET requests for the 99.9% of cases that can.

According to the HTTP standard POST is simply the wrong method to use in this case. "The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URL in the Request-Line." This isn't some newfangled RESTful paradigm, it's fundamental to the design of HTTP. You can, of course, subvert the methods in any way you like, but standards are made for a reason.

Note that POST has made your API more difficult to implement and use: there was the CORS issue (solved, but still), and on my end I have to do a custom $.ajax call rather than a trivial $.getJSON. These sorts of things are not deal-breakers but they are indications that you're doing something atypical and they should be a sign that maybe a rethink is in order.

TornSoul said:
There is even a (weak) argument for not allowing people to simply enter an URL and see the result in the browser.
It's too damn easy, and many will try it "just to see", which will mean more bandwidth used, and more work for the server.
I don't expect this would be much of a burden, percentage wise, but for completeness it bears mentioning.

Sorry, you don't get to make any argument (no matter how weak) about bandwidth if you're opposed to caching

. And since you bought bandwidth up, are you aware of how large the data already is? It's about 300k for your JSON format, about 1k per star. Mine's only slightly less at 230k. Do you really want people grabbing 2MB every time they open a tool to plan a jump sequence across the SB3 bubble even though 99% of the data is unchanged since they last reloaded 5 minutes ago? How's it going to work when there are 20,000 stars or 20 million?

TornSoul said:
"make the query parameters part of the URL hierarchy,"
This is usually referred to as a RESTful API.

I'll admit this is all the rave these days and considered "best practice" in many circles.

Actually it's not really anything to do with REST. RESTful systems typically prefer this style over query strings, but it's not unique to RESTful systems. E.g. Wikipedia isn't RESTful. Query strings are more of an accident of the development of webservers as most early webservers used the filesystem to generate the website's URL hierarchy and CGI being scripts and executables in that filesystem. If you read the HTTP standard it's apparent that the designers were thinking URLs would be used for lower level objects (e.g. a message in a forum) rather than higher level URLs with query strings.

Personally I would probably go with a mixture: URLs for each type of output/list and then optional query strings for an filtering.

RedWizzard · Oct 27, 2014

TornSoul said:
2) IMPORTANT: Needs it's own discussion maybe?.
I'm very opposed to a TGC that needs "admin input" to function.
This will NOT work in the long run (3-4-5 years from now).
I speak from experience of several similar projects over the last 10+ years.
TGC needs to run completely autonomous - Not requiring any kind of intervention to do it's job (baring fixing of bugs and additional functionality as we deem necessary)
Tools can come and go (as they should) and they can require whatever intervention the author seems fit - But the TGC needs to be able to "just work" without intervention.
And this is very doable - As it's job is so restricted as it is.

I agree with this but I don't think it needs to be autonomous from day 1. I'd rather have human verification of data until we're comfortable that the system is not going to get polluted with bad data.

TornSoul said:
I have some minor quibbles with RW's format.
For one, every time a system name is present, it should be accompanied by it's coordinates (if known, not possible for p0 of course).
Reason being it saves a bunch of look-ups for coordinates all the time.

I have to say your cavalier attitude to bandwidth seems at odds with your forward-looking view on system autonomy above

. Including redundant coordinates is a large part of why your distance data is 300k in JSON for just 270 odd stars while mine is about 190k for the same stars.

It may be that the vast majority of tools only bother with coordinates and don't download the distance data. But if distance data is commonly requested then we do need to consider bandwidth as TGC is likely to end up with tens of thousands of stars at least.

Another aspect to this is that any redundant information has the potential to be out of sync. If coordinates are included with the distances then the client either has to either ignore them (defeating the purpose), take it on faith that they are always consistent, or check them and handle inconsistencies somehow. I'd rather see a single canonical value in the output.

Actually this brings up a question: are we expecting tool makers using this data to synchronise with TGC and populate users' instances of the tool with data from their copy (effectively acting as a cache for the data), or are we expecting the users' instances to grab data directly? The former would be a lot friendly to TGC in terms of bandwidth but the later is easier from the point of view of the tool makers.

TornSoul said:
I can host - np.
If I host - It'll be the .NET/SQL stack.
I'm interested in writing the TGC. If not me, and I'm hosting, I'll require access to the source, and compile it myself (don't want any surprises in there...)

If you're interested in doing it, I say go for it.

I'm tempted to put something together myself but I'm not sure I'll have the time in the next few weeks or the inclination to support it long term. I would use node.js backed by CouchDB (probably, perhaps Redis). I might do it just as an exercise.

Snuble · Oct 27, 2014

I hope the plan is to have one export with coords only. I dont want to keep downloading a lot of distances used mostly for debugging, when what I need 2 weeks from now is a set of coordinates (lets say someone start a new system, and need a fresh set of data, a megabyte of distance info is not what you want).

RedWizzard · Oct 27, 2014

Snuble said:
I hope the plan is to have one export with coords only. I dont want to keep downloading a lot of distances used mostly for debugging, when what I need 2 weeks from now is a set of coordinates (lets say someone start a new system, and need a fresh set of data, a megabyte of distance info is not what you want).

The plan is to offer both (at least that's what I expect).

Harbinger · Oct 27, 2014

Personally my plan is to get out there and start mapping once we have access to beta 3. I'll most likely change my reference stars to give them a nice spread within the new playable area but not until we have further confirmed (not calculated) systems hopefully courtesy of Michael Brookes.

I'm going to leave you guys to discuss the logistics of the site. Until you actually have a site in operation I highly suspect it's going to be down to the few that contributed previously to supply further coordinates/distances.

In the interim I'll update my JSON output to match RedWizzard's layout. It only included extra information so that I could view issues at a glance but distance error checking has made most of the output redundant anyway.

Discussion What is the most efficient way to crowdsource the 3D system coordinates

wolverine2710

Tutorial & Guide Writer

wolverine2710

Tutorial & Guide Writer

wolverine2710

Tutorial & Guide Writer

wolverine2710

Tutorial & Guide Writer

Commander_Sonak

RedWizzard

TornSoul

TornSoul

Snuble

TornSoul

graham.reeds

TornSoul

graham.reeds

Snuble

Snuble

RedWizzard

RedWizzard

Snuble

RedWizzard

Harbinger