Superb work. Thanks very much for your effort and for using the EDDN data ;-) I've taken the liberty of putting eddb at the top of the OP of my 3rd party tools thread (also here in this sub forum, in the announcements section). I've also put it in the TODO section; it will be moved to the correct section later.....
First of all, a clean dataset of systems was needed to avoid a mess. The 20k systems from EDSC were a more than good source. And I was wrong in stating that it's being synced; I've changed that.
Generally I don't have much use for a station without commodity prices yet. I hope this will change and we can fill the database with more and more data, not only trading data. That way EDDB could become a wiki kind of thing as well.
As I mentioned already, my plan is to make all the proper data available through a REST API. It's planned to be read-only at first, but this might change if reliable sources are willing to help. I'm very open to others contributing.
The site is written in PHP with the Yii2 framework.
@themroc. Posting here instead of in your thread because it's only related to EDDN. Have you considered creating a 0MQ publisher and sending the validated/corrected data to it? EDDN subscribers could then optionally connect to your validated EDDN firehose publisher instead of the original EDDN publisher (data stream). It might sound strange, me suggesting this (making EDDN partially obsolete), but I think it fits the mission of EDDN: keep it simple, provide a data stream which others can act upon (enhance).
Another possibility would be that eddb reads the EDDN firehose and sends validated/corrected data back to EDDN (perhaps it doesn't have to be a POST), which then takes care of the distribution on a different ZeroMQ publisher (we have free hosting - a 20 Mb line - offered by a generous company, which is making this all possible). Other validator services could do the same, but the data would be combined into one "validated publisher" firehose. Yes, it would waste bandwidth and make things more complex (less K.I.S.S. principle). Commanders, and of course implementor James: would this be viable, or should I stop smoking "the good stuff" (Dutchie here)? I have to admit I can't fully grasp the consequences of it all, so let's discuss it - technical aspects and viability ;-)
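To make the first idea a bit more concrete, here is a rough Python/pyzmq sketch of what such a chained validator could look like. Note this is NOT eddb's actual code: the relay address, the zlib compression and the validation function are all assumptions on my part.
```python
# Rough sketch: a validator that subscribes to the EDDN relay, applies its own
# checks, and republishes accepted messages on a second ZeroMQ PUB socket.
# Relay address and zlib compression are assumptions, not verified facts.
import json
import zlib
import zmq

EDDN_RELAY = "tcp://eddn-relay.elite-markets.net:9500"  # assumed relay address
VALIDATED_BIND = "tcp://*:9600"                          # made-up port for the clean firehose

def looks_valid(envelope):
    # Placeholder for the real validation rules (name dictionary, price ranges, ...).
    return bool(envelope.get("message"))

def main():
    ctx = zmq.Context()

    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")   # receive everything; filter in code
    sub.connect(EDDN_RELAY)

    pub = ctx.socket(zmq.PUB)
    pub.bind(VALIDATED_BIND)

    while True:
        raw = sub.recv()
        try:
            envelope = json.loads(zlib.decompress(raw).decode("utf-8"))
        except (zlib.error, ValueError):
            continue                      # drop anything we cannot parse
        if looks_valid(envelope):
            pub.send(zlib.compress(json.dumps(envelope).encode("utf-8")))

if __name__ == "__main__":
    main()
```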
I'm not using IRC, but I do have Skype - for chat. Would that be OK for you as well? If so, please send me a PM and I will send you my Skype address. If not, do you have a suggestion for another form of communication? But it's getting late here and I have to get up early, so I guess it will be tomorrow. I'm in GMT+1 btw. Looking forward to discussing this further...
Commanders, let's hear what you have to say about my post. I'm equally interested in all your suggestions/remarks ;-)
I had a discussion late last evening with themroc - the author of eddb. It was about validating data and checking my suggestion here in this thread with him: him receiving (being subscribed to the EDDN firehose) DIRTY EDDN data (typos in names, false prices) and sending back CLEAN data (validated/corrected) to EDDN, which then publishes it to the firehose so that subscribers (clients which use the data) can use it.
I did some RTFM wrt ZeroMQ today and it's becoming a bit clearer now. Still a layman wrt ZeroMQ, but a layman who can read ;-)
To keep things simple: eddb subscribes to EDDN and receives dirty data. That data is cleaned and sent back to EDDN using a normal POST action. He said bandwidth (bw) is not a problem for him; he has more bw than EDDN has ;-) EDDN is then extended with an extra publisher (firehose), so it has a dirty publisher and a clean publisher. This is better than setting up a new EDDN just for the eddb data.
Let's first discuss how EDDN works now. Caveat emptor: this is how I've interpreted EMDR, the EDDN code and the ZeroMQ docs. As I'm human, I can make mistakes... There is one publisher and it sends all data to all connected clients. EDDN uses JSON schemas and it's up to the client to only process (filter out) the correct $schemaRef messages. Atm there are two: 'regular' and 'test'. Note: the client DOES receive ALL messages sent, whatever their $schemaRef.
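For the ZeroMQ laymen (like me): this is roughly what a current EDDN client does, as far as I understand it. The relay address and the exact schema URL below are my assumptions, not gospel - check the EDDN github for the real values.
```python
# Minimal sketch of an EDDN client: connect to the single relay, receive every
# message, and ignore anything whose $schemaRef is not the one you care about.
import json
import zlib
import zmq

RELAY = "tcp://eddn-relay.elite-markets.net:9500"             # assumed address
WANTED = "http://schemas.elite-markets.net/eddn/commodity/1"  # assumed 'regular' schema

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")  # empty ZeroMQ filter: we receive ALL messages
sub.connect(RELAY)

while True:
    envelope = json.loads(zlib.decompress(sub.recv()).decode("utf-8"))
    if envelope.get("$schemaRef") != WANTED:
        continue                     # skip the 'test' schema and anything else
    print(envelope["message"])       # hand the payload to the rest of the tool
```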
What happens when eddb submits clean data to EDDN? EDDN receives clean data from eddb and dirty data from everywhere else (atm most likely from OCR solutions like EliteOCR and RegulatedNoise clients). The publisher sends out both - dirty and clean data - wasting bw in the process. A few things are needed:
1) We need an additional publisher for the clean data.
2) EDDN must determine where the data is coming from. The existing uploaderID field could be used for that. Ofc everybody can set that uploaderID field and poison the well: "really, trust me, my data is clean", while in fact it's not. I believe the IP address of a sender is hashed in EDDN. A check on this hashed IP address, in combination with a check on the contents of the uploaderID field, could be used to only accept clean data from the eddb website and send those messages to the clean data publisher (see the sketch after this list). Ofc the hashed IP address doesn't have to be hard-coded into EDDN but could come from a config file - contents NOT on github.
3) In ZeroMQ a subscriber can be connected to multiple publishers at the same time, or to one or more specific publishers, and process the received data. By connecting to the clean publisher instead of the dirty one, a client ONLY receives clean data. It does NOT receive data from the dirty publisher, so no bw is wasted. The client could also decide to connect to both, or only to the dirty one in case the clean publisher no longer sends out data - i.e. eddb is dead.
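As promised, here is a hand-wavy sketch of what the gateway-side change could look like. Ports, config layout and the trusted-sources table are all made up for illustration; this is not the actual EDDN code.
```python
# Two PUB sockets on separate ports, plus a per-message decision based on the
# uploaderID and the (hashed) sender IP. The trusted table would come from a
# config file kept off github; everything here is invented for illustration.
import hashlib
import json
import zlib
import zmq

DIRTY_PORT = 9500   # existing default publisher, so old clients keep working
CLEAN_PORT = 9501   # new publisher for validated data

# In reality loaded from a config file: uploaderID -> sha1 of the trusted sender's IP.
TRUSTED = {"eddb": "sha1-of-the-eddb-server-ip"}

ctx = zmq.Context()
dirty_pub = ctx.socket(zmq.PUB)
dirty_pub.bind(f"tcp://*:{DIRTY_PORT}")
clean_pub = ctx.socket(zmq.PUB)
clean_pub.bind(f"tcp://*:{CLEAN_PORT}")

def publish(envelope, sender_ip):
    uploader = envelope.get("header", {}).get("uploaderID", "")
    ip_hash = hashlib.sha1(sender_ip.encode("utf-8")).hexdigest()
    payload = zlib.compress(json.dumps(envelope).encode("utf-8"))
    if TRUSTED.get(uploader) == ip_hash:
        clean_pub.send(payload)   # validated source: clean firehose only
    else:
        dirty_pub.send(payload)   # everything else stays on the default port
```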
The same setup can be used when multiple validator services surface. They can all send data to EDDN, and the author of a client can decide which publisher(s) (s)he deems most reliable. A "super" validator could also surface, connect to all validated publishers and send REAL, utterly clean data back to EDDN. You get the picture. A normal validator service like eddb would ofc only subscribe to the dirty publisher - to prevent loops....
Atm there are already multiple commanders using EDDN. Of course we don't want to break their clients with this change to EDDN. Afaik each ZeroMQ publisher needs a separate port. If clients are not changed, they keep connecting to the default publisher port - the dirty one. When an author changes their client, it can connect to a clean data publisher instead.
I've had a small peek at the code. It seems the changes needed to make this work are relatively small. Layman speaking here.... I will contact implementer James to see what he thinks about it. In the meantime, what do you commanders think about this suggestion? Perhaps some ZeroMQ buffs are around here who can give their opinion.
I'm expecting that the JSON structure will change in the future, but for now (the clean data requirement) I don't think changes are needed - so clients won't break....
I think that once we have a clean data stream, commanders of trading tools (BPC, TD, Thrudd etc.) are far more likely to embrace EDDN. In the end we all PROFIT!!
Edit 1: Did some more ZeroMQ reading. Another possibility to receive clean data from validator services would be to have a ZeroMQ SUB (subscriber) inside EDDN which receives data from a validator (which uses a PUB to send data to it). The Linux firewall (iptables) could be set up to ONLY accept data from the IP address of a validator service. Each SUB (inside EDDN) for a specific validator sends its data to the clean data publisher which belongs to it. I think no hard-coded stuff is needed; again it can be read from a config file.
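A tiny sketch of that Edit 1 idea (again just my interpretation, with invented addresses and ports): EDDN runs a SUB that connects out to the validator's PUB and simply forwards everything to that validator's clean publisher. The iptables restriction lives outside the code.
```python
# Forwarder inside EDDN: SUB connected to one validator, PUB for that
# validator's clean firehose. Addresses and ports are made up.
import zmq

VALIDATOR_PUB = "tcp://eddb.example.com:9700"  # hypothetical validator endpoint
CLEAN_BIND = "tcp://*:9501"

ctx = zmq.Context()

from_validator = ctx.socket(zmq.SUB)
from_validator.setsockopt(zmq.SUBSCRIBE, b"")
from_validator.connect(VALIDATOR_PUB)

clean_pub = ctx.socket(zmq.PUB)
clean_pub.bind(CLEAN_BIND)

# zmq.proxy blocks and shovels every message from the SUB to the PUB.
zmq.proxy(from_validator, clean_pub)
```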
Edit 2: A SUB for every trusted trading tool might be a good idea. As in: a trading tool which has its own website (hence IP address) - which rules out Trade Dangerous. It makes poisoning the well a bit harder. Data received from a trusted source could be sent to either its own PUB or to the dirty publisher. In the latter case pranksters could ofc still send bogus data and pretend they are a trusted source.
Edit 3: Expecting bad, false or tampered data is also a keystone of EMDR. It's up to clients to handle it - or, in the case of validators, to clean up the mess and provide good data to EDDN, which then sends it to clients. EMDR, and the same goes for EDDN, is really just a relay of data. I know, in hindsight EDDR (Elite Dangerous Data Relay) instead of EDDN (Elite Dangerous Data Network) would perhaps have been a more fitting name ;-(
In the future I'm hoping for a great, blinking EDDN X-mas tree. What do I mean? Have a look at the Eve Market Data Relay (EMDR) relay activity map. The solar systems light up as market data arrives. I believe EVE has over 5000 solar systems. It's from this page. The EMDR Monitor also looks nice....
If we ever have that much data coming into EDDN, that would be a dream come true ;-) It would also mean lots and lots of useful data (fewer obsolete/old entries) to feed trading tools with. Profit, profit AND PROFIT. Perhaps someone is already building such a tool ;-)
I guess for this to happen, an OCR tool has to work mostly independently of manual verification. I believe that not many people are willing to put effort into the manual steps involved. Just my 2 cents...
Perhaps we should focus on convincing FD to implement a preliminary export function. No fixed format, no mature API, just dropping a .json or .csv into the log folder on opening the Commodity Market. (Or perhaps on entering the system.)
The effort to implement such an import & export process would be a fraction of what you guys invest now. From experience, contributing data will never become a mass phenomenon anyway until it's completely or almost completely automatic.
A suggestion thread with acclamation from everyone frequenting all those trading tool threads would be hard to ignore. For the gods' sakes, it's just a simple text file drop. That's a freshman student's job for an afternoon.
I wonder about the merits of having multiple validators performing a conceptually-similar function... but that's not a fault of the design at all - I can see it's necessary to support such a thing in the design.
Also, it would be good - and in keeping with the community spirit - if the rules applied by a particular validator were to be published, so that users and devs alike know *exactly* what data transformations to expect from a validator (and therefore, for instance, why *their* data isn't showing up on x or y validator).
I don't know where to report it but, Minarii system changed location since the last patch. It's no longer near Dhathaarib or Codorain... (but a bit away from there...)
I know of at least three threads (one by myself) where such a "dump data in a file" approach (no CPU/bandwidth burden on the FD servers) has been discussed. Iirc I've also contacted Michael Brookes about it. So far no real action/response aside from "FD is planning a web API after release". Now that they are finalizing the plans for the next releases/updates to ED, I'm hoping it's something planned for Q1/Q2. Not holding my breath though. It would make life SO much easier. I can get at the data using the web API (I have the info on how to do it) but I'm not allowed to. OCR-ing, validators, coordinates, current system: it's taking so much effort to get the data out of ED in a legal way. I find the situation suboptimal, but that's probably just me.....
Perhaps, if you are up for it, you could create such a request thread. I'm more than happy to post there ;-)
One could indeed argue about the merits of multiple validator services, but imho it's a bit like trading tools and route planners. There are quite a few of them. Some flourish and some perish. Some are of interest to some people and others are not. Every tool has its niche and its audience, I think. I really believe in software darwinism: the best tool(s) survive. Also, I think it's best not to bet on one horse. We've seen this with TGC, created by TornSoul. His last post is from the 16th of December. Since then basically no one seems to know exactly what happened to him. Perhaps a long holiday, perhaps RL issues, perhaps burned out. If, for whatever reason, he does not come back or can't continue his excellent work, we (the ED community) have a problem. His code isn't open source to start with, and he's using C#, which is not cross-platform. Therefore I think it's best to try to have a design which can cope with these kinds of events - software going the way of the dinosaurs.
Wrt validators: some will be open source, some will not. Validators can be a separate tool or can be part of an existing tool like the BPC. They can push data into EDDN and, if it turns out the data can be trusted, they can get a trusted data source flag assigned to them. Afaik Slopey had quite some work on his hands when he was scraping the ED heap memory, and those results were sometimes/often scrambled (like OCR results). I'm hoping that validator services will be open source and written using tools which can run on multiple platforms - choice of the implementer ofc. But I DO agree that it would be best to have an overview of the exact rules a validator uses, so it can be checked/validated.
Validator services can also be used in the reverse direction. They could deliver data which could be used by, for example, OCR tools to improve their OCR-ed output. I've described it in post #13 in the eddb thread; a copy/paste of it is in the spoiler tag.
@themroc. I think your API could also be used by, for example, OCR programs to enhance their output. Let's look at prices. System and station names can be corrected in an OCR client using approximate string matching (could be Levenshtein distance only), just like you do now. The OCR-ed name of a system/station is checked against a dictionary of known and correctly spelled systems/stations. When your API provides all systems/stations, they can use that.
Numbers are far more difficult because there is no dictionary to check against. So you have probably implemented logic to determine what a good price range for a certain commodity at a certain station/platform is. If your API can provide a list of correct price ranges, this could be used by OCR programs and of course any other programs. That way they don't have to collect data themselves and determine price ranges, and they can use the knowledge you have gathered by validating prices.
You probably don't want OCR clients to hammer your API to check whether a certain name or price is correct for each and every name/price they have OCR-ed. Hence providing them with a names dictionary and/or price ranges, so they can do that locally (a rough sketch of such a local check is below).
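To illustrate what such a local check could look like on the client side, here's a toy Python example. The Levenshtein implementation, the thresholds and the example data are mine, not eddb's actual rules.
```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                    # deletion
                           cur[j - 1] + 1,                 # insertion
                           prev[j - 1] + (ca != cb)))      # substitution
        prev = cur
    return prev[-1]

def correct_name(ocr_name, known_names, max_distance=2):
    """Return the closest known name, or None if nothing is close enough."""
    best = min(known_names, key=lambda name: levenshtein(ocr_name.upper(), name.upper()))
    return best if levenshtein(ocr_name.upper(), best.upper()) <= max_distance else None

def price_plausible(commodity, price, ranges):
    """Check a price against a per-commodity (low, high) range table."""
    low, high = ranges.get(commodity, (0, 10**6))
    return low <= price <= high

# Example usage with made-up data:
systems = ["ERANIN", "I BOOTIS", "LHS 3006"]
print(correct_name("ERAN1N", systems))                         # -> ERANIN
print(price_plausible("Gold", 9441, {"Gold": (8000, 10500)}))  # -> True
```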
I realize EDDN is still in its infancy, but the toddler is growing up quite nicely. I want EDDN to become a pretty young girl/boy in the near future and to enter adulthood in the more distant future. So far I haven't seen complaints about EDDN suddenly stopping providing data, so at least with the current, relatively low data upload rate it's pretty stable. Which is good. We could use some stress testing tools to see how it performs under a heavy load - preferably not open source and not released to the public, as they are in essence DDOS tools....
In the next days I will again try to get a few trading tool commanders on board (TD, BPC, Thrudd), so that, to start with, they pump data into EDDN - which is very easy to do, just a POST with a JSON structure (see the sketch below). Them using the EDDN data can be done later, when validator services can provide good/high quality data. In the case of TD I've written a request in the TD thread today. I know kfsone is very busy with optimizing TD, so EDDN support is probably a low priority. But TD is open source and anyone can extend it and create a git pull request for it. I'm hoping someone jumps in and creates a tool which uploads the .prices data of TD (only relevant, not obsolete data) so EDDN can get more data. When EDDN validators are up and running, perhaps emdr-tap.py can be revived by someone and an eddn-tap.py (re)created, so that all TD users (as long as they run the tap) have up-to-date data.
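For the tool authors reading along, this is roughly what "just a POST with a JSON structure" means. Be warned: the gateway URL, schema URL and field names below are my best guess at the current format and may well be off, so treat them as placeholders and check the EDDN github for the real schema.
```python
# Sketch of an EDDN upload. URLs, field names and values are assumptions /
# placeholders, not a verified copy of the official schema.
import requests

ENVELOPE = {
    "$schemaRef": "http://schemas.elite-markets.net/eddn/commodity/1",  # assumed
    "header": {
        "uploaderID": "CMDR Example",
        "softwareName": "td-prices-uploader",   # hypothetical tool name
        "softwareVersion": "0.1",
    },
    "message": {
        "systemName": "Eranin",
        "stationName": "Azeban City",
        "itemName": "Gold",
        "buyPrice": 9401,
        "sellPrice": 9371,
        "timestamp": "2015-01-10T21:00:00Z",
    },
}

resp = requests.post("http://eddn-gateway.elite-markets.net:8080/upload/", json=ENVELOPE)
print(resp.status_code, resp.text)
```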
Edit: Wrt checking the validators: perhaps the validators could have an API where one can upload a JSON structure and get back either OK or an error message telling you exactly why it has been rejected (and why that data would not be sent back to EDDN). That would be cool.
I've mentioned this too. It does seem a huge shame that so much hard work is going into solving something that really isn't a problem: typing in commodity prices, using OCR to get them, when all the time a single XML file dumped by FD when you go into Starport Services would solve it all!
> XML file dumped by FD when you go into Starport Services
Great idea! With OCR we get the data more or less automated anyway, so why not provide a reliable, official way via FD.
Exactly. It's a tiny piece of work for FD, but it makes a world of difference to these tools. But it all comes down to FD trying to limit tools like this, of course, so they're not playing ball.
See post #331. FD, in the form of Michael Brookes, was helpful with the crowdsourcing effort. He provided us 3+ times with coordinates from a spreadsheet they had lying around for internal usage. Otherwise the current route planners would probably not be possible - or at least far harder to achieve without a REAL crowdsourcing effort. Aside from that, no form of cooperation with the community that I am aware of. Iirc the current system was removed from the netlogs with ED 1.0.4, crippling the Captain's Log tool - perhaps there is a workaround, I haven't played for a while. Another example is Elite Shipyard, which gives a complete (WIP, in fact) overview of all components and calculates insurance and distance to travel - all done by volunteers providing/gathering data. The algorithm for the jump range is unknown. It would be great if FD could provide the community with that kind of info. After all, ED is live now. Not saying they are not willing to provide it; perhaps the author just hasn't asked for it.
But I have to admit I can't shake the feeling that FD is not that forthcoming - MY opinion. Perhaps it's against their view of how ED is meant to be played. BUT that is of course in contrast with the statement by Michael Brookes "that they are planning a web API after the official release of ED". So I'm NOT sure what to think.
I read in the dev RSS feed that they are finalizing their plans for the next updates to ED (for this year, I thought) - NOT the planned paid expansions. Hmmm, sounds like a good time to PM/ask him whether a web API is still planned and whether it's planned for Q1, Q2 of 2015 or after that. That would not give away that much, and we would know if we have to stay "creative" to get our "precious" back - data. Data we lost when accessing the ED API was declared verboten and all commanders played ball. The fun fact is that a LOT of what we need IS in the iOS API - prices, for example. I can access the data easily (a wet dream for tool authors), but we are NOT allowed to use it...........
Note: I agree with ED just dumping an XML/JSON file. It would mean NO extra CPU usage/bandwidth/extra servers on FD's side. All processing would be done client-side. The info received from the FD servers when, for example, opening the market or selecting a star could simply be dumped into a round-robin XML/JSON file. The community sure knows how to process that. Here's hoping.....
I sent a PM to Michael Brookes about the official web API. A simple question, something like: "Is it planned for Q1 2015, Q2 2015, after Q2 2015, or has it been cancelled?"
I can understand why they might not want an API call. But simply dumping XML files when you enter Starport Services? One with the system details and another with the local commodity details? Third-party tools could simply poll for these and voila!
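If FD ever did this, the client side really would be trivial - something like the Python sketch below, where the dump folder, file naming and JSON layout are of course pure speculation on my part.
```python
# Poll a (hypothetical) dump folder and process any new market file.
# Folder, file names and JSON fields are speculative placeholders.
import json
import time
from pathlib import Path

DUMP_DIR = Path.home() / "EliteDangerous" / "marketdumps"   # hypothetical location
seen = set()

while True:
    for dump in sorted(DUMP_DIR.glob("market_*.json")):      # hypothetical naming
        if dump in seen:
            continue
        seen.add(dump)
        data = json.loads(dump.read_text(encoding="utf-8"))
        print("new market data for", data.get("stationName"))  # hand off to the tool here
    time.sleep(5)
```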