Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

Status
Thread Closed: Not open for further replies.
I was informed by a French user that I made mistakes in the French commodities. This is the right way:
Clothing = MAQUILLAGE
Consumer Technology = VÊTEMENTS
Domestic Appliances = ÉQUIPEMENT DE LOISIR

I will update it in EliteOCR for the next version.

OMG, that sounds so wrong.
 

That is wrong. Vetements has the same root as the English word Vestments and it means clothing. Maquillage is cosmetics.

Unless ED changed the commodities by accident due to a bad translation on their side - that's not correct.

ÉQUIPEMENT DE LOISIR literally means 'equipment of leisure'. I'm not certain what that would be. Hot-tubs?

Household Appliances is Électroménager
 
Unfortunately I can never quite get past the message-per-item requirement for the uploader. Tools like TD and EliteOCR encourage users to update an entire station at a time, so why is the data network working an item at a time, especially when it's using such a verbose JSON format? The encapsulation-to-payload ratio right now is very, very low.

711k of station updates in the TradeDangerous ".prices" format(*) converts to 2.9M of EDDN data. A compact, per-station format would reduce it to 551k.

I realize you're using compression, but the batched format still compresses down nicely; using gzip/9 compression:

Code:
Original .prices file:    711k
Unbatched eddn data:     2871k
Unbatched eddn gz:        123k
Batched terse data:       551k
Batched terse gz:          94k

(see https://www.dropbox.com/personal/Public/ed/eddntransform)
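To make the two shapes concrete, here is a minimal sketch of the formats as Python literals. The field names are illustrative only, not EDDN's actual v1 schema:

Code:
# Per-item: one complete envelope per commodity (~70 per station visit).
per_item = {
    "header": {"uploaderID": "...", "softwareName": "..."},
    "message": {
        "systemName": "Eranin", "stationName": "Azeban City",
        "itemName": "Gold", "buyPrice": 9401, "sellPrice": 9200,
        "timestamp": "2014-12-14T12:00:00Z",
    },
}

# Batched: one envelope per station, commodities in an array.
batched = {
    "header": {"uploaderID": "...", "softwareName": "..."},
    "message": {
        "systemName": "Eranin", "stationName": "Azeban City",
        "timestamp": "2014-12-14T12:00:00Z",
        "commodities": [
            {"name": "Gold", "buyPrice": 9401, "sellPrice": 9200},
            # ... the other ~69 items share one header and one timestamp
        ],
    },
}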

The batched format is also significantly less expensive to generate, compress, uncompress, parse and use.

Use? Well, yeah: what you're most likely to do with the updates is stuff them into a database, and batched mode lets you do that transactionally, e.g. station-at-a-time or in a single commit.

I know mine is not the only tool using SQLite, so I'll speak to that. Due to the way the Windows file system works, individual transactions can carry 200-300ms+ of disk I/O overhead, especially for writes.
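As a rough sketch of why batching matters on the database side (table layout invented for illustration):

Code:
import sqlite3

db = sqlite3.connect("trade.db")
db.execute("CREATE TABLE IF NOT EXISTS price"
           " (station TEXT, item TEXT, buy INT, sell INT)")

station_items = [
    ("Azeban City", "Gold", 9401, 9200),
    ("Azeban City", "Fish", 310, 290),
]

# Committing per item pays the fsync cost ~70 times per station visit;
# wrapping the whole station in one transaction pays it exactly once.
with db:
    db.executemany("INSERT INTO price VALUES (?,?,?,?)", station_items)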

This matters because of the way the ZeroMQ stack works, especially Pub/Sub, and the fact that many users will be submitting ~70 updates at a time - so EDDN generates 70 messages, and at ~200ms per transaction the receiver takes up to 14,000ms (14 seconds) to process them, causing several of the messages to time out.

In the original emdn-tap.py I worked around this by doing my own re-batching and setting timeouts high.

The better solution, though, is to support batch submissions from the gateway down.
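For the receiver side, here's a sketch of that re-batching workaround, assuming a plain pyzmq SUB socket carrying zlib-compressed JSON (the endpoint and the commit_to_db helper are placeholders, not EDDN's real values):

Code:
import json, time, zlib, zmq

ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.setsockopt(zmq.SUBSCRIBE, b"")
sub.connect("tcp://eddn.example.com:9500")   # placeholder endpoint

batch, deadline = [], time.time() + 2.0
while True:
    try:
        raw = sub.recv(zmq.NOBLOCK)
        batch.append(json.loads(zlib.decompress(raw)))
    except zmq.Again:
        time.sleep(0.05)                     # nothing waiting; don't spin
    if batch and time.time() >= deadline:
        commit_to_db(batch)                  # hypothetical helper: one transaction
        batch, deadline = [], time.time() + 2.0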
 

wolverine2710

Tutorial & Guide Writer
Commander MagmaiKH suggested in the FD API thread a version of EDDN created by FD; I call it FDDR. I of course like the idea and wrote a post there. It's copy/pasted in the spoiler tag beneath.
Another idea, maybe too far out there to do but food for thought, is for FD to roll the EDDN functionality into the core server; then it could stream market updates to clients that connect, kinda like the NYSE or LSE does.

(Note that they make heavy use of private networks with multi-casting enabled - which we cannot do over the Internet.)

So we'd have an API to query for known system and station data, which could include last-known-good commodity data, and then a streaming service for market data. Part of what you query for is the system-station ID, which is like a stock ticker but probably just a GUID, perhaps modified to include a couple of digits reflecting the galactic sector it's in. Then in the streaming service you use that fixed-size ID (not variable-size strings) to send data, and now you can easily do it with a binary format.
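A hedged sketch of what such a fixed-size binary record could look like (the layout is invented for illustration, not a proposed standard):

Code:
import struct, uuid

# 16-byte station GUID + 4-byte commodity id + 4-byte buy price
# + 4-byte sell price + 4-byte timestamp offset = a fixed 32-byte record.
RECORD = struct.Struct("<16sIIII")
assert RECORD.size == 32

station_id = uuid.uuid4().bytes   # stand-in for an FD-issued GUID
packed = RECORD.pack(station_id, 42, 9401, 9200, 123456)
sid, commodity_id, buy, sell, ts = RECORD.unpack(packed)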


I would like to register a loud vote against JSON. It's a terribly inefficient and difficult-to-parse format, which translates to wasted cycles dealing with it - and that matters when there are a lot of messages. I read ridiculous things like "omg, it handles 200 updates a minute with a thousand end-points, it's great!"

The target capability is a thousand updates per second with tens of thousands of end-points.

Read it yesterday, decided to sleep on it for a night before adding a post. I TRULY like the idea of FD rolling out EDDN functionality. For Commanders who know me a bit (see profile) this does not come as a surprise ;-) Let's call the FD version FDDR (FD Data Relay). In this case they only have to create a ZeroMQ (ZMQ) PUB (publisher), which is VERY easy to create (see the sketch after the list below). It would have a few advantages.


  • Prevents poisoning the well. EDDN relies on multiple sources, and commanders against EDDN could send it bogus info - just like the BPC in the past was fed stuff like Javacakes and bogus prices. This could of course be handled by EDDN (a login, for example), but it would make EDDN a bit more complicated and less K.I.S.S.
  • Less bandwidth. With EDDN, if multiple commanders enter a station and open the commodities market, each participating commander uploads that info to EDDN - which then distributes every copy. FD could decide to send the info only once IF the market hasn't changed between those visits.
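To show just how small that publisher would be, a minimal pyzmq sketch (the port and payload are made up for illustration):

Code:
import json, zlib, zmq

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:9500")   # placeholder port

# Any market update the server wants to fan out to subscribers:
update = {"stationName": "Azeban City", "commodities": []}
pub.send(zlib.compress(json.dumps(update).encode("utf-8")))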

I DO agree that it should be an opt-in feature, as in: commanders have an option in ED to let FD send their commodity prices to FDDR, where they can be read by subscribed tools/commanders. Some commanders don't want to share their info because they fear it will hurt their discovered trade runs - which is of course fine.

EMDR handles approx 1 million messages a day (almost 12 messages per second). It's very scalable, robust and redundant. EDDN (which is heavily based upon EMDR) and FDDR can use this setup. I haven't logged in to the EDDN server to see the current bandwidth usage. MagmaiKH wrote: "The target capability is a thousand updates per second with tens of thousands of end-points". I don't think that is true. Let me try to explain.

I think most commanders are not going to subscribe to FDDR, just like they don't all connect to EDDN. Commanders use trading tools like the BPC and Thrudd's site. These tools would connect to FDDR and store the info locally in a database, which the BPC/Thrudd websites then access. BPC/Thrudd currently get their data from OCR-ed screens and manual input - hoping they output/upload to EDDN soon ;-)

For tools which run locally without a huge database (for older/historical data), like Trade Dangerous, it's a bit different: normally they would have to connect 24/7 to FDDR/EDDN to stay up to date. EDDN is just a simple relay; it doesn't store info. The mission of EDDN is clear and simple - the K.I.S.S. principle - and it relies on value-added services (VAS) to make it more useful. One such service is created by Maddavo, who has a merge tool for TD that connects to EDDN. TD commanders can get the latest up-to-date .prices file from within TD with a certain sub-command, and when they upload their manually entered (or OCR-ed) prices to the merge tool, they update it for everyone. With EDDN at the moment we have at least two commanders who connect to EDDN AND make the stored historical data available to other commanders, so it can be used in their tools. I'm sure more services will be created; EDDB is being created right now. I would be surprised if in the end more than 100-200 commanders connect directly to FDDR - others will probably use the various VA services out there. A tool I like, which is being created by a commander (who liked the idea) for EDDN, is the equivalent of EVE's EMDR map: see the solar systems light up as market data arrives.

If FD is concerned about bandwidth usage they could limit access to FDDR - as in, for example, only EDDN (or a few others) can subscribe to it. The community then connects to EDDN, and it's up to EDDN to be scalable and redundant enough. The community would provide the needed bandwidth! In the case of EDDN we were very lucky: commander Errantthought (thanks a lot man) pitched the idea to his boss and we got free hosting services. It's described on the wiki page of EDDN - to prevent possibly violating forum rules here. See the high-level overview of EMDR to get an idea of how this is done there; the announcers/relays discussed are visualized in the EMDR relay/announcer monitor web app. In this setup - access to FDDR limited to a few tools like EDDN - it would cost FD very little bandwidth.


TL;DR: Yes, I hope FD builds EDDN functionality into the core server so that it can stream market updates to clients. It even has a few advantages over EDDN...

Edit: Yes, in hindsight I should have chosen a different name. EDDR (Elite Dangerous Data Relay) would have been a more fitting name than EDDN (Elite Dangerous Data Network), as EDDN does not store anything. Ah well, live and learn ;-)

Edit2: ZMQ supports multicast. Wrt ZMQ performance, see here. Snippet of info: "As for bandwidth usage we are able to get to 1Gb/sec boundary for messages 128 bytes long. 2Gb/sec boundary is passed for messages 1024 bytes long and bandwidth of 2.5Gb/s is reached for messages 32kB long" AND "Throughput gets to the maximum of 2.8 million messages per second for messages 8 bytes long".

To make this absolutely, totally clear: I'm NOT the one who implemented EDDN. That excellent work has been done by commander jamesremuscat. Thanks again for all your hard work, James!!!
I'm just the guy who liked the original EMDN idea by commander Andreas (heavily based on EMDR) and tried to see it revived...
Should you like the idea of an FDDR, the best place to have your opinion heard is in the API thread.
 

wolverine2710

Tutorial & Guide Writer
(Quoting kfsone's batching post in full - see above.)

Oliver, thanks for the time you put into the post. You've expressed this before. IIRC James said that EDDN currently mimics Andreas' EMDN JSON format so that existing tools could be reused; it makes starting to use EDDN a little easier. AFAIK the plan is to have a v2 of the JSON format with one upload per station - commodities in an array. 94k vs 123k gzipped: a reduction to 0.76 of the original size. Bandwidth will become very important. The processing/usage part I DO agree with. You have experience with it, so you know what you are talking about.

I tried to access the file in the Dropbox account but it asks me to log in before I can download it. Could you please change it to a public link so it can be downloaded? That way we can all have a look at your suggested JSON file. It sounds like a good idea to discuss your suggestion for a v2 JSON - why wait, at the risk of getting snowed under (again)? What is also missing in the current EDDN format are official FD IDs for commodities/categories/station name/system etc. IDs for commodities etc. would be great for localization purposes - French/German. Stuff like that is in the iOS webApi JSON format; perhaps that could also be used as a base for additional ideas.

Let's DO discuss a v2 EDDN JSON format and see if we can nail it down. Afterwards it can be implemented in EDDN, by James or via a pull request. I can imagine a situation where v1 and v2 are temporarily both (uploaded to and) published by EDDN so that commanders can adjust/change their tools. Otherwise their tools would break...
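For that transition period, one way to publish both formats side by side - sketched here as an assumption, not a settled design - is to tag each message with a version so listeners can dispatch on it (upsert_item and upsert_station are hypothetical helpers):

Code:
def handle(msg):
    # Hypothetical version tag - not part of the current EDDN envelope.
    version = msg.get("version", 1)
    if version == 1:
        upsert_item(msg["message"])       # v1: one commodity per message
    elif version == 2:
        upsert_station(msg["message"])    # v2: whole station, commodity array
    # Unknown future versions are ignored rather than crashing the tool.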

Edit: EMDR changed into EMDN where appropriate.
 
I support this notion, and I'm able to change my tools at a moment's notice. I'm also archiving every bit coming through EDDN, so if someone loses a few hours of entries, they can contact me.
 

wolverine2710

Tutorial & Guide Writer
Has anyone succeeded in opening the Dropbox file supplied by kfsone? If so, could you please upload it somewhere, or if it's not too big, post it in a spoiler tag? I would like to have a look at it.
 
Just a quick note: the whole corrected data set of eddb has been made public and can be downloaded as JSON files. This includes all the EDDN stuff. Currently the update kicks in once a day. This could happen more often too, if needed...


(Replying to the EDDN archiving post above.)

It's loading and loading and loading :)
 
(Replying to the eddb data announcement above.)
Very nice. Thanks for that. I notice the System.json has "Destination" in there; I thought this had been culled weeks ago.
 

I only applied additive updates and have not cleaned up the systems yet - not before I get messages like yours or an approved master list. But thx, gonna fix it right away..
 
(Quoting wolverine2710's reply above.)

Ah, I'd linked the folder where I'd put the files together rather than the zip file.

https://dl.dropboxusercontent.com/u/109944963/ed/eddntransform.zip
 
Beta Version of a Spark-Based Trade Analyser

Thanks to themroc, I have been able to put together a MapReduce program using Apache Spark to calculate the best trade-route pairs in the galaxy. I will probably expand it to finding the best cycles, simply because I think longer cycles might actually produce better credits/hour returns than strict pairs of trades. Right now, using Slopey's BPC data or EDDB data, I can run a galaxy-wide search for trade routes of 20 Ly or less in about 30 seconds on a 2012 MacBook Pro. Getting Spark to run on Windows is a bit of a challenge because you need to set up Hadoop for Windows, but on Linux/BSD/Mac-based systems this should be pretty easy.

Repo is on GitHub at huadianz/elite-trade-analyzer. Can't post links, apparently.
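Not the repo's actual code, but a minimal PySpark sketch of the pair search, assuming (station, commodity, buy, sell) records and ignoring the distance filter:

Code:
from pyspark import SparkContext

sc = SparkContext("local[*]", "trade-pairs")

# (station, commodity, buy_price, sell_price); 0 means not traded there.
listings = sc.parallelize([
    ("Azeban City", "Gold", 9401, 9200),
    ("Beagle 2",    "Gold",    0, 9850),
    ("Azeban City", "Fish",  310,  290),
    ("Beagle 2",    "Fish",  420,  400),
])

buys  = listings.filter(lambda r: r[2] > 0).map(lambda r: (r[1], (r[0], r[2])))
sells = listings.filter(lambda r: r[3] > 0).map(lambda r: (r[1], (r[0], r[3])))

# Join on commodity, keep hops between distinct stations, score by margin.
hops = (buys.join(sells)
            .filter(lambda kv: kv[1][0][0] != kv[1][1][0])
            .map(lambda kv: ((kv[1][0][0], kv[1][1][0]),
                             (kv[0], kv[1][1][1] - kv[1][0][1]))))

# Best commodity for each ordered station pair, then the top pairs overall.
best = hops.reduceByKey(lambda a, b: a if a[1] >= b[1] else b)
print(best.takeOrdered(5, key=lambda kv: -kv[1][1]))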
 