Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

So just a heads up: I released EliteOCR 0.3.8. It has direct export to EDDN. It might be faulty - I hope it isn't... If there are problems, let me know.
 
Love this idea. Out of curiosity, and to check understanding, if EDDN is an aggregator/firehose (distributed data collated and thrown at any client listening), is there currently anyone providing buffering of this data, e.g. the last 24 hours?

For example, EDDN receives 30k messages over an hour. It dutifully forwards them on. Sadly one of the subscribers was down for that period, and so missed out on the new data. This is based on my limited understanding of how ZeroMQ would be handling the pub-sub, so if I'm wrong, please say!

A number of messaging systems have the ability to 'go back in time' or work from an index, such that you can say 'ok, the last message I reliably processed was 25, let's start pulling from 26' - Amazon Kinesis for example. Rather than try and bloat EDDN's mission, I was wondering if anyone had layered this sort of thing on top as yet? Would greater reliability of messages be of interest to tool developers or is occasional loss tolerable?

Edit: There are of course alternatives to a single source of storage - the various subscribers could all repeat the same data back over time, and thus you'd have crowd-sourced storage of anything you as a single subscriber missed, but that approach would be reliant on being able to spot stale data (some form of reliable timestamp for example).
 

wolverine2710

Tutorial & Guide Writer
3rd party services.

Love this idea. Out of curiosity, and to check understanding, if EDDN is an aggregator/firehose (distributed data collated and thrown at any client listening), is there currently anyone providing buffering of this data, e.g. the last 24 hours?

For example, EDDN receives 30k messages over an hour. It dutifully forwards them on. Sadly one of the subscribers was down for that period, and so missed out on the new data. This is based on my limited understanding of how ZeroMQ would be handling the pub-sub, so if I'm wrong, please say!

A number of messaging systems have the ability to 'go back in time' or work from an index, such that you can say 'ok, the last message I reliably processed was 25, let's start pulling from 26' - Amazon Kinesis for example. Rather than try and bloat EDDN's mission, I was wondering if anyone had layered this sort of thing on top as yet? Would greater reliability of messages be of interest to tool developers or is occasional loss tolerable?

Edit: There are of course alternatives to a single source of storage - the various subscribers could all repeat the same data back over time, and thus you'd have crowd-sourced storage of anything you as a single subscriber missed, but that approach would be reliant on being able to spot stale data (some form of reliable timestamp for example).

EDDN is a relay. It sends out what it receives. So YES, it is indeed an aggregator/firehose. It's possible that some additional checks will be introduced; that depends on James. It does NOT store anything (atm - see the ELK stack below). James and I like the K.I.S.S. principle. We hope that others step up and provide "3rd party services", like a buffering service to be used by other commanders. A buffering service built on top of a messaging system could be such a 3rd party service. I'm NOT aware of any existing 3rd party services.

I've PM-ed with Maddavo. He has created a website where commanders can upload their Trade Dangerous (TD) .prices file and receive a merged .prices file back with the latest prices. He is looking into adding EDDN support. I've not PM-ed with him lately - Xmas holiday - so I don't know what the status is and whether it's being implemented. If it IS added, the following is likely: when he receives a .prices file, the latest prices are uploaded to EDDN, and once EDDN stabilizes the service connects to the firehose and uses its data. The latest merged .prices file will be accessible via a simple HTTP download. In effect it would be the first 3rd party service for EDDN - for use with TD.

I'm personally planning to install an ELK stack on the server where EDDN is running. It will be for personal use (for starters) because, by design, security is not built into Elasticsearch (the E in ELK); it is left up to you as the developer to implement the right security for your environment. There are ways to improve that. There may or may not be a way for others to connect to the ELK stack - for example to connect Kibana (the K in ELK) to it.

I'm hoping that 3rd party buffering services will pop up so that you and others can collect data you have missed. That would also be extremely useful for those who are not able to connect 24/7 to the EDDN firehose.
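To give an idea of how small such a 3rd party buffering service could be, here is a rough Python sketch that keeps the last 24 hours of messages in memory. The relay address is a placeholder (not the real endpoint) and it assumes the relay output is zlib-compressed; treat it as a sketch, not a finished service.
Code:
# Sketch of a 3rd party buffering service: subscribe to the EDDN relay and
# keep everything from the last 24 hours so consumers that were offline can
# catch up. The relay address is a placeholder.
import collections
import time
import zlib

import zmq

RELAY = "tcp://example-eddn-relay:9500"   # placeholder, not the real endpoint
WINDOW = 24 * 60 * 60                     # keep 24 hours of messages

def run_buffer():
    context = zmq.Context()
    subscriber = context.socket(zmq.SUB)
    subscriber.setsockopt(zmq.SUBSCRIBE, b"")   # subscribe to everything
    subscriber.connect(RELAY)

    buffered = collections.deque()              # (timestamp, raw_json) pairs
    while True:
        raw = zlib.decompress(subscriber.recv())  # assuming zlib-compressed relay output
        now = time.time()
        buffered.append((now, raw))
        # Drop anything older than the 24 hour window.
        while buffered and buffered[0][0] < now - WINDOW:
            buffered.popleft()
        # A real service would expose the buffer over HTTP so late consumers
        # can replay what they missed.

if __name__ == "__main__":
    run_buffer()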
 

wolverine2710

Tutorial & Guide Writer
My first thought was "this would make most sense implemented as the gateway appending a hash for the contents of the 'message' property"... then I decided that computing hashes for each and every message isn't something I'd like to commit the gateway to doing ;)

Note that duplicate messages do have non-zero value: if you get the same data from two different people, it's less likely to be faked. (In practice, as I might have said already, I'm not sure anyone will bother going to that level of effort to counter poisoned data.)

I can see value in the use of hashes - aside from what Andargor said, also for an ELK stack (statistics). You might want to reconsider your decision. There are extremely fast non-cryptographic hash algorithms out there, most notably xxHash. From the website: "xxHash is an extremely fast non-cryptographic hash algorithm, working at speeds close to RAM limits." Hash comparison (single thread, Linux Mint 64-bit, using the open-source SMHasher on a Core i5-3340M @ 2.7 GHz):
Code:
Name     Speed on 64 bits    Speed on 32 bits
XXH64       13.8 GB/s            1.9 GB/s       
XXH32        6.8 GB/s            6.0 GB/s

There are versions for a lot of languages, Python among them. If you create a JSON v2 (which imho is a good thing, see the arguments by kfsone) where the prices are in an array, there would be far fewer updates to EDDN and hence fewer hash calculations. A v2 would also have far fewer hash collisions.
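As a rough illustration of how cheap such a check could be, here is a sketch using the python-xxhash package; hashing the "message" property to spot duplicates is just the idea from this discussion, not something EDDN does today.
Code:
# Sketch: spot duplicate EDDN messages by hashing the "message" property
# with xxHash. Requires the python-xxhash package (pip install xxhash).
import json
import xxhash

seen = set()

def is_duplicate(envelope):
    """Return True if this envelope's "message" body was seen before."""
    # Serialise deterministically so identical content hashes identically.
    body = json.dumps(envelope["message"], sort_keys=True)
    digest = xxhash.xxh64(body.encode("utf-8")).hexdigest()
    if digest in seen:
        return True
    seen.add(digest)
    return False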
 

wolverine2710

Tutorial & Guide Writer
I do believe it should be generated at the producer end (i.e. what is POSTing the data), rather than at the gateway or consumer apps.

That makes sense; otherwise you could not filter things out. It would also put no extra burden on the EDDN server.
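For illustration, the producer could compute the hash itself and include it in the upload, along these lines; the "messageHash" field name and the gateway URL are made up for the example and are not part of the current schema.
Code:
# Sketch: a producer computing the hash of the message body before POSTing
# to the EDDN gateway. Field name and URL are placeholders for illustration.
import json

import requests
import xxhash

GATEWAY = "http://example-eddn-gateway:8080/upload/"   # placeholder URL

def post_with_hash(envelope):
    body = json.dumps(envelope["message"], sort_keys=True)
    envelope["messageHash"] = xxhash.xxh64(body.encode("utf-8")).hexdigest()
    return requests.post(GATEWAY, data=json.dumps(envelope))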
 

wolverine2710

Tutorial & Guide Writer
Love this idea. Out of curiosity, and to check understanding, if EDDN is an aggregator/firehose (distributed data collated and thrown at any client listening), is there currently anyone providing buffering of this data, e.g. the last 24 hours?

I've posted some messages in the thread "[GENERAL] Common Use SQL Game Database Source" - a very useful resource for anyone wanting to create an application which uses a database. See post #33 and post #37. So it looks like in the not so distant future (a few weeks) there could be a 3rd party service which provides you with what you need/want.
 
I'm late to this thread, but find it very interesting from a data and architectural perspective. From an integration perspective I think there are some things that could solve some issues that have come up.

This project is described as a firehose approach. No snapshots are captured, and it only caters to market data, not other kinds of ED related data. As a consumer app, if I want snapshots, historical data, or related data (system locations etc), this project won't help me.

So, to me a logical extension to this would be something that allows this, and is easy to integrate with. Here's an idea: create a client for this project that saves data in XML/JSON files, including data pulled from other sources, and periodically pushes the changed files to a GitHub repo. As an online web app wanting snapshot data, or even historical data, all I then need to do is clone that repo and pull changes every once in a while. Any other apps that can contribute to the data can either become a collaborator on the project, or issue pull requests.

WDYT?
 

EDDN is intentionally kept simple in both mission and implementation to allow this kind of development by others. Market data is being used as the current use case because it is dynamic. However, there is a need for static or semi-static data to be stored somewhere, such as system/station data (see EDSC/TGC or Biteketkergetek's stuff).

And definitely someone will start archiving, especially if there's to be some kind of a stock market ticker, for example. If you are willing, you could tap into the feed and automate the archiving to github or some other online service.
 
EDDN is intentionally kept simple in both mission and implementation to allow this kind of development by others. Market data is being used as the current use case because it is dynamic. However, there is a need for static or semi-static data to be stored somewhere, such as system/station data (see EDSC/TGC or Biteketkergetek's stuff).

And definitely someone will start archiving, especially if there's to be some kind of a stock market ticker, for example. If you are willing, you could tap into the feed and automate the archiving to github or some other online service.

Fantastic! So, here's my plan: create a Heroku app that will subscribe to the firehose and update local XML files. Then periodically (say every 10 minutes) push changed files to a GitHub repo. I want to also get the list of star coordinates from EDSC and put that info in the same directory structure. Other data sources can be added later.

This gives app creators:
1) a one-stop-shop for ED data (sources can be merged, because XML)
2) snapshot, changes, and historical data (because Git)
3) option to write online or offline apps (e.g. pull when online only)

Should be interesting!
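Roughly, the worker could look like the sketch below. The relay address, repo path and field names (systemName/stationName) are assumptions, and it writes JSON instead of XML purely to keep the example short.
Code:
# Sketch of the archiving worker: subscribe to the EDDN relay, write each
# station's latest market message into a git working copy, and push the
# changes every 10 minutes. Addresses, paths and field names are assumed.
import json
import os
import subprocess
import time
import zlib

import zmq

RELAY = "tcp://example-eddn-relay:9500"   # placeholder address
REPO = "/app/eddn-archive"                # local clone of the GitHub repo
PUSH_INTERVAL = 10 * 60                   # seconds between pushes

def write_snapshot(envelope):
    message = envelope["message"]
    path = os.path.join(REPO, message["systemName"],
                        message["stationName"] + ".json")
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    with open(path, "w") as handle:
        json.dump(message, handle, indent=2, sort_keys=True)

def main():
    context = zmq.Context()
    subscriber = context.socket(zmq.SUB)
    subscriber.setsockopt(zmq.SUBSCRIBE, b"")
    subscriber.connect(RELAY)

    last_push = time.time()
    while True:
        raw = zlib.decompress(subscriber.recv()).decode("utf-8")
        write_snapshot(json.loads(raw))
        if time.time() - last_push > PUSH_INTERVAL:
            subprocess.call(["git", "-C", REPO, "add", "-A"])
            subprocess.call(["git", "-C", REPO, "commit", "-m", "EDDN snapshot"])
            subprocess.call(["git", "-C", REPO, "push"])
            last_push = time.time()

if __name__ == "__main__":
    main()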
 
Why does it have to be that complicated... My Python is a bit rusty and I currently can't get pyzmq running. Four fricking pages of "easy" install and I have no clue what to do.

All I want to do is fetch some data (so that I can generate better data)
 

wolverine2710

Tutorial & Guide Writer
Why does it have to be that complicated... My Python is a bit rusty and I currently can't get pyzmq running. Four fricking pages of "easy" install and I have no clue what to do.

All I want to do is fetch some data (so that I can generate better data)

I'm a Java guy and new to Python, so I can't really help. Hopefully one of the commanders here can help you further... Perhaps it would help if you tell us exactly what you have done so far and what documentation you have used.
 

wolverine2710

Tutorial & Guide Writer
Fantastic! So, here's my plan: create a Heroku app that will subscribe to the firehose and update local XML files. Then periodically (say every 10 minutes) push changed files to a GitHub repo. I want to also get the list of star coordinates from EDSC and put that info in the same directory structure. Other data sources can be added later.

This gives app creators:
1) a one-stop-shop for ED data (sources can be merged, because XML)
2) snapshot, changes, and historical data (because Git)
3) option to write online or offline apps (e.g. pull when online only)

Should be interesting!

I very much like seeing the mission of EDDN starting to take off. I'm very glad that commanders see the value of it and are taking the time/effort to enhance EDDN.
Your approach is a new, refreshing and exciting one.

Some things to consider, especially "GitHub: Working with large files". From that page:
GitHub warns you when you try to add a file larger than 50 MB. Pushes containing files larger than 100 MB are rejected for a few reasons.

In many cases, committing large files is unintentional and causes unneeded repository bloat. Every time someone clones a repository with a large file, they'll have to fetch that file, adding excess time to their download.

In addition, if a repository is 10 GB in size, Git's architecture requires another 10 GB of extra free space available at all times. This allows Git to move the files around in its normal course of operations. Unfortunately, this also means that we must be much less flexible with how we store these repositories.
You would need to break your data up into files of at most 50 MB, and the total repository size is effectively limited to around 10 GB.

I don't have the knowledge to determine how feasible your git solution will be, perhaps others do.

Atm I don't have numbers on the bandwidth usage and number of uploads to EDDN. The latter will be known once I have an ELK stack set up - Kibana will tell/visualize everything I want to know and more. There are bandwidth monitoring tools for Linux, and perhaps our hosting service (Vivio Technologies), which provided us with free hosting (a 20 Mb/s line), has tools for it.

Please keep me/us in the loop about your progress with your tool. Like I said it IS very much appreciated.
 
Why does it have to be that complicated... My Python is a bit rusty and I currently can't get pyzmq running. Four fricking pages of "easy" install and I have no clue what to do.

All I want to do is fetch some data (so that I can generate better data)
My recommendation is to use pip.
I tried yesterday to run the example from GitHub and installed every missing package with "pip install xxx". I had no problems whatsoever. I also use Python 2.7, if that makes any difference.
When you run it and get "couldn't import module", google the name, then pick the result where the PyPI website appears first.
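For what it's worth, once pyzmq is installed the listener itself is only a handful of lines. Here is a minimal sketch; the relay address is a placeholder, so substitute the real one from the EDDN documentation.
Code:
# Minimal EDDN listener sketch. The only third-party package needed is
# pyzmq ("pip install pyzmq"); the relay address below is a placeholder.
import zlib

import zmq

RELAY = "tcp://example-eddn-relay:9500"   # placeholder address

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.setsockopt(zmq.SUBSCRIBE, b"")  # easy to forget - without it you get nothing
subscriber.connect(RELAY)

while True:
    raw = subscriber.recv()
    print(zlib.decompress(raw))            # relay messages arrive zlib-compressed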
 
I very much like seeing the mission of EDDN starting to take off. I'm very glad that commanders see the value of it and are taking the time/effort to enhance EDDN.
Your approach is a new, refreshing and exciting one.

Some things to consider, especially "GitHub: Working with large files". From that page:
GitHub warns you when you try to add a file larger than 50 MB. Pushes containing files larger than 100 MB are rejected for a few reasons.

In many cases, committing large files is unintentional and causes unneeded repository bloat. Every time someone clones a repository with a large file, they'll have to fetch that file, adding excess time to their download.

In addition, if a repository is 10 GB in size, Git's architecture requires another 10 GB of extra free space available at all times. This allows Git to move the files around in its normal course of operations. Unfortunately, this also means that we must be much less flexible with how we store these repositories.
You would need to break your data up into files of at most 50 MB, and the total repository size is effectively limited to around 10 GB.

I don't have the knowledge to determine how feasible your git solution will be, perhaps others do.

Atm I don't have numbers on the bandwidth usage and number of uploads to EDDN. The latter will be known once I have an ELK stack set up - Kibana will tell/visualize everything I want to know and more. There are bandwidth monitoring tools for Linux, and perhaps our hosting service (Vivio Technologies), which provided us with free hosting (a 20 Mb/s line), has tools for it.

Please keep me/us in the loop about your progress with your tool. Like I said it IS very much appreciated.

So, here's the progress: I have been listening to the ZeroMQ subscription all day, but not getting a single message. Not sure what I'm doing wrong, or if there are actually no messages today.

For the EDSC data, I tried invoking the API but only get Error 500 (no idea why, I followed the example). So, I downloaded the elite.json file that some other user prepared, and chopped it up into one file per system, with a directory structure based on system name and quadrant (to avoid having too many files in one dir). Once I get messages from EDDN I should be able to create market files in this structure, and then push that as a repo.

As for running out of space, the easy solution there would be to roll over to a new repo once the limit is reached, and keep the old one as an archive for historical charting if needed. For use cases needing the latest snapshot and polling for changes, that works just fine.

Once I can get proper data from both the EDDN ZeroMQ feed and the EDSC API it should be fairly little work to actually get it done, as all the real heavy lifting is done by Git and GitHub. So that's where I'm at right now.
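For reference, the chopping step might look roughly like this; the structure of elite.json (a JSON list of objects with a "name" field) and the bucketing rule are assumptions based on the description above.
Code:
# Sketch: split a systems dump (e.g. elite.json) into one file per system,
# grouped into subdirectories so no single directory gets too large.
# The input format is assumed; adjust to whatever elite.json really contains.
import json
import os

def chop(dump_path, out_dir):
    with open(dump_path) as handle:
        systems = json.load(handle)
    for system in systems:
        name = system["name"]
        bucket = name[0].upper()                 # crude stand-in for "quadrant"
        target = os.path.join(out_dir, bucket)
        if not os.path.isdir(target):
            os.makedirs(target)
        with open(os.path.join(target, name + ".json"), "w") as handle:
            json.dump(system, handle, indent=2)

chop("elite.json", "systems")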
 

wolverine2710

Tutorial & Guide Writer
So, here's the progress: I have been listening to the ZeroMQ subscription all day, but not getting a single message. Not sure what I'm doing wrong, or if there are actually no messages today.

For the EDSC data, I tried invoking the API but only get Error 500 (no idea why, I followed the example). So, I downloaded the elite.json file that some other user prepared, and chopped it up into one file per system, with a directory structure based on system name and quadrant (to avoid having too many files in one dir). Once I get messages from EDDN I should be able to create market files in this structure, and then push that as a repo.

As for running out of space, the easy solution there would be to roll over to a new repo once the limit is reached, and keep the old one as an archive for historical charting if needed. For use cases needing the latest snapshot and polling for changes, that works just fine.

Once I can get proper data from both the EDDN ZeroMQ feed and the EDSC API it should be fairly little work to actually get it done, as all the real heavy lifting is done by Git and GitHub. So that's where I'm at right now.

I started EliteOCRReader and later RegulatedNoise (same author and program, but now with better number OCR-ing) and as I write this, data is coming in. You can also use the EDDN\src\eddn\Client.py program. If you were using your own firehose listener, it looks as if something is wrong. A good way to monitor what is coming in and going out is Wireshark (just set a few filters to keep things clean).

TGC aka the EDSC API is sometimes down; please ask for its status in the crowd sourcing coordinates project thread (see sig or my profile). A good way to test it is to use the jsFiddle examples on the API page.
What is that elite.json file you are talking about and where can I find it?

Thanks for the update and good luck with the tool.
 
I started EliteOCRReader and later RegulatedNoise (same author and program, but now with better number OCR-ing) and as I write this, data is coming in. You can also use the EDDN\src\eddn\Client.py program. If you were using your own firehose listener, it looks as if something is wrong. A good way to monitor what is coming in and going out is Wireshark (just set a few filters to keep things clean).

TGC aka the EDSC API is sometimes down; please ask for its status in the crowd sourcing coordinates project thread (see sig or my profile). A good way to test it is to use the jsFiddle examples on the API page.
What is that elite.json file you are talking about and where can I find it?

Thanks for the update and good luck with the tool.

Alright, got it working now. I had missed the actual "subscribe" command. So now I'm getting EliteOCR messages, and I can decompress them properly. Now to create a market.xml file in the mentioned directory structure, and have the program push it to GitHub, and... that should be it. Deploy to Heroku, profit.
 

wolverine2710

Tutorial & Guide Writer
Alright, got it working now. I had missed the actual "subscribe" command. So now I'm getting EliteOCR messages, and I can decompress them properly. Now to create a market.xml file in the mentioned directory structure, and have the program push it to GitHub, and... that should be it. Deploy to Heroku, profit.

Nice. Looking forward to the end result. Are you going to provide info about how to get the older data from GitHub, what to do with it, etc.?
 
Nice. Looking forward to the end result. Are you going to provide info about how to get the older data from GitHub, what to do with it, etc.?

So, part of the reason I'm doing this is that I want to write consumer apps myself, so I need the snapshotting and all that goodness anyway. Doing this project, and the other app as "an example", should give good hints to others on how to consume it. It'll all be published on GitHub.
 