Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

Status
Thread Closed: Not open for further replies.

wolverine2710

Tutorial & Guide Writer
So, part of the reason I'm doing this is that I want to write consumer apps myself, so I need the snapshotting and all that goodness anyway. Doing this project, and the other app, as "an example" should give others good hints on how to consume it. It'll all be published on Github.

Nice to hear that ;-)
What programming language are you using?
 
Alright, got it working now. I had missed the actual "subscribe" command. So now I'm getting EliteOCR messages, and I can decompress them properly. Now to create a market.xml file in the mentioned directory structure, and have the program push it to Github, and.. that should be it. Deploy to Heroku, profit.
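For anyone else tripping over the same thing, a minimal sketch of that subscribe-and-decompress loop looks something like this (the relay address is the one used elsewhere in this thread):
Code:
import zlib
import zmq
import simplejson

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
# without this SUBSCRIBE option a SUB socket silently receives nothing
subscriber.setsockopt(zmq.SUBSCRIBE, "")
subscriber.connect('tcp://eddn-relay.elite-markets.net:9500')

while True:
    raw = subscriber.recv()                        # zlib-compressed JSON
    message = simplejson.loads(zlib.decompress(raw))
    print message['$schemaRef']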
What kind of information will be available when you're done? I'm working on a web-based "captain's log" for myself, and I'm looking for a good source of station data (simply what stations, if any, there are in the system I'm currently in). I've been using the station.csv file from tradedangerous (https://bitbucket.org/kfsone/traded...cca78ef147f8492bb1580bd250f1b/data/?at=master) but I don't know how often it's updated, so a file that pulls its data directly from EDDN might be more up to date (if such data is available).
 

Right now I'm including the EDSC (system and station data) and EDDN (market data) feeds. The plan is that as new types of sources are added, I'll slurp them in as well, so that the Github repo is as close as possible to an authoritative copy of "all we know" about ED, for apps to integrate with easily.

- - - - - Additional Content Posted / Auto Merge - - - - -

So I've been listening to ZeroMQ EDDN messages for a while now, and it is VERY annoying that data that doesn't conform to the spec is let through. Specifically, there is a source called "RegulatedNoise" that includes a property "categoryName" in the message, which isn't in the spec. Messes up my parsing :-( These kinds of messages should not be let through to the consumers!!
 
That's awesome! Keep us posted on your development. Too bad about the bad data though, hopefully this kind of stuff gets sorted out once things get properly rolling.
 

It appears I can put anything I like in the "message" portion of the update. This got through fine:

{"header": {"softwareVersion": "0.1", "gatewayTimestamp": "2014-12-28T17:28:41.036491", "softwareName": "EDDN client", "uploaderID": "donpost"}, "$schemaRef": "http://schemas.elite-markets.net/eddn/commodity/1", "message": {"categoryName": "Minerals", "buyPrice": 520, "timestamp": "2014-12-28T17:28:34.7774228+00:00", "stationStock": 0, "systemName": "Suyarang", "stationName": "Treshchov Orbital", "demand": 0, "thisIsntRight": "Look, I can put any property I like in the message!", "sellPrice": 510, "itemName": "Rutile"}}
 

wolverine2710

Tutorial & Guide Writer
It appears I can put anything I like in the "message" portion of the update.

See post #150.
Atm EDDN is a POC and the format can and probably will change. James' last update is from the 18th of December; he is probably enjoying his Xmas holiday or, as in the past, has RL issues to take care of.
 
Last edited:

wolverine2710

Tutorial & Guide Writer
So I've been listening to ZeroMQ EDDN messages for a while now, and it is VERY annoying that data that doesn't conform to the spec is let through. Specifically, there is a source called "RegulatedNoise" that includes a property "categoryName" in the message, which isn't in the spec. [...]

RegulatedNoise is the successor to EliteOCRReader by maxh2003. It has OCR functionality now. Please be so kind as to post in his thread and tell him about the issue you have found.

I'm a Java guy like you. Defensive programming is of course a good thing/best practice. I'm sure you can easily skip fields which are not in the current spec of the JSON format. I don't know if you are using plain Java or a framework. Something which has worked like a charm for me in the past is XStream. It supports (de)serializing of XML and JSON. Serializing is ideal for testing (just dump an object/array/List to XML). It works the other way around too: read in a JSON structure and it creates an object, from which you can very easily get a certain field. See this page for an example. Hope this helps a bit.

I mean no disrespect, but EDDN is Open Source (written in Python) and you and everyone else can (and are encouraged to) create a pull request for it - to solve issues or add functionality.
 
Last edited:
I will develop an API using HTTP & JSON to fetch historical snapshots of data. All the trading data is already being stored in my DB (only current snapshots for now, not historical data) and can be browsed on the web, but can't be fetched by 3rd-party apps (I'll hook up some tools to make use of it in the future). Unfortunately I won't have proper development time until probably February, so it's good to see other initiatives coming out.


Since the last EliteOCR release the data has increased exponentially; with proper parsing, in just the last 4 hours 120 systems have been updated... At this rate we should get most of the trade data in the game (20k-something systems?) in no time, and honestly the data changes very little, so it would be mostly current.
 
I had a brief play with streaming EDDN to Dynamo, just to see what happened. Pretty successful: for a 12-hour period I got 8860 messages, covering 137 stations, with up to 8 refreshes of the same station. I do note that the bulk (more than two-thirds) of the current traffic is new data, i.e. most stations are not getting refreshed quickly.

I also note a distinct variation in the casing of station/system names, and several players using EliteOCR are sending French commodity names, which means these appear as entirely separate data.

Having some durability for the messages is wise, I think, as otherwise any hiccup at all will lose data, potentially for some time, until another user refreshes that given data point.

If someone were to layer Kafka on top for message durability, would this be of interest, or do we feel that occasional loss of data is tolerable? I was considering that, if there's real interest, it'd be easy enough to have a listener local to the same ZeroMQ box that posted to, say, Amazon Kinesis, and that'd mean downtime was less of an issue.

However, if several people are already planning on storage durability, then in theory any given consumer could look up data from several sources and get up-to-date info regardless of individual consumer downtime. In other words, durability like Kafka is irrelevant if there were two trade APIs that are both read-public and both listening to EDDN - clients could ask both and use the latest data.

Just curious (and excited) to see where this is all heading :)
 
Last edited:
It appears I can put anything I like in the "message" portion of the update. This got through fine:
I'd point out that JSON is an additive format. As long as required fields are provided, there are usually no restrictions (unless enforced via schema) on additional data being included.

This can be a good thing. You can mark up messages with your own metadata that extends a common format, without loss. I don't think there's much value in EDDN being draconian. As long as the basics of commodities are sent correctly, if certain clients want to send extra data, there's no real harm. Any good parsing code only looks for known properties anyway; otherwise it's fragile to changes in the format.
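A sketch of that "only look for known properties" style of parsing, with the field list taken from the commodity message shown earlier:
Code:
KNOWN_FIELDS = ('systemName', 'stationName', 'itemName', 'buyPrice',
                'sellPrice', 'demand', 'stationStock', 'timestamp')

def parse_commodity(message):
    # copy only the fields we understand; extras like "categoryName"
    # are simply ignored rather than breaking the parser
    return dict((k, message[k]) for k in KNOWN_FIELDS if k in message)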
 

wolverine2710

Tutorial & Guide Writer
I had a brief play with streaming EDDN to Dynamo, just to see what happened. [...] If someone were to layer Kafka on top for message durability, would this be of interest, or do we feel that occasional loss of data is tolerable?

Do you mean this Dynamo? If so, it looks nice.
It seems that now that the EliteOCR and RegulatedNoise OCR solutions are sending data to EDDN, it's taking off ;-)
Thrudd is looking into supporting EDDN as well. kfsone of TradeDangerous is currently optimizing TD, but its plugin system means everybody can make a TD_EDDN_FEEDER plugin. I believe maddavo (who has a website to merge TD .prices files) is looking into this. When EDDN is stable enough, Slopey will also look into it. Hence all three major trading tools (that I know of) are going to support it.

Concerning data loss with EDDN (should it be temporarily down): EDDN is heavily based upon EVE's EMDR; see their high level overview (HLO) page. Atm EDDN is very simple: the gateway and relay are on the same machine. If you look at the HLO page you'll notice that everything can be set up in a redundant way. The load balancer can detect whether a gateway is down and send the uploaded data to a working gateway. The client can connect to multiple relays and pick a working (or the best/fastest) one. The whole EMDR setup is designed in such a way that it scales very, very well. From the website: EMDR delivers nearly a million messages a day (11-12 messages per second) to a number of different projects. So there is quite a lot that can be done to make sure the EDDN service stays up nearly 100% of the time and is scalable.
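To illustrate the multiple-relays idea: a ZeroMQ SUB socket can connect to several endpoints at once and fair-queues whatever arrives, so losing one endpoint doesn't stop the stream. A sketch - the second address is hypothetical, since today there is only the one relay:
Code:
import zmq

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.setsockopt(zmq.SUBSCRIBE, "")
subscriber.connect('tcp://eddn-relay.elite-markets.net:9500')
subscriber.connect('tcp://eddn-relay-2.example.net:9500')  # hypothetical
# note: if both relays carry the same stream you would receive duplicates,
# so a client should de-duplicate (e.g. by hashing each raw message)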

EDDN doesn't store anything. That is by design; storage is left to 3rd-party services. One could be a buffering service from which one can retrieve the data for, say, the last day. For trading tools like Thrudd's and Slopey's, which have their own website/database, the fact that EDDN doesn't store anything is not a problem. It could be a problem for TD, but maddavo is looking into this. Of course there are more tools which could use trading data (or any other data). If those tools run locally, don't connect to a database and don't run 24/7, they WILL miss data. Kinesis (tied to Amazon, I believe) and Kafka or its alternatives look very promising for getting at that older data. Some don't like Kafka and prefer Cassandra. Atm one commander is setting up a database with all kinds of information which will also hold prices; another is looking into a Git-based solution. Both are WIP and I appreciate each and every approach. Kafka, from what I just read, looks VERY useful, as its distributed approach looks a lot like EMDR/EDDN and the logic to get older data is built into it.
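As a taste of what such a buffering service could look like, here is a sketch that appends every raw message to a dated JSON-lines file, so a tool that was offline can replay, say, the last day. The file naming is my own invention:
Code:
import zlib
import datetime
import zmq

context = zmq.Context()
subscriber = context.socket(zmq.SUB)
subscriber.setsockopt(zmq.SUBSCRIBE, "")
subscriber.connect('tcp://eddn-relay.elite-markets.net:9500')

while True:
    line = zlib.decompress(subscriber.recv())
    # one file per day; replaying a day is just reading the file back
    with open('eddn-%s.jsonl' % datetime.date.today().isoformat(), 'a') as f:
        f.write(line + '\n')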

I'm NOT going to discourage commanders. I DO, however, like to encourage commanders to come up with bright ideas and hopefully implement them. Is Kafka better than the other solutions? I don't know; time will tell. I believe in software Darwinism: the best solution, or the best-supported solution, survives, or perhaps multiple solutions survive. There probably isn't one best solution. So YES, if you want to set up a Kafka service, by all means "make it so", commander. Perhaps someone comes up with another idea. Cool.

It's also entirely possible that a better solution than EDDN surfaces and that EDDN goes the way of the dinosaurs. Personally (I can't speak for EDDN's implementor James) I would be OK with that. I started this thread and other similar threads to try to get the ball rolling, in the hope that a snowflake turns into a giant snowball.

In short, I don't know what the future will bring, BUT I hope that it will be bright and that a data network will be created which lots of tools can take advantage of!!

Note: Like I said before, I'm going to set up an ELK stack on the server where EDDN runs. Kibana (the K in ELK) is a brilliant tool for visualizing things. Elasticsearch (the E in ELK) is a very powerful NoSQL database. Whether it can be made publicly available is unknown at this point - see also post #204.

Edit: What I do hope is that the number of APIs will be kept to a minimum. I can imagine someone coming up with a proxy API which gets data from the other tools/APIs, so that in the long run there will be one or two APIs. API Darwinism....
 
Last edited:
I'd point out that JSON is an additive format. As long as required fields are provided, there are usually no restrictions (unless enforced via schema) on additional data being included. [...]

Sounds reasonable to me. To be honest I'm no expert at this stuff - I was just reporting the behaviour in case it was a bug :)
The markets appear to have stabilised since the 23rd December server-side patch. Unless you trade away most of the supply/demand quantities, prices don't change much. As soon as all the trading apps are sharing data I'm going to go on a market-adding frenzy :D
 
Something which has worked like a charm for me in the past is XStream. It supports (de)serializing of XML and JSON. [...]

I'm using Jackson for reading JSON and XStream for writing XML. I turned off the "fail on unknown" flag in Jackson, so now it works. And I do know that JSON is additive (as is XML). It still defeats the purpose of having a schema if it isn't really used...
 
One thing that intrigues me is how to solve the transform problem. Handling messages like these seems to fall into four stages:

  • Firehose/consolidate the messages from many clients (EDDN solves this)
  • Store the messages (either temporarily or long term, ranging from Kafka for temporary to Cassandra for long term, to MySQL, etc.)
  • Transform the messages: we have a lot of data that is partially corrupt. OCR isn't perfect; I've seen commodity data that is almost spelt right, for example. There are foreign languages to contend with as well.
  • Do something useful with the commodity data... Profit! (couldn't resist)
I ponder - what's the best approach for transforming the data to viable consolidated uniform trade data?

The engineer in me ponders a unified backend: EDDN, then some temp storage like Kafka, something like Storm to translate & process the messages, and then a 'clean' stream of trade data that doesn't contain typos etc. However, someone would have to build all that, and while I'd be happy to participate & contribute, I know I couldn't commit to providing all that any time soon and I doubt anyone else would.
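As a concrete taste of the translation step, fuzzy matching against a list of known-good names catches most OCR typos. A sketch using Python's standard difflib - the commodity list here is illustrative, not the full ED set:
Code:
import difflib

COMMODITIES = ['Rutile', 'Gold', 'Palladium', 'Explosives', 'Fish']

def canonical_name(raw):
    # a cutoff of 0.8 accepts near-misses like "Rut1le" but rejects wild guesses
    matches = difflib.get_close_matches(raw, COMMODITIES, n=1, cutoff=0.8)
    return matches[0] if matches else None

print canonical_name('Rut1le')   # -> Rutile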

Another alternative is crowd-solving the problem, as wolverine mentions. Several people will independently bang on this nail and come up with their own solutions. What would be ideal is if we could share those solutions and mutually share translation data in some way.

So here's my wild idea. Why not leverage EDDN! Add a new schema type. Let the crowd provide translations for everyone. Something like:
Code:
{
    "$schemaRef": "http://schemas.elite-markets.net/eddn/commodity_transform/1",
    "header": {
        "uploaderID": "bob",
        "softwareName": "My Vaguely Cool Processing Backend",
        "softwareVersion": "v0.1.5.b"
    },
    "message": {
        "typoLadenEntry": "Eran1n",
        "validEntry": "Eranin",
        "timestamp": "2014-11-17T12:34:56+00:00"
    }
}
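On the consumer side, applying such crowd-sourced transforms could be as simple as keeping a lookup table keyed by the typo. A sketch only - the transform schema above is itself just a proposal:
Code:
transforms = {}

def handle(eddn_message):
    schema = eddn_message['$schemaRef']
    body = eddn_message['message']
    if schema == 'http://schemas.elite-markets.net/eddn/commodity_transform/1':
        # remember the crowd-sourced correction
        transforms[body['typoLadenEntry']] = body['validEntry']
    elif schema == 'http://schemas.elite-markets.net/eddn/commodity/1':
        # rewrite known typos before the data is stored or served
        body['itemName'] = transforms.get(body['itemName'], body['itemName'])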
 
Last edited:
Well, it took me a while today (I'd never used Python before; I'm a web developer, so I usually use PHP, so there was a fair learning curve), but I've got a Python script running now that's connected to the EDDN data stream and populating a basic MySQL database on my tiny VPS. What I'll do with the database, I'm not sure yet though, lol!
For anyone else who's not familiar with Python, here is my code (which I'm sure everyone will tell me is crap! lol)
Code:
import zlib
import zmq.green as zmq
import simplejson
import MySQLdb as mdb
import sys

def main():
        con = mdb.connect('localhost', 'elite_eddn', 'password', 'elite_eddn')
        context = zmq.Context()
        subscriber = context.socket(zmq.SUB)
        subscriber.setsockopt(zmq.SUBSCRIBE, "")  # empty filter = receive everything
        subscriber.connect('tcp://eddn-relay.elite-markets.net:9500')

        while True:
                # messages arrive as zlib-compressed JSON
                market_json = zlib.decompress(subscriber.recv())
                market_data = simplejson.loads(market_json)
                if market_data['$schemaRef'] == 'http://schemas.elite-markets.net/eddn/commodity/1':
                        softwareVersion = market_data['header']['softwareVersion']
                        gatewayTimestamp = market_data['header']['gatewayTimestamp']
                        softwareName = market_data['header']['softwareName']
                        uploaderID = market_data['header']['uploaderID']
                        buyPrice = market_data['message']['buyPrice']
                        timestamp = market_data['message']['timestamp']
                        stationStock = market_data['message']['stationStock']
                        systemName = market_data['message']['systemName']
                        stationName = market_data['message']['stationName']
                        demand = market_data['message']['demand']
                        sellPrice = market_data['message']['sellPrice']
                        itemName = market_data['message']['itemName']

                        with con:  # commits on success, rolls back on error

                                cur = con.cursor()
                                # parameterised query: the driver escapes the values itself,
                                # so names containing quotes can't break the INSERT
                                insert_stmt = "INSERT INTO commodities (softwareVersion, gatewayTimestamp, softwareName, uploaderID, buyPrice, timestamp, \
                                        stationStock, systemName, stationName, demand, sellPrice, itemName, server_time) VALUES \
                                        (%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,NOW())"
                                data = (softwareVersion,gatewayTimestamp,softwareName,uploaderID,buyPrice,timestamp,stationStock,systemName,stationName, \
                                        demand,sellPrice,itemName)

                                cur.execute(insert_stmt, data)

                                print "Row Inserted for ",systemName,stationName,itemName,cur.rowcount

                        sys.stdout.flush()

if __name__ == '__main__':
        main()

EDIT: I don't actually mind if people tell me it's crap if they give constructive ideas on how to improve it - as I said, I'm completely new to Python.
 
Last edited:
In case it helps anyone get a leg up, I've pushed a very small and quick Vagrant+Ansible project to github.com/theangryangel/elite-dangerous-eddn-elasticsearch-kibana-vagrant that will set up and provision a VM to collect data from EDDN, shove it into Elasticsearch, provide a Kibana 4 frontend, and automatically cull data older than 30 days.

Although I have put minimal instructions in the repo, consider it sold as seen, with no support, I'm afraid.

A couple of caveats:
I'm a sysadmin, not a programmer. Code quality may vary.
I would *not* run this on anything public-facing without modifying the Ansible playbook. There are serious issues, such as it pulling a module or two directly from source and a lack of log rotation for a couple of components.

I rarely visit the forums, but I just want to thank everyone involved. I've had some great fun with this the last evening or two, pulling in data and analysing it.

If EDDN does continue, and grows past its current incarnation, please get in touch - I may be able to provide some resources through work, depending on what's required :)
 

wolverine2710

Tutorial & Guide Writer
In case it helps anyone get a leg up, I've pushed a very small and quick Vagrant+Ansible project to github.com/theangryangel/elite-dangerous-eddn-elasticsearch-kibana-vagrant that will set up and provision a VM to collect data from EDDN. [...] If EDDN does continue, and grows past its current incarnation, please get in touch - I may be able to provide some resources through work, depending on what's required :)

That's brilliant. I will use your setup as a base to create an ELK stack on the EDDN server. It saves a lot of work/exploration.

EDDN WILL grow past its current incarnation, I'm sure about that, thanks to your work and that of the others who are building upon and enhancing EDDN. Each and every effort is much appreciated. Rest assured, I will contact you in the (near) future. Thanks again for the great work you have done.

The Xmas holidays, and now my nephew being here for a few days, will make sure I can't do anything until the 2nd of January 2015 ;-(

- - - - - Additional Content Posted / Auto Merge - - - - -

Well, it took me a while today (never used Python before), but I've got a Python script running now that's connected to the EDDN data stream and populating a basic MySQL database on my tiny VPS. [...]

Cool. Very much appreciated. It seems EDDN (thanks Andreas for setting it up in the first place, in combination with marketdump) is on the right track, judging by all the work commanders are putting into it. I'm sure that in a few weeks EDDN data will be used in existing and new trading tools ;-)

I'm SO hoping that in the near future FD will provide a web API or dump info into a JSON/XML file, so that OCR-ing the market won't be needed anymore and the data will be 100% correct.
 
Last edited: