Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

Status
Thread Closed: Not open for further replies.

wolverine2710

Tutorial & Guide Writer
Again I'm posting this because of how EliteOCR deals with it (which is an extension of how it is represented in game). Technically, zero and null are different things. It's not hard for me to add another if statement and attach an integer 0 to a value if it does not exist, and I guess everybody can do the same, but it reduces the 'straightforwardness' of EliteOCR. I guess it's a matter of us as a community coming to a standard on how to deal with it, which is what we are doing now.

I agree, communication is key, as is setting standards for the community - ideal world.
It would be suboptimal if a de facto standard arose just because someone was first. Be that EliteOCR or EDDN or <toolX> for that matter.
Also wrt system names being in a .csv or not: especially given the fact that station/platform names are NOT unique, let's not try to guess which system a platform belongs to, let's just specify it.
 
Last edited:
In other news... if anyone can tell me why I am not receiving messages using this code I'd be eternally grateful. It definitely connects but gets stuck on the last line waiting for a message when I know for a fact that messages are being sent out.

Code:
Dim context As New ZMQ.Context
Dim subscriber = context.Socket(SocketType.SUB)
' An empty subscription prefix means "receive every message"
Dim emptyArray As Byte() = System.Text.Encoding.Default.GetBytes("")
subscriber.SetSockOpt(ZMQ.SocketOpt.SUBSCRIBE, emptyArray)
subscriber.Connect("tcp://eddn-gateway.elite-markets.net:9500")
' Recv() blocks until a message arrives
Dim response = subscriber.Recv()

The code looks OK to me (not that I've written any VB in a long time). I'm actually trying to track down a similar problem at my end - one of my computers connects just fine but receives no messages, while all my other machines work perfectly.

What version of ZeroMQ are you running? If you PM me your IP address I can check that the connection's actually getting through to rule that out as a problem.
 
Does the OCR scan only get applied to the panel on the right, then? Because sell/buy is listed for most things in the big table on the left...

Ok my bad, I looked into a csv file and it was one of those cases of not being sure who is doing the selling and the buying :D Here are some values. The 'Sell' column (the price at which the station buys) is always present; 'Buy' is not (ofc if the station does not offer the commodity):

Code:
System;Station;Commodity;Sell;Buy;Demand;;Supply;;Date;
;Kennan Dock;Explosives;204;219;;;12870;Med;2014-12-17T21:20;
;Kennan Dock;Hydrogen Fuel;102;107;;;85107;Med;2014-12-17T21:20;
;Kennan Dock;Mineral Oil;204;;100468;Low;;;2014-12-17T21:20;
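Since the thread keeps coming back to zero vs null, here is a minimal sketch (assuming the semicolon-separated layout above) of how a consumer could keep "no Buy price" distinct from a literal 0 when parsing:

```python
import csv
import io

# Sample rows in the EliteOCR-style semicolon-separated export shown above.
raw = """System;Station;Commodity;Sell;Buy;Demand;;Supply;;Date;
;Kennan Dock;Explosives;204;219;;;12870;Med;2014-12-17T21:20;
;Kennan Dock;Mineral Oil;204;;100468;Low;;;2014-12-17T21:20;
"""

def parse_price(field):
    # Distinguish "absent" (empty field -> None) from a literal zero.
    return int(field) if field.strip() else None

rows = list(csv.reader(io.StringIO(raw), delimiter=";"))
for row in rows[1:]:
    commodity, sell, buy = row[2], parse_price(row[3]), parse_price(row[4])
    print(commodity, sell, buy)
```

With this, Mineral Oil's missing Buy price comes back as `None` rather than a fabricated 0, so the receiver can tell absence from a genuine zero.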
 

wolverine2710

Tutorial & Guide Writer
EliteOCR is out now. The following wrt system names in the .csv file(s) might be interesting:
0.3.6
- Warnings in case people want to export CSV without system name. BPC and some other tools require system name!
 
Hi guys,
I finally found time to look closely at EDDN. It looks very promising! I would like to add direct export from EliteOCR to EDDN. Maybe as another plugin.
I have one question. Do I have to create one separate POST request for every single commodity?

I hope makers of other tools, which are already compatible with EliteOCR, won't have any problem with me creating a direct connection.
 
A direct connection is very welcome; it will even save time for everybody, as the servers/tools will fetch the data directly (most people who use 3rd party tools are using EliteOCR by now) without an extra layer of output-input (export/import csv).

As for the POST, right now it seems like it, and IMO it should stay that way (market ticks must work like that).

BTW I just tried the latest version of OCR and I must congratulate you on a fantastic job; right now it scans images without any errors. Pretty good stuff.
 
Just saw commodity updates come through from EliteOCR, timestamped 2014-12-18T20:35:39.967953 - just thought I'd mention it in case anyone was testing and wondered if it worked.
 
It's EliteOCRReader that was sending the messages.
One more question. How should I create the uploader ID? I'm thinking of making something like EO + 6 random symbols from (0-9,a-f) (kinda like hexadecimal). Is this OK or are there some guidelines for this?
I will save one per user in the Windows registry (where I save other settings). Those are persistent from version to version until somebody deletes them by hand.
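A minimal sketch of the proposed scheme ("EO" plus 6 random hex symbols). The function name and the use of the `secrets` module are my own choices, and the registry persistence is left out:

```python
import secrets

def make_uploader_id():
    # "EO" prefix plus 6 random hex symbols, per the proposed scheme.
    # 3 random bytes encode to exactly 6 lowercase hex characters.
    return "EO" + secrets.token_hex(3)

uid = make_uploader_id()
print(uid)
```

Generated once and then persisted (e.g. in the registry, as planned), this gives a stable per-installation ID without revealing anything about the user.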

I tried doing post requests but all I get is : Expecting value: line 1 column 1 (char 0)
 
Last edited:
There does not seem to be any guideline for now. I'm implementing that just now; in my case it's obvious because I'll be using the userID from the database of the user that is uploading the data.

In the case of EliteOCR (or any other desktop app), you can either use one global random ID (you can use whatever, like 'seeebek is the boss') for every instance of your application or, more optimally, generate a random ID for each instance of your application that always stays the same, as you plan.

In the future this may be used to 'ignore' certain users known for posting bad data or whatever, so it's a good idea to keep them unique and stable as you plan.

- - - - - Additional Content Posted / Auto Merge - - - - -

I tried doing post requests but all I get is : Expecting value: line 1 column 1 (char 0)

Are you including the schemaRef and complying with the format? I guess you are using Python for this too; this is how it looks in my app, if it helps you:

Code:
import json
import requests

JSON = {
    "$schemaRef": "http://schemas.elite-markets.net/eddn/commodity/1",
    "header": {
        "uploaderID": "temp",
        "softwareName": "ED Central Dev Server",
        "softwareVersion": "v0.2.2"
    },
    "message": message
}
headers = {'content-type': 'application/json; charset=utf-8'}

try:
    # json.dumps is needed here - requests will not serialize the dict for you
    requests.post(url, data=json.dumps(JSON), headers=headers)
except requests.RequestException:
    print('Failed to POST data to EDDN .....')

(message is a dictionary including systemName, stationName et al.) Using the Requests library.

PS: I must add that you have to be careful with the 'json' module from the standard library in Python, depending on the version you are using; there is a bug that keeps coming back and forth regarding serialization of objects and conversion between bytes and strings. I didn't know what was wrong until I started using the json module that comes with the Flask library instead of the one in the standard library, and everything got 'suddenly' fixed.
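The post doesn't pin down the exact stdlib bug, but one well-known bytes-vs-strings pitfall (a sketch, not necessarily the poster's issue) is that Python 3's `json.dumps` refuses `bytes` values outright; decoding first fixes it:

```python
import json

# bytes can sneak into a payload, e.g. straight off a socket or zmq frame
payload = {"stationName": b"Kennan Dock"}

try:
    json.dumps(payload)
except TypeError:
    # The stdlib json module will not serialize bytes; decode to str first.
    payload = {k: v.decode("utf-8") if isinstance(v, bytes) else v
               for k, v in payload.items()}

print(json.dumps(payload))
```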
 
Last edited:
Ah, I am so stupid. I was trying to pass the dict directly to requests.post without using json.dumps.
I will try and maybe have working code tomorrow. I was testing until now with http://schemas.elite-markets.net/eddn/commodity/1/test
 
One more question. How should I create the uploader ID? I'm thinking of making something like EO + 6 random symbols from (0-9,a-f) (kinda like hexadecimal). Is this OK or are there some guidelines for this?

My original plan was to have uploaderID generated by the gateway (a salted hash of the uploader's IP address) so that people could filter out known-malicious uploaders if they wanted to. I might put that in another property if people would find it useful to have some application-specific data in uploaderID (for example, you could use it to tell if you've just received a message that the application just sent out, and not process it). What are people's thoughts?
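A sketch of what a gateway-side salted IP hash could look like. The salt value, function name and 16-character truncation here are all illustrative, not EDDN's actual implementation:

```python
import hashlib

# Hypothetical: a secret salt known only to the gateway.
GATEWAY_SALT = "change-me"

def anonymised_uploader_id(ip_address):
    # Same IP + salt always yields the same ID, so bad uploaders can be
    # filtered consistently, but the IP itself is not recoverable downstream.
    digest = hashlib.sha256((GATEWAY_SALT + ip_address).encode("utf-8"))
    return digest.hexdigest()[:16]

print(anonymised_uploader_id("203.0.113.7"))
```

The key property is determinism: subscribers see a stable opaque ID per source without the gateway ever exposing addresses.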
 
My original plan was to have uploaderID generated by the gateway (a salted hash of the uploader's IP address) so that people could filter out known-malicious uploaders if they wanted to.

How would that work in the case of servers and centralized databases? In my case all the data is sent by the server; the user ID (just finalized the whole thing, and it's operational now) is attached from the local database user ID in the JSON field. But if it was assigned automatically by EDDN it would always have the same ID regardless of the user. Even if I can control and moderate most stuff, some toxic data may pass before I catch it, but it would be a loss to ban or ignore all the posts from one source just because of a couple of bad users.
 

wolverine2710

Tutorial & Guide Writer
Hi guys,
I finally found time to look closely at EDDN. It looks very promising! I would like to add direct export from EliteOCR to EDDN. Maybe as another plugin.
I have one question. Do I have to create one separate POST request for every single commodity?

I hope makers of other tools, which are already compatible with EliteOCR, won't have any problem with me creating a direct connection.

PERFECT news. Welcome on board. Your tool gets better and better!! The more data the better ;-)

Direct export sounds sweet. Aside from the implementation (plugin or not), I think the BPC approach is a good one. If they use BPC, even in local mode, data is uploaded to BPC so everyone can/will benefit. So I'm kind of hoping that uploading to EDDN is NOT something one can disable. One thing to consider: afaik your .csv file(s) can contain data from older scans as well. In the final version of your EliteOCR_EDDN_Feeder it is perhaps best to send only recent data. EDDN is just a relayer; checking for stale data can best be done at the sender side and of course the receiver. At this point it is best to send as much data as possible to stress test EDDN a bit.

Looking forward to see EDDN support in EliteOCR ;-)
 
James, any thoughts on requiring an array in the POST, which would allow more than one commodity in one message? If there is just one, then it's a length-1 array. Just thinking about seeebek's question about having to POST for every commodity.

EDIT: Hmmm, or maybe the schema would allow for an array of commodities in the "message" field...
 
Last edited:

wolverine2710

Tutorial & Guide Writer
For some the Xmas period means more time, for others it means less time. The latter is true for me. Going on holiday for a week, to Rosi in Germany. I will have internet access there but, given previous experience, probably not much time to post (here).

Now that EDDN slowly but steadily gets more data/attention I've written Slopey (BPC), Cmdr Thrudd and kfsone (TD) a PM asking them to consider creating a <toolname>_EDDN_Feeder. That way hopefully EDDN can be stress tested a bit and we can check if the JSON format suffices for everyone. Connecting to the EDDN firehose can be done later when things settle down a bit. Let's start small and first feed EDDN with data. This week I've contacted more commanders, like LasseB., gazelle and Maddavo (TD merge tool), with the same request. I will be doing the same this weekend for the other authors of OCR solutions.

Some, like Serpenstar, have an OCR solution but limited programming skills. To each their own speciality ;-) That makes a <toolname>_EDDN_Feeder difficult/impossible. I will check them all and report back. Perhaps a commander can make a generic tool which takes a generated .csv file (there could be multiple formats) and uploads it to EDDN.

Commanders, EDDN's future is starting to look brighter and brighter ;-)
 

wolverine2710

Tutorial & Guide Writer
James, any thoughts on requiring an array in the POST, which would allow more than one commodity in one message? If there is just one, then it's a length-1 array. Just thinking about seeebek's question about having to POST for every commodity.

EDIT: Hmmm, or maybe the schema would allow for an array of commodities in the "message" field...

There are probably two schools of thought here.
1) an HTTP POST for every commodity
2) an HTTP POST which can send a whole market in one go.

Both have advantages/disadvantages. (2) would mean fewer connections but perhaps slightly more difficult code (sender and receiver). If EDDN is able to handle situation (1) without problems, things can stay simple(r). Time will tell. A schema for array support could be one way to handle it, but it's perhaps best to have as few schemas as possible. James and others, what do you think?
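For option (2), a whole-market message might carry an array of commodities inside "message". A sketch of such a payload; the field names below are illustrative only, not an agreed schema:

```python
import json

# Hypothetical shape for option (2): one POST carrying a whole market as an
# array of commodities. The field names here are illustrative, not a schema.
message = {
    "systemName": "Eranin",
    "stationName": "Azeban City",
    "timestamp": "2014-12-18T20:35:39",
    "commodities": [
        {"itemName": "Explosives", "sellPrice": 204, "buyPrice": 219},
        {"itemName": "Hydrogen Fuel", "sellPrice": 102, "buyPrice": 107},
    ],
}
print(json.dumps(message))
```

A single-commodity update is then just a length-1 `commodities` array, so both schools of thought fit the same schema.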

Edit: Perhaps it's best to have, in the test phase, a tool to stress test EDDN. As in: send X messages in Y seconds, receive them all and check if everything was received or something got lost. Checking on an identifier means the stress test tool can filter out the rest. I've contacted a commander about that, have to PM him back ;-(. Perhaps others have or are creating a similar tool. The more the better. Each and every tool to (stress) test or feed EDDN is most welcome.....
 
One of the problems I experienced with EMDN was the commodity-per-message pattern. It's just not a good one. Among the problems is the simple matter of disambiguating a gap in the data: was it loss, omission or absence?

Code:
   + Chemicals
      Explosives                177     190         ?    21561H  2014-12-17 13:16:47
      Mineral Oil               247       0         ?         -  2014-12-17 13:16:47

Was Hydrogen:
- not updated?
- unavailable?
- lost data?

There are perf considerations behind having messages as update batches rather than single commodities, too.

But the really big problem that EDDN is going to experience is that there is no central authority for Systems, Stations, etc, so the network will rapidly fill up with data from overlapping but not equal source sets. Especially with divergent tools involved, you'll be getting "THINGY PORT", "Thingy Port" and "thingyport".

You can also anticipate getting really awful time data. You'll be receiving random timezones, random accuracy (expect future timestamps and ancient timestamps on a regular basis), and the occasional human error in the timestamp field.

You've also chosen fairly wordy json. I did some preliminary work creating a shareable json format for TD (http://kfs.org/td/source -> misc/prices-json-expr.py), which has a dictionary of the item table first (so I can use a locally-scoped set of IDs for items that the recipient can easily translate once, saving a ton of text processing work). But I trimmed hundreds of KB off a relatively small prices file by using terse names for the fields and by judicious use of arrays.
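A toy illustration of the item-table-first idea (my own layout, not TD's actual format): declare each item name once, then send bare `[item_id, sell, buy]` arrays keyed by local ID:

```python
import json

# Wordy form: every row repeats every field name.
verbose = [
    {"itemName": "Explosives", "sellPrice": 204, "buyPrice": 219},
    {"itemName": "Hydrogen Fuel", "sellPrice": 102, "buyPrice": 107},
]

# Terse form: the item table appears once, rows become [item_id, sell, buy].
# The recipient translates the IDs back in a single pass.
items = [row["itemName"] for row in verbose]
terse = {
    "items": items,
    "prices": [[i, row["sellPrice"], row["buyPrice"]]
               for i, row in enumerate(verbose)],
}

print(len(json.dumps(verbose)), "bytes vs", len(json.dumps(terse)), "bytes")
```

Even with two rows the terse form is smaller; with a full market of 60+ commodities the per-row field names dominate and the savings grow accordingly.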

It's not just bandwidth I'm concerned about there; as the system grows and more users begin submitting prices, the cost of processing all those json fields goes up too. It's easy to think of this as a distributed system, but it's not distributed processing - each endpoint is doing a complete workload.

With regards to author IDs ... You could just provide a token mechanism; user goes to the website, provides their email address, you send that email address a token, store email + token to a database, and when a submission is sent with that token, you put the database ID (i.e. neither of the other two fields) into the outgoing zmq messages.
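The token flow described above can be sketched as follows, with a hypothetical in-memory dict standing in for the gateway's database:

```python
import secrets

# Hypothetical in-memory stand-in for the gateway's token database.
tokens = {}       # token -> {email, db_id}
next_id = [1]

def issue_token(email):
    # Generate a token to be emailed to the user; store email + token.
    token = secrets.token_hex(16)
    tokens[token] = {"email": email, "db_id": next_id[0]}
    next_id[0] += 1
    return token

def uploader_id_for(token):
    # Outgoing zmq messages would carry only the database ID,
    # never the email address or the token itself.
    entry = tokens.get(token)
    return entry["db_id"] if entry else None

t = issue_token("cmdr@example.org")
print(uploader_id_for(t))
```

Subscribers then see only opaque numeric IDs, while the gateway can still revoke a token or ban a specific uploader.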

Have you considered using ZeroMQ to receive the messages?
 
But the really big problem that EDDN is going to experience is that there is no central authority for Systems, Stations, etc, so the network will rapidly fill up with data from overlapping but not equal source sets. Especially with divergent tools involved, you'll be getting "THINGY PORT", "Thingy Port" and "thingyport".

This is the biggest problem right now, but the solution is not so hard IMO: just validate every system (and, in the hopefully not distant future, every station) against a local copy of TGC/EDSC.
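A sketch of that validation idea: normalise case and whitespace so the three spellings from the post above collapse to one key, then look the result up in a (hypothetical) local canonical list:

```python
def canonical_key(name):
    # Collapse case and whitespace so "THINGY PORT", "Thingy Port" and
    # "thingyport" all map to the same lookup key.
    return "".join(name.split()).casefold()

# Hypothetical local copy of a canonical station list (e.g. from TGC/EDSC).
known_stations = ["Thingy Port", "Azeban City"]
lookup = {canonical_key(s): s for s in known_stations}

for raw in ["THINGY PORT", "Thingy Port", "thingyport"]:
    print(raw, "->", lookup.get(canonical_key(raw)))
```

Anything that fails the lookup can be rejected or flagged at the gateway instead of propagating three spellings of the same station to every subscriber.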

You can also anticipate getting really awful time data. You'll be receiving random timezones, random accuracy (expect future timestamps and ancient timestamps on a regular basis), and the occasional human error in the timestamp field.

Well, given that most data will be coming from EliteOCR and it's timestamped automatically, I don't think this is gonna be a big deal. Everybody using tools or webs should be parsing and converting to UTC anyway, IMO.
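A sketch of that "parse and convert to UTC" step, with a simple sanity check against the future timestamps kfsone warns about; the 5-minute tolerance is an arbitrary choice:

```python
from datetime import datetime, timedelta, timezone

def normalise_timestamp(raw, now=None):
    # Parse an ISO-8601 stamp, assume UTC when no offset is given, and
    # reject stamps from the future (a common sign of a wrong local clock).
    ts = datetime.fromisoformat(raw)
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    ts = ts.astimezone(timezone.utc)
    now = now or datetime.now(timezone.utc)
    if ts > now + timedelta(minutes=5):
        raise ValueError("timestamp is in the future: " + raw)
    return ts

print(normalise_timestamp("2014-12-18T20:35:39"))
```

Ancient timestamps would need a similar staleness cutoff, but as noted earlier in the thread, stale-data filtering arguably belongs at the sender and receiver, not in the relay.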

About consistency of sent data ("disambiguating a gap in data from loss vs omission vs absence"), the first step is to decide which fields are required. If they are required and do not arrive, then it would be data loss (but considering it's sent in the same JSON object, I would consider that close to impossible; it will either arrive or not). I asked a related question earlier: in case required fields do not have any data, should we use a 'none' value or do something else with them? Right now, in the case of integer fields, I'm using 0 when data is not available (like buyPrice or stationStock); it's kind of a solution. For non-required string fields (level of supply/demand - low/med/high), if they are available I'm including them; if not, I'm omitting them.
 
Last edited: