Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

maxh2003 · Dec 30, 2014

K Kinnison said:
Also, can I please ask that anyone testing the system uses the TEST SCHEMA - so I don't get things like this :
INSERT INTO `commodities` (`id`, `softwareVersion`, `gatewayTimestamp`, `softwareName`, `uploaderID`, `buyPrice`, `timestamp`, `stationStock`, `systemName`, `stationName`, `demand`, `sellPrice`, `itemName`, `server_time`) VALUES
(25024, 'v1.0', '2014-12-30T17:46:07.705573', 'RegulatedNoise', '6de58078-8bba-4a93-b646-2e20c35f1327', 276, '2014-12-29T19:21:00', 1491, 'SomeSystem', '', 0, 257, 'Basic Medicines', '2014-12-30 17:46:07');

I can do two things here: default to the test schema, and prevent uploads where the system name hasn't been changed from the default. Is there anything else wrong with your parsed data that I've missed?
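A check along those lines could look something like the sketch below. The field names and the 'SomeSystem' placeholder come from the INSERT statement quoted above; the function name `is_uploadable` and the extra check for an empty station name are assumptions for illustration, not what any of the tools actually do.

```python
# Placeholder system name seen in the quoted row; treated here as "not filled in".
DEFAULT_SYSTEM_NAME = "SomeSystem"


def is_uploadable(row):
    """Return True only if the row has a real system name and a station name.

    `row` is a dict keyed by the column names from the quoted INSERT.
    Rejecting an empty stationName is an assumption, not part of the proposal.
    """
    system = (row.get("systemName") or "").strip()
    station = (row.get("stationName") or "").strip()
    return bool(system) and system != DEFAULT_SYSTEM_NAME and bool(station)


# The quoted row: default system name and empty station name.
bad_row = {"systemName": "SomeSystem", "stationName": "", "itemName": "Basic Medicines"}
# Hypothetical well-formed row for comparison.
good_row = {"systemName": "Leesti", "stationName": "Example Station", "itemName": "Basic Medicines"}
```

With these inputs `is_uploadable(bad_row)` is False and `is_uploadable(good_row)` is True.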

MagmaiKH · Dec 30, 2014

Has anyone gotten the zlib deflate working in .NET?
I tried the built-in System.IO.Compression.DeflateStream and a couple of zlib libraries and I always get data errors trying to decompress the stream.
There were a few posts about skipping the first couple of bytes, as they could be zlib header data prior to the deflate header, but that did not help.
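For reference, the container difference can be shown from Python. This is a minimal sketch, assuming the feed is a standard zlib stream (RFC 1950): that format wraps raw deflate data (RFC 1951, which is what DeflateStream reads) in a 2-byte header and a 4-byte Adler-32 trailer. The payload below is made up for the demonstration.

```python
import zlib

# Hypothetical payload, only for the demonstration.
payload = b'{"message": "hypothetical EDDN-style payload"}'

# zlib.compress produces a zlib stream: header + raw deflate + Adler-32.
zlib_stream = zlib.compress(payload)

# With default settings the first header byte is 0x78.
first_byte_is_zlib_header = zlib_stream[:1] == b"\x78"

# Strip the 2-byte header and the 4-byte checksum to get raw deflate data.
raw_deflate = zlib_stream[2:-4]

# A negative wbits value tells zlib to expect raw deflate with no container.
restored = zlib.decompress(raw_deflate, -15)
```

If skipping the first two bytes still gives data errors in .NET, the cause is probably something other than the header, for example the bytes being altered before they reach the decompressor.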

maxh2003 · Dec 30, 2014

MagmaiKH said:
Has anyone gotten the zlib deflate working in .NET?

PM sent

Askarr · Dec 31, 2014

Andargor said:
I would recommend that commanders not dump the feed directly to database, since you will always get garbage. Although there is a schema for the container, there is no way to "enforce" the content in EDDN, such as bad OCR.

Personal opinion here: NoSQL is probably more suited for this. If you wish to use SQL with "clean" data, then perhaps an intermediate NoSQL database can help as you can retroactively perform corrections (e.g. once a proper station name is determined after the fact).

I would agree. Any consumer of this data is going to get bad data points. As I said earlier, I still feel the best way to solve this is that everyone has a way to contribute to corrections. That way, any methods added benefit everyone.

It need not be via EDDN (but if you're already listening to it...), it could be a separate corrections API; it seems like this is a problem that affects everyone.

CMDRKNac said:
My next to-do item is to add a list of commodity names and translate from other-language clients to English (if anybody can help with this, for example with a list of commodities and their English equivalents, I'll be glad).

I'm currently looking into hooking up Google Translate with the incoming data. Of course the remaining problem is where to make the data available.

seeebek · Dec 31, 2014

CMDRKNac said:
My next to-do item is to add a list of commodity names and translate from other-language clients to English (if anybody can help with this, for example with a list of commodities and their English equivalents, I'll be glad).

I'm already working on support for other languages in eliteOCR. I already have German finished (not yet public). I hope, just as you do, to find people who will help me get the translations. If not: once I'm home I will run the game, switch it to other languages and take screenshots to get the data.

I will automatically translate the data exported to EDDN into English. This will reduce the language clutter.

K Kinnison · Dec 31, 2014

seeebek - can you explain about uploaderID - does that change every time someone uploads data, or is it constant for a specific uploader client?

K Kinnison · Dec 31, 2014

Anyone with a bit of python knowledge know how I can sort this issue - I understand it's to do with unicode translation, but what function etc should I use?
Traceback (most recent call last):
  File "eddn_test.py", line 56, in <module>
    main()
  File "eddn_test.py", line 47, in main
    print do_insert
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 339: ordinal not in range(128)

It's causing my script to die every now and then..

Askarr · Dec 31, 2014

K Kinnison said:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 339: ordinal not in range(128)

Assuming Python 2 in the below.

Your script has got Unicode & standard strings confused somewhere. Python's default encoding is ASCII, and print is telling you that it can't encode the Unicode character è into what is supposed to be ASCII output. Example command line equivalent:

Code:

>>> print u'\xe8'
è

>>> print u'\xe8' + "a"
èa

>>> print u'\xe8' + "aè"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8a in position 1: ordinal not in range(128)

>>> print u'\xe8' + u"aè"
èaè

>>> type("a")
<type 'str'>

>>> type(u"a")
<type 'unicode'>

The string literal is considered ASCII, but in the above I tried to put an accented e directly in. This violates the 0-127 strict requirement. Marking the literal as Unicode fixes this. Wherever your code is handling Unicode codepoints, you need to handle encoding and decoding such that the end result matches the valid character ranges for your types. If it's type str, it must be ASCII or have the encoding specified. If it's type unicode, it's allowed to contain Unicode characters directly.

Edit: it is considered good practice to use Unicode throughout your code and only convert to str when dealing with IO at the edges of your program. This keeps things vaguely sane.
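A minimal sketch of that "convert at the edges" idea, written so it runs on Python 2.7 and Python 3. The `io.BytesIO` object only stands in for a file or socket opened in binary mode, and the choice of UTF-8 as the output encoding is an assumption.

```python
import io

# A text (unicode) string; \xe9 is the accented e in "Létales".
text = u"Armes Non L\xe9tales"

# Encoding to ASCII is what fails in the traceback above.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Encoding explicitly at the output edge works for any character.
out = io.BytesIO()  # stand-in for a binary file or socket
out.write(text.encode("utf-8") + b"\n")
written = out.getvalue()
```

Inside the program everything stays as unicode; only the final write decides on an encoding.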

Askarr · Dec 31, 2014

Having a list of known commodities and their in-game translations is key, as far as I can see, for handling the incoming data well. While some are just badly spelt courtesy of OCR, the problem with automated translations is two-fold:

Non-English languages appear to get mangled far more by OCR (whether directly or because EliteOCR & RegulatedNoise have built-in hints for English already)
The game's translations are not like for like. Non-lethal weapons is Armes Incapacitantes in-game. The literal translations are Armes Non Létales & Stun Weapons respectively.

Thus while a run of a translate & spelling-correct program over the data did yield successes, such as Cartouches Alimentaires to Food Cartridges, automation alone is going to be hit & miss.
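As a rough illustration of the lookup-list approach: a table of in-game names with a fuzzy match to absorb small OCR errors. The two entries are the examples from this post; the table name `KNOWN_COMMODITIES`, the function `to_english` and the 0.8 cutoff are assumptions, and a real list would need the full set of in-game names.

```python
import difflib

# In-game French names mapped to the English commodity names (examples from the post).
KNOWN_COMMODITIES = {
    u"Armes Incapacitantes": u"Non-lethal weapons",
    u"Cartouches Alimentaires": u"Food Cartridges",
}


def to_english(name, cutoff=0.8):
    """Return the English name for an in-game name, tolerating small OCR errors.

    Returns None when nothing in the table is close enough, so the caller
    can set the record aside for manual correction instead of guessing.
    """
    matches = difflib.get_close_matches(name, list(KNOWN_COMMODITIES), n=1, cutoff=cutoff)
    return KNOWN_COMMODITIES[matches[0]] if matches else None
```

Here `to_english(u"Cartouches Alimentaire5")` still resolves to Food Cartridges, while a name that is not in the table, such as `u"Basic Medicines"`, returns None.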

K Kinnison · Dec 31, 2014

OK, thanks for that Askarr - this is my first foray into Python and it seems a bit finicky! PHP doesn't throw up these kinds of issues for me.
I've put the u part in, hopefully it will sort the issue - I await the next issue!! lol.

EDIT: OH - PS - you said "assuming Python 2" - which was reasonable to assume - it's Python 2.7 on my server.
I looked at the possibility of installing Python 3 but googling a little suggested to me that sticking with Python 2 was probably the better bet in terms of compatibility.
Do you think Python 3 would be better?

CMDRKNac · Dec 31, 2014

seeebek said:
I'm already working on support for other languages in eliteOCR. I already have German finished (not yet public). I hope, just as you do, to find people who will help me get the translations. If not: once I'm home I will run the game, switch it to other languages and take screenshots to get the data.

I will automatically translate the data exported to EDDN into English. This will reduce the language clutter.

Thanks, great to hear.

seeebek · Dec 31, 2014

K Kinnison said:
seeebek - can you explain about uploaderID - does that change every time someone uploads data, or is it constant for a specific uploader client?

UploaderID stays static between versions. The only way to change it is to delete the settings in the Windows registry. 99% of users will not be able to change it.

seeebek · Dec 31, 2014

K Kinnison said:
Anyone with a bit of python knowledge know how I can sort this issue - I understand it's to do with unicode translation, but what function etc should I use?
Traceback (most recent call last):
  File "eddn_test.py", line 56, in <module>
    main()
  File "eddn_test.py", line 47, in main
    print do_insert
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 339: ordinal not in range(128)

It's causing my script to die every now and then..

I had a lot of problems in EliteOCR while reading files or saving them. Basically I had to put .encode("windows-1252") or .decode("windows-1252") on almost every string to avoid problems with Windows and special characters.
I hope this gives you at least an idea what to search for.
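Roughly what those calls do, as a small sketch that runs on Python 2.7 and Python 3. The single byte used here is just an example; which codec your own data needs depends on where it comes from.

```python
# One byte as it might come out of a file saved on Windows.
raw = b"\xe8"

# Decoding with the matching codec gives the text character è (U+00E8).
text = raw.decode("windows-1252")

# Encoding with the same codec gives the original byte back.
round_trip = text.encode("windows-1252")

# Decoding the same byte with the wrong codec fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
```

So the .decode() goes where bytes come in, the .encode() goes where they go out, and both sides have to agree on the codec.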

Askarr said:
Having a list of known commodities and their in-game translations is key, as far as I can see, for handling the incoming data well. While some are just badly spelt courtesy of OCR, the problem with automated translations is two-fold:

Non-English languages appear to get mangled far more by OCR (whether directly or because EliteOCR & RegulatedNoise have built-in hints for English already)

The game's translations are not like for like. Non-lethal weapons is Armes Incapacitantes in-game. The literal translations are Armes Non Létales & Stun Weapons respectively.

Thus while a run of a translate & spelling-correct program over the data did yield successes, such as Cartouches Alimentaires to Food Cartridges, automation alone is going to be hit & miss.

Yes, this is exactly why I will collect the names as they appear in the game. I will try to provide them to everybody as needed. For now I have only German. Send me a PM if you need them.

seeebek · Dec 31, 2014

K Kinnison said:
That sounds like a *lot* of manual correction!
At the end of the day, I don't know personally which one of

Naddoddur Terminal
Nadoodour Terminal
Naooooour Terminal
Nadooddur Terminal

is correct - am I meant to fly there and check them myself!?

My guess would be the one with the most "D"s, since EliteOCR mostly makes the mistake in one direction: D->O.
I do my best to reduce the damage.
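That rule of thumb is easy to put into code. A sketch using the four spellings quoted above; the function name `most_ds` is made up, and this only encodes the guess, it does not verify the real station name.

```python
def most_ds(candidates):
    """Pick the spelling with the most letter Ds (upper or lower case).

    Heuristic only: it assumes the OCR error goes one way, D read as O.
    """
    return max(candidates, key=lambda name: name.lower().count("d"))


candidates = [
    "Naddoddur Terminal",
    "Nadoodour Terminal",
    "Naooooour Terminal",
    "Nadooddur Terminal",
]
best_guess = most_ds(candidates)
```

For this list the guess comes out as "Naddoddur Terminal", which has four Ds against three, two and none in the others.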

K Kinnison · Dec 31, 2014

seeebek said:
UploaderID stays static between versions. The only way to change it is to delete the settings in the Windows registry. 99% of users will not be able to change it.

Sorry, I don't think I was making myself clear. Does the ID change every single time someone downloads data, or does it stay the same?
So, if, for example, someone downloads data for Leesti at 9am then turns off the PC, then turns the PC back on 10 hours later and downloads additional data for Leesti, is the uploaderID the same or is it different?

EDIT: TBH I think I already know the answer to this one, I'm just double checking!

Askarr · Dec 31, 2014

K Kinnison said:
...suggested to me that sticking with Python 2 was probably the better bet in terms of compatibility.
Do you think Python 3 would be better?

I'd agree with that suggestion. While Python 3 is a great iteration of the language (and for starters, fixes most of this Unicode nonsense by default), a lot of modules haven't really caught up, and for pyzmq specifically, gevent (a required module) isn't available for Python 3 last I looked. So for working with EDDN I'd say Python 2.7.9 is as far as you could go (unless someone's got a Python 3 0MQ binding lying around...).

Snake Man · Dec 31, 2014

Hey guys is this the correct format to get from client.py:

Code:

{'header': {'softwareVersion': '0.3.8', 'gatewayTimestamp':

Meaning it's single quotes ' instead of " ?

The EDDN wiki shows the example with " quotes, and http://jsonlint.com/ says a copy-paste from the EDDN client.py data stream is invalid; if I swap ' for " then it's valid.

So does just the client.py output show the wrong character, or do all the senders and EDDN itself already send them?

seeebek · Jan 1, 2015

K Kinnison said:
Sorry, I don't think I was making myself clear. Does the ID change every single time someone downloads data, or does it stay the same?
So, if, for example, someone downloads data for Leesti at 9am then turns off the PC, then turns the PC back on 10 hours later and downloads additional data for Leesti, is the uploaderID the same or is it different?

EDIT: TBH I think I already know the answer to this one, I'm just double checking!

Yes it stays the same even if you restart your computer and even if you update EliteOCR to a newer version. And actually even if you delete eliteOCR, wait 30 days and download it again. The uploaderID stays in the registry and is always the same. I don't understand why you use the word "download". EliteOCR uploads the data.

Snake Man said:
Hey guys is this the correct format to get from client.py:

Code:

{'header': {'softwareVersion': '0.3.8', 'gatewayTimestamp':

Meaning it's single quotes ' instead of " ?

The EDDN wiki shows the example with " quotes, and http://jsonlint.com/ says a copy-paste from the EDDN client.py data stream is invalid; if I swap ' for " then it's valid.

So does just the client.py output show the wrong character, or do all the senders and EDDN itself already send them?

I checked on eliteOCR's side. It always generates the request with " not with '. So the problem seems to be on the client side.

Snake Man · Jan 1, 2015

Ah I see, thanks seeebek. It must be how client.py in my Linux shell is outputting the data, perhaps a pure Python issue with the "print" command. I don't know, I've never written a single line of Python.

Jamesrecmuscat seems to be unavailable. Does anyone else know if it's EDDN itself or only client.py that generates invalid JSON output? And of course, if it's the .py, how do I fix it?

seeebek · Jan 1, 2015

Snake Man said:
Ah I see, thanks seeebek. It must be how client.py in my Linux shell is outputting the data, perhaps a pure Python issue with the "print" command. I don't know, I've never written a single line of Python.

Jamesrecmuscat seems to be unavailable. Does anyone else know if it's EDDN itself or only client.py that generates invalid JSON output? And of course, if it's the .py, how do I fix it?

It is safe to do something like this:

Code:

while True:
    market_json = zlib.decompress(subscriber.recv())
    market_data = simplejson.loads(market_json)
    print str(market_data).replace("'",'"')
    sys.stdout.flush()

This will turn every ' into " in the given string.

Edit: I derp'd. You need to put str() around market_data, since it is a dict, not a string. This is also why there are '. This is how Python represents dicts when printed or cast to string.

or you just do:

Code:

while True:
    market_json = zlib.decompress(subscriber.recv())
    print market_json
    sys.stdout.flush()

Then you take the data directly from the stream without the need to convert it to dict first and then back to str. Also in the stream there seem to be only ".
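If you do need the parsed dict and want to print it as JSON again, serializing it back is probably more robust than replacing quote characters, because the replace also changes any ' inside a value. A sketch, assuming the standard json module behaves the same as simplejson for this purpose; the sample message and the helper name `to_json_text` are made up and only stand in for what subscriber.recv() returns.

```python
import json
import zlib


def to_json_text(compressed):
    """Decompress one message, parse it, and serialize it back as valid JSON."""
    market_data = json.loads(zlib.decompress(compressed).decode("utf-8"))
    return json.dumps(market_data, sort_keys=True)


# Hypothetical message standing in for subscriber.recv(); not real EDDN data.
sample = zlib.compress(b'{"message": {"stationName": "O\'Neill Station"}}')
text = to_json_text(sample)
```

The result uses " around keys and strings and leaves the apostrophe in the station name alone, so it should pass a JSON validator.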