Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

wolverine2710 · Dec 29, 2014

Askarr said:
One thing that intrigues me is how to solve the transform problem. Handling messages like these seems to fall into four stages:

Firehose/consolidate the messages from many clients (EDDN solves this)

Store the messages (either temporarily or long term, ranging from Kafka for temporary storage to Cassandra, MySQL, etc. for long-term storage)

Transform the messages

We have a lot of data that is partially corrupt. OCR isn't perfect; I've seen commodity names that are almost spelt right, for example. There are foreign languages to contend with as well.

Do something useful with the commodity data

.. Profit! (couldn't resist)

So I wonder: what's the best approach for transforming the data into viable, consolidated, uniform trade data?

The engineer in me ponders a unified backend: EDDN, then some temp storage like Kafka, something like Storm to translate & process the messages, and then a 'clean' stream of trade data that doesn't contain typos etc. However, someone would have to build all that, and while I'd be happy to participate & contribute, I know I couldn't commit to providing all that any time soon and I doubt anyone else would.

Another alternative is crowd-solving the problem, as wolverine mentions. Several people will independently bang on this nail and come up with their own solutions. Ideally, we could share those solutions and pool translation data in some way.

So here's my wild idea. Why not leverage EDDN! Add a new schema type. Let the crowd provide translations for everyone. Something like:

Code:

{
  "$schemaRef": "http://schemas.elite-markets.net/eddn/commodity_transform/1",
  "header": {
    "uploaderID": "bob",
    "softwareName": "My Vaguely Cool Processing Backend",
    "softwareVersion": "v0.1.5.b"
  },
  "message": {
    "typoLadenEntry": "Eran1n",
    "validEntry": "Eranin",
    "timestamp": "2014-11-17T12:34:56+00:00"
  }
}

I have to sleep on this before I can give you the answer you deserve. Thanks for the input ;-). It IS appreciated.
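A minimal sketch of how a consumer might apply such crowd-sourced corrections, assuming messages shaped like the proposed `commodity_transform` example above. That schema is only a proposal in this thread, and the helper names `build_corrections` and `clean_name` are hypothetical. When two messages correct the same typo, the one with the newer message `timestamp` wins.

```python
import json
from datetime import datetime

# URL copied from the proposal above; it is a proposed schema, not a published one.
TRANSFORM_SCHEMA = "http://schemas.elite-markets.net/eddn/commodity_transform/1"


def build_corrections(raw_messages):
    """Build a typo -> valid-name table from proposed transform messages.

    If several messages correct the same typo, the one with the newest
    message timestamp is kept.
    """
    latest = {}  # typo -> (timestamp, valid entry)
    for raw in raw_messages:
        envelope = json.loads(raw)
        if envelope.get("$schemaRef") != TRANSFORM_SCHEMA:
            continue  # not a transform message: ignore it
        msg = envelope["message"]
        ts = datetime.fromisoformat(msg["timestamp"])  # Python 3.7+
        typo = msg["typoLadenEntry"]
        if typo not in latest or ts > latest[typo][0]:
            latest[typo] = (ts, msg["validEntry"])
    return {typo: valid for typo, (_, valid) in latest.items()}


def clean_name(name, corrections):
    """Return the corrected name, or the name unchanged if no correction is known."""
    return corrections.get(name, name)


if __name__ == "__main__":
    example = json.dumps({
        "$schemaRef": TRANSFORM_SCHEMA,
        "header": {"uploaderID": "bob", "softwareName": "My Vaguely Cool Processing Backend",
                   "softwareVersion": "v0.1.5.b"},
        "message": {"typoLadenEntry": "Eran1n", "validEntry": "Eranin",
                    "timestamp": "2014-11-17T12:34:56+00:00"},
    })
    table = build_corrections([example])
    print(clean_name("Eran1n", table))  # Eranin
```

Keeping the corrections as a plain lookup table means every consumer can merge the shared table with its own local fixes before cleaning the trade stream.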

K Kinnison · Dec 30, 2014

I left my script running and got this error after several hours:
  File "eddn_test.py", line 36, in main
    itemName = market_data['message']['itemName'].encode('string-escape')
TypeError: must be string, not unicode

So I've tried .encode('unicode-escape') instead - not sure what the difference is TBH!
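For what it's worth: in Python 2 the `string-escape` codec only accepts byte strings (`str`), while the parsed JSON gives you `unicode` objects, as the traceback shows. `unicode-escape` works on `unicode` and turns non-ASCII characters into backslash escapes. As far as I know, Python 3 has no `string-escape` codec at all, and JSON strings come back as `str`. A small Python 3 sketch (the item name is made up for illustration, not real EDDN data):

```python
import json

# Hypothetical message; the JSON text contains the escape \u00e9 (é).
raw = '{"message": {"itemName": "Caf\\u00e9 Liqueur"}}'

market_data = json.loads(raw)
item_name = market_data["message"]["itemName"]  # already a str in Python 3

# unicode_escape produces ASCII bytes with backslash escapes for non-ASCII characters
escaped = item_name.encode("unicode_escape").decode("ascii")
print(escaped)  # Caf\xe9 Liqueur

# Decoding with the same codec reverses it
print(escaped.encode("ascii").decode("unicode_escape") == item_name)  # True
```

If the escaping is only there to build SQL strings by hand, a parameterized query (`cursor.execute(sql, params)` in any DB-API driver) sidesteps the problem altogether.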

psema4 · Dec 30, 2014

Node.js Client

I threw together a simple client for Node.js, loosely modelled after the Python example. Alas, I'm unable to post the code 'cuz I don't have enough posts on the forums yet.

I'll come back and update this post with the code when I can.

Looking forward to pumping market data through Pentaho (Community Edition) for some visualizations and reports.

wolverine2710 · Dec 30, 2014

psema4 said:
I threw together a simple client for Node.js, loosely modelled after the Python example. Alas, I'm unable to post the code 'cuz I don't have enough posts on the forums yet.

I'll come back and update this post with the code when I can.

Looking forward to pumping market data through Pentaho (Community Edition) for some visualizations and reports.

You can PM me the code and I'm glad to post it for you ;-)
Also WELCOME commander!!!

wolverine2710 · Dec 30, 2014

A good SQL schema

Good to see that commanders continue to experiment with EDDN ;-)
For anyone wanting to put the data in a database, you might want to look at the thread "[GENERAL] Common Use SQL Game Database Source" by sngerous. He has created a wonderful SQL schema which, by the looks of it, can be used for all kinds of data. If you are a database admin or database designer, please DO hop over there and see if you can help him determine whether it can be improved. Two (three, four) heads know more than one, of course. Why reinvent your own (roundish) wheel when there is a perfectly round wheel waiting out there for you ;-)

wolverine2710 · Dec 30, 2014

wolverine2710 said:
You can PM me the code and I'm glad to post it for you ;-)
Also WELCOME commander!!!

Received a PM from psema4. Here is his PM/code.

And the offer to share my node.js example.

I tried to wrap app.js and setup.js in the forums' code and spoiler tags but 'twas a no-go..

My code's up at https://github.com/psema4/node-eddn-client; really nothing more than a "hello world" example and not intended for production use. Fully intend to expand on it in the new year though.

Cheers,
@psema4

Snake Man · Dec 30, 2014

psema4 said:
I threw together a simple client for Node.js

I think you are looking at examples from the one broken tool that feeds data into EDDN. There is no message field "categoryName" in the official EDDN format, so can you please remove: categoryName varchar(64) default '',

wolverine2710, can you please "enforce" this? There seems to be a lot of confusion because of this one data entry. It would be much appreciated.

K Kinnison · Dec 30, 2014

There is definitely some bad data coming from OCR, though. I've just seen 10 distinct stations for Leesti and thought "Whattttt!!!", so I investigated:
SQL query: SELECT distinct(stationName) FROM `commodities` WHERE systemName = 'Leesti' LIMIT 0, 25 ;
Rows: 10


stationName
George Lucas
R.George Lucas
3George Lucas
8George Lucas
Sgeorge Lucas
Xgeorge Lucas
George Lucas 0
George Lucas 3
George Lucas N
Georgelucas

Not good.

K Kinnison · Dec 30, 2014

And another one someone added just a few seconds ago. Clearly the OCR isn't working properly and they're not checking the results...
SQL query: SELECT distinct(stationName) FROM `commodities` WHERE systemName = 'Yembo' LIMIT 0, 25 ;
Rows: 4

stationName
Naddoddur Terminal
Nadoodour Terminal
Naooooour Terminal
Nadooddur Terminal

Treb42 · Dec 30, 2014

I had a go with RN yesterday and found the OCR kinda lacking...
For one, I had no option in the GUI to correct certain errors: the CSV textbox stayed empty and whatever was recognized landed directly in the db.
I have no clue how the OCR for commodities works, but since it is a very limited dataset (just looking at the English commodities), wouldn't it be possible to automatically check the OCR results against a defined list of known commodities? Especially since I can't simply edit the result in the app like I can for station and system names.

I'll post a few sample screenshots when I'm back home. Comparing the hit rates (running 2650x1440), EliteOCR performed a lot better :/ but I really like the workflow with RN.
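A rough sketch of Treb42's idea of checking OCR output against a known list, using the standard library's `difflib.get_close_matches`. The commodity list is a tiny hand-picked subset, and the 0.8 cutoff is an arbitrary assumption that would need tuning against real OCR errors; `match_commodity` is a hypothetical helper, not part of any existing tool.

```python
import difflib
from typing import Optional

# Hand-picked subset for illustration; a real tool would load the full list.
KNOWN_COMMODITIES = ["Basic Medicines", "Coffee", "Gold", "Mineral Oil",
                     "Palladium", "Silver", "Tea"]

# Compare case-insensitively, since OCR often gets capitalisation wrong.
_BY_LOWER = {name.lower(): name for name in KNOWN_COMMODITIES}


def match_commodity(ocr_name: str, cutoff: float = 0.8) -> Optional[str]:
    """Map an OCR'd name onto the closest known commodity, or None if nothing is close."""
    matches = difflib.get_close_matches(ocr_name.lower(), list(_BY_LOWER), n=1, cutoff=cutoff)
    return _BY_LOWER[matches[0]] if matches else None


print(match_commodity("Basic Med1cines"))  # Basic Medicines
print(match_commodity("Qwerty"))           # None
```

Returning `None` instead of a best guess lets the caller reject or flag the row rather than silently inserting a wrong name. Station names are harder, since there is no fixed list to match against.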

Treb42 · Dec 30, 2014

...and I posted in the wrong forum... argh!

Andargor · Dec 30, 2014

I would recommend that commanders not dump the feed directly into a database, since you will always get garbage. Although there is a schema for the container, there is no way to "enforce" the content in EDDN, such as bad OCR. Hence, you must ensure you perform validation and/or correction before inserting into your DB.

Personal opinion here: NoSQL is probably more suited for this. If you wish to use SQL with "clean" data, then perhaps an intermediate NoSQL database can help as you can retroactively perform corrections (e.g. once a proper station name is determined after the fact).
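One way to get the "correct it after the fact" behaviour with plain SQL, sketched with Python's built-in `sqlite3`: keep raw rows exactly as received, keep a separate table of known corrections, and read through a view that applies them. The table and view names are hypothetical; the columns mirror a few of the EDDN message fields used in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE raw_commodities (          -- rows exactly as received from EDDN
    systemName  TEXT NOT NULL,
    stationName TEXT NOT NULL,
    itemName    TEXT NOT NULL,
    buyPrice    INTEGER,
    sellPrice   INTEGER
);
CREATE TABLE station_corrections (      -- can be filled in long after the raw rows arrived
    systemName TEXT NOT NULL,
    badName    TEXT NOT NULL,
    goodName   TEXT NOT NULL,
    PRIMARY KEY (systemName, badName)
);
-- Readers query the view, so each new correction applies retroactively.
CREATE VIEW clean_commodities AS
SELECT r.systemName,
       COALESCE(c.goodName, r.stationName) AS stationName,
       r.itemName, r.buyPrice, r.sellPrice
FROM raw_commodities AS r
LEFT JOIN station_corrections AS c
       ON c.systemName = r.systemName AND c.badName = r.stationName;
""")


def insert_raw(row):
    """Store one message as-is; named placeholders mean no manual escaping."""
    conn.execute(
        "INSERT INTO raw_commodities "
        "VALUES (:systemName, :stationName, :itemName, :buyPrice, :sellPrice)",
        row,
    )


insert_raw({"systemName": "Leesti", "stationName": "R.George Lucas",
            "itemName": "Basic Medicines", "buyPrice": 276, "sellPrice": 257})
# Later, once the proper name is known:
conn.execute("INSERT INTO station_corrections VALUES (?, ?, ?)",
             ("Leesti", "R.George Lucas", "George Lucas"))
print(conn.execute("SELECT stationName FROM clean_commodities").fetchall())  # [('George Lucas',)]
```

The raw table is never modified, so a wrong correction can simply be deleted and the original reports reappear.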

Snake Man · Dec 30, 2014

Not sure if you meant that as a reply to me, but anyway: when I asked wolverine2710 to "enforce" it, I meant that he should speak directly to the author of the tool that pumps the erroneous message type into EDDN. So yes, there is definitely a way to enforce content, as in the FORMAT of the content in EDDN.

psema4 · Dec 30, 2014

Snake Man said:
I think you are looking at examples from the one broken tool that feeds data into EDDN. There is no message field "categoryName" in the official EDDN format, so can you please remove: categoryName varchar(64) default '',

Done; added a filter to discard object keys not defined by the schema.
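psema4's client is Node.js, but the same filtering idea in Python looks roughly like this. The allowed key set is an assumption, taken from the message fields that appear in the INSERT statement later in this thread; check it against the published schema before relying on it. `strip_unknown_keys` is a hypothetical helper name.

```python
from typing import Any, Dict

# Assumed commodity message fields (as seen in this thread, not the official schema).
ALLOWED_MESSAGE_KEYS = {
    "systemName", "stationName", "itemName", "buyPrice", "sellPrice",
    "demand", "stationStock", "timestamp",
}


def strip_unknown_keys(message: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the message with every key outside the allowed set removed."""
    return {key: value for key, value in message.items() if key in ALLOWED_MESSAGE_KEYS}


incoming = {"systemName": "Leesti", "stationName": "George Lucas",
            "itemName": "Basic Medicines", "categoryName": "Medicines", "sellPrice": 257}
print(strip_unknown_keys(incoming))
# {'systemName': 'Leesti', 'stationName': 'George Lucas', 'itemName': 'Basic Medicines', 'sellPrice': 257}
```

A whitelist like this silently drops stray fields such as `categoryName`; logging what was dropped would make it easier to report the offending tool to its author.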

CMDRKNac · Dec 30, 2014

I also saw some terrible station names (though the georgelucas thing was a prank, lol). My plan for dealing with it, at least until TGC is updated to handle stations and other stuff, is this: once I feel the db is well populated (won't take long at this pace!), I'll lock it so new stations cannot be created and clean it up slowly. I'll leave the manual input on my website for adding any missing station, and validate all input against existing stations. I'm already doing this with systems, and it blocks most of the garbage from being inserted into the db.

Far from ideal, but I don't see another way to do it for now. On the other hand, most of the data seems quite healthy; btw, I already have a trading tool online that feeds from it and can be used on the web.

Next on my to-do list is adding a list of commodity names and translating names from other-language clients into English. If anybody can help with this (e.g. a list of commodities and their English equivalents), I'd be glad.

Snake Man · Dec 30, 2014

Thank you psema4, nice job.

Also good news from the RegulatedNoise (OCR utility) front: the categoryName entry has been fixed in its EDDN export.

maxh2003 · Dec 30, 2014

Snake Man said:
Thank you psema4, nice job.

Also good news from the RegulatedNoise (OCR utility) front: the categoryName entry has been fixed in its EDDN export.

Sadly I don't have a lot of time to code *and* play *and* monitor the forums; please do bring RegulatedNoise EDDN issues to the RN thread or I might not hear about them! Thanks to Snake Man for the heads-up this time.

K Kinnison · Dec 30, 2014

Andargor said:
I would recommend that commanders not dump the feed directly into a database, since you will always get garbage. Although there is a schema for the container, there is no way to "enforce" the content in EDDN, such as bad OCR. Hence, you must ensure you perform validation and/or correction before inserting into your DB.

Personal opinion here: NoSQL is probably more suited for this. If you wish to use SQL with "clean" data, then perhaps an intermediate NoSQL database can help as you can retroactively perform corrections (e.g. once a proper station name is determined after the fact).

That sounds like a *lot* of manual correction!
At the end of the day, I personally don't know which one of

Naddoddur Terminal

Nadoodour Terminal

Naooooour Terminal

Nadooddur Terminal

is correct - am I meant to fly there and check them myself!?
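Short of flying there, one hedged heuristic is to trust whichever spelling the most independent uploaders agree on. The sketch below counts distinct `uploaderID`s per station spelling within a system, so one commander resubmitting the same screen doesn't inflate the count. It assumes typos are random while the real name is reported consistently, which will not hold for a deliberate prank or a single very active uploader. The helper name, station names and uploader IDs are all made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_spellings(reports: Iterable[Tuple[str, str, str]], system: str) -> List[Tuple[str, int]]:
    """Rank station spellings in one system by number of distinct uploaders.

    Each report is (systemName, stationName, uploaderID).
    Returns (stationName, distinct uploader count), most supported first.
    """
    uploaders: Dict[str, Set[str]] = defaultdict(set)
    for sys_name, station, uploader in reports:
        if sys_name == system:
            uploaders[station].add(uploader)  # a set: repeats by one uploader count once
    return sorted(((name, len(ids)) for name, ids in uploaders.items()),
                  key=lambda item: item[1], reverse=True)


reports = [
    ("SomeSystem", "Example Terminal", "cmdr-a"),
    ("SomeSystem", "Example Terminal", "cmdr-b"),
    ("SomeSystem", "Example Terminal", "cmdr-a"),  # resubmission, not a new witness
    ("SomeSystem", "Examp1e Terminal", "cmdr-c"),
]
print(rank_spellings(reports, "SomeSystem"))  # [('Example Terminal', 2), ('Examp1e Terminal', 1)]
```

A system can have several genuine stations, so this only ranks candidates; grouping near-identical spellings first (e.g. with a fuzzy matcher) would be needed before picking a winner.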

K Kinnison · Dec 30, 2014

Would it be at all possible to include the user's IP address in the datastream download?
It would seem a reasonable addition.
I'm sure that would help with possible identification of scamming/spamming or just bad data...

K Kinnison · Dec 30, 2014

Also, can I please ask that anyone testing the system use the TEST SCHEMA, so I don't get things like this:
INSERT INTO `commodities` (`id`, `softwareVersion`, `gatewayTimestamp`, `softwareName`, `uploaderID`, `buyPrice`, `timestamp`, `stationStock`, `systemName`, `stationName`, `demand`, `sellPrice`, `itemName`, `server_time`) VALUES
(25024, 'v1.0', '2014-12-30T17:46:07.705573', 'RegulatedNoise', '6de58078-8bba-4a93-b646-2e20c35f1327', 276, '2014-12-29T19:21:00', 1491, 'SomeSystem', '', 0, 257, 'Basic Medicines', '2014-12-30 17:46:07');

--

I've specifically set up my script so it only uses the live schema; if you want to test, don't use that.
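A sketch of that check on the consumer side, assuming the live commodity schema is `http://schemas.elite-markets.net/eddn/commodity/1` and that test messages use a different `$schemaRef` (for example the same URL with a `/test` suffix). Both URLs are assumptions modelled on the `$schemaRef` format quoted earlier in the thread, so confirm them against the EDDN documentation. `live_message_or_none` is a hypothetical helper.

```python
import json
from typing import Optional

# Assumed live schema URL; verify against the EDDN docs before use.
LIVE_COMMODITY_SCHEMA = "http://schemas.elite-markets.net/eddn/commodity/1"


def live_message_or_none(raw: str) -> Optional[dict]:
    """Return the message body only if the envelope uses the live commodity schema."""
    envelope = json.loads(raw)
    if envelope.get("$schemaRef") != LIVE_COMMODITY_SCHEMA:
        return None  # test schema (or any other schema): do not store it
    return envelope.get("message")


live = json.dumps({"$schemaRef": LIVE_COMMODITY_SCHEMA, "header": {},
                   "message": {"systemName": "SomeSystem"}})
test = json.dumps({"$schemaRef": LIVE_COMMODITY_SCHEMA + "/test", "header": {},
                   "message": {"systemName": "SomeSystem"}})
print(live_message_or_none(live))  # {'systemName': 'SomeSystem'}
print(live_message_or_none(test))  # None
```

Matching the full URL exactly, rather than checking a prefix, is what keeps a `/test` variant from slipping through.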
