wolverine2710
Tutorial & Guide Writer
One thing that intrigues me is how to solve the transform problem. I keep pondering: what's the best approach for transforming the data into viable, consolidated, uniform trade data? Handling messages like these seems to fall into four stages:
- Firehose/consolidate the messages from many clients (EDDN solves this)
- Store the messages, either temporarily or long term (anything from Kafka for short-lived buffering to Cassandra or MySQL for durable storage)
- Transform the messages
- We have a lot of data that is partially corrupt. OCR isn't perfect. I've seen commodity data that is almost spelt right, for example, and there are foreign languages to contend with as well (see the sketch after this list for one way to tackle the typos)
- Do something useful with the commodity data
- .. Profit! (couldn't resist)
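For the transform stage, here's a minimal sketch of how the typo correction could work: fuzzy-match each OCR'd name against a list of known-good names and only accept a confident match. The commodity/system lists and the 0.8 cutoff are illustrative assumptions, not real EDDN data.

Code:
# Fuzzy-match OCR'd entries against a known-good list using only the
# Python standard library. Lists and cutoff are made-up examples.
import difflib

KNOWN_COMMODITIES = ["Gold", "Palladium", "Explosives", "Hydrogen Fuel"]
KNOWN_SYSTEMS = ["Eranin", "Asellus Primus", "I Bootis"]

def correct(name, known, cutoff=0.8):
    """Return the closest known entry, or None if nothing is close enough."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct("Eran1n", KNOWN_SYSTEMS))        # -> Eranin
print(correct("Palad1um", KNOWN_COMMODITIES))  # -> Palladium

The nice thing about a cutoff is that genuinely unrecognisable entries come back as None instead of being silently mangled, so they can be parked for human review.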
The engineer in me ponders a unified backend: EDDN, then some temp storage like Kafka, something like Storm to translate & process the messages, and then a 'clean' stream of trade data that doesn't contain typos etc. However, someone would have to build all that, and while I'd be happy to participate & contribute, I know I couldn't commit to providing all that any time soon and I doubt anyone else would.
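To make that concrete, here's a rough sketch of what that transform step might look like as a plain consumer loop rather than a full Storm topology. The topic names, broker address, and the systemName field are all assumptions for the sake of illustration.

Code:
# Sketch only: read raw messages from temporary storage (Kafka here),
# clean them up, and re-emit a 'clean' stream for downstream tools.
import difflib
import json
from kafka import KafkaConsumer, KafkaProducer  # pip install kafka-python

KNOWN_SYSTEMS = ["Eranin", "Asellus Primus", "I Bootis"]  # stand-in list

consumer = KafkaConsumer("eddn-raw", bootstrap_servers="localhost:9092")
producer = KafkaProducer(bootstrap_servers="localhost:9092")

for record in consumer:
    msg = json.loads(record.value)
    name = msg["message"].get("systemName", "")
    match = difflib.get_close_matches(name, KNOWN_SYSTEMS, n=1, cutoff=0.8)
    if not match:
        continue  # can't clean it up -> drop it, or route to a review queue
    msg["message"]["systemName"] = match[0]
    producer.send("eddn-clean", json.dumps(msg).encode("utf-8"))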
Another alternative is crowd-solving the problem, as wolverine mentions. Several people will independently bang on this nail and come up with their own solutions. What would be ideal is if we could share those solutions and exchange translation data in some way.
So here's my wild idea. Why not leverage EDDN! Add a new schema type. Let the crowd provide translations for everyone. Something like:
Code:{ "$schemaRef": "http://schemas.elite-markets.net/eddn/commodity_transform/1", "header": { "uploaderID": "bob", "softwareName": "My Vaguely Cool Processing Backend", "softwareVersion": "v0.1.5.b" }, "message": { "typoLadenEntry": "Eran1n", "validEntry": "Eranin", "timestamp": "2014-11-17T12:34:56+00:00" } }
I have to sleep on this for a night before I can give you the answer you deserve.... Thanks for the input ;-). It IS appreciated.