Release EDDN - Elite Dangerous Data Network. Trading tools sharing info in a unified way.

Status
Thread Closed: Not open for further replies.
Whether or not kfsone and/or maddavo are on board or want to participate, the format of the code and trade data is completely open source, so if they continued their own way, it would be straightforward to integrate their data into a 'universal' (no pun intended) database.

I'm in - I like the idea of one central reference. It can be used for any trading tool/navigation tool, and the more tools, the more users, the more up-to-date data. And having up-to-date data is pretty important. I only wrote my market data share site as a quick hack because EMDN was down for TradeDangerous. My intention is to shut it down once something else (hopefully EDDN, I guess) makes it redundant.

One thing I have noticed is that data quality is also important. I imported Slopey's data a few times and found a fair bit of bad data, e.g. stock level and sell price swapped. There's only so much that automated parsing/sanity checks can do. This leads to some painful manual massaging of data which I would be pleased to avoid.

I would like EDDN to provide hooks for tools to obtain and update data for systems, stations, market prices.

What kind of server is required to run such a thing? Who is setting that up? And finally, when will it be up? :D
 
Who is setting that up? And finally, when will it be up?

Lots of junior devs around? :) I'm asking because I see a lot of talk and people being eager to start coding, which is a typical junior approach. The senior approach is to gather and write solid specs and technical documentation first, and then start with the implementation. So the first step should be to set up a wiki, document everything, and then start implementing. I would propose using a git repository on GitHub for that and using the Markdown functionality it offers. Changes and suggestions to the documentation can be made by cloning the repo, making a change and opening a pull request. This can then be discussed and accepted if the majority agrees to it. You can see an example of how this works here.

So the first things to discuss and agree on are the transport and the technologies to pick.

ZeroMQ vs RESTful

ZeroMQ is a little tougher to implement than REST in most languages; I think almost every language has a lib for HTTP requests or even a dedicated REST lib. ZeroMQ, on the other hand, is probably faster. Technically it would be possible to support both without much effort if the application architecture is well done: ZeroMQ/RESTful -> service layer -> model layer, for example.
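To make that layering concrete, here is a minimal, hypothetical Python sketch: the same service-layer function is fed by either a thin REST front end (Flask) or a thin ZeroMQ listener (pyzmq). The endpoint, field names and storage call are placeholders, not an agreed design.

```python
import json
import zmq                      # pyzmq binding
from flask import Flask, request, jsonify

def record_market_update(data):
    """Service layer: validate, then hand off to the model/storage layer (stubbed here)."""
    required = {"system", "station", "commodity", "buy_price", "sell_price"}
    if not required.issubset(data):
        raise ValueError("incomplete market record")
    # model_layer.save(data)  # whatever storage backend ends up being chosen
    return True

# Thin RESTful front end
app = Flask(__name__)

@app.route("/v1/market", methods=["POST"])
def market_update():
    record_market_update(request.get_json(force=True))
    return jsonify({"status": "ok"})

# Thin ZeroMQ front end: subscribe to a (hypothetical) upload stream
def zmq_listener(endpoint="tcp://localhost:9500"):
    ctx = zmq.Context()
    sock = ctx.socket(zmq.SUB)
    sock.connect(endpoint)
    sock.setsockopt_string(zmq.SUBSCRIBE, "")
    while True:
        record_market_update(json.loads(sock.recv_string()))
```

Either front end stays a few lines thick; all the real logic lives behind record_market_update(), so adding or dropping a transport later is cheap.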

Data format

I would suggest JSON; its footprint is much lower than XML's and, same argument as above, almost all languages can deal with it. However, if done properly it would be very easy to serve any other data format that is wanted: CSV or XML could be provided easily, and both would be just ~1h of implementation time if it is done right. But when sending data to the application there should be only one format.
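As a purely illustrative sketch (none of these field names are an agreed schema), here is a market message in JSON and the few extra lines it would take to serve the same data as CSV:

```python
import csv
import io
import json

# Hypothetical market message; the field names are placeholders, not a spec.
message = {
    "system": "Eranin",
    "station": "Azeban City",
    "timestamp": "2014-11-17T14:32:00Z",
    "commodities": [
        {"name": "Gold", "buy_price": 9401, "sell_price": 9372, "stock": 120, "demand": 0},
        {"name": "Fish", "buy_price": 0,    "sell_price": 410,  "stock": 0,   "demand": 850},
    ],
}

payload = json.dumps(message)   # what a client would send or receive
parsed = json.loads(payload)    # any language with a JSON lib can do this

# Serving the same data as CSV is only a few lines on top of the JSON structure.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "buy_price", "sell_price", "stock", "demand"])
writer.writeheader()
writer.writerows(parsed["commodities"])
print(buf.getvalue())
```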

Database

We won't ever get the whole galaxy mapped within our lifetime; by a rough calculation, 100 billion systems, each with a data set of ~15 KB, would take roughly 1.5 million GB (~1.5 PB) on disk. And I doubt we will reach that any time soon either. ;) I think the LHC and its experiments have already generated a lot more data than this.

The basic question would be whether we want to go for a relational DB or a non-relational (NoSQL) DB. There has been a pretty annoying hype about NoSQL DBs trying to generate the impression they are a complete replacement - which is wrong. As always, use the right tool for the job. We need to determine first whether we have a lot of relatively complex relations or not. Depending on that we can pick between Postgres and MongoDB, for example. After that step we can start designing the DB schema.
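If the data turns out to be mostly relational (systems have stations, stations have price listings), a prototype schema is quick to sketch and benchmark. This is only an illustration using SQLite; the table and column names are assumptions, not a proposal anyone has agreed on:

```python
import sqlite3

# Throwaway prototype schema for benchmarking; names are hypothetical.
conn = sqlite3.connect("eddn_prototype.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS systems (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE IF NOT EXISTS stations (
    id        INTEGER PRIMARY KEY,
    system_id INTEGER NOT NULL REFERENCES systems(id),
    name      TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS listings (
    station_id INTEGER NOT NULL REFERENCES stations(id),
    commodity  TEXT NOT NULL,
    buy_price  INTEGER,
    sell_price INTEGER,
    stock      INTEGER,
    demand     INTEGER,
    updated_at TEXT NOT NULL
);
""")
conn.commit()
```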

When in doubt, the best approach would be to build a prototype and benchmark it. I don't have much trust in pure speculation that one thing is faster than another without benchmarks. People like to make claims about their favorite toys but very often don't provide proof.

Server side

I would go for a PHP-based solution (using a framework) or Node.js.

Authentication

OAuth 2.0 and/or JWT tokens?
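A JWT-based flow would be cheap to prototype. Here is a minimal sketch using the PyJWT library, with a made-up secret and claim names purely to show the shape of it:

```python
import datetime
import jwt  # PyJWT

SECRET = "change-me"  # would come from server config, not from source code

# Issued by the API after a successful login / API-key exchange (claims are illustrative)
token = jwt.encode(
    {"sub": "uploader-42",
     "exp": datetime.datetime.utcnow() + datetime.timedelta(hours=12)},
    SECRET,
    algorithm="HS256",
)

# Verified on every authenticated request; raises if the signature or expiry is bad
claims = jwt.decode(token, SECRET, algorithms=["HS256"])
print(claims["sub"])
```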

API Design

We need to define endpoints and parameters for API calls. Also, the API should be versioned so that clients have an upgrade path and can keep using an old version while a new iteration of the API is rolled out. For example api.elitedata.org/v1/, api.elitedata.org/v2/...

Here are a few services I think the API should offer (see the client sketch after this list):

  • /market (buy / sell data)
  • /ships
  • /commodities
  • /components (ship parts or however you call them)
  • /systems
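From a client's point of view, consuming such a versioned API could look like the sketch below. The base URL is the placeholder from the example above, and the query parameters and response shape are invented for illustration:

```python
import requests

BASE = "https://api.elitedata.org/v1"   # placeholder domain from the example above

# Fetch market data for one station; parameter names and response shape are assumptions.
resp = requests.get(
    f"{BASE}/market",
    params={"system": "Eranin", "station": "Azeban City"},
    timeout=10,
)
resp.raise_for_status()
for listing in resp.json():
    print(listing["commodity"], listing["buy_price"], listing["sell_price"])
```

Bumping the path to /v2/ later leaves old clients untouched until they choose to upgrade.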


Unit tests

Whatever is decided, test that code! At least I won't join anything that doesn't use unit tests.
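For example, even the data sanity checks mentioned earlier in the thread (swapped stock level and sell price, and the like) are easy to pin down with a couple of tests. validate_listing() here is a hypothetical helper, not existing code; the tests run with pytest:

```python
def validate_listing(listing):
    """Hypothetical sanity check for an incoming market record."""
    required = {"system", "station", "commodity", "buy_price", "sell_price"}
    return (required.issubset(listing)
            and listing["buy_price"] >= 0
            and listing["sell_price"] >= 0)

def test_rejects_incomplete_record():
    assert not validate_listing({"system": "Eranin", "commodity": "Gold"})

def test_rejects_negative_prices():
    assert not validate_listing({"system": "Eranin", "station": "Azeban City",
                                 "commodity": "Gold", "buy_price": -1, "sell_price": 9000})

def test_accepts_complete_record():
    assert validate_listing({"system": "Eranin", "station": "Azeban City",
                             "commodity": "Gold", "buy_price": 9401, "sell_price": 9372})
```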

Open source

Whether or not to open the source of the application code is the big question. I would go for opening it; bug fixes, enhancements and contributions can then be made by everybody.

Who is going to pay for that infrastructure?

That is a serious question. I doubt a donate button alone will bring in enough revenue to run the servers for a long time. I know some people won't like this and will run away: require consumers of the API to display ads in their app and return that revenue to the API provider to cover the cost of keeping the API up and running. Feel free to come up with better suggestions. :)

---

By the way, I personally could come up with a prototype application that implements a RESTful API within a weekend - given that we have clear specifications to start with. I have some experience with PHP frameworks, AngularJS, jQuery, API design and a bunch of other things, plus the management side of development. I worked as a lead developer for the last few years until I switched some time ago to a smaller company to get a less stressful position and avoid burnout. :( However, since that switch I love writing code in my free time again, so let the planning begin! :)
 
Last edited:

wolverine2710

Tutorial & Guide Writer
TGC - The Great Collector

The desire for EDDN to hold more than just market data has come up a couple of times from multiple commanders. More data is of course better, but I like the Unix way: a tool does one thing and does it very well, and its output can be used as the input for another command (pipes). Imho EDDN should concentrate on dynamic data, in this case prices - at least for now. TGC is for static data, like 3D coords of star systems, stations, BM etc. The best thing: it's already amongst us!! TGC is alpha atm and will reach Beta soon. Update: it's live and in Beta, see this post.

What is TGC? With SB1, wtbw iirc made a tool to get 3D coords out of ED. After the data policy change he could no longer do that (in a legal way). To get those legally, a commander started the thread "What is the most efficient way to crowd source the 3D system coordinates for Beta 2?". The end result was that 5+ tools emerged which can calculate 3D coordinates from distances gathered from the Galaxy Map. It turned out that in the end a lot of time went into synchronizing everything: what has already been done, what still needs to be done. A commander then suggested TGC. Tornsouls took up the gauntlet and created TGC. It has NO UI, only a web API. Its sole purpose at first was to store distances and calculate 3D coordinates, with distances/coords submitted by the 5+ tools. Then of course more data was needed, like stations, BM etc. For more information see this post. And also this post, which is about minimizing the number of APIs. In the past IxForres had a very nice web API which also had EMDN data in it. After the policy change he found the whole situation suboptimal and shut down his API. It had nice stuff inside, like a route planner API. When something is missing in TGC we could ask Tornsouls if he could implement it. Or an author could create an API just for this goal, with a companion (web) application using TGC for the data it needs (coordinates, system names). In a way, Unix pipes. Note: Edsc by Tornsouls was mentioned by Thrudd. It is one of the 5+ tools to calculate coordinates. It ALSO is TGC.

I think for tool authors it's best to have as few APIs as possible. TGC should fit the bill for static data, EDDN for dynamic data. Just my 2 euro cents.

Edit: Can't recall if I asked IxForres to revive his API. As I said, aside from pure data he has some nice route planning stuff. Will send him a PM.

I started this thread just a few days before I have to go away for a few days. Going today and offline over the weekend (visiting friends in Germany). In hindsight not a clever date to start this thread. Will try my best to find internet access and follow and contribute to this thread.
 
Last edited:
Sounds neat.

Can't help but feel though that with a solution that relies on a central server/database, the costs to maintain that server would go up fast, possibly also leading to that server going down, taking every tool's usability down with it. And worst case, the maintainer of the server just disappears and shuts it down, and all the data is gone with it. This system requires a sole person/server to be at the epicenter of trust.

Could be I'm shamelessly plugging my own idea/thread, but that's genuinely how I feel about it. *shrug*
 
Ya know gents, I have a dedicated server that's not getting much use (it hosts a handful of sites and an IRC file bot); if you need a test bed, let me know.

Basic Stats:
Quad-core 3.2 GHz Xeon E3 CPU
32 GB of RAM
2 TB HDD space in RAID 1
250 Mbps full duplex, uncapped connection
Running CentOS 6.5 + Plesk 12
 

wolverine2710

Tutorial & Guide Writer
@burzum. Thanks for your input. Like I said, have to go away in a few hours and can't respond now. Will try my best to do that this evening.
@mike_art03a. Thanks for the offer man. It started with Pibbles (webspace/code), after that came burzum (webspace/code) and now you. It looks as if, at least for the start, we have webspace to test things out. To everyone: it's so much appreciated!!!
 
Last edited:

wolverine2710

Tutorial & Guide Writer
Sounds neat.

Can't help but feel though that with a solution that relies on a central server/database, the costs to maintain that server would go up fast, possibly also leading to that server going down, taking every tool's usability down with it. And worst case, the maintainer of the server just disappears and shuts it down, and all the data is gone with it. This system requires a sole person/server to be at the epicenter of trust.

Could be I'm shamelessly plugging my own idea/thread, but that's genuinely how I feel about it. *shrug*

For starters. Please let EVERYBODY present their ideas and plug the hell out of them ;-) Please do NOT hold back. The more info/methods we have, the better we can make EDDN. Perhaps a certain method is not suited for EDDN, but hey, at least we have discussed and researched it!!!!
I like to see this thread as an OPEN place where each and every idea (superb, crazy, viable, not viable) is very much welcome. The last thing I want to see happen is someone feeling uncomfortable presenting ideas here!!!! That would be the day we all lose.

I've talked to you about whether DHT could be used for the TradeDangerous .prices file. Hadn't had time to come back to you about that - my bad. Not sure if .prices is suited for it. It's all about syncing one decentralized .prices file, and when multiple users update the file at the same time I'm not sure that's going to work. Data could be overwritten/lost with unforeseen consequences. Maddavo is much more into TD than I am. Perhaps he can give the definitive answer on that.
Note: TGC has data which changes far less often than EDDN's data.

BUT using DHT to minimize bandwidth, among other purposes, sounds VERY interesting. I don't know enough about DHT to know if it's suitable for EDDN or some of the data sources, BUT it's certainly worth looking into. For example, should EDDN become more than just a data stream and store data in Elasticsearch (ideal for Kibana) or another database, then someone who wants to use Kibana (ELK stack) needs that data. If someone can't/won't run an EDDN listener continuously, they could get the data via, say, a data dump of the last day or week. For that, DHT could be ideal. Commanders, if you know more about DHT let's discuss this further; it seems a cool way to minimize bandwidth in certain situations. Mlow, thanks for your input, totally appreciated!!!
 
Last edited:

wolverine2710

Tutorial & Guide Writer
Whether or not kfsone and/or maddavo are on board or want to participate, the format of the code and trade data is completely open source, so if they continued their own way, it would be straightforward to integrate their data into a 'universal' (no pun intended) database.

Totally agree. That's one of the nice things about Trade Dangerous: it's open source and very well written (Python). Anyone can fork it and enhance it, for example for EDDN. Btw, Maddavo has already stated "I'm in". Haven't heard from Slopey yet. It would be absolutely great if he were in also, as in using EDDN (he used EMDN before) to get more data and uploading data to EDDN as well. The more data sources EDDN has, the better it will be. See also Metcalfe's law.
 
Last edited:

wolverine2710

Tutorial & Guide Writer
Probably my last post before I have to travel to Germany. It's about ZeroMQ. From their website:

Language bindings:
ZeroMQ comes with the low-level C API. High-level bindings exist in 40+ languages including Python, Java, PHP, Ruby, C, C++, C#, Erlang, Perl, and more.
For a complete list of language bindings have a look here.
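As an illustration of how small the Python binding is in practice, publishing one market snapshot takes only a handful of lines. The endpoint and message fields are made up for the example:

```python
import json
import zmq  # pyzmq, the Python binding mentioned above

ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind("tcp://*:9500")  # hypothetical upload endpoint

# A real uploader would keep the socket open and keep sending as prices change.
snapshot = {"system": "Eranin", "station": "Azeban City",
            "commodity": "Gold", "buy_price": 9401, "sell_price": 9372}
pub.send_string(json.dumps(snapshot))
```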
 
Last edited:
@wolverine: Glad to help, but to note, I'm no programmer, just a sysadmin. However, I don't mind getting things running to test.
 
Can't help but feel though that with a solution that relies on a central server/database, the costs to maintain that server would go up fast, possibly also leading to that server going down, taking every tool's usability down with it. And worst case, the maintainer of the server just disappears and shuts it down, and all the data is gone with it. This system requires a sole person/server to be at the epicenter of trust.

You can allow certain people to mirror the DB for redundancy, and performance is not really a problem in 2014. Have a load balancer and spawn additional instances as needed in your cloud on demand. That's basically what most bigger sites do. I'm not saying to write bad code, but today it is much, much cheaper to buy another server, some instances or processing power (depending on the service's business model) than to highly optimize your code in the early stages of development. Make it work, then make it work even better. I've just made an older existing site with 1.7 million hits per day run a lot better (~40%) by spending only two days analyzing it, adding some caching and doing some minor SQL query optimizations. :)

@burzum. Thanks for your input. Like I said, have to go away in a few hours and can't respond now. Will try my best to do that this evening.

No problem, but I'll be out from tomorrow afternoon until Sunday late evening. So I might not answer before Sunday late night.
 
Last edited:
I'm all for separation of concerns here...

I think it's going to be a mistake to try and make a single all-powerful Thing that holds all the data ever... the original designs for EMDR/EMDN were deliberately restricted to just being the "firehose" of data. There's no central database; if you want to store the data, you can store just the bits relevant to your use-case, in the way that makes sense for your application. If people want a centralised database that holds data, they can have that, but I'd avoid trying to bundle those requirements with the EDDN infrastructure. It's entirely implementation-agnostic: one can write components of the system in any language one chooses, and there are ZeroMQ bindings for most languages out there (Python and PHP are two that I've used).

http://www.eve-emdr.com/en/latest/overview.html is a good reference for how this architecture was designed.

[edit] And just to clarify - this doesn't preclude anyone from writing a persistence layer to provide a REST API for the data; it just compartmentalises it away from the logistics of collecting and sharing the data in the first place.
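The relay part of that architecture is tiny. Here is a hedged sketch of an EMDR-style relay in Python (the addresses are invented); a persistence layer or REST front end would simply be one more subscriber downstream:

```python
import zmq

ctx = zmq.Context()

# Subscribe to an upstream gateway (address is hypothetical)
upstream = ctx.socket(zmq.SUB)
upstream.connect("tcp://gateway.example.org:8500")
upstream.setsockopt_string(zmq.SUBSCRIBE, "")

# Re-publish everything to local subscribers: trade tools, loggers,
# or a persistence layer that backs a REST API.
downstream = ctx.socket(zmq.PUB)
downstream.bind("tcp://*:9500")

while True:
    downstream.send(upstream.recv())
```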
 
Last edited:
Seriously, there is no need for ZeroMQ. Just stick to JSON, agree on a structure, and then you can create a RESTful web service that can handle star data, stations, black markets, commodities, you name it. Done this way, you can have as many front ends as you like, and a public API for all to use until Frontier release an API themselves.

Good luck, commanders!

PS: If you're using .NET, ServiceStack makes it incredibly easy to write config-free, scalable web services and you don't have to worry about the WCF crap. :)
 
Seriously, there is no need for ZeroMQ. Just stick to JSON, agree on a structure, and then you can create a RESTful web service that can handle star data, stations, black markets, commodities, you name it. Done this way, you can have as many front ends as you like, and a public API for all to use until Frontier release an API themselves.

I think you're confusing layers. ZeroMQ is not an alternative to JSON. ZeroMQ is a way of efficiently moving the JSON from the players who create it, to all those who want to consume it, including those who want to provide a RESTful web service interface to the data. Having the latter as a centralised single point of failure isn't a very good model for this sort of system, nor does it scale as easily (and remember, in this game, scale is quite an important thing!)
 
I think you're confusing layers. ZeroMQ is not an alternative to JSON. ZeroMQ is a way of efficiently moving the JSON from the players who create it, to all those who want to consume it, including those who want to provide a RESTful web service interface to the data. Having the latter as a centralised single point of failure isn't a very good model for this sort of system, nor does it scale as easily (and remember, in this game, scale is quite an important thing!)

Perhaps I worded myself badly (my apologies), as I wasn't suggesting ZeroMQ be a replacement :) If a concerted few were willing, it'd not cost much to pick a cloud system (Amazon/Azure, to name but two), and host it there. It really isn't that expensive, and you can scale very well.

Personally I'd not be overly concerned with the database at this juncture. Millions of rows, properly indexed, can still be queried incredibly fast, and for the majority of what is needed, that'll be fine. 400,000,000,000 systems... well, I guess you'd scale that problem when you come to it :)
 
Database

The basic question would be whether we want to go for a relational DB or a non-relational (NoSQL) DB. [...] Depending on that we can pick between Postgres and MongoDB, for example. After that step we can start designing the DB schema.

Server side

I would go for a PHP-based solution (using a framework) or Node.js.

Still working on a relational database structure that would work efficiently here. Personally, would go for SQL Server and ASP.NET/MVC - just because that's what I'm familiar with. Would initially expose an API using a 'classic' SOAP-based web service, all nice and bog-standard, and relatively simple and quick to implement. Would look to a more optimised API using JSON, but would need to research it a bit first.
 
ZeroMQ is the transport layer here. Instead of everyone getting data from everyone, they listen to one place for data, and everyone that has data to share puts it in one place. For market data this is perfect. Market data is not persistent data; if you want historical data, you have to store it yourself. The tools that want to use this just have to keep an ear to that firehose.

If a tool or two want to share or import historical data, that is a very different thing, and something the community might look into doing for TGC.
 