An excellent post with a lot of good points; please bear with me as I address them one by one.
One of the problems I experienced with EMDN was sending one commodity per message. It's just not a good pattern. Among other things, it makes it hard to tell whether a gap in the data is due to loss, omission or absence.
```
+ Chemicals
    Explosives      177    190   ?   21561H   2014-12-17 13:16:47
    Mineral Oil     247      0   ?        -   2014-12-17 13:16:47
```
There's no line for Hydrogen. Was it:
- not updated?
- unavailable?
- lost data?
There are performance reasons for sending updates as batches rather than one commodity at a time, too.
Good points, though others have expressed a preference for the single-commodity approach. I'm definitely leaning towards just allowing an array of commodities in a single message.
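The reasoning above can be sketched in a few lines. This assumes a hypothetical message shape in which one message carries a station's complete commodity listing; the field names (`stationName`, `commodities`, `buyPrice`, `sellPrice`) and the `classify` helper are illustrative, not the actual EDDN schema. Because the listing is complete, a recipient can report a commodity that has disappeared, like the missing Hydrogen line, as absent rather than wondering whether it was lost or simply not updated.

```python
from typing import Dict, List, Set

# Hypothetical batched message: one station, every commodity on the market
# screen at that moment. Field names are illustrative only.
message = {
    "stationName": "Thingy Port",
    "timestamp": "2014-12-17T13:16:47+00:00",
    "commodities": [
        {"name": "Explosives", "buyPrice": 190, "sellPrice": 177},
        {"name": "Mineral Oil", "buyPrice": 0, "sellPrice": 247},
    ],
}


def classify(previous: Set[str], batch: List[Dict]) -> Dict[str, str]:
    """Compare a complete batch against the commodities seen last time.

    The batch is assumed to be the whole market listing, so a name that
    was present before but is missing now is reported as "absent" instead
    of being guessed at as lost or stale.
    """
    current = {item["name"] for item in batch}
    result = {name: "present" for name in current}
    for name in previous - current:
        result[name] = "absent"
    return result


previous_names = {"Explosives", "Hydrogen Fuel", "Mineral Oil"}
print(classify(previous_names, message["commodities"]))
```

With one commodity per message, the same check is impossible: a missing Hydrogen message looks identical whether it was dropped, never sent, or the commodity is no longer traded.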
But the really big problem that EDDN is going to experience is that there is no central authority for systems, stations, etc., so the network will rapidly fill up with data from overlapping but non-identical sources. Especially with divergent tools involved, you'll be getting "THINGY PORT", "Thingy Port" and "thingyport".
Nail head, hammer. That's a problem for every single third-party tool for Elite: Dangerous right now. It's also a problem that I'm loath to try to solve in EDDN: however horrible using strings as keys might be, frankly the thought of creating and maintaining a shared mapping from numeric ID to commodity type is terrifying! If FD released a static data dump containing such a mapping, I would jump at the chance to use it. For now, I'm not sure there's much any of us can practically do; CMDRKNac suggests some validation against "known good" lists from TCG, but that of course risks being incomplete or inaccurate...
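Short of a shared ID mapping, one low-cost mitigation for the "THINGY PORT" / "Thingy Port" / "thingyport" problem is for each consumer to compare names through a normalised key. The sketch below is an assumption about how a client might do that, not something EDDN does, and `name_key` is a hypothetical helper. It only folds case and strips whitespace and punctuation, so it cannot catch genuine misspellings.

```python
import unicodedata


def name_key(name: str) -> str:
    """Build a comparison key for a system, station or commodity name.

    Applies Unicode NFKC normalisation, folds case, and keeps only letters
    and digits, so cosmetic differences in spacing, case and punctuation
    collapse to the same key.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


variants = ["THINGY PORT", "Thingy Port", "thingyport"]
print({name_key(v) for v in variants})  # {'thingyport'}
```

The key is meant for matching only; a client would still store and display the original spelling it received.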
You can also anticipate getting really awful time data. You'll be receiving random timezones, random accuracy (expect future timestamps and ancient timestamps on a regular basis), and the occasional human error in the timestamp field.
Oh, yes. Already people are submitting timestamps with no timezone and without first normalising them to UTC, so we're seeing a wide range of interpretations of what "now" is... That's part of the reason the gateway adds a gatewayTimestamp - so there's at least one reliable timestamp within the message.
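A consumer can use the gateway's timestamp as the reference point when deciding whether to trust an uploader's own timestamp. The following is a minimal sketch, assuming both timestamps arrive as ISO 8601 strings with an explicit offset (e.g. `+00:00`), and that a timestamp with no offset should be treated as unreliable rather than guessed at. The `checked_timestamp` name and the ten-minute `max_skew` tolerance are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def checked_timestamp(
    uploader_ts: str,
    gateway_ts: str,
    max_skew: timedelta = timedelta(minutes=10),
) -> Optional[datetime]:
    """Return the uploader's timestamp in UTC, or None if it looks unreliable.

    The gateway timestamp is assumed to be trustworthy and to carry a UTC
    offset. The uploader's timestamp is rejected if it cannot be parsed,
    has no offset, or differs from the gateway's by more than max_skew in
    either direction (future or stale).
    """
    gateway = datetime.fromisoformat(gateway_ts).astimezone(timezone.utc)
    try:
        uploaded = datetime.fromisoformat(uploader_ts)
    except ValueError:
        return None  # malformed, e.g. a typo in a hand-entered field
    if uploaded.tzinfo is None:
        return None  # no offset: "now" in an unknown timezone
    uploaded = uploaded.astimezone(timezone.utc)
    if abs(gateway - uploaded) > max_skew:
        return None  # too far in the future or the past
    return uploaded


gw = "2014-12-17T13:17:00+00:00"
print(checked_timestamp("2014-12-17T08:16:47-05:00", gw))
print(checked_timestamp("2014-12-17 13:16:47", gw))
```

The first call is accepted and converted to 13:16:47 UTC; the second is rejected because it carries no timezone.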
You've also chosen fairly wordy JSON. I did some preliminary work creating a shareable JSON format for TD (http://kfs.org/td/source -> misc/prices-json-expr.py), which sends the item table first as a dictionary (so I can use a locally-scoped set of IDs for items that the recipient can easily translate once, saving a ton of text processing work). But I trimmed hundreds of KB from a relatively small prices file by using terse names for the fields and by judicious use of arrays.
It's not just bandwidth I'm concerned about there; as the system grows and more users begin submitting prices, the cost of processing all those JSON fields goes up too. It's easy to think of this as a distributed system, but it's not distributed processing - each endpoint is doing a complete workload.
The current format was designed to be largely compatible with the short-lived predecessor of EDDN, to minimise the amount of work existing client implementations would need to do to get started. I'm not at all averse to working towards a v2 of the format, along with schemas for other messages as demand arises. I'm not too worried about bandwidth, since everything from the gateway onwards is compressed.
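Both positions can be checked directly: a terse encoding (an item table sent once, then positional arrays indexed into it, as described for TD above) shrinks the raw JSON, while compression narrows the gap on the wire but not the parsing work each endpoint does after decompressing. The sketch below uses made-up sample data and hypothetical field names; it is not the TD `prices-json-expr.py` format or the EDDN schema, and zlib stands in for whatever compression the gateway applies. Exact byte counts depend on the data.

```python
import json
import zlib

# Made-up sample market: (commodity, sell price, buy price, supply).
ROWS = [(f"Commodity {i}", 100 + i, 110 + i, 1000 * i) for i in range(60)]


def verbose(rows):
    """One self-describing object per commodity, with long field names."""
    return json.dumps({"commodities": [
        {"itemName": n, "sellPrice": s, "buyPrice": b, "stationStock": q}
        for n, s, b, q in rows
    ]})


def terse(rows):
    """Item table first, then positional arrays that index into it."""
    names = sorted({row[0] for row in rows})
    ids = {name: i for i, name in enumerate(names)}
    body = {"i": names, "p": [[ids[n], s, b, q] for n, s, b, q in rows]}
    return json.dumps(body, separators=(",", ":"))


def decode_terse(text):
    """Recipient side: translate the local IDs back to names once per message."""
    data = json.loads(text)
    names = data["i"]
    return [(names[i], s, b, q) for i, s, b, q in data["p"]]


for label, text in (("verbose", verbose(ROWS)), ("terse", terse(ROWS))):
    raw = text.encode("utf-8")
    print(f"{label}: {len(raw)} bytes raw, {len(zlib.compress(raw))} bytes zlib")
```

Compression tends to recover much of the repeated field names' cost in transit, but every subscriber still has to decompress and parse the full verbose text, which is the per-endpoint processing cost raised above.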
With regard to author IDs: you could just provide a token mechanism. The user goes to the website and provides their email address; you send that address a token and store the email + token in a database; and when a submission arrives with that token, you put the database ID (i.e. neither of the other two fields) into the outgoing zmq messages.
Managing such a database isn't something I want to do, plus that would make any future replication/distribution far more hassle than it needs to be. The only use-case I'm envisaging here is for clients that want to impose selective filtering on the messages they receive (e.g. to filter out any from a source they perceive as untrustworthy) - though I'm not sure that in practice anyone would actually bother. It sounds like an application-specific uploaderID plus an IP-based hash would cover CMDRKNac's case of a central server posting messages from multiple users, as well as the case of individual uploaders.
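A minimal sketch of the application-specific uploaderID plus IP-based hash idea, assuming the gateway holds a secret salt so the hash cannot be reversed by brute-forcing the IPv4 address space. The `source_tag` function, the `GATEWAY_SALT` handling and the `uploaderID:digest` output format are hypothetical; HMAC-SHA256 is used simply as a keyed hash available in the standard library.

```python
import hashlib
import hmac
import secrets

# Hypothetical gateway-side secret; rotating it would change every tag.
GATEWAY_SALT = secrets.token_bytes(32)


def source_tag(uploader_id: str, ip_address: str, salt: bytes = GATEWAY_SALT) -> str:
    """Combine the client-supplied uploaderID with a keyed hash of the IP.

    The raw IP never leaves the gateway; subscribers only see a short,
    stable digest they could use to filter out sources they distrust.
    """
    digest = hmac.new(salt, ip_address.encode("ascii"), hashlib.sha256).hexdigest()
    return f"{uploader_id}:{digest[:16]}"


# A central server relaying for two users: same IP, distinct uploaderIDs.
print(source_tag("SomeTool/CMDR_A", "203.0.113.5"))
print(source_tag("SomeTool/CMDR_B", "203.0.113.5"))
```

The two tags share the same digest suffix (same server) but differ in uploaderID, which covers both the relaying-server case and the individual-uploader case without the gateway storing any per-user records.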
Have you considered using ZeroMQ to receive the messages?
I hadn't. Is that something that would be useful?