In-Development TradeDangerous: power-user trade optimizer

If anyone has any improvements to make the thing run faster, I'd appreciate it.
Here's a thought. You tend to run through each database twice: the first pass is just to get the total count so you can update a progress meter. Perhaps time that pass and see what fraction of the total update it is. If it's anything more than 10% or so, skip it and instead keep a line count, posting an update every 1,000 or 10,000 lines. Yes, we won't know when it will finish, but we'll still see progress, and we'll save at least 10% of the time :)
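
Roughly what I mean, as a sketch (handle_line just stands in for whatever per-line work the import actually does):

def import_listings(path, handle_line, report_every=10000):
    # Single pass: no pre-count, just a periodic progress update.
    processed = 0
    with open(path, encoding='utf-8') as f:
        for line in f:
            handle_line(line)
            processed += 1
            if processed % report_every == 0:
                print("Processed {:,} lines...".format(processed), end='\r')
    print("\nProcessed {:,} lines total.".format(processed))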
 

Thanks for that. I'll see about implementing it in the plugin.

In other news: the reason I haven't written the code for creating the RareItem database is that it's super complicated. The RareItems are in the commodities json with the regular items, and that file doesn't provide any station data. So in order to find which station a rare item is in (that is, to get the system and station names), I have to search the listings file for the id of the item to find out what the station id is, look up that station id in the Station database to get the system id, and then look up the system id in the System database. Also, I have no idea how to figure out if a Rare is illegal or suppressed; there don't seem to be any matching fields in any of the eddb files.
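
In code, that lookup chain looks roughly like this (just a sketch, assuming the usual eddb field names like commodity_id, station_id, and system_id; the real thing would have to do this for every rare, not one item):

import csv, json

def find_station_for_item(item_id, listings_path, stations_path, systems_path):
    # 1) listings.csv: find a station that lists this commodity id.
    station_id = None
    with open(listings_path, encoding='utf-8') as f:
        for row in csv.DictReader(f):
            if int(row['commodity_id']) == item_id:
                station_id = int(row['station_id'])
                break
    if station_id is None:
        return None
    # 2) stations.jsonl: station id -> station name and system id.
    station = None
    with open(stations_path, encoding='utf-8') as f:
        for line in f:
            candidate = json.loads(line)
            if candidate['id'] == station_id:
                station = candidate
                break
    if station is None:
        return None
    # 3) systems_populated.jsonl: system id -> system name.
    with open(systems_path, encoding='utf-8') as f:
        for line in f:
            system = json.loads(line)
            if system['id'] == station['system_id']:
                return system['name'], station['name']
    return None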

Other than that, I think this plugin is pretty much done. I'd love to make more speed improvements, but honestly, taking between 10-25 minutes once a day isn't horrible, and since the eddb data dumps only happen once per day, there's literally no need to run this program any more often than that. I'm sure there are bugs just waiting to be found (I'm a programmer, there are always more bugs), but until those bugs get found, there's not much I can do about them. :)

I'm going to give it a few days or so for you guys to test it and whatnot, and then I think I'm going to talk to bgol about adding it to his(her?) bitbucket repo.
 
Hi Eyeonus, I might have found one of those bugs. Using TD, I seem to be getting problems when I try to limit trade runs to stations with large landing pads. I kept getting told that the station I am in does not fit the landing pad criteria because it is shown as “?”. That led me to check my station.csv file, and it turns out that every one of the stations has its pad size shown as “?”.

I suppose I could have mucked something up in the original conversion so I ran the plugin again using -O clean but got the same result.
 
Bernd (bgol) is a he, if I am not mistaken. If that doesn't work, I have a fork of Bernd's repository in which I implemented the change to penalty which is sitting in the pull request, so I can probably add you as a team member to that. But it makes more sense to stay with Bernd, as he knows the code much much better than I do :)
 
Hi Eyeonus, I might have found one of those bugs. Using TD, I seem to be getting problems when I try to limit trade runs to stations with large landing pads. I kept getting told that the station I am in does not fit the landing pad criteria because it is shown as “?”. That led me to check my station.csv file, and it turns out that every one of the stations has its pad size shown as “?”.

I suppose I could have mucked something up in the original conversion so I ran the plugin again using -O clean but got the same result.

Nope, that's a bug. Fixing it now, I'll have the update uploaded in a few minutes.
 
Plugin updated.

Changelog:
Added -www debug events to the table updating that will spew out each item that is about to be inserted into the database, exactly as it will be inserted. This should make it easier to find bugs similar to the all-stations-are-?-pad-size bug.
Fixed the all-stations-are-?-pad-size bug.
Converted the line reader for the progress reports to use the block reader method.
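
For reference, the block reader way of getting a line count looks roughly like this (a sketch of the general technique rather than the plugin's exact code); it's much faster than iterating line by line when all you need is a total for the progress meter:

def count_lines(path, block_size=1048576):
    # Count newlines by reading the file in large binary blocks.
    count = 0
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            count += block.count(b'\n')
    return count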

Plugin is in the same location as before, but just so you don't have to go back to the older posts in the thread:
EDDBlink TD plugin
 
Hello eyeonus,

I've just returned from a long absence from ED and always used to use TD (and helped bugfix a few early releases). I was really disappointed to see it had gone by the wall and very heartened to see that someone has picked up the pieces. Nice job. I'll be giving this a try for sure.

Edit:

I've seen what you are doing. Can I ask some dumb questions?

Why have you chosen a daily EDDB pull, rather than set up an EDDN listener?
(I'm guessing available webspace, resources, bandwidth, programming time or a combination.)
If you had some additional resources (hosting, bandwidth, etc) would you be interested in attempting to recreate a Maddavo style site from the EDDN feed?

I know people have talked about being unable to get a hold of Maddavo's original webcode and that he isn't responding to requests. Has anyone tried emailing or calling Dave Kok @ davek.com.au?
It's possible he knows (or is) maddavo and may be able to help from that angle. Not that I want in any way to detract from the excellent work you are doing, just trying to think of ways to keep the data fresh.
 
To note (Python 3.6.5)

I had to add @ line 15
import urllib.request
(AttributeError: module 'urllib' has no attribute 'request')

and install the request package (using pip) before I could run -P eddblink -O clean.

That's from a fresh, clean Python install on Windows.
 
Hello eyeonus,

I've just returned from a long absence from ED and always used to use TD (and helped bugfix a few early releases). I was really disappointed to see it had gone by the wall and very heartened to see that someone has picked up the pieces. Nice job. I'll be giving this a try for sure.

Edit:

I've seen what you are doing. Can I ask some dumb questions?

Why have you chosen a daily EDDB pull, rather than set up an EDDN listener?
(I'm guessing available webspace, resources, bandwidth, programming time or a combination.)
If you had some additional resources (hosting, bandwidth, etc) would you be interested in attempting to recreate a Maddavo style site from the EDDN feed?

I know people have talked about being unable to get a hold of Maddavo's original webcode and that he isn't responding to requests. Has anyone tried emailing or calling Dave Kok @ davek.com.au?
It's possible he knows (or is) maddavo and may be able to help from that angle. Not that I want in any way to detract from the excellent work you are doing, just trying to think of ways to keep the data fresh.

I think my OP about this plugin should answer most of your questions:
...When the plugin is run (depending on the chosen options), it will download the needed daily dumps from eddb and process them to create all the tables in the TradeDangerous database, including market data, and will generate a .prices file from the listings.csv

This won't get the most up-to-the-minute market data: since the eddb dumps happen once per day, it gets everything except the data submitted since the latest dump, which means the data is never more than 24 hours behind the submission.
It does check whether the data already entered is newer than the data from eddb, so it won't overwrite more recent data gathered by the user (e.g. with "trade.py import -P edapi -O eddn").

I'm working on a user-side eddn listen server next, so that this plugin and the listen server combined fully replace Maddavo's site....

If anyone is willing to mirror the eddb files so as not to stress the eddb server, that'd be awesome, let me know the URL and I'll update the plugin to use that instead....
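
(For the curious, the "won't overwrite newer data" check mentioned there boils down to comparing modified timestamps before writing. A simplified sketch only; the table and column names here are illustrative, not necessarily TD's exact schema:)

import sqlite3

def update_if_newer(db_path, station_id, item_id, price, modified):
    # Keep whichever row has the later 'modified' timestamp.
    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "SELECT modified FROM StationItem WHERE station_id = ? AND item_id = ?",
        (station_id, item_id))
    row = cur.fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO StationItem (station_id, item_id, price, modified) "
            "VALUES (?, ?, ?, ?)",
            (station_id, item_id, price, modified))
    elif row[0] < modified:
        conn.execute(
            "UPDATE StationItem SET price = ?, modified = ? "
            "WHERE station_id = ? AND item_id = ?",
            (price, modified, station_id, item_id))
    conn.commit()
    conn.close()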

The problem with Maddavo's server is that it went down, and its going down made TD practically useless (unless you only care about the data you've gathered yourself, that is).

Making a replacement for the site isn't really a good idea, IMO, because if that goes down, we're right back to having the same problem.

So, my idea was to pull from the EDDB dumps as a starting point, as that's something that's not going to go down anytime soon, and have a client-side EDDN listener to pull the shared data between dumps. The two bits combined would completely replace the functionality of Maddavo's site, without needing to worry about the "new" site going down as well.

Run the plugin once to build the initial database, run the local EDDN listener to start pulling the updates, and after a day run the plugin one more time to get the data that got updated between the time of the initial dump and the start of the listener. IF you have the listener running 24/7, you'd never need to run the plugin again, and if you don't, the plugin will at least get all the updates that occurred between the time you shut the listener down and the most recent eddb dump.
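
In terms of commands, that workflow would look something like this (the listener doesn't exist yet, so its name here is only a placeholder):

trade.py import -P eddblink -O clean (once, to build the initial database)
python eddblink_listener.py (the hypothetical listener, left running)
trade.py import -P eddblink -O all (the next day, to fill the gap between the dump and the listener start)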

It'd be nice to have a host for the eddb dumps so as not to stress the eddb server, but that requires someone with the means and know-how to create a server that automatically fetches the dumps from eddb and puts them up for download. (No one's volunteered and I myself have neither.) That said, even if someone did host a mirror, I'd still have the plugin use eddb itself as a backup in the event the mirror goes down for whatever reason.
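
(For anyone considering it, the mirroring side really could be as small as a script run from a daily cron job; a sketch only, with the URL and paths as placeholders to be checked against eddb's actual archive location:)

import pathlib, urllib.request

BASE_URL = "https://eddb.io/archive/v6/"   # placeholder; use eddb's real archive URL
OUT_DIR = pathlib.Path("/var/www/eddb-mirror")   # wherever the web server's root is
FILES = ["commodities.json", "listings.csv", "modules.json",
         "stations.jsonl", "systems_populated.jsonl"]

for name in FILES:
    urllib.request.urlretrieve(BASE_URL + name, str(OUT_DIR / name))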
 
To note (Python 3.6.5)

I had to add @ line 15
import urllib.request
(AttributeError: module 'urllib' has no attribute 'request')

and install the request package (using pip) before I could run -P eddblink -O clean.

That's from a fresh, clean Python install on Windows.

I don't think you need to add the import, just the pip install.

I built this in py3.4; I guess the request module doesn't install by default in py3.6?

I don't know if there's any way to have the plugin itself download the module if it's not already installed, but I'll look into it.
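
(If it comes to that, the usual trick is to try the import and shell out to pip if it fails; a rough sketch, using 'requests' as the example module name since it isn't clear yet exactly which module was missing:)

try:
    import requests
except ImportError:
    import subprocess, sys
    # Install the missing module with the same interpreter, then retry.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])
    import requests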

EDIT: According to everything I looked at, urllib.request is included in Python3, regardless of version. Just to make sure, I also downloaded the "Windows x86-64 embeddable zip file" for 3.6.5 and looked at the code in it, and it does include that package. No idea what went wrong with your installation, but I can tell you that you don't need to add that import line to the plugin, you just need to have the module installed.
 
Making a replacement for the site isn't really a good idea, IMO, because if that goes down, we're right back to having the same problem.

So, my idea was to pull from the EDDB dumps as a starting point, as that's something that's not going to go down anytime soon, and have a client-side EDDN listener to pull the shared data between dumps. The two bits combined would completely replace the functionality of Maddavo's site, without needing to worry about the "new" site going down as well.

Run the plugin once to build the initial database, run the local EDDN listener to start pulling the updates, and after a day run the plugin one more time to get the data that got updated between the time of the initial dump and the start of the listener. IF you have the listener running 24/7, you'd never need to run the plugin again, and if you don't, the plugin will at least get all the updates that occurred between the time you shut the listener down and the most recent eddb dump.

I didn't catch this point on my first read through. Excellent and elegant.

Converted the line reader for the progress reports to use the block reader method.
Did you notice a speedup?
 
I don't think you need to add the import, just the pip install.

I built this in py3.4; I guess the request module doesn't install by default in py3.6?

I don't know if there's any way to have the plugin itself download the module if it's not already installed, but I'll look into it.

If it were me, I wouldn't go that far, just a note in the documentation. As it happens, when python couldn't find it, it actually offered to run pip for me and deal with the problem - being me I just narrowed my brow and checked before doing it myself.

EDIT: According to everything I looked at, urllib.request is included in Python3, regardless of version. Just to make sure, I also downloaded the "Windows x86-64 embeddable zip file" for 3.6.5 and looked at the code in it, and it does include that package. No idea what went wrong with your installation, but I can tell you that you don't need to add that import line to the plugin, you just need to have the module installed.

Possibly installing the module makes it work like that. The first bug I hit was lack of urllib.request and only after fixing that did it demand the module, so it's always possible that had I done it the other way around... one sec... *does some testing*

Ok, yes. Now that I have the module installed, it no longer specifically requires that import. So, just a documentation note that Python 3.6 probably needs that module installed before attempting the import.
 
I think my OP about this plugin should answer most of your questions:
So, my idea was to pull from the EDDB dumps as a starting point, as that's something that's not going to go down anytime soon, and have a client-side EDDN listener to pull the shared data between dumps. The two bits combined would completely replace the functionality of Maddavo's site, without needing to worry about the "new" site going down as well.

Or a more transparent set of source code publicly available somewhere. The project wasn't short of people willing to pick up the hosting, just short on the code itself.

I like the idea of the EDDN listener (which I too missed in your OP), though perhaps less so the idea of running my PC 24/7... Soo...

Feature request - the EDDN listener (optionally) dumps to a file, with entries older than 24 hours automatically trimmed (to save size and not duplicate EDDB).
That can then easily be translated to running on a PC the way you are thinking, or put on a server as a download if someone can be bothered (potentially me, unless my former boss gets stressed about bw usage).
This gives us the best of both worlds: live data if you want it, a fallback if any given website is broken, plus something approaching Maddavo's old 2d-prices.
Presumably this is all platform-independent Python, so it runs fine on Linux/FreeBSD/*nix as well as M$.
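
Roughly what I have in mind, as a sketch (pyzmq for the EDDN connection; the file name, format, and trim interval are just placeholders):

import json, time, zlib
import zmq   # pyzmq

EDDN_RELAY = "tcp://eddn.edcd.io:9500"
DUMP_FILE = "eddn_last24h.jsonl"
MAX_AGE = 24 * 60 * 60   # seconds

def trim_old_entries():
    # Rewrite the dump file, keeping only entries from the last 24 hours.
    cutoff = time.time() - MAX_AGE
    try:
        with open(DUMP_FILE, encoding="utf-8") as f:
            kept = [ln for ln in f if json.loads(ln)["received"] >= cutoff]
    except FileNotFoundError:
        return
    with open(DUMP_FILE, "w", encoding="utf-8") as f:
        f.writelines(kept)

def listen():
    sub = zmq.Context().socket(zmq.SUB)
    sub.setsockopt_string(zmq.SUBSCRIBE, "")
    sub.connect(EDDN_RELAY)
    received = 0
    while True:
        # EDDN messages arrive as zlib-compressed JSON over ZeroMQ pub/sub.
        message = json.loads(zlib.decompress(sub.recv()))
        with open(DUMP_FILE, "a", encoding="utf-8") as f:
            f.write(json.dumps({"received": time.time(), "message": message}) + "\n")
        received += 1
        if received % 1000 == 0:
            trim_old_entries()

listen()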

It'd be nice to have a host for the eddb dumps so as not to stress the eddb server, but that requires someone with the means and know-how to create a server that automatically fetches the dumps from eddb and puts them up for download. (No one's volunteered and I myself have neither.) That said, even if someone did host a mirror, I'd still have the plugin use eddb itself as a backup in the event the mirror goes down for whatever reason.

Again, maybe I could, dependent upon bw usage. Not sure how many people use TD - probably not many, so that doesn't scare me, but the thought of being used as a generic EDDB mirror by the community at large might get me in trouble. Keeping a "last 24 hours" file is less worrying, as that really would likely only be useful to the TD people.
 
Or a more transparent set of source code publicly available somewhere. The project wasn't short of people willing to pick up the hosting, just short on the code itself.

I like the idea of the EDDN listener (which I too missed in your OP), though perhaps less so the idea of running my PC 24/7... Soo...

Feature request - the EDDN listener (optionally) dumps to a file, with entries older than 24 hours automatically trimmed (to save size and not duplicate EDDB).
That can then easily be translated to running on a PC the way you are thinking, or put on a server as a download if someone can be bothered (potentially me, unless my former boss gets stressed about bw usage).
This gives us the best of both worlds: live data if you want it, a fallback if any given website is broken, plus something approaching Maddavo's old 2d-prices.
Presumably this is all platform-independent Python, so it runs fine on Linux/FreeBSD/*nix as well as M$.

Again, maybe I could, dependent upon bw usage. Not sure how many people use TD - probably not many, so that doesn't scare me, but the thought of being used as a generic EDDB mirror by the community at large might get me in trouble. Keeping a "last 24 hours" file is less worrying, as that really would likely only be useful to the TD people.

First off, keep in mind that you wouldn't NEED to keep your PC running 24/7. Not running it 24/7 means some data might not get updated until the next EDDB dump, but even without running the listen server at all, you're still guaranteed to have the data within 24 hours of it being uploaded. The only difference between running the server and not running the server is that without the server running, you'll have to get the latest dump from EDDB to have all the data. (That is, with the listener running, the first EDDB dump that occurs AFTER the start time of the listener is the last dump you'll need to get as long as the listener is running.)

So, if you're going to go on vacation for a few weeks and decide to shut down the PC, it just means that you'll have to do an import using the EDDBlink plugin to bring your local data up to date and then restart the listener.

As far as a server listener goes, that's not a bad idea. My plan is to make the client-side listener write directly to the TD database, so once I have that coded, all that would need to happen is to have an instance of TD on the server, with the EDDBlink plugin, and an instance of the listener. Then it would be:
1) Run EDDBlink with '-O clean'.
2) Start the listener.
3) Run EDDBlink with '-O all' once the dumps are updated, to get the data between the last dump and the listener start.
4) Trigger TD to re-export all the DB files on a timed basis, maybe every 2 hours or so, and provide them for download.
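
Step 4 could be as simple as a small loop (or a cron job). A sketch, assuming TD's export command is what actually writes the files out; the exact invocation may well differ:

import subprocess, time

EXPORT_EVERY = 2 * 60 * 60   # seconds; "every 2 hours or so"

while True:
    # Re-export the DB files so they can be served for download.
    subprocess.run(["python", "trade.py", "export"], check=False)
    time.sleep(EXPORT_EVERY)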

From that point, it'd be a simple matter to write some code that downloads the files from the server and triggers a re-cache in TD. Providing all the files (all the .csv's and the .prices file) comes out to about 993MB. Not all the files need to be updated all the time, however. The .prices file (319MB on my PC right now) would need to be updated pretty constantly, but the other ones wouldn't need to be very often.

And of course, users would still have the option of dumping from eddb (or an eddb mirror if/when that exists) using the plugin and running the listener locally if they don't want to have to depend on a server to get the data- and/or if they want to make sure they have the latest data, as the server's data would still be a bit behind (up to 2 hours, or whatever time the export trigger is set to).

As far as bandwidth goes, with regard to mirroring the EDDB dumps: if only TD users with my plugin are downloading them from the mirror, that equals exactly one download per user per file, as the plugin checks the timestamp and only downloads if the server copy is newer than the local copy. I don't think anyone outside of TD would really be using it, for two reasons: the other trading programs have their own mirrors if they aren't downloading straight from EDDB, and there isn't much use for any of the files outside of a program that processes the data from them, so random CMDRs won't be downloading it all willy-nilly.
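
(The timestamp check is the standard "only download if the server copy is newer" pattern; a rough sketch of the general idea, not necessarily the plugin's exact code:)

import email.utils, os, urllib.request

def download_if_newer(url, local_path):
    # Compare the server's Last-Modified header to the local file's mtime.
    request = urllib.request.Request(url, method='HEAD')
    with urllib.request.urlopen(request) as response:
        last_modified = response.headers.get('Last-Modified')
    if last_modified and os.path.exists(local_path):
        remote_time = email.utils.parsedate_to_datetime(last_modified).timestamp()
        if remote_time <= os.path.getmtime(local_path):
            return False   # local copy is already current
    urllib.request.urlretrieve(url, local_path)
    return True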

Going off of the latest dump as of this writing, the total size of all the files needed ('commodities.json', 'listings.csv', 'modules.json', 'stations.jsonl', and 'systems_populated.jsonl') is about 281MB. So it'd be about 0.3GB per user per day, and you can safely ignore users who decide to run their own listen server.

Based on my math, it looks like it'd actually be less bandwidth to host a mirror- the .prices file on its own is larger than all the dump files together.... Hmmm. Maybe exporting it back out into a listings.csv instead? I don't know.

I feel like we're getting ahead of ourselves. Let's test out this plugin and make sure it works correctly, get it added to TD itself so you don't have to find the link to my Google Drive folder, then see about getting the EDDN listener up and running correctly, and once all that is taken care of, then we can revisit all this.
 
I guess it simply comes down to expectation management.

With the old maddavo system we could download a dataset which was up to the minute, whilst your approach would give up to a 12 hour gap. Not a huge deal I guess but not quite the same.

You are probably right though, getting the listener working is step one. Once we have data accessed properly and usable by TD then anything we do with that data should be trivial in comparison.
 
I honestly don't think Maddavo's was up to the minute. The 2d prices file was updated "after a batch of data was processed". Judging from the timestamps, the "every 2 hours" 2d prices file was last updated at 8:01, and the "on demand" 3h prices file was last updated at 8:40, which leads me to think that one got updated around twice an hour or so.

Thinking on it, it wouldn't be very hard to make a listings.csv that only had the listings updated since the last dump. That would reduce the file size, but it would mean having to download multiple files (the dump plus the since-last-dump update). Doing that would mean the since-last-dump file could be updated at least as often as Maddavo's 3h prices file was, without a huge increase in bandwidth usage.
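
Producing that cut-down file would be something like this, as a sketch (assuming eddb's usual 'collected_at' unix-timestamp column in listings.csv, and given the timestamp of the last dump):

import csv

def listings_since(full_listings_path, out_path, last_dump_time):
    # Copy only rows collected after the last dump into a smaller listings.csv.
    with open(full_listings_path, encoding="utf-8", newline="") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if int(row["collected_at"]) > last_dump_time:
                writer.writerow(row)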
 
I can attest that it was up to the minute (or at least concurrent with eddn's output).

I could upload data and it would be in the 3h packet as soon as Maddavo received it from eddn. This meant that I could do a run and get the latest from Maddavo without needing to log a local csv from EDMC. Logging locally isn't a massive chore obviously, but yes, the 3h was bang up to date.

I guess I have talked myself into arranging some hosting. Let me get as far as setting up a basic site with a daily pull from eddb and we can go from there. Do we happen to know what time they do their daily dump? The api page only says "every night".
 
No idea. Based on the page, commodities got updated first (5 hours ago) and the rest got updated later (2 hours ago). That corresponds to about midnight UTC for commodities and 3am UTC for the rest, so maybe play it safe and call it 4am UTC?
 