In-Development TradeDangerous: power-user trade optimizer

Well, look. You said you had a problem with the time taken to deal with the system data, so guess what I looked at. I'll just stick to the webhosting bits and let you get on with it.
 
I appreciate you trying to help. I'm just saying I've already considered what you suggested, and explaining why it doesn't work.

I'm fairly certain I've got the solution.
 
I just did a small test with the Upgrade table. I edited the method to check, using the ID, whether the upgrade already exists in the table, and to do an update rather than an insert if so.
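The change described above boils down to an ID-keyed upsert. A minimal sketch of the idea, using sqlite3 directly (the table and column names here are illustrative, not TD's actual schema):

```python
import sqlite3

def upsert_upgrade(db, upgrade_id, name, weight, cost):
    # Look the row up by its EDDB-supplied ID first...
    row = db.execute(
        "SELECT 1 FROM Upgrade WHERE upgrade_id = ?", (upgrade_id,)
    ).fetchone()
    if row:
        # ...update in place if it already exists...
        db.execute(
            "UPDATE Upgrade SET name = ?, weight = ?, cost = ? WHERE upgrade_id = ?",
            (name, weight, cost, upgrade_id),
        )
    else:
        # ...otherwise fall back to a plain insert.
        db.execute(
            "INSERT INTO Upgrade (upgrade_id, name, weight, cost) VALUES (?, ?, ?, ?)",
            (upgrade_id, name, weight, cost),
        )
```

(SQLite also offers `INSERT OR REPLACE`, which collapses the two branches into a single statement.)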

Here's the time frame to update the table using the old method:
NOTE: Processing Upgrades. Start time = 2018-04-30 17:40:58.612174
NOTE: Finished processing Upgrades. End time = 2018-04-30 17:42:10.288618

And here's the time frame using the new method:
NOTE: Processing Upgrades. Start time = 2018-04-30 18:14:50.478627
NOTE: Finished processing Upgrades. End time = 2018-04-30 18:14:50.621102

In both cases, the data is either inserted or updated, no line from the source file is skipped.

~71.68 seconds vs. ~0.14 seconds to process ~900 entries, I think we have a winner.

I'm going to get to work making the changes to the other methods.
 
Yup. No worries.

That looks like a nice time improvement.

I presume then, as you've changed the database somewhat, that we couldn't use maddavo's plugin to output the old style csv files?

How do we deal with any deletions from systems/stations etc.? If it's new, we insert. If it's changed, we update. If it's disappeared, we should delete; do you have that covered?
Another thing is mistakes: spelling errors and similar that were in the corrections file. Are we handling those yet, or just trusting EDDB?
 
Maddavo's plugin doesn't do the .csv export, TD does. I had to make a very small change to the csvexport.py file to make it work with the plugin, but that change doesn't break anything in TD. EDDBlink actually causes a csv export automatically, as the final step before processing the market data, so you can see the difference if you compare what's in your data folder now with what's on Maddavo's site.

The exported .csv files probably won't work with an instance of TD that doesn't have the database changes eddblink makes, but that's because of the AUTOINCREMENT thing.
(The IDs provided by EDDB aren't a contiguous 1..N sequence, so the database might complain if it still has that AUTOINCREMENT tag on the relevant table. On the other hand, the code that does the importing will handle it perfectly, so it really depends on how SQL deals with trying to insert a specific number into a field that is supposed to AUTOINCREMENT. Does it take the number? Does it ignore that number and autoincrement instead? Does it throw an error and quit? I have no idea.)
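For what it's worth, SQLite (which TD uses) answers that question directly: it takes the number, and only auto-generates a value when the column is omitted or NULL. A quick standalone check:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)")
# Explicit ID with gaps: SQLite accepts it without complaint.
db.execute("INSERT INTO t (id, name) VALUES (?, ?)", (42, "explicit"))
# Omitted ID: SQLite picks the next value after the largest ever used.
db.execute("INSERT INTO t (name) VALUES (?)", ("auto",))
print(db.execute("SELECT id, name FROM t ORDER BY id").fetchall())
# → [(42, 'explicit'), (43, 'auto')]
```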

They could definitely be made compatible, however, by simply removing the id column on the tables that used to have AUTOINCREMENT.

(The specific change is to comment out the line "if columnRow['pk'] > 0 and pkCount == 1: continue" and remove the reverseList in 'csvexport.py'. Commenting out the line keeps it from ignoring the ID# column. Removing the reverseList makes the csv export be in ID# order rather than something else; it has no actual effect other than altering which order the columns are put into the csv file.)

I haven't dealt with the case of a system or station disappearing. I was thinking of having the plugin delete any data that is more than N days old, as a 'deleteold' option. Any stations that had been deleted wouldn't get data updates anymore, so they would eventually fall under that more-than-N-days-old criterion...

The only other way I can think of would be to see if there are any entries in the DB that don't exist in the source file, and delete them from the DB if so. I'm not sure how I'd go about doing that, however...
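One way to sketch that "delete what's no longer in the source file" idea, assuming each table is keyed by the EDDB ID (the table and column names below are hypothetical):

```python
import sqlite3

def prune_missing(db, table, id_column, source_ids):
    """Delete rows whose ID no longer appears in the latest source dump."""
    # Set difference: IDs present in the DB but absent from the source file.
    db_ids = {row[0] for row in db.execute(f"SELECT {id_column} FROM {table}")}
    stale = db_ids - set(source_ids)
    db.executemany(
        f"DELETE FROM {table} WHERE {id_column} = ?",
        [(i,) for i in stale],
    )
    return len(stale)
```

Collecting the source IDs is essentially free since the import loop already walks every entry in the dump anyway.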

As far as spelling errors and the like go, I'm just trusting EDDB at this point. The way things are currently handled, if a data error of that kind does make it into EDDB, once it's fixed in EDDB it'll get fixed in TD too after a dump import. Also, most if not all people use some kind of program that reads the pilot journal to do the updating, like EDMC, so those sorts of errors aren't anywhere near as likely as they were back when OCR input was the most-used option.
 
Okay, so, updating the ShipVendor and UpgradeVendor tables still takes forever, so I recommend always using '-O skipvend' unless it's a clean run, especially since those don't get changed very often. Also, Maddavo never even bothered making those, probably because of how enormous they are; UpgradeVendor in particular is bigger by far than all the other tables combined.

I've no idea how to make the Vendor updates go any faster, either. Because of the way those tables are built, the data for a single station means up to 33 entries in the ShipVendor table and over 900 entries in the UpgradeVendor table, all of which need to be deleted before the update can happen. Apparently DELETE is a very slow command.
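If the slowness comes from issuing one DELETE per row in autocommit mode, wrapping each station's swap in a single transaction and deleting the whole station with one statement usually helps a lot. A sketch under assumed column names (the real ShipVendor schema may differ):

```python
import sqlite3

def replace_station_ships(db, station_id, ship_ids):
    # "with db:" runs everything below in one transaction instead of
    # committing after every single statement.
    with db:
        # One DELETE for the whole station, not one per row.
        db.execute("DELETE FROM ShipVendor WHERE station_id = ?", (station_id,))
        db.executemany(
            "INSERT INTO ShipVendor (station_id, ship_id) VALUES (?, ?)",
            [(station_id, s) for s in ship_ids],
        )
```

An index on `station_id` would also keep the DELETE from scanning the whole table for each station.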

Here's the run with the updated version:
python trade.py import -P eddblink -O all,skipvend -vvv
NOTE: Rebuilding cache file: this may take a few moments.
NOTE: Item.csv:112 WARNING Item 'Occupied CryoPod' is marked as DELETED and should not be used.
NOTE: Item.csv:179 WARNING Item 'Galactic Travel Guide' is marked as DELETED and should not be used.
NOTE: Import complete: 3407279 new items over 50,434 stations in 20,078 systems
NOTE: Downloading file: 'eddb\modules.json'.
NOTE: Requesting http://elite.ripz.org/files/modules.json
NOTE: Downloading 292.3KB uncompressed data
NOTE: Downloaded 292.3KB of uncompressed data 271.3KB/s
NOTE: Processing Upgrades: Start time=2018-04-30 23:51:40.969012
NOTE: Finished processing Upgrades. End time=2018-04-30 23:51:41.149381
NOTE: Downloading file: 'eddb\index.json'.
NOTE: Requesting https://raw.githubusercontent.com/EDCD/coriolis-data/master/dist/index.json
NOTE: Downloading 59.1KB gziped data
NOTE: Downloaded 0.6MB of gziped data 1.9MB/s
NOTE: Processing Ships: Start time=2018-04-30 23:51:42.524285
NOTE: Finished processing Ships. End time=2018-04-30 23:51:42.621489
NOTE: Downloading file: 'eddb\systems_populated.jsonl'.
NOTE: Requesting http://elite.ripz.org/files/systems_populated.jsonl
NOTE: Downloading 20.6MB uncompressed data
NOTE: Downloaded 20.6MB of uncompressed data 1.2MB/s
NOTE: Processing Systems: Start time=2018-04-30 23:52:00.271748
NOTE: Finished processing Systems. End time=2018-04-30 23:52:05.045827
NOTE: Downloading file: 'eddb\stations.jsonl'.
NOTE: Requesting http://elite.ripz.org/files/stations.jsonl
NOTE: Downloading 112.7MB uncompressed data
NOTE: Downloaded 112.7MB of uncompressed data 1.5MB/s
NOTE: Processing Stations, this may take a bit: Start time=2018-04-30 23:53:22.115531
NOTE: Finished processing Stations. End time=2018-04-30 23:53:40.868400
NOTE: Downloading file: 'eddb\commodities.json'.
NOTE: Requesting http://elite.ripz.org/files/commodities.json
NOTE: Downloading 99.4KB uncompressed data
NOTE: Downloaded 99.4KB of uncompressed data 185.7KB/s
NOTE: Processing Categories and Items: Start time=2018-04-30 23:53:42.464501
NOTE: Finished processing Categories and Items. End time=2018-04-30 23:53:42.585758
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\Category.csv exported.
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\Item.csv exported.
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\Ship.csv exported.
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\Station.csv exported.
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\System.csv exported.
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\Upgrade.csv exported.
NOTE: Downloading file: 'eddb\listings.csv'.
NOTE: Requesting http://elite.ripz.org/files/listings.csv
NOTE: Downloading 148.8MB uncompressed data
NOTE: Downloaded 148.8MB of uncompressed data 1.2MB/s
NOTE: Processing market data: Start time=2018-04-30 23:55:45.528303
NOTE: Finished processing market data. End time=2018-05-01 00:06:24.830279
NOTE: S:\Elite Dangerous Programs\Trade Dangerous\data\RareItem.csv re-exported.
NOTE: Regenerating .prices file.
NOTE: Import completed.

Long story short, the entire run of '-O all,skipvend' took from 23:51:40 to 00:06:24 to finish, not including the initial rebuilding of the cache. That's about 15 minutes, which isn't that bad. (Especially compared to before.)

I have it up on github now. https://github.com/eyeonus/EDDBlink

Oh yeah, Tromador, TD can handle downloading gzipped data, but your mirror is providing it uncompressed. Thought you might want to know, since that would save a bit on bandwidth. And download time for the users.
 
500+ times improvement? WOW. I'll take that every day of the week and twice on Sundays, thank you :)
 
I just tested this. I can download gzipped data, but EDDB doesn't helpfully add a .gz extension on the end when you do and then TD throws a rod.

D:\Games\Game Tools\Trade Dangerous>trade.py import -P eddblink -O all,skipvend
NOTE: Rebuilding cache file: this may take a few moments.
NOTE: Import complete: 3407678 new items over 50,434 stations in 20,078 systems
NOTE: Downloading file: 'eddb\modules.json'.
NOTE: Requesting http://elite.ripz.org/files/modules.json
NOTE: Downloaded 22.7KB of uncompressed data 0.6MB/s ******* <<===== No it isn't uncompressed!!!! *******
NOTE: Processing Upgrades: Start time = 2018-05-02 08:10:59.501386
-----------------------------------------------------------
ERROR: Unexpected unicode error in the wild!

Traceback (most recent call last):
File "D:\Games\Game Tools\Trade Dangerous\trade.py", line 104, in <module>
main(sys.argv)
File "D:\Games\Game Tools\Trade Dangerous\trade.py", line 77, in main
results = cmdenv.run(tdb)
File "D:\Games\Game Tools\Trade Dangerous\commands\commandenv.py", line 81, in run
return self._cmd.run(results, self, tdb)
File "D:\Games\Game Tools\Trade Dangerous\commands\import_cmd.py", line 124, in run
if not plugin.run():
File "D:\Games\Game Tools\Trade Dangerous\plugins\eddblink_plug.py", line 709, in run
self.importUpgrades()
File "D:\Games\Game Tools\Trade Dangerous\plugins\eddblink_plug.py", line 131, in importUpgrades
upgrades = json.load(fh)
File "C:\Program Files\Python36\lib\json\__init__.py", line 296, in load
return loads(fp.read(),
File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 11: character maps to <undefined>

Flipside. I can change the extensions so it reads .gz, but then the links are different so they won't get downloaded.

I've set it back to uncompressed for now. How do we handle this?
 
I have no idea. Maybe ask the EDDB peeps? I know when downloading from that site, it gets gzipped even though the link is to *.json, not *.json.gz...

Do a run with the fallback option and you'll see what I mean:
NOTE: Downloading file: 'eddb\modules.json'.
NOTE: Requesting https://eddb.io/archive/v5/modules.json
NOTE: Downloading 22.7KB gziped data
NOTE: Downloaded 292.3KB of gziped data 1.7MB/s
 
Yes. It does, and so my site gets a gzipped .json file. Then TD downloads it and misinterprets that data as uncompressed when it is in fact compressed.

When you download from EDDB are you sending the "Accept-Encoding: gzip, deflate, sdch" header to tell it to send gzipped? I wonder if that makes the difference and so TD knows what it is getting, whilst if it doesn't send the header (because I already sent it in my mirror request) it doesn't realise it's obtaining gzipped.
 
No idea. I just call the method 'transfers.download(...)', which is provided by TD in the file 'transfers.py'. I don't have time to look at it right now, I'm off to work, but I'll look at the method when I get back and see what it does.
 
I suspect that somehow it's specifically requesting the gzipped data and so EDDB provides a Content-Encoding: gzip header, whilst my server has no way of knowing whether a given file is gzipped or not, so doesn't send such a header. Certainly it looks that way to me from transfers.py, but then I got lost looking for any client headers that the method sends.

EDDB website says: To enable compression, add the Accept-Encoding: gzip, deflate, sdch entry to your request header.

So that's what I did to obtain gzipped copies of the data.

I can use mod_deflate to accomplish something similar (possibly identical) to how EDDB are doing it, but I would like to be sure of what exactly TD does, so I can match it, rather than wasting time changing config on blind speculation.
 
It looks like it's reading the response headers to see what encoding the server says it's using:

encoding = req.headers.get('content-encoding', 'uncompress')
transfer = req.headers.get('transfer-encoding', None)

This looks like the relevant bit of the documentation of the requests package:
http://docs.python-requests.org/en/master/user/quickstart/#response-headers

I think that EDDB has the transfer-encoding set to gzipped, not the file itself. But honestly, I'm out of my element here.

Also this:
http://docs.python-requests.org/en/master/user/quickstart/#binary-response-content
"The gzip and deflate transfer-encodings are automatically decoded for you."
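The failure in the traceback above can be reproduced offline, and a magic-byte sniff would cover the case where a mirror serves gzipped bytes without any Content-Encoding header. (The sniff is a suggestion for the downloader, not what transfers.py currently does.)

```python
import gzip
import json

# Bytes that are really gzip, but arrive with no Content-Encoding header,
# so nothing decompresses them before json.load() sees them.
payload = gzip.compress(json.dumps({"modules": []}).encode("utf-8"))

# Every gzip stream starts with the magic bytes 0x1f 0x8b, so the
# downloader can detect and decompress it regardless of headers:
if payload[:2] == b"\x1f\x8b":
    payload = gzip.decompress(payload)

data = json.loads(payload.decode("utf-8"))
print(data)  # → {'modules': []}
```

This would make the plugin indifferent to whether the mirror, EDDB, or the headers got the compression story right.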
 
Apparently so - even on the flat HTML pages (see header grab below). I'm not 100% convinced that's the right thing to do, but let me look into it. I'll leave it uncompressed for the moment and get back to you.

Cache-Control: no-store, no-cache, must-revalidate
Connection: Keep-Alive
Content-Encoding: gzip
Content-Length: 11658
Content-Type: text/html; charset=UTF-8
Date: Wed, 02 May 2018 15:36:32 GMT
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Keep-Alive: timeout=15, max=500
Pragma: no-cache
Server: Apache
Strict-Transport-Security: max-age=315360000; includeSubDomains
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: sameorigin
X-UA-Compatible: IE=edge
X-XSS-Protection: 1; mode=block
 
I'm getting an error trying to use it. Would you prefer to discuss it here or should I create an issue on github?

Code:
F:\Elite\TD>trade.py import -P eddblink -O clean
Traceback (most recent call last):
  File "F:\Elite\TD\trade.py", line 104, in <module>
    main(sys.argv)
  File "F:\Elite\TD\trade.py", line 77, in main
    results = cmdenv.run(tdb)
  File "F:\Elite\TD\commands\commandenv.py", line 81, in run
    return self._cmd.run(results, self, tdb)
  File "F:\Elite\TD\commands\import_cmd.py", line 114, in run
    pluginClass = plugins.load(cmdenv.plug, "ImportPlugin")
  File "F:\Elite\TD\plugins\__init__.py", line 235, in load
    importedModule = importlib.import_module(moduleName)
  File "C:\Program Files\Python36\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 674, in exec_module
  File "<frozen importlib._bootstrap_external>", line 781, in get_code
  File "<frozen importlib._bootstrap_external>", line 741, in source_to_code
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "F:\Elite\TD\plugins\eddblink_plug.py", line 748
    return False
    ^
SyntaxError: 'return' outside function
 
Never mind, that was a whitespace issue: mixed spaces and tabs.

New error though, so let me know if you want to follow up here or should I do the right thing and create a github issue :cool:

Code:
F:\Elite\TD>trade.py import -P eddblink -O clean
NOTE: Rebuilding cache file: this may take a few moments.
NOTE: Missing "F:\Elite\TD\data\TradeDangerous.prices" file - no price data.
Traceback (most recent call last):
  File "F:\Elite\TD\trade.py", line 104, in <module>
    main(sys.argv)
  File "F:\Elite\TD\trade.py", line 77, in main
    results = cmdenv.run(tdb)
  File "F:\Elite\TD\commands\commandenv.py", line 81, in run
    return self._cmd.run(results, self, tdb)
  File "F:\Elite\TD\commands\import_cmd.py", line 124, in run
    if not plugin.run():
  File "F:\Elite\TD\plugins\eddblink_plug.py", line 708, in run
    if self.downloadFile(UPGRADES_URL, self.upgradesPath) or self.getOption("force"):
  File "F:\Elite\TD\plugins\eddblink_plug.py", line 106, in downloadFile
    "%a, %d %b %Y %X GMT").timetuple())
  File "C:\Program Files\Python36\lib\_strptime.py", line 565, in _strptime_datetime
    tt, fraction = _strptime(data_string, format)
  File "C:\Program Files\Python36\lib\_strptime.py", line 362, in _strptime
    (data_string, format))
ValueError: time data 'Wed, 02 May 2018 05:58:18 GMT' does not match format '%a, %d %b %Y %X GMT'
 
I changed line 106 to read "%a, %d %b %Y %H:%M:%S GMT", which fixed that issue for me.
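That fix makes sense: %X parses "the locale's time representation", which is only HH:MM:SS under some locales, while spelling the fields out explicitly is portable. A quick check:

```python
from datetime import datetime

stamp = "Wed, 02 May 2018 05:58:18 GMT"
# Explicit fields instead of the locale-dependent %X:
dt = datetime.strptime(stamp, "%a, %d %b %Y %H:%M:%S GMT")
print(dt)  # → 2018-05-02 05:58:18
```

(Python's `email.utils.parsedate_to_datetime` also parses this RFC-style date format directly, with no strptime format string at all.)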
 
I'm not getting either of those errors. Re-download from the github and try it again. I think you have an older version; I had a bit of annoyance uploading to github last night and had to fix some things. You probably have the borked version.

Just to make things a bit easier, I'm going to start making releases; that way you can say which version of the plugin you're talking about.

v0.16 (the latest version) is up now.

If that doesn't work, get a clean vanilla copy of bgol's TD, download the plugin from github, and try from scratch.

And while it's true that right now we're the only ones using this thread, this thread is about TD, not my EDDBlink plugin for TD, so maybe error reports on the github in future.
 