I appreciate you trying to help. I'm just saying I've already considered what you suggested, and explaining why it doesn't work.
Yup. No worries.
That looks like a nice time improvement.
I presume then, as you've changed the database somewhat, that we couldn't use maddavo's plugin to output the old style csv files?
How do we deal with any deletions from systems/stations etc? If it's new, we insert. If it's changed, we update. If it's disappeared we should delete, do you have that covered?
Another thing is mistakes. Spelling errors and similar that were in the corrections file. Are we handling those yet, or just trusting EDDB?
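For what it's worth, here is a minimal sketch of the insert / update / delete idea from the questions above. It is not the plugin's actual code: the `Station` table, its two columns and the `sync_stations` helper are all made up for illustration. It upserts every row from the source, then deletes whatever is in the table but no longer in the source.

```python
import sqlite3

def sync_stations(conn, source_rows):
    """Make a (hypothetical) Station table match source_rows exactly."""
    cur = conn.cursor()
    # New ids are inserted, existing ids are overwritten with the new values.
    cur.executemany(
        "INSERT OR REPLACE INTO Station (station_id, name) VALUES (?, ?)",
        source_rows,
    )
    # Record every id seen in the source, then drop rows that were not seen.
    cur.execute("CREATE TEMP TABLE seen (station_id INTEGER PRIMARY KEY)")
    cur.executemany("INSERT INTO seen VALUES (?)", [(r[0],) for r in source_rows])
    cur.execute(
        "DELETE FROM Station WHERE station_id NOT IN (SELECT station_id FROM seen)"
    )
    removed = cur.rowcount
    cur.execute("DROP TABLE seen")
    conn.commit()
    return removed

# Tiny demo with an in-memory database (assumed schema, not TD's).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Station (station_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Station VALUES (?, ?)", [(1, "Old Name"), (2, "Gone")])
removed = sync_stations(conn, [(1, "New Name"), (3, "Brand New")])
rows = conn.execute("SELECT station_id, name FROM Station ORDER BY station_id").fetchall()
```

Spelling corrections could in principle be applied to `source_rows` before the sync, but that is a separate decision from the deletion handling.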
500+ times improvement? WOW. I'll take that every day of the week and twice on Sundays, thank you.
In both cases, the data is either inserted or updated; no line from the source file is skipped.
~71.68 seconds vs. ~0.14 seconds to process ~900 entries, I think we have a winner.
Oh yeah, Tromador, TD can handle downloading gzipped data, but your mirror is providing it uncompressed. Thought you might want to know, since that would save a bit on bandwidth. And download time for the users.
I just tested this. I can download gzipped data, but EDDB doesn't helpfully add a .gz extension on the end when you do and then TD throws a rod.
D:\Games\Game Tools\Trade Dangerous>trade.py import -P eddblink -O all,skipvend
NOTE: Rebuilding cache file: this may take a few moments.
NOTE: Import complete: 3407678 new items over 50,434 stations in 20,078 systems
NOTE: Downloading file: 'eddb\modules.json'.
NOTE: Requesting http://elite.ripz.org/files/modules.json
NOTE: Downloaded 22.7KB of uncompressed data 0.6MB/s ******* <<===== No it isn't uncompressed!!!! *******
NOTE: Processing Upgrades: Start time = 2018-05-02 08:10:59.501386
-----------------------------------------------------------
ERROR: Unexpected unicode error in the wild!
Traceback (most recent call last):
File "D:\Games\Game Tools\Trade Dangerous\trade.py", line 104, in <module>
main(sys.argv)
File "D:\Games\Game Tools\Trade Dangerous\trade.py", line 77, in main
results = cmdenv.run(tdb)
File "D:\Games\Game Tools\Trade Dangerous\commands\commandenv.py", line 81, in run
return self._cmd.run(results, self, tdb)
File "D:\Games\Game Tools\Trade Dangerous\commands\import_cmd.py", line 124, in run
if not plugin.run():
File "D:\Games\Game Tools\Trade Dangerous\plugins\eddblink_plug.py", line 709, in run
self.importUpgrades()
File "D:\Games\Game Tools\Trade Dangerous\plugins\eddblink_plug.py", line 131, in importUpgrades
upgrades = json.load(fh)
File "C:\Program Files\Python36\lib\json\__init__.py", line 296, in load
return loads(fp.read(),
File "C:\Program Files\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 11: character maps to <undefined>
Flipside. I can change the extensions so it reads .gz, but then the links are different so they won't get downloaded.
I've set it back to uncompressed for now. How do we handle this?
I have no idea. Maybe ask the EDDB peeps? I know when downloading from that site, it gets gzipped even though the link is to *.json, not *.json.gz....
Yes, it does, and so my site gets a gzipped .json file. Then TD downloads it and misinterprets the data as uncompressed when it is in fact compressed.
When you download from EDDB are you sending the "Accept-Encoding: gzip, deflate, sdch" header to tell it to send gzipped? I wonder if that makes the difference and so TD knows what it is getting, whilst if it doesn't send the header (because I already sent it in my mirror request) it doesn't realise it's obtaining gzipped.
No idea. I just call the method 'transfers.download(...)', which is provided by TD in the file 'transfers.py'. I don't have time to look at it right now, I'm off to work, but I'll look at the method when I get back and see what it does.
I think that EDDB has the transfer-encoding set to gzipped, not the file itself. But honestly, I'm out of my element here.
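One way to sidestep the question of who sent which header: servers normally flag compression with a "Content-Encoding: gzip" response header, but a gzip stream also always starts with the two bytes 0x1f 0x8b, so the receiver can check the data itself. The sketch below is just that idea in isolation. It is not what 'transfers.download(...)' does (I haven't read it), and `load_json_maybe_gzipped` is a made-up name. It also decodes as UTF-8 explicitly, so it doesn't fall back to cp1252 the way the traceback above shows.

```python
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def load_json_maybe_gzipped(raw: bytes):
    """Parse JSON from bytes that may or may not be gzip-compressed."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # Decode explicitly instead of relying on the platform default encoding.
    return json.loads(raw.decode("utf-8"))

# The same payload parses identically whether it arrives plain or gzipped.
payload = json.dumps([{"id": 1, "name": "x"}]).encode("utf-8")
plain = load_json_maybe_gzipped(payload)
packed = load_json_maybe_gzipped(gzip.compress(payload))
```

With something like this the mirror could serve either form and the file extension wouldn't matter.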
F:\Elite\TD>trade.py import -P eddblink -O clean
Traceback (most recent call last):
File "F:\Elite\TD\trade.py", line 104, in <module>
main(sys.argv)
File "F:\Elite\TD\trade.py", line 77, in main
results = cmdenv.run(tdb)
File "F:\Elite\TD\commands\commandenv.py", line 81, in run
return self._cmd.run(results, self, tdb)
File "F:\Elite\TD\commands\import_cmd.py", line 114, in run
pluginClass = plugins.load(cmdenv.plug, "ImportPlugin")
File "F:\Elite\TD\plugins\__init__.py", line 235, in load
importedModule = importlib.import_module(moduleName)
File "C:\Program Files\Python36\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 994, in _gcd_import
File "<frozen importlib._bootstrap>", line 971, in _find_and_load
File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 674, in exec_module
File "<frozen importlib._bootstrap_external>", line 781, in get_code
File "<frozen importlib._bootstrap_external>", line 741, in source_to_code
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "F:\Elite\TD\plugins\eddblink_plug.py", line 748
return False
^
SyntaxError: 'return' outside function
F:\Elite\TD>trade.py import -P eddblink -O clean
NOTE: Rebuilding cache file: this may take a few moments.
NOTE: Missing "F:\Elite\TD\data\TradeDangerous.prices" file - no price data.
Traceback (most recent call last):
File "F:\Elite\TD\trade.py", line 104, in <module>
main(sys.argv)
File "F:\Elite\TD\trade.py", line 77, in main
results = cmdenv.run(tdb)
File "F:\Elite\TD\commands\commandenv.py", line 81, in run
return self._cmd.run(results, self, tdb)
File "F:\Elite\TD\commands\import_cmd.py", line 124, in run
if not plugin.run():
File "F:\Elite\TD\plugins\eddblink_plug.py", line 708, in run
if self.downloadFile(UPGRADES_URL, self.upgradesPath) or self.getOption("force"):
File "F:\Elite\TD\plugins\eddblink_plug.py", line 106, in downloadFile
"%a, %d %b %Y %X GMT").timetuple())
File "C:\Program Files\Python36\lib\_strptime.py", line 565, in _strptime_datetime
tt, fraction = _strptime(data_string, format)
File "C:\Program Files\Python36\lib\_strptime.py", line 362, in _strptime
(data_string, format))
ValueError: time data 'Wed, 02 May 2018 05:58:18 GMT' does not match format '%a, %d %b %Y %X GMT'
Never mind, that was a whitespace issue from mixed spaces and tabs.
New error though, so let me know whether you want to follow up here or whether I should do the right thing and create a GitHub issue.
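A guess at the ValueError, not a confirmed diagnosis: `%a`, `%b` and `%X` in strptime all depend on the machine's locale, so a format that matches 'Wed, 02 May 2018 05:58:18 GMT' on one PC can fail on another. The standard library has a locale-independent parser for this kind of date in `email.utils`. A small sketch, where `parse_http_date` is a made-up helper name rather than anything in the plugin:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def parse_http_date(value: str) -> float:
    """Convert an HTTP-style date string to a POSIX timestamp, ignoring locale."""
    return parsedate_to_datetime(value).timestamp()

# The exact string from the traceback above.
ts = parse_http_date("Wed, 02 May 2018 05:58:18 GMT")
# The same moment built by hand, for comparison.
expected = datetime(2018, 5, 2, 5, 58, 18, tzinfo=timezone.utc).timestamp()
```

If the locale turns out not to be the cause, this at least removes it as a variable.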
I'm not seeing errors either. ---- EDIT: Yes I am, per github issue #2
I've reconfigured my mirror. Now serving compressed data as requested...
NOTE: Requesting http://elite.ripz.org/files/listings.csv
NOTE: Downloaded 149.0MB of gziped data 11.4MB/s