In-Development TradeDangerous: power-user trade optimizer

Just as a point I think it's important to make - we (by which I really mean eyeonus) are not in the business of making big changes to TD. The plan was and still is to have a method of getting data into the existing TD, with minimal changes/work, in a post-maddavo world. At present we have one stupid bug which is hard to pin down (https://github.com/eyeonus/EDDBlink-listener/issues/7), but once that is resolved we will be ready to release. Mission creep into messing with the TD database in any way which "may involve a lot of work" is not one of the project deliverables. :)
Anything we quasi-programmers can do to help nail that problem down?
 
Fair point. However, I reserve the right to raise it again should it become an issue that I can't solve with a workaround in future.

Frankly, if I knew python better I'd take a look myself but that programming language is not one that I wish to be familiar with.

If we change a load of stuff, maybe - but in this case, the only change was a (now reverted) kludge to work around a problem created for us. If (as is likely) we've changed nothing, then I imagine we would refer you to the Python manual and the source code, like it or not!
 
Anything we quasi-programmers can do to help nail that problem down?

Mebbe... if you find one where it's not working, you might try testing the SQL manually.

If you see my comment here, you could insert the code snippet to debug the SQL - lots of output, so be warned - then use that debug output to get the underlying SQL commands. Then try those manually on the database through a DB app which lets you run arbitrary SQL commands, and see if they also fail. Read on in that thread about the INSERT vs UPDATE statements and what/why the difference, so you know what you need to see. Then let us know what happened.
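If a DB app isn't handy, a rough sketch from Python works just as well (the path, IDs and column names below are made up - substitute the real statement and values from the debug output):

Code:
# Rough sketch only - path, IDs and column names are illustrative; substitute
# the actual statement and values captured from the debug output.
import sqlite3

conn = sqlite3.connect("data/TradeDangerous.db")
conn.execute("PRAGMA foreign_keys=ON")

# Run the captured UPDATE by hand and see whether it reports any changed rows.
cur = conn.execute(
    "UPDATE StationItem SET demand_price = ? WHERE station_id = ? AND item_id = ?",
    (1234, 43119, 1),
)
print("rows changed:", cur.rowcount)
conn.commit()
conn.close()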

Other than that, point any Python guru friends at it and see if they can spot something untoward. If eyeonus' testing or mine had shown up anything obvious, it would already be fixed. It's rather vexing.
 
Mebbe... if you find one where it's not working, you might try testing the SQL manually.

If you see my comment here, you could insert the code snippet to debug the SQL - lots of output, so be warned - then use that debug output to get the underlying SQL commands. Then try those manually on the database through a DB app which lets you run arbitrary SQL commands, and see if they also fail. Read on in that thread about the INSERT vs UPDATE statements and what/why the difference, so you know what you need to see. Then let us know what happened.

Other than that, point any Python guru friends at it and see if they can spot something untoward. If eyeonus' testing or mine had shown up anything obvious, it would already be fixed. It's rather vexing.

I wonder if the use of the exception on insert is the cause of the problem. For testing purposes it would be interesting to see what happens if you use the much slower method of checking whether the record exists and then inserting or updating as appropriate. If that works, it narrows the problem down to the use of the exception as decision logic, and that perhaps behaving slightly differently on different computers.
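Purely as a sketch of what I mean (column names made up, not the listener's actual code), the two variants would look something like:

Code:
# Sketch of the two approaches (column names made up) - not the listener's actual code.
import sqlite3

def upsert_with_exception(db, station_id, item_id, price):
    # current approach: rely on the IntegrityError as the decision logic
    try:
        db.execute("INSERT INTO StationItem (station_id, item_id, demand_price) VALUES (?, ?, ?)",
                   (station_id, item_id, price))
    except sqlite3.IntegrityError:
        db.execute("UPDATE StationItem SET demand_price = ? WHERE station_id = ? AND item_id = ?",
                   (price, station_id, item_id))

def upsert_with_check(db, station_id, item_id, price):
    # slower test approach: look for the row first, then choose INSERT or UPDATE
    row = db.execute("SELECT 1 FROM StationItem WHERE station_id = ? AND item_id = ?",
                     (station_id, item_id)).fetchone()
    if row:
        db.execute("UPDATE StationItem SET demand_price = ? WHERE station_id = ? AND item_id = ?",
                   (price, station_id, item_id))
    else:
        db.execute("INSERT INTO StationItem (station_id, item_id, demand_price) VALUES (?, ?, ?)",
                   (station_id, item_id, price))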
 
Has something changed with the Trade Dangerous database schema or data? With the tradedangerous.db file from Thursday last week, TD Helper runs fine, but if I update the database or delete it and create a new one, TD Helper throws an error:

Code:
Failed to enable constraints. One or more rows contain values violating non-null, unique, or foreign-key constraints.

The SQL command being run is:

Code:
select sys.name as sys_name, stn.name as stn_name 
from System sys 
left join Station stn on sys.system_id = stn.system_id

I have a workaround for the issue for the moment, but in my investigations I note that SQLite is set up not to use FK constraints by default. It is probably a good idea either to enable the FK constraints ("PRAGMA foreign_keys=ON") and ensure that the values are correctly set, or to leave the FK constraints off and remove the FK definitions from the database schema. The former is the preferred option, obviously, but may involve a lot of work to ensure referential integrity.
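For anyone checking their own copy, something along these lines (file name illustrative) turns enforcement on for a connection and asks SQLite to report any existing violations:

Code:
# Sketch: turn FK enforcement on for this connection and ask SQLite to report
# any rows that already violate the declared constraints. File name illustrative.
import sqlite3

conn = sqlite3.connect("tradedangerous.db")
conn.execute("PRAGMA foreign_keys=ON")

# Each row returned is (table, rowid, referenced table, fk index) for a violation.
for violation in conn.execute("PRAGMA foreign_key_check"):
    print(violation)
conn.close()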

I literally have no time to work on this, but I'm fairly certain that yes, it is. IIRC, there are a few pragmas that are run in the cache.py file, and turning on foreign keys is one of them.

EDIT: Sorry, looks like it's in tradedb.py:
Code:
    def getDB(self):
        if self.conn:
            return self.conn
        self.tdenv.DEBUG1("Connecting to DB")
        conn = sqlite3.connect(self.dbFilename)
        conn.execute("PRAGMA foreign_keys=ON")
        conn.create_function('dist2', 6, TradeDB.calculateDistance2)
        return conn

EDIT2: It's also turned on when (re)building the cache, which is done in the buildCache method in cache.py
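(Side note: the pragma is per-connection, so every new connection has to issue it. If in doubt, you can ask a connection what it currently has - quick illustrative check, path made up:)

Code:
# Ask a connection whether foreign key enforcement is on for it (off by default).
import sqlite3

conn = sqlite3.connect("data/TradeDangerous.db")        # path illustrative
print(conn.execute("PRAGMA foreign_keys").fetchone())   # (0,) - it is off by default per connection
conn.execute("PRAGMA foreign_keys=ON")
print(conn.execute("PRAGMA foreign_keys").fetchone())   # (1,) once this connection has set it
conn.close()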
 
I wonder if the use of the exception on insert is the cause of the problem. For testing purposes it would be interesting to see what happens if you use the much slower method of checking whether the record exists and then inserting or updating as appropriate. If that works, it narrows the problem down to the use of the exception as decision logic, and that perhaps behaving slightly differently on different computers.

I'm not going to say it isn't possible, but I highly doubt it. The stations that are failing to update all make it to the UPDATE command. That is, the "do INSERT if IntegrityError do UPDATE" never fails to get to the "do UPDATE" part.
 
I'm not going to say it isn't possible, but I highly doubt it. The stations that are failing to update all make it to the UPDATE command. That is, the "do INSERT if IntegrityError do UPDATE" never fails to get to the "do UPDATE" part.

But you never start the transaction yourself. What if the internal transaction algorithms of the sqlite3 module do a rollback if an IntegrityError is raised?
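A quick standalone way to test that theory might be something like this (toy table, not TD's schema):

Code:
# Toy standalone test of that theory: does an IntegrityError inside the
# implicit transaction throw away an earlier UPDATE?
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 0)")
conn.commit()

conn.execute("UPDATE t SET val = 99 WHERE id = 1")    # first write opens the implicit transaction
try:
    conn.execute("INSERT INTO t VALUES (1, 0)")        # duplicate key -> IntegrityError
except sqlite3.IntegrityError:
    pass
conn.commit()
print(conn.execute("SELECT val FROM t WHERE id = 1").fetchone())   # (99,) unless a rollback happened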
 
I literally have no time to work on this, but I'm fairly certain that yes, it is. IIRC, there are a few pragmas that are run in the cache.py file, and turning on foreign keys is one of them.

Now I've looked at it, it's right there at the top of the database schema TradeDangerous.sql, at line 27, just after Oliver's comments - PRAGMA foreign_keys=ON;

But you never start the transaction yourself. What if the internal transaction algorithms of the sqlite3 module do a rollback if an IntegrityError is raised?

Yes we do - or rather pysqlite does it for us.

"pysqlite is notorious for implementing weird transactional semantics. By default, pysqlite will open a transaction when you issue your first write query (it does this by examining every query you execute). The transaction will be automatically committed when you either call Connection.commit() or execute any other query that is not a SELECT, INSERT, UPDATE, or DELETE (for example CREATE TABLE or PRAGMA)." [1]

Now it may be that we would be better off managing the transactions ourselves. But I've put a lot of thought into this, so riddle me this:

If there is a problem with transactions, or a funky write-lock, or any other SQLite error you want to come up with - why then is it not random? It picks on stations - KUKI AN/BOKEILI HUB (station_id '43119') is my test case. Almost invariably (emphasis on almost) it will not update when I send a schema (with EDMC or EDDiscovery), at least on the server side; normally that station updates fine via the listener on my local PC. But then it seems that other people have their own particular test cases - maybe if I looked hard enough, I could find one my PC hates on. I should probably also test the PC in server mode.
UPDATE: I did test the PC in server mode. It still didn't hate on station 43119, but I did find other interesting results, as below.

Nevertheless, you got me thinking about transactions and such, so I put the database into auto-commit mode, changing the connection in tradedb.py to read:

conn = sqlite3.connect(self.dbFilename, isolation_level=None)

And stopping the (now unnecessary) db.commits in the listener by altering lines 551 & 687 to read:
success = True

This had three effects.
1: A completely expected slowing down of the database, particularly when running in server mode.
2: Trying to read the database, I would frequently get locking problems.
Code:
sqlite> SELECT * FROM StationItem WHERE station_id == 43119;
Error: database is locked
3: The problem went away; my bugbear station updated as it should, over and over again.

As another issue - getting back to server mode, something happens speed-wise when running in this way. Compare:

Code:
Server running updates
Market update for NINHURSAG/DANNY B MATTISSEN finished in 3.186 seconds.
Market update for PROCYON/DAVY DOCK finished in 2.219 seconds.
Market update for ROBIGO/HAUSER'S REACH finished in 2.98 seconds.
Market update for CORNENGU/MARGULIES DOCK finished in 0.719 seconds.
Market update for NLTT 53690/THAGARD TERMINAL finished in 1.45 seconds.

Local PC running updates in client mode
Market update for ZHANG FEI/SEREBROV HORIZONS finished in 0.009 seconds.
Market update for HIP 85360/MACAN PLATFORM finished in 0.62 seconds.
Market update for HIP 53688/WHITE GATEWAY finished in 0.599 seconds.
Market update for XINCA/FINCH MARKET finished in 0.341 seconds.

Local PC running updates in server mode
Market update for G 139-3/O'CONNOR DOCK finished in 3.561 seconds.
Market update for GATEWAY/WICCA TOWN finished in 2.494 seconds.
Market update for HR 1183/ARC'S FAITH finished in 3.219 seconds.
Market update for SHINRARTA DEZHRA/JAMESON MEMORIAL finished in 3.439 seconds.
Market update for ALIOTH/GOTHAM PARK finished in 2.051 seconds.
Market update for SHINRARTA DEZHRA/JAMESON MEMORIAL finished in 3.087 seconds.

For completeness I also ran the "server" in "client" mode. That ran faster than my PC and used about 25% of the CPU time it eats in server mode. Sadly the speed increase did not fix the problem in the same way that running autocommit had done.

So - a lot to think about, but I think that might be a bit of forward progress.
We can cure the problem by using autocommit.
In server mode the listener runs dog slow (it actually falls way behind in processing, so it can end up being minutes from a schema submission to it finally getting processed). This slowness is further exacerbated by using autocommit.

Therefore there are two issues that need looking at:
1: Pysqlite and its funky transactional methods, which my testing suggests are the root of the missing stations.
2: The listener running horribly, horribly, horribly in server mode, for which I can see no good reason.

[1] http://charlesleifer.com/blog/going-fast-with-sqlite-and-python
 
You know - as I test this more, it does seem to be dropping fewer schemas (from my test case) in client mode, where it's running at speed, than in server mode, where it is clunking along. I have to imagine there is some relationship here.

EDIT: Yup. I can almost turn the problem off and on (again for my test case) by switching between server and client - though it's not 100% even in client mode.
 
OK, so the obvious question, what is the difference?

Also, the post to which you linked discussed write-ahead logging. Is that something to consider?
 
OK, so the obvious question, what is the difference?

Things that I can see:
1: When it runs the plugin, it uses slightly different parameters.
2: It runs the following code (as a separate thread):
Code:
def export_listings():
    """
    Creates a "listings-live.csv" file in "export_path" every X seconds,
    as defined in the configuration file.
    Only runs when program configured as server.
    """
    global export_ack, export_busy

    if config['side'] == 'server':
        # DO A TON OF STUFF TO MAKE LISTINGS-LIVE.CSV EVERY n SECONDS
        pass
    else:
        export_ack = True

And that's all. I have no idea why the export slows down everything else so horribly; most of the time it's just waiting and ticking down its timer. When it does come time to export the file, it does it quick sharp, double time.

Also, the post to which you linked discussed write-ahead logging. Is that something to consider?

Good question, one which I asked myself a few minutes ago. I can report that switching up the DB into WAL mode made not a jot of difference to the underlying problem.
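(For reference, trying it is a one-liner - path illustrative; the journal mode persists in the database file once set:)

Code:
# One-off switch to WAL mode (path illustrative); the setting persists in the
# database file, so it only needs doing once.
import sqlite3

conn = sqlite3.connect("data/TradeDangerous.db")
print(conn.execute("PRAGMA journal_mode=WAL").fetchone())   # prints ('wal',) on success
conn.close()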
 
Why is it making a new file every n seconds instead of appending to an existing "live" file?

That's one for @eyeonus. When new messages come in, they are saved into the database with a "from_live" field set TRUE. The listings-live.csv is generated by doing a query on the database and exporting all data where from_live is set TRUE.
I guess that's more robust, if anything happens to the generated csv corrupting it, the whole thing can be regenerated from the database.
When we get new overnights from EDDB, any entries in there have their "from_live" set FALSE and so the daily cycle begins again.
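In rough terms (the real query, columns and CSV layout in the listener will differ), the export amounts to something like:

Code:
# Much simplified sketch of the live export (real query, columns and CSV layout
# in the listener differ) - everything flagged from_live goes into the file.
import csv
import sqlite3

conn = sqlite3.connect("data/TradeDangerous.db")
rows = conn.execute(
    "SELECT station_id, item_id, demand_price, supply_price, modified "
    "FROM StationItem WHERE from_live = 1"
)
with open("listings-live.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["station_id", "item_id", "demand_price", "supply_price", "modified"])
    writer.writerows(rows)
conn.close()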

I think I know what the slowness is, though. When that timer routine is counting down 'n' seconds between exports, it's looping (I am guessing here, but it fits the facts) at breakneck speed and laughing maniacally as it eats as much CPU as possible. If I put a 10-second sleep at the top of the loop, then the market updates run at a similar speed as they would in client mode.

EDIT: And this appears to give (though I'm not surprised) the same increase in reliability.
 
Now the question which remains is do they "see" the missing stations now?

I was able to prove a reliability improvement on my test case, but it still wasn't 100%.


Hmmm...

In server mode, it was so slow that it was actually falling seriously behind. Now it's keeping up. I wonder if I put it back into auto-commit mode, if it would still be fast enough...

I think maybe I'll stop messing with it until the morning. My brain is tired and still hungover.
 
... I have no idea why the export slows down everything else so horribly, I mean most of the time it's just waiting and ticking down its timer....

Apparently, going through the waiting loop was causing the problem. The checks it makes inside the waiting loop get done every time through, and since, when waiting, it would immediately go back to the start of the loop, those checks were getting done several times a second - maybe even several hundred times. That adds up quickly, so the loop was actually slowing down everything else by eating up the allocated CPU/memory.

Adding a simple line to make the waiting loop wait 1 second before jumping back to the top of the loop fixes all that, as you can see in the latest commit log.
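In outline (names here are indicative only - the real loop is in the listener source), the change amounts to:

Code:
# Outline of the change (names indicative only - the real loop is in the listener).
import time

def wait_between_exports(seconds_between_exports, stop_requested):
    waited = 0
    while waited < seconds_between_exports and not stop_requested():
        # previously this loop re-ran its checks as fast as the CPU allowed;
        # sleeping a second each pass keeps the CPU free for market updates
        time.sleep(1)
        waited += 1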
 
Why is it making a new file every n seconds instead of appending to an existing "live" file?

Two reasons:
1. I don't want to mess with the daily EDDB dump file, because if anything screws up with the exported listings, we still have that to fall back on. Losing the data since the last daily dump is better than losing everything.

2. Appending to a file means - without doing a lot of very slow processing, anyway - having the same station appear multiple times in the file, making the file grow horribly. If there's an update to Shinrarta Dezhra at 0200, 0330, 0414, 0508, 0945, etc., each of those updates would show up in the file. So just for one station, the file has already grown to six times as big as it needs to be, because the old updates are still there in the file.

This wouldn't really slow down the listener very much, but it would cause a lot more stress on the server because of all the file writing, and it would significantly slow down client imports.

The only way to fix that would be to load the listings file completely into the program (causing an upswing in memory usage), search through the entire file for the lines that match the station update, replace those lines with the values from the update, and rewrite the file to disk. This wouldn't really save in terms of file writing (honestly, I'm not sure there is a way to save very much on that), but it does save significantly on file size - and it would dramatically increase the time it takes to export the listings.
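Just for illustration, that rejected patch-in-place approach would look roughly like this (assuming station_id were the second comma-separated field of each line, which is a simplification):

Code:
# Rough sketch of the rejected approach, just to show the work involved
# (assumes station_id is the second comma-separated field, a simplification).
def patch_listings(path, station_id, new_lines):
    # load the whole file into memory
    with open(path) as f:
        lines = f.readlines()
    # drop every existing line for this station
    keep = []
    for ln in lines:
        fields = ln.split(",")
        if len(fields) < 2 or fields[1].strip() != str(station_id):
            keep.append(ln)
    # append the fresh rows and rewrite the whole file to disk
    with open(path, "w") as f:
        f.writelines(keep + new_lines)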

tl;dr -- It's much faster and easier to do it this way.
 