In-Development TradeDangerous: power-user trade optimizer

This feels right to me. I know Mark has concerns about the deletes, but I think it's worth looking at. It may well be that rolling our own transaction, rather than relying on pysqlite to do it for us, gives the efficiency we need for that process to complete in a timely manner.

It also completely removes the need for the INSERT/UPDATE exception logic that Mark is suspicious of and so has to be worth a try.



I have the server running in WAL mode already. I am also running with synchronous off and temp_store in memory. In the unlikely event of power failure and database corruption it is trivial to rebuild the DB, so I am tuning for speed over reliability.
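
For illustration, a minimal sketch of those settings applied through Python's sqlite3 module might look like this (the file name and connection handling are placeholders, not the actual server code):
Code:
import sqlite3

conn = sqlite3.connect("TradeDangerous.db")   # placeholder path

# Write-Ahead Logging lets readers keep working while an import is writing.
conn.execute("PRAGMA journal_mode = WAL")
# Trade durability for speed: a power cut might corrupt the DB, but it is
# trivial to rebuild from the source files.
conn.execute("PRAGMA synchronous = OFF")
# Keep temporary tables and indices in RAM rather than on disk.
conn.execute("PRAGMA temp_store = MEMORY")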


For the listener this is probably the better way to go, since holding a 3.4-million-entry hash table in memory takes up a fair bit of space. Rolling your own transaction will, I think, be a much better idea; it is not a difficult procedure and it gives you the fine control that the library you are currently using does not.
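
As a rough sketch of what rolling your own transaction looks like with pysqlite (placeholder table contents and IDs, not the listener's real code), switch off the implicit transaction handling and issue BEGIN/COMMIT yourself:
Code:
import sqlite3

conn = sqlite3.connect("TradeDangerous.db")   # placeholder path
# With isolation_level=None, pysqlite stops opening and committing
# transactions behind your back; the boundaries are now entirely yours.
conn.isolation_level = None

cur = conn.cursor()
cur.execute("BEGIN IMMEDIATE")                # take the write lock up front
try:
    # Placeholder work; in practice this would be one market's worth of changes.
    cur.execute("UPDATE StationItem SET from_live = 0 WHERE station_id = ?",
                (1234,))                      # dummy station id
    cur.execute("COMMIT")
except sqlite3.Error:
    cur.execute("ROLLBACK")
    raise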

I'll write another test that does the delete-all, vacuum and insert, and see how that performs.

Unless I misunderstood something, the premise of your test is flawed. Listings.csv is read by the plugin, never by the listener.

The listener deals with one market update at a time; even over the course of an entire day, listings.csv has orders of magnitude more data than comes from the EDDN firehose.

Nope, I was talking about the import of the listings.csv and listings-live.csv files into the trade dangerous database. The comment about the listener was an aside. I should have been more clear about that.
 
I made some simple tests and the difference is not that much:
DELETE/INSERT -> 115.11146521568298 seconds
INSERT/UPDATE -> 113.50208044052124 seconds

This was with 1000 iterations over one station with 80 items.
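
For reference, the sort of harness behind numbers like that can be as simple as the following (a reconstruction with a cut-down schema and an in-memory database, not the actual test script; the INSERT/UPDATE variant is timed the same way):
Code:
import sqlite3
import time

conn = sqlite3.connect(":memory:")            # throwaway DB just for timing
conn.execute("""CREATE TABLE StationItem (
                    station_id INTEGER, item_id INTEGER, demand_price INTEGER,
                    PRIMARY KEY (station_id, item_id))""")

items = [(1, i, 100) for i in range(80)]      # one station, 80 dummy items
conn.executemany("INSERT INTO StationItem VALUES (?, ?, ?)", items)
conn.commit()

start = time.time()
for _ in range(1000):                         # DELETE/INSERT variant
    conn.execute("DELETE FROM StationItem WHERE station_id = ?", (1,))
    conn.executemany("INSERT INTO StationItem VALUES (?, ?, ?)", items)
conn.commit()    # everything above runs in one implicit transaction; move the
                 # commit inside the loop to time per-iteration commits instead
print("DELETE/INSERT:", time.time() - start, "seconds")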
Sounds good to me. Now, since I'm self-taught on SQL, and my entire experience with it is with the making of the EDDBlink plugin and listener, can you, on the issue, post an example statement to do this?
(#7)

I assume it would look something like:
Code:
BEGIN
FROM StationItem DELETE * WHERE station_id = <station id number>
INSERT INTO StationItem (<first item data>)
INSERT INTO StationItem (<second item data>)
.
.
.
INSERT INTO StationItem (<last item data>)
COMMIT
But having a manually created example with real data would be very helpful.

Nope, I was talking about the import of the listings.csv and listings-live.csv files into the trade dangerous database. The comment about the listener was an aside. I should have been more clear about that.
The EDDBlink import plugin, which is what imports the listings files into the TD DB, DOESN'T do "INSERT if exception UPDATE". Only the listener does that, so your premise is either flawed or irrelevant. Your choice.
Code:
                result = self.execute("SELECT modified FROM StationItem WHERE station_id = ? AND item_id = ?", (station_id, item_id)).fetchone()
                if result:
                    updated = timegm(datetime.datetime.strptime(result[0],'%Y-%m-%d %H:%M:%S').timetuple())
                    # When the dump file data matches the database, update to make from_live == 0.
                    if int(listing['collected_at']) == updated and listings_file == LISTINGS:
                        self.execute("""UPDATE StationItem
                                    SET from_live = 0
                                    WHERE station_id = ? AND item_id = ?""",
                                    (station_id, item_id))
                    if int(listing['collected_at']) > updated:
                        tdenv.DEBUG1("Updating:{}, {}, {}, {}, {}, {}, {}, {}, {}",
                             station_id, item_id, modified,
                             demand_price, demand_units, demand_level,
                             supply_price, supply_units, supply_level)
                        try:
                            self.execute("""UPDATE StationItem
                                    SET modified = ?,
                                     demand_price = ?, demand_units = ?, demand_level = ?,
                                     supply_price = ?, supply_units = ?, supply_level = ?,
                                     from_live = ?
                                    WHERE station_id = ? AND item_id = ?""",
                                    (modified, demand_price, demand_units, demand_level, supply_price, supply_units, supply_level, from_live,
                                     station_id, item_id))
                        except sqlite3.IntegrityError:
                            tdenv.DEBUG1("Error on update.")
                else:
                    tdenv.DEBUG1("Inserting:{}, {}, {}, {}, {}, {}, {}, {}, {}",
                             station_id, item_id, modified,
                             demand_price, demand_units, demand_level,
                             supply_price, supply_units, supply_level)
                    try:
                        self.execute("""INSERT INTO StationItem
                                (station_id, item_id, modified,
                                 demand_price, demand_units, demand_level,
                                 supply_price, supply_units, supply_level, from_live)
                                VALUES ( ?, ?, ?, ?, ?, ?, ?, ?, ?, ? )""",
                                (station_id, item_id, modified,
                                 demand_price, demand_units, demand_level,
                                 supply_price, supply_units, supply_level, from_live))
                    except sqlite3.IntegrityError:
                        tdenv.DEBUG1("Error on insert.")
 
The EDDBlink import plugin, which is what imports the listings files into the TD DB, DOESN'T do "INSERT if exception UPDATE". Only the listener does that, so your premise is either flawed or irrelevant. Your choice.

Ah, I see what you mean. So ignore the insert-if-exception bit in this instance and just look at the timings. My games PC imports listings.csv in around 40 minutes. I'm trying to improve on that.

Better?

:)
 
Ah, I see what you mean. So ignore the insert-if-exception bit in this instance and just look at the timings. My games PC imports listings.csv in around 40 minutes. I'm trying to improve on that.

Better?

:)

The logic of your test doesn't translate very well because it doesn't account for two of the four cases, but doing an executemany on the whole shebang rather than executing each statement individually obviously speeds things up, so when I have time I'll work on updating the plugin to do that (there's a rough sketch after the list below).

(The four cases are:
1: Data from source file is newer than data in DB -- UPDATE all data, setting from_live = 0 if source is listings.csv
2: Data from source file is same age as data in DB -- UPDATE nothing, set from_live = 0 if source is listings.csv
3: Data from source file is older than data in DB -- IGNORE
4: Data from source does not exist in DB -- INSERT, setting from_live = 0
)
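
A rough sketch of how those four cases could be bucketed before doing a single executemany() per bucket (variable names, the `is_listings` flag and the open cursor `cur` are assumptions for illustration, not the plugin's actual code):
Code:
# listings:    rows parsed from listings.csv / listings-live.csv
# db_modified: dict mapping (station_id, item_id) -> timestamp already in the DB
# is_listings: True when the source file is listings.csv
updates, reset_live, inserts = [], [], []

for row in listings:
    key = (int(row['station_id']), int(row['item_id']))
    collected = int(row['collected_at'])
    if key not in db_modified:                   # case 4: not in the DB yet
        inserts.append(row)
    elif collected > db_modified[key]:           # case 1: source is newer
        updates.append(row)
    elif collected == db_modified[key] and is_listings:
        reset_live.append(key)                   # case 2: only reset from_live
    # else: case 3 -- source is older, ignore the row

# Each bucket then becomes one executemany() call, e.g. for case 2:
cur.executemany("UPDATE StationItem SET from_live = 0 "
                "WHERE station_id = ? AND item_id = ?", reset_live)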
 
Sounds good to me. Now, since I'm self-taught on SQL, and my entire experience with it is with the making of the EDDBlink plugin and listener, can you, on the issue, post an example statement to do this?

The faster way to do this is to use the executemany form of the INSERT command:
Code:
BEGIN IMMEDIATE
DELETE FROM StationItem WHERE station_id = <station id number>
INSERT INTO StationItem VALUES (<first item data>), (<second item data>), ..., (<last item data>)
COMMIT

Bear in mind that bulk deleting and inserting will fragment the database and a vacuum (https://www.sqlite.org/lang_vacuum.html) will need to be carried out on a regular basis unless auto_vacuum is enabled.
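
For example, with `conn` being the open sqlite3 connection, the occasional clean-up can be as small as:
Code:
# VACUUM rewrites the whole database file to reclaim space and cannot run
# inside an open transaction, so make sure nothing is pending first.
conn.commit()
conn.execute("VACUUM")
# Alternatively enable auto_vacuum, bearing in mind it only takes effect on a
# database that is created (or subsequently vacuumed) with the setting on.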

Note that many programmers will wince at the above code since the current "best practise" is to ensure that all changes to data in a table are via parameters to avoid SQL injection problems.
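
A parameterised sketch of the same idea, assuming an sqlite3 cursor `cur` on a connection with isolation_level=None, and using dummy data rather than anything from the real listings files:
Code:
station_id = 1234                     # dummy id, purely illustrative
items = [
    # (station_id, item_id, modified, demand_price, demand_units, demand_level,
    #  supply_price, supply_units, supply_level, from_live) -- dummy values
    (1234, 1, "2018-07-24 20:00:00", 100, 50, 2, 0, 0, 0, 0),
    (1234, 2, "2018-07-24 20:00:00", 0, 0, 0, 200, 75, 3, 0),
]

cur.execute("BEGIN IMMEDIATE")
cur.execute("DELETE FROM StationItem WHERE station_id = ?", (station_id,))
cur.executemany("""INSERT INTO StationItem
                   (station_id, item_id, modified,
                    demand_price, demand_units, demand_level,
                    supply_price, supply_units, supply_level, from_live)
                   VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", items)
cur.execute("COMMIT")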

I've just carried out an import of the listings.csv file by deleting all the data in the StationItem table and then inserting all the data in a single transaction. The elapsed time was 258s, by far the fastest method so far. Case 1, I think
 
Again, I don't know SQL, but this link indicates that the biggest speedups come from minimizing transactions (which depend on HDD speed)

True, but you have to temper that against losing all your data should something go wrong. Wrapping a transaction around a single market of data means that if something goes wrong then all you have lost is the current market's data, not anything else you changed before that. It's a trade-off.

and "preparing" (precompiling?) the SQL:

Prepared SQL statements are also good, but only when you know that all your changes are going to be inserts or updates. Constantly changing and re-preparing the statements will slow things down. This is why in one of my tests I separated the inserts and updates into two separate lists and then processed each list in turn. In this manner I could prepare the appropriate statement before the start of each list, and this gave the expected improvement in performance.
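
As a sketch of that two-list approach (assumed variable names, an open sqlite3 cursor `cur`, and cut-down column lists; not the actual test code), each list is processed with one executemany() call, so sqlite3 prepares each SQL string once and reuses it for every row:
Code:
inserts, updates = [], []
for row in incoming_rows:                     # assumed: parsed market rows
    key = (row['station_id'], row['item_id'])
    if key in existing_keys:                  # assumed: keys already in the DB
        updates.append((row['modified'], row['demand_price'],
                        row['station_id'], row['item_id']))
    else:
        inserts.append((row['station_id'], row['item_id'],
                        row['modified'], row['demand_price']))

cur.execute("BEGIN IMMEDIATE")
cur.executemany("INSERT INTO StationItem (station_id, item_id, modified, "
                "demand_price) VALUES (?, ?, ?, ?)", inserts)
cur.executemany("UPDATE StationItem SET modified = ?, demand_price = ? "
                "WHERE station_id = ? AND item_id = ?", updates)
cur.execute("COMMIT")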
 
Thank you, Mark. I think Tromador said that reconstituting the DB isn't a big deal so he was willing to risk it, in which case, wrapping the entire update in one transaction should be even faster than market-by-market, right?
 
Thank you, Mark. I think Tromador said that reconstituting the DB isn't a big deal so he was willing to risk it, in which case, wrapping the entire update in one transaction should be even faster than market-by-market, right?

I think everyone might be talking at cross purposes:

Reconstituting the database in the event of a power cut is a different matter than user frustration from having to re-run poorly written code which may have an unacceptable failure rate.

Wrapping up transactions on a per market basis is for the listener, not the plugin.

Finding more efficient ways to import the (large) listings.csv is for the plugin, not the listener.

And I am often struggling to tell apart which of the two is being referenced in any post (or part of a post).

For the avoidance of confusion please can everyone be clear which component they are discussing.

Because we have two SQL discussions ongoing, each of which is to solve a different problem. :)
 
Sounds good to me. Now, since I'm self-taught on SQL, and my entire experience with it is with the making of the EDDBlink plugin and listener, can you, on the issue, post an example statement to do this?
(#7)

Done. I didn't test the code and it went through some modification.
 
I think everyone might be talking at cross purposes:

Reconstituting the database in the event of a power cut is a different matter than user frustration from having to re-run poorly written code which may have an unacceptable failure rate.

Wrapping up transactions on a per market basis is for the listener, not the plugin.

Finding more efficient ways to import the (large) listings.csv is for the plugin, not the listener.

And I am often struggling to tell apart which of the two is being referenced in any post (or part of a post).

For the avoidance of confusion please can everyone be clear which component they are discussing.

Because we have two SQL discussions ongoing, each of which is to solve a different problem. :)
Meanie.
 
Busy busy afternoon - thanks everyone for contributing. eyeonus has some nice stuff to find when he wakes up/gets home from work/emerges from the hibernation pod.

Obviously major props to @Gazelle for refactoring the message processor in the listener
(and my incredible hard work correcting his typos)

At this rate, we might even get to do that formal re-release sometime before the heat death of the universe!
 
So, @Tromador, can you leverage @gazelle's work for your OTHER SQL issue?

Errmm.. this is the speed (or lack thereof) of importing the big data.

Honestly, I'm not sure, as it's a different problem, one which I don't have, and there is a lot of data to import. That said, taking 40 mins (as in Mark's case) to process listings.csv makes my jaw drop. I do it in 5.

Code:
NOTE: Processing market data from listings.csv: Start time = 2018-07-24 20:26:49.463557
NOTE: Finished processing market data. End time = 2018-07-24 20:31:53.941352

Now, that said I am using some of the same SQLite tunings I use on the server, which I've not released yet. Once I'm happy I've squeezed the best out of it, I'll ask you guys to test.

EDIT:

My update times on the listener start to look closer to the server's now too, so some of the speed is definitely down to my tunings.

Code:
Without Tuning
Market update for PARUTIS/EVANS PORT finished in 0.606 seconds.
Market update for TANGAROA/ASYLUM finished in 0.591 seconds.
Market update for SHINRARTA DEZHRA/JAMESON MEMORIAL finished in 0.715 seconds.
Market update for TSONDAMA/HEYERDAHL ENTERPRISE finished in 0.692 seconds.
Market update for CEOS/NEW DAWN STATION finished in 0.419 seconds.
Market update for VODYANES/DASSAULT ORBITAL finished in 0.723 seconds.
Market update for IENPALANG/BRUNDAGE CITY finished in 0.443 seconds.
Market update for DAIKU/WRANGEL ORBITAL finished in 0.488 seconds.
Market update for ALRAI/BOUNDS HUB finished in 1.06 seconds.

With Tuning
Market update for SHINRARTA DEZHRA/JAMESON MEMORIAL finished in 0.03 seconds.
Market update for ROBIGO/ROBIGO MINES finished in 0.026 seconds.
Market update for KITCHE/OOSTERHOFF CITY finished in 0.05 seconds.
Market update for OCHOSI/PERRY'S FOLLY finished in 0.048 seconds.
Market update for LAVE/LAVE STATION finished in 1.561 seconds.
Market update for OGONDAGE/HENSON TERMINAL finished in 0.104 seconds.
Market update for ZIBAL/HIRE PORT finished in 0.041 seconds.
Market update for LTT 2667/STEPHENSON HUB finished in 0.034 seconds.
 
My computer also finishes the update (-O listings) quickly:
Code:
NOTE: Downloading file 'listings.csv'.
NOTE: Requesting http://elite.ripz.org/files/listings.csv
NOTE: Downloaded 151.5MB of gziped data   6.5MB/s
NOTE: Processing market data from listings.csv: Start time = 2018-07-24 20:48:37.248911
NOTE: Finished processing market data. End time = 2018-07-24 20:52:42.778965

I am running on an i7-8700K overclocked to 5GHz, if it matters.
 
Thank you, Mark. I think Tromador said that reconstituting the DB isn't a big deal so he was willing to risk it, in which case, wrapping the entire update in one transaction should be even faster than market-by-market, right?

On his server that is fine, but for individual installations that would be a pain. I would be seriously hacked off if every time the database threw an error I had to rebuild the entire database. As I said, it's a trade-off. By making each market update a single transaction you spread the additional processing required to commit one transaction and open the next over many markets. It is quite annoying to have to wait for 2+ minutes for the listener update over a single transaction to commit, especially since you get no feedback that anything is happening. The system just stops until the transaction is committed. On the other hand, the additional time that committing each market takes is spread over all the markets with visual feedback, which makes the longer time seem shorter. The point being that, with visual feedback, the extra time that market-by-market transactions are likely to take over the single-transaction model doesn't seem so long. Or not to me at least.
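
As an illustration of that per-market pattern with visible progress (placeholder names, and a DELETE/INSERT body rather than the listener's actual update logic):
Code:
import time

# Assumed: market_updates yields (station_id, rows) batches and cur is an
# sqlite3 cursor on a connection with isolation_level=None.
for station_id, rows in market_updates:
    started = time.time()
    cur.execute("BEGIN IMMEDIATE")
    cur.execute("DELETE FROM StationItem WHERE station_id = ?", (station_id,))
    cur.executemany("""INSERT INTO StationItem
                       (station_id, item_id, modified,
                        demand_price, demand_units, demand_level,
                        supply_price, supply_units, supply_level, from_live)
                       VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""", rows)
    cur.execute("COMMIT")   # small commits: little is lost on an error, and the
                            # user sees steady progress instead of a long freeze
    print("Market update for {} finished in {:.3f} seconds."
          .format(station_id, time.time() - started))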

It also needs to be borne in mind that we are not looking for super-speed all the time, just fast enough to prevent the non-updates from happening. The consensus at the moment seems to be that transaction-by-market gives the best solution taking all things into account, not just speed.

Perhaps I should add another test and get some accurate timings for the transaction by market model.
 
On his server that is fine, but for individual installations that would be a pain. I would be seriously hacked off if every time the database threw an error I had to rebuild the entire database. As I said, it's a trade-off. By making each market update a single transaction you spread the additional processing required to commit one transaction and open the next over many markets. It is quite annoying to have to wait for 2+ minutes for the listener update over a single transaction to commit, especially since you get no feedback that anything is happening. The system just stops until the transaction is committed. On the other hand, the additional time that committing each market takes is spread over all the markets with visual feedback, which makes the longer time seem shorter. The point being that, with visual feedback, the extra time that market-by-market transactions are likely to take over the single-transaction model doesn't seem so long. Or not to me at least.

It also needs to be borne in mind that we are not looking for super-speed all the time, just fast enough to prevent the non-updates from happening. The consensus at the moment seems to be that transaction-by-market gives the best solution taking all things into account, not just speed.

Perhaps I should add another test and get some accurate timings for the transaction by market model.

Don't bother. It's already implemented on the listener.
 
It is quite annoying to have to wait for 2+ minutes for the listener update over a single transaction to commit, especially since you get no feedback that anything is happening.

What exactly takes 2 minutes when running the listener?

It sounds like you mean a single market update. If so, be aware that this takes less than a second, often only 100ths of a second for my server, or 3 different PCs we have feedback from.

Along with taking 40 odd minutes to process listings.csv when the rest of us do it in 5.

Only so much can be done programmatically if you simply have a slow machine.
 
What exactly takes 2 minutes when running the listener?

It sounds like you mean a single market update. If so, be aware that this takes less than a second, often only 100ths of a second for my server, or 3 different PCs we have feedback from.

Along with taking 40 odd minutes to process listings.csv when the rest of us do it in 5.

Only so much can be done programmatically if you simply have a slow machine.

Sorry, I wasn't being clear again. If you import the listings file in a single transaction, the commit can take nearly 3 minutes to complete, during which time the system appears to freeze: nothing appears to happen until the commit finishes.

As for a slow machine, I didn't think I had one. I have an i5-4690 running at 3.5GHz with 16GB RAM.

Still, I have updated my games PC with the latest code and the import is as follows:

Code:
NOTE: Downloading file 'listings.csv'.
NOTE: Requesting http://elite.ripz.org/files/listings.csv
NOTE: Downloaded 151.8MB of gziped data  11.5MB/s
NOTE: Processing market data from listings.csv: Start time = 2018-07-25 09:59:50.560386
NOTE: Finished processing market data. End time = 2018-07-25 10:10:50.699030

So it now takes 11 minutes on my PC instead of the 40 or so previously, roughly a fourfold speed-up from your recent code changes. A significant improvement, many thanks.
 