In-Development TradeDangerous: power-user trade optimizer

Hmm. Weird. Maybe it's just me. I have to wait until the next dump to test again; I'll let you know tomorrow.

Just to be clear, you guys do have the same version of the script I have, right? The one on GitHub?

(Note to self: TODO: Figure out an easy way to make it so each category only gets updated once, rather than the current method of blindly updating it every time it shows up in the commodity list.)
 

No, I think my version is the one where you added Tromador’s listener to it. I’ll grab the one from GitHub now and time the updates again once we have the next dump.

Sorry, I can’t help on the category updating you mention. To be honest, most of what you and Tromador have been talking about goes right over my head!
 
Update:

I think I was using the same dataset when I timed an update yesterday. I tried the update today (so that it downloaded new data) and that was clearly taking much longer. In fact, I didn't stay to time it (as I needed to use the system) and did a -O clean instead, which took the usual ~10-15 mins.

Is there anything in the maddavo plugin you can crib which might help your update code, or is it something inside TD itself?

@CMDR Ibizan

GitHub TD + GitHub EDDB does use my mirrored data, so you are likely using the same one. To be honest, it doesn't matter where the data came from; it's the same data being processed either way.
 
No, I think my version is the one where you added Tromador’s listener to it. I’ll grab the one from GitHub now and time the updates again once we have the next dump.

Sorry, I can’t help on the category updating you mention. To be honest, most of what you and Tromador have been talking about goes right over my head!

That's the same version. I didn't make any changes when I added it to GitHub... yet.
 
Update:

I think I was using the same dataset when I timed an update yesterday. I tried the update today (so that it downloaded new data) and that was clearly taking much longer. In fact, I didn't stay to time it (as I needed to use the system) and did a -O clean instead, which took the usual ~10-15 mins.

Is there anything in the maddavo plugin you can crib which might help your update code, or is it something inside TD itself?

I don't know, but hopefully. It's not TD's fault, though; it's SQL. For some reason updating information in the database takes forever, and that's an SQL thing. I think I might need to switch tactics and edit the .csv files instead of the database...
 

This is what I don't get. The world runs on SQL queries. What is it about TD+SQL that is so long-winded?
 

It's not the querying that's taking long; that's why, when you try to update with the same data, it finishes so quickly: it's checking the date, which is an SQL query, seeing that it's not newer, and moving on.

It's the modifying that's taking forever: the "INSERT OR REPLACE" statement it runs when it does find new data.

From what I can tell, that's just a standard SQL thing: adding to a database is quick and easy; modifying existing entries takes forever.
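Roughly the per-row pattern being described, as a sketch (not the plugin's exact code; it assumes the timestamps compare correctly, e.g. as ISO-format strings):

Code:
# For each system in the dump: read the stored timestamp first...
row = db.execute(
    "SELECT modified FROM System WHERE system_id = :id",
    {"id": system['id']},
).fetchone()

# ...and only write when the incoming record is newer.
if row is None or row[0] < modified:
    db.execute(
        """INSERT OR REPLACE INTO System
           ( system_id, name, pos_x, pos_y, pos_z, modified )
           VALUES ( ?, ?, ?, ?, ?, ? )""",
        (system['id'], system['name'],
         system['x'], system['y'], system['z'],
         modified))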
 

I'm gonna totally come in sideways to this argument and ask: why not NoSQL (Mongo, Redis, etc.), since most of the stuff is JSON anyway? It was built for this kind of performance requirement.
 

TD is written with an SQLite backend DB, and we are looking to replace maddavo with a usable data source, not rewrite the whole thing to run a different database.
 
It's not the querying that's taking long; that's why, when you try to update with the same data, it finishes so quickly: it's checking the date, which is an SQL query, seeing that it's not newer, and moving on.

It's the modifying that's taking forever: the "INSERT OR REPLACE" statement it runs when it does find new data.

From what I can tell, that's just a standard SQL thing: adding to a database is quick and easy; modifying existing entries takes forever.

Nope, I don't buy that either; a database where the data cannot be changed in a timely manner is not fit for purpose.

I am not really up on Python. I am a systems guy who used to script in Perl, but I will have a look at your code and see if I can spot anything obvious with the SQL.
 

Welp, this is the line that takes forever when updating systems:

Code:
db.execute("""INSERT OR REPLACE INTO System
                        ( system_id,name,pos_x,pos_y,pos_z,modified ) VALUES
                        ( ?, ?, ?, ?, ?, ? ) """,
                        (system['id'], system['name'],
                         system['x'], system['y'], system['z'],
                         modified))

It's basically the same line in all the other update methods as well.

Admittedly, I didn't know thing one about SQL before working on this plugin, but I think it's safe to say the Python code doesn't have much to do with it.
 
From what I can tell, that's just a standard SQL thing: adding to a database is quick and easy; modifying existing entries takes forever.

Just a thought (ignore me if I’m talking rubbish): if modifying the out-of-date entry is the problem, would deleting it and then adding the updated entry work any quicker? Forgive me if that’s a stupid idea; I’m not a programmer.
 

Unfortunately, no, because that's what the "OR REPLACE" part does: it deletes the old entry and then puts in the new one.

However, I may have something: apparently SQLite opens the DB, executes the statement, commits it, and then closes it, every time. The opening is apparently the slow bit, so changing the code to do all the updates at once, rather than one at a time, could possibly help tremendously: from what I can tell, that means only opening the DB once, rather than once per update.

For those that understand: that means putting all the updates into a list of tuples within the for loop, instead of running the execute command as it is now, and then, after exiting the for loop, running an executemany with that list. I'll work on it tomorrow and see if I can get it working. Cross your fingers.
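A minimal sketch of that batched approach, assuming a sqlite3 connection db and an iterable of JSON lines from the dump (the function and variable names here are illustrative, not the plugin's actual code):

Code:
import json

def import_systems(db, lines, modified):
    # Collect every row inside the loop instead of executing one
    # INSERT OR REPLACE per system...
    rows = []
    for line in lines:
        system = json.loads(line)
        rows.append((system['id'], system['name'],
                     system['x'], system['y'], system['z'],
                     modified))
    # ...then hand the whole batch to executemany and commit once.
    db.executemany(
        """INSERT OR REPLACE INTO System
           ( system_id, name, pos_x, pos_y, pos_z, modified )
           VALUES ( ?, ?, ?, ?, ?, ? )""",
        rows,
    )
    db.commit()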
 
Welp, this is the line that takes forever when updating systems:

Code:
db.execute("""INSERT OR REPLACE INTO System
                        ( system_id,name,pos_x,pos_y,pos_z,modified ) VALUES
                        ( ?, ?, ?, ?, ?, ? ) """,
                        (system['id'], system['name'],
                         system['x'], system['y'], system['z'],
                         modified))

It's basically the same line in all the other update methods as well.

Admittedly, I didn't know thing one about SQL before working on this plugin, but I think it's safe to say the Python code doesn't have much to do with it.

Totally agree that it isn't the Python.

I think the problem is that it has to search the whole table each and every time, in case a REPLACE is needed. INSERT on its own is much quicker because it doesn't care about the existing data.
 
Any reason why you've not chosen to use tdb.updateLocalSystem?
e.g.

Code:
tdb.updateLocalSystem(
    system, sysName, x, y, z, added, modified,
    commit=False,
)



This is a whole read of the table as well, so you are doing it twice for each entry.
Code:
result = db.execute(
    "SELECT modified FROM System WHERE system_id = :id",
    {"id": system['id']},
).fetchone()


Can you not crib the import_systems in Maddavo?

I suck at Python, but hopefully this is something along the right lines...
(bearing in mind, system names are unique and this is easier for me to crib than using the ID; use whatever you want).
Code:
import somestuff
import tradedb
import morestuff

<LOTS OF CODE DOING OTHER STUFF>

# Do systems import

def importSystems(self):

<STUFF BEFORE LOOP>

<BEGIN LOOP>
    system = json.loads(line)
    sysLookup = tdb.systemByName.get
    result = sysLookup(system['name'], None)

    if not result:
        tdb.addLocalSystem(
            system['id'], system['name'],
            system['x'], system['y'], system['z'],
            modified, commit=False,
        )
        self.modSystems += 1
    elif result.posX != system['x'] or result.posY != system['y'] or result.posZ != system['z']:
        tdb.updateLocalSystem(
            system['id'], system['name'],
            system['x'], system['y'], system['z'],
            modified, commit=False,
        )
        self.modSystems += 1

<GO BACK AROUND LOOP FOR NEXT LINE OF JSON>

    if self.modSystems:
        tdb.getDB().commit()

<STUFF AFTER LOOP>

# End systems import

<MORE CODE DOING MORE STUFF>



I guess what I'm aiming at here is that KFSOne already wrote the functions to manipulate the database, and you should probably be using those instead of directly accessing the DB from the plugin.

Use the functions in tradedb.
Only update if the system has moved position in the source data; otherwise, even if modified has changed, it's the same exact system and modified is lying.
Use KFS's functionality to hold off on committing to the database until we are done (commit=False and tdb.getDB().commit()) so we only write it out once and otherwise do it in memory.

I guess it doesn't help that the TD API is sparsely documented, but it's there to be used. KFSOne already did the heavy lifting on the database; rather than reinvent the wheel, we can simply plug into what he's already done.

(Really hoping I make sense and this is useful and not total bum gravy of the worst kind)
 
Any reason why you've not chosen to use tdb.updateLocalSystem?
e.g.

Code:
tdb.updateLocalSystem(
    system, sysName, x, y, z, added, modified,
    commit=False,
)
A few reasons, the most important being that it doesn't take into account the plugin's database changes, and would therefore have to be edited to do so before I could use it.
This is a whole read of the table as well, so you are doing it twice for each entry.
Code:
result = db.execute(
    "SELECT modified FROM System WHERE system_id = :id",
    {"id": system['id']},
).fetchone()
No, SELECT operates differently because it's a read, not a write, so that's not a problem.
Can you not crib the import_systems in Maddavo?
I don't know what you mean by 'crib', but I can't use the maddavo plugin stuff, because that was built to import the .csv files that Maddavo made and provided on his site.

I guess what I'm aiming at here is that KFSOne already wrote the functions to manipulate the database, and you should probably be using those instead of directly accessing the DB from the plugin.

Use the functions in tradedb.
Only update if the system has moved position in the source data; otherwise, even if modified has changed, it's the same exact system and modified is lying.
Use KFS's functionality to hold off on committing to the database until we are done (commit=False and tdb.getDB().commit()) so we only write it out once and otherwise do it in memory.

I guess it doesn't help that the TD API is sparsely documented, but it's there to be used. KFSOne already did the heavy lifting on the database; rather than reinvent the wheel, we can simply plug into what he's already done.

(Really hoping I make sense and this is useful and not total bum gravy of the worst kind)

Except we can't, because they're not designed to work with the materials we have; they only exist for systems and stations, so I'd have to write all the other ones anyway. And if you actually look at those functions, they are doing basically the same thing as what I wrote. The only real difference is that instead of doing REPLACE, they do UPDATE.

I decided last night that I was going to try switching to UPDATE, since that's what it's for. The reason I hadn't used UPDATE up to now is that it uses a completely different syntax than INSERT. It was easier to use INSERT OR REPLACE, but apparently it's just not worth it. It'll take me a bit to make the changes in the code, but I'm hopeful.

And if it doesn't work, I'll do the executemany thing as well.
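For reference, the syntax difference in question, using the System columns from the earlier snippet (a sketch; the final statements in the plugin may differ):

Code:
# INSERT OR REPLACE: a positional VALUES list, the same shape as a plain INSERT.
db.execute(
    """INSERT OR REPLACE INTO System
       ( system_id, name, pos_x, pos_y, pos_z, modified )
       VALUES ( ?, ?, ?, ?, ?, ? )""",
    (system['id'], system['name'],
     system['x'], system['y'], system['z'], modified))

# UPDATE: SET column = value pairs plus a WHERE clause to pick the row,
# so the parameters are reordered and the id moves to the end.
db.execute(
    """UPDATE System
       SET name = ?, pos_x = ?, pos_y = ?, pos_z = ?, modified = ?
       WHERE system_id = ?""",
    (system['name'], system['x'], system['y'], system['z'],
     modified, system['id']))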
 
A few reasons, the most important being that it doesn't take into account the plugin's database changes, and would therefore have to be edited to do so before I could use it.

No, SELECT operates differently because it's a read, not a write, so that's not a problem.

It still has to go through the whole table each and every time to find (SELECT) and return the results that match your criterion (WHERE). This is expensive when repeated thousands of times.

I don't know what you mean by 'crib', but I can't use the maddavo plugin stuff, because that was built to import the .csv files that Maddavo made and provided on his site.

Crib means to sort of copy, but tailored to your requirements: taking code and using it as an example.
Now, yes, Maddavo is designed to work with CSV files, but once it's read a line from the CSV, it's just variables, the same as when you read a line of JSON; the API doesn't care.

Step 1 - Parse a line of source data -- different for JSON / CSV.
This will necessarily require somewhat different code.

Step 2 - Output the parsed data into SQLite3 via the API -- the same regardless of source.
This could be essentially the same code, with minor differences, as I showed in my example, using the existing API, which we know isn't slow.
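Something like this is what I mean, with made-up helper names (and an assumed id,name,x,y,z column order for the CSV case), just to show that only step 1 differs:

Code:
import json

# Step 1, JSON flavour: one object per line of the dump.
def parse_json_line(line):
    s = json.loads(line)
    return s['id'], s['name'], s['x'], s['y'], s['z']

# Step 1, CSV flavour: the same five values from a csv.reader row
# (column order assumed here purely for illustration).
def parse_csv_row(row):
    return int(row[0]), row[1], float(row[2]), float(row[3]), float(row[4])

# Step 2 is identical either way: hand the five values to the same
# TD API call, whichever one the plugin settles on.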

Except we can't, because they're not designed to work with the materials we have, they only exist for systems and stations so I'd have to write all the other ones anyway

The API is designed to work with TD files and its SQLite3 database, including .prices files, which aren't a maddavo design. You could just as easily convert the JSON price list into .prices and call "trade.py import myprices.prices". Isn't that easier than directly accessing the database?
Maddavo is a red herring; get unhooked from it. I'm suggesting using what's in TD, *NOT* what's in Maddavo (except as examples of how to use the TD API). TD doesn't care where the data comes from, or whether it's in CSV, JSON, or some proprietary format, so long as the plugin can correctly supply the data to the API calls.

I mean, do it the way you want to do it, but I'd be looking for the easy way instead of reinventing what's already there.
So, yes, we totally can, but it's your plugin :)
 
Oh, well then, cribbing is exactly what I've been doing.

No, I can't use it, for the reasons I've already said. 'tdb.addLocalSystem' doesn't take the system id as an argument, because it assumes the DB is using AUTOINCREMENT for the table, which it isn't: the plugin changes that to use the EDDB ID instead. So I have to either change that method to accept the system id (which will break TD anywhere else it calls that method) or write my own.
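To illustrate the clash with a sketch (column names taken from the earlier snippets; this is not TD's actual code, and eddb_id etc. are placeholder variables):

Code:
# What tdb.addLocalSystem assumes: no system_id in the INSERT,
# so SQLite assigns one via AUTOINCREMENT.
db.execute(
    """INSERT INTO System ( name, pos_x, pos_y, pos_z, modified )
       VALUES ( ?, ?, ?, ?, ? )""",
    (name, x, y, z, modified))

# What the plugin needs: the EDDB id supplied explicitly as system_id,
# so the local ids line up with EDDB's.
db.execute(
    """INSERT INTO System ( system_id, name, pos_x, pos_y, pos_z, modified )
       VALUES ( ?, ?, ?, ?, ?, ? )""",
    (eddb_id, name, x, y, z, modified))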

Same goes for the update system, add station, and update station methods.

And even if I could use those four, I'd still have to write my own methods for Upgrades, UpgradeVendors, Ships, ShipVendors, Categories, and Items, because equivalent methods for them don't exist in TD.

I wasn't talking about Maddavo's plugin. I was talking about TD itself. Maddavo's plugin has literally nothing I can use, except as a source of inspiration and examples.
 