ExplOCR: An OCR Application Explorers May Find Useful

Hi Explorers,

I've just released a hotfix for the program to work with 1.5 / Horizons, available at https://github.com/ThoroughlyLostEx.../TagBeta3-1.5-a/ExplOCRbeta3-for-Horizons.zip

Unfortunately, there were more problems than I had expected and I haven't been able to come up with solutions for all of them yet. I still think that I should publish a hotfix for now, so you will get useful data in most cases.

The remaining problems mostly affect the body type description text and the mining reserves / terraforming information directly below that. This creates the rather awkward situation that you will see "good" precision while reading the table items / numbers, but may occasionally get the wrong planet type!

This issue will affect only some planet types (ammonia is sometimes mistaken for water world, I think). Another example is that Wolf-Rayet stars may show up with the wrong subtype (C/N/O). I'll improve performance there when I have time (soonish). Keep an eye on the planet type for now, and fix it manually in the few cases where it is off. Notify me of new examples you find.

Has anyone used the workaround I suggested above? I haven't included a mass-reread yet. That will make sense only after I have fixed the remaining problems of the software. PM me to tell me how many systems you will need to convert :eek:, it may make me hurry up.

Thank you,

TLE

PS: Any chance someone will use my program on the upcoming major expedition?
 
Hi, I was just testing it out (as well as a few others) as a way to prepare for the Distant Worlds expedition. It's very interesting and nicely done, though I'm not sure if I want to use it massively or just for memorable places.
From the first few systems where I tried it out, there were 2 things that slowed me down a bit. Though maybe it just means that I'm not the target audience, as I probably won't ever want to carefully check each image for errors, even when they are well marked.

1° Having to press Enter (at the very least) between each screenshot, mainly because it's far away from the default keybind (left hand) while the right hand is on the mouse and I don't want to move it. I'd rather be able to scan the whole system with the fewest clicks possible.
Maybe a toggle where it'd auto-validate each screenshot? (It is easily solved by using a keybind right next to Enter, but that'd still mean one additional key press each time.)

2° Having to tick "await stitch" for ringed planets, as that ends up being quite a lot of work for systems with 10 or more ringed bodies. Maybe add an option for a 2nd keybind that would automatically stitch the next image to the previous one (might have to overwrite or delete the previous image to keep things clean though?)
 

Jon474

I've been following the workaround and I intend to use your application during DW. I am happy to carry on with the workaround until you say otherwise. In fact, I'll be keeping all my screenshots so it will be possible for me to obtain data analysis using both methods until you say otherwise, really. There will be LOTS of screenshots!

Kind regards
Jon
 
Great app,

time to start a nice Planet DB for me :D
DW is the perfect opportunity for this


Any chance of support for 2560x1080 resolution?

Switching resolutions temporarily works as a workaround but is a hassle in the long run ;)
If you need any information for that just contact me.

Regards
Andre
 
Hi Explorers,

I have just published a new beta release of my OCR fixes for 1.5 / Horizons. The new version will hopefully work with adequate accuracy. Ammonia/water worlds are now handled correctly ;), and terraforming / mining resource information is now shown in the grid.

If you have made any scans with previous versions, especially before the first 1.5 / Horizons fix, you can now re-read those from the screenshot archive. This will potentially correct incorrectly OCRed data. The way to do it:

  • You are acting at your own risk.
  • Make a backup of the database and screenshot directories. Find them via menu "Config"/"Save directories". The systems.xml file is most important, but it is better to back up all of it!
  • Really, do make a backup.
  • Close the program.
  • Restart the program with command line argument "multiread" (This is a safety feature, just so people never ever activate this by accident)
  • Open the table with scanned objects in it in menu "Table"/"Display".
  • You should be able to see a button "Multi-Read" somewhere near the bottom of the dialog.
  • Uncheck the Read-Only box.
  • Use standard Windows <Ctrl>+Click / <Ctrl>+<Shift>+Click selection to select the first cell in all rows you need to read again.
  • Click "Multi-Read".
  • If there are many items to re-read, go make a cup of tea.
  • When all is done, check if the items you re-read became better through this.
  • Keep the backup.
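
The backup step in the list above could be scripted. The following is only a minimal sketch: the function name and the directory names are hypothetical, and the real locations are whatever menu "Config"/"Save directories" shows on your machine.

```python
import shutil
import time
from pathlib import Path


def backup_directories(source_dirs, backup_root):
    """Copy each source directory into a fresh, timestamped backup folder.

    source_dirs and backup_root are placeholders: pass the database and
    screenshot directories shown under "Config"/"Save directories".
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target_root = Path(backup_root) / f"explocr-backup-{stamp}"
    copied = []
    for src in map(Path, source_dirs):
        dest = target_root / src.name
        # copytree creates missing parent folders and raises an error
        # if dest already exists, so an old backup is never overwritten.
        shutil.copytree(src, dest)
        copied.append(dest)
    return copied
```

Run it with the program closed, for example `backup_directories(["path/to/database", "path/to/screenshots"], "path/to/backups")`, and check that systems.xml is present in the copy before starting the mass re-read.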

As usual, send feedback to me by private message.

There are still a few quirks, e.g. Wolf-Rayet Subtype O is detected as Wolf-Rayet Subtype N. Nevertheless, I think I have caught most of the other known bugs by now. I can now start thinking of actual improvements, and hope they don't change the font again for 1.6/2.1.

TLE
 

Jon474

Hi TLE

Is the download on the same link as in post #41, please?

Thanks in advance
Kind regards
Jon
T-6E pilot
 
Hello Explorers,

as it turns out the 1.6/2.1 patch includes a few changes to the system map information panel layout that confuse the ExplOCR reading process. At this point, I am cautiously optimistic that I might get away with just adapting to the layout changes rather than having to retrain each letter for reading. I'll spend some time exploring in the game to figure out what to fix, and then I'll fix things later on.

TLE
 

Jon474

Hope it is not too much work for you.

I am still using this excellent App in all of my exploration. Many hundreds of items scanned now!

Kind regards
Jon
 
Hello Explorers,

I'm not quite done here yet and still busy redoing all the 200+ screenshots in my standard automated testing set (... which they apparently make me do on every release by changing the layout). From what I can see, I should be able to upgrade the application to handle the new layout in a reasonable amount of time. Of course, this will take a while to actually do because I have a lot of other things to take care of.

I'd like to remind all users that you'll be able to re-read existing screenshots once the program has been updated, so that the (badly OCRed) systems you currently enter into ExplOCR can be corrected later on (Be sure to make a backup just in case). There is however one important thing: The new layout uses up more screen space so many types of planets (even without rings) will no longer completely fit on a screenshot without scrolling. It is absolutely necessary that you use the "Await Image Stitch" feature, scroll down and take additional screenshots until all info is covered. If you don't have all parts of the info in your screenshot archive (-> created by the program automatically) you cannot re-read all information!

I am somewhat annoyed that the layout was changed in a way that the information no longer fits in one screenshot even for common planet types and without rings involved. That makes ExplOCR somewhat more cumbersome to use, and there is absolutely nothing I can do to work around it! I somehow assume that the layout was changed to make things easier for people with lower resolution (consoles?), so I guess the change serves some kind of purpose. As my program is only for PC, I wish the PC had stayed in its classic layout :cool:. I apologize for any inconvenience.

TLE
 
I am somewhat annoyed that the layout was changed in a way that the information no longer fits in one screenshot even for common planet types and without rings involved.

You and me both! FWIW, I submitted a bug report for it (here). Not only have they increased the space between each line item, but a good portion of them needlessly wrap which wastes even more space. Not convinced it was for low resolution setups either as the font size hasn't increased to make use of the extra space... hopefully they'll reconsider and put it back as it was. OCR issues aside, it looks plain fugly now.
 
This may or may not be useful to you: I've been messing about with OCR on the system map screenshots and while I didn't get anything like the excellent results you have, I did manage to find a simple image processing thing to remove the interference from having a star present "behind" the text: go through the image pixel by pixel and strip out any pixel where the r,g,b values are all less than 100, and either pass everything else through unchecked or convert them to a greyscale average of r,g,b.
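Example using Python PIL:
Code:
from PIL import Image  # needed to load the screenshot, e.g. Image.open(...)

def crude_brightness(target_image, threshold):
    """Black out dim pixels in place; expects an image in RGB mode."""
    imagex, imagey = target_image.size
    for x in range(imagex):
        for y in range(imagey):
            r, g, b = target_image.getpixel((x, y))
            if r > threshold and g > threshold and b > threshold:
                # Bright in every channel (probably text): pass it through.
                # Alternative: replace it with a greyscale average instead:
                #   avg = (r + g + b) // 3
                #   target_image.putpixel((x, y), (avg, avg, avg))
                target_image.putpixel((x, y), (r, g, b))
            else:
                # At least one channel is at or below the threshold: set to black.
                target_image.putpixel((x, y), (0, 0, 0))
    return target_image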
Another that might be useful: I found a quick sanity check for the ice, rock, metal values. For a free-orbiting world with any significant percentage of ice, the percentage of rock will be ~ ((200/3) - (ice * 2/3)); for a moon the percentage of rock will be ~ ((600/7) - (ice * 6/7)). Checking which of the two fits can be a way of recognising whether the object in question is a planet or moon.
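
As a rough illustration of that sanity check, the two formulas can be compared against the OCRed values. This is a sketch only: the function names and the tolerance of 1.5 percentage points are assumptions, not something taken from ExplOCR, and the rule is only claimed for bodies with a significant ice percentage.

```python
def expected_rock(ice, is_moon):
    """Rock percentage predicted by the rule of thumb for a given ice percentage."""
    if is_moon:
        return 600 / 7 - ice * 6 / 7
    return 200 / 3 - ice * 2 / 3


def classify_body(ice, rock, tolerance=1.5):
    """Return "planet", "moon", or None if neither prediction fits.

    tolerance (in percentage points) is an arbitrary assumed value.
    """
    candidates = {
        "planet": abs(rock - expected_rock(ice, is_moon=False)),
        "moon": abs(rock - expected_rock(ice, is_moon=True)),
    }
    best = min(candidates, key=candidates.get)
    return best if candidates[best] <= tolerance else None
```

A result of None suggests that either the ice or the rock value was misread, which makes the entry a candidate for a manual check.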
 
Thanks for the suggestion, I'll keep it in mind the next time I tinker with the code section that fits little boxes around things that seem to be letters before they are fed into the classifier. Depending on the particular context and background, separate letters tend to stick together (ATMOSPHERE [TY]PE), or a single letter may be split into pieces (3,500 KM -> 3,500 KNI).

For now, this will have to wait because I'm focussing on improving the "auto stitch" feature that has become somewhat more important and as it turns out occasionally "burps".

Please note that my source code is freely available from the GitHub project that the download link points to. Of course all are invited to try out improvements on that, or even copy the code and start their own project (the license allows a lot of use scenarios). From what I remember, I made the settings so you can't commit changes back into my project directly, so just PM me if you have interesting changes, or are working on something you'd like to contribute.

TLE
 
Hi Explorers,

I have just published a beta release of my OCR fixes for 1.6 / Engineers available at

https://github.com/ThoroughlyLostEx...agBeta3-1.6/ExplOCRbeta3-for-Engineers2.1.zip

The new version has a few quirks which are mainly related to the fact that the screenshot "auto-stitch" feature has become so much more important with 1.6/2.1. It was built as a little extra for people who like planetary rings, but since the layout changes came along it now affects main data. Until I have time for some improvements in this, it will create minor issues like items occurring twice or "orbit major axis: 0" from different rings occurring only once.

With all that said, it is probably better to have a slightly rickety auto-stitch than none at all. Be sure to read up on the auto-stitch section "Await Image Stitch" in the first post of this thread before you try to cope with the new layout.

Of course many of you will have made screenshots since 1.6 that were OCRed badly and that you'd like to re-read. Re-reading manually will work as described in the original thread post. Here is a bit I copied from the 1.5 release on how to use the mass-re-read feature (remember: an experimental feature, use AT YOUR OWN RISK, backups are important!)


  • You are acting at your own risk.
  • Make a backup of the database and screenshot directories. Find them via menu "Config"/"Save directories". The systems.xml file is most important, but it is better to back up all of it!
  • Really, do make a backup.
  • Close the program.
  • Restart the program with command line argument "-multiread" (This is a safety feature, just so people never ever activate this by accident)
  • Open the table with scanned objects in it in menu "Table"/"Display".
  • You should be able to see a button "Multi-Read" somewhere near the bottom of the dialog.
  • Uncheck the Read-Only box.
  • Use standard Windows <Ctrl>+Click / <Ctrl>+<Shift>+Click selection to select the first cell in all rows you need to read again.
  • Click "Multi-Read".
  • If there are many items to re-read, go make a cup of tea.
  • When all is done, check if the items you re-read became better through this.
  • Keep the backup.

As usual, send feedback to me by private message. Or post something here to keep me from being snowed over by "why I hate X" posts.

TLE

PS: I have tried but failed to find out why the layout was changed. My request was forwarded to the development department, but I have not heard from them (they are likely busy). Of course, feel free to petition for a layout where all fits on one screenshot page. Will save me the time of improving auto-stitch. :cool:

PPS: I'd very much prefer an API in the game where people can get the data without needing my OCR!
 
Ooooops!

As I've noticed in the meantime, the automatic system name / body code detection of ExplOCR version Beta4 / 1.6 is broken.

This happened because the feature uses the NetLog system information line and the format of that line has been changed. I hadn't noticed because I develop my fixes using a set of testing screenshots and don't do any actual flying around.

I am somewhat thrilled to see the change, because there are now "StarPos" coordinates in the log which seem to be galactic coordinates. That looks like helpful extra information to me. Have any of you worked with / used / validated that info? Which coordinate is by convention x/y/z by the way? :cool:
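
For anyone who wants to experiment with the new log line, picking out the system name and the "StarPos" triple might look roughly like this. It is a sketch under a stated assumption: the line layout used here (System:"..." followed by StarPos:(x,y,z)) is a guess at the new format and is not taken from the ExplOCR source, so adjust the pattern to what your own log actually contains.

```python
import re

# Assumed layout of the relevant part of a log line (not verified here):
#   System:"<name>" StarPos:(<x>,<y>,<z>)ly
_NUMBER = r"-?\d+(?:\.\d+)?"
SYSTEM_LINE = re.compile(
    r'System:"(?P<name>[^"]+)"\s*'
    r"StarPos:\((?P<x>%s),(?P<y>%s),(?P<z>%s)\)" % (_NUMBER, _NUMBER, _NUMBER)
)


def parse_system_line(line):
    """Return (system_name, (x, y, z)) or None if the line does not match."""
    match = SYSTEM_LINE.search(line)
    if match is None:
        return None
    coords = tuple(float(match.group(axis)) for axis in ("x", "y", "z"))
    return match.group("name"), coords
```

Lines without a match return None, so the function can be run over a whole log file and the last non-None result kept as the current system.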

Expect a fix soon, but not immediately because I want to add the star position information to the table view / systems.xml file and I guess I should put some thought into that first. In the meantime, please type in the system name and body code by hand. Sorry for the inconvenience. Note: Rereading the entry in the database will not help because the information mainly doesn't come from OCR! You can of course edit the system name / body code manually at a later time as well.

TLE

PS: Does anyone know if the change to net log line (and new galactic coordinates "StarPos") happened in 1.5/2.0 or 1.6/2.1? I always miss out on those things.

PPS: Now would be a good time to point out other bugs I can fix with the next patch btw.
 
This would be a great tool to use for filling out the system / body form for http://edmaterializer.com
I just messed around with making a tool for that and this could make the annoying data entry vastly easier.

If this one could also capture the distance of the selected body which is over the 'tooltip' when selecting it and the coordinates then this would have everything that EdMat needs :D
And it could be simply transferred by putting it in json in the clipboard and then pasting it on EdMat's form (which would need to be set up for that first but that'd be not the biggest of hurdles).
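
To make the JSON idea concrete, a record for one body might be serialised as below. All field names are hypothetical, as is the unit suffix on the distance field, since neither ExplOCR's data model nor the format EdMat's form would accept is specified in this thread. Copying the resulting string to the clipboard is left out because it depends on the platform.

```python
import json


def body_record_to_json(system_name, body_code, star_pos, distance_ls):
    """Serialise one scanned body as a JSON string (hypothetical field names)."""
    record = {
        "system": system_name,
        "body": body_code,
        "star_pos": {"x": star_pos[0], "y": star_pos[1], "z": star_pos[2]},
        "distance_ls": distance_ls,
    }
    return json.dumps(record, indent=2)
```

The receiving form would then only need `json.loads` (or the JavaScript equivalent) to fill in its fields.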

As for the coordinate convention question, the galmap uses x/y/z in the form of LR/UD/FB, don't ask me why FD decided to go with the derp route of putting height on the y axis.
I'll never understand the reasoning for this. Might make sense for control mapping as that'd put Yaw from PYR on the axis about which it'd then rotate, but then again, with height on z would it be the same for PRY.

Whatever... https://xkcd.com/927/
 
I'm afraid it's not just the system name that's broken. For me ExplOCR crops a too small section of the screenshot, marks a bunch of characters with different colours, and makes up some random text.
explocr.png

Edit: Apparently that's because ExplOCR isn't resolution independent. It really should be; the neural nets likely have no problem if you just rescale to a reference size, and you can use the plentiful horizontal divider bars to measure the width of the information panel. Pretty please?

For comparison, here's the result of shrinking the entire screenshot to 1080 and loading it in ExplOCR, it then only made one clear mistake:

explocr_1080.png
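
The manual shrink described above could be automated along these lines. This is a sketch, not how ExplOCR works: it assumes the game scales the panel and font proportionally with the screenshot height, and the function names and the reference height of 1080 are chosen for illustration. The image argument is expected to be a Pillow image, whose `size` attribute and `resize` method are used.

```python
REFERENCE_HEIGHT = 1080  # assumed height the OCR was tuned for


def reference_size(width, height, reference_height=REFERENCE_HEIGHT):
    """Return the (width, height) that scales a screenshot to the reference height."""
    scale = reference_height / height
    return max(1, round(width * scale)), reference_height


def rescale_to_reference(image, reference_height=REFERENCE_HEIGHT):
    """Return a resized copy of a Pillow image, keeping the aspect ratio."""
    return image.resize(reference_size(*image.size, reference_height))
```

With Pillow installed, `rescale_to_reference(Image.open("screenshot.png"))` would turn a 3840x2160 capture into 1920x1080 before it is handed to the reader. Ultrawide captures such as 2560x1080 already have the reference height and come out unchanged, so they would still need the panel width measured separately.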
 
For comparison, here's the result of shrinking the entire screenshot to 1080 and loading it in ExplOCR, it then only made one clear mistake:
He mentioned that non 1920x1080/1200 resolutions are "expected to not work properly".
As for the resizing, it depends on how ED handles the font size in regard to resolution: if it scales the font up proportionally with the resolution then it would likely work easily, but if the font stays roughly the same size then it'd be smaller after scaling down.

You could upload the screenshots for him, the original size ones, so he can take a look at it.

As for the accuracy: the first time I used it (1920x1080 here) it recognized nothing, not a single letter.
I almost gave up and threw it off the disk again, but then I somehow had the feeling I should at least test the star before doing so, as I had tried it first with the first planet of the system. Weirdly enough, that seemed to have made it change its mind, as it read almost everything well from that screen.
And when I then tried the planet again that hadn't worked at all before, that worked too. So yeah, might be a little picky? XD

I also had tons of crashes with it after pressing ctrl+alt+shift, not sure if that was my firewall messing things up; it sadly has the habit of silently blocking memory access, which can easily mess things up.
But after a while it seemed to work, minus that one time where it messed up the capture and mixed half of my desktop into the capture for some reason. Maybe there should be a small delay before and after capturing before jumping out of the game.


As for my own EdMatClient, I managed to make it crawl netlogs for coords, now I'm trying to make it use that and fix entries with missing or not fully accurate coords.
Is there some coordinate database somewhere by chance? Just wondering.


Edit: regarding the cropping, I once attempted to make my own OCR thing much like this for mining, as in reading out the rings from system map screens.
Managed to get quite far considering it wasn't even OCR in the literal sense, just a lot of per-letter image comparisons. Anyway, I determined the left side's crop area by going down vertically until I hit the wide line, then 'crawled' along that to the right, and then I had the width I needed for cropping. Just as an idea to get around a hardcoded crop width.
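
A minimal sketch of that "go down, then crawl right" idea, working on a plain grid of brightness values so it does not depend on any particular image library. The function name, the brightness threshold of 100 and the minimum line width of 20 pixels are assumptions for illustration.

```python
def find_divider(gray, start_x=0, threshold=100, min_width=20):
    """Find the first wide bright horizontal line starting at column start_x.

    gray is a list of rows, each a list of brightness values (0..255).
    Returns (row_index, line_width_in_pixels), or None if no line is found.
    """
    for y, row in enumerate(gray):
        if row[start_x] <= threshold:
            continue  # still above the line: keep going down
        # Bright pixel found: crawl to the right while the run stays bright.
        x = start_x
        while x + 1 < len(row) and row[x + 1] > threshold:
            x += 1
        width = x - start_x + 1
        if width >= min_width:
            return y, width
        # Otherwise it was a short run (e.g. part of a letter): keep searching.
    return None
```

With Pillow, such a grid could be built from `image.convert("L")` and `getpixel`, and the returned width would then replace a hardcoded crop width.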
 