Discussion Has anyone experimented with OCR tools for scraping the commodities market prices?

zxctypo · Dec 2, 2014

Yeah, I just tested with Seeebek's lang file and got slightly lower accuracy than with the default lang, but that is with my preprocessing.

cowboy: I've been playing around with that for hours actually. I detect the rows of text with my own code, then cut them up into pieces. This is at 1920x1080.
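For anyone wondering what "detect the rows of text" could look like, here is a minimal sketch, not the actual .net code from this post. It assumes light text on a dark background and takes the image as a plain list of rows of grayscale values (0-255). The names `find_text_rows` and `crop_rows`, and the `threshold` and `min_height` parameters, are made up for illustration.

```python
def find_text_rows(image, threshold=128, min_height=2):
    """Return (top, bottom) bands of consecutive rows that contain bright pixels.

    `image` is a list of rows, each a list of grayscale values (0-255).
    `bottom` is exclusive, so a band can be cut out with image[top:bottom].
    Assumes light text on a dark background (an assumption, not from the post).
    """
    bands = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(pixel >= threshold for pixel in row)
        if has_ink and start is None:
            start = y  # a new band of text begins here
        elif not has_ink and start is not None:
            if y - start >= min_height:  # ignore bands too thin to be text
                bands.append((start, y))
            start = None
    # Close a band that runs to the bottom edge of the image.
    if start is not None and len(image) - start >= min_height:
        bands.append((start, len(image)))
    return bands


def crop_rows(image, bands):
    """Cut the image into one piece per detected band."""
    return [image[top:bottom] for top, bottom in bands]
```

Each cropped band can then be handed to the OCR engine on its own.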

Scaling the text up (normally good for OCR) seems to drop my accuracy to below 30%, and I'm not quite sure why.

I also believe I have the font that they use (taken from another thing they made), but training tesseract on it using jTesBoxEditor has been rather disappointing so far.

I'd love to be able to work together with everyone currently trying to find an OCR solution, it's just funny that seeebek is using python, I'm using .net, and you're using straight up bash lol

wolverine2710 · Dec 8, 2014

zxctypo said:
Yeah, I just tested with Seeebek's lang file and got slightly lower accuracy than with the default lang, but that is with my preprocessing.

cowboy: I've been playing around with that for hours actually. I detect the rows of text with my own code, then cut them up into pieces. This is at 1920x1080.

Scaling the text up (normally good for OCR) seems to drop my accuracy to below 30%, and I'm not quite sure why.

I also believe I have the font that they use (taken from another thing they made), but training tesseract on it using jTesBoxEditor has been rather disappointing so far.

I'd love to be able to work together with everyone currently trying to find an OCR solution, it's just funny that seeebek is using python, I'm using .net, and you're using straight up bash lol

He has REALLY enhanced his tool. It currently uses ML (machine learning) for the digits/numbers and lets EliteOCR process the names. He's currently at 99.94% accuracy. That number can grow if he receives more data from commanders. He's experimenting with other ML solutions as well. In the end he will try to do it for names as well. Also, the author of EliteOCR, zxctypo, and filth are looking into ways to decouple the lot. The purpose: let commanders with OCR skills but no programming skills build an OCR solution, and then have another program take the data, process it, and upload it to, for example, EDDN/BPC/TD/Thrudd. Sounds cool ;-)

zxctypo · Dec 9, 2014

I know he is. I'm the one who wrote all of the code to get it to 99.94, and I'm the one working on the accuracy ;-)

cowboy · Dec 9, 2014

You know, I've been meaning to learn python for ages and should have for this project... but I forgot and before I knew it, I had a huge bash script.

Have you ever worked with arrays in bash? SHOOT ME PLZ

On a side note, I can't wait for there to be a better alternative than what I've done. The last thing I need to do is maintain another OSS project!

wolverine2710 · Dec 9, 2014

cowboy said:
You know, I've been meaning to learn python for ages and should have for this project... but I forgot and before I knew it, I had a huge bash script.

Have you ever worked with arrays in bash? SHOOT ME PLZ

On a side note, I can't wait for there to be a better alternative than what I've done. The last thing I need to do is maintain another OSS project!

There MIGHT be an alternative. I like your idea of stitching multiple screenshots together into one screenshot. EliteOCR could pick that one up. It seems your solution with a VM might not be suitable (as in easy enough) for the casual commander. I've gone through my OCR thread again today and found what I knew was in there. Multiple commanders have made great efforts to get to an OCR solution. Some have a partial POC that covers a specific area, but none has evolved into a complete, publicly released solution. One is particularly interesting.

It's made by commander Crook. See his post #162, and also check posts #164 and #165, where he responds to a question of mine. A few snippets from those posts: "I've knocked up an OCR app that could run in the background, then scrape the commodities market when it sees the data." and also "The idea would be to play with the app in the background looking for the commodities screen. When our finds one it takes the data. No interaction by the player needed beyond remembering to scroll through the commodities to get them all. I think taking screenshots will be a major pain and break us out of the game."
Not sure if it's limited to windowed mode like DestroyBoy's solution, but he didn't mention it. If it works in full-screen mode, it would mean a capture of the complete market would be generated, which could be fed to EliteOCR or to zxctypo's OCR engine/solution.

Also of interest wrt screen capturing - and not filling the normal screenshot directory with images - is something cmdr graham.reeds made. His post can be found on the OCR project thread here. The interesting part wrt capturing of that post is: "So far I have a tray app that takes captures the screen but if I get time this evening I will look at putting in the tesseract". Also "I wasn't going to actually save the image to disk (though it currently does just to test) and simply upload the extracted result. Future expansion could be downloading of updated training but to begin with it will just do the uploading."

Perhaps that would be useful for a future version of EliteOCR. It would make it easier to detect/select the commodities market screen, instead of wading through all the screenshots made by a cmdr.

It might be worth contacting both commanders to see if they can work with EliteOCR or, later, zxctypo's OCR engine.

Edit: Corrected url for post by graham.reeds and extra snippet from him.

cowboy · Dec 10, 2014

The reason I used Vagrant was to avoid installing a bunch of dependencies into Windows, which is usually pretty painful. Most of the tools I'm familiar with are most easily built or installed in Linux or OS X. All I'm really using are imagemagick, tesseract and agrep for the fuzzy string matching. Stitching can be done with just imagemagick.
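As an illustration of the fuzzy string matching step, here is a rough Python sketch that snaps a misread commodity name onto a known list. To be clear, this swaps in `difflib` from the Python standard library in place of agrep, so the matching rule (a similarity ratio rather than an edit-distance limit) is different. `KNOWN_COMMODITIES` is a small hypothetical sample list and `correct_name` is a made-up helper name.

```python
import difflib

# Hypothetical sample list; a real tool would load the full commodity list.
KNOWN_COMMODITIES = ["Gold", "Silver", "Palladium", "Tea", "Coffee", "Fish"]


def correct_name(ocr_text, known=KNOWN_COMMODITIES, cutoff=0.6):
    """Return the known name closest to the OCR output, or None if nothing is close.

    Comparison is case-insensitive; `cutoff` is difflib's similarity
    ratio (0.0 to 1.0) below which a candidate is rejected.
    """
    lookup = {name.upper(): name for name in known}
    matches = difflib.get_close_matches(
        ocr_text.strip().upper(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None
```

For example, `correct_name("G0LD")` returns `"Gold"`, while a string that resembles nothing in the list returns `None` so it can be flagged for manual review.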

FWIW, taking screenshots is trivial. I even created some AutoHotKey code to deal with it:

Code:

#IfWinActive ahk_class FrontierDevelopmentsAppWinClass

  ; Move the cursor in such a way as to allow scrolling the
  ; commodities market with the mousewheel *without* highlighting
  ; the commodities market panel, then mousewheel up to the top.
  PrintScreen::
    ; Uncomment to get x,y cursor position.
    ; MouseGetPos, xpos, ypos 

    ; Msgbox, The cursor is at %xpos%, %ypos%
    MouseMove 2266, 400
    MouseMove 2160, 400
    MouseClick,WheelUp,,,20,0,D
    ; Delete previous market temp images.
    FileDelete C:\Users\cowboy\Documents\GitHub\ed-trade-helper\output\tmp\*
    Return


  ; Because there's no "take screenshot" keybind, piggyback onto
  ; the existing screenshot key to mousewheel down 4 ticks after
  ; the screenshot is taken.
  ~F10::
    Sleep, 250
    MouseClick,WheelDown,,,4,20,D
    Return

#IfWinActive

I think fully half the Elite userbase are programmers.

wolverine2710 · Dec 10, 2014

cowboy said:
The reason I used Vagrant was to avoid installing a bunch of dependencies into Windows, which is usually pretty painful. Most of the tools I'm familiar with are most easily built or installed in Linux or OS X. All I'm really using are imagemagick, tesseract and agrep for the fuzzy string matching. Stitching can be done with just imagemagick.

FWIW, taking screenshots is trivial. I even created some AutoHotKey code to deal with it:

Code:

#IfWinActive ahk_class FrontierDevelopmentsAppWinClass

  ; Move the cursor in such a way as to allow scrolling the
  ; commodities market with the mousewheel *without* highlighting
  ; the commodities market panel, then mousewheel up to the top.
  PrintScreen::
    ; Uncomment to get x,y cursor position.
    ; MouseGetPos, xpos, ypos
    ; Msgbox, The cursor is at %xpos%, %ypos%
    MouseMove 2266, 400
    MouseMove 2160, 400
    MouseClick,WheelUp,,,20,0,D
    ; Delete previous market temp images.
    FileDelete C:\Users\cowboy\Documents\GitHub\ed-trade-helper\output\tmp\*
    Return

  ; Because there's no "take screenshot" keybind, piggyback onto
  ; the existing screenshot key to mousewheel down 4 ticks after
  ; the screenshot is taken.
  ~F10::
    Sleep, 250
    MouseClick,WheelDown,,,4,20,D
    Return

#IfWinActive

I think fully half the Elite userbase are programmers.

Thanks for the explanation. As you use a limited set of tools/commands, could that not be distributed? Iirc you are using bash, which makes it slightly less easy.
Can your program automatically detect if it's at the end of the commodities market?
Did you have a chat/PM with Seeebek to see if your method could be integrated into EliteOCR?

Dejay · Dec 10, 2014

wolverine2710 said:
Thanks for the explanation. As you use a limited set of tools/commands, could that not be distributed? Iirc you are using bash, which makes it slightly less easy.
Can your program automatically detect if it's at the end of the commodities market?
Did you have a chat/PM with Seeebek to see if your method could be integrated into EliteOCR?

Just a note about taking screenshots: I've looked into it, and the usual way is the GDI API, which 99% of standard screenshot apps use. But it's slow and not good for capturing "video".
There are faster ways using the DirectX9 interface and GetFrontBufferData, but I'm not sure if that works reliably on Windows 7. I'm more of an OpenGL guy. Ideally I wanted to do the OCR on the graphics card: grab the screen and do everything on the GPU, to avoid transferring lots of data from the gfx card to the CPU.

http://stackoverflow.com/questions/5069104/fastest-method-of-screen-capturing

cowboy · Dec 11, 2014

wolverine2710 said:
Thanks for the explanation. As you use a limited set of tools/commands, could that not be distributed? Iirc you are using bash, which makes it slightly less easy.
Can your program automatically detect if it's at the end of the commodities market?
Did you have a chat/PM with Seeebek to see if your method could be integrated into EliteOCR?

All I'm doing is cropping out the scrollable area from subsequent images and using imagemagick to detect where the vertical overlap offset is and to composite the crops. If the OCR really works, stitching together images is probably an unnecessary step.
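To make the overlap idea concrete, here is a small Python sketch of the same approach. It is not the imagemagick pipeline itself: it treats each crop as a list of pixel rows and looks for the largest run of rows at the bottom of one crop that exactly matches the top of the next. Exact matching is an assumption that only holds for pixel-identical captures; real screenshots would likely need a tolerance. `find_overlap` and `stitch` are made-up names.

```python
def find_overlap(top_img, bottom_img, min_overlap=1):
    """Return the largest n where the last n rows of top_img equal
    the first n rows of bottom_img, or 0 if there is no such n."""
    max_n = min(len(top_img), len(bottom_img))
    for n in range(max_n, min_overlap - 1, -1):
        if top_img[-n:] == bottom_img[:n]:
            return n
    return 0


def stitch(crops):
    """Composite a sequence of overlapping crops into one tall image."""
    result = list(crops[0])
    for crop in crops[1:]:
        n = find_overlap(result, crop)
        result.extend(crop[n:])  # append only the rows not already present
    return result
```

Trying the largest overlap first means that repeated rows (blank separators, for instance) are less likely to produce a too-short match.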

zxctypo · Dec 12, 2014

cowboy said:
All I'm doing is cropping out the scrollable area from subsequent images and using imagemagick to detect where the vertical overlap offset is and to composite the crops. If the OCR really works, stitching together images is probably an unnecessary step.

If you have anything on OCR, please message me, I'm the one working on that part of it

Rarz · Dec 12, 2014

I was wondering; does the OCR tool allow for automation, or a command line option? That way I can combine it with a directory/file watch and automate the gathering of data.

Take screenshot -> Triggers OCR -> Wait for CSV -> Delete screenshot -> Import CSV into datasource -> Profit.
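A rough sketch of one pass of that pipeline in Python, assuming the OCR step can be called as a function that takes a screenshot path and returns rows of values. That `run_ocr` callable is a placeholder, not a real EliteOCR interface, and the `*.bmp` pattern and `process_new_screenshots` name are likewise assumptions.

```python
import csv
from pathlib import Path


def process_new_screenshots(watch_dir, csv_path, run_ocr, pattern="*.bmp"):
    """Run one polling pass over watch_dir.

    For each screenshot: OCR it, append the rows to the CSV, then delete
    the image. `run_ocr` is a placeholder callable (path -> list of rows).
    Returns the names of the files that were processed.
    """
    processed = []
    for shot in sorted(Path(watch_dir).glob(pattern)):
        rows = run_ocr(shot)
        with open(csv_path, "a", newline="") as handle:
            csv.writer(handle).writerows(rows)
        shot.unlink()  # delete the screenshot once its data is saved
        processed.append(shot.name)
    return processed
```

Calling this in a loop with a short `time.sleep` between passes gives a simple directory watch without any extra dependencies; importing the CSV into the datasource would be a separate step.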

wolverine2710 · Dec 12, 2014

Rarz said:
I was wondering; does the OCR tool allow for automation, or a command line option? That way I can combine it with a directory/file watch and automate the gathering of data.

Take screenshot -> Triggers OCR -> Wait for CSV -> Delete screenshot -> Import CSV into datasource -> Profit.

Iirc the next version of EliteOCR (due/planned, I believe, for Sunday or so) will create a CSV file. I can't speak for the other OCR solutions out there.

zxctypo · Dec 12, 2014

Rarz said:
I was wondering; does the OCR tool allow for automation, or a command line option? That way I can combine it with a directory/file watch and automate the gathering of data.

Take screenshot -> Triggers OCR -> Wait for CSV -> Delete screenshot -> Import CSV into datasource -> Profit.

The actual OCR will be a standalone plugin, so you could use it in any tool you want, etc.

peatew · Jan 11, 2015

Hey zxctypo,

I have created an open-source cargo planner tool in .NET that could really do with some OCR-ification, if you want to team up on it?
I wrote it to help me find good trades; I did not even check whether anyone else had made any tools.

Source Home: https://bitbucket.org/peatew/elite-cargo-planner/
Download Binary: https://bitbucket.org/peatew/elite-cargo-planner/downloads/Elite Cargo Planner 1.1.zip

How about it?

Xntryk · Jan 11, 2015

The way trading and trade information work in this game is just silly. If I visit a station and get that trade info for 24 hours, I should really be able to pull up the whole commodities price list for that station for the next 24 hours, not some vague and incorrect list of imports and exports. I have a computer that can calculate hyperspace and super-cruise, but can't remember a 6-column spreadsheet? Really? I understand not being able to purchase a price list for a place you haven't been yet, but if you've been there in the last 24 hours, your computer should remember that list.

wolverine2710 · Jan 20, 2015

peatew said:
Hey zxctypo,

I have created an open-source cargo planner tool in .NET that could really do with some OCR-ification, if you want to team up on it?
I wrote it to help me find good trades; I did not even check whether anyone else had made any tools.

Source Home: https://bitbucket.org/peatew/elite-cargo-planner/
Download Binary: https://bitbucket.org/peatew/elite-cargo-planner/downloads/Elite Cargo Planner 1.1.zip

How about it?

I put it in the announcements section of the 3rd party tools thread today. It's also now in the TODO section. I hope you get some help!!
Might I suggest creating a thread for it? That way I can link to it ;-)

An alternative to OCR-ing, if you want data, is to connect to EDDN. EliteOCR and RegulatedNoise (can) upload to EDDN. Another alternative is to install Trade Dangerous and use Maddavo's import tool. It's a merge tool for TD which supports EDDN.

mearmortal · Jan 20, 2016

**** As of Today ****
For OCR check out the following post #1881 here by Otis B:

https://forums.frontier.co.uk/showthread.php?t=68771&page=126&p=3450577&viewfull=1#post3450577

foxpur · Jan 21, 2016

mearmortal said:
**** As of Today ****
For OCR check out the following post #1881 here by Otis B:

https://forums.frontier.co.uk/showthread.php?t=68771&page=126&p=3450577&viewfull=1#post3450577

Yay! A new ver!!!
