Introduction: This topic is quite interesting and a lot of fun. Frankly, I had only a little experience using OCR applications before, but this time I have a simple way to use the freely available OCR engines. Of course, back in our school days we had no cameras, mobile phones, or inexpensive digicams to use with this kind of software. Jotting everything down by hand during class hours was cumbersome; software like this would have saved hours of copying notes!
Basically, this blog is a complement to my project "Robo-Book Scanner". What we will discuss here is the main software that makes it easier to use an OCR engine (Tesseract or OCROpus) as a back-end application, together with Gambas (a VB-like language for Linux), which has the advantage of making it easy to design a front-end GUI.
As I said a while ago, using open source software requires a lot of tweaking before it works at our own convenience. Anyway, that is how we patronize and support the open community: freedom for all digital applications (my own term).
Optical character recognition (OCR) is a system for converting scanned images of printed or handwritten text into machine-readable text. OCR software works by analyzing a document and comparing it with fonts stored in its database and/or by noting features typical of characters. Some OCR software also runs the result through a spell checker to "guess" unrecognized words. 100% accuracy is difficult to achieve, but a close approximation is what most software strives for.
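To make the spell-checker idea a little more concrete, here is a minimal sketch that lists the words in an OCR text file that do not appear in a word list, so they can be reviewed by hand. The function name unknown_words and the word list location are my own assumptions for illustration; this is not how Tesseract or OCROpus do it internally.

```shell
#!/bin/sh
# unknown_words TEXTFILE WORDLIST (hypothetical helper name)
# Prints every word of TEXTFILE, lowercased and de-duplicated, that is
# not present as a whole line in WORDLIST (one lowercase word per line).
unknown_words() {
    tr -cs '[:alpha:]' '\n' < "$1" |   # split the text into one word per line
        tr '[:upper:]' '[:lower:]' |   # normalise case
        sort -u |                      # drop duplicates
        grep -v '^$' |                 # drop the empty line left by leading punctuation
        grep -Fxv -f "$2"              # keep only words missing from the word list
}

# Example call (assumes a system word list exists at this path):
# unknown_words page1.txt /usr/share/dict/words
```

Words reported this way are only candidates for misrecognition; names and technical terms will show up too.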
To make it short, we need to work out how Gambas 2 and Tesseract can work together as one application, and see how well each performs as part of the Robo-Book Scanner software, which is an OpenGL release OCR software tool. So our intention is to use Gambas as the IDE/GUI for OCR, with none other than Tesseract or OCROpus as its engine.
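As a rough sketch of the back-end half of this idea, the script below wraps the Tesseract command line so that a front end (for example a Gambas form using its SHELL or EXEC instruction) only needs to pass an image path. The function names ocr_output_base and ocr_page are made up for illustration, and the default language "eng" is an assumption.

```shell
#!/bin/sh
# ocr_output_base IMAGE (hypothetical helper)
# Prints the image file name without directory and extension,
# e.g. /tmp/scans/page1.tif -> page1
ocr_output_base() {
    base=$(basename "$1")
    printf '%s\n' "${base%.*}"
}

# ocr_page IMAGE [LANG] (hypothetical helper)
# Runs "tesseract IMAGE OUTPUTBASE -l LANG" and prints the name of the
# text file that Tesseract writes (OUTPUTBASE.txt, in the current directory).
ocr_page() {
    image=$1
    lang=${2:-eng}                      # assumed default language
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH" >&2
        return 127
    fi
    outbase=$(ocr_output_base "$image")
    tesseract "$image" "$outbase" -l "$lang" && printf '%s.txt\n' "$outbase"
}

# Example call from a terminal or from the GUI:
# ocr_page scans/page1.tif eng
```

Keeping the engine behind one small command like this means the GUI does not need to change if Tesseract is later swapped for OCROpus.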
Lastly, to give you a brief technical idea:
1) We need to hack the Tesseract engine. I have read the article "Hacking Tesseract V0.04" on this website: http://tesseract-ocr.repairfaq.org/ . If we could also test OCROpus, whose documentation can be found here: http://code.google.com/p/ocropus/wiki/InstallTranscript , even better!
2) We need to study how Gambas interfaces with C/C++, especially shared libraries on Linux. See "How To Program Components In C/C++" on this website: http://gambasdoc.org/help/dev/overview
3) We need to work out how to open image files in Gambas version 2 (especially TIFF, JPEG, and BMP) and look at other freely available PDF converters; see this website: http://gamblis.com/2009/08/08/sharpdevelop-tutorial-how-to-rotate-an-image/
Objectives:
Requirements:
g++ scons svn libpng12-dev libjpeg62-dev libtiff4-dev libavcodec-dev libavformat-dev libsdl-gfx1.2-dev libsdl-image1.2-dev
Methodology:
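Before starting, it may help to check which of the required command-line tools are already present. The sketch below is only a convenience check under my own assumptions: the function name missing_commands is made up, and hg and wget are included because the OCROpus steps later in this post use them. It does not check the -dev library packages.

```shell
#!/bin/sh
# missing_commands CMD... (hypothetical helper)
# Prints, one per line, each named command that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Build tools from the requirements list, plus hg and wget used further down.
missing=$(missing_commands g++ scons svn hg wget)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```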
Installing iuLib
Download iulib 0.3 package from iulib's google code page.
http://code.google.com/p/iulib/
Get any missing libraries, run:
sudo apt-get install libpng12-dev libjpeg62-dev libtiff4-dev libavcodec-dev libavformat-dev libsdl-gfx1.2-dev libsdl-image1.2-dev
If you get errors downloading any of these libraries, change your package download server. See the note at the top of this document.
Run:
sudo scons install
(This will help avoid an error in the next step. Make sure to have scons installed.)
Run:
sudo make install
Installing Tesseract
svn co http:
cd tesseract-ocr
./configure
make
sudo make install
Installing OCROpus
# pick one of the following
release="-r ocropus-0.4.4" # this selects release 0.4.4
release="" # no -r option: this clones the latest revision
# download everything
hg clone $release https://iulib.googlecode.com/hg/ iulib
hg clone $release https://ocropus.googlecode.com/hg/ ocropus
hg clone $release https://ocroswig.ocropus.googlecode.com/hg/ ocroswig
hg clone $release https://ocropy.ocropus.googlecode.com/hg/ ocropy
wget -nd http://mohri-lt.cs.nyu.edu/twiki/pub/FST/FstDownload/openfst-1.1.tar.gz
hg clone $release https://pyopenfst.googlecode.com/hg/ pyopenfst
date;
# compile iulib
cd iulib
sudo sh uninstall
sudo sh ubuntu-packages
scons -j 4 sdl=1
sudo scons -j 4 sdl=1 install
cd ..
date;
# compile ocropus
cd ocropus
sudo sh uninstall
sudo sh ubuntu-packages
scons -j 4 omp=1
sudo scons -j 4 omp=1 install
cd ..
date;
# compile openfst
tar -zxvf openfst-1.1.tar.gz
cd openfst-1.1
./configure
make -j 4
sudo make install
cd ..
date;
# compile ocroswig
cd ocroswig
make
cd ..
date;
# compile ocropy
cd ocropy
sudo python setup.py install
cd ..
date;
# compile Python bindings for openfst
cd pyopenfst
make
cd ..
date;
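The date; calls between the steps above only print timestamps. As an optional sketch, a small helper can report how long each step took and pass back its exit status, so the script can stop at the first failure. The name run_step is my own invention, and date +%s is assumed to be available (it is on GNU systems such as Ubuntu).

```shell
#!/bin/sh
# run_step LABEL COMMAND... (hypothetical helper)
# Runs COMMAND, prints its exit status and elapsed seconds to stderr,
# and returns the same exit status as COMMAND.
run_step() {
    label=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    echo "[$label] exit=$status elapsed=$((end - start))s" >&2
    return "$status"
}

# Example use with one of the steps above:
# run_step openfst sh -c 'cd openfst-1.1 && ./configure && make -j 4' || exit 1
```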
You may have to update your LD_LIBRARY_PATH to include /usr/local/lib, for example:
export LD_LIBRARY_PATH=/usr/local/lib:/usr/lib:/lib
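Note that the export above replaces whatever LD_LIBRARY_PATH already contained. If you would rather keep existing entries, a sketch like the following adds /usr/local/lib in front only when it is not already listed. The function name prepend_path_once is hypothetical.

```shell
#!/bin/sh
# prepend_path_once DIR LIST (hypothetical helper)
# Prints the colon-separated LIST with DIR in front,
# or LIST unchanged if DIR is already one of its entries.
prepend_path_once() {
    dir=$1
    list=$2
    case ":$list:" in
        *":$dir:"*) printf '%s\n' "$list" ;;               # already present
        *) printf '%s\n' "${dir}${list:+:$list}" ;;        # add in front
    esac
}

LD_LIBRARY_PATH=$(prepend_path_once /usr/local/lib "${LD_LIBRARY_PATH:-}")
export LD_LIBRARY_PATH
```

Putting these lines in your shell startup file makes the setting survive new terminal sessions.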
Remarks:
Conclusions: