Friday, June 21, 2013

Android and Tesseract (Part 1)

Over the past few days I've been playing around with the Tesseract native packages, which can be built into a library for Android applications. This library lets you run optical character recognition (OCR) on Android mobile devices, which is a rather intriguing concept. The engine has been openly available for some time (since the mid-2000s). You can read more general information on its history and where it comes from here: http://en.wikipedia.org/wiki/Tesseract_(software). The story is rather interesting: the software was originally written at Hewlett-Packard between the mid-80s and mid-90s, was later released as open source, and since 2006 its development has been sponsored by Google, which is how it eventually became usable on Android. So I figured I'd share a bit of my experience with it in two parts. The first is a brief overview of the setup, and the second will walk through some sample code I managed to put together.

The setup is fairly easy, though it does require a little critical thinking, as some problems can be hard to work through even with community resources. Before you start a project of your own, make sure your IDE (in my case Eclipse) can compile both Java and C++. If you need to add C++ support to Eclipse, you can install it (the C/C++ Development Tools, or CDT) from the Indigo repository through the Help > Install New Software dialog.



Once you have those things, you'll need to download the Tesseract library project files, which you can find here: https://github.com/rmtheis/tess-two. You can either clone the repository or download an archive copy; the choice is yours. After downloading, import the project into your IDE. When that's done, check the project properties to make sure the project is marked as an Android library. In Eclipse this is the "Is Library" checkbox in the Android section of the project properties.
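If you'd rather double-check the library setting outside the IDE, here is a minimal sketch in plain Java. It assumes the Eclipse Android tooling records the library flag as android.library=true in the project's project.properties file; the class name, method name, and the default "tess-two" path are placeholders of mine, not part of the tess-two project.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class LibraryFlagCheck {

    /**
     * Returns true if projectDir/project.properties exists and
     * sets android.library=true (assumed location of the flag).
     */
    public static boolean isAndroidLibrary(Path projectDir) throws IOException {
        Path propsFile = projectDir.resolve("project.properties");
        if (!Files.isRegularFile(propsFile)) {
            return false; // no properties file, so no library flag
        }
        Properties props = new Properties();
        try (Reader in = Files.newBufferedReader(propsFile, StandardCharsets.UTF_8)) {
            props.load(in);
        }
        String flag = props.getProperty("android.library", "false");
        return Boolean.parseBoolean(flag.trim());
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the real path to your imported project.
        Path projectDir = Paths.get(args.length > 0 ? args[0] : "tess-two");
        System.out.println("Marked as Android library: " + isAndroidLibrary(projectDir));
    }
}
```

Run it with the path to the imported project as the only argument. A result of false just means the flag was not found where this sketch looks, so check the properties dialog before assuming something is broken.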



After that you will need to make sure you have set up the Android NDK (http://developer.android.com/tools/sdk/ndk/index.html) for the TessTwo project. All you have to do here is unpack the NDK archive somewhere accessible and define the path to it in your IDE. In my Eclipse setup the setting is under the project properties.
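A quick way to confirm that the path you gave the IDE really points at an unpacked NDK is to look for the ndk-build launcher script at its top level. The sketch below does only that; the class and method names are my own, and finding the script does not prove the NDK version is compatible with the project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NdkPathCheck {

    /**
     * Returns true if ndkDir contains the ndk-build launcher
     * (ndk-build on Linux/Mac, ndk-build.cmd on Windows).
     */
    public static boolean looksLikeNdk(Path ndkDir) {
        return Files.isRegularFile(ndkDir.resolve("ndk-build"))
                || Files.isRegularFile(ndkDir.resolve("ndk-build.cmd"));
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.err.println("usage: java NdkPathCheck <path-to-unpacked-ndk>");
            return;
        }
        Path ndkDir = Paths.get(args[0]);
        System.out.println(ndkDir + " looks like an NDK: " + looksLikeNdk(ndkDir));
    }
}
```

If this prints false, the most likely cause is pointing at the parent folder of the unpacked archive rather than the NDK folder itself.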




Then you just need to run a build and let the IDE do its work. The build can take some time, so I'd suggest finding something else to do while it runs. After the build completes, you are ready to use the library in a project. We'll get into actually making use of it in the next post, which I hope to have hammered out this coming week.
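To check that the native build actually produced something, you can list the shared libraries it left behind. This sketch assumes the usual NDK convention of writing .so files to libs/<abi>/ inside the project; it does not assume any particular library names, and the class name and default path are placeholders.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NativeLibLister {

    /** Returns sorted "abi/name.so" entries found under projectDir/libs. */
    public static List<String> listNativeLibs(Path projectDir) throws IOException {
        List<String> found = new ArrayList<String>();
        Path libs = projectDir.resolve("libs");
        if (!Files.isDirectory(libs)) {
            return found; // nothing built yet, or wrong project path
        }
        try (DirectoryStream<Path> abiDirs = Files.newDirectoryStream(libs)) {
            for (Path abiDir : abiDirs) {
                if (!Files.isDirectory(abiDir)) {
                    continue; // skip jars and other plain files in libs/
                }
                // Only pick up shared libraries inside each ABI folder.
                try (DirectoryStream<Path> soFiles = Files.newDirectoryStream(abiDir, "*.so")) {
                    for (Path so : soFiles) {
                        found.add(abiDir.getFileName() + "/" + so.getFileName());
                    }
                }
            }
        }
        Collections.sort(found);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder default; pass the real path to your built project.
        Path projectDir = Paths.get(args.length > 0 ? args[0] : "tess-two");
        List<String> libsFound = listNativeLibs(projectDir);
        if (libsFound.isEmpty()) {
            System.out.println("No native libraries found under " + projectDir.resolve("libs"));
        }
        for (String lib : libsFound) {
            System.out.println(lib);
        }
    }
}
```

An empty list after a build that reported success usually means the path is off by one folder level, so try the parent or child directory before rebuilding.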