Experiments with Tesseract

March 1, 2009 by jinsbond007

A long time back I promised my friends at Swathantra Malayalam Computing that I would extend Tesseract to support Malayalam. For the last few weeks I have been talking with Debayan, who does the same for Bengali, and trying to understand the details of the work required. We decided to work together on enhancing the existing Tesseract system for Indic languages. I will be following up on the work in the Indic Tesseract space.

I conducted some small initial experiments. They gave me an idea of what I have and what is to be done (I am looking forward to a high-performance system that is efficient enough for practical use).

To test the symbol classifier of Tesseract (note: just the classifier), I trained it with a single page and tested it on another page in the same font.

The training data was about 1000 symbol samples, which is pretty small given that the number of distinct symbols we usually encounter in Malayalam is pegged at around 250-350 (there are many variations! Hussain sir can give a better number). That leaves only a handful of samples per symbol. To my amazement, Tesseract training is easy, simple and takes very little time (the performance evaluation might be premature, since we haven’t trained it properly yet).

My initial observations are:

  • The segmentation part of Tesseract is not great and it might not work well with Indic languages (from what I understand, a lot of research work is going on to improve Tesseract’s segmentation). I found it handicapped in the case of upper and lower matras.
  • Since it is not designed for languages with pre-base and post-base modifier forms, it won’t do any rearrangement of modifiers (we have to add language heuristics after recognition).
  • Their DAWG-based language model is pretty buggy at the moment and might not help us much, since:
    • In the symbol-to-code mapping, we don’t have a one-to-one map.
    • The standard word length it assumes and what we have (when counted at the Unicode level) are very different, which makes a DAWG-based system very inefficient.
    • A simple dictionary-based post-processor might help us better, I think.
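To make the modifier rearrangement mentioned above a little more concrete, here is a minimal sketch in Python. It is not Tesseract code, and it assumes the recognised symbols have already been mapped to Unicode code points in the visual (left-to-right) order in which they appear on the page. The function name and the simple "consonant, then virama + consonant pairs" cluster rule are my own illustrative assumptions; a real system would need much richer language heuristics.

```python
# Hypothetical sketch: move Malayalam pre-base vowel signs, which are printed
# to the left of the consonant, to their logical Unicode position after it.

PRE_BASE_SIGNS = {"\u0D46", "\u0D47", "\u0D48"}  # vowel signs E, EE, AI
VIRAMA = "\u0D4D"


def is_consonant(ch):
    """True for the basic Malayalam consonants KA (U+0D15) to HA (U+0D39)."""
    return "\u0D15" <= ch <= "\u0D39"


def reorder_visual_to_logical(symbols):
    """Reorder a visually ordered string of code points into Unicode order.

    Assumption: a cluster is one consonant followed by zero or more
    (virama, consonant) pairs. Anything else is copied through untouched.
    """
    out = []
    i, n = 0, len(symbols)
    while i < n:
        ch = symbols[i]
        if ch in PRE_BASE_SIGNS and i + 1 < n and is_consonant(symbols[i + 1]):
            j = i + 2  # position just past the first consonant
            # Extend over any conjunct parts: virama followed by a consonant.
            while j + 1 < n and symbols[j] == VIRAMA and is_consonant(symbols[j + 1]):
                j += 2
            out.extend(symbols[i + 1:j])  # the consonant cluster first
            out.append(ch)                # then the vowel sign
            i = j
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, the visually ordered pair "sign E, KA" would come out as "KA, sign E", which is how the syllable is stored in Unicode text.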

I decided on a future work plan (I will soon update the wiki with these details).

Future work

  • Understand the code flow and working of the Tesseract system (mainly which functions are called from where, and for what).
  • Identify the modules which affect us and try to understand how.
  • Keeping the classifier intact, add a better segmentation system (or better, fix the bugs in the current algorithm if possible).
  • Add a reordering mechanism which is scalable to all languages (I have a pretty good idea of how to do it; I just have to find the right place to insert it to get the right results).
  • Add a simple language model based on aspell or a similar spell checker, which should help in correcting words better than an expensive DAWG system.
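As a rough illustration of the dictionary-based correction idea in the last item, here is a small Python sketch. It does not use aspell; as a stand-in it uses the standard library’s difflib to pick the closest dictionary word. The function name, the cutoff value and the tiny word list are assumptions made up for this example only.

```python
import difflib


def correct_word(word, dictionary, cutoff=0.8):
    """Return the closest dictionary word, or the word itself if none is close.

    `dictionary` is a plain list of known words (hypothetical here). `cutoff`
    is the minimum similarity ratio that difflib must find before a
    replacement is accepted.
    """
    if word in dictionary:
        return word
    matches = difflib.get_close_matches(word, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# Tiny made-up word list, just to show the call.
words = ["malayalam", "tesseract", "bengali"]
corrected = correct_word("tesserect", words)
```

A word that is nowhere near any dictionary entry is left as it is, so the post-processor never replaces recognised text with a wild guess.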

Immediate Plans

  • Train with more data (more fonts, more samples, more symbols). I am planning to do this update before the 15th of this month if everything goes well according to plan.

By the way, sorry for the tech-document kind of style! All the tech writing is affecting my normal writing too! Plus, writing code day and night makes it tough to write something which is not in proper syntax!

“Please Read the Offer Document Carefully Before ….”

June 19, 2008 by jinsbond007

I hope all of you are familiar with the above words. After all those Shining India mutual fund advertisements, a warning like this is mandatory. I felt the same about the offer documents supplied at campus placements after reading a news item in the Economic Times about what Keane India did last Monday. A small account of it can be read over here.

I can imagine a little of what was going through the minds of the people who got sacked. I believe all of them were people who got placed through campuses (since Keane takes 80% of its workforce from campuses, and it’s not easy to send off someone with prior experience). All the offer documents I got (not many anyway, only two) had a section called “Job Termination”. It specified a notice period before terminating the job. It usually varies according to your value and the employer’s value. For me those values were in months.

Friends who get a job from campus usually don’t bother to read through the whole document before agreeing and sending it to the specified address. Even if they do care to read it, nobody is ready to reject a job they have in hand, for fear of losing it. Moreover, the monetary benefits page blinds them while they read through the other instructions. I rejected a job in industry for one in research in academia. As academia is not as rich as corporate India and my preference was not monetary, the one I rejected was the one that offered more monetary benefits. That made me answer a lot of questions like “How much do you think you will get after you complete the course?”, “Is that bigger than what you would get after the same time in industry?” and many similar ones. People who are not ready to ignore the curiosity of others also don’t like to reject an offer, despite the risk associated with it.

But they do not understand that the value of someone who gets fired by a company after spending two years there is not impressive, unless you have made a value of your own. The companies which prefer new recruits to experienced professionals sometimes take around 2 – 3 weeks to give the offer document after making a verbal offer. The figures they boasted about at the recruitment drive might never match, or might turn out to be something you get only with the highest performance incentive and no tax cuts.

Friends, money matters. But it is not the only thing that matters. Your first job is your life, so better be careful with it. Please read the full document, even the fine print. In case of any doubt, call the company or mail them. Usually they provide an explanation. Placement offices in colleges: more than making the numbers, please teach your students about the importance of a job and how careful one should be about offer documents. Moreover, students, even when you are doing a job, try to make your own space in the world so that even at the company’s worst moment, firing you should be the last option!!!

“So please read the offer document carefully before investing your life”

Love!!! My thoughts!!!

April 20, 2008 by jinsbond007

This is the second article in my series, God, Love, Freedom and Life. All that I write here is simply my thoughts and my own conclusions, so I welcome your opinions on the matter. Thanks to Anwar for inspiring me to write a series like this. It was easy for me to write about God because it’s a topic which I studied hard and thought about very deeply to get some insight. But when it comes to love, I never thought deeply about the subject or tried to find out what love actually is, or which of its several appearances is the true one.

To start with, whenever we think of love, it’s an emotion after all, and a feeling for which people will go to the farthest extents. The parameters of love vary from one person to another. We can never find the same intensity in two relationships. Moreover, it is everywhere. When we say we hate someone, it means we love to hate that buddy!

Moreover, how much one loves someone else is the measure of the intensity of the relation between them. All relationships in the world are ruled by the laws of love. When we think deeply, we can understand that, more than anything else, the world is ruled by the laws of love.

When I say laws of love, people might get confused: “Are there any laws for love? It’s a free-flowing emotion, na?” The ultimate law of love is “There is only one kind of love and it’s the ultimate, ‘Love for life’”. When someone is sure he can’t love his life anymore, it ends all other love. Or we can put it like this: “if there is no feeling of love in any form towards anything in one’s life, then his love for life is finished.” I believe love is the purpose of life, and when love is extinct in one’s life, he will be dead. So any living being will have some kind of love to keep him alive.

So there arises the question of the people who are forced to death. Are they dead because they don’t love their life? Not always; there can be another strong reason. It is the disturbance that love causes to one’s energy conditions that brings about that ending. Or, by a simple law of nature, unstable beings can’t exist in nature as it is.

So what is the relation between love and energy? In my view, energy is the factor responsible for the universe, and love is the factor which controls the flow of energy in our lives. So more than an emotion, love is the thing which is responsible for our lives.

Being Lonely (it’s good if it makes you think!!!)

March 1, 2008 by jinsbond007

For me, being lonely is the time to think and let your mind gaze across the rich grasslands of life. Even when I am in a crowd, sometimes I drift off thinking. I never need to be alone to feel alone. Being alone is simply something which happens to me anywhere, unpredictably. When I feel like that, I try to draw conclusions from what I have observed.

On one such occasion, I was trying to draw conclusions on the truth of GOD. I just tried to think: how can GOD be? As he is the one capable of driving the whole universe, he must be either a centralized source somewhere outside the universe or a distributed source. The existence of such a centralized source is beyond imagination and almost impossible when we consider that it has a role in each and every thing happening in the whole universe (just think of a big energy source which provides the force for our movements on the earth, for the earth to move around the sun, and so on). It led me to the conclusion that the force can be something distributed everywhere in the whole universe (just understand it’s a simple conclusion by a crooked mind!!! I am ready to believe otherwise if you convince me).

Even then the problem was not completely solved. I kept thinking: what can that distributed source of power be? It should be something associated with each and every thing. When I looked at it in that sense, it just clicked that whenever some chemical reaction happens, an amount of energy is released. All activities in the universe, when thought through, can be reduced to things which use energy or things which displace energy. Energy is the single driving force in all the actions in the universe. All actions are exchanges or conversions of forms of energy!!!

Then I suddenly felt like, are things this simple? But when I started thinking deeply into the facts, it became clearer to me that the concept of a single GOD or a group of GODS or anything else can be rounded off to the definition of it as the ultimate source of energy in the whole universe. Suddenly it appeared to be something which explains everything to me (maybe because it was so simple).

My thoughts then went to the fact that no religion or belief explains things like this. Then, when I thought a little more, I understood that imagining god as something like a watchdog and the ultimate jury was a better solution to many problems in society and an answer to a bunch of questions (another of my assumptions!!! Cross me if you want, always welcome!!!).

This is just a sample of how my thoughts go when I am alone (not physically!!!). Please be patient to read more!!!

My New Blog

March 1, 2008 by jinsbond007

Hi all….

I decided to keep my English and Malayalam writings on different weblogs, so I decided to start a new one. As I am already using Blogger, I thought WordPress would be a better choice so that I can also check out the facilities it offers. I am moving some of my old English posts from my blog to here… please bear with me!!!