Archive for the ‘SMC’ Category

Asus eeepc 1005HA and Meego 1.0.1

July 16, 2010

I bought my Eee PC last December, and since then I have been using Ubuntu Netbook Remix (UNR), or Ubuntu Netbook Edition (UNE). It was working well, and with some hacks I was able to get the best out of it.

I don’t use my netbook much for multimedia (though I still need to listen to a set of songs once in a while). I have a desktop from my institute and another PC with a big-screen TV connected to it for my multimedia needs. When I saw MeeGo coming out of Maemo and Moblin, I was actually excited. There were many reasons for the excitement. One was to see how they would choose between or combine the two legacies (Maemo comes from the so-called Debian side and Moblin from the Fedora side). The other was plain happiness, since an OS meant for mobile/smartphone computing will definitely be simple and fast.

Due to all this excitement and my interest in Linux-based OSes for netbooks, I tried it out yesterday (yeah, I know it's one and a half months after 1.0 came out). To say the least, it is a very good system. Yeah, there are a lot of drawbacks. But for a 1.0 release it appears pretty good.

The thing I hated most in these first 24 hours is (believe it or not) that it doesn’t have g++. It is quite essential for compiling things that don’t come with the default system or in the repository. Honestly, I don’t care that I have no multimedia capability, but g++ is essential for me.

As most of the other reviews say, the interface is very good and the best for a netbook. I will go so far as to say this is the best I have seen. Still, there are issues with PolicyKit authentication (for my NTFS partition from the Windows installed by ASUS, and the ext4 partition of 10.04 UNE). Since the system's syslinux-based bootloader doesn’t support ext4, I had to recover GRUB for Ubuntu (which took some of my time).

The lack of office programs won't bother me much, but I didn't quite understand the gadgets part (which allows you to add numerous plugin scripts to the system). It is poorly designed and quite out of focus. I use LaTeX and Beamer for my documents and presentations, so I was OK with OpenOffice not being there in MeeGo. But when I get an ODF document, I don’t really know what to do.

I noted another interesting aspect: MeeGo claims to have Malayalam support, and it can input and render Malayalam quite well (smc-fonts are missing; I am planning to put a repo in our Savannah and later make sure MeeGo uses smc-fonts and ships them by default). Chromium has some issues when it comes to rendering Unicode 5.0 text (the pre-Unicode 5.1 chillus: it is not able to convert cons + virama + ZWJ -> chillu). The interface itself renders it very well though. So I think it has something to do with Chromium's rendering modules (WebKit, I suppose; to be fair, Chrome on Windows doesn’t show such tendencies).
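To make the chillu issue concrete, here is a minimal Python sketch of the two encodings. The pre-5.1 form spells a chillu as consonant + virama + ZWJ, while Unicode 5.1 added atomic chillu code points (U+0D7A to U+0D7F). The helper names `to_atomic_chillu` and `has_legacy_chillu` are made up for illustration; this is not what Chromium or MeeGo does internally, only a way to prepare test strings in either form when checking a renderer.

```python
VIRAMA = "\u0D4D"  # Malayalam sign virama (chandrakkala)
ZWJ = "\u200D"     # zero width joiner

# Pre-Unicode-5.1 sequence -> atomic chillu added in Unicode 5.1.
LEGACY_TO_ATOMIC = {
    "\u0D23" + VIRAMA + ZWJ: "\u0D7A",  # NNA -> CHILLU NN
    "\u0D28" + VIRAMA + ZWJ: "\u0D7B",  # NA  -> CHILLU N
    "\u0D30" + VIRAMA + ZWJ: "\u0D7C",  # RA  -> CHILLU RR
    "\u0D32" + VIRAMA + ZWJ: "\u0D7D",  # LA  -> CHILLU L
    "\u0D33" + VIRAMA + ZWJ: "\u0D7E",  # LLA -> CHILLU LL
    "\u0D15" + VIRAMA + ZWJ: "\u0D7F",  # KA  -> CHILLU K
}


def has_legacy_chillu(text):
    """Return True if text contains a cons + virama + ZWJ chillu sequence."""
    return any(seq in text for seq in LEGACY_TO_ATOMIC)


def to_atomic_chillu(text):
    """Replace each legacy chillu sequence with its atomic code point."""
    for legacy, atomic in LEGACY_TO_ATOMIC.items():
        text = text.replace(legacy, atomic)
    return text


if __name__ == "__main__":
    legacy_word = "\u0D05\u0D35" + "\u0D28" + VIRAMA + ZWJ  # "avan", legacy form
    print(has_legacy_chillu(legacy_word))
    print([hex(ord(c)) for c in to_atomic_chillu(legacy_word)])
```

The two forms are not canonically equivalent, so Unicode normalization will not turn one into the other; an explicit table like this one is needed to compare how a renderer treats each form.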

Anyway, for serious netbook users and for future smartphones, systems based on MeeGo and Android are the future. I am all out to make sure Malayalam works perfectly and out of the box on MeeGo (it's easier for me, and since it's built especially for Atom, I am one of the few who can test it).

I will set up a repository of smc-fonts for MeeGo in Savannah. The next step is to get the MeeGo developers to add smc-fonts to their repo. Another task is verifying the rendering issue with Chromium (which might take time, since I need to find Chromium users on GNU/Linux).

For a first release, I should say, MeeGo is very impressive. Yeah, I know it doesn’t have multimedia capabilities (it can’t even play my MP3 files), but it still gets a lot of marks for the interface. If they succeed in making the interface flawless, with good PolicyKit integration for authentication, I think MeeGo can make a big leap with the next release.

Experiments with Tesseract

March 1, 2009

A long time back I promised my friends at Swathantra Malayalam Computing that I would extend Tesseract to support Malayalam. For the last few weeks I have been talking with Debayan, who does this for Bengali, and trying to understand the details of the work required. We decided to work together on enhancing the existing Tesseract system for Indic languages. I will be following up the work in the Indic Tesseract space.

I conducted some small initial experiments. They gave me an idea of what I have and what is to be done (I am looking forward to a high-performance system that is efficient enough for practical use).

To test the symbol classifier of Tesseract (note: just the classifier), I trained it with a single page and tested it on another page in the same font.

The training data was about 1000 symbol samples, which is pretty small given that the number of distinct symbols we usually encounter in Malayalam is pegged at around 250-350 (there are many variations! Hussain sir can give a better number). To my amazement, Tesseract training is easy, simple and takes very little time (this performance evaluation might be premature, since we haven’t trained it properly yet).

My initial observations are:

  • The segmentation part of Tesseract is not great, and it might not work well with Indic languages (from what I understand, a lot of research work is going on to improve Tesseract's segmentation). I found it handicapped in the case of upper and lower matras.
  • Since it is not designed for languages with pre-base and post-base modifier forms, it won't do any rearrangement of modifiers (we have to add language heuristics after recognition).
  • Their DAWG-based language model is pretty buggy at the moment and might not help us much, since:
    • In the symbol-to-code mapping, we don’t have a one-to-one map.
    • The standard word length it assumes and what we have (when counted at the Unicode level) are very different, which makes a DAWG-based system very inefficient.
    • A simple dictionary-based post-processor might help us better, I think.
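The kind of rearrangement mentioned above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not Tesseract code: the function name `visual_to_logical` is hypothetical, the input is assumed to be the list of Unicode strings mapped to each recognized symbol in left-to-right visual order (one symbol may map to several code points), and only the three Malayalam pre-base vowel signs are handled.

```python
PRE_BASE_SIGNS = {"\u0D46", "\u0D47", "\u0D48"}  # vowel signs E, EE, AI
VIRAMA = "\u0D4D"

# Two-part vowel signs: pre-base half + post-base half -> single code point.
TWO_PART = {
    "\u0D46\u0D3E": "\u0D4A",  # E  + AA         -> O
    "\u0D47\u0D3E": "\u0D4B",  # EE + AA         -> OO
    "\u0D46\u0D57": "\u0D4C",  # E  + AU length  -> AU
}


def is_consonant(ch):
    """Malayalam consonants KA..HA."""
    return "\u0D15" <= ch <= "\u0D39"


def visual_to_logical(symbols):
    """Move each pre-base sign after the consonant cluster that follows it."""
    chars = list("".join(symbols))  # a symbol may map to several code points
    out = []
    i, n = 0, len(chars)
    while i < n:
        ch = chars[i]
        if ch in PRE_BASE_SIGNS and i + 1 < n and is_consonant(chars[i + 1]):
            j = i + 1
            cluster = [chars[j]]
            j += 1
            # Extend over conjuncts: (virama + consonant) pairs.
            while j + 1 < n and chars[j] == VIRAMA and is_consonant(chars[j + 1]):
                cluster.extend(chars[j:j + 2])
                j += 2
            out.extend(cluster)
            out.append(ch)  # the sign now follows its base in logical order
            i = j
        else:
            out.append(ch)
            i += 1
    text = "".join(out)
    for parts, composed in TWO_PART.items():
        text = text.replace(parts, composed)
    return text


if __name__ == "__main__":
    # Visual order: sign E, then KA, then sign AA  ->  logical KA + sign O
    print(visual_to_logical(["\u0D46", "\u0D15", "\u0D3E"]))
```

A real system would need more heuristics (chillus, reph-like forms, segmentation errors that split or merge symbols), so treat this as the simplest possible post-recognition pass.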

I have decided on a future work plan (I will soon update the wiki with these details).

Future work

  • Understand the code flow and working of the Tesseract system (mainly which functions are called from where, for what, etc.).
  • Identify the modules which affect us and try to understand how.
  • Keeping the classifier intact, add a better segmentation system (or better, fix the bugs in the current algorithm if possible).
  • Add a reordering mechanism which is scalable to all languages (I have a pretty good idea how to do it; I just have to find the right place to insert it to get the right results).
  • Add a simple language model based on aspell or a similar spell checker, which should help in correcting words better than an expensive DAWG system.
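As a rough idea of what the spell-checker-style post-processing could look like, here is a minimal Python sketch. It uses the standard library's `difflib` as a stand-in for aspell (it is not aspell's API), and the function names, the tiny lexicon and the `cutoff` value are all assumptions for illustration.

```python
import difflib


def correct_word(word, lexicon, cutoff=0.6):
    """Return the closest lexicon entry, or the word itself if none is close."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text, lexicon, cutoff=0.6):
    """Apply correct_word to every whitespace-separated word of OCR output."""
    return " ".join(correct_word(w, lexicon, cutoff) for w in text.split())


if __name__ == "__main__":
    lexicon = ["മലയാളം", "കേരളം"]  # hypothetical word list
    # OCR output with LLA misrecognized as LA in the first word.
    print(correct_text("മലയാലം കേരളം", lexicon))
```

The matching here works on code points, so a single misrecognized symbol that maps to several code points costs more than one edit; a symbol-level distance would probably suit OCR output better.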

Immediate Plans

  • Train with more data (more fonts, more samples, more symbols). I am planning to do this update before the 15th of this month if everything goes well according to plan.

By the way, sorry for the tech-document kind of style! Too much tech writing is affecting my normal writing too! Plus, writing code day and night makes it tough to write anything that is not in proper syntax!