Computational linguistic research for Indic languages

Many languages of the Indian subcontinent have tens of millions to hundreds of millions of speakers. For most of them, little computational linguistic work has been done. Without computational tools, even widely spoken languages risk becoming effectively “defunct” in this electronic age of the Web, the Internet, and smart devices.

Prior work

The REU student should have some interest in the languages of the region, or at least in non-English languages. We have done initial work with one such language. Our papers on this work include an ACM Transactions on Asian Language Information Processing 2008 paper on morphology learning, an ACL 2009 paper on POS tagging, and an ACL 2002 workshop paper on unsupervised morphology learning.

Future work

The current work can be extended in various ways; some ideas are given below. Note that we can test our initial ideas on English before implementing them for these languages.