Feras Al Tarouti


Oral PhD Qualifying Exam

Committee Members

Dr. Jugal Kalita

Dr. Edward Chow

Dr. Sudhanshu Semwal



Information Processing in Arabic: Current Status and Challenges



Table of Contents




2.1 Script Issues

2.2 Morphological Issues

2.2.1 Morphological Generators

2.2.2 Morphological Processing and Dialects

2.2.3 Word Agglutination in Arabic

2.3 Syntactical Issues


3. Arabic Treebanks

3.1 The Penn Arabic Treebank

3.2 The Prague Arabic Dependency Treebank

3.3 The Columbia Arabic Treebank


4. Conclusion




Attia, M. 2007. Arabic tokenization system. In Proceedings of the Association of Computational Linguistics (ACL’07).


Attia, M. 2008. Handling Arabic morphological and syntactic ambiguities within the LFG framework with a view to machine translation. PhD Dissertation, University of Manchester.


Beesley, K. 1996. Arabic finite-state morphological analysis and generation. In Proceedings of the 16th International Conference on Computational Linguistics (COLING’96). 89–94.


Buckwalter, T. 2004. Issues in Arabic orthography and morphology analysis. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages (CAASL’04). 31–34.


Bhattacharya, Samit, et al. "Inflectional morphology synthesis for Bengali noun, pronoun and verb systems." Proc. of NCCPB 8 (2005).


Bloomfield, Leonard. 1933. Language. New York: Holt.


Cavalli-Sforza, V., Soudi, A., and Mitamura, T. 2000. Arabic morphology generation using a concatenative strategy. In Proceedings of the 6th Applied Natural Language Processing Conference (ANLP’00). 86–93.


Diab, Mona, Mahmoud Ghoneim, and Nizar Habash. "Arabic diacritization in the context of statistical machine translation." Proceedings of MT-Summit. 2007.


Diab, Mona, and Nizar Habash. "Arabic dialect processing." MEDAR09. April (2009).


Darwish, Kareem. "Named Entity Recognition using Cross-lingual Resources: Arabic as an Example." Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pages 1558–1567, Sofia, Bulgaria, August 4-9 2013.


Farghaly, A. and Senellart, J. 2003. Intuitive coding of the Arabic lexicon. In Proceedings of the MT Summit IX, the Association for Machine Translation in the Americas (AMTA’03).


Farghaly, A. 2007. Information retrieval and the Arabic noun construct. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages (CAASL’07).


Farghaly, Ali, and Khaled Shaalan. "Arabic natural language processing: Challenges and solutions." ACM Transactions on Asian Language Information Processing (TALIP) 8.4 (2009): 14.


Farghaly, A. 2010. Introduction in Arabic computational linguistics. CSLI Publications, Stanford, CA.




Ferguson, Charles A. "Diglossia." Word 15.2 (1959).


Green, Spence, and Christopher D. Manning. "Better Arabic parsing: Baselines, evaluations, and analysis." Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010. 


Habash, N. 2004. Large-scale lexeme based Arabic morphological generation. In Proceedings of Traitement Automatique du Langage Naturel (TALN’04).


Habash, N., & Roth, R. M. (2009, August). CATiB: The Columbia Arabic Treebank. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers (pp. 221-224). Association for Computational Linguistics.


Habash, Nizar Y. "Introduction to Arabic natural language processing." Synthesis Lectures on Human Language Technologies 3.1 (2010): 1-187.


Habash, Nizar, Ramy Eskander, and Abdelati Hawwari. "A morphological analyzer for Egyptian Arabic." Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology. Association for Computational Linguistics, 2012.


Habash, Nizar, et al. "Morphological Analysis and Disambiguation for Dialectal Arabic." Proceedings of NAACL-HLT. 2013.


Hajič, J., Smrž, O., Zemánek, P., Šnaidauf, J., & Beška, E. (2004, September). Prague Arabic dependency treebank: Development in data and tools. In Proc. of the NEMLAR Intern. Conf. on Arabic Language Resources and Tools (pp. 110-117).


Hlal, Y. 1985. Morphological analysis of Arabic speech. In Proceedings of the 2nd Conference on Computer Processing of the Arabic Language (CPAL’85).


Hosny, A., Shaalan, K., and Fahmy, A. 2008. Automatic morphological rule induction for Arabic. In Proceedings of the Workshop on Human Language Translation and Natural Language Processing within the Arabic World (LREC’08). 97–101.


Kilany, H., et al. "Egyptian colloquial Arabic lexicon." LDC catalog number LDC99L22 (2002).


Larkey, Leah S., and Margaret E. Connell. Arabic information retrieval at UMass in TREC-10. University of Massachusetts Amherst, Center for Intelligent Information Retrieval, 2006.


Lewis, M. Paul, Gary F. Simons, and Charles D. Fennig (eds.). 2013. Ethnologue: Languages of the World, Seventeenth edition. Dallas, Texas: SIL International. Online version: http://www.ethnologue.com.


Maamouri, M., Bies, A., Buckwalter, T., & Mekki, W. (2004, September). The Penn Arabic Treebank: Building a large-scale annotated Arabic corpus. In NEMLAR Conference on Arabic Language Resources and Tools (pp. 102-109).


Maamouri, M., Bies, A., Kulick, S., Tabessi, D., & Krouna, S. (2012). Egyptian Arabic Treebank Pilot.


Marcus, M. P., Marcinkiewicz, M. A., & Santorini, B. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational linguistics, 19(2), 313-330.


McCarthy, J. 1981. A prosodic theory of nonconcatenative morphology. Linguistic Inquiry. 12, 373–418.


Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), 3-26.


Pew Research Center, Pew Forum on Religion & Public Life, 2012


Shaalan, K., Abdel Monem, A., and Rafea, A. 2006. Arabic morphological generation from Interlingua: A rule-based approach. In Intelligent Information Processing III, Z. Shi, K. Shimohara, and D. Feng, Eds. Springer, 441–451.


Shaalan, K. and Raza, H. 2009. NERA: Named entity recognition for Arabic. J. Amer. Soc. Inform. Sci. Technol. 60, 7, 1–12.


Sgall, P., Hajičová, E., & Panevová, J. (1986). The meaning of the sentence in its semantic and pragmatic aspects. Springer.


Smrž, O. (2007, June). ElixirFM: Implementation of functional Arabic morphology. In Proceedings of the 2007 Workshop on Computational Approaches to Semitic Languages: Common Issues and Resources (pp. 1-8). Association for Computational Linguistics.


Date & Time

13 December 2013, 11:30 am