Committee Members:
Dr. Jugal Kalita
Dr. Edward Chow
Dr. Sudhanshu Semwal
Information Processing in Arabic:
Current Status and Challenges
Table of Contents
1. Introduction
2. Fundamental Issues in Arabic
2.1 Script Issues
2.2 Morphological Issues
2.2.1 Morphological Generators
2.2.2 Morphological Processing and Dialects
2.2.3 Word Agglutination in Arabic
2.3 Syntactical Issues
3. Arabic Treebanks
3.1 The Penn Arabic Treebank
3.2 The Prague Arabic Dependency Treebank
3.3 The Columbia Arabic Treebank
4. Conclusion
December 13, 2013, 11:30 am