REU Site for Machine Learning Theory

Here is a sample of projects in which our faculty members are interested. Depending on your background and your interests, you will be assigned to a professor and a project area. Then, you will communicate with the professor to select a topic by the first week of the summer internship.

Machine Learning (Boult and Kalita):

Formalizing and analyzing Open Set Recognition (Scheirer et al. 2013), that is, recognition in the presence of unknown unknowns. This is the basis of a new NSF grant (Award #1320956). As part of that work, Dr. Boult is exploring extreme-value based techniques for estimating probabilities. Students who already understand a particular classifier (e.g., SVM or random forests) can work with Dr. Boult on the theoretical development and implementation of extreme-value based probability estimation for that model. Furthermore, we believe that with appropriate guidance, a solid undergraduate can work on the implementation and its integration with existing packages, with SVMLight, R and scikit-learn being the primary targets.
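
To give a flavor of extreme-value based probability estimation, here is a minimal sketch in Python. It is not Dr. Boult's method: it assumes the classical block-maxima approach, fits a Gumbel distribution by the method of moments, and uses hypothetical names (`block_maxima`, `fit_gumbel`, `gumbel_cdf`) chosen only for illustration. Given scores that a classifier assigns to known non-members of a class, the fitted distribution estimates how likely the largest non-member score in a block is to fall below a new score.

```python
import math
import statistics

# Euler-Mascheroni constant, used by the Gumbel method-of-moments fit.
EULER_GAMMA = 0.5772156649015329


def block_maxima(scores, block_size):
    """Split scores into consecutive full blocks and keep each block's maximum."""
    return [max(scores[i:i + block_size])
            for i in range(0, len(scores) - block_size + 1, block_size)]


def fit_gumbel(maxima):
    """Method-of-moments fit of a Gumbel (max) distribution.

    Assumption: block maxima of the scores are roughly Gumbel distributed.
    Returns the location mu and the scale beta.
    """
    beta = statistics.stdev(maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(maxima) - EULER_GAMMA * beta
    return mu, beta


def gumbel_cdf(x, mu, beta):
    """Probability that a block maximum is at most x under the fitted model."""
    return math.exp(-math.exp(-(x - mu) / beta))


# Hypothetical non-member scores, for illustration only.
non_member_scores = [1.0, 5.0, 2.0, 7.0, 3.0, 4.0, 9.0]
maxima = block_maxima(non_member_scores, 2)
mu, beta = fit_gumbel(maxima)
print(gumbel_cdf(8.0, mu, beta))
```

A score with a value of `gumbel_cdf` close to 1 lies beyond what non-members typically reach. A real project would more likely fit a Weibull or generalized extreme value distribution by maximum likelihood and would need far more data than this toy list.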

Learning to learn, i.e., how learning can be flexible and how the performance of learning algorithms can be improved.

Natural Language Processing (Kalita):

Automatically generating questions from text passages for comprehension testing. For example, given a children's storybook, we want to generate appropriate questions of various types.

Automatically creating lexical resources such as bilingual dictionaries, thesauruses and Wordnets for low-resource and endangered languages.

Summarizing natural language text, including a collection of microblogs. Given a number of passages or sentences that are similar in content, how can we summarize them to produce compact and grammatical descriptions of the same?

Data Mining, Machine Learning and Signal Analysis (Lewis):

The first objective of the project is building models from signals. While domain adaptation techniques have recently been applied successfully to natural language processing, success in other areas is limited, due in large part to the limited ability of autonomous machine learning to adapt inhomogeneous signal-based models and to learn to select and fuse such models. To address these issues, we intend to explore multilevel domain adaptation to adapt, select and fuse models. The driving application of the proposed work is machine intelligence for pathologic signals (Lewis 2013b). The second objective of the project is to expand upon preliminary findings utilizing surface or hippocampal EEG monitoring to detect incipient abnormalities. Beginning with Naïve Bayes, Maximum Entropy, and SVM for Fourier-based domains (Pang 2004), we focus on the content-free features, particularly those free of artifact, that include interictal, synaptic, and neurological features (Abbasi 2005), specifically to optimize our identification algorithms so a machine can autonomously identify abnormal neurological events. Our criterion for success will be based upon how efficiently our system balances the speedy detection of seizures with optimization of both the sensitivity and specificity of the algorithm given a fixed amount of computational capability. Our goal now is to achieve classification accuracy comparable to previous subjective-classification research that also used Blitzer's multi-domain sentiment data set and adopted unigrams and bigrams as features (Blitzer 2007, Li 2008).
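
For students new to the evaluation criteria mentioned above, sensitivity is the fraction of true events that are detected, and specificity is the fraction of non-events that are correctly left unflagged. The sketch below computes both from binary labels. The function name and the sample labels are hypothetical and are not taken from the project's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 marks an event."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # events detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # events missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-events left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical labels for eight signal windows (1 = abnormal event).
truth = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
print(sensitivity_specificity(truth, predicted))  # (0.75, 0.75)
```

Raising a detector's threshold usually trades sensitivity for specificity, which is why the two are reported together.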

Mobile Software Testing (Walcott-Justice):

The focus is on using machine learning algorithms in software testing and debugging on mobile devices or embedded systems. This work involves extracting, exporting, analyzing, and learning from data gathered from mobile devices through 1) hardware monitoring and/or 2) software-level application monitoring. The students will research machine learning prioritization schemes that incorporate the resource constraints of mobile systems. Additional constraints such as power, memory usage, and code structure may also be considered. We want to evaluate the effectiveness of the learning results in terms of fault-finding ability, energy and memory consumption, and/or test quality.
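
One simple way to picture a resource-aware prioritization scheme is a greedy ordering: rank tests by predicted fault-finding value per unit of energy, then run them in that order until an energy budget is used up. The sketch below is only an illustration under those assumptions. The `MobileTest` class, its fields, and the numbers are hypothetical, and in a real project the fault scores would come from a learned model rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class MobileTest:
    name: str
    fault_score: float  # hypothetical predicted chance of exposing a fault
    energy_cost: float  # hypothetical measured energy use, must be > 0


def prioritize(tests, energy_budget):
    """Greedily pick tests by fault score per unit energy within the budget."""
    ranked = sorted(tests, key=lambda t: t.fault_score / t.energy_cost, reverse=True)
    selected, spent = [], 0.0
    for test in ranked:
        # Skip any test that would push total energy past the budget.
        if spent + test.energy_cost <= energy_budget:
            selected.append(test)
            spent += test.energy_cost
    return selected


suite = [
    MobileTest("A", 0.9, 30.0),
    MobileTest("B", 0.5, 10.0),
    MobileTest("C", 0.2, 10.0),
    MobileTest("D", 0.6, 40.0),
]
print([t.name for t in prioritize(suite, energy_budget=45.0)])  # ['B', 'A']
```

The greedy ratio rule is a heuristic and does not guarantee the best possible selection. Memory usage or code structure could be folded in as extra fields and extra checks in the loop.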

Intelligent Compilers (Yi):

Compiler research has been facing increasingly steep challenges in automatically bridging the widening gap between complex software and diverse hardware platforms. Funded by two NSF grants, Dr. Yi is developing an unconventional compiler optimization model (Yi 2011, Yi 2012) that allows 1) developers to effectively interact with advanced optimizing compilers to provide both domain-specific knowledge and high-level optimization strategies; 2) computational specialists to easily define arbitrary domain-specific transformations to directly control performance optimizations to their code; and 3) architecture-sensitive optimizations to be easily parameterized and empirically tuned to achieve portable high performance. We want to take advantage of machine learning models and algorithms to develop intelligent compilers that can automatically learn 1) how to optimize various algorithms and coding patterns within user applications from examples supplied by HPC experts; and 2) how to specialize advanced compiler optimizations for varying multi-core architectures from past experiments of tuning these optimizations with different configurations.