Web-based facial DB building to support large-scale experimentation (Dr. Terry Boult)

Biometric researchers working to improve face-based biometrics have collected modest-sized databases, and ever larger databases are needed to advance the state of the art in the field. Most of these databases have been collected in controlled settings, which limits their size and diversity. This effort will expand the size and diversity of available datasets through tools, developed by undergraduates, for collecting and analyzing data obtained via intelligent web crawlers. It builds on the recent "Labeled Faces in the Wild" database of Huang et al. (2007). That dataset contains 13,000 images, but many are single-view images of people, and only a modest number of people (425) have multiple views. The dataset served its original creators well, since they were focused not on biometrics/identification but on face detection and simple matching; however, it lacked the type of ground truth and co-factor data expected and needed by biometrics researchers, including eye locations and subject demographics (approximate age, race, gender, whether they were wearing glasses, etc.). With funding from an ongoing NSF biometric STTR, a team of undergraduates at UCCS developed simple programs for labeling the images and then produced this ground truth, which has been released to the research community. Papers based on the analysis of this data with both standard and new algorithms are expected to be submitted for publication in the fall, with some of the students as co-authors. This effort will expand on this year's successful project by improving the tools, both to increase processing efficiency and to support a broader range of co-factor estimation (including approximate pose).
With these improved tools, integrated with some of the search tools described above, the students will develop a face-finding web crawler and will attempt to build the largest publicly available face database, with over 100,000 people, each with multiple views.
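To make the crawler idea concrete, the sketch below shows one possible structure for a face-finding web crawler: a breadth-first crawl over pages, with each discovered image passed through a face-detection predicate before being kept. This is a minimal illustration under stated assumptions, not the project's actual implementation; the function names (`fetch`, `has_face`, `crawl`) are hypothetical, and in practice `has_face` would wrap a real detector (for example, an OpenCV Haar cascade) while `fetch` would download and parse real pages.

```python
# Sketch of a face-finding web crawler (illustrative only; names and
# structure are assumptions, not the project's actual design).
import collections
import urllib.parse

def extract_links(base_url, html):
    """Tiny link extractor: pulls href="..." targets and resolves them
    against the page URL. A real crawler would use a proper HTML parser."""
    links = []
    for chunk in html.split('href="')[1:]:
        target = chunk.split('"', 1)[0]
        links.append(urllib.parse.urljoin(base_url, target))
    return links

def crawl(seed_urls, fetch, has_face, max_pages=100):
    """Breadth-first crawl from seed_urls.

    fetch(url)    -> (html, image_urls) for a page, or None on failure
    has_face(url) -> True if a face detector fires on the image at url
    Returns the list of image URLs containing at least one detected face.
    """
    seen = set(seed_urls)
    frontier = collections.deque(seed_urls)
    face_images = []
    while frontier and len(seen) <= max_pages:
        url = frontier.popleft()
        result = fetch(url)
        if result is None:
            continue  # skip unreachable or non-HTML pages
        html, image_urls = result
        face_images.extend(u for u in image_urls if has_face(u))
        for link in extract_links(url, html):
            if link not in seen:  # de-duplicate before enqueueing
                seen.add(link)
                frontier.append(link)
    return face_images
```

In a deployment aimed at 100,000+ identities, the interesting work lies in the `has_face` stage (rejecting false detections) and in clustering images of the same person across sites, neither of which this sketch attempts.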