Wikipedia Mining: Obtaining Geographical, Temporal and Ontological Information

Wikipedia represents an enormous amount of human knowledge and judgment, but its content remains largely unstructured: while articles are marked up for display, there is very little structure around the content that would allow direct machine understanding. More complex operations are therefore required to extract information from the text for meaningful machine processing. This project seeks to extract geospatial and temporal information from Wikipedia articles.

Prior work

We started with a simple approach based on POS tagging and regular expressions. This work was carried out by Suzette Stoutenburg as an independent study project and was later published at a conference in the Czech Republic. Jeremy Witmer, who received his MS in May 2009, developed a system that extracts all spatial named entities (i.e., names of places) from Wikipedia articles about wars and battles (mostly concerning the Civil War) and then geocodes them to specific locations on the globe. We published two papers based on his work in 2009: one at the AAAI Spring Symposium at Stanford and one at the IEEE Semantic Computing Conference.
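To give a flavor of this kind of pipeline, the sketch below extracts candidate place names and geocodes them. It is a minimal illustration, not the published system: it substitutes a capitalized-phrase heuristic after spatial prepositions for real POS tagging, and a tiny hard-coded gazetteer (the place names and coordinates are illustrative assumptions) for a real geocoding service.

```python
import re

# Toy gazetteer mapping place names to (lat, lon) pairs.
# A stand-in for a real geocoding backend; entries are illustrative.
GAZETTEER = {
    "Gettysburg": (39.8309, -77.2311),
    "Antietam": (39.4745, -77.7442),
}

# Capitalized word sequences following a spatial preposition -- a crude
# regex approximation of the POS-tagging + pattern-matching approach.
SPATIAL_PATTERN = re.compile(
    r"\b(?:at|near|in)\s+((?:[A-Z][a-z]+)(?:\s+[A-Z][a-z]+)*)"
)

def extract_and_geocode(text):
    """Return (place, (lat, lon)) pairs for gazetteer-matched entities."""
    results = []
    for match in SPATIAL_PATTERN.finditer(text):
        place = match.group(1)
        if place in GAZETTEER:
            results.append((place, GAZETTEER[place]))
    return results

sentence = ("The battle was fought near Gettysburg, "
            "three months before the clash at Antietam.")
print(extract_and_geocode(sentence))
```

A real system would replace the regex with part-of-speech tags or a trained named-entity recognizer, and the dictionary lookup with a gazetteer service that disambiguates among places sharing a name.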

Future Work

We plan to extend this work in several novel directions. Here are some initial ideas for extension.