Visualization of the quantity and quality of scanned historical newspapers.
What is it?
Mapping Texts began in 2010 as a collaborative project between the University of North Texas and Stanford University. Its goal is to develop experimental models that combine text mining with geospatial analysis, giving researchers improved quantitative and qualitative methods for finding and analyzing meaningful language patterns in massive collections of historical newspapers.
What you’d need to know
Several tools were used to build this project, including:
- GNU Aspell, an open-source spell checker, was used to correct recurring errors introduced by the OCR process.
- MALLET was used for topic modeling, a statistical technique that uncovers clusters of co-occurring words (“topics”) across a collection of texts.
- Stanford NER is a program that identifies and classifies named entities in a text (e.g., people, organizations, and locations).
- GitHub hosts the project’s source code for downloading and reuse.
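To give a flavor of the first step in the pipeline, here is a minimal pure-Python sketch of the kind of recurring OCR-error correction the project used GNU Aspell for. The substitution table and sample text below are illustrative assumptions, not taken from the project's actual data or code.

```python
import re

# Hypothetical examples of recurring OCR confusions in historical newsprint
OCR_FIXES = {
    "tbe": "the",
    "aud": "and",
    "witb": "with",
}

def correct_ocr(text: str) -> str:
    """Replace known recurring OCR errors, matching on whole words only."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return OCR_FIXES.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", fix, text)

print(correct_ocr("tbe mayor met witb tbe council aud spoke"))
# the mayor met with the council and spoke
```

A dictionary-backed spell checker like Aspell generalizes this idea: instead of a fixed table, it proposes corrections for any word absent from its dictionary, which is why it suits the long tail of OCR errors in a large newspaper corpus.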
Get Started
Resources: