{"id":16,"date":"2017-02-01T15:51:02","date_gmt":"2017-02-01T20:51:02","guid":{"rendered":"https:\/\/blogs.library.unt.edu\/digital-humanities\/?p=16"},"modified":"2017-04-04T17:56:42","modified_gmt":"2017-04-04T21:56:42","slug":"project-profile-mapping-texts","status":"publish","type":"post","link":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/2017\/02\/01\/project-profile-mapping-texts\/","title":{"rendered":"Project Profile: Mapping Texts"},"content":{"rendered":"<div id=\"attachment_38\" style=\"width: 612px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/mappingtexts.org\/\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-38\" class=\"wp-image-38\" src=\"https:\/\/blogs.library.unt.edu\/digital-humanities\/wp-content\/uploads\/sites\/20\/2017\/01\/mapping-texts.png\" alt=\"Texas map with circles plotting overall quality of scanned historical newspapers.\" width=\"602\" height=\"355\" srcset=\"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-content\/uploads\/sites\/20\/2017\/01\/mapping-texts.png 933w, https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-content\/uploads\/sites\/20\/2017\/01\/mapping-texts-300x177.png 300w, https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-content\/uploads\/sites\/20\/2017\/01\/mapping-texts-768x453.png 768w\" sizes=\"auto, (max-width: 602px) 100vw, 602px\" \/><\/a><p id=\"caption-attachment-38\" class=\"wp-caption-text\">Visualization of the quantity and quality of scanned historical newspapers.<\/p><\/div>\r\n\r\n&nbsp;\r\n\r\n<em>What is it?<\/em>\r\n\r\n<a href=\"http:\/\/mappingtexts.org\/\">Mapping Texts<\/a> began in 2010 as a collaborative project between the University of North Texas and Stanford University. 
The goal of the project is to develop a series of experimental new models for combining the possibilities of text-mining and geospatial analysis to enable researchers to develop improved quantitative and qualitative methods for finding and analyzing meaningful language patterns embedded within massive collections of historical newspapers.<!--more-->\r\n\r\n<em>What you&#8217;d need to know<\/em>\r\n\r\nSeveral tools were used to build this project, including:\r\n<ul>\r\n \t<li><a href=\"http:\/\/aspell.net\/\">GNU Aspell<\/a> is an open-source spell checker that was used to correct recurring errors introduced by the OCR process.<\/li>\r\n \t<li><a href=\"http:\/\/mallet.cs.umass.edu\/\">MALLET<\/a> was used for topic modeling, which uses statistical methods to uncover groups of words (\u201ctopics\u201d) that tend to appear together in a collection of texts.<\/li>\r\n \t<li><a href=\"http:\/\/www-nlp.stanford.edu\/software\/CRF-NER.shtml\">Stanford NER<\/a> is a program that attempts to identify and classify named entities in a text (e.g., people or locations).<\/li>\r\n \t<li><a href=\"http:\/\/github.com\/\">GitHub<\/a> hosts the project&#8217;s source code for download and reuse.<\/li>\r\n<\/ul>\r\n<em>Get Started<\/em>\r\n<ul>\r\n \t<li><a href=\"http:\/\/aspell.net\/man-html\/index.html\">http:\/\/aspell.net\/man-html\/index.html<\/a><\/li>\r\n \t<li><a href=\"http:\/\/mallet.cs.umass.edu\/mallet-tutorial.pdf\">http:\/\/mallet.cs.umass.edu\/mallet-tutorial.pdf<\/a><\/li>\r\n \t<li><a href=\"http:\/\/github.com\/mcgeoff\/Document-OCR-Quality-Visualization\">http:\/\/github.com\/mcgeoff\/Document-OCR-Quality-Visualization<\/a><\/li>\r\n<\/ul>\r\nResources:\r\n<ul>\r\n \t<li><a href=\"http:\/\/mappingtexts.org\/whitepaper\/MappingTexts_WhitePaper.pdf\">http:\/\/mappingtexts.org\/whitepaper\/MappingTexts_WhitePaper.pdf<\/a><\/li>\r\n \t<li><a href=\"http:\/\/aspell.net\/\">http:\/\/aspell.net\/<\/a><\/li>\r\n \t<li><a 
href=\"http:\/\/mallet.cs.umass.edu\/\">http:\/\/mallet.cs.umass.edu\/<\/a><\/li>\r\n \t<li><a href=\"http:\/\/www-nlp.stanford.edu\/software\/CRF-NER.shtml\">http:\/\/www-nlp.stanford.edu\/software\/CRF-NER.shtml<\/a><\/li>\r\n \t<li><a href=\"http:\/\/github.com\/\">http:\/\/github.com\/<\/a><\/li>\r\n<\/ul>","protected":false},"excerpt":{"rendered":"&nbsp; What is it? Mapping Texts began in 2010 as a collaborative project between the University of North Texas and Stanford University. The goal of the project is to develop a series of experimental new models for combining the possibilities of text-mining and geospatial analysis to enable researchers to develop improved quantitative and qualitative methods&#8230;  <a href=\"https:\/\/blogs.library.unt.edu\/digital-scholarship\/2017\/02\/01\/project-profile-mapping-texts\/\" class=\"more-link\" title=\"Read Project Profile: Mapping Texts\">Read more &raquo;<\/a>","protected":false},"author":69,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":false,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2}},"categories":[3],"tags":[6,13,8,16,14,7,15],"class_list":["post-16","post","type-post","status-publish","format-standard","hentry","category-project-profiles","tag-gis","tag-mallet","tag-projects","tag-stanford","tag-text-mining","tag-unt","tag-unt-how-did-they-make-that"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/
wp.me\/p8keRV-g","_links":{"self":[{"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/posts\/16","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/users\/69"}],"replies":[{"embeddable":true,"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/comments?post=16"}],"version-history":[{"count":5,"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/posts\/16\/revisions"}],"predecessor-version":[{"id":219,"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/posts\/16\/revisions\/219"}],"wp:attachment":[{"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/media?parent=16"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/categories?post=16"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogs.library.unt.edu\/digital-scholarship\/wp-json\/wp\/v2\/tags?post=16"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}