{"id":259,"date":"2018-07-16T22:05:05","date_gmt":"2018-07-16T22:05:05","guid":{"rendered":"http:\/\/sarahjpurcell.sites.grinnell.edu\/sandbox_clone\/?page_id=259"},"modified":"2019-11-04T15:43:24","modified_gmt":"2019-11-04T15:43:24","slug":"natural-language-processing","status":"publish","type":"page","link":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/tutorials\/textual-analysis\/natural-language-processing\/","title":{"rendered":"Natural Language Processing (Stanford&#8217;s Named Entity Recognizer)"},"content":{"rendered":"<p>This tutorial was written by <a href=\"https:\/\/www.grinnell.edu\/users\/waldenka\">Katherine Walden<\/a>, Digital Liberal Arts Specialist at Grinnell College. Tutorial instructions were co-authored by Sarah Purcell (L.F. Parker Professor of History, Grinnell College) and Papa Ampim-Darko, a student research assistant at Grinnell College.<\/p>\n<p>This tutorial was reviewed by <a href=\"https:\/\/www.grinnell.edu\/users\/donovang\">Gina Donovan<\/a> (Instructional Technologist, Grinnell College).<\/p>\n<p>This tutorial is adapted from Michelle Moravec\u2019s <a href=\"http:\/\/historyinthecity.blogspot.com\/2014\/06\/how-to-use-stanfords-ner-and-extract.html\">History in the City Stanford NER tutorial<\/a> and Rachel Buurma\u2019s <a href=\"https:\/\/github.com\/rbuurma\/rise-2015\">The Rise of the Novel Robinson Crusoe NER assignment<\/a>.<\/p>\n<p><a href=\"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/\" rel=\"license\"><img decoding=\"async\" style=\"border-width: 0;\" src=\"https:\/\/i.creativecommons.org\/l\/by-nc\/4.0\/88x31.png\" alt=\"Creative Commons License\" \/><\/a><br \/>\nNatural Language Processing (Stanford\u2019s Named Entity Recognizer) is licensed under a <a href=\"http:\/\/creativecommons.org\/licenses\/by-nc\/4.0\/\" rel=\"license\">Creative Commons Attribution-NonCommercial 4.0 International License<\/a>.<\/p>\n<hr \/>\n<p>Developed in 2006 by a team based out of Stanford University\u2019s 
Natural Language Processing Group, the Stanford <a href=\"https:\/\/nlp.stanford.edu\/software\/CRF-NER.shtml#History\">Named Entity Recognizer<\/a> (NER) is a Java-based tool for recognizing and extracting named entities in an unstructured textual dataset. In this tutorial, we\u2019ll be using Stanford\u2019s NER to identify named entities in <em>The Interesting Narrative of the Life of Olaudah Equiano, Or Gustavus Vassa, The African<\/em>, an autobiographical slave narrative published in 1789.<\/p>\n<hr \/>\n<h5>Data<\/h5>\n<p>1- Navigate to\u00a0<a href=\"http:\/\/sarahjpurcell.sites.grinnell.edu\/digital_methods\/files\/Equiano_Text.txt\">http:\/\/sarahjpurcell.sites.grinnell.edu\/digital_methods\/files\/Equiano_Text.txt<\/a> in a web browser and save the \u201cEquiano_Text\u201d text file to your Desktop. Open <strong>Equiano.txt<\/strong> to see <a href=\"http:\/\/www.gutenberg.org\/ebooks\/15399?msg=welcome_stranger\">the plain text utf-8 file<\/a> downloaded from Project Gutenberg.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture-1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture-1.png\" alt=\"\" width=\"780\" height=\"959\" \/><\/a><\/p>\n<p>2-Copy the file to your <strong>Desktop<\/strong>, and <strong>right click on the file<\/strong> to open it in <strong>Notepad<\/strong>, the native Windows text editing program.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_12.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_12-1024x535.png\" alt=\"\" width=\"840\" height=\"439\" \/><\/a><\/p>\n<p>3-Because the text file was downloaded from Project Gutenberg, it 
contains information at the beginning and end of the file that is not part of the original <em>Equiano<\/em> text. Delete these lines, re-save the file to your Desktop, and close Notepad.<\/p>\n<hr \/>\n<h5>Opening Stanford\u2019s NER<\/h5>\n<p>4-The NER has already been downloaded and installed on the Library computers. To install on your own computer, visit the program\u2019s <a href=\"https:\/\/nlp.stanford.edu\/software\/CRF-NER.shtml#Download\">Download page<\/a>.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_3.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_3.png\" alt=\"\" width=\"927\" height=\"682\" \/><\/a><\/p>\n<p>5-To open the program, navigate to C:\\Program Files (x86)\\Stanford NER and double click on the Windows Batch File named <strong>ner-gui<\/strong>.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_4.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_4.png\" alt=\"\" width=\"657\" height=\"653\" \/><\/a><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_5.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_5.png\" alt=\"\" width=\"676\" height=\"339\" \/><\/a><\/p>\n<p>6-The program will launch two windows\u2014a Command Prompt shell that will run the program via the CLI, and a GUI interface. 
Both windows need to remain open for the program to run.<\/p>\n<hr \/>\n<h5>Loading data into the NER<\/h5>\n<p>7-<strong>Erase the sample text<\/strong> in the GUI interface window.<\/p>\n<p>8-Click <strong>File-&gt;Open File<\/strong>, and select the <strong>Equiano.txt<\/strong> file saved to your Desktop.<br \/>\n<a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_6.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_6.png\" alt=\"\" width=\"655\" height=\"656\" \/><\/a><\/p>\n<p>9-Text will appear in the GUI window once the file has loaded.<\/p>\n<hr \/>\n<h5>Identifying Classifiers in the NER<\/h5>\n<p>10-Next, we need to load a list of sample terms for the NER to use when analyzing our text.<br \/>\n<a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Pic_2.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Pic_2.png\" alt=\"\" width=\"225\" height=\"127\" \/><\/a><\/p>\n<p>11-Click <strong>Classifier-&gt;Load CRF from File<\/strong>. 
Navigate to\u00a0C:\\Program Files (x86)\\Stanford NER\\classifiers and open the classifiers folder.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_7.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_7.png\" alt=\"\" width=\"524\" height=\"363\" \/><\/a><\/p>\n<p>12-Stanford\u2019s NER includes classifier resource files with 3, 4, and 7 entity categories it can search for in your text.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_8.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_8.png\" alt=\"\" width=\"659\" height=\"660\" \/><\/a><\/p>\n<p>13-Select the <strong>english.all.3class.distsim.crf.ser.gz<\/strong> file and click the <strong>Open<\/strong> icon. 
Three entity categories (organization, location, person), with color labels, will now display on the right-hand side of the GUI window.<\/p>\n<p>14-Click on the <strong>Run NER icon<\/strong> on the bottom of the GUI window to run the program.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_9.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_9.png\" alt=\"\" width=\"675\" height=\"337\" \/><\/a><\/p>\n<p>15-You can see the results being generated from the NER program in the CLI window.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_10.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_10.png\" alt=\"\" width=\"659\" height=\"651\" \/><\/a><\/p>\n<p>16-The GUI window now shows the color-coded results of the NER program\u2019s analysis.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Pic_1.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Pic_1.png\" alt=\"\" width=\"80\" height=\"73\" \/><\/a><\/p>\n<p>17-The GUI interface gives you options to export your tagged file. Click <strong>File-&gt;Save Tagged File As<\/strong>. 
Navigate to your Desktop and save the file as <strong>Equiano_Tagged.txt.<\/strong><\/p>\n<p>18-You could also copy the text generated in the CLI and paste into a text file to see a list with just the tagged entities.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_11.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/07\/Capture_11-1024x535.png\" alt=\"\" width=\"840\" height=\"439\" \/><\/a><\/p>\n<p>19-Right click on the icon for the <strong>Equiano_Tagged.txt<\/strong> file and open in <strong>Notepad<\/strong>.<\/p>\n<p>20-You\u2019ll see that the NER has added &lt;LOCATION&gt;, &lt;PERSON&gt;, or &lt;ORGANIZATION&gt; tags around the entities it recognized based on the terms in the classification file. We could use a combination of XML and XPath to isolate terms with particular tag categories.<\/p>\n<hr \/>\n<h5>Reflection questions<\/h5>\n<ul>\n<li><em>Scroll down and see how accurate the NER was in identifying and tagging elements in the text. What errors or problems do you notice? How would those errors impact analysis of this text?<\/em><\/li>\n<li><em>How did the process of using the NER impact or shape your understanding of the original text?<\/em><\/li>\n<li><em>What kinds of historical research questions could a tool like Stanford\u2019s NER help address?<\/em><\/li>\n<li><em>What types of questions would not be a good fit for Stanford\u2019s NER?<\/em><\/li>\n<li><em>What problems or limitations can you see for using Stanford\u2019s NER as a tool for historical analysis?<\/em><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<hr \/>\n<h5><strong>Validating Tagged Entities as an XML Document<\/strong><\/h5>\n<p>21-The output of the NER program is a text file with tagged elements. 
To prepare for Friday\u2019s experimental lab, we\u2019ll be importing that data into an Excel document so we have a list of the tagged entities.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-716\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14-1024x540.png\" alt=\"\" width=\"676\" height=\"356\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14-1024x540.png 1024w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14-300x158.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14-768x405.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14-676x356.png 676w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_14.png 1426w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<p>22-Open the <strong>Equiano_Tagged.txt<\/strong> file in Notepad. 
Click <strong>File\u2014Save As<\/strong> to save a second copy of the NER output.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_13.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-715\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_13.png\" alt=\"\" width=\"954\" height=\"537\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_13.png 954w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_13-300x169.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_13-768x432.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_13-676x381.png 676w\" sizes=\"auto, (max-width: 954px) 100vw, 954px\" \/><\/a><\/p>\n<p>23-Notepad\u2019s default settings will save the file as a plain-text (.txt) file. We want to save the output as an extensible markup language (.xml) file to be able to extract the tagged entities.<\/p>\n<p>24-In the File name box, add <strong>.xml<\/strong> to the end of the file name. 
Save the XML file to your Desktop.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-720\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18-1024x540.png\" alt=\"\" width=\"676\" height=\"356\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18-1024x540.png 1024w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18-300x158.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18-768x405.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18-676x357.png 676w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_18.png 1918w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<p>25-Navigate to <a href=\"https:\/\/codebeautify.org\/xmlvalidator#\">https:\/\/codebeautify.org\/xmlvalidator#<\/a> in a web browser to validate your XML.<\/p>\n<p>&nbsp;<\/p>\n<blockquote><p><em>Free online validators can be a useful tool to identify errors or problems in a variety of file types (XML, HTML, JSON, etc.).<\/em><\/p><\/blockquote>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-721\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19-1024x514.png\" alt=\"\" width=\"676\" height=\"339\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19-1024x514.png 1024w, 
https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19-300x150.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19-768x385.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19-676x339.png 676w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_19.png 1920w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<p>26-Copy the text in the <strong>Equiano_Tagged<\/strong> file, and paste it into the XML validator. Click <strong>Validate<\/strong> to check your XML.<\/p>\n<p>27-The XML validator identifies an error in Line 1 of our Equiano data.<\/p>\n<p>28-The <a href=\"https:\/\/www.w3schools.com\/xml\/xml_syntax.asp\">World Wide Web Consortium (W3C) guidelines<\/a> say that an XML document must have a single root element that contains all of the other elements in the document.<\/p>\n<p>29-Add <strong>&lt;root&gt;<\/strong> to the first line of your XML, and <strong>&lt;\/root&gt;<\/strong> at the end of your XML.<\/p>\n<p>30-Re-paste your text into the XML validator, and run the Validator again.<\/p>\n<p>31-We have another error in Line 518 of our XML.<\/p>\n<blockquote><p><em>The ampersand symbol has a special function in XML (it marks the start of an entity reference), so the \u201c&amp;c.\u201d combination of characters will cause errors in validating or loading the XML.<\/em><\/p><\/blockquote>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-722\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20-1024x540.png\" alt=\"\" width=\"676\" height=\"356\" 
srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20-1024x540.png 1024w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20-300x158.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20-768x405.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20-676x356.png 676w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_20.png 1426w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_21.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-723\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_21.png\" alt=\"\" width=\"347\" height=\"188\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_21.png 347w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_21-300x163.png 300w\" sizes=\"auto, (max-width: 347px) 100vw, 347px\" \/><\/a><\/p>\n<p>32-Go to <strong>Edit\u2014Replace<\/strong> in Notepad. Instruct the program to find all instances of \u201c&amp;c.\u201d and replace them with \u201cetc.\u201d. 
Click <strong>Replace All<\/strong>.<\/p>\n<p>33-Save the XML file, re-paste into the Validator, and validate once again.<\/p>\n<p>34-We now have valid XML to load into Microsoft Excel.<\/p>\n<hr \/>\n<h5><strong>Loading XML Data into Excel<\/strong><br \/>\n<a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-725\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23-1024x567.png\" alt=\"\" width=\"676\" height=\"374\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23-1024x567.png 1024w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23-300x166.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23-768x425.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23-676x374.png 676w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_23.png 1809w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/h5>\n<p>35-Open Microsoft Excel. 
Click <strong>File\u2014Options<\/strong> to open the Excel Options window.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_22.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-724\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_22.png\" alt=\"\" width=\"832\" height=\"683\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_22.png 832w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_22-300x246.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_22-768x630.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_22-676x555.png 676w\" sizes=\"auto, (max-width: 832px) 100vw, 832px\" \/><\/a><\/p>\n<p>36-Under <strong>Customize Ribbon<\/strong>, check the box next to <strong>Developer<\/strong> to enable the Developer tools. 
Click OK.<\/p>\n<p>37-A <strong>Developer tab<\/strong> should now be visible in the top-level menu.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_24.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-726\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_24.png\" alt=\"\" width=\"991\" height=\"222\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_24.png 991w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_24-300x67.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_24-768x172.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_24-676x151.png 676w\" sizes=\"auto, (max-width: 991px) 100vw, 991px\" \/><\/a><\/p>\n<p>38-Click on the <strong>Developer<\/strong> tab, and select <strong>Import<\/strong>.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_25.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-727\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_25.png\" alt=\"\" width=\"950\" height=\"538\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_25.png 950w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_25-300x170.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_25-768x435.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_25-676x383.png 676w\" 
sizes=\"auto, (max-width: 950px) 100vw, 950px\" \/><\/a><\/p>\n<p>39-Select the <strong>Equiano_Tagged<\/strong> XML file and click Import.<\/p>\n<p>40-We still have a variety of parsing errors in our XML file. Click on <strong>Details<\/strong> to view the specific line for each error.<\/p>\n<p>41-You can go back into the Validator to view that line, or open the file in Notepad++ to identify and correct these errors.<\/p>\n<p>42-Save the file after correcting each error, and retry the import into Excel.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_26.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-728\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_26.png\" alt=\"\" width=\"426\" height=\"131\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_26.png 426w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_26-300x92.png 300w\" sizes=\"auto, (max-width: 426px) 100vw, 426px\" \/><\/a><\/p>\n<p>43-Once all the errors have been resolved, click <strong>OK<\/strong> in the window that pops up.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_27.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-729\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_27.png\" alt=\"\" width=\"265\" height=\"159\" \/><\/a><\/p>\n<p>44-You can add the data to an existing worksheet, or import it to a new worksheet. 
Click <strong>OK<\/strong>.<\/p>\n<p><a href=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28.png\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-large wp-image-730\" src=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28-1024x569.png\" alt=\"\" width=\"676\" height=\"376\" srcset=\"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28-1024x569.png 1024w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28-300x167.png 300w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28-768x426.png 768w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28-676x375.png 676w, https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-content\/uploads\/2018\/11\/Capture_28.png 1810w\" sizes=\"auto, (max-width: 676px) 100vw, 676px\" \/><\/a><\/p>\n<p>45-You can now see the entities tagged in the NER as distinct columns of Location, Person, and Organization.<\/p>\n<p>46-From here, we could isolate Locations to map in a GIS system. Or, we could look at Persons and Organizations to build a network graph.<\/p>\n<hr \/>\n<h5><em>Reflection Questions:<\/em><\/h5>\n<ul>\n<li><em>What changes did you make to the XML file to allow Excel to import it? <\/em><\/li>\n<li><em>How could these changes impact the meaning(s) of the original text? <\/em><\/li>\n<li><em>Why do you think these changes are necessary? <\/em><\/li>\n<li><em>What is gained or lost by going through this process manually (by hand) versus automating these changes?<\/em><\/li>\n<li><em>Now that you have extracted a list of tagged entities, what could you do next with this data?<\/em><\/li>\n<li><em>What questions do you have about this data? 
How could you use textual analysis methods\/tools to address or answer those questions?<\/em><\/li>\n<\/ul>\n<p><strong>Answer these questions in your Reflection Journal by Wednesday, November 6 at 5:00 pm<\/strong><\/p>\n<ul>\n<li>Did you gain any new insights into Olaudah Equiano&#8217;s narrative by using topic modeling or named entity recognition?<\/li>\n<li>If you were able to pursue these lines of inquiry further, what new historical questions might you be able to ask about Equiano?<\/li>\n<li>What are the limitations of these tools?<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>This tutorial was written by Katherine Walden, Digital Liberal Arts Specialist at Grinnell College. Tutorial instructions were co-authored by Sarah Purcell (L.F. Parker Professor of History, Grinnell College) and Papa Ampim-Darko, a student research assistant at Grinnell College. This tutorial was reviewed by Gina Donovan (Instructional Technologist, Grinnell College). This tutorial is adapted from Michelle 
[&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":658,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-259","page","type-page","status-publish","hentry","post-preview"],"_links":{"self":[{"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/pages\/259","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/comments?post=259"}],"version-history":[{"count":15,"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/pages\/259\/revisions"}],"predecessor-version":[{"id":794,"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/pages\/259\/revisions\/794"}],"up":[{"embeddable":true,"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/pages\/658"}],"wp:attachment":[{"href":"https:\/\/his100.sarahjpurcell.sites.grinnell.edu\/spring-2021\/wp-json\/wp\/v2\/media?parent=259"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}