Keywords and Digital Projects

Tony Guidone’s presentation was engaging, and I really enjoyed how he brought in coins and newspaper articles from the past and let us examine them. The information he provided, as well as the reading we did the night before, really put keyword searching in a different light for me. I find it to be a very useful tool when I’m writing papers and just want to find something immediately, but the “Illusionary Order” article talked about how researchers who do this tend to stop looking across many different sources and therefore lose the range of different opinions and writings. For example, in 1998 the Toronto Star was cited 74 times across 67 dissertations, but in 2010, after the Toronto Star had been digitized, it was cited 753 times across 69 dissertations. The paper argues that this is a result of historical researchers not branching out into other sources and only getting their information from one area. This reduces their chances of finding different information and opinions on the time period because it limits them to only one area of Canada and to only one group of writers and publishers in that one newspaper. This could end up negatively impacting their research and doesn’t leave enough room for nuance. 1

Guidone also pointed out in his presentation that by only looking at specific pieces of information, we lose a lot of context that might help us learn even more about the time period and strengthen the arguments in our papers. For example, when we only look at one very small article in a paper, we lose the other news that happened that day, which can lead to a loss of understanding of what life was like during that time. He also pointed out that men were the only newspaper writers for a long time, so we lose the diversity of that time if we only read the words of men, and one could argue that most of those men were probably white too, so we lose even more varying opinions.

Digital media keyword searches do make it easier to find very specific topics, however, and can be very helpful in finding that information quickly. I’ve used them a lot in papers before, especially in high school, but now that I’ve learned how limiting they can be, I’m definitely going to try to branch out and use more sources in the future.

I think it would take a lot for a digital project to cause social change because it would have to be entertaining and interesting enough for a lot of people to see it and want to share it. Many of the digital projects we’ve looked at have been for very specific areas, such as the Enslaved Children of Mason project, and wouldn’t apply to a lot of people. Theoretically, another university could see this project and replicate it with its own founder, and then another university could do the same, until it spreads to most universities, but I sincerely doubt that will happen. This kind of project wouldn’t apply to colleges whose founders didn’t have slaves, or to those that don’t have the money, time, or space to put up this kind of project. It also depends on how widespread the hypothetical social change would be. Is this all of American society? Just this university? Just this class? I think a digital project could definitely change a classroom or even a university’s society, but I think it’d take a lot for one to affect American society and cause such widespread change. 2

Response: Digitization and the changes it brings

I found Guidone’s presentation to be very interesting. I appreciated all the examples of primary sources he brought into the class. Most of them were artifacts representative of the time period we are focusing on, like newspaper clippings, ads, and coins from Germany that varied in feel. He discussed with us the digitization of those resources and the troubles historians often have as a result.

In “Illusionary Order” 1, Ian Milligan discusses the value of digitizing national newspapers in Canada and how they have become the go-to sources. Basically, not a lot of information is available in a digitized format beyond the Toronto Star and the Globe and Mail digital databases, which can hold a specific bias and may not give the whole picture of the past. Another problem pointed out in this article that Guidone also touched upon is “what changes when you digitize a source?” Guidone mentioned that you can lose context, as many databases present specific newspaper sections rather than the whole paper. He showcased the Readex database 2, which is full of newspaper clippings. Each piece of the paper reveals its own part of the story rather than a complete picture. If these papers are smudged, then the transcribing technology 3 will not be able to read the document completely and we end up with gaps, leading to lost information. Researchers could try to use the clippings together, but that becomes an extremely long process, which ties into Milligan’s point that databases can be a mixed bag of sources or data. Most of the downsides of digitization come from a learning curve that historians need to climb in order to improve their work and resources. Now, let’s discuss some of the benefits.

Easily, the best benefit that comes to mind is easy access to sources and, with keyword searching in mind, an increase in research speed. The word Guidone used was “democratized.” This word means that the user has easy access to the sources they wish to find and use. Honestly, it is a good word to use, as it gives a sense of “information by the people, for the people” and implies collaboration between users, which in turn allows for more complete information gathering and collecting. Instead of scrounging through books, each of us can easily divide a subject into parts and look for sources based on our individual topics. If we need to, keyword searches will bring a lot of the information we need to the forefront and leave out the material we do not necessarily need, making the searching process more efficient.

Digitization can also play a good role in social change, or rather, social awareness. At Mason, we have two projects: The Enslaved Children of George Mason 4 and Mason’s Legacies 5. These projects seek to illuminate the story of George Mason and look into his life, about which there is not a lot to find unless you do some good digging! Essentially, Mason owned slave children and may have been in dealings with his slave-trading brother. Bringing attention to this topic helps people re-evaluate what they originally thought as well as consider something new. These new ideas would help feed social awareness over time, but I feel social change might be a bit much. These archives can gain traction, but I believe it would take some time for people to wrap their minds around the situation before change can take root.

So, when looking at the big picture, digitization has really shaken up how we look for things. Sources are much easier to find and read through. A quote from Guidone that I thought was interesting was (paraphrased), “Keywords should not be your first choice, learn the primary source and read it first.” The quote goes back to my point that keyword searches can be good, but you lose the full force of the information the source has to offer. You can get the info you wanted, but it comes at the cost of not seeing the full document or source. There very much is a good and a bad side to the digitization of sources, and as Milligan suggests, we as digital historians need to take it upon ourselves to become well acquainted with the technology to preserve the original context and be able to glean fuller information from our sources.

Works Cited

  • Milligan, Ian. “Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010.” The Canadian Historical Review, University of Toronto Press, 27 Nov. 2013, muse.jhu.edu/article/527016.
  • Marr, Bernard. “What Is Data Democratization? A Super Simple Explanation And The Key Pros And Cons.” Forbes, Forbes Magazine, 12 Dec. 2018, www.forbes.com/sites/bernardmarr/2017/07/24/what-is-data-democratization-a-super-simple-explanation-and-the-key-pros-and-cons/#7a6bd8126013.

Digitalization: OCR & Text Searching

I remember when the internet became accessible to households. It changed people’s lives. Throughout the years, search engines became the main tool to access web pages. You would enter keywords, and it would filter the search for you within thousands of references and URLs and give you the best matching results possible.

Could we apply this concept to Digital History? If you choose any book or newspaper collection published before 1920, for example, it is most likely digitalized; you will be able to read millions of pages in databases 1. In other words, the only way not to get lost is to use a text searching process to filter and efficiently focus your research on the period of history you are working on.

Instead of searching in paper archives for hours to find one memo or one article, we can now obtain the same result thanks to text searching online, and more precisely thanks to Optical Character Recognition (OCR) technology. According to Milligan (2013), it is “a process that takes an image, recognizes shapes that are in the forms of letters, and writes the output in plain text 2.” In other words, if I search for the word “China”, it will be highlighted throughout the digitalized newspaper or book wherever this word is mentioned.
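To make the idea more concrete, here is a minimal sketch of what a keyword search over OCR output could look like. It is only an illustration, not how any particular database works: the function name `find_keyword` and the sample page text are invented, and the misread word "Ch1na" is an assumed example of an OCR error on a smudged page.

```python
import re

def find_keyword(ocr_text: str, keyword: str, context: int = 30) -> list[str]:
    """Return short snippets around each whole-word, case-insensitive match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(ocr_text):
        # Keep a little surrounding text so the hit is not read in isolation.
        start = max(0, match.start() - context)
        end = min(len(ocr_text), match.end() + context)
        before = ocr_text[start:match.start()]
        after = ocr_text[match.end():end]
        # Square brackets stand in for the highlighting a database would show.
        snippets.append(f"{before}[{match.group(0)}]{after}")
    return snippets

# Hypothetical OCR output for one newspaper page (invented for illustration).
# "Ch1na" imitates a character misread by the OCR step.
page_text = "Trade with China grew this year. Ch1na tea prices fell in Boston."

hits = find_keyword(page_text, "China")
for hit in hits:
    print(hit)
```

In this sketch only the correctly recognized "China" is found; the misread "Ch1na" is silently skipped, which is one way a keyword search can miss part of the page.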

When you read a newspaper in hand, you first see it in its entirety. You can take it in more globally, with a quick look at each of the different headlines. This lets the reader know what to expect from the newspaper.

Text searching changed the way we engage with information. OCR brings the focus onto small fragments of text within newspapers or articles. When reading digitalized sources, like a newspaper, one does not read with a global vision of it, but focuses on only one article at a time. On one hand, it is great to target words and focus the research; on the other hand, there is a risk of missing important clues. Those clues might only be seen by understanding the source in its entirety. Articles can be correlated. Information may need to be cross-referenced in order to understand the “atmosphere” at the time of the artifact.

The use of digital media has democratized access to knowledge and to our history. The speed of research has greatly increased, which has helped to raise new research questions.

There are also downsides when digitalizing a source. Indeed, the touch and feel are lost. According to Tony Guidone (2019), the Nazi coins produced in 1940 are heavier than the ones from 1943 because the German army needed to use as much metal as possible for weapons. In other words, they would use less metal for the coins in 1943. This fact cannot be deduced from a digital source but only by touching the coins. The original context of a source can be lost as well when digitalized. The original context could bring more meaning to the artifact.

Digital projects may create significant social change in the future. People can quickly and freely access knowledge on the web, and they can share it. Digital projects are great vectors of social interaction 3, such as sharing information or commemorating an event after major disasters like Hurricane Sandy or 9/11 4.

Digitalizing sources and using efficient tools like OCR for text searching bring many strengths to Digital History. Now we can wonder: do you think digitalized sources will completely replace the non-digitalized ones? Personally, I think they are complementary.

Digital vs. Physical

Tony Guidone’s 1 presentation was really informative.  It was great to see the artifacts that he brought in, especially the newspaper from Salem. One of the things that he talked about was viewing primary sources with more context. Specifically what stood out to me was when he showed the individual ads in the back of old newspapers and then mentioned how they can be viewed as part of the entire paper. When you look at the whole source, like a newspaper, you can see what else was going on in the area at that time. For example, The Virginia Gazette has historic newspapers all digitized. If you just look at the ads you see what people are selling, or looking to buy, maybe a horse has been stolen, etc. But when you also read the stories covered in the other pages there may be information about how many Virginians are selling out and heading back to England. It gives context that the advertisements alone don’t have. 

One of the other things that Tony mentioned that I thought was great was keyword searches. Just using keyword searches may bring back good information, but it also could bring results that have nothing to do with your research. He mentioned that words that we commonly use to describe things today are not what they may have been called historically. I know I have encountered this in my current internship. In the late 19th century, Fairfax County Circuit Court called all of the things the County Clerk dealt with “judgements” because it all was approved by a judge. However if you go to the records room today they are filed under “term papers”. The archivists there know the difference in language and can point people to the right places. 
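The point about historical wording can be sketched as a small search helper that expands a query with the other names the records go by. This is only an illustration under my own assumptions: the `TERM_VARIANTS` mapping, the function names, and the box labels are all hypothetical and are not taken from the actual records room.

```python
# Hypothetical mapping from the term a researcher types to the other names
# the same records may be filed under (based on the example above).
TERM_VARIANTS = {
    "judgements": ["judgements", "term papers"],
}

def expand_query(term: str) -> list[str]:
    """Return the search term plus any known historical or filing variants."""
    return TERM_VARIANTS.get(term.lower(), [term])

def search(records: list[str], term: str) -> list[str]:
    """Return records that mention the term or any of its variants."""
    variants = [v.lower() for v in expand_query(term)]
    return [r for r in records if any(v in r.lower() for v in variants)]

# Invented record labels, for illustration only.
records = [
    "Box 12: Term Papers, 1885",
    "Box 13: Deed Books, 1885",
]

matches = search(records, "judgements")
print(matches)
```

A plain search for "judgements" would find nothing in these labels; the expanded search finds the box filed under "Term Papers". In practice that mapping is the kind of knowledge the archivists carry.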

This brings me to my next topic from Tony’s presentation. He showed the class the slave index file cards that the Historic Records Center maintains, and how they have been digitized as part of the Mason’s Legacies project. It really made sense when coupled with Posner’s presentation on data. Those cards in their physical sense are archives, but they are also data since they have been digitized into a searchable database. It helped to tie the ideas together into something easier to understand. Her example of a photo album 2 as data also really made sense to me. The ability to physically interact with history is something that is lost when working with digital media. Seeing a jpg of an old newspaper isn’t the same as holding it in your hands. The text and information are there, but the smell of old paper is missing. The physical records of something have sentimental value in a way that a database doesn’t. Just the act of calling it data somehow strips away some of the sentimentality 3.

Overall, between Tony’s presentation and Posner’s blog post I have a much better understanding of how digital history projects are relevant and very useful to historians today.

Tony Guidone’s Presentation

Tony Guidone brought up an interesting debate I was not previously aware of: digital research versus analog research. I had always figured the digital means of research had usurped the analog means, and that there was no real contention in the matter. Guidone demonstrated that there are two sides to any coin– namely, German coins in the 1930s. He explained that although the web’s improved fluidity of resources enriches researching, there is a loss of physical context that analog researching affords. One must always view the digital data through the eyes of the author; Trevor Owens states as much in his article, describing data sets as a ‘human-made artifact… having the same characteristics as text. Data is created for an audience. Humanists can, and should interpret data as an authored work and the intentions of the author are worth consideration and exploration.’ Guidone simplified this sentiment into more tangible terms: you can’t feel the weight of a coin from a picture, and you can’t feel the material of a newspaper from its transcript. Anyone who creates anything has an agenda, and there is more room for exclusion through a digital medium. 1

Another aspect Guidone brought to my attention was that the web’s astounding ease of access might just serve as a detriment. As Ian Milligan elaborates, ‘in 1998, a year with 67 Canadian history dissertations in the ProQuest dissertation database, the Toronto Star appeared 74 times in that data set; by 2010, it appeared 753 times in a slightly larger data set of 69 dissertations.’ Milligan explains that the Toronto Star went digital. This is a more than tenfold rise, an increase of roughly 900%, but the Montreal Gazette and the Toronto Telegram, papers that did not go online, ‘remained relatively stagnant’ at a ‘16 per cent increase’ and a ‘72 per cent decrease’, respectively. The marvellous convenience of the web skews towards resources with the fortune of being digitised, thus biasing researchers to those resources. Milligan posed this as a problem for Canadian historiography and historiography in general, criticising digital databases’ ‘lack of methodological reflection about how these databases work’. 2
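As a quick check on those figures, the arithmetic can be worked through in a few lines. The numbers are the ones quoted from Milligan above; the helper `percent_change` and the variable names are my own.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new (positive means an increase)."""
    return (new - old) / old * 100

# Figures quoted from Milligan above.
star_1998, star_2010 = 74, 753   # Toronto Star appearances
diss_1998, diss_2010 = 67, 69    # dissertations in the data set

increase = percent_change(star_1998, star_2010)
per_diss_1998 = star_1998 / diss_1998
per_diss_2010 = star_2010 / diss_2010

print(f"Increase in appearances: {increase:.1f}%")
print(f"Appearances per dissertation, 1998: {per_diss_1998:.2f}")
print(f"Appearances per dissertation, 2010: {per_diss_2010:.2f}")
```

The appearances rise by about 918%, and because the number of dissertations barely changes, the average goes from about 1.1 to about 10.9 appearances per dissertation.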

Blog post 2

I found Tony Guidone’s presentation to be very interesting and educational for this class. Text searching definitely makes looking up or researching certain topics much easier. For example, when I type in a keyword to find something, I could receive thousands of results to help me narrow down what I am really trying to find. However, as he discussed in his presentation, there are several methodological problems that come with digitization. First of all, it creates unnecessary hierarchies. He asked the question “does the Toronto Star deserve to be cited 900% more?” and the answer he was leaning towards was no. Ian Milligan’s article “Illusionary Order” also mentioned some things about keyword searching. He said keyword searching brings something new and transformative. He also brought up the fact that a keyword search brings you a plethora of results, sorted by date, newspaper page number, the section it appears in, and many more. 1 Gender was also an issue in digitization. The reason for this is that in the 19th and early 20th centuries, newspaper editors were primarily male. Therefore, women did not get their say in any of the news that was published, because men dominated that industry.

Tony also brought up the fact that not everything is digitized. For example, a newspaper from the 1800s was not digitized. There are certain things you can gain and lose when you digitize a source. You can lose the original meanings conveyed by touch when you digitize something. To some people, it may lose its personal meaning or connection. The surrounding context is also lost. The reader loses the original reading experience which can make the text become less interesting to read. However, there are two things that are gained when you digitize a source. First, access is democratized. When something is democratized it means it is accessible to anyone; so anyone can go online and learn about the source without having to spend money or leave their house. It is a great way to spread information fast and conveniently. The second thing you gain is the spread of research. It enables others to come up with new research questions and lets them further the research. Overall, I learned a lot from Tony Guidone’s presentation and I am looking forward to gaining more knowledge in digitization.

Works cited: Milligan, Ian. “Illusionary Order” The Canadian Historical Review. https://muse-jhu-edu.mutex.gmu.edu/article/527016. Date Accessed 13th September 2019.

Blog Post #2

Text searches allow us to sift through thousands of documents in an instant, to find helpful sources when conducting research. While many of these documents, such as published articles, already exist in a digital format, most primary sources must undergo a process of digitization. Some primary sources have different typesetting styles than modern computers, and others are handwritten primary sources. In these cases they must be manually digitized and typed out by the database creators. Once entered into the database, keywords can be used to find specific details from the collection of primary source material. Text searching easily allows one to track topics and even individuals through the source record. One thing that I feel is lost in digitization of source material is the significance of the physical, visual aspects of the primary source, such as the style of the script used in handwritten ones, or the specific formatting used in the typesetting of historical documents. Although small details, I feel that these are still important for giving contextual clues as to the origins of the information, as well as cultural indicators in terms of stylistic trends. However, in exchange for this loss, we receive both convenience and accessibility, which is equally vital.

In our class on Wednesday, we touched upon the use of open source websites and databases in instances of national tragedy or civil unrest, such as in response to 9/11, or the Baltimore Uprising in response to the death of Freddie Gray while in police custody. These websites compile information that can be searched to help make sense of shocking and sudden events, in which people are separated from their loved ones and need a means of checking updates to determine their whereabouts. These websites are examples of digital projects, of one nature or another (website or database), effecting social change. With the Baltimore Uprising 2015 Archive Project, people involved in and/or interested in the cause and legacy of the protests are able to keep alive the knowledge of the injustice dealt upon Gray. They are able to contribute to the preservation of the information circulating about his unlawful death, as well as the response to it, without being censored or silenced. Digital media gives a voice to those who otherwise might be drowned out. Another example of this, although loosely tied to this posting, would be the use of Twitter and “live-tweeting” during the Ferguson protests in response to the death of Mike Brown in 2014, a year before the Baltimore Uprising. Protesters were able to give live updates about the situation in Ferguson, and took on the roles of journalists in their own right by recording footage and writing about the events as they transpired, much of which went ignored by popular news outlets. Digital media allowed them to spread news of their situation nationally, as well as worldwide, while it was still transpiring. I think that these examples demonstrate the power of global, online accessibility for the sake of social change. Digital projects, from formal organized websites like the Baltimore Uprising Archive, to informal collections such as the hashtags used to compile tweets from the Ferguson protests, have an incredible reach.
While the act of digitizing old sources might take away context or flavor, the creation of new digital projects permits the creation of sources that otherwise would have gone silent. In either case, the need for accessibility is ultimately paramount.

“Preserve The Baltimore Uprising 2015 Archive Project.” Preserve The Baltimore Uprising 2015 Archive Project, The Maryland Historical Society, baltimoreuprising2015.org/about.

Desmond-Harris, Jenée. “Twitter Forced the World to Pay Attention to Ferguson. It Won’t Last.” Vox, Vox Media, 14 Jan. 2015, www.vox.com/2015/1/14/7539649/ferguson-protests-twitter.

Guest Lecture

Tony Guidone started off with a slide called ‘Digitization and Keyword Research’. Following this slide, he covered how online databases have shaped our ability to find information at our fingertips. One example came from our reading in The Canadian Historical Review, which found that the newspaper Toronto Star was cited 753 times in 69 different dissertations. This accessibility creates a tainted vision when searching for primary sources. It becomes easier to find sources that contain multiple sources, which become umbrellas for different research questions. In The Canadian Historical Review, Ian Milligan said “why focus on the keywords? While undoubtedly some users are skimming occasionally, the main reason for the increased use of databases is keyword searches. This enables large scale media searching: representations of a specific word, activities of a group or evolving cultural conceptions of a term.” This reminded me of when Tony stated that when looking up a keyword, we must put ourselves in the position of “what would they call this?” as opposed to “what would I call this?” 1

Tony explained that there are overall pros and cons to digital sources. Cons would be the loss of original meanings conveyed by touch, or of the original experience and thrill of encountering history face to face. Pros include accessibility whenever and wherever, because of smartphones, tablets, and laptops. Accessibility also enables critical thinking in ways that could open up new research questions. Tony mentioned that both are important and that neither could thrive without the other. Tony also discussed the importance of using as many primary sources as possible, because those are the sources that hold much more original value. This lecture reminded me of Trevor Owens’s “Defining Data”, where he states that sources are texts, artifacts, and processable information. However, he defines data as a species of artifact, which could be understood as precious, valuable information. Owens mentions how data was created for an audience and should therefore engage the audience, all while pleasing the author who created it. 2

Overall this was a great guest lecture. Very informative, and it has had me brainstorming ever since.

Works Cited

Milligan, Ian. “Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010.” Project MUSE, muse.jhu.edu/article/527016.

Owens, Trevor. “Defining Data for Humanists: Text, Artifact, Information or Evidence?” Journal of Digital Humanities, journalofdigitalhumanities.org/1-1/defining-data-for-humanists-by-trevor-owens

Blog Post #2: Keyword Mapping: What is lost and what is gained

While the data and information that researchers gather may or may not have changed over time, the way that it is viewed has become drastically different. With the rise of digitization, a simple keyword search can save hundreds of hours of sifting through documents looking for the one single piece of information that you need. This new way to view information has made the researching process much easier overall. There was a time, even before index cards, when people simply had to look through years of data that could have seemed like meaningless jargon until they found the single piece of info needed for their research. Simply put, researching has become far more efficient than it used to be. That is not to say that it wasn’t good before; it has just grown to another level with keyword searching. This also greatly extends to the world of academics, especially now that we are in the final transition of digitization. There are still students (most likely late in college) who remember pulling encyclopedias from their home shelves in order to find out about a certain topic. However, some younger students don’t even know that the word encyclopedia didn’t originate from Wikipedia. With keyword searches, old school research is becoming simply too slow and inefficient compared to new methods.

With the rise of digitization also comes the rise of digital projects, these being online experiences meant to enrich the user and/or researcher in a topic. These most definitely can cause social change, as they can do something unique: combine primary sources and secondary sources to put the user into what they are trying to convey. Looking back at the project I was assigned, Harlem: 1915-1930 1, it was obvious how much time was dedicated to putting together a useful interface that allowed the user to explore that time in history. Also, by combining both types of sources, the project was able to put me into whatever moment in those years I wanted. This level of immersion can truly only come from digital projects.

However, that is not to say that something isn’t lost in these types of projects. While they can show you primary sources as well as secondary sources, there is nothing like doing research with the actual source in front of you. Digital projects give you a unique experience, but so do museums, archaeology, and other firsthand types of research. Anyone can do a simple word search, but it takes some real digging to go and find something that was never mapped out and digitized. That is the fundamental thing that is lost when using digital means: you can never truly find your own path, only one that has already been mapped.

Sources:

Digital Harlem “Digital Harlem: 1915-1930” (2017) Australian Government|Australian Research Council, Sydney University Retrieved: http://digitalharlem.org/

History.com Editors “Harlem Renaissance”

Ian Milligan “Illusionary Order: Online Databases, Optical Character Recognition, and Canadian History, 1997–2010” (2013) The Canadian Historical Review Retrieved: https://muse.jhu.edu/article/527016

Library of Congress “About Keyword Search” (2019) United States Government Retrieved: https://catalog.loc.gov/vwebv/ui/en_US/htdocs/help/searchKeyword.html

Tony Guidone “Digital History Presentation” (2019)