Piggybacking off of my last post, Purple Ink, and last week’s CDH discussion led by the wonderful Jacqueline Wernimont, today I want to reflect on the concept of “datafied” archives, to borrow Wernimont’s phrase, and authenticity.
Unlike projects that create brand-new archives (take almost any Tumblr, for example), datafied archives are the digital versions of brick-and-mortar institutions with two major changes: the archival material has been scanned (digitized) and any text has been rendered searchable (datafied). A particularly comprehensive example of this is the Library of Congress’s Chronicling America, where you can quickly search for articles within American newspapers from 1836-1922. Notice how the digital reflects the physical: you still read the article within the context of the whole newspaper page, the fonts and images remain as they do in the original newspaper, and you can browse the collection by year or publication as you would in the Library of Congress itself. Nevertheless, the archive is made up of individual pieces of abstractable data–phrases, words, and pixels.
Medieval European archives were originally meant to serve the needs of the king exclusively, to ensure that he and his advisers had control of the charters and agreements within his realm. Authenticity was based on literal proximity to the physical body of the king–because he held the materials close, they were authentic.
As early modernity gave way to the European Enlightenment, archives became less about this closeness and more about organizing, filing, and structuring knowledge. They still enforced the king’s authority, but now simply having the documents was not enough–one had to be able to quickly access the stored knowledge. Across Europe, competing methods of archival practice sprang up; however, all of these systems began defining authority and authenticity through the relationship between an item and all of the other materials in the archive. File folders and library cataloging styles define the physical place an item has within the archive or library, creating authority by exercising control and authenticity by comparison.
Authenticity for a datafied archive is similarly built, only in this case it is constructed in the digital’s relationship to the physical. Each of these archives is constructed from digital reproductions of material originals–records and documents, visual and written–which reside together online as they do in reality. If, for example, the Library of Congress has an archive of newspapers, the digital versions carry the LOC’s authority and the newspapers’ authenticity with them.
However, this slippage between the digital/datafied and physical archives’ authenticity leads to a potentially huge fallacy: big data’s illusion of objectivity.
This illusion has two parts, internal and external. First, datafying archives leads to the assumption that all material inside the archive is available in digital form. I do not think this is intentional on the part of the archives, nor is it simply carelessness on the part of the users. Take, for example, the Getty’s recent launch of its digital library–a fantastic and fascinating project which I have already used several times. An uninformed user may mistake this site for the totality of the Getty’s collection, which is far from true. Not only does this increasing availability of datafied material replicate issues of access which affect physical archives; it also creates new lacunae within the collections themselves.
Second, big data in the form of a datafied archive abstracts the information from the human hands which construct it, side-stepping questions which all digital consumers should always ask: who selected this information for digitization and why? What was not included and why? Who did the work and why? Why? Why? Why? Nothing is objective in the physical archive, and, likewise, nothing should be considered so in the digital, even when it claims to be. The mission to digitize every book, for example, can clothe subjective choices in objective goals. The massive scale of big data scanning projects, most notably Google Books, masks real-world political and economic motives behind the datafied archive–take this example of the racial and class issues behind the scanning of Google’s books.
I cannot say that this shift in authenticity in the archive is necessarily bad, excepting the last example of Google Books, which is deeply troubling. Instead, I am saying that datafying archives and redefining authenticity is occurring all around us. In fact, we participate in this redefinition every time we use these materials. What we, as scholars and members of the online public, must do is be hyper-vigilant about our assumptions when using datafied or digitized archives, and educate ourselves on how we are constructing “the authentic” within this new media. Several institutions, like the Centre for Digital Library Research and the Emory Center for Digital Scholarship, are furthering such efforts by focusing on the importance of digital literacy and education.
Above all, I want us to be self-aware as we live our digital lives, think about how our actions are constructing and reconstructing authenticity, and question the silences which continue to exist in our data.