ArchivesSpace-Archivematica-DSpace Workflow Integration: Methods and Tools for Characterizing and Identifying Records in the Digital Age


Consider this a follow-up to The Work of Appraisal in the Age of Digital Reproduction, one of our most popular blog posts, by Mike. According to an extensive review of the emerging professional literature, e-discovery search and retrieval is, in short, a challenge in this era of rapid inflation in the amount of electronically stored information. [1] "Traditional" keyword searching, even its "semantic" variety, has real disadvantages in this environment. Without advanced techniques, keyword searching can be ineffective and often retrieves records unrelated to your topic.
So what's an archivist and/or researcher to do?
According to some, assigning and then browsing by subject headings is the solution to this challenge. [2] Before we all jump on that bandwagon, though, ask yourself: how do you even get to meaningful subject headings in the first place when you regularly receive born-digital accessions that run from many GBs to TBs in size and contain tens, if not hundreds, of thousands of individual documents? With that question in mind, let's walk through a couple of exercises on methods and tools that can be used for characterizing and identifying records in the digital age:
- Analysis of Drives
- Digital Forensics
- Digital Humanities Methods and Tools
- Text Analysis with Python
- Named Entity Recognition
- Topic Modeling
Analysis of Drives
This is a good place to start when you don't know anything at all about what it is you've got. In a treemap view of a drive, the size of each square indicates the size of a file or folder, while the colors indicate the file type.
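If you would rather have numbers than a picture, the same first look at a drive can be roughed out in a few lines of Python. What follows is only a minimal sketch, not the tool pictured in this post: the function names `summarize_by_extension` and `largest_types` are made up for illustration, and it assumes the file extension is a good-enough stand-in for file type, which is not always true for born-digital accessions.

```python
import os
from collections import defaultdict


def summarize_by_extension(root):
    """Walk `root` and return {extension: (file_count, total_bytes)}.

    Hypothetical helper: the extension is used as a rough proxy for
    file type, and files without one are grouped under "(none)".
    """
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip unreadable files and broken links instead of stopping.
                continue
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}


def largest_types(summary, limit=10):
    """Return up to `limit` (extension, count, bytes) rows, biggest first."""
    rows = [(ext, count, size) for ext, (count, size) in summary.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]
```

Calling `largest_types(summarize_by_extension(path))` on the top-level folder of an accession gives a quick ranking of which kinds of files take up the most space, which is roughly what the colored squares tell you at a glance.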