Analysis

ArchivesSpace-Archivematica-DSpace Workflow Integration: Methods and Tools for Characterizing and Identifying Records in the Digital Age

Provenance-Driven Data Curation Workflow Analysis

Manually designed workflows can be error-prone and inefficient. Workflow provenance contains fine-grained data-processing information that can be used to detect workflow design problems. In this paper, we propose a provenance-driven workflow analysis framework that exploits both prospective and retrospective provenance. We show how provenance information can help users gain a deeper understanding of a workflow and offer insights into how to improve its design.
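
As a rough illustration of the idea (a hypothetical sketch, not the framework the paper describes), prospective provenance captures what a workflow is designed to do, while retrospective provenance captures what actually ran; comparing the two can surface design problems such as actors that never execute or steps that silently drop most of their records.

```python
# Hypothetical sketch: compare prospective provenance (the planned actors)
# with retrospective provenance (the recorded execution trace) to flag
# possible workflow-design problems. The data structures are invented for
# illustration and are not the paper's implementation.

prospective = {"ReadCSV", "ValidateDates", "GeoCode", "WriteReport"}   # planned steps
retrospective = [                                                      # execution trace
    {"actor": "ReadCSV", "records_in": 1000, "records_out": 1000},
    {"actor": "ValidateDates", "records_in": 1000, "records_out": 940},
    {"actor": "WriteReport", "records_in": 940, "records_out": 940},
]

executed = {event["actor"] for event in retrospective}

# Actors declared in the design but never observed at run time
never_ran = prospective - executed
# Actors that silently drop more than half of their input records
lossy = [e["actor"] for e in retrospective
         if e["records_in"] and e["records_out"] / e["records_in"] < 0.5]

print("Actors that never executed:", never_ran or "none")
print("Actors dropping >50% of records:", lossy or "none")
```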

Towards Automated Design, Analysis and Optimization of Declarative Curation Workflows | Song | International Journal of Digital Curation

Data curation is increasingly important. Our previous work on a Kepler curation package has demonstrated the advantages of automating data curation pipelines with workflow systems. However, manually designed curation workflows can be error-prone and inefficient due to a lack of user understanding of the workflow system, misuse of actors, or human error. Correcting problematic workflows is often very time-consuming. A more proactive workflow system can help users avoid such pitfalls.

The Future of Computing: The ISE ("eyes") Have It | The Digital Shift

Recently, thanks to a colleague, I've been playing around with IPython. IPython is an interactive version of Python that many people are beginning to use to teach Python, to create and run simulations and visualizations, and, more generally, to have a richer environment within which to work while coding. This investigation led me to Xiki, which is similar in some ways and different in others. I would like to suggest that both of these, and likely more such tools, are creating an entirely new paradigm for computing.

Web Archive FITS Characterisation using ToMaR | Open Planets Foundation

From the very beginning of the SCAPE project, it was a requirement that the SCAPE Execution Platform be able to leverage the functionality of existing command-line applications. The solution is ToMaR, a Hadoop-based application which, among other things, allows command-line applications to be executed in a distributed way on a computer cluster. This blog post describes the combined use of a set of SCAPE tools for characterising and profiling web-archive data sets.
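
To make the setting concrete, the sketch below characterises a directory of harvested files by calling the FITS tool once per file from a plain Python loop; ToMaR's contribution, as the post describes, is to distribute exactly this kind of per-file command-line invocation across a Hadoop cluster. The install path, input directory, and fits.sh invocation are assumptions for illustration, not details taken from the post.

```python
# Hedged sketch: run FITS over a local directory, one subprocess call per
# file. ToMaR (per the post) runs this kind of command-line tool at scale
# on Hadoop; this is only the single-node equivalent.
import subprocess
from pathlib import Path

FITS = "/opt/fits/fits.sh"            # assumed install location
INPUT_DIR = Path("warc-extracted")    # assumed directory of harvested files
OUTPUT_DIR = Path("fits-output")
OUTPUT_DIR.mkdir(exist_ok=True)

for f in INPUT_DIR.iterdir():
    if f.is_file():
        out = OUTPUT_DIR / (f.name + ".fits.xml")
        # FITS writes an XML characterisation report for each input file
        subprocess.run([FITS, "-i", str(f), "-o", str(out)], check=True)
```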

A Risk Analysis of File Formats for Preservation Planning | Scape

This paper presents an approach to the automatic estimation of preservation risk for file formats. Its main contributions are a definition of risk factors, each with an associated severity level, and a method for computing them automatically. Our goal is to use a solid knowledge base, automatically extracted from linked open data repositories, as the basis of a risk-analysis system for digital preservation. The method is meant to facilitate decision making with regard to the preservation of digital content in libraries and archives.
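
As a rough illustration of the kind of computation involved (the factor names, severity values, and weights below are invented, not the paper's), a format's overall risk can be expressed as a weighted combination of per-factor severity levels:

```python
# Hypothetical sketch of a weighted file-format risk score. Factors,
# severities, and weights are illustrative only; the paper derives its
# risk factors from linked open data repositories.

def format_risk(severities, weights):
    """Weighted average of per-factor severity levels (0 = low risk, 1 = high)."""
    total = sum(weights.values())
    return sum(severities[f] * w for f, w in weights.items()) / total

weights = {"software_support": 0.4, "openness": 0.3, "adoption": 0.3}

widely_supported = {"software_support": 0.2, "openness": 0.1, "adoption": 0.1}
obsolete = {"software_support": 0.9, "openness": 0.8, "adoption": 0.7}

print(format_risk(widely_supported, weights))  # low score -> low preservation risk
print(format_risk(obsolete, weights))          # high score -> candidate for migration
```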

Digital Curation: D3.1—Evaluation of Cost Models and Needs & Gaps Analysis (MS12 Draft) | 4C

This draft report . . . provides an analysis of existing research related to the economics of digital curation and reports on an investigation of how well current cost and benefit models meet stakeholders' needs for calculating and comparing financial information. It aims to point out the gaps that need to be bridged between the capabilities of currently available models and tools and stakeholders' needs for financial information.