When the Library of Congress first issued the Recommended Formats Statement, one aim was to provide our staff with guidance on the technical characteristics of formats, which they could consult in the process of recommending and acquiring content. But we were also aware that the preservation of, and long-term access to, digital content is an interest shared by a wide variety of stakeholders and not simply a parochial concern of the Library. Nor did we have any mistaken impression that we would get all the right answers on our own or that the characteristics would not change over time.
The British Library’s Digital Preservation Team is sometimes asked to help resolve the preservation planning challenges of Library colleagues and other organisations. This post describes a recent request for assistance, the steps taken to learn more, the conclusions reached, and where this leads us next.
The TIFF format and lossless compression
About DPF Manager
DPF Manager is an application and a framework designed to allow end users and developers to gain full control over the technical properties and structure of TIFF images intended for Long Term Preservation.
The main objective is to give memory institutions full control of the process of the conformity checks of files. This is a three-step process:
- Validation: validating conformance to a specific standard. Such standards can be defined by a standardization organization or by specific acceptance criteria based on locally defined policy rules.
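To make the validation step concrete, here is a minimal sketch of what a conformance check plus a local policy rule might look like. It is not DPF Manager's code or API: the function names `check_tiff_header` and `meets_local_policy`, and the example policy of accepting only little-endian files, are assumptions made purely for illustration. The only format facts relied on are that a classic TIFF file starts with the byte-order mark "II" or "MM", followed by the number 42 and the offset of the first image file directory.

```python
import struct


def check_tiff_header(data: bytes) -> dict:
    """Read the 8-byte classic TIFF header and report what it declares.

    Hypothetical helper for illustration; raises ValueError when the
    header does not look like classic TIFF.
    """
    if len(data) < 8:
        raise ValueError("too short to contain a TIFF header")

    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("missing TIFF byte-order mark")

    # Bytes 2-3 hold the magic number, bytes 4-7 the first IFD offset.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("magic number is not 42; not a classic TIFF")

    return {
        "byte_order": "little" if endian == "<" else "big",
        "first_ifd_offset": first_ifd,
    }


def meets_local_policy(info: dict, allowed_byte_orders=("little",)) -> bool:
    """Apply an example locally defined rule (an assumption, not a standard)."""
    return info["byte_order"] in allowed_byte_orders
```

A file can therefore pass the format check yet fail the local policy, which is the distinction the validation step draws between standards-based and institution-specific criteria.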
Bill McCoy’s article, “Takeaways on the Future of Documents: Report from the 2015 PDF Technical Conference,” offers some interesting thoughts on the future of PDF. I can’t find much to disagree with. PDF is in practice a format for reproducing a specific document appearance, and that’s becoming less important as the variety of computing devices increases. He makes a point I hadn’t thought of, that the “de facto interoperable PDF format” is well behind the latest specifications, which may explain why I haven’t seen complaints that JHOVE doesn’t know about ISO 32000 PDF!
The veraPDF consortium is pleased to announce the latest release of the veraPDF PDF/A validation software and test-suite currently under development.
Highlights for this release are:
- validation of all conformance criteria for ISO 19005-1 (PDF/A-1), conformance level b;
- a complete PDF/A-1b test corpus, including 200 new test-files;
- PDF features reporting; and
- a cross-platform installer.
Prototype features include:
- PDF metadata fixing;
- validation model and rules for PDF/A-1a, PDF/A-2 & PDF/A-3;
With digital preservation, and in particular the preservation of digital assets created by digitisation, very much a hot topic in the archives and libraries communities recently, we are being asked more and more frequently by clients which is the “best” image format to use.
Of course the answer is almost always “It depends on your project’s goals.”
The digital preservation community is a connected and collaborative one. I first heard about the Europe-based PREFORMA project last summer at a Federal Agencies Digitization Guidelines Initiative meeting when we were discussing the Digital File Formats for Videotape Reformatting comparison matrix. My interest was piqued because I heard about their incorporation of FFV1 and Matroska, both included in our matrix but not yet well adopted within the federal community.
EPUB is the favorite format for e-books (ignoring Amazon, which likes to be incompatible so it can lock users in). EpubCheck is the open-source industry standard for validating EPUB files. If you’re an author creating your own e-book files, you should run them through EpubCheck before releasing them. It’ll make hosting sites happier, since they’ll probably run it themselves and will like your book better if it passes. A book that passes EpubCheck will also give you fewer headaches with readers complaining it doesn’t work on their reader.
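As a rough illustration of the kind of rule such a validator enforces, the sketch below checks one container requirement: the first entry in the EPUB's ZIP archive should be an uncompressed file named "mimetype" containing "application/epub+zip". This is not EpubCheck and covers only a tiny fraction of what it tests; the function name `has_valid_mimetype` is hypothetical.

```python
import zipfile


def has_valid_mimetype(path_or_file) -> bool:
    """Check the EPUB container's leading 'mimetype' entry.

    Hypothetical pre-flight helper, not a substitute for EpubCheck.
    Accepts a filesystem path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(path_or_file) as zf:
            entries = zf.infolist()
            if not entries:
                return False
            first = entries[0]
            return (
                first.filename == "mimetype"
                and first.compress_type == zipfile.ZIP_STORED
                and zf.read(first) == b"application/epub+zip"
            )
    except zipfile.BadZipFile:
        # Not a ZIP archive at all, so it cannot be a valid EPUB.
        return False
```

A quick check like this can catch a badly packaged file early, but the full EpubCheck run is still what hosting sites are likely to rely on.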
A proposal to use PDF/A as a Submission Information Package (SIP) under the Open Archival Information System (OAIS) model has generated a small stir on Twitter.
The aim of a SIP is to deliver a collection of documents in a form suitable for ingesting into an archive. It needs to have enough metadata to create a proper Archive Information Package (AIP). The model doesn’t specify what SIP format(s) an archive should accept. XML files following well-known archival schemas such as METS for the overall package and PREMIS for preservation information are popular.
A report from the 69th meeting of the JPEG Committee, held in Warsaw in June, mentions several recent initiatives. The descriptions have a rather high buzzword-to-content ratio, but here’s my best interpretation of what I think they mean. What’s usually called “JPEG” is one of several file formats supported by the Joint Photographic Experts Group, and JFIF would be a more precise name. Not every format name that starts with JPEG refers to “JPEG” files, but if I refer to JPEG without further qualification here, it means the familiar format.