Web Archiving

How the National Library is working to preserve our digital present

Changes to copyright law will allow the National Library of Australia to significantly increase its collection of electronic publications, including ebooks and websites

Recent changes to Australia's Copyright Act will allow the National Library of Australia to significantly broaden its efforts to preserve the country's digital cultural heritage.

The Civil Law and Justice Legislation Amendment Bill 2014, passed in June, was an omnibus bill that included among its provisions a number of amendments to the Copyright Act 1968.

Using the Wayback Machine to mine websites in the social sciences: A methodological resource

Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites.
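The excerpt does not show the paper's own procedure, so the following is only a minimal sketch of one way an analyst might locate an archived copy of a page. It assumes the Internet Archive's public availability endpoint and its JSON response layout as commonly documented; the function names and the sample response are hypothetical, and no network request is made here.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Internet Archive availability API; verify against
# the current documentation before relying on it.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp):
    """Build the query URL asking for the snapshot closest to `timestamp`.

    `timestamp` is a string of digits such as "20060101" (YYYYMMDD).
    """
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": url, "timestamp": timestamp})


def closest_snapshot(response_text):
    """Return (snapshot_url, snapshot_timestamp) from a JSON response, or None."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


# Illustrative response only, written by hand to match the assumed layout.
sample_response = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20060101000000/http://example.com/",
            "timestamp": "20060101000000",
            "status": "200",
        }
    }
})

print(availability_query("example.com", "20060101"))
print(closest_snapshot(sample_response))
```

Keeping the URL construction and the response parsing as separate pure functions means the parsing logic can be checked against saved responses before any live requests are sent.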

The Future of the Web Is 100 Years Old | Nautilus

The Earth may not be flat, but the web certainly is.

“There is no ‘top’ to the World-Wide Web,” declared a 1992 foundational document from the World Wide Web Consortium, meaning that there is no central server or organizational authority to determine what does or does not get published. It is, like Borges’ famous Library of Babel, theoretically infinite, stitched together with hyperlinks rather than top-down, Dewey Decimal-style categories. It is also famously open, built atop a set of publicly available industry standards.

Data on the Web Best Practices | W3C

This document provides best practices related to the publication and usage of data on the Web designed to help support a self-sustaining ecosystem. Data should be discoverable and understandable by humans and machines. Where data is used in some way, whether by the originator of the data or by an external party, such usage should also be discoverable and the efforts of the data publisher recognized. In short, following these best practices will facilitate interaction between publishers and consumers.

Re-inventing the scholarly record: taking inspiration from Renaissance Florence | hangingtogether.org

On February 11th, we presented the Evolving Scholarly Record (ESR) Framework at the EMEA Regional Council annual meeting in Florence. The topic was spot on, as the plenary talks preceding the ESR break-out session had paved the way for a more in-depth discussion of how libraries can re-invent their future stewardship roles in the digital domain.

The downside of web archive deduplication | Kris's Blog

I've talked a lot about deduplication (both here and elsewhere). It's a huge boon for web archiving efforts lacking infinite storage. Its value increases as you crawl more frequently, and crawling frequently has been a key aim for my institution. The web is just too fickle.

Still, we mustn't forget about the downsides of deduplication. It's not all rainbows and unicorns, after all.
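To make the trade-off concrete, here is a minimal sketch of digest-based deduplication during repeated crawls. It is not the blog author's implementation or any particular crawler's: the function names, the record dictionaries, and the choice of SHA-1 keyed by URL are assumptions made for illustration.

```python
import hashlib


def payload_digest(payload):
    """Return a content digest for the fetched bytes (SHA-1 is an assumption)."""
    return "sha1:" + hashlib.sha1(payload).hexdigest()


def crawl_with_dedup(captures, seen=None):
    """Turn (url, timestamp, payload) captures into stored records.

    The first time a (url, digest) pair is seen, the full payload is kept.
    Later identical captures become small "revisit" records that only point
    back to the timestamp of the original, which is where the storage
    saving comes from.
    """
    seen = {} if seen is None else seen
    records = []
    for url, timestamp, payload in captures:
        digest = payload_digest(payload)
        key = (url, digest)
        if key in seen:
            records.append({
                "type": "revisit",
                "url": url,
                "timestamp": timestamp,
                "digest": digest,
                "refers_to": seen[key],  # timestamp of the stored original
            })
        else:
            seen[key] = timestamp
            records.append({
                "type": "response",
                "url": url,
                "timestamp": timestamp,
                "digest": digest,
                "payload": payload,
            })
    return records


# Hypothetical captures of one page across three crawls.
captures = [
    ("http://example.org/", "20150101", b"hello"),
    ("http://example.org/", "20150102", b"hello"),
    ("http://example.org/", "20150103", b"changed"),
]
records = crawl_with_dedup(captures)
print([r["type"] for r in records])
```

In this sketch a revisit record carries no payload of its own, so it can only be replayed while the record it refers to is still available.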

Data Armageddon: preserving data can be tricky business

There's a crisis looming in scientific data, with a leading scientific journal estimating 80 per cent of current scientific research will be lost in 20 years.

The study, published in Nature, included all data – even paper stored in garages or mouldy basements – but digital-only information may be under even greater threat. Digital conservation and what's called "digital archaeology" (picking up the pieces of data loss) are going to be increasingly important strategies.