A Tika to ride; characterising web content with Nanite
This post covers two related topics: characterising web content with Nanite, and my methods for successfully integrating the Tika parsers with Nanite. Introducing Nanite: Nanite is a Java...
CSV Validator - beta releases
For quite some time at The National Archives (UK) we've been working on a tool for validating CSV files against user-defined schema. We're now at the point of making beta releases of the tool...
Digital Preservation Awards 2014 - nominations now open!
We are pleased to announce we are partnering with the Digital Preservation Coalition (DPC) to sponsor the Award for Research and Innovation in the Digital Preservation Awards 2014. This award is one of...
The final SCAPE developers workshop in The Hague
The third and final SCAPE developers workshop was held at the Royal Dutch Library in The Hague on 23-25 April. This workshop was the final opportunity to work together face to face in a large group,...
Preserving PDF: identify, validate, repair
Save the date! 1-2 September, Hamburg. Our next event focuses on the PDF file format and associated tools. The agenda is being defined by our members who have identified the following themes for the...
A Weekend With Nanite
Well over a year ago I wrote the “A Year of FITS” (http://www.openplanetsfoundation.org/blogs/2013-01-09-year-fits) blog post describing how we, during the course of 15 months, characterised 400 million...
An Analysis Engine for the DROID CSV Export
I have been working on some code to ensure the accurate and consistent output of any file format analysis based on the DROID CSV export. One way of looking at it is an executive summary of a DROID...
Webinar: Let Scout Be Your Preservation Guide
Overview: An important part of digital preservation is analysing content to uncover the risks that hinder its preservation. This analysis entails answering diverse questions, for example:...
Bulk disk imaging and disk-format identification with KryoFlux
The problem: We have a large volume of content on floppy disks that we know are degrading but which we don't know the value of. Considerations: We don't want to waste time/resources on low-value content. We...
CSV Validator version 1.0 release
Following on from my previous brief post announcing the beta release of the CSV Validator, http://www.openplanetsfoundation.org/blogs/2014-03-21-csv-validator-beta-releases, today we've made the formal...
When (not) to migrate a PDF to PDF/A
It is well-known that PDF documents can contain features that are preservation risks (e.g. see here and here). Migration of existing PDFs to PDF/A is sometimes advocated as a strategy for mitigating...
Weirder than old: The CP/M File System and Legacy Disk Extracts for New...
We’ve been doing legacy disk extracts at Archives New Zealand for a number of years, with much of the effort that enables this work being done by colleague Mick Crouch, and former Archives New...
Six ways to decode a lossy JP2
Some time ago Will Palmer, Peter May and Peter Cliff of the British Library published a really interesting paper that investigated three different JPEG 2000 codecs, and their effects on image...
Webinar: Long-term preservation with dArceo
Overview: Multiple cultural heritage institutions in Poland are involved in digitization activities, building together a network of over 100 Polish digital libraries, and making over 2 million of...
Running archived Android apps on a PC: first impressions
Earlier this week I had a discussion with some colleagues about the archiving of mobile phone and tablet apps (iPhone/Android), and, equally important, ways to provide long-term access. The immediate...