In 2018 the award-winning director Peter Jackson released a landmark film, “They Shall Not Grow Old”, made with original WW1 footage, imagery, and sound from the extensive archives of the Imperial War Museums (IWM).
Jackson took the black-and-white archives and painstakingly coloured them to match the hues of the Western Front, using modern production techniques to transform the era’s trademark hand-cranked, jittery footage into vividly engrossing film. Watching all these long-dead soldiers brought to life is an emotional experience: the addition of an audio track carefully synched with the support of forensic lip readers renders the footage deeply haunting – even if its release did stir all sorts of scholarly debates over the ontology of authenticity.
Jackson, of course, was the magus of the effort, but it would not have been possible without a huge trove of archival footage from the Imperial War Museums (the plural is intentional: there are five). The IWM’s archive involves everything from nitrate-based film stock over a century old – “quite nasty stuff, prone to burst into flames; we had to store it in blast-proof bunkers for several years” notes the IWM’s longstanding Chief Information Officer (CIO) Ian Crawford levelly in a conversation with The Stack – through to contemporary footage from conflict regions shot on digital media, or VHS tapes posted in by members of the public.
The storage and back-ups of all this media are the responsibility of CIO Ian Crawford and his team in IT. And Crawford, who has worked for the IWM for 14 years, is not about to run out of work: new scans in the Museums’ film collection generate some 10TB of new data every month, and a videotape scanning project is expected to create more than 900TB of data over the next four years: “It's a nonstop job really, because we're still acquiring material,” Crawford says. “We still get footage in from the National Archive -- declassified military material. Much of the [early] film has been digitised -- a lot of it was used for the Peter Jackson film; we did the digitisation and sent it over to him to work on. Right now we're concentrating on the Cold War.”
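Those two figures can be combined into a rough growth estimate. The sketch below uses only the numbers quoted above; the even spread of the videotape project across its four years and the two copies per master are assumptions made for illustration, not IWM planning figures.

```python
# Back-of-envelope storage growth, using the figures quoted in the article.
FILM_TB_PER_MONTH = 10      # new film scans per month
VIDEOTAPE_TB_TOTAL = 900    # videotape scanning project, treated as exactly 900TB
PROJECT_YEARS = 4           # duration of the videotape project
COPIES = 2                  # assumption: two copies are kept of every master


def projected_growth_tb(years: int = PROJECT_YEARS) -> float:
    """Total new data in TB over `years`, before any copies are made.

    Assumes the videotape project produces data at an even rate.
    """
    film = FILM_TB_PER_MONTH * 12 * years
    videotape = VIDEOTAPE_TB_TOTAL * min(years, PROJECT_YEARS) / PROJECT_YEARS
    return film + videotape


if __name__ == "__main__":
    total = projected_growth_tb()
    print(f"New data over {PROJECT_YEARS} years: {total:.0f} TB")   # 1380 TB
    print(f"With {COPIES} copies: {total * COPIES:.0f} TB")          # 2760 TB
```

On these assumptions the archive grows by roughly 1.4PB in four years, or close to 2.8PB once every master is written twice.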
Degradation, as well as explosion, is a perennial issue, alongside the more banal matters of accessibility, storage policies, and diversifying backups. As Crawford notes: “A lot of the video backups we've had to advance [the digitisation of], because it is in a worse state than the film. With a lot of magnetic-based mediums the shelf life isn't good; they start shipping particles; we’ve used an external company to help accelerate their digitisation.”
Across the museum’s estate, meanwhile, an infrastructure first sketched out by Crawford some 14 years ago continues to modernise and bear fruit. At its heart are two Spectra T950 tape libraries. These support the ongoing digitisation efforts, conducted by the IWM Collections Management Team, which are underpinned by two 4K film scanners.
As Crawford notes: “We have two platforms, both using Spectra equipment. The archive masters go to tape for long-term preservation -- it’s perfect for that: we use LTO and IBM’s TS1150 standard and we’ve just done a big migration from LTO5 to LTO7 with a few button presses in the tape library. Then, when we generate the archive master, we also generate what we call ‘proxies’. These are things like H.264 or H.265-type files, or MPEG, or ProRes that we would use for day-to-day activities: the ones that get put online, sold on to commercial suppliers to edit and put into documentaries. These go on to spinning disk.”
The CIO has to ensure the IWM has multiple copies of everything digitally put to tape. These are stored at two different sites on Duxford Air Field: an old WW1 and WW2 airfield that is also home to Europe's largest air museum, IWM Duxford. As Ian Crawford explains: “We squirt two copies across to them – we’ve got a fibre network that sends it simultaneously using Spectra BlackPearl [a storage system with multiple standard interfaces that lets users push data to disk, tape, and cloud storage, etc.] -- and use two different tape technologies as well; one on each library, in case one format turns out to have a bug in it.”
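The principle of writing every master twice and confirming both copies can be sketched in a few lines. This is a minimal illustration only: it does not use BlackPearl or any Spectra interface, two local directories stand in for the two libraries, the copies are written one after the other rather than simultaneously, and the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large masters never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_two_copies(source: Path, site_a: Path, site_b: Path) -> str:
    """Copy `source` to both sites and verify each copy against the original.

    The two directories are stand-ins for the two tape libraries.
    Returns the checksum of the original so it can be catalogued.
    """
    expected = sha256_of(source)
    for site in (site_a, site_b):
        site.mkdir(parents=True, exist_ok=True)
        target = site / source.name
        shutil.copy2(source, target)
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
    return expected
```

Recording the checksum alongside each master means either copy can later be checked independently, which matters when the two copies sit on different tape formats.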
But it’s not just video being backed up. And while Peter Jackson’s film may have been the first genuinely high-profile and pioneering bit of work based on IWM archival footage, other opportunities rear their heads. As IWM CIO Ian Crawford notes: “We’ve got more than 10 million photographs in the collection. The more of those we digitise, the more possibilities open up to use AI in the future to help with the cataloguing; we've already experimented with some machine learning trained to recognise different aircraft.”
Tape as a storage medium has long (for decades, even) been written off as obsolete, but innovations keep happening. Did Crawford assess cloud-based deep storage alternatives before settling for the twin tape libraries approach? Ultimately the need for speed means on-site remains the best option, he tells The Stack.
“We keep it on site for two reasons”, he says. “One is speed. When we started, we were scanning at HD. Now we’ve gone to 2K, and are going to 4K. So a 30-minute film, once it’s gone through the scanner, could be 1TB in size. We’ve only got a 1Gb connection into the internet and it would get hammered.
“We’re putting in 40Gb connections at Duxford to deal with the speed, but keeping it on-site makes most sense. We’re looking at IFFF formats which are more efficient than the non-compressed DPX format we use; at some stage we might want to transcode the DPX into another standard and save some space. But we’ve had tape libraries now for 10 years. They’ve been upgraded, new capabilities added and they’ve really returned their total cost of investment. The British Film Institute – who are hugely bigger than us – looked at our infrastructure earlier on, spoke to some experts and put in the exact same thing – two T950s, this BlackPearl S3 gateway to move the files around -- which verified what we were doing in a lot of ways.”
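The bandwidth argument is easy to check with some quick arithmetic. The sketch below assumes decimal units (1TB = 10^12 bytes), reads “1Gb” and “40Gb” as gigabits per second, and by default assumes the link is fully available to the transfer, which real links rarely are; the `efficiency` parameter is an illustrative knob, not a measured figure.

```python
def transfer_hours(size_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to move `size_tb` terabytes over a `link_gbps` link.

    `efficiency` is the fraction of the nominal bandwidth actually usable.
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    # One 30-minute film scanned at 4K, roughly 1TB per the article.
    print(f"1TB over 1Gb/s:   {transfer_hours(1, 1):.1f} hours")          # about 2.2 hours
    print(f"1TB over 40Gb/s:  {transfer_hours(1, 40) * 60:.1f} minutes")  # about 3.3 minutes
    # A month of new film scans, 10TB per the article.
    print(f"10TB over 1Gb/s:  {transfer_hours(10, 1):.1f} hours")         # about 22.2 hours
```

Even in the best case, a single scanned film would occupy the museum's whole internet connection for more than two hours.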
The IWM has, over the years, also accumulated a lot of asset management data that winds up sitting on production servers: unlikely to be used soon, but needed just-in-case. This includes designs and audio-visual material generated for earlier exhibitions.
It’s ultimately “stale” data that’s unlikely to be revisited in the near-future, but needs to be kept. As Crawford notes: “Once the exhibition is done they throw it over the fence to me and say ‘we need to store this and never, ever get rid of it’, because it might be referred to in 12 years’ time for some reason. To handle that we use StorCycle, which is ideal because it lets us store it on tape, but be accessible to our service desk, or for power users direct access from a server.”
That system lets users like Crawford and his team set automated migration functions whereby directories or file shares can be scanned and migrated to a storage medium of choice (e.g. SATA disk, cloud, or tape) under clear user-defined policies, including the option to migrate inactive data based on age, size and file type.
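The general idea of policy-based selection can be sketched as follows. This is not StorCycle's API or configuration format: `MigrationPolicy` and `find_candidates` are hypothetical names, and a file's last-modified time is used as a simple proxy for inactivity.

```python
import time
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional


@dataclass(frozen=True)
class MigrationPolicy:
    """Hypothetical policy: which files count as inactive enough to migrate."""
    min_age_days: float = 365.0           # untouched for at least this long
    min_size_bytes: int = 0               # ignore files smaller than this
    suffixes: Optional[frozenset] = None  # e.g. frozenset({".mov"}); None = any type


def find_candidates(root: Path, policy: MigrationPolicy,
                    now: Optional[float] = None) -> Iterator[Path]:
    """Walk `root` and yield files matching the age, size and type rules."""
    now = time.time() if now is None else now
    cutoff = now - policy.min_age_days * 86400
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        info = path.stat()
        if info.st_mtime > cutoff:
            continue  # modified too recently to count as inactive
        if info.st_size < policy.min_size_bytes:
            continue  # too small to be worth moving
        if policy.suffixes is not None and path.suffix.lower() not in policy.suffixes:
            continue  # not one of the targeted file types
        yield path
```

A real tool would then move each candidate to the chosen tier and leave a pointer behind so users can still find it; the sketch stops at selection.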
As the CIO puts it: “We have something like 90TB of media files on our SAN at the moment and on some NAS boxes we’ve got that we know won’t be touched for a long time, but which need to be accessible. Then there’s the normal business stuff that people are very loath to get rid of! That’s been on spinning disc and is on StorCycle now. We’re hoping to move 60%-70% of the data that’s currently on full-flash SAN and NAS on to StorCycle.”
With the volumes and eclectic diversity of data coming in to IT, it's a never-ending job. And while the behind-the-scenes storage work being done by IT at IWM might not be glamorous, and occasionally has its moments of absurdity, it’s also a responsibility like no other. As fresh treasure troves of archival footage and photographs continue to land from various sources, the efforts of Crawford and his IT team to digitise and store them safely may bear who knows what future creative fruit. Rendered into vivid colour, that footage has already made more than one grown man a touch emotional, watching in astonishment, sorrow, and delight as the Somme and the past are brought so vividly and painfully to life. Happy World Backup Day.