Archives for category: Long Term Archive

The Power of Spin

I was looking at my iPad recently and reflecting on the advances made since I started in I.T. Many people have written about the processing power in a mobile phone versus the processing power available to Apollo 11, which took three men to the Moon, but I am interested in the mechanics behind bulk storage. My tablet has 8GB in a few square millimetres of silicon. My mainframes used to have great rolling disk drives with platters bigger than your barbecue.

I remember an incident back in the 70s (that’s the 1970s) on a cold Sunday in Melbourne at the old Gas and Fuel Centre in Collins St. That was the pair of rectangular buildings you see above, which have long been demolished and replaced by Federation Square. We were updating a Burroughs B7700 system to a B7800 system and that morning we were installing the processors. Each CPU was a cabinet about two metres tall, three metres long and sixty centimetres deep, and weighed about a tonne. Today the equivalent would be an I.C. about a square centimetre in size.

They were too big to come up in the lift to the third floor, so they had to be lifted by crane and brought through a hole where some windows had been removed. Unfortunately, there are trams in Flinders St and, as you can see from the photo taken a few kilometres away in St. Kilda, there is not much space between the overhead wires. These wires are at 600 volts DC and carry enough power for a network of heavy trams, so you don’t want to touch them with your jib. We had to negotiate with the city council to have the Flinders St trams shut down for two hours! We then powered off the entire computer room. (Almost an entire floor of the building on the right.)

When the time came to power up the system, I started with the disk drives. Now I want you to forget the 500GB disk buried somewhere in your laptop or even the $200, 4TB external backup drive you may have picked up at Dick Smith or Fry’s. These were subsystems consisting of a controller and eight disk cabinets. Each cabinet housed four disk platters on a common spindle rotating in a vertical plane. Each platter was about 6–8mm thick, about a metre in diameter and quite heavy. At speed, each platter had hundreds of read/write heads pushed within micrometres of the surface by compressed air, and the cabinets were sealed and pressurised through the most amazing set of filters. Each track had its own head, hence the term “head per track” disk. The entire cabinet contained 5MB of storage. That’s not a misprint – that is FIVE MEGABYTES. That would hold maybe one photo from a high-def smartphone camera today, and it cost hundreds of thousands of dollars. Anyway, Gas and Fuel had four complete subsystems. (32 disk cabinets)

When you hit the ‘on’ button on a subsystem, the first disk drive would slowly begin to come up to speed. Given my marketing background and the title of this blog entry you probably expected a different type of spin, eh? After about two minutes, when the disk is nearly at maximum speed and the current draw has reduced, it would automatically start the second disk in the string, and so on. You can imagine that it takes a long time to get them all going. I decided to shortcut the process. I powered up each subsystem as fast as I could walk between the controllers, so that I had four coming up together. It would have made a good scene from Frankenstein’s lab as the deep throbbing noise and vibration began to shake everything. As the pitch started to rise there was a sudden thump and it all began to spin down. The lights flickered off and on, and then the noise of the disks died away, to be replaced by other, distant sounds. I had dropped power to the entire building, all eleven floors. Lucky it was Sunday!

Being a gas company, G & F had installed two gas turbines on the roof to supply backup power, and they sprang into action, although crawled might be a better description. They were spinning up almost as slowly as the disks but with much more drama. It sounded like a 737 landing on the roof. Being aware of the lag in starting gas turbines, they had also installed two diesel-powered generators made by International Harvester. (Remember them?) These diesels were kept warm by electric sump heaters and could literally spring into action, reaching full revs in well under a second. Not fast enough to avoid bedlam in the computer room, but boy was it exciting – doubly so for the young technician responsible for it all. It turned into an extremely long day but was quite a learning exercise with regard to electric power sources.

Today, I can turn on my iPad and have it alive in a flash, with more processing power and storage than the entire glass house we have been discussing. It runs all day on its internal battery with no moving parts, and the trams are safe.

Tell all this to the kids of today and they won’t believe you. (Monty Python)(needs Yorkshire accent)



Long Term Archive IX

What media will you use to store your everlasting data? We have discussed the difficulty of keeping the spinning brown stuff (disks) going due to mechanical issues. We have also mentioned the chemical decomposition and coercivity decay of magnetic tape over time. I have admitted my predilection for physical records that do not require fancy technology to recover, so I like rock paintings, engravings and ink on paper. It is difficult, however, to balance the convenience of storing a terabyte of data on a half-kilo, two-hundred-dollar disk drive with the mountain of paper required for the equivalent in written pages. (Approximately five hundred million pages)

Back in the nineteen nineties some guy tried to market a system which converted data to a 2-D barcode which could be printed on any printer. It could be photocopied or faxed and it could be input as data again by a simple scanner and some software. It stored 4MB on one A4 page. That’s about the same as two thousand typewritten pages. I really liked this but it never caught on. Even so, you would need two hundred and fifty thousand such encoded pages to match the aforementioned hard disk. (1TB)
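For the curious, the figures above all hang together if you assume a typewritten page holds roughly 2KB of plain text – an assumption, but it is the one implied by the numbers quoted. A quick sketch:

```python
# Back-of-the-envelope check of the page arithmetic, assuming
# roughly 2 KB of plain text per typewritten page.
page_bytes = 2 * 1024                  # ~2 KB per typewritten page
barcode_page_bytes = 4 * 1024 * 1024   # 4 MB per encoded A4 page
terabyte = 1024 ** 4

# One barcoded A4 page holds about two thousand typewritten pages.
print(barcode_page_bytes // page_bytes)   # 2048

# You'd need roughly 250,000 barcoded pages to match a 1TB disk.
print(terabyte // barcode_page_bytes)     # 262144

# And the disk itself holds about five hundred million plain pages.
print(terabyte // page_bytes)             # 536870912
```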

This brings me to the optical disk. CDs, DVDs and Blu-ray have tiny physical pits which are read by a laser, and although the players might be obsolete or just worn out in the future, the pits should still be there even if the Martians have to read them bit by bit through a microscope. Of course the encoding is digital, and it is not just zeroes and ones but a special code (EFM, eight-to-fourteen modulation) to eliminate long runs of the same bit. Someone could work it out.

You may have heard of CD rot, the discolouration of early CD plastic due to oxygen getting in. The gold discs were less vulnerable to this than those with a polished aluminium reflective layer, but it has all been solved with an epoxy edge layer. Anyway, it only applied to pre-pressed commercial CDs and DVDs such as music and movies. Your own data goes on CD/DVD-R or -RW, which use dyes (or, in the case of -RW, a phase-change alloy), and the manufacturers assure us they will last for 100 years.

Still, 100 years is not that much. Imagine if the oldest recorded history we had was news articles on the Wright brothers’ flight, or ‘Home Rule’ being accepted for Ireland, or the turning of the first sod for Canberra. I’m still waiting for the next breakthrough technology in long term storage.



Long Term Archive VIII - The Museum

About 10 years ago I was invited to tender for magnetic tape storage for a certain military museum in Canberra. They have bombers and submarines and tanks on display and even more large hardware in warehouses around the ACT but there are almost innumerable, small, irreplaceable artifacts as well. The majority of these items will never be displayed because of the lack of space, some are too fragile and some will decay to nothing. The museum had decided to create high quality photographs of everything to be stored and indexed electronically and archived. (Forever?) We have examined earlier the challenges of keeping electronic data for eons but let us look at some issues that came up much more quickly than you might expect.

Firstly, high definition photos in the ‘noughties’ meant about four megapixels per picture. Today my phone has an 8MP camera and professionals use much greater definition, so in just ten years the museum must be thinking, “Could we have done better?” or “Should we photograph them again?” In both cases the answer is complicated by the huge amount of time and effort that was expended. Still, it is interesting to think that most of the artifacts are 60 to 100 years old, and in the ten years since the photo program began they have only aged another 10% while the photos are almost obsolete.

What about digital photography itself? It has changed. Maybe the photos should have been shot in 3D, or perhaps using one of those new Lytro cameras. Perhaps the objects should have been recreated using 3D printing, or maybe we should wait for colour 3D printing.

The point is that when considering long term archiving you must deal with the unknown. Technology changes more quickly than the archived items will age. Will maintaining the archive data be more costly and time consuming than preserving the actual artifact? Do we conserve a photo of the Mona Lisa or concentrate on keeping the canvas? Food for thought.


Long Term Archive VII - What's the answer?

If you have been following my rambling rants you might think that I am about to suggest the best technology for your long term archiving needs. Well, not today. We might discuss some hardware and software options down the track, but the most important component is a piece of wetware – the ‘trusted adviser’. Only you know your desired outcome, but others may be better equipped to get you there. It might be your I.T. vendor, application provider, third party consultant or your wife. No matter how well equipped you or your team think you are, there is always room for advice, even if you ignore it.

I recall a story from my distant past when I worked on ledger machines. These electro-mechanical monsters (mostly mechanical) would use a magnetically striped card per account. The idea was that you pull Mr Jones’ card from the filing cabinet and feed it into the beast. It would read the account details, balances etc. and await input. You would then key in details of the current transaction, and the animal would print a line in the ledger, update totals and add details to the magnetic stripe on the ledger card. Then you replace the card, get the next one, and so on. You end up with a printed ledger and updated account cards.

We had a small business customer who decided that we were ripping him off for the cost of blank cards, so he went out to tender. The winning tender was a printer who duly delivered 1,000 nice, new cards, all neatly printed with lines and logos and smelling of lavender. (Just kidding about the smell) They didn’t work. After a week of inconvenience, wasted time, costly service calls, anger, frustration and torn-out hair, it was discovered that the printer had innocently printed the brown stripe down the side of the card, with no idea that it was supposed to be magnetic oxide.

Everybody had a good laugh at the business owner’s expense and we delivered a new box of cards. In the end, the customer was out of pocket, behind time and embarrassed. The printer got no repeat business and wasted a lot of time. We had delayed our sale and somehow came out looking like the bad guys for not making obvious the purpose of the stripe.

The real problem was that the business owner did not trust us. (Probably our fault) He did not turn to someone with appropriate knowledge for independent advice. Good business relationships are gold. I encourage you to work on developing a mutually beneficial relationship with a ‘trusted adviser’ wherever you can find him or her.


Long Term Archive VI - Multiply and Reproduce

When considering the long term storage of data, the storage industry likes to promote two processes:

1. Redundancy
2. Technology Refresh

Redundancy is about having multiple copies of the data in case of loss, and technology refresh is about having the data stored on technology that has not become obsolete. You can see why we like to promote this stuff. Both processes require you to buy more storage and to keep buying it!

In the last episode we talked about the difficulties associated with trying to keep archived data for hundreds or even thousands of years. Neither of these two approaches will be much help here as that solution is more dependent on choosing some kind of incorruptible and easily readable medium. My years of experience in the storage industry led me to understand that very few vendors care about very long term archiving. The business focus is on data which must be kept for financial or regulatory reasons and the time-frames are usually 5, 7, 15 or 25 years. Let’s limit today’s discussion to this kind of archiving.

Firstly, redundancy. The word has always bothered me because multiple copies of valuable data are not redundant; they are a valuable asset. Redundant means excessive or unnecessary. If I am made redundant at work it does not mean that I am a valuable asset in reserve. Semantics aside, you must determine how much extra equipment you need to maintain continuous access to your archived data. The simplest approach is a RAID disk system. That stands for Redundant Array of Inexpensive (sometimes Independent) Disks. There’s that word redundant being misused again. RAID will protect data against the possibility of a disk failure. To protect against multiple disk failures or subsystem failure you might consider mirroring between subsystems. To further protect against fire, flood, theft, terrorism or plane crashes you might want that mirrored system to be remote. Finally, to protect against data corruption or software failures you will want a backup copy, similarly remote. This might be tape or optical disk or flash memory or holographic cubes or whatever. Now you are a valuable asset to your storage vendor.
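To make the RAID idea concrete, here is a minimal sketch, purely illustrative – real RAID controllers do this in hardware, with striping and a great deal more. The principle is simply that a parity block, computed as the XOR of the data blocks, lets you rebuild any single lost block:

```python
# Minimal illustration of RAID-style parity: the parity block is the
# XOR of all data blocks, so any single lost block can be rebuilt by
# XOR-ing the surviving blocks with the parity.
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)

data = [b"AAAA", b"BBBB", b"CCCC"]  # three "data disks"
parity = xor_blocks(data)           # stored on a fourth "parity disk"

# Disk 1 fails: rebuild its contents from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])           # True
```

Lose two disks at once, of course, and a single parity block can no longer save you – hence the mirroring and remote copies mentioned above.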

Secondly, technology refresh. Moore’s Law applies to processors and suggests a doubling of transistor counts (and, loosely, performance) every two years, and it has continued to amaze by continuing to hold. A similar model for storage, especially tape, sees capacities and speeds doubling in a similar time-frame. Because of this the industry moves ahead, and the storage medium you are putting your tax files on today will be obsolete in a few years. The vendors suggest, correctly, that you must plan to migrate to a new storage medium (removable tape or disk) or new storage system (fixed disk) every second generation. Manufacturers will guarantee backward compatibility for one or two generations of product but seldom more. You will also gain in economies of scale, as the newer products will inevitably hold more, take up less space, use less power and maybe even cost less. The old technology will be getting rusty and slow anyway, and the maintenance providers will conveniently increase costs on older gear to help you along.
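The compounding effect of that doubling is easy to underestimate. A toy calculation (the two-year cadence and the 1TB starting point are assumptions for illustration, not any vendor’s actual roadmap):

```python
# If capacity doubles every generation (~2 years), then migrating
# every second generation means each move lands on media holding
# four times as much as the last.
def capacity_after(generations, start_tb=1):
    return start_tb * 2 ** generations

for gen in range(0, 7, 2):          # every second generation
    print(f"generation {gen}: {capacity_after(gen)} TB")
```

After three migrations (roughly twelve years) each cartridge or drive holds sixty-four times what you started with, which is why the economies of scale mentioned above are real.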

The cost and inconvenience to you, apart from buying the shiny, new hardware is that you must have a set of procedures and a workflow for ensuring the ongoing management and migration of your archive data. Consultants like myself can help. Just call.


The image is of bridge redundancy. Gateway Bridge, Brisbane, Australia

Long Term Archive V - Visualising the Past

Can We See It?

It is the year 4013. The Bangladeshi Empire rules the world. Being used to annual inundation from the monsoons, the Bangladeshi people gradually moved west during the global warming crisis of the late 21st century until they inhabited the Himalayan Archipelago and most of the world went submarine. The short ice age of the fourth millennium gradually sequestered much of the water, and as the glaciers slowly recede, Bangladeshi future archaeologists (FAs) are starting to dig up the past of the legendary western civilizations. Contrary to the cynics, it seems that helicopters, private cars and hover boards really did exist once. The real treasure is emerging from the archives. Vast libraries and data centres were hermetically sealed when the inevitability of “The Flood” was finally accepted.

Millions of books have been preserved but very few were printed after people turned to tablets. Many e-readers have been found and some had documents in local storage including movies. The FAs have been able to reproduce “Dumb and Dumber” and “Twilight” but most of these devices kept their data on something called the interweb.

CDs and DVDs are unreadable, even after using information from books to recreate the players, because the medium has yellowed, cracked and warped. However, microscopic scanning of the metallic layer on the substrate has succeeded in recovering some data, however painfully and slowly, and only where that layer was gold.

Huge libraries of magnetic tape have turned up. When new, this media had a guaranteed 15 year service life based on 15% loss of magnetic coercivity in that time. After 2,000 years there is almost no signal left, but this is the least of its problems. The tape sticks together and, when parted, the data-carrying oxide layer separates from the substrate. Some very old reels of tape have been found to store data at only 556/800/1600 bits per inch and, although nobody remembers what an inch was, the data can be made visible by the use of very fine metal particles suspended in alcohol. Again, visibility comes to the rescue. Cartridges of very dense data on hundreds of serpentine or helical tracks are almost always unrecoverable using this method. It has been possible for the drive technology to be painstakingly reproduced at very high cost, and some data has been recovered this way.

These data centres also contained farms of disk-based storage systems, but you can imagine how these high precision rotating devices look after all this time. They certainly don’t rotate any more. Also, the very light read/write heads have mostly become stuck to the surfaces or their landing areas and are immediately ripped off on any attempt to start a rotation. The magnetic coercivity of the surfaces has dropped to almost zero anyway. These are useless.

The most successful media have proved to be solid state disks (SSD) and memories including portable devices such as Flash cards and USB sticks. Unfortunately the former were mainly used for temporary storage in high speed systems and the latter for transfer of files and photos. At least the photos are proving to be fun.

The point I have been making is that for extremely long term data storage it is hard to see how we can rely on anything we can’t see, even with the aid of microscopes etc. Have you seen the ’50s classic sci-fi movie “Forbidden Planet”? It is scarred into my brain as one of the films that gave me nightmares as a child, but it did contain a terrific example. The technology of a long lost civilisation was recovered because they had micro-engraved their information onto crystals. Very cool. The movie also had Robby the robot, who predated the robot from “Lost in Space” by years.

Our current advice for long term archiving revolves around two principles:
1. Redundancy.
2. Technology refresh.
I will look at these recommendations next time.



Long Term Archive IV

Rock strata, t-rex bones, old pots, mummies and memos. All these things make great items in our archives, but when we think about human knowledge we think of books. The terms “before recorded history” or “written history” bring up visions of libraries full of books. I think long term data archiving really began with books, and books began with printing in the 15th century.

When the origin of printing is being discussed, who comes to mind? For me it was Caxton, the man who introduced the printing press to the British, but really Gutenberg in Germany invented the printing press and preceded Caxton by forty years. He was also responsible for the Gutenberg Bible, which rocked the Catholic Church even more than the current child abuse scandals. It undermined their power. But one definition of printing is “to produce by means of a reproduction process”, and by this I think that the handprint images on cave walls, created by cavemen blowing ochre past their fingers, would qualify. So would the early use of wood blocks, Chinese chops, rubber stamps and stencils.

Regardless of how it started, the ability to create thousands of copies of information fairly quickly and easily provided what the I.T. industry would refer to as redundancy or backup, and enabled business continuance and even disaster recovery. Archives of books exist all over the world, including the example above, the Australian National Library in Canberra, which contains every book ever published in Australia. It houses 10 million books and recently reached 8 million in the number of newspaper pages digitised. It adds about 80,000 items per year.
The U.S. Library of Congress has over 34 million books and is also actively digitising everything. The average 200 page book holds about half a megabyte of data, so the books in the National Library could theoretically be digitised in about 5 terabytes of data. This is about the size of a desktop backup disk that you can get from Dick Smith for a couple of hundred dollars.
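The 5 terabyte figure is easy to verify from the numbers quoted, taking the half-megabyte-per-book estimate at face value and using decimal units:

```python
# 10 million books at ~0.5 MB each, in decimal units.
book_mb = 0.5
books = 10_000_000
total_tb = book_mb * books / 1_000_000   # MB -> TB
print(total_tb)                          # 5.0
```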
In the 1980s there was a fire in the computer room at the ANL and I was involved in recovering our systems and data, which was all business related rather than historical, but it highlights the vulnerability of our archives, whether paper or magnetic disk.

The medium of books is paper and, as long as we keep the items (books), we will be able to read them. This is not necessarily true of digital technologies and I will address this topic next time.
(Image by Buttontree Lane via Flickr)