Thursday, November 15, 2012
Amazon has pushed out a couple of important updates related to Glacier recently. The most notable is a new S3 feature that automatically migrates data from S3 to Glacier based on rules like age and organization structure. The other update deals with partitioned retrieval of large amounts of data – for instance, restoring a large file in multiple operations to keep restore costs under control.
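To make the rule idea concrete, here is a sketch of what such a migration rule might look like, as I understand the lifecycle model. The bucket name, prefix, and 90-day threshold are my own hypothetical choices, and the dict mirrors the shape the AWS SDKs accept (boto3's put_bucket_lifecycle_configuration, for example) – treat it as an illustration, not a definitive recipe:

```python
# Hypothetical S3 lifecycle rule migrating objects to Glacier by age
# and organization structure (prefix). Names and thresholds are made up.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-photos",
            "Filter": {"Prefix": "photos/"},  # structure rule: only this prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"}  # age rule: after 90 days
            ],
        }
    ]
}

# With an SDK this would be applied roughly like so (not run here):
# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-backup-bucket", LifecycleConfiguration=lifecycle)
```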
Now, as much as I love the idea that S3-based structures can be automatically archived or migrated to Glacier, I would really prefer a storage client that offers intelligent use and configuration of these services. The two services have very different price points and usage patterns, and a healthy combination would probably make sense for most backup scenarios. For example, read/write data like a backup manifest or database would need a service like S3, whereas stable data like photos and videos would benefit from Glacier storage. Data used in sync scenarios would probably need to be on S3.
Browsing through various developer forums, it is obvious that a lot of developers are trying to find a reasonable user interaction model to deal with both the recovery delay and the recovery throttling Glacier requires. There is little doubt in my mind that there is ample room for a good Amazon AWS client in the backup and sync space. I would really like to have my entire backup on this platform, with the ability to offer certain parts of it for instant file sync and other parts for slow file sync. Within the current Glacier pricing model, keeping file history around for a long time becomes very appealing – so good purge options would also make a lot of sense. Even better if those purge options respected the Glacier early-delete penalty model.
In short, all I need now is a great client – and if that is not available then at least a decent one.
Monday, November 12, 2012
As the Amazon Glacier FAQ clearly states, it is designed for infrequent retrieval. In other words, excessive retrieval will make a clear dent in your monthly bill – and this comes in the form of something called the Peak Hourly Retrieval rate. There are other components to the Glacier retrieval pricing, but they are straightforward and in line with other Amazon services.
Let us say you have 250 GB of data stored in Glacier and a month with 30 days. You get to retrieve 5% of your data per month for free, which in a 30-day month works out to about 0.17% – or 0.42 GB – per day, or 17.8 MB per hour. Anything more than this will be added to your bill based on your Peak Hourly Retrieval rate.
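The free allowance arithmetic above can be sketched out like this, assuming the 5%-per-month free tier is prorated evenly per day as the FAQ describes:

```python
# Free retrieval allowance for 250 GB stored, in a 30-day month.
stored_gb = 250
free_month_gb = stored_gb * 0.05        # 12.5 GB free per month
free_day_gb = free_month_gb / 30        # ~0.42 GB per day
free_hour_mb = free_day_gb / 24 * 1024  # ~17.8 MB per hour

print(round(free_day_gb, 2), round(free_hour_mb, 1))  # 0.42 17.8
```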
Let us start by investigating the rate you would get with a dumb or generic backup client. You want to restore your 250 GB of data, and the backup client will request all of it from Glacier at once. If I understand Glacier correctly, the minimum processing time is 4 hours, so let us use that number – leaving you with an hourly retrieval rate of 62.5 GB. As previously mentioned, you would get about 17.8 MB per hour for free, which won’t matter much at this volume, so let us ignore it.
With a peak hourly retrieval rate of 62.5 GB, that hour will cost you 1 cent per GB, or 62.5 cents. The bad news is that this rate – or rather your maximum rate – becomes the basis for every hour of the active month. In our scenario, that is 30 days or 720 hours, leaving us with a line item of 450 dollars. Obviously, this gets a lot more expensive if your dataset is larger.
One important note before we move on – the retrieval rate is not the same as the download rate. If you request 250 GB at once and it takes 4 hours to prepare, you will be charged at 62.5 GB per hour regardless of whether the download itself takes you 3 weeks!
A smart backup client would give you options more in line with your needs and, more importantly, your download speed. There is little sense in pulling out 62.5 GB per hour if your connection can’t handle it – and I guess it won’t for most of us. If you set your backup client to spread the restore evenly over 24 hours, you would be looking at around 10.5 GB per hour, which should be possible on a fast broadband connection. Ignoring your free hourly allowance, you would pay around 75 dollars for this. Or, if you have plenty of time, you could spread it over a week and retrieve 1.5 GB per hour, leaving you with a fairly manageable 11 dollars.
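The peak-hour math above can be captured in one small function – assuming, as in the scenario so far, a $0.01/GB peak retrieval rate, 720 billable hours in a 30-day month, and ignoring the free allowance:

```python
# Peak Hourly Retrieval cost for restoring data spread over a time window.
# Your peak hourly request rate is billed for every hour of the month.
def peak_retrieval_cost(total_gb, spread_hours, rate_per_gb=0.01, hours_in_month=720):
    peak_gb_per_hour = total_gb / spread_hours  # highest hourly request rate
    return peak_gb_per_hour * rate_per_gb * hours_in_month

print(round(peak_retrieval_cost(250, 4)))    # 450 – all at once (4-hour minimum)
print(round(peak_retrieval_cost(250, 24)))   # 75  – spread over a day
print(round(peak_retrieval_cost(250, 168)))  # 11  – spread over a week
```

Note how the bill is driven entirely by how fast you ask for the data, not how fast you download it.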
Another thing to note is that if you are retrieving small amounts of data relative to your total storage, you would be able to keep it within your free allowance – or achieve the same with a complete restore if you just allow enough time.
I guess the moral of the story is: Glacier is great for storing a lot of data – but you should get a backup client that understands the Glacier billing model and gives you enough information to make the right decisions and avoid a huge charge on your credit card. If such a client is not available to you, you should at least try to restore manually in reasonable chunks – but that is hardly a comfortable restore scenario.
I would also like to stress that in addition to your peak hourly usage fee, you would also be looking at a transfer fee and a requests fee. The transfer fee is 12 cents per gigabyte, with the first GB free every month, which for 250 GB leaves you with about 30 dollars to add to your bill. I would assume the requests charge would be less than 5 dollars at this volume, maybe less than 1 dollar. Looking at how much your initial transfer in cost should give you a workable number.
In short, if you partition your restore out over 24 hours, you would pay around 110 dollars for 250 GB – which should be within reach in the event of total data loss.
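Pulling the components together, here is a back-of-the-envelope sketch of that 110-dollar total – the 5-dollar request fee is my rough guess from above, not a published number:

```python
# Rough total for a 250 GB restore spread over 24 hours.
peak_fee = (250 / 24) * 0.01 * 720   # peak hourly retrieval: ~$75
transfer_fee = (250 - 1) * 0.12      # transfer out, first GB free: ~$29.88
request_fee = 5.0                    # assumed upper bound for API requests
total = peak_fee + transfer_fee + request_fee

print(round(total))  # 110
```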
Sunday, November 11, 2012
I am fascinated by the Amazon Glacier product and its potential as the cloud backup store of choice, largely because of its extremely low price point at around 1 cent per GB per month – or around 30 dollars for a year’s worth of 250 GB of quality storage. Also, transferring data into Amazon Glacier is free.
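For reference, the arithmetic behind that 30-dollar figure:

```python
# Yearly storage cost for 250 GB at $0.01 per GB per month.
monthly = 250 * 0.01   # $2.50 per month
yearly = monthly * 12  # $30 per year

print(yearly)  # 30.0
```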
So far so good, right? Well, there is an additional charge for the number of requests – or API calls, for lack of a better description – but they should be relatively cheap as well. This charge isn’t very predictable, but depending on your backup software it might be slightly larger than the number of files you are backing up. Anyway, there is really no way to know for sure.
In short, we have an affordable way to back up our data to the cloud. Well, I guess it is time for the confusing part. You have had a massive system failure or an accidental deletion, and it is time to get your data back. The process itself is relatively simple; you just fire up your backup software and kick off a restore operation. The problem is figuring out how much it will cost you – and these charges may be substantial.
While it is cheap to get data into the Glacier cloud, it may be relatively expensive to get it back again. Even worse, you are entirely at the mercy of your backup software. Before we dig deeper, you must keep in mind that Glacier reaches its low storage price by being optimized for getting data in, not for changing it or getting it back again.
There are three main price components involved when retrieving data from Glacier: a data transfer or download fee, a requests or API call fee, and most importantly a peak hourly usage fee. Just by looking at the name, you can guess that the peak hourly usage fee is the confusing part.
I will write a more detailed blog post about Peak Hourly Usage, but here is a short introduction. Within a month – or rather a billing period – Amazon will find the hour in which you retrieved the most data, and charge you as if that were the rate you retrieved data at for every single hour of the month. So retrieve a lot of data in a few hours, and you will pay for it through the entire month. Also, just to add another level of confusion to the mix – they don’t count the amount of data downloaded, but the amount of data retrieved or requested. This puts you entirely at the mercy of the backup software you use: if it starts a restore by requesting all your data in one big operation, you may very well be looking at a substantial bill at the end of the month. The fact that you may spend weeks downloading the requested data is irrelevant. I won’t give you any numbers right now, as there are additional parameters involved that typically make it somewhat cheaper – but as mentioned earlier, I will cover this in a later post.
Another thing to note is that they will also charge you extra if you delete data that is less than 3 months old. This is only around 3 cents per GB, but it is important to keep in mind. Once again, Glacier is optimized for uploads, not changes or downloads.
And a final and very important note – this is the way I understand the Glacier pricing model. I may very well be confused…
Thursday, November 8, 2012
I have usually tried to keep my critical data down to a reasonable size, and with a reasonable level of redundancy. In what I suspect can be referred to as the old days in computer time, I used to back up to external hard drives – or, in the more recently popularized and modernized form, a NAS device. While this remains almost as popular an approach as no backup at all, we are living in a cloud backup era and it is a good idea to find a backup option in this domain – and I suppose a lot of us have, knowingly or not.
That said, I still like my local backup. So what does one do? Well, I did what I thought was sensible at the time: I went to my local backup supplier and found a cloud option – a no-brainer, one might think. It seemed to have all the bells and whistles to suit my somewhat skeptical mind. Well, my cloud experience with Acronis was short-lived. For such a crucial player in the local backup and imaging market, I have little positive feedback for their cloud service. It reminded me of when in-application internet updates came about and everyone wanted to include one – and more often than not it was obvious that this was not their core domain. I felt the same way about Acronis Cloud Backup: it felt thrown in there, the user experience was horrible, the performance was horrible, and when things stopped working, their customer support was horrible. Well, let us leave that alone.
Therefore, I started digging around to find myself a new backup provider – finding that the web was polluted by marketing sites posing as review sites. Well, not much you can do about that. So it was time to work my way through the stacks of online backup software. I obviously skipped anything built with Java or missing local encryption – which narrowed it down quite a bit. Then I removed the ones with horrible user reviews, and the ones who seem to think unlimited storage means whatever they find reasonably unlimited, while reserving the right to close down anyone actually using more than they should.
The aforementioned reasonably sized private data suddenly grows quite a bit when you have a child and invest in a moderately expensive digital SLR – even more so when you realize that JPEG images just won’t do it for you anymore. With that in mind, one of my favorite backup architectures, Amazon S3, suddenly became rather expensive for my price/performance scenario.
After a lot of back and forth, I ended up with Backblaze, and I have been happy with it since. It is a bit tiresome to tune their configuration files, which are periodically replaced, to make sure I back up what I want backed up and not what they think is best. That said, they are never replaced unless I decide to make some changes in the UI, so that is all fair. I guess I am trying to say that I would have preferred a more advanced configuration mode.
Even though Backblaze does a fine job, there is still the issue of bandwidth. It is a bit sad to see my fiber connection max out at around 400kb/s when it has so much more to give. Maybe that is the price I pay for not finding a suitable offering closer to my location. Amazon S3 would probably have been better, but it just isn’t optimized for backup and the price reflects that.
Then Amazon Glacier was announced, and the backup dialog I put to bed a few months ago suddenly kicked into high gear. It looks very promising, even though it is a bit new and the software support is lacking, to say the least. The old S3 power player JungleDisk was sold to Rackspace and has been quiet for months – even years, some might say. Also, even though the Glacier pricing is very tempting, it lacks some fine-tuning to fit the typical backup scenario, where one needs a lot of write access and a bit of read/write access. A hybrid of S3 and Glacier would make a lot of sense. So either Amazon tweaks its offering a bit, or some backup software bridges the gap – or both. I realize there is little to do but wait. I am still on Backblaze, so I feel safe for now. My initial uploads are done, and the incremental uploads aren’t annoyingly time consuming.
Today I was looking at the Glacier prices again, and I feel something has changed – and I feel it is time to awaken this little side project again. Time to go software hunting…