Quite a while ago (I was surprised when I looked it up: 2008) I subscribed to a backup app called Jungle Disk. The interesting thing about it was (a) it used Amazon S3 (then relatively new) as a backup store, and (b) you subscribed to it at a rate of a mere $1 per month. So, in essence, it’s an online backup program and it allowed me to keep documents and photos – about 6 folder trees in all – somewhere other than a local backup drive. It was the “house burns down” option: in the event of a catastrophe (like, say, if the Black Forest fire last year had been a little more ferocious and the wind from the north-east a little stronger) I’d have our decade’s worth of photos still around once we’d rebuilt.
And for the next six years all went well. The monthly bill from Amazon for the storage came to around $15, sometimes more, sometimes less, but not by much. Even when I experimented a few months back with deploying a couple of static websites to Amazon, using Route 53, the bill never really made it over $20 in any month.
And then, boom, February’s bill arrived: somehow I’d managed to spend just over $100. WTF?
The statement/invoice was no real help: all it said was that I’d somehow managed to incur over $80 of outward-bound data transfer. A grand total of two thirds of a terabyte had been downloaded from my S3 account in February. I don’t know about you, but a distinct chill went down my spine. Had I been hacked? Was there someone out there just continually downloading the larger files – images, PDFs, zips – I’d linked to from my blog? 0.67TB worth? The possibilities all looked dark.
I fired off an email to AWS support asking for help in trying to understand my latest bill. They responded after about a business day with lots of details about how to find out which buckets were being downloaded from and when – details that, I’ll admit, are hard to find from scratch. In order to see the data transfer usage and when it started, I downloaded a usage report for S3 (actually it’s a CSV file, ideal for opening in Excel) – you can download one for your AWS usage from here. This report gives you an hourly breakdown of data transfer from your buckets and can help in identifying what caused these charges to accrue.
For me, the results were shocking. Here’s a glimpse from the time it all started:
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 2/07/2014 3:00 2/07/2014 4:00 275 AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 2/09/2014 18:00 2/09/2014 19:00 275 AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 2/11/2014 20:00 2/11/2014 21:00 275 AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 02:00:00 02/18/14 03:00:00 2,905,266,065 AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 03:00:00 02/18/14 04:00:00 2,905,262,825 AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 04:00:00 02/18/14 05:00:00 2,905,262,934 AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 05:00:00 02/18/14 06:00:00 2,905,263,028
From a minimal data transfer of a few bytes every other day, suddenly on 18-Feb, from 2am onwards Amazon time, something somewhere had started downloading nearly 3GB every hour. From where? Well that GUID-like thing in the middle is Jungle Disk’s generated name for my backup storage. Something was downloading data from my backup.
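If you would rather not eyeball the CSV in Excel, the same triage can be scripted. What follows is only a rough sketch in Python: it assumes the report's header row uses the column names Service, Operation, UsageType, Resource, StartTime, EndTime and UsageValue (check your own file), and the function name and the bucket name in the sample are made up for illustration.

```python
import csv
import io
from collections import defaultdict

def outbound_bytes_by_day(report_text):
    """Sum outbound transfer per (bucket, day) from usage-report CSV text.

    Assumes the column names in SAMPLE's header row; adjust to match your file.
    """
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(report_text)):
        # endswith() so a prefixed usage type would still match (an assumption).
        if not row["UsageType"].endswith("DataTransfer-Out-Bytes"):
            continue
        day = row["StartTime"].split()[0]                 # keep the date, drop the hour
        value = int(row["UsageValue"].replace(",", ""))   # "2,905,266,065" -> int
        totals[(row["Resource"], day)] += value
    return dict(totals)

# Two of the hourly figures from the report above, under a placeholder bucket name.
SAMPLE = """Service,Operation,UsageType,Resource,StartTime,EndTime,UsageValue
AmazonS3,GetObject,DataTransfer-Out-Bytes,example-backup-bucket,02/18/14 02:00:00,02/18/14 03:00:00,"2,905,266,065"
AmazonS3,GetObject,DataTransfer-Out-Bytes,example-backup-bucket,02/18/14 03:00:00,02/18/14 04:00:00,"2,905,262,825"
AmazonS3,PutObject,DataTransfer-In-Bytes,example-backup-bucket,02/18/14 03:00:00,02/18/14 04:00:00,1024
"""

for (bucket, day), total in outbound_bytes_by_day(SAMPLE).items():
    print(f"{bucket} {day}: {total / 1e9:.2f} GB out")
```

Sorting the per-day totals makes a jump from a few hundred bytes to several gigabytes stand out immediately, along with the bucket responsible.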
AWS support’s other hint was to turn on S3 server access logging for the bucket and regularly check the log reports. After a few hours, I checked the logs: lo and behold, all of the data requests were made from my IP address from the Jungle Disk Desktop app. No weird hackers out there who are really, really enamored of my data files: it’s all just Jungle Disk. For grins, I disabled Jungle Disk backups for a couple of hours, and the big data downloads stopped; enabled backups again, and the big downloads restarted. Pretty convincing to me: something in Jungle Disk Desktop was initiating these hourly data downloads.
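Checking the access logs by hand gets tedious quickly, so here is a hedged sketch of tallying them by client instead. It assumes the space-delimited S3 server access log layout (bucket owner, bucket, bracketed timestamp, remote IP, requester, request ID, operation, key, quoted request line, HTTP status, error code, bytes sent, and so on); the sample lines, IP addresses and names are invented for illustration, so verify the field order against your own logs before trusting the numbers.

```python
import re
from collections import Counter

# Leading fields of an access log line, up to and including "bytes sent".
LOG_LINE = re.compile(
    r'^\S+ \S+ \[[^\]]+\] (?P<ip>\S+) \S+ \S+ (?P<op>\S+) (?P<key>\S+) '
    r'(?:"[^"]*"|-) (?P<status>\S+) \S+ (?P<sent>\S+)'
)

def bytes_sent_by_client(lines):
    """Total bytes sent per (remote IP, operation); skips lines with no byte count."""
    totals = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["sent"] == "-":
            continue
        totals[(m["ip"], m["op"])] += int(m["sent"])
    return totals

# Invented sample lines using documentation-only IP ranges.
SAMPLE_LINES = [
    'OWNERID example-backup-bucket [21/Mar/2014:20:23:28 +0000] 192.0.2.10 REQUESTER REQID1 '
    'REST.GET.OBJECT DB/75780 "GET /example-backup-bucket/DB/75780 HTTP/1.1" 200 - '
    '750000000 750000000 900 20 "-" "ExampleClient/1.0" -',
    'OWNERID example-backup-bucket [21/Mar/2014:20:24:50 +0000] 198.51.100.7 REQUESTER REQID2 '
    'REST.PUT.OBJECT DB/75781 "PUT /example-backup-bucket/DB/75781 HTTP/1.1" 200 - '
    '- 750000000 1200 30 "-" "ExampleClient/1.0" -',
    'OWNERID example-backup-bucket [21/Mar/2014:21:38:19 +0000] 198.51.100.7 REQUESTER REQID3 '
    'REST.GET.OBJECT DB/75781 "GET /example-backup-bucket/DB/75781 HTTP/1.1" 200 - '
    '750000000 750000000 950 25 "-" "ExampleClient/1.0" -',
]

for (ip, op), total in bytes_sent_by_client(SAMPLE_LINES).most_common():
    print(f"{ip} {op}: {total / 1e6:.0f} MB")
```

Grouping by IP address as well as by operation matters here: a single total per bucket would confirm the downloads, but not show who was making them.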
Time for an email to Jungle Disk support. First answer: “what are you restoring from your backup?” and “Maybe it’s your antivirus. It’s scanning the files in your network folder.” In other words, the typical non-answer to get rid of the question. The interesting thing is that I’ve never restored from a Jungle Disk backup in the six years I’ve been using it (which in and of itself is a HUGE problem: how do I know it’s backing up properly if I don’t try and restore?) and nothing changed about my antivirus anyway; besides which I don’t make use of the network (that is, “mirrored”) folder functionality in Jungle Disk.
Before I replied, I deleted a couple of older backup hives from S3, something I should’ve done ages ago when I retired (and wiped) those machines being backed up. After I’d done that, I enabled backups again to see whether the clean-up had any effect. The data downloads were still there, but now they were only 750MB in size per hour (0.75GB). Score! I disabled the backups again before replying and presenting this new information. The reply was (in essence): “Some computer/person is initiating restores on a regular basis.” and recommended changing my Jungle Disk password.
Let’s get one thing straight: in order to initiate a regular restore (750MB every hour, remember), a hacker would have to know my Jungle Disk password (there’s an authorization check to see if you are still subscribed to the service every time you run the Desktop) and ALSO my Amazon AWS Access ID and Secret Key. Of course, once they have the latter, they have complete control over my AWS account and wouldn’t need to do bloody stupid “24×7 hourly restore” tricks (“hey I can store a cracked installer for Windows 8 on this stupid idiot’s S3 account and send all my friends the link”). Sorry, but it doesn’t fly. Also they must have access to my main machine to spot when I turn on or off Jungle Disk backups (this big data download only happens when backups are turned on for this one and only machine that uses Jungle Disk in my household). Not only doesn’t fly but doesn’t even walk. This, my friends, has no signs of life.
So, yesterday I decided that all this kerfuffle and investigation just wasn’t worth the time and effort. Nice app, but I’ve better things to do. I cancelled my Jungle Disk subscription, deleted the backup on S3, uninstalled Jungle Disk, and deleted the cache it uses on my C: drive (all 59GB of it, WTF?). I performed a backup to an external drive and put that in my car.
Update: so last night at the Denver Visual Studio User Group meeting – it was a boring bit of the proceedings – I viewed the access logs I had for the Jungle Disk bucket on S3. And found an interesting bit of behavior I’d completely missed before. Here’s a synopsis of the server access logs over a particular period of time late Friday evening/early Saturday morning showing access to various files that are numbered sequentially in a DB folder in the Jungle Disk backup bucket.
21-Mar 20:23:28 GET ~/DB/75780 *
21-Mar 20:24:50 PUT ~/DB/75781 *
21-Mar 20:31:59 DELETE ~/DB/75770
21-Mar 21:31:51 PUT ~/DB/75782 *
21-Mar 21:33:20 DELETE ~/DB/75771
21-Mar 21:38:19 GET ~/DB/75781 *
21-Mar 22:30:36 PUT ~/DB/75783 *
21-Mar 22:32:00 GET ~/DB/75782 *
21-Mar 22:40:51 DELETE ~/DB/75772
21-Mar 23:30:25 DELETE ~/DB/75773
21-Mar 23:30:26 PUT ~/DB/75784 *
21-Mar 23:26:21 GET ~/DB/75783 *
22-Mar 00:22:43 PUT ~/DB/75785 *
21-Mar 00:23:33 GET ~/DB/75784 *
So for example: at 20:23:28 a GET is requested for file ~/DB/75780; about a minute later a PUT is requested for file ~/DB/75781. Then file ~/DB/75770 is DELETEd. And so on. As you can see, some part of Jungle Disk is reading these files one by one, sequentially, some other part is adding new ones at regular intervals, again sequentially, and some other part is sequentially deleting the older ones with a delay of about 10 files. These files are 750MB in size, and I’m guessing are the database of the backups that Jungle Disk is doing.
And the reason for the asterisks? These requests come from Jungle Disk Desktop at a completely different IP address: 18.104.22.168. No idea who owns or what’s at that IP address, but it’s very peculiar that it was totally in sync with my copy of Jungle Disk and the only operations that it was doing were GET & PUT (with the files being deleted a little while later by my copy of Jungle Disk Desktop).
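Both observations (the delete lag of about ten files, and the split between starred GET/PUT requests and unstarred DELETEs) can be checked mechanically rather than by squinting at the list. This is a small sketch in Python with the synopsis entries typed in by hand, in the order listed; the function names are illustrative, not anything from Jungle Disk or AWS.

```python
# (operation, file number, starred?) for each synopsis entry, in listed order.
# "Starred" means the request came from the other IP address.
ENTRIES = [
    ("GET", 75780, True), ("PUT", 75781, True), ("DELETE", 75770, False),
    ("PUT", 75782, True), ("DELETE", 75771, False), ("GET", 75781, True),
    ("PUT", 75783, True), ("GET", 75782, True), ("DELETE", 75772, False),
    ("DELETE", 75773, False), ("PUT", 75784, True), ("GET", 75783, True),
    ("PUT", 75785, True), ("GET", 75784, True),
]

def delete_lags(entries):
    """For each DELETE, how many files it trails the newest PUT seen so far."""
    newest_put = None
    lags = []
    for op, number, _starred in entries:
        if op == "PUT":
            newest_put = number
        elif op == "DELETE" and newest_put is not None:
            lags.append(newest_put - number)
    return lags

def ops_by_origin(entries):
    """Sets of operations seen from the starred and the unstarred requests."""
    starred = {op for op, _number, is_starred in entries if is_starred}
    unstarred = {op for op, _number, is_starred in entries if not is_starred}
    return starred, unstarred

print("delete lags:", delete_lags(ENTRIES))
starred_ops, unstarred_ops = ops_by_origin(ENTRIES)
print("starred:", sorted(starred_ops), "unstarred:", sorted(unstarred_ops))
```

On these entries the lags come out as 11, 11, 11 and 10, and the two sets of operations do not overlap: every GET and PUT is starred, every DELETE is not.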
Well, it’s all moot now of course. Now, I’m researching apps that automatically mirror local folders to Amazon S3.