The rise and fall of my Jungle Disk : Algorithms for the masses

Quite a while ago (I was surprised when I looked it up: 2008) I subscribed to a backup app called Jungle Disk. The interesting things about it were that (a) it used Amazon S3 (then relatively new) as a backup store, and (b) you subscribed to it at a rate of a mere $1 per month. So, in essence, it’s an online backup program, and it allowed me to keep documents and photos – about 6 folder trees in all – somewhere other than a local backup drive. It was the “house burns down” option: in the event of a catastrophe (like, say, if the Black Forest fire last year had been a little more ferocious and the wind from the north-east a little stronger) I’d have our decade’s worth of photos still around once we’d rebuilt.

And, for the next 6 years all went well. The monthly bill from Amazon for the storage came to around $15, sometimes more, sometimes less, but not by much. Even when I experimented a few months back with deploying a couple of static websites to Amazon, using Route 53, the bill never really made it over $20 in any month.

And then, boom, February’s bill arrived: somehow I’d managed to spend just over $100. WTF?

The statement/invoice was no real help: all it said was that I’d somehow managed to incur over $80 of outward-bound data transfer. A grand total of two thirds of a terabyte had been downloaded from my S3 account in February. I don’t know about you, but a distinct chill went down my spine. Had I been hacked? Was there someone out there just continually downloading the larger files – images, PDFs, zips – I’d linked to from my blog? 0.67TB worth? The possibilities all looked dark.

I fired off an email to AWS support asking for help in trying to understand my latest bill. They responded after about a business day with lots of details about how to find out which buckets were being downloaded from and when – details that, I’ll admit, are hard to find from scratch. In order to see the data transfer usage and when it started, I downloaded a usage report for S3 (actually it’s a CSV file, ideal for opening in Excel) – you can download one for your own usage from your AWS account. This report gives you an hourly breakdown of data transfer from your buckets and can help identify what caused these charges to accrue.

For me, the results were shocking. Here’s a glimpse from the time it all started:

Service  Operation UsageType              Resource                                StartTime         EndTime               UsageValue
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 2/07/2014 3:00    2/07/2014 4:00               275
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 2/09/2014 18:00   2/09/2014 19:00              275
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 2/11/2014 20:00   2/11/2014 21:00              275
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 02:00:00 02/18/14 03:00:00  2,905,266,065
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 03:00:00 02/18/14 04:00:00  2,905,262,825
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 04:00:00 02/18/14 05:00:00  2,905,262,934
AmazonS3 GetObject DataTransfer-Out-Bytes jd2-f12ac61040xxxxxxxc267e70f7e26c07-us 02/18/14 05:00:00 02/18/14 06:00:00  2,905,263,028

From a minimal data transfer of a few bytes every other day, suddenly on 18-Feb, from 2am onwards Amazon time, something somewhere had started downloading nearly 3GB every hour. From where? Well that GUID-like thing in the middle is Jungle Disk’s generated name for my backup storage. Something was downloading data from my backup.
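If you would rather not eyeball the report in Excel, the same check can be scripted. What follows is a minimal Python sketch, not anything AWS support supplied: it assumes the CSV has a header row with the column names `Service`, `Operation`, `UsageType`, `Resource`, `StartTime`, `EndTime`, and `UsageValue`, and it uses a made-up bucket name (`example-bucket`) and an arbitrary 100MB-per-hour threshold. The byte counts are the ones from the excerpt above; the inbound row is invented to show the filtering.

```python
import csv
import io

# Column names are an assumption; check them against the header row of your own report.
# "example-bucket" is a placeholder, and the DataTransfer-In row is invented.
SAMPLE_REPORT = """Service,Operation,UsageType,Resource,StartTime,EndTime,UsageValue
AmazonS3,GetObject,DataTransfer-Out-Bytes,example-bucket,02/11/14 20:00:00,02/11/14 21:00:00,275
AmazonS3,GetObject,DataTransfer-Out-Bytes,example-bucket,02/18/14 02:00:00,02/18/14 03:00:00,2905266065
AmazonS3,GetObject,DataTransfer-Out-Bytes,example-bucket,02/18/14 03:00:00,02/18/14 04:00:00,2905262825
AmazonS3,PutObject,DataTransfer-In-Bytes,example-bucket,02/18/14 03:00:00,02/18/14 04:00:00,1024
"""


def outbound_spikes(report_file, threshold_bytes=100 * 1024 * 1024):
    """Return (bucket, start_time, bytes) for each hourly outbound row above the threshold."""
    spikes = []
    for row in csv.DictReader(report_file):
        # endswith() also matches usage types that carry a region prefix.
        if not row["UsageType"].endswith("DataTransfer-Out-Bytes"):
            continue
        # Strip thousands separators in case the file was re-saved from a spreadsheet.
        used = int(row["UsageValue"].replace(",", ""))
        if used > threshold_bytes:
            spikes.append((row["Resource"], row["StartTime"], used))
    return spikes


spikes = outbound_spikes(io.StringIO(SAMPLE_REPORT))
for bucket, start, used in spikes:
    print(f"{bucket} {start} {used / 1e9:.2f} GB")
```

With a real report you would pass an open file instead of the `io.StringIO` sample. The first row it prints is the hour the trouble started.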

AWS support’s other hint was to turn on S3 server access logging for the bucket and regularly check the log reports. After a few hours, I checked the logs: lo and behold, all of the data requests were made from my IP address from the Jungle Disk Desktop app. No weird hackers out there who are really, really enamored of my data files: it’s all just Jungle Disk. For grins, I disabled Jungle Disk backups for a couple of hours, and the big data downloads stopped; enabled backups again, and the big downloads restarted. Pretty convincing to me: something in Jungle Disk Desktop was initiating these hourly data downloads.
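Checking the access logs by hand gets old quickly, so here is a rough Python sketch of the same "who is downloading this?" question: total the bytes sent per remote IP address. It assumes the leading fields of each log record are bucket owner, bucket, bracketed timestamp, remote IP, requester, request ID, operation, key, quoted request line, HTTP status, error code, and bytes sent, in that order; verify that against your own logs. The sample records, IDs, keys, and IP addresses (taken from the documentation ranges) are all made up.

```python
import re
from collections import defaultdict

# Assumed layout of the leading fields of an S3 server access log record.
LOG_LINE = re.compile(
    r'(?P<owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] (?P<ip>\S+) '
    r'(?P<requester>\S+) (?P<request_id>\S+) (?P<operation>\S+) (?P<key>\S+) '
    r'"(?P<request>[^"]*)" (?P<status>\S+) (?P<error>\S+) (?P<bytes_sent>\S+)'
)

# Hypothetical records for illustration only.
SAMPLE_LOG = [
    'owner1 example-bucket [18/Feb/2014:02:05:11 +0000] 192.0.2.10 owner1 REQ0001 '
    'REST.GET.OBJECT DB/1 "GET /DB/1 HTTP/1.1" 200 - 750000000 750000000 900 20 '
    '"-" "ExampleAgent/1.0" -',
    'owner1 example-bucket [18/Feb/2014:02:06:40 +0000] 198.51.100.7 owner1 REQ0002 '
    'REST.PUT.OBJECT DB/2 "PUT /DB/2 HTTP/1.1" 200 - - 750000000 1200 30 '
    '"-" "ExampleAgent/1.0" -',
    'owner1 example-bucket [18/Feb/2014:02:09:02 +0000] 192.0.2.10 owner1 REQ0003 '
    'REST.GET.OBJECT DB/3 "GET /DB/3 HTTP/1.1" 200 - 275 275 15 5 '
    '"-" "ExampleAgent/1.0" -',
]


def bytes_sent_by_ip(lines):
    """Total the bytes-sent field per remote IP; a "-" value counts as zero."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip records that do not fit the assumed layout
        sent = match.group("bytes_sent")
        totals[match.group("ip")] += 0 if sent == "-" else int(sent)
    return dict(totals)


totals = bytes_sent_by_ip(SAMPLE_LOG)
for ip, sent in sorted(totals.items(), key=lambda item: item[1], reverse=True):
    print(f"{ip:15} {sent:>15,} bytes")
```

If every heavy hitter in the output is your own address, you are in the same boat I was: the downloads are coming from inside the house.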

Time for an email to Jungle Disk support. First answer: “what are you restoring from your backup?” and “Maybe it’s your antivirus. It’s scanning the files in your network folder.” In other words, the typical non-answer to get rid of the question. The interesting thing is that I’ve never restored from a Jungle Disk backup in the six years I’ve been using it (which in and of itself is a HUGE problem: how do I know it’s backing up properly if I don’t try and restore?) and nothing changed about my antivirus anyway; besides which I don’t make use of the network (that is, “mirrored”) folder functionality in Jungle Disk.

Before I replied, I deleted a couple of older backup hives from S3, something I should’ve done ages ago when I retired (and wiped) the machines they came from. After I’d done that, I enabled backups again to see whether the clean-up had any effect. The data downloads were still there, but now they were only 750MB in size per hour (0.75GB). Score! I disabled the backups again before replying and presenting this new information. The reply was (in essence) “Some computer/person is initiating restores on a regular basis,” along with a recommendation to change my Jungle Disk password.

Let’s get one thing straight: in order to initiate a regular restore (750MB every hour, remember), a hacker would have to know my Jungle Disk password (there’s an authorization check to see if you are still subscribed to the service every time you run the Desktop) and ALSO my Amazon AWS Access ID and Secret Key. Of course, once they have the latter, they have complete control over my AWS account and wouldn’t need to do bloody stupid “24×7 hourly restore” tricks (“hey I can store a cracked installer for Windows 8 on this stupid idiot’s S3 account and send all my friends the link”). Sorry, but it doesn’t fly. Also they must have access to my main machine to spot when I turn on or off Jungle Disk backups (this big data download only happens when backups are turned on for this one and only machine that uses Jungle Disk in my household). Not only doesn’t fly but doesn’t even walk. This, my friends, has no signs of life.

So, yesterday I decided that all this kerfuffle and investigation just wasn’t worth the time and effort. Nice app, but I’ve better things to do. I cancelled my Jungle Disk subscription, deleted the backup on S3, uninstalled Jungle Disk, and deleted the cache it uses on my C: drive (all 59GB of it, WTF?). I performed a backup to an external drive and put that in my car.

Update: so last night at the Denver Visual Studio User Group meeting – it was a boring bit of the proceedings – I viewed the access logs I had for the Jungle Disk bucket on S3. And found an interesting bit of behavior I’d completely missed before. Here’s a synopsis of the server access logs over a particular period of time late Friday evening/early Saturday morning showing access to various files that are numbered sequentially in a DB folder in the Jungle Disk backup bucket.

21-Mar 20:23:28 GET    ~/DB/75780 *
21-Mar 20:24:50 PUT    ~/DB/75781 *
21-Mar 20:31:59 DELETE ~/DB/75770 
21-Mar 21:31:51 PUT    ~/DB/75782 *
21-Mar 21:33:20 DELETE ~/DB/75771
21-Mar 21:38:19 GET    ~/DB/75781 *
21-Mar 22:30:36 PUT    ~/DB/75783 *
21-Mar 22:32:00 GET    ~/DB/75782 *
21-Mar 22:40:51 DELETE ~/DB/75772
21-Mar 23:30:25 DELETE ~/DB/75773
21-Mar 23:30:26 PUT    ~/DB/75784 *
21-Mar 23:26:21 GET    ~/DB/75783 *
22-Mar 00:22:43 PUT    ~/DB/75785 *
22-Mar 00:23:33 GET    ~/DB/75784 *

So for example: at 20:23:28 a GET is requested for file ~/DB/75780; about a minute later a PUT is requested for file ~/DB/75781. Then file ~/DB/75770 is DELETEd. And so on. As you can see, some part of Jungle Disk is reading these files one by one, sequentially, some other part is adding new ones at regular intervals, again sequentially, and some other part is sequentially deleting the older ones with a delay of about 10 files. These files are 750MB in size, and I’m guessing they are the database of the backups that Jungle Disk is doing.

And the reason for the asterisks? These requests come from Jungle Disk Desktop at a completely different IP address: 162.209.124.28. No idea who owns or what’s at that IP address, but it’s very peculiar that it was totally in sync with my copy of Jungle Disk and that the only operations it was doing were GET and PUT (with the files being deleted a little while later by my copy of Jungle Disk Desktop).
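The sequential pattern in that synopsis is easy to check mechanically. This is a small Python sketch using only the operations and file numbers transcribed from the listing above, in the order they are listed; the helper name `lags_behind_latest_put` is mine, purely for illustration. For each DELETE or GET it measures how far the file number trails the most recent PUT.

```python
# Operations and file numbers transcribed from the synopsis above, in listing order.
EVENTS = [
    ("GET", 75780), ("PUT", 75781), ("DELETE", 75770), ("PUT", 75782),
    ("DELETE", 75771), ("GET", 75781), ("PUT", 75783), ("GET", 75782),
    ("DELETE", 75772), ("DELETE", 75773), ("PUT", 75784), ("GET", 75783),
    ("PUT", 75785), ("GET", 75784),
]


def lags_behind_latest_put(events, operation):
    """For each event of the given operation, return latest PUT number minus its number."""
    latest_put = None
    lags = []
    for op, number in events:
        if op == "PUT":
            latest_put = number
        elif op == operation and latest_put is not None:
            # Events seen before any PUT are skipped: there is nothing to compare against.
            lags.append(latest_put - number)
    return lags


delete_lags = lags_behind_latest_put(EVENTS, "DELETE")
get_lags = lags_behind_latest_put(EVENTS, "GET")
print("DELETE lags:", delete_lags)  # [11, 11, 11, 10]
print("GET lags:", get_lags)        # [1, 1, 1, 1]
```

So within this window the deletes trail the newest file by 10 or 11, and each GET fetches the file written just before the most recent PUT. It is a tiny sample, so treat it as a pattern worth noting rather than proof of how Jungle Disk works internally.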

Well, it’s all moot now of course. Now, I’m researching apps that automatically mirror local folders to Amazon S3.

Now playing on Pandora:
Depeche Mode - Dream On (Dave Clarke Acoustic Version)
(from Remixes 81...04)

Tue 25-Mar-2014 12:02 PM Blog / tags: backup jungledisk

9 Responses

#1 Travis Illig said...

25-Mar-14 12:33 PM

Does it have to be Amazon S3? I've been using CrashPlan for backup for a while now to great benefit. Unlimited plan is $6/month per computer, less if you subscribe for more than a month at a time. Also, it backs up network drives if you tell it to, so if you connect everything you want backed up, you only really need one computer registered.

#2 Craig Peterson said...

25-Mar-14 1:45 PM

S3's API doesn't support a "rename" operation, so actually doing so requires a set of GET/PUT/DELETE commands. Maybe it's a poor attempt at a rolling backup of some sort?

#3 julian m bucknall said...

25-Mar-14 3:07 PM

Travis: You are now the second person to recommend CrashPlan. I'll take a look.

Craig: Interesting point. If so, bleugh. Problem is though that I'd have noticed the gradual creep in the costs I incur if this were the sole reason. But, as I said, it all suddenly blew up a month ago.

Cheers, Julian

#4 Van Swofford said...

09-Apr-14 3:04 PM

For several years, I used Jungle Disk to backup 2 computers, based on the recommendation from your blog. It worked great for a while, and I did do some restores to check it, and those worked fine. But somewhere along the way about a year ago it started doing wonky things. I don't recall specifics, but the cost was starting to creep up, and some stuff wasn't getting backed up. I switched to Crashplan, and I'm totally happy with it.

#5 John Topley said...

11-Apr-14 6:58 AM

I used to use CrashPlan but am now a happy Backblaze user. Their client software is much nicer than CrashPlan's.

#6 WD said...

26-Jul-14 1:52 PM

I too am a long time Jungledisk user via S3, but my reservations have been growing.

Haven't seen any unusual traffic activity...that part has been solid for us, but this issue just won't go away:

www.daemonology.net/.../2011-06-03-inse

They've never had our keys....but this should make any user uncomfortable.

We still use Jungledisk for near real time backups, but these days ...we're using Amazon Glacier more and more via the FastGlacier client. Not automated, but works very well.

#7 Wes Dunn said...

17-Nov-14 10:21 PM

Julian,

I'm a Jungle Disk support tech and also help monitor our social media. I was recently looking back at some old twitter mentions, and I happened across your tweet that linked to this blog post. I can't help but want to try and make this right for you. I understand that you may have very well cut off your Jungle Disk account and may be using another service, but I believe that (if you never got a full explanation) I can help explain and hopefully make this right for you. I realize it may be too little, too late to keep your business, but if you have some time for a phone call this week, please shoot me an email and we can set up a time to talk. I really hope to hear from you.

Thanks, Julian.

#8 Jeremy Gault said...

25-Mar-15 2:46 PM

For what it's worth, the IP address you mentioned (162.209.124.28) belongs to Rackspace. If memory serves me correctly, Rackspace bought Jungle Disk quite some time back. My guess is their servers may have been doing some type of operation(s) on your S3 data. What? I have no idea. But it seems plausible that's what was happening, considering it's a Rackspace IP.

#9 julian m bucknall said...

25-Mar-15 3:21 PM

Jeremy: Fascinating; I didn't know that. I wonder what was going on -- but not too hard since this was a year ago and I now use CrashPlan instead.

Cheers, Julian

by Julian M Bucknall