Small Fish, Big Pond

Tahoe: Cloud Storage protected on distributed redundant systems.

by Kerensky97 on Aug.10, 2009, under Internet, Technology

Wow! This is cool stuff!

While I’m not drinking the cloud computing Kool-Aid like most people are right now, I do think it’s great technology; it’s just not as ready for full deployment as most people think. In my opinion, it still has a lot of security and reliability issues to address. Tahoe tackles the reliability issue and even delves a bit into solving the security issue.

I like to look at cloud computing from the perspective of a big business. Imagine you’re a big corporation with publicly accessible files and some confidential company secrets you don’t want anybody to see. For the public stuff, cloud computing is great; I don’t care if Google gets hacked and people see it. But losing files to a server crash or datacenter outage at the cloud provider would be bad. With company secrets, I just don’t trust putting files on another company’s network, with the risk that half of the cloud’s servers get wiped and they just happen to be the ones holding our company data.

There needs to be a safe way to ensure data is protected in case of an outage or server damage. And if a disgruntled employee walks out with a server, I don’t want my data, or worse, my customers’ data, on it.

Tahoe takes the data, encrypts it, and breaks it up into 10 pieces (shares) that are distributed across separate nodes. Recreating the original data requires only 3 of those shares. The others can be lost, corrupted, or currently offline; as long as 3 of the 10 are safe, so is the data. When applied to the servers in a cloud, you can spread those 10 nodes across multiple sites so that an outage at one site won’t kill half your data. Plus, if one node gets hacked, the data on it is worthless: it’s an encrypted fragment, and even rebuilding the ciphertext requires shares from at least 2 other nodes.
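To make the "any 3 of 10" idea concrete, here is a toy erasure code. Tahoe itself uses its zfec library for this step; the sketch below is only an illustrative stand-in, assumed for this post: it works over the prime field GF(257) using polynomial interpolation, it skips the encryption step entirely, and the names `encode`/`decode` are hypothetical, not Tahoe's API. Note that in this simple layout shares 1 to 3 hold the raw input bytes directly, which is exactly why Tahoe encrypts before it splits.

```python
# Toy 3-of-10 erasure code over the prime field GF(257).
# Illustrative only: NOT Tahoe's zfec share format, and no encryption is applied.
P = 257  # prime; every byte value 0..255 is a valid field element


def _lagrange_at(points, x):
    """Evaluate the unique polynomial of degree < len(points) through `points` at x, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division in GF(P) is multiplication by the modular inverse (Fermat's little theorem).
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: bytes, k: int = 3, n: int = 10):
    """Split `data` into n shares such that any k of them rebuild it (hypothetical helper)."""
    padded = data + b"\x00" * (-len(data) % k)
    shares = {x: [] for x in range(1, n + 1)}  # share id -> list of field elements
    for off in range(0, len(padded), k):
        # Each group of k bytes fixes a polynomial through (1, b0), (2, b1), ..., (k, b_{k-1}).
        pts = [(i + 1, padded[off + i]) for i in range(k)]
        for x in shares:
            shares[x].append(_lagrange_at(pts, x))
    return shares, len(data)


def decode(available: dict, length: int, k: int = 3) -> bytes:
    """Rebuild the original bytes from any k surviving shares (hypothetical helper)."""
    chosen = list(available.items())[:k]
    if len(chosen) < k:
        raise ValueError(f"need {k} shares, got {len(chosen)}")
    out = bytearray()
    for col in range(len(chosen[0][1])):
        pts = [(x, values[col]) for x, values in chosen]
        # Re-evaluate the recovered polynomial at x = 1..k to get the original bytes back.
        out.extend(_lagrange_at(pts, i) for i in range(1, k + 1))
    return bytes(out[:length])


# Usage: 7 of the 10 nodes are gone, yet shares 2, 7 and 10 are enough.
shares, size = encode(b"customer ledger")
survivors = {x: shares[x] for x in (2, 7, 10)}
assert decode(survivors, size) == b"customer ledger"
```

Any 3 shares determine the same degree-2 polynomial per group of bytes, which is why it does not matter which 3 survive; with only 2 shares, `decode` refuses to guess.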

The real beauty comes in when you take the Tahoe software and use it to make your own distributed cloud onsite or among peers across the web.

Tahoe is being used in a number of different ways. A common configuration that is documented at the project’s wiki is described as a “friendnet”, a group of roughly ten nodes that are connected over the Internet and provide shared secure storage capacity with optional filesharing. Another potential usage scenario is installing Tahoe on individual workstations on an office network and using their excess disk capacity as a storage pool. The Tahoe wiki describes that kind of setup as a “hivecache”. (via Ars Technica)

Awesome!

Back to our imaginary corporation: let’s say we’re American Express. We have 10,000 workstations distributed across 8 work sites in the continental US (numbers are made up). Each workstation has a 320GB hard drive that is currently using 100GB (OS and business apps only; employees can’t add their own software or use more than their allotted 1GB of personal storage).

That’s 2.2 petabytes (10,000 × 220GB free) of unused potential storage on your internal protected network, purely from user workstations.

Now we install Tahoe and use that excess storage. First off, we don’t get the whole 2.2PB; there is obviously a lot of overhead to provide the 7:3 redundant-to-required ratio. With 3-of-10 encoding, every file takes up 10/3 of its size in raw disk, so at most 3/10 of the raw space is usable. Let’s assume a conservative 1/4 of it is available, giving us 550 terabytes of storage.
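The capacity estimate is simple arithmetic, sketched below using the made-up numbers from the American Express example. Assumptions: decimal units (1 PB = 1,000,000 GB), each workstation's spare space is its 320GB drive minus the 100GB in use, and Tahoe's default 3-of-10 encoding. The 1/4 figure is the post's own rule of thumb, not something Tahoe computes.

```python
# Back-of-the-envelope capacity for the imaginary 10,000-workstation "hivecache".
# Inputs are the post's made-up numbers; units are decimal (1 PB = 1,000,000 GB).
WORKSTATIONS = 10_000
DISK_GB = 320
USED_GB = 100
K, N = 3, 10  # Tahoe's default encoding: any 3 of 10 shares rebuild a file

free_gb = WORKSTATIONS * (DISK_GB - USED_GB)  # raw spare space across all machines
ceiling_gb = free_gb * K / N                  # each stored byte costs N/K bytes of raw disk
conservative_gb = free_gb / 4                 # the post's conservative 1/4 rule of thumb

print(f"raw spare:    {free_gb / 1e6:.1f} PB")          # 2.2 PB
print(f"3-of-10 max:  {ceiling_gb / 1e3:.0f} TB")       # 660 TB
print(f"conservative: {conservative_gb / 1e3:.0f} TB")  # 550 TB
```

The gap between the 660TB ceiling and the 550TB estimate leaves headroom for machines that fill up, reserved space, and Tahoe's own bookkeeping.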

The data can be Top Secret or Customer Confidential because it’s encrypted and each workstation only gets a portion of the info. So if you have a malicious employee (remember that most attacks come from within), they can’t recreate and access the database from the data stored on their own computer.

Also, many employees may turn off their computers when they go home. Tahoe only needs 3 of the 10 machines holding a given file’s shares to be on, so you don’t need everyone to leave their computer running, just enough that every file keeps at least 3 of its 10 shares reachable. And even if a backhoe cuts the building in Phoenix, AZ off from the rest of the world, everybody else can continue without interruption.
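Strictly speaking, the 3-of-10 rule applies per file, not to the workforce as a whole: a file is readable only if at least 3 of the specific machines holding its shares are on. Under a simplifying assumption that each machine is switched on independently with probability `p` (real office uptime is correlated, and Tahoe's repair process is not modeled), the per-file odds follow a binomial distribution. `file_availability` is a hypothetical helper for this sketch.

```python
from math import comb


def file_availability(p_online: float, k: int = 3, n: int = 10) -> float:
    """Probability that at least k of a file's n share-holding machines are online,
    assuming each machine is on independently with probability p_online."""
    return sum(
        comb(n, i) * p_online**i * (1 - p_online) ** (n - i)
        for i in range(k, n + 1)
    )


for p in (0.3, 0.5, 0.7):
    print(f"{p:.0%} of machines on -> {file_availability(p):.1%} of files readable")
```

With 30% of machines on, only about 62% of files would be readable at any moment; at 50% it is about 95%, and at 70% about 99.8%. That is why spreading shares across many always-on machines matters more than the headline ratio.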

This is super exciting technology for me; I’ve always daydreamed about what could be possible by combining P2P file distribution tech with encryption like this. I’d really like to see what a full-scale deployment could do and how much it would improve the ROI on an organization’s equipment and data.

How come I have to be a peon at the bottom of the ladder rather than one of the big CIOs who can make something like this happen? ;)
