Map from hashes to block locators, list sorted by hash

Unique blocks are located in files and remain mutable

Decentralized Deduplication in SAN Cluster File Systems

For unique blocks, we simply store the i-number of some existing file and the offset of the block within the file.

[Slide figure: "The Index", a list of entries sorted by hash (0e7a26.., 15ba2b.., 6cd412.., ab7373.., bc6887.., c277d6.., d5e341.., f2a4d2.., f4bea9..), with entries labeled Unique and Shared.]

This is, of course, backed by some actual block stored somewhere on the disk, but it’s entirely up to the file system to resolve this block offset into a block address on the disk, just like it would for any regular file system access.

Shared blocks are somewhat more complicated. We can’t just use the block address of the block on disk because that actually causes all sorts of really hairy problems. And we don’t really want to point to all of the files that contain the block because those could change at any time without us knowing, which means that we’d have to go hunting for the shared block whenever we needed it.

A virtual arena stores COW references to all shared blocks

So, instead, we introduce a new file, which we call a virtual arena. The virtual arena is just a regular file, like any other file in the file system, except that it consists exclusively of copy-on-write references to blocks that already exist in other files. Thus, in some sense, it contains all of the shared data in the file system without consuming any space beyond the file system metadata that it needs.
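
A toy model may make the space claim concrete. The sketch below is not the file system's real implementation: `BlockStore`, `File`, and `add_cow_ref` are hypothetical names, and reference counting is just one assumed way to model copy-on-write sharing. It shows that adding a block to the arena allocates no new data block, and that a later write to the original file leaves the arena's copy untouched.

```python
class BlockStore:
    """Toy physical store: block address -> [data, reference count]."""

    def __init__(self):
        self.blocks = {}
        self.next_addr = 0

    def alloc(self, data):
        addr = self.next_addr
        self.next_addr += 1
        self.blocks[addr] = [data, 1]
        return addr

    def share(self, addr):
        self.blocks[addr][1] += 1
        return addr

    def release(self, addr):
        entry = self.blocks[addr]
        entry[1] -= 1
        if entry[1] == 0:
            del self.blocks[addr]


class File:
    """Toy file: maps block offsets to addresses in a BlockStore."""

    def __init__(self, store):
        self.store = store
        self.map = {}

    def read(self, offset):
        return self.store.blocks[self.map[offset]][0]

    def write(self, offset, data):
        old = self.map.get(offset)
        if old is not None and self.store.blocks[old][1] == 1:
            # Sole owner: the block is mutable and is updated in place.
            self.store.blocks[old][0] = data
            return
        if old is not None:
            # Shared block: copy-on-write, so drop our reference first.
            self.store.release(old)
        self.map[offset] = self.store.alloc(data)

    def add_cow_ref(self, offset, other, other_offset):
        # Point this file at a block that already exists in another file.
        # Assumes `offset` is not yet mapped in this file.
        self.map[offset] = self.store.share(other.map[other_offset])


store = BlockStore()
vm_disk = File(store)
arena = File(store)          # the virtual arena is just another file

vm_disk.write(0, b"A")
arena.add_cow_ref(0, vm_disk, 0)   # no new data block is allocated
vm_disk.write(0, b"B")             # triggers copy-on-write in vm_disk
```

After the last line, the arena still reads `b"A"` at offset 0, while `vm_disk` reads `b"B"` from a freshly allocated block.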

Now, whenever we need to refer to a shared block, it’s actually pretty simple. We just refer to it by its offset in the virtual arena. Again, the file system is responsible for resolving that offset into an actual block address. We never use the block addresses themselves. So, now we’ve got a way to gather the hashes of the blocks that we’ve modified. We’ve got a way to locate identical blocks.
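
The two kinds of block locator can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the on-disk format: the class names, the `ARENA_INODE` constant, and the use of short hex strings as hashes are all made up for the example. The point is that both locator kinds reduce to a (file, offset) pair that the file system resolves, never a raw disk address.

```python
import bisect
from dataclasses import dataclass
from typing import Optional, Union

ARENA_INODE = 1  # hypothetical i-number reserved for the virtual arena


@dataclass(frozen=True)
class UniqueLocator:
    inode: int    # i-number of some existing file holding the block
    offset: int   # block offset within that file


@dataclass(frozen=True)
class SharedLocator:
    arena_offset: int  # block offset within the virtual arena


Locator = Union[UniqueLocator, SharedLocator]


def locate(loc: Locator) -> tuple:
    """Reduce either locator kind to an (inode, offset) pair."""
    if isinstance(loc, SharedLocator):
        return (ARENA_INODE, loc.arena_offset)
    return (loc.inode, loc.offset)


class Index:
    """Map from hashes to block locators, kept sorted by hash."""

    def __init__(self):
        self._hashes = []
        self._locators = []

    def insert(self, h: str, loc: Locator) -> None:
        i = bisect.bisect_left(self._hashes, h)
        if i < len(self._hashes) and self._hashes[i] == h:
            self._locators[i] = loc      # replace the existing entry
        else:
            self._hashes.insert(i, h)
            self._locators.insert(i, loc)

    def lookup(self, h: str) -> Optional[Locator]:
        i = bisect.bisect_left(self._hashes, h)
        if i < len(self._hashes) and self._hashes[i] == h:
            return self._locators[i]
        return None

    def hashes(self) -> list:
        return list(self._hashes)
```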

Now we need a way to actually combine this all together and perform deduplication.

Here we have some host that’s running some virtual machine, which means that it has an exclusive lock on that virtual machine’s files. At some point it’s going to decide that it’s made enough modifications to the file system and it’s time to go forth and deduplicate.

At that point it picks up the write logs for the files that it has open—for this particular virtual machine—which contain the hashes of all recently modified blocks.

It sorts all those entries by hash. Now it’s going to simply walk down the index structure, walk down this sorted list of updates, and merge the two structures together.
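
The sort-then-merge step can be sketched as a single linear pass over two sorted lists. This is only an illustration: `merge_walk` is a hypothetical name, the index is simplified to `(hash, kind)` pairs, and the "unique"/"shared" labels in the sample data are invented rather than taken from the slides. Each update comes out tagged as "new" (not in the index), "unique", or "shared", which are the three situations the walkthrough goes on to handle.

```python
def merge_walk(index, write_log):
    """Merge a hash-sorted index with an unsorted write log.

    index:     list of (hash, kind) pairs sorted by hash, where kind is
               "unique" or "shared"
    write_log: list of (hash, block_ref) pairs in arbitrary order
    Yields (hash, block_ref, status), status being "new", "unique",
    or "shared".
    """
    updates = sorted(write_log, key=lambda entry: entry[0])
    i = 0
    for h, block_ref in updates:
        # Advance through the index; it is never rescanned from the start.
        while i < len(index) and index[i][0] < h:
            i += 1
        if i < len(index) and index[i][0] == h:
            yield h, block_ref, index[i][1]
        else:
            yield h, block_ref, "new"


index = [
    ("bc6887", "unique"),
    ("c277d6", "unique"),
    ("d5e341", "shared"),
    ("f2a4d2", "shared"),
    ("f4bea9", "unique"),
]
write_log = [("f2a4d2", 9), ("ab7373", 2), ("c277d6", 5)]

for h, block_ref, status in merge_walk(index, write_log):
    print(h, block_ref, status)
```

Because both lists are sorted, the index is read sequentially rather than probed at random, which is presumably why the log is sorted before the walk.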

So we start walking down and we see that our first hash occurs right at the beginning. And, actually, it’s not in the index yet, which means that it’s a new, unique block. There’s only a single reference to it.

So we add its hash to the index, along with a pointer to the block in our file. That’s all that we have to do: just a single record append to the index. We don’t have to modify any file system metadata. Furthermore, this block remains mutable; we can update it in place. However, that also means that our unique index entries are just hints. The block can be modified at any time, and the host doesn’t have to update the index to reflect its new hash, which means that every time we use one of these entries, we have to verify that it’s actually correct. This is how our index stays resilient to stale information.
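
Treating unique entries as hints amounts to re-reading the block and re-hashing it before trusting the entry. Here is a minimal sketch of that check; the function names and the `read_block` callback are hypothetical, and SHA-1 is chosen only for illustration, since the talk does not name the hash function at this point.

```python
import hashlib


def block_hash(data: bytes) -> str:
    # Illustrative choice of hash function, not confirmed by the talk.
    return hashlib.sha1(data).hexdigest()


def verify_unique_hint(expected_hash, read_block, inode, offset) -> bool:
    """Return True only if the block the hint points at still has the
    hash recorded in the index.

    read_block(inode, offset) is a hypothetical callback that returns the
    block's bytes, or None if the block no longer exists.
    """
    data = read_block(inode, offset)
    return data is not None and block_hash(data) == expected_hash


# Usage with an in-memory stand-in for the file system.
files = {(7, 0): b"hello"}
read = lambda inode, offset: files.get((inode, offset))

hint = block_hash(b"hello")
still_valid = verify_unique_hint(hint, read, 7, 0)

files[(7, 0)] = b"changed in place"   # the owner rewrites the block
now_stale = not verify_unique_hint(hint, read, 7, 0)
```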

Now we can continue moving down the index. We see that our next hash fits in here and it’s already in the index. In fact, it’s in the index as a unique block, which means that this is our first duplicate of this block.

Because it’s a unique block in the index, that means that it just points to some existing file. The problem is that that file might be locked by some other host, which means that, again, we can’t go mucking around with its metadata.

However, we know that these two blocks are duplicates, which means that we have to do something about it. Now, we do know that we’ve got an exclusive lock on the file we’re currently deduplicating, which means that we’re free to muck around with its metadata.

Then we can update the index to point to the virtual arena. Now we can post a merge request, telling the other host, “When you get around to it, I think I’ve got a duplicate block for you. Check it out and deal with it yourself.” At some point in the near future, presumably, that second host is going to check for any merge requests for files that it holds exclusive locks on, pick them up, and verify that the block is, in fact, still a duplicate; remember that the block was unique, so it could have changed. If it is still a duplicate, the host rewrites its pointer to point to the shared block. We’re still scanning the index, so let’s go back to that.
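
The receiving host's side of a merge request can be sketched as follows. Everything here is an assumption made for illustration: the `MergeRequest` fields, the callback names, and the choice to silently drop requests whose block has changed. The part taken from the talk is the order of operations: only handle requests for files this host has locked, verify the block is still a duplicate, and only then repoint it at the shared block.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class MergeRequest:
    inode: int         # file believed to contain the duplicate block
    offset: int        # block offset within that file
    block_hash: str    # hash the block had when the request was posted
    arena_offset: int  # where the shared copy lives in the virtual arena


def process_merge_requests(requests, locked_inodes, read_block, remap_to_arena):
    """Handle the requests that target files this host has locked.

    read_block(inode, offset) -> bytes or None, and
    remap_to_arena(inode, offset, arena_offset) are hypothetical callbacks.
    Returns (number merged, requests left for other hosts).
    """
    merged = 0
    remaining = []
    for req in requests:
        if req.inode not in locked_inodes:
            remaining.append(req)   # some other host owns this file
            continue
        data = read_block(req.inode, req.offset)
        if data is not None and hashlib.sha1(data).hexdigest() == req.block_hash:
            remap_to_arena(req.inode, req.offset, req.arena_offset)
            merged += 1
        # Otherwise the block changed after the request was posted; drop it.
    return merged, remaining
```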

We keep moving down the index and we see that our last hash fits in here. Again, it’s in the index already, but it’s a shared block. This makes things a lot easier.

We’ve already got it in the virtual arena. We already know that it’s copy-on-write.

And we’re done. And look! I’ve just saved a bunch of space on my storage array.
