**local**data set (data on my offline mobile) and a

**remote**data set (data from the remote server, changed by other users of my mobile app), that are out of sync after an offline period. I want to be able to get a difference from those two. In this approach, we don't have

**any prior context**. Information like when was the last synchronisation, what happened during offline (logs etc...) is not available.

## Bloom Filter

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. You basically compare 2 hashed codes, because of the nature of hash function you can tell an element is either*definitely not in*the set or

*may be in*the set.

## Invertible Bloom Filter (IBF): Let Magic Happen

For the purpose of synchronisation, we want to know which elements were added and which ones were removed. Replacing a set of hashed values by a more complete data structure we can achieve a clever filter. Instead of storing hashed values, we can use**Bucket**structure adding information like

**number of items**,

**key of the item**, and

**hashed value**of the item. Using the magic XOR operator with its

**reversibility**nature. We can xor a sum of element together.

xor 2 elements together, xor again and you get our initial element!

**Hashing into buckets**

In a bucket, the number of items is increased by one, each time we add an item in the bucket. The key is xored with the other keys. A hash function is used to hash the key into an hashed value, itself xored to the hashed sum.

With this same technique, you can spread hashed items into different buckets increasing odds to compare them. The bucket distribution should be uniform and reproducible on both sides (local client and remote server in our exemple).

Using recursive approach, incrementing bucket numbers each time, we build buckets set locally and remotely. With 2 sets, we compute the difference subtracting number of items. In the difference, if the count of elements is 1, we know one element was added (similarly if the count is -1, one element was removed). From the key element, we can hash it, xor it to the hash sum bucket field and get the hashed sum without it (because of the xor reversibility).

**Compute difference**

The whole purpose of the algorithm is to get a difference with number of item -/+1. You can iterate until you get to that point increasing by 2^level.

**How to treat difference**

Let's say, I have an item "green circle" that was hashed and spread into 3 different buckets (3 being the default number of hashed functions used for distributing items into buckets). With this difference set, I'm looking for item number equal to 1. I know the green circle was added, so I can removed it from the buckets index 1, 4 and 6 and add it to the set of added items. In index 2, the number of items is 2, so we carry on to index 3. I can remove blue circles from index 3, 4 and 5. etc... At the end of iteration, if I get an empty difference set, I've retrieved of differences. If the set is not empty, I don't have enough information so I need to increase numbers of buckets.

## Implementation

Using "What’s the Difference? Efﬁcient Set Reconciliation without Prior Context" paper, 3musket33rs libraires implements the IBF algorithm in Java, JavaScript and iOS.The different players:

**Resolver**is responsible of computing difference, increasing number of bucket (level) as needed. Set of buckets are called

**Summary**. Summary comparaison produces

**Difference**which hold removed and added set.

**BucketSelector**produces a set of hash functions to uniformly distribute items with buckets.

**Bucket**is the data structure to hold number of items, xored key and xored hashed value.

**Want to see it in action?**See my next blog post for an example involving JavaScript backend and iOS client.

Tweet

## No comments:

## Post a Comment

Note: Only a member of this blog may post a comment.