Tuesday, February 25, 2014

Mobile & Synchronisation - part 2 / 3

Last blog post I talked about the different approaches to mobile synchronisation. In this article, I will focus on synchronisation without prior context. With two data sets, let's say my local data set (data on my offline mobile) and a remote data set (data from the remote server, changed by other users of my mobile app), that are out of sync after an offline period. I want to be able to get a difference from those two. In this approach, we don't have any prior context. Information like when was the last synchronisation, what happened during offline (logs etc...) is not available.

Bloom Filter

A Bloom filter is a data structure designed to tell you, rapidly and memory-efficiently, whether an element is present in a set. You basically compare 2 hashed codes, because of the nature of hash function you can tell an element is either definitely not in the set or may be in the set.

Invertible Bloom Filter (IBF): Let Magic Happen

For the purpose of synchronisation, we want to know which elements were added and which ones were removed. Replacing a set of hashed values by a more complete data structure we can achieve a clever filter. Instead of storing hashed values, we can use Bucket structure adding information like number of items, key of the item, and hashed value of the item. Using the magic XOR operator with its reversibility nature. We can xor a sum of element together.

xor 2 elements together, xor again and you get our initial element!

Hashing into buckets

In a bucket, the number of items is increased by one, each time we add an item in the bucket. The key is xored with the other keys. A hash function is used to hash the key into an hashed value, itself xored to the hashed sum.

With this same technique, you can spread hashed items into different buckets increasing odds to compare them. The bucket distribution should be uniform and reproducible on both sides (local client and remote server in our exemple).



Using recursive approach, incrementing bucket numbers each time, we build buckets set locally and remotely. With 2 sets, we compute the difference subtracting number of items. In the difference, if the count of elements is 1, we know one element was added (similarly if the count is -1, one element was removed). From the key element, we can hash it, xor it to the hash sum bucket field and get the hashed sum without it (because of the xor reversibility).

Compute difference

The whole purpose of the algorithm is to get a difference with number of item -/+1. You can iterate until you get to that point increasing by 2^level.



How to treat difference

Let's say, I have an item "green circle" that was hashed and spread into 3 different buckets (3 being the default number of hashed functions used for distributing items into buckets). With this difference set, I'm looking for item number equal to 1. I know the green circle was added, so I can removed it from the buckets index 1, 4 and 6 and add it to the set of added items. In index 2, the number of items is 2, so we carry on to index 3. I can remove blue circles from index 3, 4 and 5. etc... At the end of iteration, if I get an empty difference set, I've retrieved of differences. If the set is not empty, I don't have enough information so I need to increase numbers of buckets.



Implementation

Using "What’s the Difference? Efficient Set Reconciliation without Prior Context" paper, 3musket33rs libraires implements the IBF algorithm in Java, JavaScript and iOS.

The different players: Resolver is responsible of computing difference, increasing number of bucket (level) as needed. Set of buckets are called Summary. Summary comparaison produces Difference which hold removed and added set. BucketSelector produces a set of hash functions to uniformly distribute items with buckets. Bucket is the data structure to hold number of items, xored key and xored hashed value.

Want to see it in action? See my next blog post for an example involving JavaScript backend and iOS client.

Monday, February 24, 2014

Mobile & Synchronisation - part 1 / 3

Last meet-up @JSSophia, we talked about synchronisation. It's an awesome subject that I like to chat, in more details, about it. My blog posts are related to some work done with my 3musket33rs friends, but also discussions ongoing in the AeroGear project mailing list.

When developing your cloud connected mobile app, you can't take for granted that user will be connected all the time. Supporting offline mode is crucial. Obviously, you need to consider what happen when back online. Let's talk about synchronisation.

Back online: Mobile Sync

User stores data locally on device for offline support. User pushes its data when back online. Potentially, there are conflicts to be resolved. Conflict resolution is really business focus. Depending on your app, you may choose deal with conflict very basically: reject them, deal with then automatically(providing patch), propose fine grained merge popup etc...

Once user's own conflicts are resolved, data reconciliation (synchronisation with other people data) can happen. But depending on your use case (how big is your changes, how much data), you have several options...

Different options for Synchronisation


Whole sale transfer: the simplest one. Client ask for reconciliation. The whole data structure is sent over the network. "Just send everything", this is great to build a prototype before moving on more complex algorithm or if there is only a short list of items/data.

Time based approach: log based. Client side and Server side, you need to track all changes. An audit log module is needed to record logs. An increasing timestamp or numeric log (be cautious with clock difference always use the same referencial to create check point) is added to each data items when changed. Client asks for reconciliation since its last reconciliation (last checkpoint).

Math Algorithm: without prior context. No prior assumption, you don't log changes. You send grouped/hashed table of your items and you compute the difference between client and server set. One algorithm can be Efficient set Reconciliation without prior context.

To be continued... Next part, I will blog in more details about the mathsync algo and its 3musket33rs implementation.



JSSophia making the show!

Last Thursday, JSSophia, we had fun. Lots of new faces, packed room (audience record, yes man) and last but not least I wasn't the only female dev in the room (Hoorray!!).



Our evening started with a short introduction on the e-learning products by CrossKnowledge (special thanks for hosting our event). Cool demos and even some glance at source code because at JSSophia, it's geeks talking to geeks ;) Interesting to know that CrossKnowledge is recruiting at fast pace of one person a month. Hey networking is also one of the mission of user groups.

And during this introduction lots of idea for new future presentation like HTML5 and accessibility, unit test running on phantom, TypeScript... I've posted those brainstormed ideas in our JSSophia presentation proposal, feel free to add more description to them and even new ideas.

Mathieu made the show presenting "World in conflict!" with a roman costume. Small break, and Maxime gives us in less than 1024 words an entertaining presentation of JS golfing. Last but not least, Yacine (who came from London just for JSSophia ;0 ) carried on the show with a very interactive presentation of Back end as a service. Evening finished around a beer at Green King. Perfect day!

Next not-to-be-missed JSSophia event will be hosted by Amadeus the thursday 27th March. Patrick Brosset, from Mozilla, will tell all about browsers dev tools. See you there!