They are happy to find out that the answer is yes: it is one of the core features that greatly simplifies the life of our users. In this post we will shed some light on how things work behind the scenes.
Let’s consider a simple project directory containing just two files, which needs to be synchronised between multiple desktop or mobile users:
- a QGIS project file my-project.qgz that sets up map layers, styling, …
- a GeoPackage file my-data.gpkg containing all GIS data
Our sample GIS data contains a tree survey table with the location, species and age of various trees:
When users edit data in my-data.gpkg, traditional cloud storage solutions (such as Dropbox, Google Drive, Microsoft OneDrive and others) simply upload the modified files. They do not understand the file content, though, so if two people modify the same file, they have no way of knowing how to merge the changes together. In the worst case, when two versions of the same file are uploaded, they keep just the version that was synchronised last. Slightly better, they create conflicting copies that need to be merged manually later. As one can imagine, merging and consolidating modifications from multiple GeoPackages back into one copy is a slow, tedious and error-prone job.
In contrast, the Mergin service has been designed to understand spatial data, especially GeoPackages, which are becoming the most popular format for storing vector and attribute data. This is thanks to the open source geodiff library that we have developed while working on Mergin.
Synchronising data using “diffs”
The first trick is that synchronisation of GeoPackage files between the Mergin server and clients (Input app, QGIS or other apps) only transfers the actual changes in tables (“diffs” in technical jargon).
Our Mergin project with the tree survey has been prepared and downloaded by users. Jack did a field survey and added or updated some rows in the survey table (changes highlighted in yellow and green):
After Jack presses the sync button, his changes are detected and uploaded to Mergin, encoded as a list of changes to the survey table:
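To make the idea of a diff concrete, here is a minimal Python sketch. It is not geodiff's implementation or API: it assumes each table snapshot is a plain dictionary mapping the row ID to a (species, age) tuple, and the function name `compute_diff` and the tree values are hypothetical. Deleted rows keep their old values and updates keep both old and new values, so that a change can later be checked or undone.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model of the tree survey table: row ID -> (species, age).
# A real GeoPackage stores these rows in an SQLite table, with a geometry column too.
Row = Tuple[str, int]
Table = Dict[int, Row]
Change = Tuple[str, int, object]  # (operation, row ID, payload)


def compute_diff(base: Table, modified: Table) -> List[Change]:
    """List the row-level changes that turn `base` into `modified`."""
    changes: List[Change] = []
    for row_id in sorted(base.keys() | modified.keys()):
        old, new = base.get(row_id), modified.get(row_id)
        if old is None:
            changes.append(("insert", row_id, new))
        elif new is None:
            changes.append(("delete", row_id, old))
        elif old != new:
            # Keep both values so the update can be verified or reverted later.
            changes.append(("update", row_id, (old, new)))
    return changes


# Hypothetical data: the table as downloaded, and Jack's copy after his survey.
base = {1: ("oak", 120), 2: ("birch", 35), 3: ("pine", 60)}
jack = {1: ("oak", 120), 2: ("birch", 40), 3: ("pine", 60), 4: ("beech", 15)}

jack_diff = compute_diff(base, jack)
print(jack_diff)
```

Only the two changed rows need to be uploaded here, not the whole GeoPackage file.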
Another user, Jill, also downloaded the tree survey project to her mobile device before Jack made his changes. When Jill synchronises the project to get the latest version, the changes uploaded by Jack are downloaded and applied to her local copy of the project, so she ends up with the same data as Jack.
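Applying a downloaded diff is the reverse operation. The sketch below is again a hypothetical Python model rather than geodiff's API: `apply_diff` replays each change on a copy of Jill's table and refuses a change whose expected starting state does not match her data. The `jack_diff` list is assumed sample data (one updated row, one inserted row).

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]
Table = Dict[int, Row]
Change = Tuple[str, int, object]  # (operation, row ID, payload)


def apply_diff(table: Table, changes: List[Change]) -> Table:
    """Return a new table with `changes` replayed on top of `table`."""
    result = dict(table)  # leave the caller's table untouched
    for op, row_id, payload in changes:
        if op == "insert":
            if row_id in result:
                raise ValueError(f"insert conflict: row {row_id} already exists")
            result[row_id] = payload
        elif op == "delete":
            if result.get(row_id) != payload:
                raise ValueError(f"delete conflict on row {row_id}")
            del result[row_id]
        elif op == "update":
            old, new = payload
            if result.get(row_id) != old:
                raise ValueError(f"update conflict on row {row_id}")
            result[row_id] = new
        else:
            raise ValueError(f"unknown operation: {op}")
    return result


# Hypothetical data: Jill's untouched copy and the diff Jack uploaded.
jill = {1: ("oak", 120), 2: ("birch", 35), 3: ("pine", 60)}
jack_diff = [("update", 2, (("birch", 35), ("birch", 40))), ("insert", 4, ("beech", 15))]

jill_synced = apply_diff(jill, jack_diff)
print(jill_synced)
```

The consistency checks are what make this safe only when Jill has no local edits of her own; once both sides have changed the table, the changes have to be merged first.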
At this point, the advantage of uploading/downloading only the changes in tables may not seem obvious, besides saving some network bandwidth. Read on to learn how this is used to support multi-user editing.
Merging changes from multiple users
So far we have assumed that Jill does not have any pending changes to sync, so that was easy. Now let’s assume that Jill has also made some changes on her device:
Here comes the trickier part: how do we merge the changes from Jack and Jill back into a single table?
In Mergin, cases that require merging changes from multiple users are handled by the “rebase” operation, a concept we have borrowed from version control systems for source code.
Let’s assume that Jack has synchronised his changes first. Later, when Jill synchronises her changes, a couple of things happen on her device before uploading the changes: Jill’s changes will be temporarily undone, Jack’s changes get applied, and finally Jill’s changes are re-applied after being rebased on top of Jack’s changes.
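The three steps can be sketched in Python as follows. This is a simplified model, not how geodiff or the Mergin clients are implemented: tables are dictionaries keyed by row ID, diffs are lists of (operation, row ID, payload) tuples, and the hypothetical `rebase` function here only accepts local changes to rows the server did not touch. All names and sample values are assumptions for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int]
Table = Dict[int, Row]
Change = Tuple[str, int, object]  # (operation, row ID, payload)


def apply_diff(table: Table, changes: List[Change]) -> Table:
    result = dict(table)
    for op, row_id, payload in changes:
        if op == "insert":
            result[row_id] = payload
        elif op == "delete":
            del result[row_id]
        else:  # "update": payload is (old row, new row)
            result[row_id] = payload[1]
    return result


def invert_diff(changes: List[Change]) -> List[Change]:
    """Build the diff that undoes `changes`: reverse order, opposite operations."""
    opposite = {"insert": "delete", "delete": "insert"}
    undo: List[Change] = []
    for op, row_id, payload in reversed(changes):
        if op == "update":
            old, new = payload
            undo.append(("update", row_id, (new, old)))
        else:
            undo.append((opposite[op], row_id, payload))
    return undo


def rebase(local: List[Change], remote: List[Change]) -> List[Change]:
    """Simplified rebase: refuses any local change to a row the server also changed."""
    touched_remotely = {row_id for _, row_id, _ in remote}
    for _, row_id, _ in local:
        if row_id in touched_remotely:
            raise NotImplementedError(f"edit conflict on row {row_id}")
    return local


def sync(local_table: Table, local_diff: List[Change], remote_diff: List[Change]) -> Table:
    base = apply_diff(local_table, invert_diff(local_diff))  # 1. undo Jill's edits
    with_remote = apply_diff(base, remote_diff)  # 2. apply Jack's edits
    return apply_diff(with_remote, rebase(local_diff, remote_diff))  # 3. re-apply Jill's


# Hypothetical data: Jack updated tree 2; Jill updated tree 3 and deleted tree 1.
jack_diff = [("update", 2, (("birch", 35), ("birch", 40)))]
jill_diff = [("update", 3, (("pine", 60), ("pine", 62))), ("delete", 1, ("oak", 120))]
jill_table = {2: ("birch", 35), 3: ("pine", 62)}

merged = sync(jill_table, jill_diff, jack_diff)
print(merged)
```

Had Jack and Jill touched the same row, this simplified `rebase` would raise; the following paragraphs describe how such edit conflicts are resolved instead.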
What does it mean to rebase someone’s changes? There are a couple of possible edit conflicts that can happen between rows of a database table with matching IDs (insert/insert, update/delete, delete/delete, update/update). These edit conflicts need to be resolved before Jill’s changes can be re-applied on top of Jack’s.
In our example, both Jack and Jill have added a row with ID = 4. This is not allowed, and therefore Jill’s new row ID will get changed to ID = 5 (any unused ID would do). As a result, here’s how the merged table will look at the end - combining changes of both users:
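The insert/insert case can be sketched as a small Python function, using the same hypothetical (operation, row ID, payload) model. Picking the highest known ID plus one is an assumption of this sketch; the text only requires some unused ID. The function name `rebase_inserts` and the sample rows are illustrative.

```python
from typing import Iterable, List, Tuple

Change = Tuple[str, int, object]  # (operation, row ID, payload)


def rebase_inserts(local: List[Change], remote: List[Change],
                   existing_ids: Iterable[int]) -> List[Change]:
    """Give local inserts a fresh ID when the server already has an insert with the same ID."""
    remote_inserts = {row_id for op, row_id, _ in remote if op == "insert"}
    taken = set(existing_ids) | remote_inserts | {row_id for _, row_id, _ in local}
    next_id = max(taken, default=0) + 1  # assumption: max + 1; any unused ID would do
    rebased: List[Change] = []
    for op, row_id, payload in local:
        if op == "insert" and row_id in remote_inserts:
            rebased.append(("insert", next_id, payload))
            next_id += 1
        else:
            rebased.append((op, row_id, payload))
    return rebased


# Hypothetical data: rows 1-3 existed before; Jack and Jill both inserted row 4.
jack_diff = [("insert", 4, ("beech", 15))]
jill_diff = [("insert", 4, ("maple", 8)), ("update", 3, (("pine", 60), ("pine", 62)))]

jill_rebased = rebase_inserts(jill_diff, jack_diff, existing_ids={1, 2, 3})
print(jill_rebased)
```

A fuller version would also renumber any later local updates or deletes that refer to the renamed row.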
If both Jack and Jill modified the same row (the update/update edit conflict), we can only accept one edit automatically. The conflicting edit of the other user is written to a special conflict file and uploaded to Mergin, so no data gets lost, and the conflict can be later inspected by the project admin. Fortunately, this kind of conflict does not happen often if the survey work is well planned to avoid users simultaneously modifying the same features within the GeoPackage data.
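The update/update case can be sketched the same way. The text does not say which of the two edits is accepted; this hypothetical sketch assumes the edit already on the server (Jack's) is kept and Jill's edit is recorded as a conflict. It compares whole rows, and the JSON layout is an invented stand-in, not the format of Mergin's actual conflict files.

```python
import json
from typing import List, Tuple

Change = Tuple[str, int, object]  # (operation, row ID, payload)


def rebase_updates(local: List[Change], remote: List[Change]):
    """Keep the server's update when both sides updated a row; record the local one as a conflict."""
    remote_updates = {row_id: payload for op, row_id, payload in remote if op == "update"}
    rebased: List[Change] = []
    conflicts = []
    for op, row_id, payload in local:
        if op == "update" and row_id in remote_updates:
            base_row, local_row = payload
            server_row = remote_updates[row_id][1]
            if local_row != server_row:  # identical edits are not a real conflict
                conflicts.append({"row_id": row_id, "base": base_row,
                                  "server": server_row, "local": local_row})
            # Either way, the local update is not re-applied.
        else:
            rebased.append((op, row_id, payload))
    return rebased, conflicts


# Hypothetical data: Jack and Jill both re-measured the age of tree 2.
jack_diff = [("update", 2, (("birch", 35), ("birch", 40)))]
jill_diff = [("update", 2, (("birch", 35), ("birch", 38))),
             ("update", 3, (("pine", 60), ("pine", 62)))]

jill_rebased, conflicts = rebase_updates(jill_diff, jack_diff)
conflict_json = json.dumps(conflicts, indent=2)  # hypothetical conflict-file content
print(jill_rebased)
print(conflict_json)
```

Because the losing edit is kept in the conflict record together with the base and server values, nothing is lost and an admin can later decide which value is correct.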
What if conflict files appear
There are some cases when automatic merging is not supported. In those cases, Mergin is unable to find out the details of changes in the data file(s) and has to resort to creating a conflicting copy, which gets uploaded to the Mergin project alongside the original data file(s). In particular, problems may appear when:
- A format other than GeoPackage is used for data storage (e.g. shapefiles)
- The database or table structure is changed (e.g. new columns or new tables are added)
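Mergin's own logic for deciding when to fall back is not described here, so the following is only a hypothetical pre-check that encodes the two cases above. Because a GeoPackage is an SQLite database, it can use Python's standard `sqlite3` module to compare the `CREATE TABLE` statements stored in `sqlite_master`. The function names and the demo files are assumptions for illustration; the demo uses plain SQLite files with a `.gpkg` extension rather than full GeoPackages.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def table_schemas(path):
    """Return {table name: CREATE TABLE statement} from an SQLite/GeoPackage file."""
    with closing(sqlite3.connect(path)) as con:
        rows = con.execute("SELECT name, sql FROM sqlite_master WHERE type = 'table'").fetchall()
    return dict(rows)


def can_auto_merge(base_path, local_path):
    """Hypothetical check: diff-based merging only for GeoPackages with an unchanged schema."""
    if Path(base_path).suffix.lower() != ".gpkg" or Path(local_path).suffix.lower() != ".gpkg":
        return False  # e.g. shapefiles: fall back to a conflicting copy
    return table_schemas(base_path) == table_schemas(local_path)


# Demo with throwaway SQLite files standing in for GeoPackages.
with tempfile.TemporaryDirectory() as tmp:
    paths = {name: str(Path(tmp) / f"{name}.gpkg") for name in ("base", "edited", "altered")}
    for p in paths.values():
        with closing(sqlite3.connect(p)) as con:
            con.execute("CREATE TABLE trees (fid INTEGER PRIMARY KEY, species TEXT, age INTEGER)")
            con.commit()
    with closing(sqlite3.connect(paths["altered"])) as con:
        con.execute("ALTER TABLE trees ADD COLUMN height REAL")  # a structure change
        con.commit()
    results = {
        "same schema": can_auto_merge(paths["base"], paths["edited"]),
        "new column": can_auto_merge(paths["base"], paths["altered"]),
        "shapefile": can_auto_merge("trees.shp", "trees.shp"),
    }
print(results)
```

SQLite rewrites the stored `CREATE TABLE` text when a column is added, which is why a plain string comparison is enough to notice the new column in this sketch.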
In the future, these limitations may be removed, but at this point it is good to keep them in mind.
If you plan to change the structure of the survey tables and the project is already being used on multiple devices, it may be a good idea to create a new Mergin project with the modified database structure and instruct users to switch to it. Otherwise, conflict files may keep popping up as long as some users have an older version of the project, adding more manual work to collate the data.