The merge process failed because it detected a mismatch between the replication metadata of the two replicas, such that some changes could be lost leading to non-convergence. This could be due to the subscriber not having synchronized within the retention period, or because of one of the replicas being restored to a backup older than retention period, or because of the publisher performing more aggressive cleanup on articles of type download-only and articles with partition_options = 3
The effect was that all data seemed to come down, but the result from the synchronisation attempt was failure. A bit of googling turned up an incredibly similar issue where the partion_options=3. There is a hotfix available on request for this exact issue. Helpfully microsoft say "change this", but leave you with the usual link-to-link hell of trying to find a posting in msdn that actually tells you how to make the change.
Now in our case we were actually using partition_options=0, but I found that following the process below fixed it for me:
- Set partion_options=1
To do this you will need to use partition groups. Execute the following SQL against your publication database:
sp_changemergepublication @publication='MyPublication', @force_invalidate_snapshot=1, @property='use_partition_groups', @value='true' - Then, either run stored procedure statements, or as I did, through Management Studio -> Replication -> Local Publications -> "My Publication" -> Properties -> Articles -> Article Properties -> Set Properties of all article tables
Change the "Partition Options" from "Overlapping" to "Overlapping, disallow out-of-patition data changes" - Reinitialise the snapshot
- Change the partition options back to overlapping
- Run sp_changemergepublication @publication='MyPublication', @force_invalidate_snapshot=1, @property='use_partition_groups', @value='false'
- Reinitialise the snapshot.
Synchronisation seems to have worked fine from this point. As this was our UAT environment there was little actual harm done, but you'd probably want to take precautions in a live environment - as unmerged data on local devices would be lost in the above process.