We developped a tool to manage data history, branches, and to work offline with your PostGIS-stored data and QGIS. Read more to get the insight of QGIS Versioning plugin.
The QGIS plugin is available in QGIS plugin repository, and you can fork it on GitHub too !
Even if the necessity of data versioning often arises, no standard solution exist for databases.
Unfortunately those solutions were not adapted to the specific constrains of this project, namely: using a PostGIS database as the main repository (excludes GeoGit) and the ability to working off-line (excludes pgversion).
The project we developed QGIS/PostGIS versioning looks like the following.
The database is stored in a PostGIS schema, the complete schema is versioned (i.e. not individual tables). Revisions are identified by a revision number. A revision table in the versioned schema, called 'revisions', keeps track of the date, author, commit message and branch of all revisions.
Once a table structure is defined, three operations can be performed on rows: INSERT, DELETE and UPDATE. To be able to track history, every row is kept in the tables. Deleted rows are marked as such and updated rows are a combined insertion-deletion where the deleted and added rows are linked to one another as parent and child.
A total of five columns are needed for versioning the first branch:
- PRIMARY KEY
- a unique identifier across the table
- revision when this record was inserted
- last revision for which this record exist (i.e. revision when it was deleted minus one)
- in case the row has been inserted as the result of an update, this fields stores the hid of the row that has been updated
- in case the row has been marked as deleted as the result of an update, this field stores the hid of the row that has been inserted in its place.
For each additional branch, four additional columns are needed (the ones with the prefix branch_).
- If the branch_rev_begin is null, it means that a row belongs to another branch.
SQL views are used to see the database for a given revision number. If we note 'rev' the revision we want to see. For each table, the condition for a row to be present is the view is:
(branch_rev_end IS NULL OR branch_rev_end >= rev) AND branch_rev_begin <= rev
In the special case of the current revision, or head revision, the condition reads:
branch_rev_end IS NULL AND branch_rev_begin IS NOT NULL
- Since elements are not deleted (but merely marked as such) from an historized table, care must be taken with the definition of constrains, in particular the conceptual unicity of a field values.
Withing the PostGIS database, the views on revisions must be read-only and historized tables should not be edited directly. This is a basic principle for version control: editions must be made to working copies an then committed to the database. Please note that by default PostGIS 9.3 creates updatable views.
This setup allows for multiple users to use and edit data offline from a central repository, and commit their modifications concurrently.
Two kinds of working copies are available:
- SpatiaLite working copies
- They are meant to be used off-line. They consist of the versioned tables of a given versioned database (i.e. PostGIS schema) or any subset. For each table, only the elements that have not been marked as deleted in the head revision need to be present. Furthermore only a subset of the elements the user needs to edit can be selected (e.g. a spatial extend). To create a working copy (i.e. to checkout), tables from the versioned schema (or the aforementioned subsets) are converted to a SpatiaLite database using ogr2ogr.
- PostGIS working copies
- They are meant to be used when the connection to the original database will remain available. They are quite similar to pgversion working copies since they only store differences from a given revision (the one checked out).
The following description is aimed at understanding the inner workings of the qgis versioning plugin. The user does not need to perform the described operations manually.
For each versioned table in the working copy, a view is created with the suffix _view (e.g. mytable_view). Those views typically filters out the historization columns and shows the head revision. A set of triggers is defined to allow updating on those views (DELETE, UPDATE and INSERT).
The DELETE trigger simply marks the end revision of a given record.
The INSERT trigger create a new record and fills the branch_rev_begin field.
The UPDATE trigger create a new record and fills the branch_rev_begin and branch_parent fields. It then marks the parent record as deleted, and fills the branch_rev_end and branch_child fields.
Updating the working copy
Changes can be made to the database while editing the working copy. In order to reconcile those edition, the user needs to update the working copy.
When updating, a set of records can be in conflicts: the records for which the end revision has been set since the initial checkout or last update if any.
Multiple editions can be made to the same record. Therefore the child relation must be followed to the last child in order to present tu user with the latest state of a given conflicting feature.
Conflicts are stored in a table and identified with a conflict id and the tag 'theirs' or 'mine'. A DELETE trigger on this table is used for conflict resolution. On deletion of 'mine', the working copy edition is discarded, on deletion of 'theirs' the working copy edition is appended to the feature history (i.e. the working copy feature becomes a child of the last state of the feature in the historized database).
Committing the editions to the versionned database
If a working copy is up to date, the editions can be integrated in the versioned database. This operation consists simply in the insertion of a record in the revisions table, and, for each versioned table, the update of rows that are different and inserting rows that are not present.
A branch can be created from any revision by adding the four history columns and setting the branch_rev_begin field of features that are present in their revision.
Plugin interface tutorial
Groups are used for all versioning operations in QGIS since the versions are for a complete PostGIS schema or SpatiaLite database. The versioning toolbar will change depending on the group selected in the QGIS legend.
- The group elements must share the same connection information (i.e. share the same database and schema for PostGIS working copies and revision views or share same SpatiaLite database for SpatiaLite working copies).
Versioning a PostGIS schema
Starting with an unversioned database, import a number of layers from a schema that needs to be versioned into a QGIS project. Once the layers are imported, they must be grouped together.
Selecting the newly created group will cause the versioning toolbar to display the historize button (green V). On click a confirmation is requested to version the database schema.
The versioned layers are imported in a new group and the original layers are removed from the project.
- The symobology is not kept in the process.
Working with a versioned PostGIS schema
Versioned layers can be imported in QGIS. The layers must be from a head revision or a view on any revision.
Once the layers are in QGIS, they must be grouped.
For PostGIS groups at head revision, the versioning plugin allows the user to create a SpatiaLite or a PostGIS working copy, create a view on a given revision or create a branch. A corresponding group will be imported in QGIS.
If the user chooses to create a SpatiaLite working copy, he will be asked to select a file to store the working copy.
Once the working copy is imported in QGIS, the user can start edition of its layers. With SpatiaLite working copies, this edition can be done off-line.
When the user is done with edition, he can commit the changes to the database and if commit is feasible (i.e. the working copy is up to date with the versioned database), he will be prompted for a commit message and subsequently be informed of the revision number he committed.
If the commit is not feasible, the user will be informed that he must update his working copy prior to commit.
Conflicts are detected during update, the user is informed, and conflicts layers are imported into QGIS.
To resolve conflicts, the user can open the conflict layer's attribute table. Selected entries are also selected on the map canvas and the user can decide which version, either his or the database's, he wants to keep. User version is tagged with 'mine' and database version with 'theirs'. The conflict is resolved by deleting the unwanted entry in the conflict layer.
- On deletion of one conflict entry, both entries are removed (by a trigger) but the attribute table (and canvas) are not refreshed. As a workaround, the user can close and re-open the attribute table to see the actual state of the conflict table.
Once the conflict table is empty, the commit can be done.
Due to design choices and tools used for conversion to SpatiaLite, a number of restrictions apply to the versioned database:
- schemas, tables and branch names should not have space, caps or quotes
- tables must have primary keys
- columns are lowercase (because of conversion to SpatiaLite) but can have spaces (not that it's recommended)
- geometry column is geom in PostGIS, GEOMETRY in SpatiaLite
- Do not edit OGC_FID or ROWID
- The constrains on the tables are be lost in the PostGIS to SpatiaLite conversion.
The conflict layer won't be loaded automatically is it has no geometry. The user will have to load it manually.