Target database versioning

What it is and what it is not

What it is

Target database (PostgreSQL) needs some (small) adjustments according to local customs and business rules.

In general we can call them "reusable stuff", eg creating new schema for serveral loading tasks (all target tables are inside some schema, don't use "public" schema!)

  • creating of schemas with access rules
  • creating in-house generally good functions taht can be used in several loading tasks (in SQL)

Best practices until now are to create schemas for each source database(schema) and schemas for each analytical domain (minimum is one in case you have "all-and-every analytical purpose"). Analytical schemas are meant to control access and thus separate data tables into schemas there every single schema is self-sufficient. Keep in mind that same source data can be loaded into different schemas. Duplications is not nice practice in terms of relational schema and normalizing in mind. Do it if analytical schema can be thinked about as separate database/dataset.

Source schemas (schemas for source data) is adviseable to prefix similar way. We have user "ods_" prefix to mark them as "oprational data storage (in data warehouse)". In our minds ODS-schemas will contain data near to 1:1 to actual source database.

NB! If you source database is Postgres database where different schemas are used (eg "public", "finances", "sales", "web") then create schemas in target database for every needed schema because devs of original system may have used same table names in different schemas (you can rename them to put into one target schema but it may arise more confusion). As well it makes data flow more easy to understand.

What it is NOT

Don't use target versioning for creating the target tables. Each table must be controlled by its own laoding process.

Don't use it for one-time manual commands that are different in different environments (adding dev account for dev database in manual action done by DBA, same action don't need to propagate to production database).

Don't do any changes into "meta" schema. This is loading system onw schema, don't mess with it.

How

Project folder "src/ver" must include SQL files (with command like "create schema", "create fuction", "comment", "grant" etc).

These file may have thatever names, but for reasonable handling some good system needs to worked out. If you don't have own system nor you don't take time to work it out, use our. Our naming system is:

  • file name starts with letter (we use "v" as versioning, and this letter is after letter "c")
  • then 3-digits (0-padded) for order and underscore ("_") so text based ordering of files is actual order in human mind
  • then short keyword description
  • and extension (".sql")

v001_codes.sql may point out that here we create schema for codificators

├── ver
│       ├── changes.yaml
│       ├── v001_codes.sql
│       └── v002_asjur5.sql

Keep in mind that file names for ordering are purely for human. Versioning system don't relay of naming convension. For versioning system you must give changes.yaml where actual order is given.

alias: dev1_tux24 # use (code)name of your project
files: 
 - file: v001_codes.sql
   remark: Create schema for hard-coded codes-codificators
 - file: v002_asjur5.sql
   remark: Create schema for loading data from system asjur5

"alias" allows maintain multiple versioning in one metatable. Usually there are TWO versioning needs: loading software core aka very general versioning (meta-tables for maintaining processes) and your custom versioning (creating your schemas).

Under the sheets (in stomach, behind the curtains)

Core versioning creates schema "meta" and table "version" in it.