Routes and Sources

The task id identifies the source through its route label (aka code), which is the first part of the id.

id: from_arno.ods_arno.school

The task above uses the route labelled "from_arno" (FYI: Arno is an operational system using an SQL database).
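A minimal sketch (Python, standard library only) of reading the route label out of a task id. It assumes the id is dot-separated and that the route label is always the first segment, as in the example above. `parse_task_id` and `TaskId` are hypothetical names, not part of DataPuller, and the remaining segments are kept without interpreting them.

```python
from typing import NamedTuple, Tuple


class TaskId(NamedTuple):
    route: str             # first segment: the route label (code)
    rest: Tuple[str, ...]  # remaining segments, kept uninterpreted here


def parse_task_id(task_id: str) -> TaskId:
    """Split a dot-separated task id; the first segment is taken as the route label."""
    parts = task_id.split(".")
    # Assumption: a valid id has a route label plus at least one more non-empty segment.
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"malformed task id: {task_id!r}")
    return TaskId(route=parts[0], rest=tuple(parts[1:]))


print(parse_task_id("from_arno.ods_arno.school").route)  # from_arno
```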

NB! Two routes may point to the same source. Routes can still be disabled separately, and they may have different access rights to that source.

A route must have:

Element        Purpose
label (code)   remains unchanged, so it can be used as a folder name in paths and as part of the task id
type           'sql', 'file' or 'api'
alias          points to the actual connection configuration for that source in the type file
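The three required elements can be sketched as a small validated record. This is an illustration only: the `Route` class is hypothetical, and the rule that a label may not contain "." or "/" is an assumption derived from its use as a task id prefix and as a folder name.

```python
from dataclasses import dataclass

ALLOWED_TYPES = {"sql", "file", "api"}


@dataclass(frozen=True)
class Route:
    label: str  # unchanging code: folder name and task id prefix
    type: str   # 'sql', 'file' or 'api'
    alias: str  # name of the entry in the matching type file

    def __post_init__(self) -> None:
        # Assumption: "." would break task id parsing, "/" would break folder paths.
        if not self.label or "." in self.label or "/" in self.label:
            raise ValueError(f"invalid route label: {self.label!r}")
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unknown route type: {self.type!r}")
        if not self.alias:
            raise ValueError("route alias must not be empty")

    @property
    def type_file(self) -> str:
        # The type decides which file (sql.yaml, file.yaml, api.yaml) holds the alias.
        return f"{self.type}.yaml"


print(Route(label="inner", type="sql", alias="dapu").type_file)  # sql.yaml
```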

The type files (sql.yaml, file.yaml, api.yaml) usually hold different values for the same sources, due to the different environments.

This keeps the declarative wishes separate from the actual connection parameters.

A route is added to the database the first time the registrar sees it.

For that, each route directory (e.g. "inner", "from_arno") must contain the general route definition in a file named "route.yaml".

# route.yaml for "inner"
name: Transformations inside target database
type: sql
alias: dapu # must be defined in the global sql.yaml file (which holds all metadata needed for the connection)

# route.yaml for "from_asjur5"
name: Old system Asjur ver 5 using Sybase ASA ver 9
type: sql
alias: asjur5 # must be defined in the global sql.yaml file (which holds all metadata needed for the connection)
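Discovering routes can be sketched as a directory scan. The sketch assumes that route directories sit directly under one project directory and that the directory name is the route label; a Python set stands in for the database table the registrar writes to. `find_routes` and `register_new_routes` are hypothetical helpers, not DataPuller's registrar.

```python
from pathlib import Path
from typing import Dict, List, Set, Union


def find_routes(project_dir: Union[str, Path]) -> Dict[str, Path]:
    """Map route label (= directory name, assumed) to its route.yaml file."""
    routes: Dict[str, Path] = {}
    for child in sorted(Path(project_dir).iterdir()):
        definition = child / "route.yaml"
        # Only directories that carry a route.yaml count as routes.
        if child.is_dir() and definition.is_file():
            routes[child.name] = definition
    return routes


def register_new_routes(project_dir: Union[str, Path], known: Set[str]) -> List[str]:
    """Return labels seen for the first time and remember them in `known`.

    `known` is only a stand-in for the route table in the database.
    """
    new = [label for label in find_routes(project_dir) if label not in known]
    known.update(new)
    return new
```

Calling `register_new_routes` twice on the same project returns an empty list the second time, which mirrors a route being added only the first time it is seen.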

Connections by type

Type-based source definition files must reside in the project (= target) directory.

If the project files are kept in a git repository, don't use real values; use placeholders that are replaced during build or deploy. For example, with GitLab CI/CD use GitLab's variable syntax and its per-environment CI/CD variables (Settings > CI/CD > Variables; you must be a Maintainer or Owner of the project).

Alternatively, use environment variable names surrounded by % and make sure those variables exist and have the right values.
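A sketch of expanding %-surrounded placeholders after a type file has been parsed. It assumes placeholders look like %NAME% where NAME is a conventional environment variable name, and it raises an error instead of silently substituting an empty value. `expand_placeholders`, `expand_config` and the variable name DAPU_PG_PASSWORD are hypothetical.

```python
import os
import re
from typing import Any, Mapping, Optional

# Assumed syntax: %NAME% where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every %NAME% in value with the value of environment variable NAME."""
    source = os.environ if env is None else env

    def lookup(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in source:
            # Fail loudly: a missing variable must not turn into an empty password.
            raise KeyError(f"environment variable {name!r} is not set")
        return source[name]

    return _PLACEHOLDER.sub(lookup, value)


def expand_config(node: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively expand placeholders in strings inside parsed YAML (dicts/lists)."""
    if isinstance(node, str):
        return expand_placeholders(node, env)
    if isinstance(node, dict):
        return {key: expand_config(val, env) for key, val in node.items()}
    if isinstance(node, list):
        return [expand_config(item, env) for item in node]
    return node  # numbers, booleans and None (e.g. "engine: ~") stay as they are


# Hypothetical variable name, for illustration only.
print(expand_placeholders("%DAPU_PG_PASSWORD%", {"DAPU_PG_PASSWORD": "s3cret"}))  # s3cret
```

Without the `env` argument the functions read `os.environ`, so the same parsed sql.yaml yields different values on each machine or pipeline.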

sql.yaml

- name: dapu
  driver: pg
  host: dwhhost
  port: 5432
  engine: ~
  database: dapu_dev1
  username: dapu_dev1_admin
  password: "****"
  extra:
    connect_timeout: 10 # seconds
    keepalives_idle: 120
    application_name: dapu
- name: asjur5
  driver: asa
  host: somehost
  port: 2638
  engine: someserver
  database: asjur
  username: "****"
  password: "****"
  extra:
    encoding: utf-8

file.yaml

Reserved for file-based (CSV, Excel) imports.

Draft:

- name: my_excel
  path: /some/path/to/files

Usage example: employees of one department produce Excel files at the end of each month and save them to a common network share (e.g. "h:\department1\monthly excel files"). This share is mapped (mounted) on the machine running DataPuller as /some/path/to/files.

This way DataPuller doesn't need to know any passwords or other "difficult" details. It simply assumes that the directory exists and looks for files there.

Remark: how DataPuller treats the files is a different story (each loading definition may have its own rules, e.g. delete the file after a successful load, or store its hash and compare it next time to detect changes).
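The "store its hash and compare next time" rule could be sketched like this. It assumes previous hashes are kept in a {file name: SHA-256 hex digest} mapping that is persisted somewhere between runs (how and where is not specified here); `changed_files` and `file_sha256` are hypothetical helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict, Union


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large Excel files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(directory: Union[str, Path], pattern: str,
                  previous: Dict[str, str]) -> Dict[str, str]:
    """Return {file name: hash} for matching files that are new or whose content changed."""
    changed: Dict[str, str] = {}
    for path in sorted(Path(directory).glob(pattern)):
        if not path.is_file():
            continue
        current = file_sha256(path)
        if previous.get(path.name) != current:
            changed[path.name] = current
    return changed
```

After a successful load, merging the returned mapping into the stored one means unchanged files are skipped on the next run.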

api.yaml

Reserved for HTTP APIs.