Routes and Sources
The task id identifies the source through its route label (also called the route code).
id: from_arno.ods_arno.school
The task above uses the route labelled "from_arno" (FYI: Arno is an operational system that uses an SQL database).
NB! Two routes may point to the same source. Routes can still be disabled separately, and each route may have different access rights to that source.
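The link between a task id and its route can be sketched as follows. This is a minimal illustration, assuming the route label is the part of the task id before the first dot; the function name `route_label` is hypothetical and not taken from the project.

```python
def route_label(task_id: str) -> str:
    """Return the route label, assumed to be the first dot-separated part of the task id."""
    label, sep, rest = task_id.partition(".")
    if not label or not sep or not rest:
        raise ValueError(f"task id {task_id!r} does not look like '<route>.<...>'")
    return label


print(route_label("from_arno.ods_arno.school"))  # from_arno
```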
A route must have:
| Element | Purpose |
|---|---|
| label (code) | stable identifier that never changes, so it can be used as a folder name in paths and as part of the task_id |
| type | one of 'sql', 'file', 'api' |
| alias | name of the entry in the type file (sql.yaml, file.yaml or api.yaml) that holds the actual connection configuration for that source |
The type files (sql.yaml, file.yaml, api.yaml) usually hold different values for the same source in different environments.
This separates the declarative intent (the route) from the actual connection parameters.
A route is added to the database the first time the registrar sees it.
For that, each route directory (e.g. "inner", "from_arno") must contain the general route definition in a file named "route.yaml".
# route.yaml for "inner"
name: Transformations inside target database
type: sql
alias: dapu # Must exist in the global sql.yaml file (which holds all metadata needed for the connection)
# route.yaml for "from_asjur5"
name: Old system Asjur ver 5 using Sybase ASA ver 9
type: sql
alias: asjur5 # Must exist in the global sql.yaml file (which holds all metadata needed for the connection)
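The required elements of a route can be checked with a few lines of code. The sketch below is an assumption about how such a check might look, not the registrar's actual logic; `check_route` and `ALLOWED_TYPES` are hypothetical names, and the definition is passed as a dict as if route.yaml had already been parsed.

```python
ALLOWED_TYPES = {"sql", "file", "api"}


def check_route(label: str, definition: dict) -> list:
    """Return a list of problems found in a route definition (empty list means OK)."""
    problems = []
    if not label:
        problems.append("label (directory name) is empty")
    route_type = definition.get("type")
    if route_type not in ALLOWED_TYPES:
        problems.append(f"type must be one of {sorted(ALLOWED_TYPES)}, got {route_type!r}")
    if not definition.get("alias"):
        problems.append("alias is missing")
    return problems


# Mirrors the route.yaml for "inner", as if already parsed into a dict.
inner = {"name": "Transformations inside target database", "type": "sql", "alias": "dapu"}
print(check_route("inner", inner))  # []
```

Returning a list of problems instead of raising on the first one lets a caller report everything that is wrong with a route directory in a single pass.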
Connections by type
Type-based source definition files must reside in the project (= target) directory.
If the project files are kept in a git repo, don't store actual values; use placeholders that are replaced during build or deploy. For GitLab CI/CD, for example, use GitLab variable syntax together with GitLab's per-environment variables (Settings > CI/CD > Variables; requires the Maintainer or Owner role in the project).
Alternatively, use environment variable names surrounded by % and make sure the variables exist and hold the right values.
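Replacing %-surrounded names with environment values could look roughly like this. It is a sketch under assumptions: the exact placeholder grammar is not specified here, so the pattern below (letters, digits and underscores between two % signs) and the names `expand_placeholders` and `DAPU_DB_PASSWORD` are hypothetical.

```python
import os
import re

# Assumed placeholder form: %NAME%, where NAME is a typical environment variable name.
PLACEHOLDER = re.compile(r"%([A-Za-z_][A-Za-z0-9_]*)%")


def expand_placeholders(text: str, env=None) -> str:
    """Replace every %NAME% with the value of NAME; fail loudly if NAME is not set."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return PLACEHOLDER.sub(replace, text)


sample_env = {"DAPU_DB_PASSWORD": "s3cret"}  # hypothetical variable name and value
print(expand_placeholders('password: "%DAPU_DB_PASSWORD%"', sample_env))
# password: "s3cret"
```

Failing on a missing variable, rather than leaving the placeholder in place, surfaces a misconfigured environment before any connection attempt is made.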
sql.yaml
- name: dapu
  driver: pg
  host: dwhhost
  port: 5432
  engine: ~
  database: dapu_dev1
  username: dapu_dev1_admin
  password: "****"
  extra:
    connect_timeout: 10 # seconds
    keepalives_idle: 120
    application_name: dapu
- name: asjur5
  driver: asa
  host: somehost
  port: 2638
  engine: someserver
  database: asjur
  username: "****"
  password: "****"
  extra:
    encoding: utf-8
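A route's alias is matched against the `name` of an entry in the type file. A minimal sketch of that lookup, assuming sql.yaml has already been parsed into a list of dicts (the function name `find_connection` is hypothetical and the entries are shortened):

```python
def find_connection(entries: list, alias: str) -> dict:
    """Return the single entry whose name equals the route alias."""
    matches = [entry for entry in entries if entry.get("name") == alias]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one entry named {alias!r}, found {len(matches)}")
    return matches[0]


# Entries as they might look after parsing sql.yaml (only some keys shown).
sql_entries = [
    {"name": "dapu", "driver": "pg", "host": "dwhhost", "port": 5432},
    {"name": "asjur5", "driver": "asa", "host": "somehost", "port": 2638},
]
print(find_connection(sql_entries, "asjur5")["driver"])  # asa
```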
file.yaml
Reserved for file-based (csv, excel) imports.
Draft:
- name: my_excel
  path: /some/path/to/files
Usage sample: the employees of one department produce Excel files at the end of each month and save them to a common network share (e.g. "h:\department1\monthly excel files"). This share is mapped (mounted) on the machine running DataPuller as /some/path/to/files.
This way DataPuller doesn't need to know any passwords or other "difficult" details. It just assumes that the directory exists and looks for files there.
Remark: how Puller treats the files is a different story. Every loading definition may have its own rules, e.g. delete the file after a successful load, or compute a hash and compare it next time to see whether the file has changed.
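The "hash and compare" rule mentioned in the remark could be sketched like this. It only illustrates the idea and is not Puller's implementation: the choice of SHA-256, the function names, and keeping known digests in a plain dict keyed by file name are all assumptions.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(directory: Path, known: dict) -> list:
    """Return files that are new or whose digest differs from `known`; update `known` in place."""
    changed = []
    for path in sorted(directory.iterdir()):
        if not path.is_file():
            continue
        digest = file_digest(path)
        if known.get(path.name) != digest:
            known[path.name] = digest
            changed.append(path)
    return changed
```

A caller would keep `known` between runs (for example in the target database) and pass the mounted directory, such as `Path("/some/path/to/files")`, on each run.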
api.yaml
Reserved for HTTP APIs.