SQL blocks - Mage

Credentials

Before starting, you need to add credentials so Mage can execute your SQL commands. Follow the steps for the database or data warehouse of your choice:

Add SQL block to pipeline

Create a new pipeline or open an existing pipeline.
Add a data loader, transformer, or data exporter block.
Select SQL.

Configure SQL block

There are 4 - 5 fields that must be configured for each SQL block:

Field	Required	Description
Data provider	Yes	The database or data warehouse you want to execute your SQL commands in.
Profile	Yes	When you add your credentials to the `io_config.yaml` file, you added them under a key. That key is called the profile. Choose which set of credentials you want this SQL block to use.
Use raw SQL	No	You can write raw SQL and it’ll be executed as written in your data provider.
Database	Depends on data provider	Some data warehouses require that we explicitly state the name of the database we want to write to. If this is present, it’s required. The name of the table that is created follows this convention: `[database].[schema].[pipeline UUID]_[block UUID]`.
Schema to save	Yes	Every SQL block will save data to your data provider. The name of the table that is created follows this convention: `[schema].[pipeline UUID]_[block UUID]`.
Table	No	SQL blocks will automatically name your table for you using a naming convention (see the section `Automatic naming of tables` for more information). You can override this automatic naming convention by filling in a value in this field.
Write policy	Yes	How do you want to handle existing data with the same database, schema, and table name? See below for more information.

Write policies

Policy	Description
Append	Add rows to the existing table.
Replace	Delete the existing data.
Fail	Raise an error during execution.

YAML configuration

You can also modify block configuration in pipeline’s metadata.yaml file. Each block has a configuration field. Example configuration

  configuration:
    data_provider: sqlserver
    data_provider_profile: default
    data_provider_schema: ''
    export_write_policy: append
    limit: 1000
    limit_in_pipeline_run: 1
    use_raw_sql: false

In addition to the fields mentioned in the table above. Here are some extra fields that can be included in the configuration:

Field	Required	Description
limit	No	The maximum number of rows to return in notebook.
limit_in_pipeline_run	No	The maximum number of rows to return when running the block in the pipeline run.

Automatically created tables

Each SQL block will create a table in the data provider of your choice. When you run a block, it’ll execute your SQL command, then store the results in a table created in your database or data warehouse.

Using raw SQL

If you toggle this setting, you’re responsible for writing the CREATE TABLE command and the INSERT command. For example, if a table already exists then you can write the INSERT statement:

INSERT INTO mage.users
SELECT 1 AS id, 'Urza' AS username;

If the table doesn’t exist yet, you can write both the CREATE TABLE statement and the INSERT statement:

CREATE TABLE IF NOT EXISTS mage.users (
    id BIGINT
    , username VARCHAR(255)
);

WITH users AS (
    SELECT
        1 AS id
        , 'Urza' AS username
)

INSERT INTO mage.users
SELECT
    *
FROM users;

This SQL query will create a table named mage.users with 2 columns: id as a BIGINT and username as a VARCHAR(255). Then, it’ll insert a single row into that table.

Required SQL statements

When writing raw SQL, you must at least 1 of the following statements:

SELECT
```
SELECT 1;
```
INSERT
```
INSERT INTO some_table
SELECT 1;
```
CREATE TABLE
```
CREATE TABLE some_table (id BIGINT);
```
DROP TABLE
```
DROP TABLE some_table;
```

UPDATE

UPDATE some_table
SET col1 = 'value1'
WHERE col2 = 'value2'

Multiple SQL statements

You can execute multiple SQL statements in a SQL block. Separate your SQL statements using a semi-colon (;).

Automatic naming of tables

If you don’t choose the setting for using raw SQL, the name of this automatically created table follows these conventions:

If Database field is configured: [database].[schema].[pipeline UUID]_[block UUID]
If no Database field is configured: [schema].[pipeline UUID]_[block UUID]

Where pipeline UUID is the name of the current pipeline you’re editing. Where block UUID is the name of the SQL block you are running.

Upstream blocks

If your SQL block depends on upstream blocks that aren’t SQL blocks (e.g. Python code blocks), then those blocks will also automatically create tables. The name of those tables follows the same naming convention mentioned above.

Variables

All SQL blocks have the following variables they can access in their query:

`{{ execution_date }}`

The date and time the block is ran. Example

SELECT '{{ execution_date }}' AS today

Result

`today`
`2022-09-24 23:01:08.376057`

If a SQL block has 1 or more upstream blocks, then they have access to their parent blocks’ output using the following variable:

`{{ df_1 }}`

Depending on how many upstream blocks there are, the variable name changes. For example, if there are 3 upstream blocks then there are 3 variables that can be accessed:

{{ df_1 }}
{{ df_2 }}
{{ df_3 }}

The SQL block UI will display which variable maps to which upstream block. By convention, the 1st added upstream block will be {{ df_1 }}, and every upstream block added after that will have an incrementing number in the variable name after the prefix df_. Example

SELECT a.id , b.username FROM {{ df_1 }} AS a LEFT JOIN {{ df_2 }} AS b ON
a.id = b.user_id LIMIT 1

Result

`id`	`username`
`1`	`Sorcerer supreme`

Documentation Index

​Credentials

​Add SQL block to pipeline

​Configure SQL block

​Write policies

​YAML configuration

​Automatically created tables

​Using raw SQL

​Required SQL statements

​Multiple SQL statements

​Automatic naming of tables

​Upstream blocks

​Variables

​{{ execution_date }}

​{{ df_1 }}