Flowfunctions Reference
The `flowfunctions` structure is designed to be as simple, clear and robust as possible, mirroring BigQuery functions but permitting programmatic definition of all arguments. It abstracts away any need for string templating or complex variable structure definition for data exchange.
The code structure separates configuration from execution, which is often not possible with the native BigQuery functions. An initialisation section (also known as the header) is also required; this is where script variables are declared and system variables are set.
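As an illustrative sketch (the variable names, values and the specific system variable shown here are examples only, not prescribed by flowfunctions), a script header might look like:

```sql
-- Header: declare script variables...
DECLARE query_statement STRING;
DECLARE optional_arguments JSON;

-- ...and set system variables (here, the default project for unqualified datasets)
SET @@dataset_project_id = 'my-project';
```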
Naming Conventions
Naming conventions are applied consistently across all functions to minimise ambiguity. The `_name` suffix is used to refer to the lowest-level name for a resource, and the `_id` suffix is used when referring to the full address which identifies a resource, e.g.

- `project_id` = the top-level identifier for a project
- `dataset_name` = the name for a dataset
- `dataset_id` = `project_id.dataset_name`
- `table_name` = the name for a table
- `table_id` = `project_id.dataset_name.table_name` = `dataset_id.table_name`
- `function_name` = the name for a function
- `function_id` = `dataset_id.function_name`
- `table_function_name` = the name for a table function
- `table_function_id` = `dataset_id.table_function_name`
Datasets
Datasets are named either according to the action taken by the function (e.g. `EXPORT`, `LOAD`, `INSERT`, `DELETE`) or by the functional area (e.g. `JSON`, `LINEAGE`).
Functions
Functions within datasets are named so that the `function_id` mirrors the BigQuery statement (e.g. `EXPORT.DATA` = `EXPORT DATA`, `LOAD.DATA` = `LOAD DATA`), except in datasets relating to a functional area, where the `function_name` describes the action (e.g. `JSON.MERGE_VARIABLES`).
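For example, a native `EXPORT DATA` statement and a sketch of its `EXPORT.DATA` equivalent (all argument values below are illustrative):

```sql
-- Native BigQuery statement:
EXPORT DATA OPTIONS (uri = 'gs://my-bucket/export/*.csv', format = 'CSV')
AS SELECT * FROM `my-project.my_dataset.my_table`;

-- Equivalent flowfunctions call:
CALL EXPORT.DATA(
  'SELECT * FROM `my-project.my_dataset.my_table`',
  'CSV',
  'gs://my-bucket/export/*.csv',
  TO_JSON(""));
```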
Arguments
Arguments consistently use the following prefix naming conventions:

- `source_` for data sources
- `destination_` for data destinations
- `target_` for functions operating on a resource, but for which `destination_` is not applicable as no data is added to the target table (e.g. `DELETE.DATA`)
Arguments
Function arguments are implemented in different ways, depending on whether they are required or optional.
Required Arguments
Required arguments are named, separately typed variables. In the following function, the `query_statement`, `export_format` and `export_uri` arguments are all required in `STRING` format:
```sql
CALL EXPORT.DATA(
  query_statement STRING,
  export_format STRING,
  export_uri STRING,
  optional_arguments JSON
);
```
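The required arguments can be supplied as declared script variables, for example (all values are illustrative):

```sql
DECLARE query_statement, export_format, export_uri STRING;

SET query_statement = 'SELECT * FROM `my-project.my_dataset.my_table`';
SET export_format = 'CSV';
SET export_uri = 'gs://my-bucket/export/*.csv';

CALL EXPORT.DATA(query_statement, export_format, export_uri, TO_JSON(""));
```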
Optional Arguments
Optional arguments are implemented as JSON variables, from which the argument values are parsed. This has a number of benefits:
- It allows a single function to be used for every possible logical variant
- It clearly communicates which arguments are required
- It provides forward compatibility in case additional arguments are added in the future
- It enables pre-validation, as improperly formatted JSON will error before function execution
- It allows arbitrary additional variables to be passed to the function to vary its behaviour (e.g. the test mode described below)
If no optional arguments are needed, then an empty JSON string should be passed, i.e. `TO_JSON("")`.
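As a sketch (the option names here are hypothetical, purely to illustrate the JSON structure):

```sql
DECLARE optional_arguments JSON;

-- Hypothetical optional arguments:
SET optional_arguments = JSON '{"overwrite": true, "header": true}';

-- Alternatively, if no optional arguments are needed:
SET optional_arguments = TO_JSON("");
```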
Test Mode
Running any query in test mode will return, but not execute, the query, which is useful for development and testing. This is accessed by setting the boolean argument `test_mode` to `true` in the root of the `optional_arguments` JSON (i.e. `"test_mode": true`). This will `SELECT` the compiled SQL for inspection. In all other cases (i.e. `test_mode` does not exist or is not `true`), the query will execute using the `EXECUTE IMMEDIATE` statement.
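For example, to run the earlier `EXPORT.DATA` call in test mode (assuming the required argument variables have already been set):

```sql
-- test_mode sits at the root of the optional_arguments JSON:
SET optional_arguments = JSON '{"test_mode": true}';

-- Returns the compiled SQL via SELECT instead of executing it:
CALL EXPORT.DATA(query_statement, export_format, export_uri, optional_arguments);
```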
JSON Merging
Since the optional arguments are passed as JSON, it is useful to be able to combine JSON variables in a simple and readable manner. The `JSON.MERGE_VARIABLES()` function performs this action, returning a JSON variable which is the combination of the input variables, merged at the root. The syntax to call this function on one example input JSON variable `static_arguments` and two further variables, `dynamic_argument1` and `dynamic_argument2`, which are dynamically computed and converted to JSON as part of the script, would therefore be:
```sql
SET optional_arguments = JSON.MERGE_VARIABLES('static_arguments', ['dynamic_argument1', 'dynamic_argument2'])
```
Note that this can take an arbitrary array of JSON variables, which will be merged in order of declaration.
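Putting this together, a sketch of the full pattern (the variable contents are hypothetical):

```sql
DECLARE static_arguments, dynamic_argument1, dynamic_argument2, optional_arguments JSON;

SET static_arguments = JSON '{"header": true}';
SET dynamic_argument1 = TO_JSON(STRUCT(CAST(CURRENT_DATE() AS STRING) AS run_date));
SET dynamic_argument2 = JSON '{"test_mode": true}';

-- Merged at the root, in order of declaration:
SET optional_arguments = JSON.MERGE_VARIABLES('static_arguments', ['dynamic_argument1', 'dynamic_argument2']);
```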
Query Execution Location
Note that you may have to manually set the query location in the BigQuery console (`More -> Query Settings -> Advanced Options -> UNCHECK: Automatic Location Selection -> SELECT: Location type -> SELECT: Region or Multi-region`). This needs to match the location of your data.