Flowfunctions Reference

The flowfunctions structure is designed to be as simple, clear and robust as possible, mirroring BigQuery functions but permitting programmatic definition of all arguments. It abstracts away any need for string templating or complex variable structure definition for data exchange.

The code structure separates configuration from execution, which is often not possible with the native BigQuery functions. An initialisation section (also known as the header) is required; this is where script variables are declared and system variables are set.
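
As a minimal sketch, a header might look like the following (the specific variable names and the project value are assumptions for illustration):

-- Declare script variables used later in the script
DECLARE query_statement, export_format, export_uri STRING;
DECLARE optional_arguments JSON;

-- Set a system variable (here: the default project for unqualified datasets)
SET @@dataset_project_id = 'my_project';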

Naming Conventions

Naming conventions are applied consistently across all functions to minimise ambiguity. The _name suffix refers to the lowest-level name for a resource, and the _id suffix refers to the full address which identifies a resource, e.g.

  • project_id = the top level identifier for a project
  • dataset_name = the name for a dataset
  • dataset_id = project_id.dataset_name
  • table_name = the name for a table
  • table_id = project_id.dataset_name.table_name = dataset_id.table_name
  • function_name = the name for a function
  • function_id = dataset_id.function_name
  • table_function_name = the name for a table function
  • table_function_id = dataset_id.table_function_name

Datasets

Datasets are named either according to the action taken by the functions they contain (e.g. EXPORT, LOAD, INSERT, DELETE) or by the functional area (e.g. JSON, LINEAGE).

Functions

Functions within datasets are named so that the function_id mirrors the BigQuery statement (e.g. EXPORT.DATA = EXPORT DATA, LOAD.DATA = LOAD DATA), except in datasets relating to a functional area, where the function_name describes the action (e.g. JSON.MERGE_VARIABLES).

Arguments

Arguments consistently use the following prefix naming conventions:

  • source_ for data sources
  • destination_ for data destinations
  • target_ for functions which operate on a resource but where destination_ is not applicable, as no data is added to the target table (e.g. DELETE.DATA)

Argument Implementation

Function arguments are implemented in different ways, depending on whether they are required or optional.

Required Arguments

Required arguments are implemented as named, individually typed variables. In the following procedure signature, query_statement, export_format and export_uri are all required STRING arguments:

EXPORT.DATA(
        query_statement STRING,
        export_format STRING,
        export_uri STRING,
        optional_arguments JSON
        )
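
A call might therefore look like the following sketch; the query, format and URI values (my_project, my_bucket, etc.) are assumptions for illustration:

-- Set required arguments as typed variables (values are illustrative)
DECLARE query_statement STRING DEFAULT "SELECT * FROM `my_project.my_dataset.my_table`";
DECLARE export_format STRING DEFAULT "CSV";
DECLARE export_uri STRING DEFAULT "gs://my_bucket/my_table_*.csv";
DECLARE optional_arguments JSON DEFAULT TO_JSON("");

CALL EXPORT.DATA(query_statement, export_format, export_uri, optional_arguments);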

Optional Arguments

Optional arguments are implemented as JSON variables, from which the argument values are parsed. This has a number of benefits:

  • It allows a single function to be used for every possible logical variant
  • It clearly communicates which arguments are required
  • It provides forward compatibility if additional arguments are added in the future
  • It enables pre-validation as improperly formatted JSON will error prior to function execution
  • It allows arbitrary additional variables to be passed to the function to vary its behaviour (e.g. the test mode described below)

If no optional arguments are needed, then an empty JSON string should be passed, i.e. TO_JSON("").
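
For example, optional arguments could be set as a JSON literal before the call; the specific option names here (header, field_delimiter) are assumptions for illustration, not a definitive list:

-- Build optional arguments as a JSON literal (option names are illustrative)
SET optional_arguments = JSON '{"header": true, "field_delimiter": "|"}';

CALL EXPORT.DATA(query_statement, export_format, export_uri, optional_arguments);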

Test Mode

Running any query in test mode will return the compiled query rather than executing it, which is useful for development and testing. This is accessed by setting the boolean argument test_mode to true in the root of the optional_arguments JSON (i.e. "test_mode": true), which causes the function to SELECT the compiled SQL for inspection. In all other cases (i.e. test_mode does not exist or is not true) the query will execute using the EXECUTE IMMEDIATE statement.
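
As a sketch, test mode could be enabled like this:

-- Enable test mode to SELECT the compiled SQL instead of executing it
SET optional_arguments = JSON '{"test_mode": true}';

CALL EXPORT.DATA(query_statement, export_format, export_uri, optional_arguments);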

JSON Merging

Since the optional arguments are passed as JSON, it is useful to be able to combine JSON variables in a simple and readable manner. The JSON.MERGE_VARIABLES() function performs this action, returning a JSON variable which is the combination of the input variables, merged at the root. The syntax to call this function on one static JSON variable static_arguments and two dynamic variables, dynamic_argument1 and dynamic_argument2, which are dynamically computed and converted to JSON as part of the script, would therefore be:

SET optional_arguments = JSON.MERGE_VARIABLES('static_arguments', ['dynamic_argument1', 'dynamic_argument2']);

Note that this can take an arbitrary array of JSON variables, which will be merged in order of declaration.
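
Putting this together, a sketch might look like the following; the variable contents (header, export_date, export_user) are assumptions for illustration:

-- Static arguments defined once (contents are illustrative)
DECLARE static_arguments JSON DEFAULT JSON '{"header": true}';
DECLARE dynamic_argument1 JSON;
DECLARE dynamic_argument2 JSON;
DECLARE optional_arguments JSON;

-- Dynamic arguments computed during the script and converted to JSON
SET dynamic_argument1 = TO_JSON(STRUCT(CURRENT_DATE() AS export_date));
SET dynamic_argument2 = TO_JSON(STRUCT(SESSION_USER() AS export_user));

-- Merge at the root, in order of declaration
SET optional_arguments = JSON.MERGE_VARIABLES('static_arguments', ['dynamic_argument1', 'dynamic_argument2']);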

Query Execution Location

You may need to manually set the query location in the BigQuery console (More -> Query Settings -> Advanced Options -> UNCHECK: Automatic Location Selection -> SELECT: Location type -> SELECT: Region or Multi-region). This location needs to match the location of your data.