EXPORT.DATA

Overview

| Characteristic | Value |
| --- | --- |
| Name | EXPORT.DATA |
| Action | Export data as files to Google Cloud Storage (GCS), Amazon S3 or Azure Blob Storage using the EXPORT DATA statement. |
| Type | ROUTINE |
| Arguments | query_statement STRING, export_format STRING, export_uri STRING, optional_arguments JSON |
| Returns | N/A |

Arguments

Required

| Name | Data Type | Description |
| --- | --- | --- |
| query_statement | STRING | SQL query statement defining the data to be exported. |
| export_format | STRING | The format of the exported data. Supported values include: "AVRO", "CSV", "JSON", "PARQUET". |
| export_uri | STRING | The destination URI for the export. The URI must be a single-wildcard URI, as described in Exporting data into one or more files. Example: "gs://bucket/path/file_*.csv" |
| optional_arguments | JSON | See below. |

Optional

The following optional arguments can also be passed to the function in the optional_arguments JSON variable:

| Name | Data Type | Description |
| --- | --- | --- |
| test_mode | BOOL | If true, the compiled EXPORT DATA statement is displayed but not executed (useful for testing). If absent or false, the compiled statement is executed. |
| connection_name | STRING | Specifies a connection that has credentials for accessing the external data. Specify the connection name in the form "PROJECT_ID.LOCATION.CONNECTION_ID". If the project ID or location contains a dash, enclose the connection name in backticks. |
| compression | STRING | Specifies a compression format. If not specified, the exported files are uncompressed. Supported values include: "GZIP", "DEFLATE", "SNAPPY". |
| field_delimiter | STRING | The delimiter used to separate fields. Default: "," (comma). Applies to: CSV. |
| header | BOOL | If true, generates column headers for the first row of each data file. Default: false. Applies to: CSV. |
| overwrite | BOOL | If true, overwrites any existing files with the same URI. Otherwise, if the destination storage bucket is not empty, the statement returns an error. Default: false. |
| use_avro_logical_types | BOOL | Whether to use appropriate AVRO logical types when exporting TIMESTAMP, DATETIME, TIME and DATE types. Applies to: AVRO. |
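
For illustration, several of these options can be combined in a single JSON literal. A compressed CSV export with headers and a semicolon delimiter (the option values here are placeholders) might be configured as:

SET optional_arguments = JSON '{"compression": "GZIP", "field_delimiter": ";", "header": true, "overwrite": true}';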

Syntax

Declaration

All variables must be declared at the beginning of a BigQuery script. It is also recommended to set the @@dataset_project_id system variable to the flowfunctions project in the appropriate geographic region. If this is not set in the declaration section, each function call must be prefixed with the regional project ID (e.g. `flowfunctions--eu`.EXPORT.DATA).

--DECLARATION:
DECLARE query_statement, export_format, export_uri STRING;
DECLARE optional_arguments JSON;

SET @@dataset_project_id = 'flowfunctions--eu';

Configuration

Variables should then be set in the configuration section. This makes the script more readable and easier to maintain, replicate and reuse. An example configuration is:

--CONFIGURATION:
SET query_statement = 'SELECT * FROM PROJECT.DATASET.TABLE';
SET export_format = 'PARQUET';
SET export_uri = 'gs://bucket/path/file_*';
SET optional_arguments = JSON '{"overwrite": true}';

Note that if no optional arguments are required, set `optional_arguments = TO_JSON("")`.

Additionally, if programmatic definition of variables is required, a temporary function should be used in the configuration section, as sketched below. This maintains the separation of configuration and execution code.
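
For example, a temporary function can derive a date-stamped export URI at configuration time. The following is a minimal sketch; the function name build_export_uri and its parameters are hypothetical:

--CONFIGURATION:
-- Hypothetical helper: builds a date-partitioned single-wildcard URI
CREATE TEMP FUNCTION build_export_uri(bucket STRING, path STRING)
RETURNS STRING
AS (FORMAT('gs://%s/%s/%s/file_*', bucket, path, FORMAT_DATE('%Y%m%d', CURRENT_DATE())));

SET export_uri = build_export_uri('bucket', 'path');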

Execution

The function is then executed using the standard CALL syntax:

--EXECUTION:
CALL `EXPORT.DATA`(
    query_statement,
    export_format,
    export_uri,
    optional_arguments
);
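
Putting the sections together, a complete script might look like the following sketch. Setting "test_mode": true displays the compiled EXPORT DATA statement without executing it; the project, dataset, table and bucket names are placeholders:

--DECLARATION:
DECLARE query_statement, export_format, export_uri STRING;
DECLARE optional_arguments JSON;

SET @@dataset_project_id = 'flowfunctions--eu';

--CONFIGURATION:
SET query_statement = 'SELECT * FROM PROJECT.DATASET.TABLE';
SET export_format = 'CSV';
SET export_uri = 'gs://bucket/path/file_*.csv';
SET optional_arguments = JSON '{"test_mode": true, "header": true, "overwrite": true}';

--EXECUTION:
CALL `EXPORT.DATA`(
    query_statement,
    export_format,
    export_uri,
    optional_arguments
);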