EXPORT.DATA
Overview
Characteristic | Value |
---|---|
Name | EXPORT.DATA |
Action | Export data as files to Google Cloud Storage (GCS), Amazon S3, or Azure Blob Storage using the EXPORT DATA statement. |
Type | ROUTINE |
Arguments | query_statement STRING, export_format STRING, export_uri STRING, optional_arguments JSON |
Returns | N/A |
Arguments
Required
Name | Data Type | Description |
---|---|---|
query_statement | STRING | SQL query statement describing the data to be exported. |
export_format | STRING | The format of the exported data. Supported values include: "AVRO", "CSV", "JSON", "PARQUET". |
export_uri | STRING | The destination URI for the export. The uri option must be a single-wildcard URI as described in Exporting data into one or more files. Example: "gs://bucket/path/file_*.csv" |
optional_arguments | JSON | See below. |
Optional
The following optional arguments can also be passed to the function in the optional_arguments JSON variable (see the example after the table):
Name | Data Type | Value Options |
---|---|---|
test_mode | BOOL | If true, the compiled EXPORT DATA statement is displayed but not executed (useful for testing purposes). If not present or false, the compiled statement is executed. |
connection_name | STRING | Specifies a connection that has credentials for accessing the external data. Specify the connection name in the form "PROJECT_ID.LOCATION.CONNECTION_ID". If the project ID or location contains a dash, enclose the connection name in backticks (`). |
compression | STRING | Specifies a compression format. If not specified, the exported files are uncompressed. Supported values include: "GZIP", "DEFLATE", "SNAPPY". |
field_delimiter | STRING | The delimiter used to separate fields. Default: "," (comma). Applies to: CSV. |
header | BOOL | If true, generates column headers for the first row of each data file. Default: false. Applies to: CSV. |
overwrite | BOOL | If true, overwrites any existing files with the same URI. Otherwise, if the destination storage bucket is not empty, the statement returns an error. Default: false. |
use_avro_logical_types | BOOL | Whether to use appropriate AVRO logical types when exporting TIMESTAMP, DATETIME, TIME and DATE types. Applies to: AVRO. |
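For example, to generate headers, overwrite existing files and preview the compiled statement without running it, the documented options can be combined into a single JSON value (a sketch; adjust the option set to your export format):

```sql
-- Combine documented options into the optional_arguments JSON value.
-- test_mode displays the compiled EXPORT DATA statement without executing it.
SET optional_arguments = JSON '{"header": true, "overwrite": true, "test_mode": true}';
```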
Syntax
Declaration
All variables must be declared at the beginning of a BigQuery script. Additionally, it is recommended to set @@dataset_project_id to the flowfunctions project in the appropriate geographic region. If this is not set in the declaration section, each function call needs to be prefixed with the regional project_id (e.g. `flowfunctions--eu`.EXPORT.DATA).
```sql
--DECLARATION:
DECLARE query_statement, export_format, export_uri STRING;
DECLARE optional_arguments JSON;
SET @@dataset_project_id = 'flowfunctions--eu';
```
Configuration
Variables should then be set in the configuration section. This makes the script more readable and easier to maintain, replicate and reuse. An example configuration is:
```sql
--CONFIGURATION:
SET query_statement = 'SELECT * FROM PROJECT.DATASET.TABLE';
SET export_format = 'PARQUET';
SET export_uri = 'gs://bucket/path/file_*';
SET optional_arguments = JSON '{"overwrite": true}';
```
Note that if no optional arguments are required, set optional_arguments = TO_JSON("").
Additionally, if programmatic definition of variables is required, a temporary function should be used in the configuration section. This maintains the separation of configuration and execution code.
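For example, a temporary function could derive a date-stamped export URI at run time (a sketch; the function name, bucket and path below are hypothetical):

```sql
--CONFIGURATION:
-- Hypothetical helper: builds a date-stamped, single-wildcard export URI.
CREATE TEMP FUNCTION dated_export_uri(bucket STRING, prefix STRING) AS (
  CONCAT('gs://', bucket, '/', prefix, '/',
         FORMAT_DATE('%Y%m%d', CURRENT_DATE()), '/file_*')
);

SET export_uri = dated_export_uri('bucket', 'path');
```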
Execution
The function is then executed using the standard CALL syntax.
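A minimal sketch, following the argument order documented above (if @@dataset_project_id has not been set, prefix the routine with the regional project_id, e.g. `flowfunctions--eu`.EXPORT.DATA):

```sql
--EXECUTION:
CALL EXPORT.DATA(query_statement, export_format, export_uri, optional_arguments);
```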