This guide shows how to load data from GCS into BigQuery (BQ) from an Apache Airflow DAG running in GCP Cloud Composer.
An alternative approach is to use Airflow's GCSToBigQueryOperator.
Follow the Composer Batch Submit guide, which sets up the DAG that runs the Batch GCS to GCS example.
Step 2 - Add the following task to the DAG
from airflow.operators import bash

# Placeholder values; replace with your own dataset, table, bucket, and path.
BQ_DATASET = "your_dataset"
BQ_TABLE = "your_table"
BUCKET = "your_bucket"
SRC_PARQUET_DATA = f"gs://{BUCKET}/your_data_path/*.parquet"
PARTITION = "your_bq_table_partition_field"

# Load data from GCS to BQ.
# The --replace flag overwrites the table if it already exists.
load_from_gcs_to_bq = bash.BashOperator(
    task_id='load_from_gcs_to_bq',
    bash_command=(
        f'bq load --replace --source_format=PARQUET '
        f'--time_partitioning_field={PARTITION} '
        f'{BQ_DATASET}.{BQ_TABLE} {SRC_PARQUET_DATA}'
    ),
)
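For reference, the f-string in bash_command renders to a plain bq load invocation. The standalone sketch below shows the rendered command with hypothetical placeholder values (no Airflow needed):

```python
# Hypothetical placeholder values, for illustration only.
BQ_DATASET = "my_dataset"
BQ_TABLE = "my_table"
BUCKET = "my_bucket"
SRC_PARQUET_DATA = f"gs://{BUCKET}/data/*.parquet"
PARTITION = "event_date"

# Assemble the same command string the BashOperator would execute.
command = (
    f"bq load --replace --source_format=PARQUET "
    f"--time_partitioning_field={PARTITION} "
    f"{BQ_DATASET}.{BQ_TABLE} {SRC_PARQUET_DATA}"
)
print(command)
# → bq load --replace --source_format=PARQUET --time_partitioning_field=event_date my_dataset.my_table gs://my_bucket/data/*.parquet
```

Note that bq load requires the source URI to carry the gs:// scheme, which is why the bucket is prefixed above.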
The added task makes Airflow load the Parquet data from the given GCS path into the BQ table.
All code snippets within this document are provided under the following terms.
Copyright 2022 Google. This software is provided as-is, without warranty or representation for any use or purpose. Your use of it is subject to your agreement with Google.