BIGQUERY CONSTRAINTS

 BIGQUERY CONSTRAINTS FOR JOIN OPTIMIZATION STRATEGY

BigQuery recently launched the constraints namely Primary Key (PK) and Foreign Key (FK) as a un-enforced constraints as of now (the date of writing this blog). The focus of this blog is to unlock the true potential of these BigQuery constraints in terms of query optimization and the best practices to utilize them.

To begin, I’ll provide a brief overview of table constraints. Constraints are basically applied to the columns of a table which are required in a database to technically maintain the integrity, correctness, and completeness of the data, apart from domain-level constraints applied at the business level. Constraints can be classified into Unique, Not Null, Default, Check, Primary Key and Foreign Key constraints.

BigQuery is an analytical data warehouse that adheres to denormalized schemas. It is primarily designed to handle large volumes of data for analytical insights that may not require 100% accuracy compared to OLTP databases, therefore it does not require these constraints. But BigQuery’s recent launch of two non-enforced constraints: Primary Key (unique not null columns of the table) and Foreign Key (column that references the Unique or Primary Key column of another table as a referential constraint) left me with a sense of surprise.

When I dive deep into it, I discovered that the real reason behind this is not to introduce traditional constraint concepts into BigQuery. Instead, Google is mainly leverage these familiar traditional SQL concepts and extends their BigQuery optimization approach. In this context, the BigQuery Optimizer is prompted by these constraints to create an optimized query execution graphs/plans, particularly when CONSTRAINTS (PK/FK) guarantee unique rows, which helps optimizing JOINS and reduces the compute cost where JOIN is one of the most costly operations in the SQL world! These constraint features support both new tables and existing tables as well.

Website

Post a Comment

0 Comments