top of page

A curated selection of new BigQuery features, offering a high-level overview of what they are, when to use them, and when to avoid them—in easy-to-digest, bite-sized chunks with helpful links.

image.png

September 2024

What is it?

History-based optimisations in BigQuery are features that automatically improve the performance of queries over time by analysing historical query patterns and usage data. These optimisations help BigQuery learn from previous queries to make future ones faster and more efficient.

Why is it useful?

These optimisations are useful because they reduce query processing time and cost by reusing information learned from previous queries. By continuously improving performance based on historical data, users get faster results, especially for repeated or similar queries, without having to manually optimise their queries.

When to use it?

History-based optimisations are automatically applied, so you benefit from them whenever you run queries, especially for datasets you query frequently. For example, if you regularly run the same sales performance report at the end of each month, BigQuery will optimise these queries over time to return results faster.

When not to use it?

There's no need to avoid using history-based optimisations, as they work automatically in the background. However, if you're running completely new or one-off queries that don't rely on past patterns, you might not see as much performance improvement from these optimisations.

Gotchas

  • Ensure you enable Query history using alter project, you can also disable 

  • May of your queries will show null query optimisations, so it's worth filtering these

  • It can also be quite expensive, remember to check the bytes used before running

  • Ignore the scary warning signs - these are default and don't actually impact your project

  • You can measure the impact of the query optimisations on processing but remember some edits may not be worth it

What is it?

BigQuery now supports more flexible column naming, allowing you to name columns using your preferred international language and special characters, such as the ampersand (&) and percent sign (%). This feature is particularly useful for managing international data and handling migrations with complex schemas. Or even aligning naming conventions to your internal business taxonomy.

Why is it useful?

It improves data accessibility and usability by enabling the use of diverse naming conventions and characters, making it easier to work with data in multiple languages. This flexibility helps reduce friction when migrating datasets or integrating international business data, ensuring that column names better align with your data’s original structure.

When to use it ?

Use flexible column names when dealing with datasets that require international characters or special symbols in column names, such as a dataset with column names in Chinese (列名) or column names containing special characters like int-col. This is also useful when migrating datasets from systems with non-standard column names.

When not to use it?

Avoid using flexible column names when working with systems or tools that may not fully support special characters in column names, or when you don’t require multi-language or special character support in your datasets. Remember to also consider if translation is the right choice if you have a multilingual organisation.  You cannot use flexible names on external tables at present.

What is it?

Materialised View Replicas in BigQuery create copies of materialised views across multiple regions, enhancing availability and performance. A materialised view is a precomputed table that stores query results, enabling faster and more cost-effective querying. Materialised view replicas can also replicate data from external sources like Amazon S3, Apache Iceberg, or Salesforce Data Cloud into a BigQuery dataset. This reduces data egress costs and further improves query performance.

Why is it useful?

It boosts data accessibility and resilience by replicating materialised views across regions, thereby minimising the impact of regional outages and providing fast, reliable query performance for global users. Additionally, it reduces costs related to external data by pulling data directly into BigQuery, avoiding egress charges.

When to use it?

Use materialised view replicas when you have a globally distributed team requiring fast access to data from multiple regions or when you’re pulling external data from sources like Amazon S3, Apache Iceberg, or Salesforce into BigQuery. This improves query speed and helps avoid egress charges.

When not to use it?

Avoid using materialised view replicas if your operations are primarily based in a single region or if the cost of replication outweighs the benefits. Standard materialised views might be sufficient if your data doesn’t require global access or if it’s accessed less frequently.

What is it?

BigQuery Admin Resource Charts provide visualisations to monitor the operational health and performance of your BigQuery environment. These charts are available in the Google Cloud Console and offer insights into key metrics such as slot utilisation, job throughput, and query performance, enabling administrators to assess and optimise their BigQuery resources efficiently.

Why is it useful?

The Admin Resource Charts help you proactively monitor and maintain the health of your BigQuery environment. By tracking slot utilisation, query execution times, and resource allocation, administrators can identify and resolve issues quickly, optimise resource usage, and improve overall system performance. This proactive approach minimises disruptions and ensures the smooth operation of your data processes.

When to use it?

Use Admin Resource Charts when you need to gain visibility into your BigQuery workload performance and resource consumption. For example, if you notice increased query latency or high slot utilisation, these charts can help identify bottlenecks, allocate resources more effectively, and prevent system slowdowns.

When not to use it?

If your BigQuery usage is minimal or your workloads don’t require detailed performance monitoring, using Admin Resource Charts may not be necessary. In such cases, basic monitoring and logging could suffice for your needs.

What is it? 

BigQuery Workflows is a feature within Google Cloud that allows you to automate, schedule, and orchestrate your BigQuery tasks and queries. It enables the creation of data workflows by chaining together a series of SQL queries, scripts, and other Google Cloud services, making it easier to manage complex data processing pipelines within BigQuery.

Why is it useful?

BigQuery Workflows helps automate repetitive tasks, reducing manual intervention and the risk of errors. It provides a streamlined way to execute multiple queries and scripts in sequence, allowing you to build end-to-end data processing pipelines. This saves time, improves efficiency, and ensures data integrity by managing dependencies between tasks seamlessly.

When to use it?

Use BigQuery Workflows when you have a recurring series of SQL queries or data processing tasks that must be executed in a specific order. For instance, if you’re transforming data for daily reports or running ETL (Extract, Transform, Load) operations on a schedule, BigQuery Workflows can automate these tasks to run consistently and accurately.

When not to use it?

Avoid using BigQuery Workflows if your data processing tasks are simple, infrequent, or do not have dependencies on other queries. In such cases, manually running queries or scheduling individual queries through simpler methods might be more efficient and cost-effective.

Blog link: (coming soon)

image.png
bottom of page