As you’ll probably already be aware, Google Analytics 4 (GA4) provides an fantastic dataset for BigQuery for all your raw user and event level data. This data is super easy to connect to Google cloud platform (GCP). However if you connect your GA4 property to a sandbox mode, the tables will automatically have a 60 day expiry window.
Firstly, I always recommend setting up the GA4 to BigQuery connection, even if you cannot add a billing account. Ensure you plan to add a billing account in the first 60 days to benefit from pay as you go features.
You can track your billing easily via billing reports and often if you’re simply storing GA4 data for a small property you will be unlikely to spend more the $0.5 per day. There are a number of methods you can use to manage spend including alerts and quota limits set on your billing account. However, something you may not realise when you do upgrade your sandbox account to a paid for version, the 60 day expiry date on you historical tables is not automatically removed.
So how do you resolve this?
There are a number of ways to remove expiry, however I tend to like the SQL approach as I remain in the same interface, however if you want to read about these in more detail, Google helpfully provide this useful guide.
Step 1: dataset level expiry:
Navigate to the console and click on the dataset in question and untoggle the enable table expiration
Step 2: Manual approach-removing the expiry on the tables already created
You will now need to remove the expiry of the table, you can do this in the interface via the console by clicking edit details and then selecting none.
Step 2: Automated approach-removing the expiry on the tables already created
If you have more then a couple of tables I strongly advise against doing this manually, as you may find you forget tables or you simply do not want to sit for hours doing this. In comes DDL to the rescue.
The code below will need editing by:
Replacing the project name and dataset name in the concat statement
Replacing the project name and dataset name in the from statement
DECLARE SQL string;
SET SQL=
(WITH base AS
(
SELECT concat(" ALTER TABLE `projectname.analytics_12345678.", table_id, "` SET OPTIONS(expiration_timestamp=null)") as string
FROM
`projectname.analytics_12345678`.__TABLES__)
SELECT STRING_AGG( string, ';' )
FROM base);
Once you have run your code, navigate to the results and copy these into another query window, make sure you remove the quotation marks which will be at the beginning and end of your code
Now select run, et voila you now have removed all expiry dates without having to go through each table. Now to double check expiries have been removed
There are more ways of automating this process further using tools such as python and other dynamic code, however in this case I prefer the approach of manually running the final code to ensure you have a clear log of each action run.
I will be writing a further blog on how to save deleted tables and datasets in the near future in case you have gone beyond the 60 days window or accidentally delete datasets. But in the meantime — have fun with your SQL :)
Comments