×

How to Use BigQuery with ChatGPT (No SQL Knowledge Needed)

Eric Huguenin

Last Modified on August 17, 2023

Do you want to use BigQuery but are unfamiliar with SQL?

Today, BigQuery is impossible to ignore, and everyone says you should use it with Google Analytics 4. However, many people refrain from using this tool upon learning that they need to learn SQL programming.

Query GA4 data in BigQuery eBook

Subscribe & Get our FREE Query GA4 data in BigQuery eBook

Here comes ChatGPT to save the day, allowing you to benefit from its powerful analytics capabilities even with zero SQL knowledge. In this guide, we’ll teach you how to use BigQuery with ChatGPT.

To run queries in BigQuery, we will need to navigate its interface. It can look scary but do not fear, as we will go through it thoroughly.

Here is an overview of what we’ll cover:

Let’s dive in!

Running Queries in BigQuery

To run queries, we need the following three things:

Requirements for running queries in BigQuery

First, we need a BigQuery sandbox account so you can play with data for free. No credit card or billing is required unless you decide to upgrade. Next, we need a project which is a place where BigQuery data is stored. Lastly, we need to access a dataset.

First, log in to your Google account, and let’s begin!

Go to your search engine and search for google cloud console. Scroll down and select the Google Cloud Platform.

Accessing the Google Cloud Platform

You can also select Google Cloud console – Web UI Admin.

Accessing the Google Cloud console

Alternatively, you can go to Google Cloud directly by typing console.cloud.google.com on the address bar.

Alternative to accessing the Google Cloud console

If you’re an existing user, you will see a page like the one below where you will see the project you’re working on and the different projects you can access.

Google Cloud homepage

For new users, select your country, agree to the terms of service, and click Agree and Continue.

Creating a Google Cloud account

We have reached the homepage.

Creating a BigQuery Project

There are two ways of creating a new BigQuery project.

The first way is to click Select a Project.

Selecting a project in Google Cloud

Next, click New Project.

Creating a new project in Google Cloud

We prefer to go to the BigQuery workspace and create a project there.

Open the navigation menu, scroll down, and click BigQuery.

Gong to the BigQuery workspace

We are now in the SQL workspace. Click Create Project.

Creating a new project in the BigQuery SQL workspace

Provide a name for the project. If you want to change the project ID, click Edit. Finally, click Create.

Creating a new project

We will see a welcome screen. Click Done.

Closing the BigQuery welcome screen

We now have our BigQuery project in the sandbox account.

Accessing BigQuery Public Datasets

There are multiple ways to access public BigQuery datasets.

To achieve this quickly, go to the BigQuery sample dataset for GA4. Click the ga4_obfuscated_sample_ecommerce dataset.

Opening the GA4 sample dataset from the Google Analytics documentation

This link opens the table with the data we need to query.

To do the same in the SQL workspace, go to the search bar and type public. Initially, we will see 0 results. Click Search All Projects to broaden our search.

Searching for public datasets

Expand the bigquery-public-data and click Show More.

Opening the BigQuery public data project

Scroll down and select the ga4_obfuscated_sample_ecommerce dataset.

Accessing the GA4 sample eCommerce dataset

Now, clicking on the dataset is not where you will see your data. To understand this more clearly, note that the first level pertains to a project. Within your projects, you have the datasets.

Google Cloud Explorer structure

Selecting your datasets will only give you information about the dataset. To find your data, expand your dataset, and you’ll find it below.

For example, expand our dataset. Select the table called events.

Opening the events table from the GA4 dataset

A great document of reference is the GA4 BigQuery Export Schema.

GA4 BigQuery export schema documentation

This document is not mandatory, but it is nice to have to help understand how GA4 data is structured in BigQuery, the names of the columns, and the data types. You can know what’s available and what to use with ChatGPT.

We are using a free account, but we will still cover some best practices for avoiding or mitigating costs when dealing with company data.

Managing Costs

First, go to Details and scroll below to see the dataset size. Since we are still dealing with megabytes, we are still safe.

Dataset storage size

If you see data above the gigabytes or terabytes size, be prepared to pay some money.

Next, go to the table info section and copy the table ID.

Copying the table ID

Next, click QueryIn New Tab.

Opening the query window in a new tab

To quickly demonstrate how to load query data, let’s try the following code:

SELECT *
FROM [your-table-id]

The above code lets us retrieve all the data available in the table. Paste this code and pay attention to the message at the right.

Testing the code for retrieving all data from the table

First, we have a green check mark indicating the query is valid. The message beside this lets us know if there are any costs involved.

Because we are dealing with megabytes, we are still within the free tier. If the query size is above gigabytes or terabytes, you have to make edits to your query and make it fall back to the free tier.

After trimming down the query, if the size is still a gigabyte or more, only query the data you need to minimize costs. Remove all unnecessary rows or columns.

At this point, ChatGPT is extremely useful if you have limited knowledge of SQL or if you want to simplify the query writing process.

Retrieving Data from BigQuery with ChatGPT

Now, let’s go to ChatGPT.

Ask it to retrieve everything from our BigQuery table.

Using ChatGPT to write an SQL query to retrieve all data

ChatGPT will write a code for us and provide a quick explanation below. Click Copy code.

Copying the code for retrieving all data

Paste the code back to BigQuery and click Run.

Running the code to retrieve all data

After running the code, we can see the query results at the bottom.

Query results for retrieving all data

Now, let’s crank it up a little. Let’s say we want to retrieve only the event_date and event_name columns. How should we do this?

Retrieving Specific Columns & Rows

Again, we can use ChatGPT to simplify this process.

A great thing about ChatGPT is that each search is not isolated. It can answer any follow-up questions and keep the context of our search history.

We can ask it to retrieve the event_date and event_name columns. It will gather the table ID from our previous message.

Using ChatGPT to retrieve the event_date and event_name columns

Copy the updated SQL query.

Copying the code for retrieving the event_date and event_name columns

Paste this code back to BigQuery and Run the code. Expand the query result.

Running the updated SQL code and expanding the query results

We can now verify that we are only retrieving the event_date and event_name columns.

Query results for retrieving event_date and event_name columns

While we have cut down our columns, we still have all the rows in our table. Looking at the lower-right portion of the window, you’ll see that we have close to 27,000 rows.

Total rows of the query result

Since BigQuery stores table data in a column format, query costs are usually associated with the data type and the number of columns.

The number of rows does factor in the pricing, but reducing the number of rows will not decrease the cost. BigQuery reads all column data first before selecting specific rows.

Still, there may be instances where you only want to see specific rows of data. For example, we only want to see the top 10 events.

Let’s ask ChatGPT to show the first 10 rows of data. Copy the updated code.

Copying the updated code for retrieving the first 10 rows

After running this code in BigQuery, verify that we only get the first 10 rows.

Query results for retrieving the first 10 rows

What if we want to obtain specific rows based on certain criteria? How do we filter our query?

Filter Records

Let’s say we only want to look at specific types of events.

Looking at the event_name column, we can see the different events on our website, such as page_view, user_engagement, scroll, etc. What if we are interested in looking at only scroll event data?

Let’s open a new chat in ChatGPT. Provide the table ID and ask it to select the event_date and event_name columns and filter it by the scroll event.

Filtering rows by the scroll event

Copy the code.

Copying the updated code for filtering the scroll events

Run this code in BigQuery and confirm that we only get scroll events.

Query results for filtered scroll events

What if we want to look at specific date ranges?

SQL Dates

For dates, there is one slight modification we must remember.

Remove the date added to the end of the table ID and replace it with an asterisk when building your query.

The date at the end of the table ID

In ChatGPT, copy the latest code and replace the date at the end of the table ID with an asterisk. Next, tell ChatGPT to get rows of data where the dates are between the 1st of December 2020 and the 1st of January 2021.

Filtering the query by specific dates

ChatGPT should return the code with an additional line beneath the WHERE clause indicating the dates we want to filter.

Retrieving GA4 Event Parameters

For a quick refresher, GA4 event parameters are additional data sent along with our events. If you look at the event_params field, we have repeated in the mode column, meaning that we have repeated fields.

Event_params with repeated fields

Things become tricky when dealing with nested or repeated fields, as they are the most complex data types in BigQuery. Think of them as data within data.

If we click on the toggle, we’ll see different fields appearing. We’ll first discuss the key.

Key field in the event parameter

The key is a unique identifier for all our event parameters.

Looking at Preview, we can see the event_params field separated into different columns. The event parameter key identifier is in the event_params.key column.

Key field column in the preview tab

We can have different value types for each of our keys. Toggling the value field, we can have string, integer, float, and double data types.

Different value fields under event parameters

String values are texts, integer values are whole numbers, and float and double values are decimal numbers. All these are in different columns.

Let’s go back to the preview tab and see the columns containing the value of the event parameters. These are the columns starting with event_params.value.

Value field column in the preview tab

Let’s look at specific examples to see how key-value pairs are stored.

We have the ga_session_number containing integer values, and the page_location containing URLs which are strings. The first is in the event_params .value.int_value column, and the latter is in the event_params .value.string_value column.

Event parameter key-value pairs in BigQuery

Now, how can we retrieve data from nested fields?

Unnesting Fields (Getting GA4 Event Parameters)

Let’s say we want to retrieve two columns – one with the event_name and the other with the corresponding value of the page_title key.

We’ll ask ChatGPT to look at the event_params.key column and check for the page_title. Then, return the corresponding value from the event_parans.value.string_value column.

Key-value pair for the page_title

Next, also instruct ChatGPT to return the event_name column. Then, provide the BigQuery table ID and state that the event_params field has repeating fields.

Instructions for getting the value equivalent of page_title and event_name

Initially, ChatGPT may return a lengthy code. We can ask it to simplify the code.

Lengthy code and asking ChatGPT to simplify the code

Copy the simplified code.

Copying the simplified code for retrieving equivalent page_title value and event_name

Let’s try this out. Alright, we have the event_name and page_title columns.

Query results for event_name and equivalent page_title value

That’s how to retrieve data from nested fields.

🚨 Note: Another zero coding solution for BigQuery is the new GA4SQL tool. Check out our guide on it and learn how to use it.

Summary

We’ve seen how to use simple language to obtain data from BigQuery with ChatGPT. We learned what prompts to use with this tool to return SQL queries. We used GA4 data, but this process can apply to any dataset.

Are you interested in what other tasks ChatGPT can help with? Here are some ways how to use ChatGPT in digital marketing.

Want something more specific? Here are 10 ways to use ChatGPT with Google Tag Manager.

Have you tried ChatGPT to write SQL queries? Will you use BigQuery with your GA4 implementation? Let us know in the comments below!

Sign up to the FREE GTM for Beginners Course...

[wpforms id="9076262" title="false" description="false"]

Eric Huguenin

Eric is a web analytics content creator at MeasureSchool. He holds certifications in Google Analytics and Google Tag Manager at the CXL Institute. He enjoys breaking down marketing topics into easier-to-understand articles.

Related Posts

Subscribe
Notify of
guest
0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments

NOW IT'S TIME TO

Start Measuring Like A Master

Itching to jump into the world of MeasureMasters? This is what you have to look forward to.

Ready to take your digital marketing to the next level?

Subscribe to our newsletter and stay ahead with our latest tips and strategies.