
How to Use BigQuery with ChatGPT (No SQL Knowledge Needed)

Last Modified on August 17, 2023

Do you want to use BigQuery but are unfamiliar with SQL?

Today, BigQuery is impossible to ignore, and it is widely recommended for use with Google Analytics 4. However, many people shy away from the tool once they learn it requires writing SQL.


Here comes ChatGPT to save the day, letting you benefit from BigQuery’s powerful analytics capabilities even with zero SQL knowledge. In this guide, we’ll teach you how to use BigQuery with ChatGPT.

To run queries in BigQuery, we will need to navigate its interface. It can look intimidating, but do not fear, as we will go through it thoroughly.


Let’s dive in!

Running Queries in BigQuery

To run queries, we need the following three things:

Requirements for running queries in BigQuery

First, we need a BigQuery sandbox account, which lets you play with data for free. No credit card or billing is required unless you decide to upgrade. Next, we need a project, which is where BigQuery data is stored. Lastly, we need access to a dataset.

First, log in to your Google account, and let’s begin!

Go to your search engine and search for google cloud console. Scroll down and select the Google Cloud Platform.

Accessing the Google Cloud Platform

You can also select Google Cloud console – Web UI Admin.

Accessing the Google Cloud console

Alternatively, you can go to Google Cloud directly by typing console.cloud.google.com in the address bar.

Alternative to accessing the Google Cloud console

If you’re an existing user, you will see a page like the one below, showing the project you’re working on and the other projects you can access.

Google Cloud homepage

For new users, select your country, agree to the terms of service, and click Agree and Continue.

Creating a Google Cloud account

We have reached the homepage.

Creating a BigQuery Project

There are two ways of creating a new BigQuery project.

The first way is to click Select a Project.

Selecting a project in Google Cloud

Next, click New Project.

Creating a new project in Google Cloud

We prefer to go to the BigQuery workspace and create a project there.

Open the navigation menu, scroll down, and click BigQuery.

Going to the BigQuery workspace

We are now in the SQL workspace. Click Create Project.

Creating a new project in the BigQuery SQL workspace

Provide a name for the project. If you want to change the project ID, click Edit. Finally, click Create.

Creating a new project

We will see a welcome screen. Click Done.

Closing the BigQuery welcome screen

We now have our BigQuery project in the sandbox account.

Accessing BigQuery Public Datasets

There are multiple ways to access public BigQuery datasets.

To get there quickly, open the BigQuery sample dataset for GA4 and click the ga4_obfuscated_sample_ecommerce dataset.

Opening the GA4 sample dataset from the Google Analytics documentation

This link opens the table with the data we need to query.

To do the same in the SQL workspace, type public in the search bar. Initially, we will see 0 results. Click Search All Projects to broaden the search.

Searching for public datasets

Expand the bigquery-public-data project and click Show More.

Opening the BigQuery public data project

Scroll down and select the ga4_obfuscated_sample_ecommerce dataset.

Accessing the GA4 sample eCommerce dataset

Note that clicking the dataset itself is not where you will see your data. To understand this more clearly: the first level is a project, and within a project, you have datasets.

Google Cloud Explorer structure

Selecting a dataset will only give you information about the dataset itself. To find your data, expand the dataset, and you’ll find its tables below.

For example, expand our dataset. Select the table called events.

Opening the events table from the GA4 dataset
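This Explorer hierarchy maps directly to the three-part table IDs you use in queries. As a sketch, counting the events in one daily table of the public sample (assuming the events_20210131 shard exists in the dataset) looks like this:

```sql
-- Table IDs follow project.dataset.table, wrapped in backticks
SELECT COUNT(*) AS event_count
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
```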

A great reference document is the GA4 BigQuery Export Schema.

GA4 BigQuery export schema documentation

This document is not mandatory, but it helps you understand how GA4 data is structured in BigQuery, the names of the columns, and their data types. That way, you know what data is available and what to ask ChatGPT for.

We are using a free account, but we will still cover some best practices for avoiding or mitigating costs when dealing with company data.

Managing Costs

First, go to Details and scroll down to see the dataset size. Since we are dealing with megabytes, we are safe.

Dataset storage size

If the dataset size is in the gigabytes or terabytes, be prepared to pay some money.

Next, go to the table info section and copy the table ID.

Copying the table ID

Next, click Query > In New Tab.

Opening the query window in a new tab

To quickly demonstrate how to load query data, let’s try the following code:

SELECT *
FROM `your-table-id`

The above code retrieves all the data available in the table (note the backticks around the table ID, which BigQuery’s standard SQL requires). Paste this code and pay attention to the message on the right.

Testing the code for retrieving all data from the table

First, we have a green check mark indicating the query is valid. The message beside this lets us know if there are any costs involved.

Because we are dealing with megabytes, we are still within the free tier. If the estimated query size is in the gigabytes or terabytes, edit your query to bring it back within the free tier.

If the size is still a gigabyte or more after trimming, only query the data you need to minimize costs, removing all unnecessary columns.

At this point, ChatGPT is extremely useful if you have limited knowledge of SQL or if you want to simplify the query writing process.

Retrieving Data from BigQuery with ChatGPT

Now, let’s go to ChatGPT.

Give it the table ID and ask it to retrieve everything from our BigQuery table.

Using ChatGPT to write an SQL query to retrieve all data

ChatGPT will write the code for us and provide a quick explanation below it. Click Copy code.

Copying the code for retrieving all data

Paste the code into BigQuery and click Run.

Running the code to retrieve all data

After running the code, we can see the query results at the bottom.

Query results for retrieving all data

Now, let’s crank it up a little. Let’s say we want to retrieve only the event_date and event_name columns. How should we do this?

Retrieving Specific Columns & Rows

Again, we can use ChatGPT to simplify this process.

A great thing about ChatGPT is that each message is not isolated. It can answer follow-up questions while keeping the context of our conversation.

We can ask it to retrieve the event_date and event_name columns. It will pick up the table ID from our previous message.

Using ChatGPT to retrieve the event_date and event_name columns

Copy the updated SQL query.

Copying the code for retrieving the event_date and event_name columns
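The query ChatGPT returns should look something like this (a sketch; the table ID assumes the daily shard from the public GA4 sample):

```sql
-- Retrieve only the two columns we need
SELECT event_date, event_name
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
```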

Paste this code into BigQuery and click Run. Expand the query results.

Running the updated SQL code and expanding the query results

We can now verify that we are only retrieving the event_date and event_name columns.

Query results for retrieving event_date and event_name columns

While we have cut down our columns, we still have all the rows in our table. In the lower-right portion of the window, you’ll see that we have close to 27,000 rows.

Total rows of the query result

Since BigQuery stores table data in a columnar format, query costs are mainly driven by the columns you select and their data types.

The number of rows factors into the overall data size, but reducing the number of rows returned will not decrease the cost. BigQuery reads all the data in the selected columns before filtering out specific rows.

Still, there may be instances where you only want to see specific rows of data. For example, suppose we only want to see the first 10 events.

Let’s ask ChatGPT to show the first 10 rows of data. Copy the updated code.

Copying the updated code for retrieving the first 10 rows
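The updated query should resemble the sketch below (table ID again assumed from the public GA4 sample). Note that LIMIT caps the rows returned, not the bytes BigQuery scans, so it will not reduce your query cost:

```sql
SELECT event_date, event_name
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
LIMIT 10  -- return only the first 10 rows
```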

After running this code in BigQuery, verify that we only get the first 10 rows.

Query results for retrieving the first 10 rows

What if we want to obtain specific rows based on certain criteria? How do we filter our query?

Filter Records

Let’s say we only want to look at specific types of events.

Looking at the event_name column, we can see the different events on our website, such as page_view, user_engagement, scroll, etc. What if we are interested in looking at only scroll event data?

Let’s open a new chat in ChatGPT. Provide the table ID and ask it to select the event_date and event_name columns and filter it by the scroll event.

Filtering rows by the scroll event

Copy the code.

Copying the updated code for filtering the scroll events
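The filtered query should look roughly like this (a sketch; the scroll value comes from the event_name column we inspected earlier):

```sql
SELECT event_date, event_name
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_20210131`
WHERE event_name = 'scroll'  -- keep only scroll events
```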

Run this code in BigQuery and confirm that we only get scroll events.

Query results for filtered scroll events

What if we want to look at specific date ranges?

SQL Dates

For dates, there is one slight modification we must remember.

When building your query, remove the date at the end of the table ID and replace it with an asterisk, a wildcard that matches all of the dataset’s daily tables.

The date at the end of the table ID

In ChatGPT, copy the latest code and replace the date at the end of the table ID with an asterisk. Next, tell ChatGPT to get rows of data where the dates are between the 1st of December 2020 and the 1st of January 2021.
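With the wildcard in place, ChatGPT will typically filter the daily tables using the _TABLE_SUFFIX pseudo-column, which holds the part of the table name the asterisk replaces. A sketch of what the resulting query should look like:

```sql
SELECT event_date, event_name
FROM `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*`
-- _TABLE_SUFFIX matches the date portion replaced by the asterisk
WHERE _TABLE_SUFFIX BETWEEN '20201201' AND '20210101'
```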