BigQuery Live Training – A Deep Dive into Data Pipelining

What is BigQuery and why would you use it as a marketer? Ahmad is an agency owner who started to report on more and more data for his clients. Find out why processing your data in BigQuery is so powerful and get a deep dive into data pipelining with BigQuery in this live training replay.

Today we have a MeasureMaster live training recording we did with Ahmad of Siavak.com. If you are not yet aware of our live trainings, we do them every month inside our MeasureMaster membership and train our members on the latest developments in measurement, analytics, tracking, and marketing.

In this session, Ahmad gave us an introduction to what BigQuery is all about, why we should care about it as marketers, and how we can perform an ETL process on our data. Then he goes into a live demonstration where he takes data from Google Analytics, imports it into BigQuery, enriches it, and visualizes it in Data Studio. This was an in-depth training.

So be sure to check out the timestamps in the description down below if you want to skip forward to a particular topic. But for now, we’ve got lots to cover. So let’s dive in.

So, shall we get started? Everybody is here. We have Costa here from the UK, and Mario from India, where there’s a 21-day curfew right now. Yeah, I heard about that. We have Marco Peters, and he already has a question, but we will get to the Q&A part at the end. So I would like to switch over to Ahmad now. Yeah, you can take over my screen, or share your screen.

First of all, a short introduction about me. That’s my picture. My name is Ahmad. I’m the founder of Siavak Analytics. We are a company that does analytics setups, data pipelines, and reporting dashboards. We’re a small team, about four people at the moment. I personally started with ecommerce and online marketing about 10 years ago, and then narrowed my focus to conversion rate optimization about six or seven years ago.

And finally, I further narrowed it down to just data and analytics for the last five years, because I figured out that I love data, and I was seeing that data is essential for any online marketing or conversion optimization project; nothing can work or be optimized without data. So let’s begin. I wanted to keep the introduction short.

What We Will Cover

Today, I’m first going to introduce you to the problem of data silos and the problem I had, which forced me onto a journey that finally ended with creating data pipelines using BigQuery. Then I’m going to introduce you to the solution, data pipelines: what they are and how we can use them. And then the evolution of the data pipeline; to be precise, not the evolution of the data pipeline in general, but the evolution of my data pipeline.

So, my journey from being a simple user of Google Analytics’ built-in standard reports to, today, creating data pipelines that extract data from different marketing sources, bring it into BigQuery, transform it, and report on it for my clients. Now, in this presentation, most of the time I will refer to Google Analytics. But please keep in mind that I’m using Google Analytics as an example, just because I assume that everyone on this call, and whoever is watching it later, is using Google Analytics at least. The concepts apply to any other marketing tool, CRM, or advertising tool you use, like Facebook Ads, Bing Ads, Shopify, or any CRM with an API; you can apply the same concepts to those tools as well.

Then, once we’ve seen what a data pipeline is and how BigQuery is involved in it, I’m going to talk about BigQuery itself: what BigQuery is and what its benefits are. Then, how we can extract and load data into BigQuery, and once the data is inside BigQuery, how we can transform it, clean it, join it, and prepare it for the use cases. Once the data is cleaned, prepared, and ready, we can use it for visualization, analysis, and activation. Finally, as I said, I’m going to do a short hands-on demo to show you exactly how it’s done, so it’s not just theoretical; we do it in practice as well.

The Big Data Problem

So let’s start with the problem: marketing and analytics data are in silos. They are in Google Analytics, HubSpot, Salesforce, Bing, etc. Whatever tools you’re using, your data right now lives inside those tools, your internal datasets included. The problem with this is, first of all, that they don’t talk to each other.

So data inside Facebook is not shared with your Google Analytics, which is not shared with Shopify, unless you deliberately set up a connection for that data to be shared. By default, it is not shared. And the bigger problem is the lack of data ownership. I’ll talk about this more a bit later on, but when your data is inside any of these platforms, you don’t own your data. You have to use the platform as an interface to get access to your data and create reports, and you’re limited by whatever that platform gives you in terms of data.

The next problem is limited analysis capabilities. Most of the time, you cannot do advanced segmentation, trend analysis, comparisons, forecasting, etc., within the tools themselves, so you want to get your data out so you can do more interesting analysis on it. And finally, there is limited reporting. All of these tools, even the Google Analytics interface, have limited reporting capabilities. There are reports that are there by default, and some of these platforms allow you to create custom reports. But even then, there are times when you have questions for which you simply cannot build the report you want. In Google Analytics, for example, another problem is data sampling. If you have a large volume of data in the timeframe you’re reporting on, Google Analytics does not process all of your data; it processes a sample of your data and extrapolates it to give you the final numbers. And this is not good.

The Solution

Now let’s take a look at the solution, the data pipeline, which is what I use to escape those limitations. With a data pipeline, first of all, we extract data from all of these tools into a central place. We extract the data and load it into a data warehouse: somewhere that is owned by us, that can store our data, and that gives us access to our data in whatever way we want. Now, the extraction part can be done through different means, like APIs, third-party tools, or manual export. I will cover these in a bit more detail later on.

But after we load data into BigQuery from all those data sources, it is still in different tables. We need to do some kind of transformation, joining, preparation, etc., so we can merge the data together, extract insights, clean it up, and create one final table, or several final tables, that allow us to do actual, meaningful analysis on our data.

And once our transformation, cleaning, and preparation are done within BigQuery, or any other data warehouse we use, we can visualize or analyze the data. I don’t know if you can see my mouse pointer, but this is the logo of Google Data Studio, which is my tool of preference for data visualization and delivering reports to my clients. But you can also connect the final dataset to lots of other tools, like Tableau, for analysis, machine learning, visualization, or reporting.

Your Data Pipeline Journey

Next up, I’m going to talk about the evolution of my data pipeline: how I went from being a simple user of Google Analytics to what I’m doing today. First of all, several years ago, I was just using the standard reports. In those days they were not as nice as these screenshots here. But quickly I figured out that I needed to create reports for my clients based on the questions they had, questions that were not covered by the standard reports in Google Analytics. In those days, the only way of doing that was custom reports in Google Analytics. This was before Data Studio was released.

So the next step for me and people like me was to create custom reports in Google Analytics to get the answers we wanted. But there was a problem: even with custom reports, the data was limited to whatever was in Google Analytics. Sampling was an issue as well, and I couldn’t join the data with other datasets or do anything that wasn’t possible within the reporting interface. So the next step, which was the easiest one, was to use the export feature of Google Analytics: create the report you want, export it to CSV, Excel, or Google Sheets, and then do whatever you want with your data inside a spreadsheet.

So: Google Analytics, then CSV, then processing in a spreadsheet. At that time, we also used to create charts and tables in spreadsheets. Different reports, maybe from different views or from different standard or custom reports; we exported them, did the transformation in Sheets, and created charts and graphs inside Excel or Google Sheets. That was a few years ago. Then we started to bring in data from other tools as well, like Facebook Ads and Google Ads, and combine it with the data we had exported from Google Analytics. But the process was manual and tedious, and if we wanted to do weekly reporting, we needed to rinse and repeat and do the process from scratch every week.

Then we kind of graduated from that. Instead of manually extracting data from Google Analytics, we started using the Google Analytics API. Most of you know what an API is: it stands for application programming interface. It allows you to write a piece of code that connects to Google Analytics, extracts data from it, and then the code itself can put the data into a Google Sheet for you. We used the Google Analytics add-on for Google Sheets to connect to the API and bring data into Sheets. We also used other tools like Supermetrics to get data from Facebook Ads and Google Ads into Sheets, and then combined all the sheets together again. But by this time, Data Studio was live.

So we actually created our reports inside Data Studio; we connected Google Sheets as a data source to Data Studio to visualize the data, and we didn’t use the charts in Google Sheets anymore. Now, the final stage, which we are at right now, gives us the most flexibility and the most power, and it is the easiest. I mean, it’s far easier than manual export, and also than preparing data in Sheets. We again bring data into Sheets first, but then we use Google Apps Script to send the data from Google Sheets into BigQuery for transformation.

So instead of using formulas and VLOOKUPs and things like that, which were slow and limited, we brought our data sources in as tables in BigQuery, and we used SQL to do the data transformation. That was the moment I told you about, when I was forced to use BigQuery, because at that time we were using the previous setup for a client of ours.

And as you know, Google Sheets has a limit of 5 million cells per spreadsheet, and even as you get close to that, any kind of formula, VLOOKUP, array, or pivot table gets super slow. So we were hitting the limit on the volume of data, and the reporting was also so slow that it was frustrating for everyone. We had to find an alternative, and we started doing it in BigQuery. Then, once the data is prepared in BigQuery, we can connect BigQuery to Data Studio for visualization and reporting.

Now, something to keep in mind: we do not have to bring data into Google Sheets first. We did it that way at the time because it was closest to the setup we already had, and it was really cost efficient. We were already using Supermetrics to get data from Facebook, Google Ads, and many other data sources into Google Sheets, and we were paying for the subscription. So we said, we already have our data in Sheets; let’s just create a script and find a solution to send it to BigQuery, rather than finding a way of bringing data directly from those tools into BigQuery, which is also possible. There are third-party tools and different ways of streaming data directly from many different tools into BigQuery, eliminating the Google Sheet and all the limitations in between. Okay, let me just drink some water.

Now that we’ve seen that BigQuery is the key to creating an efficient and reliable data pipeline, let’s see what it is. Over time, Google had to invest a lot in servers and infrastructure to store, analyze, and process data for its own operations and its own products, such as Ads, Analytics, etc. And eventually, Google decided to make its internal petabyte-scale data analysis tools available through something called Google Cloud Platform, so everyone could use the same infrastructure to store and analyze data.

What is BigQuery?

BigQuery is a data warehouse – somewhere you put data into, and it stores that data securely and reliably. It allows you to query, transform, and manipulate that data, join it with other datasets, etc. It is fully managed.

So you don’t need to set up any servers, do maintenance, install software for data analysis, upgrade software, etc. You don’t need to deal with any of that; you just ingest your data and can start querying it right away. It is petabyte scale, not gigabytes or terabytes. You can have petabytes of data, and it can handle any volume of data you can possibly imagine, with no problem and fairly quickly. And it’s cost efficient as well.

And regarding the comment I made earlier: it is actually for everyone. First of all, it is free to begin with, and then you only pay for what you use. I will get to the details of pricing later on, but just know that there is no barrier; it is not at all expensive. It is very reliable, backed by the same Google data centers that Google uses itself. And it is secure and auditable.

By secure I mean that when you have your data in your BigQuery data warehouse, you can allow different users, just like with a Google spreadsheet, to view data, query data, add, remove, or delete. You can manage users’ permissions, and you have an audit trail to see who has made what changes, who has run which queries, who has exported data; whatever any of these users does is completely auditable. So in short, BigQuery enables data storage and analysis at massive scale for everyone.

Benefits

Now, yes, it is powerful, it is good. But what are the benefits? Why should a marketer, or a small company doing its own online marketing, decide to move its data into BigQuery? Why should we even consider it? Okay, it’s powerful, but what is the reason? First of all, there is the concept of data ownership, which I talked about earlier. Companies invest a lot of time, money, and resources into tracking and collecting useful, clean, and meaningful data. They want to track data about user behavior, about conversions, about acquisition channels. But the data they collect right now, if they don’t use a data warehouse, is owned by the tool. It is owned by Google Analytics.

It is owned by Facebook, as they are the ones who decide what you can do with your data and what portion of it you can see and use. And you’re limited to the tool’s reporting interface as well. So companies want to own their data, and that is the main reason someone wants to send data into BigQuery. Companies can even decide where they want their data to be located; they can choose to have the data stored in EU or US data centers to comply with regulations, if any apply.

They also get access to unsampled reports: any query you run in BigQuery is not sampled, unlike in Google Analytics, so you don’t have any limit on the volume of data. Another reason is data enrichment. When you have the data from your marketing tools in BigQuery, you can enrich it with external datasets that you already have.

Most of you have product profitability data, data about cost of goods sold, and data about the vendors and operational costs of your marketing. You don’t just pay for ads; you also pay the vendors who are running the ads. You have access to the shipping cost per order, or data about the users inside your CRM. There is lots of data you have that is external to Google Analytics and the other platforms you’re collecting data into. You can now bring that data into BigQuery as well, enrich your marketing data, and have more meaningful datasets to report on and analyze.

And then you have unlimited possibilities for reporting and analysis. You can do machine learning, AI, forecasting; the sky’s the limit. If you have access to data scientists and data analysts, you can give them access to your BigQuery data warehouse and start asking smarter questions of your data, and they will help you get the answers you want. Ah, what are these?

I forgot about this slide. I just wanted to say that you have unlimited power to process, transform, explore, clean, and extract insights, and you use SQL, Structured Query Language, inside BigQuery to do that.

Pricing

Now, the good news about pricing: it is pay-as-you-go. You only pay for the amount of data that you store and the amount of data that you process with your queries.

So if you don’t query your data for a long time, say a few months, you won’t pay anything for querying; you just pay for the storage of the data. There are also very generous free tiers. For storage, the first 10 gigabytes per month are completely free, and from that point forward you pay just two cents per gigabyte per month. And if you keep the same data untouched for more than three months, the storage price is cut in half as well.

So it’s really cheap in that sense. As for queries, the first terabyte of data you process with your queries each month is completely free, and for each additional terabyte you process, you pay just $5. So as you can see, it’s within the reach of most organizations, I would say. It costs more to hire someone to actually write the queries than to run them; the tool itself is quite cheap.
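As a rough, hypothetical back-of-the-envelope example using the prices quoted above: storing 100 GB would cost about (100 - 10) × $0.02 ≈ $1.80 per month, and processing 3 TB of queries in a month would cost about (3 - 1) × $5 = $10. Actual bills depend on usage, but the order of magnitude for a typical marketing dataset is a few dollars, not thousands.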

Extract & Load Data: Getting Data into BigQuery

Now, I talked about extracting data from different tools, loading it into BigQuery, transforming it, and then visualizing it. Now I’m going to dive a bit deeper into each of these stages. So, extract and load: getting data into BigQuery. There are different ways of doing that. I’m going to mostly talk about Google Analytics, but again, most of these methods are applicable to other tools as well.

1. GA 360 to BigQuery

Users of Google Analytics 360 can export data directly into BigQuery on a daily basis, automatically. It needs virtually no setup; it’s fully automatic. You just enable it in Google Analytics 360 and the data streams directly into BigQuery every day, and it does a complete data export. So whatever is in your Google Analytics account is exported directly to BigQuery. I said it is free, and the export itself does come free with Google Analytics 360. GA 360 itself, however, is a different story.

It costs a minimum of $150,000 per year, which is out of reach for many small companies and organizations. But the good news is that with the new Google Analytics App + Web properties, you have the option to export directly to BigQuery for free, at least for the moment; I hope it stays that way. So yes, if you have an App + Web (event-based) property, you can simply set it up to send data to BigQuery; there is no cost for the property, nor for the data transfer.

2. Google Analytics + API to BigQuery

The second method, for users of free Google Analytics, is to use the Google Analytics API and, optionally, send the data first to a Google Sheet, and then from within the Google Sheet use Google Apps Script to stream it into BigQuery. One of the cons of this method is that it is a manual setup; you may need the help of a developer. And you need to use a tool. The tool part is easy: you use an add-on or Supermetrics to bring data from Analytics into the Google Sheet. But from the Google Sheet to BigQuery, you need to write an Apps Script. The other downside of this method is that the data you export is selective. You need to decide which dimensions and metrics you want to pull from the Google Analytics API and send to BigQuery. You can’t just say, “I want all the data,” because the GA API has its limits; if I recall correctly, around seven dimensions and ten metrics at a time.

So complete replication is not possible, and full data ownership cannot easily be achieved with this method. You’re still limited to the volume of data a sheet can process and store, per run at least. Google Apps Script also has a runtime limit of 90 minutes per day, so if you go over that, you’re in trouble. And you need to monitor and maintain your data pipeline: you need to check the runtime of your Apps Scripts and the data coming in from GA to the Google Sheet, and keep an eye on it to make sure you’re receiving the data you want every day. But it’s free, or almost free: the Google Sheets add-on is free, and running an Apps Script for 90 minutes a day is also free. So it’s a good alternative to GA 360 for users of the free GA.

3. Google Analytics + 3rd Party Tools to BigQuery

Now, the final method is to use a third-party tool to get data directly from Google Analytics and push it into BigQuery. I mainly use Supermetrics; I know that Fivetran does it as well, and I’ve used Stitch and still sometimes use Stitch too. The setup is easy: with these tools, you just connect them to BigQuery and to Google Analytics, and they do the rest for you.

Some of these tools do a complete data replication; Supermetrics for BigQuery does that, and Fivetran does that. But when using Stitch Data, just like the previous methods, you need to define the specific dimensions and metrics you want to replicate. There’s almost no maintenance needed: you’re paying for the tool, so they handle the maintenance and updates for you. You do have to pay for the subscription, but it is really cheap compared to buying Google Analytics 360. It depends on the number of users and the volume of data, and they have different pricing structures, but they’re still fairly affordable.

Transform Data: Clean, Join, Prepare and Manipulate

Now that we’ve seen how we can bring data into BigQuery, let’s look at two ways of transforming that data. By transforming I mean preparing it: clearing duplicates, matching data (for example, aligning the numbers between your CRM, your Google Analytics, and other data tools), joining different data sources, enriching your data, cleaning it up, converting uppercase to lowercase, things like that.

1. Using SQL

The first way of transforming your data is with SQL, which stands for Structured Query Language. It’s easy to learn; you can use it to query data, join multiple datasets, and manipulate your data. It looks like the code I’m showing on the right side of my screen.
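To give a flavor of what that looks like, here is a minimal sketch of a typical cleaning query; the project, dataset, table, and column names are made up for illustration:

```sql
-- Hypothetical example: lowercase campaign names and collapse duplicate rows.
SELECT
  date,
  LOWER(campaign) AS campaign,   -- normalize mixed-case campaign names
  SUM(sessions)   AS sessions    -- merge duplicate rows for the same day and campaign
FROM `my-project.marketing.ga_sessions_raw`
GROUP BY date, campaign;
```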

2. Using Cloud Dataprep

The second way is Google Cloud Dataprep, which allows you to do the same kinds of transformations visually, through a web-based, easy-to-use interface. This is a quick animation of the way you can explore data, see a visual profile of your dataset, and see what your data looks like. It is really easy to use, and for those who do not want to learn or use SQL, it’s a really good alternative. So that was transformation.

Now I’m going to quickly cover using your data. You’ve brought your data into BigQuery from different data sources, transformed it, prepared it, and cleaned it; you have one final table, or several final tables, that are really clean, and you’re ready to use your data.

The first way of using the data is reporting and visualization. I use Data Studio; you can use any similar tool, like Tableau, to visualize the data, create reporting dashboards, and share them with different teams and stakeholders in your organization or your clients’ organizations. Different people have different data needs: a CEO might need an overview and a monthly trend, while someone doing day-to-day ad optimization has different needs.

They might need tables and sheets of data with more granularity in the metrics and dimensions they want. So you can create different reports for different functions in any organization. You can also analyze the data, if you are a data analyst or can hire someone, to dive into the dataset and analyze it, write advanced queries, apply machine learning algorithms, or do time-series forecasting. Anything is possible, because BigQuery allows you to do anything you want with your data.

Using Your Data: Visualization, Analysis, and Upload

And then finally, this is a really cool example: you can upload the results of your analysis back into Google Analytics, or other tools, to do exciting things. I’ll give you an example. I have a friend who runs a company in the UK. They bring in data from different sources, like Facebook, Google Ads, Google Analytics, Bing, etc. They do some data modeling inside BigQuery and then run the final modeled data through machine learning to figure out how likely a user is to convert in the next 7, 14, or 30 days. They give each user a score between 1 and 100 that shows how likely they are to convert.

Then you can upload this data back into Google Analytics to create a retargeting audience. There is a data upload functionality in Google Analytics: you can upload these scores as user-scoped custom dimensions, and then create audiences, which you can link to Google Ads, to retarget only the people who are, say, 70% to 90% likely to buy, but not those above that, because they are likely to buy anyway.

You don’t want to spend money retargeting those people. So this is one of the cool use cases of activating your data. They also use machine learning for attribution, data-driven attribution, in a really sophisticated way. It was really exciting, for me at least, to see these use cases. So the sky’s the limit: when you have your data and the power and tools of Google Cloud, you can do almost anything you want, and anything your team is capable of.

Hands-On Demo Using BigQuery

Now, let me start with the hands-on demo. Let me drink some water. Julian, are we good? Can you guys hear me? Are you excited? Okay, one second.

Okay. So, let me start with the hands-on demo. There might be some issues along the way, but I will try to fix them. Let me first give you the objective, the use case. I would like to calculate the market penetration of a client per US state. And it is a really, really simple calculation.

Let me explain. We know how much traffic our client is getting per state from the data inside their GA account, the data inside Google Analytics. But the number of users is naturally higher from states that have a larger population.

Extract Data from Google Analytics into Google Sheets

If you look at almost any website, it has more users from New York, for example, or from bigger cities and larger states. But we’d like to illustrate how that traffic volume per state compares to the total population of the state. This information, the population of each state, is not available in Google Analytics. So this is an example of an external dataset that we want to join with the data we have inside GA, to calculate a really simple metric, users divided by state population, to figure out the market penetration as we define it for the purpose of this demo. Now, the first step I’m going to take is extracting data from GA into Google Sheets using the Google Analytics add-on for Google Sheets.

Then I’m going to load that data from the Google Sheet into BigQuery with Google Apps Script; so far, everything is free. Then I’m going to create another table, US population, in BigQuery, and do a simple data transformation: join those two tables together to calculate users per state population. Finally, I’m going to connect that table to Data Studio and visualize it. Okay, that is the end of the presentation part. Let me exit.

Let’s begin. Let me just do some quick preparation; bear with me. This is a document in which I outlined the steps when I was trying this out, so we don’t run into too many issues and errors. I’m going to follow it so I can do this quickly and be mindful of your time. Now, first of all, this is our Google Sheet.

For the BigQuery live demo, I’m going to use the Google Analytics add-on to create a new report. If you don’t have this add-on, just click on Get add-ons, search for Google Analytics, and it will show up. Then you can create a new report. It’s working.

It shows a sidebar listing all the accounts, properties, and views you have access to. You can name your report, and you can choose the metrics, dimensions, and segments you want to get from Google Analytics. For the metric, I want users, the number of users. For the dimension, I want the US state, which, it turns out, is called region in Google Analytics, and I want to put it first: region and date. So this add-on helps me connect to the Google Analytics view.

I’m going to call it daily users per state, and then I’m going to create the report. Now, I’m not going to use this client for the demo; in a moment I’m going to swap in a different dataset, because I don’t want to expose my client’s data here.

As soon as you create a report, it creates a Report Configuration sheet inside your Google Sheet. You have the view ID, and the start date and end date are set to 30 days ago and yesterday. You can change these, and if you want to know how, please refer to the link here; there is lots of documentation on how to configure the different fields. Now, just bear with me for a moment. I’m going to

change this to another view, because I don’t want you to see which Google Analytics account we’re getting the data from. Okay, so this is another client. I’m getting users per state and per day for the last 30 days. Next, I go to Add-ons > Google Analytics > Run reports.

Load Data from Google Sheets into BigQuery

Okay, the report has completed successfully. It has created a new sheet for me, connected to Google Analytics, and brought in the data for date, region, and users. As you can see, we have a lot of regions that don’t seem to be US regions; they’re from different parts of the world. So we need to filter this report to get data for the United States only, and this is something that was not possible to do through the sidebar; we need to do it here. I need to write the filter: ga:country equals United States. It will apply this filter to the report. And just for the sake of it, I’m going to increase the row limit, in case there are more than 1,000 rows in the dataset it’s going to retrieve. Now I’m going to run the report again. Okay, let’s see what we have.

We still have some (not set) regions, which I’m not sure about, but the rest of the regions seem to be US states. Fine. Let’s quickly take a look at how many rows we have. Even for a simple query like this, we have around 2,500 rows, and believe me, it can easily get to 5 million when you’re working with bigger accounts.

Now, I cannot use the data in this shape, because there are a lot of header rows at the top. What I want to do is extract the actual dataset from this sheet into another sheet, and then stream that data into BigQuery. Let’s call this new sheet data. This is the dataset I’m going to stream into BigQuery, and for this I’m going to

reference this range here. We need to put it inside an array for it to work, and we want everything from cell A15 until the end of the document. Okay, it works.
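For reference, the formula is along these lines, assuming the report sheet is named daily users per state and, as is typical for the GA add-on, the data starts around row 15:

```
={'daily users per state'!A15:C}
```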

Now, something I know I will need is to swap these two columns, but I can’t do that right now because the data comes in this order. What I need to do is change the order in the report configuration: bring ga:date first and then ga:region, and run my report again. I guess I made a mistake with the order in the sidebar at the beginning.

The reason I need to do this is that when I was trying this yesterday, the code I wrote to send the data to BigQuery expects it to be in this order: first date, then region, then users. If we used region first and then date, it would end up throwing an error.

Okay, so now I have my data inside the Google Sheet. I created a connection between Google Sheets and Google Analytics to bring in the data the way I wanted, and this was my first easy transformation, from this ugly report sheet to the clean dataset. So even this is a transformation, one that I did in Sheets.

Now let’s do some preparation in BigQuery. This is the interface of BigQuery. BigQuery is just one part of the Google Cloud Platform: if you click on the menu here, you see that you have access to lots of services inside Google Cloud, and BigQuery, which I’m using right now, is just one of them.

Okay, now, here is the project, Siavak BigQuery Demo. I’ve already created a dataset; this one is the trial I did yesterday, and this is the live dataset we’re going to work with. The way I created it (I should have waited for the live demo to create it) was simply to click Create dataset, give it a name, and click Create dataset again, which created the dataset ga_data_live. Now I’m going to create a table here, an empty table, to stream this data into BigQuery. My table in BigQuery should have three columns: date, region, and users. Let’s see. It is an empty table; this is the project, this is the dataset, and the name of the table will be daily_users_per_state.

Now the schema, which is the actual structure of the table, we need to define here: we have date, which is in the DATE format; region, which is a STRING; and users, which is numeric. Those are the field types. I’m going to create the table.
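For reference, the same empty table could also be created with a SQL DDL statement instead of the UI; a minimal sketch using the names from this demo (the users type is assumed to be INT64):

```sql
-- Create the empty destination table with the three fields described above.
CREATE TABLE `ga_data_live.daily_users_per_state` (
  date   DATE,
  region STRING,
  users  INT64
);
```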

The table has been created; we can go to it. This is our project, our dataset, and our table, and when we preview the table, it’s completely empty. Now we need to go back to the Google Sheet and use Google Apps Script to stream the data from Google Sheets into this BigQuery table. For this, I go to Tools > Script editor. That gives you access to a script editor environment in which you can write and test scripts in JavaScript; it has access to the data inside your Google Sheet, and you can write and run any function there. Now let me grab the actual script; bear with me for a moment while I copy and paste it. Okay, that’s it.

So this is the script we’re going to use to send data from this Google Sheet to BigQuery. As you can see, it’s not very long, just 58 lines. We create some variables and then define the shape of the data; as you can see, these are exactly the same values we used in BigQuery: a table with a date, region, and users column. Now I need to change the actual values here. The project ID is the same, but the dataset and table still refer to the ones I created yesterday, so we need to put in the dataset ID, ga_data_live.

This way, we are telling the script where to put the data: go find that project, in that project find this dataset, and then the table. What did I name my table? daily_users_per_state. There’s no GA in the name; it’s good practice to include GA as well, so you know where the data in the table is coming from, but I just forgot to do it today.

Okay, so we can save it; let’s give it a name, Google Sheets to BigQuery. Everything seems to be fine. It gets the values from this spreadsheet, from a sheet named data, which is the one we created. Let’s just make sure; yes, the sheet is named data.

The rest is the code that actually pushes the data into BigQuery. Now, before I run this script, we need to enable the BigQuery API inside Google Apps Script. So: BigQuery API, enable.

Then, when I try to run the function for the first time, I will most likely get a prompt that requires me to authorize it. Let’s see. Okay, yes: authorization required. I need to give this script access to my BigQuery account as well. It is using my personal account here. Do we already have that data in BigQuery from yesterday? I think we can all imagine it would be uploaded. Ah, yeah, it would be uploaded; I can use the BigQuery data from yesterday, you’re right.
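Since the script itself isn’t shown in the replay, here is a minimal sketch of what such a function can look like, assuming the BigQuery advanced service is enabled, a sheet named data with a header row and date/region/users columns, and hypothetical project, dataset, and table IDs:

```javascript
// Minimal sketch: load the contents of the "data" sheet into a BigQuery table.
// Assumptions: BigQuery advanced service enabled; dates formatted as YYYY-MM-DD;
// no commas inside cell values; the first row of the sheet is a header.
function sheetToBigQuery() {
  var projectId = 'my-bigquery-demo';        // hypothetical project ID
  var datasetId = 'ga_data_live';
  var tableId = 'daily_users_per_state';

  // Read the sheet as displayed text and turn it into CSV.
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('data');
  var rows = sheet.getDataRange().getDisplayValues();
  var csv = rows.map(function (r) { return r.join(','); }).join('\n');
  var blob = Utilities.newBlob(csv, 'application/octet-stream');

  // Describe a load job that appends the CSV into the existing table.
  var job = {
    configuration: {
      load: {
        destinationTable: {
          projectId: projectId,
          datasetId: datasetId,
          tableId: tableId
        },
        sourceFormat: 'CSV',
        skipLeadingRows: 1,
        writeDisposition: 'WRITE_APPEND'
      }
    }
  };

  // Submit the load job through the BigQuery advanced service.
  BigQuery.Jobs.insert(job, projectId, blob);
}
```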

Okay. So basically you authorize BigQuery access, you run the script, and bam, the data goes into BigQuery. I have the setup from yesterday in this GA data dataset. Good point, Julian; we can save time this way.

So never, never trust tutorials and live demonstrations, because usually something doesn’t work. Here is the table I used yesterday to send data to, and we can see in the preview that the date, region, and users

have been streamed into this table. We have 1,760 rows in this table, which I can preview. And, as I said, you can use BigQuery to query your data. The simplest SQL query of all time: you select everything from the table and set a limit. In this table we have 1,760 rows; we can select just the first thousand, run the query, and it returns the resulting dataset. This is basically the same dataset, but we just wanted to see the first thousand results. So this is how we run a SQL query on a table.
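In the query editor, that looks roughly like this (the project ID is a placeholder):

```sql
SELECT *
FROM `my-bigquery-demo.ga_data_live.daily_users_per_state`
LIMIT 1000;
```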

Now, here is another table that I imported into BigQuery the same way. But there is another way of importing it that I want to show you very quickly (you can look it up online later): you can go to your dataset and create a table, not an empty table, but from a Google Sheet. You set the Google Sheet so anyone with the link can view it, put the URL here, give the table a name, and it will stay synchronized with your Google Sheet every time the sheet updates. It is a bit slower, and there are pros and cons to the different methods, but just know that this is also a possibility for bringing data from Google Sheets into BigQuery.

Enrich with Another Data Source (US State Population)

Okay, so the US state population: this is another table that I created. Yes, this one is coming from that source; it’s an external table configuration connected to another Google Sheet. Now what we need to do is join these two tables together with a query. Let me bring up my query. As I said, everything is auditable inside BigQuery, so I can come here and look at all the queries I ran yesterday, and the day before.

I think this one was the final query I used to join these two tables together. Let me just check it quickly. It is selecting everything from the GA daily users per state table, which is our users-per-state table, and then it is joining it with the US state population table, where region equals state.

It’s just like a VLOOKUP: it sees that we had, say, 100 users from a region in this table, goes to the other table, looks for the state value that matches, and returns the population. Now if I run this query, it connects to both tables, this one and this one, joins them together, and returns a final dataset.
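A sketch of the join described here, with placeholder project and table names and assumed column names:

```sql
-- Join daily GA users per state with the state population table.
SELECT
  ga.date,
  ga.region,
  ga.users,
  pop.population
FROM `my-bigquery-demo.ga_data_live.daily_users_per_state` AS ga
JOIN `my-bigquery-demo.ga_data_live.us_state_population` AS pop
  ON ga.region = pop.state;
```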

Something else I wanted to show you: when you run a query, it shows you how much data it processed, here about 59 kilobytes. And remember, you have one terabyte per month for free, so feel free to play around and run queries on your own data; it is going to cost you almost nothing. So this is a new table. It is not the one we have in the Google Sheet.

It is not just the state population either; it is the two datasets joined together. We have dates, and for each date and each state, the population of that state and how many users we had from that state on that day. Now, the quickest way of connecting BigQuery data to Data Studio is, when you’re looking at a table, to click Explore data, then Explore with Data Studio.

Of course, you can also start from within Data Studio: create a data source, choose BigQuery as the connector, and then connect to any of the tables you have here, or even write your own query inside Data Studio. So you can define a custom query in Data Studio as well; maybe you have a large table in BigQuery and you just want the first thousand rows, for example. Now, it is going to load the contents of that

table inside Data Studio. This is not a report; it is an Explorer. I’m not going to add it to a report because the features are basically the same. We can see the data, we can visualize it, and we can, for example, create a bar chart of regions and the number of users. You can see how easy it is to visualize the data we have in BigQuery.

Connect to Google Data Studio and Visualize

In Google Data Studio, we can just connect and visualize. But remember, we wanted to create a calculation; we didn’t just want to join the data together.

So right now, I want to convert this to a table again. Then, instead of just the number of users, I’m going to keep users and also create a new field, a calculated field, which I’m going to call market penetration. (It might not be the exact definition of that metric.) This will be the sum of users divided by the population, and we can use functions here: SUM of users divided by... now, we cannot SUM the population, because it repeats for each day. If your state has 10 million people, then for each day we would be summing that 10 million again and again.

Simple Data Transformation to Calculate Users/State

If we’re reporting on a 10- or 15-day period, we would end up with something like 150 million people, which is not correct. But because every row has exactly the same population number, we can use another aggregation, like maximum, minimum, or average. I’m going to go with the average of the population.

And we want it to be a percentage. Okay, that’s it. So now we can see the state, the number of users, and the market penetration for that state. Some states are higher, some are lower, depending on their population and the traffic received from them. I think that is all; let me check my notes to see if I wanted to cover anything else.
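For reference, the calculated field in Data Studio is essentially SUM(users) / AVG(population). Expressed in SQL over the joined table, with placeholder names, the same calculation would look roughly like this:

```sql
-- Market penetration per state: total users divided by the (repeated) population.
SELECT
  region,
  SUM(users) / AVG(population) AS market_penetration
FROM `my-bigquery-demo.ga_data_live.users_per_state_population`
GROUP BY region;
```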

No, basically that’s it. You can save this as a report and share it within your organization or with your clients. The other thing I wanted to cover is that when you set up this connection between Google Analytics and the Google Sheet, you also need to set up a schedule. Here, under Schedule reports, you can schedule it to run every day at a certain hour to get new data into the sheet. And then there is also your Apps Script, which didn’t run here and which I closed.

You can also trigger that Apps Script every day, maybe an hour later than your data ingestion from Google Analytics. So, for example, at 5 a.m. every day you get new data from Google Analytics, at 6 a.m. you trigger the script to send the data to BigQuery, and by 7 a.m. everything is ready here. If you refresh your Data Studio report, the numbers will be updated to reflect the final number of users. And that’s basically it.
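For completeness, a time-driven trigger for the Apps Script can be created once from the script editor; a minimal sketch, assuming the sheetToBigQuery function from the earlier sketch:

```javascript
// Run once to schedule the load function roughly every day around 6 a.m.,
// after the GA add-on has refreshed the sheet.
function createDailyTrigger() {
  ScriptApp.newTrigger('sheetToBigQuery')
    .timeBased()
    .everyDays(1)
    .atHour(6)
    .create();
}
```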

Summary

Alright, so there you have it. BigQuery for marketers. Thanks so much for conducting this live training for us. If you want to find out more about what Ahmad is up to, we have his links in the description down below.

And if you want to take part in the next MeasureMaster live training, then definitely click on this button right here, which will take you to our landing page, and then you’ll be able to join us next time, ask your questions live, and be there.

I’m looking forward to it. That’s already it for this week’s video. If you want to stay up to date with what we do, maybe hit that subscribe button, and I’ll see you in the next video. My name is Julian; till next time.

About the author: Julian Juenemann

Julian started and grew venture-backed startups with his unique 'data first' approach to online marketing. He then founded MeasureSchool.com to teach marketers, like him, the data-driven way of digital marketing.

7 Comments
Marcell Kolos
2 months ago

Hi. Thanks for the video! How can I reveal the client ID of my GA sessions without using the GA360+BigQuery integration? Is it possible using only the Google Analytics Sheet Add-on?

Marcell Kolos
2 months ago

Thanks a lot, Julian!

Marcell Kolos
2 months ago

And what about the previous sessions’ Client IDs? The only way to get them is GA360+BigQuery integration? But at least it would do that, wouldn’t it?

Jack
2 months ago

Thanks a lot for this content Julian, as usual! Any chance you can post the app script that Ahmad used?