Last Modified on July 5, 2024
Digital analytics has come a long way in making it easier to collect and analyze data and get valuable insights from it.
Regular expressions, aka RegEx, RegExp, or regex, have played an important role in that.
Anyone who has worked with Google Tag Manager and/or Universal Analytics has come across them. While some of us are good with them, others had only a basic understanding and made things work. But hey! As long as it works, right?!
Anyway, RegEx is a prime example of how you can use something small to do more. With Google Analytics 4, things have changed, but RegEx is still there, and it can be a big help if we know when and where to use it.
Here’s what we are going to cover in this GA4 RegEx tutorial:
- What is RegEx and Why Use it?
- What are RegEx Match Types?
- Where Can You Use RegEx in GA4?
- Common RegEx Characters
Time to RegEx!
What is RegEx and Why Use it?
RegEx is a set of characters used together to identify patterns in text, i.e., a string. It was in use long before Google Analytics existed.
It is also widely used in programming languages to manipulate text based on matching criteria.
RegEx can be super helpful in many cases. In short, the objective is to match a certain pattern and return all the values in the text that match the criteria.
The output of RegEx also depends on what type of match you’re looking for, i.e., should the value contain the pattern, match it exactly, and so on (generally what we see in the filtering conditions).
This saves us from finding each item individually, especially when the values are dynamic and we are not sure exactly what they will be.
Let’s have a brief look at what RegEx can be used for:
- Text Extraction – It’s useful to extract specific information from a larger text body or multiple lines of text data when there’s a match with the pattern we are looking for.
- Data Validation – It’s commonly used to validate user input in fields. You can define the patterns the text data should match for it to be accepted, e.g., passwords or email addresses.
- Text Manipulation – You can use RegEx to edit, replace, or remove any portion of the text, which can be helpful when quickly cleaning or formatting data.
- Tokenization – It’s also used to break text into smaller pieces, aka tokens, based on specific criteria, which is primarily used in text analysis and natural language processing.
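To make those four uses a bit more concrete, here is a minimal Python sketch. Python’s built-in re module is only a stand-in here (GA4 doesn’t run Python), and the email pattern, sample text, and page path are made-up illustrations, not production-grade rules.

```python
import re

# Illustrative email pattern; real-world email validation is far more involved.
EMAIL = r"[\w.+-]+@[\w-]+\.[\w.]+"

# Text extraction: pull every email-like string out of a larger text.
text = "Contact sales@example.com or support@example.org for help."
emails = re.findall(EMAIL, text)

# Data validation: accept the input only if the WHOLE string fits the pattern.
def is_valid_email(value: str) -> bool:
    return re.fullmatch(EMAIL, value) is not None

# Text manipulation: remove a query string from a page path.
clean_path = re.sub(r"\?.*$", "", "/shop/apparel?utm_source=newsletter")

# Tokenization: break a sentence into word tokens.
tokens = re.findall(r"\w+", "Time to RegEx!")

print(emails)                              # ['sales@example.com', 'support@example.org']
print(is_valid_email("user@example.com"))  # True
print(is_valid_email("user@example"))      # False
print(clean_path)                          # /shop/apparel
print(tokens)                              # ['Time', 'to', 'RegEx']
```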
Out of all these, text data extraction is the most common reason we use RegEx in GA4, even though, as you can see, RegEx has a much wider range of uses.
What are RegEx Match Types?
Match types are the filtering conditions we’ve seen in UA and GA4 even when not using RegEx. The match type affects the results we see and is easily overlooked, but it plays an important part.
Let’s do a quick recap of these RegEx match types:
- Matches RegEx (and does not match RegEx) – This condition requires the entire value to match the RegEx pattern exactly (or not match it, if you choose the ‘does not’ condition).
For instance, in the Google Merchandise Store’s GA4 demo account, let’s say we want to see all the page path values that contain /Google\+Redesign (the backslash escapes the plus sign so it is treated as a literal character). With this match type, nothing will match, since no value is exactly /Google\+Redesign. But if we enter /Google\+Redesign/Apparel, it will match, as that exact value is present.
- Matches partial RegEx (and does not match partial RegEx) – This match type is probably used more often, since it returns all the values that ‘contain’ a word or pattern.
So, if we now use /Google\+Redesign for the page path and screen class dimension, it will show all the pages that contain that value.
🚨Note: Filter match types are case-sensitive, so if you use a different case than what’s in the values, it won’t match unless you account for case variations.
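Here is a rough sketch of how the two match types differ, using Python’s re.fullmatch as a stand-in for matches RegEx (the whole value must fit) and re.search as a stand-in for matches partial RegEx (the pattern can appear anywhere). The page path values are hypothetical, and the ignore_case switch is our own addition to show how case sensitivity changes the result.

```python
import re

# Hypothetical page path values, loosely based on the example above.
page_paths = [
    "/Google+Redesign/Apparel",
    "/Google+Redesign/Apparel/Mens",
    "/google+redesign/apparel",
    "/basket.html",
]

def matches_regex(value: str, pattern: str, ignore_case: bool = False) -> bool:
    """Stand-in for 'matches RegEx': the pattern must cover the entire value."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, value, flags) is not None

def matches_partial_regex(value: str, pattern: str, ignore_case: bool = False) -> bool:
    """Stand-in for 'matches partial RegEx': the pattern may appear anywhere."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None

short = r"/Google\+Redesign"
full_path = r"/Google\+Redesign/Apparel"

print([p for p in page_paths if matches_regex(p, short)])
# []
print([p for p in page_paths if matches_regex(p, full_path)])
# ['/Google+Redesign/Apparel']
print([p for p in page_paths if matches_partial_regex(p, short)])
# ['/Google+Redesign/Apparel', '/Google+Redesign/Apparel/Mens']
print([p for p in page_paths if matches_partial_regex(p, short, ignore_case=True)])
# ['/Google+Redesign/Apparel', '/Google+Redesign/Apparel/Mens', '/google+redesign/apparel']
```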
Where Can You Use RegEx in GA4?
You can use RegEx in the GA4 interface in several places for different reasons. What are they?
- Standard Reports
- Explorations
- Segments and Audiences
- Internal Traffic and Unwanted Referrals
- Create or Modify Events
- Custom Channel Groups
Let’s analyze them one by one.
Standard Reports
Standard reports, aka detailed reports, are the ones available by default in your GA4 account. There are generally three places where you can filter them:
- Comparisons
- Report filter
- Table filter (right below the visualization)
Of these three, we can currently use RegEx only in the Report filter, though it would be handy to have it for comparisons and the table filter as well.
To see it in action, go to Reports → Acquisition → Traffic Acquisition → Click on the ‘Add a filter’ button at the top.
You can choose any other report as well, as long as you see the ‘Add a filter’ option at the top. Note that it is not available for Overview, Conversions, In-app purchases, and Publisher ads reports.
The Build filter interface that opens on the left side is where we can use RegEx. Let’s say we want to see the traffic performance for Organic and Email channels.
Select Session default channel group as the dimension and choose the match type: matches regex (for an exact match) or matches partial regex (as long as the word is present).
We’ll go for the partial RegEx match, because no channel group is named just Organic. In the value, we’ll enter Organic|Email (the pipe in between acts as ‘or’) and click the blue Apply button.
Now we can see the results for all the available Organic and Email channels, while the others are excluded.
What if you don’t want to see these channels, but all the others? Repeat the whole process, but select does not match partial regex as your match type.
Now, these three channel groups are excluded from the results.
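The same idea applied to the channel filter above, as a small Python sketch. The list of channel group values is hypothetical, and re.search / re.fullmatch stand in for matches partial regex and matches regex. It also shows why the partial option was the right choice: with a full match, only Email survives, because no value is exactly Organic.

```python
import re

# Hypothetical Session default channel group values.
channel_groups = ["Direct", "Organic Search", "Organic Social", "Email", "Referral", "Paid Search"]
pattern = r"Organic|Email"  # the pipe acts as 'or'

# matches partial regex: keep values where either word appears anywhere.
included = [c for c in channel_groups if re.search(pattern, c)]

# does not match partial regex: keep everything else.
excluded = [c for c in channel_groups if not re.search(pattern, c)]

# matches regex (full match): the whole value must be "Organic" or "Email".
full_only = [c for c in channel_groups if re.fullmatch(pattern, c)]

print(included)   # ['Organic Search', 'Organic Social', 'Email']
print(excluded)   # ['Direct', 'Referral', 'Paid Search']
print(full_only)  # ['Email']
```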
It would’ve been quite helpful if we could do it directly in the table filter, but for now, let’s use what we have and see how we can use RegEx in GA4 explorations.
Explorations
Click on Explore in the sidebar nav and open any exploration from the template gallery or one you’re already working on.
In the Exploration, under the Settings column, scroll down and you will find an option to apply a filter.
Let’s say that this time we want to filter the Source/Medium dimension. You can either drag and drop it or click + Drop or select dimension or metric, and then select the conditions.
As you can see, there’s only the matches regex option (and does not match regex), but no matches partial regex, which is a bummer! If we want to filter Source/Medium for all values that contain ‘organic’, it won’t work, because no value is exactly ‘organic’.
So what can we do? We have to list every value we want to filter exactly as it appears under the Source/Medium dimension, including any spaces, or it won’t match.
So, to get all the organic source/medium categories, we used bing / organic|google / organic|baidu / organic.
The problem with this match type is that we can easily miss values, and not knowing every possible value is often the very reason we want to use RegEx. For instance, we didn’t include yahoo / organic because we didn’t see it.
So, in this case, RegEx isn’t as helpful. We hope that Google will provide a matches partial regex option here in the future. However, there’s a way around it that you can find in the Common RegEx Characters section.
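To see the gap, and a preview of the .*organic.* workaround from the Common RegEx Characters section, here is a Python sketch that treats the exploration filter as a full match (re.fullmatch). The Source/Medium values are hypothetical.

```python
import re

# Hypothetical Source/Medium values as they might appear in an exploration.
source_medium = [
    "google / organic",
    "bing / organic",
    "baidu / organic",
    "yahoo / organic",
    "google / cpc",
    "(direct) / (none)",
]

# Listing every value we happened to see: anything unlisted is silently missed.
listed = r"bing / organic|google / organic|baidu / organic"

# Wildcards on both sides turn the exact match into a 'contains' match.
wildcard = r".*organic.*"

listed_hits = [v for v in source_medium if re.fullmatch(listed, v)]
wildcard_hits = [v for v in source_medium if re.fullmatch(wildcard, v)]

print(listed_hits)
# ['google / organic', 'bing / organic', 'baidu / organic']
print(wildcard_hits)
# ['google / organic', 'bing / organic', 'baidu / organic', 'yahoo / organic']
```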
Segments and Audiences
Segments can only be applied in explorations, so let’s use the same exploration and create a new segment under the Variables column.
Once in the segment setup interface, we can select the conditions. For instance, here we want to see data for events where the device category is mobile or desktop.
Once again, only matches regex is available, so we have to enter the exact values (mobile|desktop), which is simpler in this case.
Once we save and apply the segment, we will see it in the results. To verify, you can add the device category as a dimension as well.
As for audiences, when creating a segment, we have the option of building an audience from it as well. If you open the segment we just created, you can see the Build an audience checkbox on the right.
With the Build an Audience option, you can also add sequences that are otherwise only available for the user segments.
Using RegEx here also means that we carry over the limitations of the matches regex (exact match) type.
Internal Traffic and Unwanted Referrals
Another place where we can use RegEx is when we want to define internal traffic and list unwanted referrals, both found under Admin → Data Streams → Configure tag settings.
We can use the IP address matches regular expression match type to define one pattern that covers multiple IP addresses.
For example, with a value of 90.204., what we are saying is that if the IP address matches ‘90.204.’ and the rest of the value can be anything, it should be filtered as internal traffic.
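Since an unescaped dot matches any character and nothing pins the pattern to the start of the value, a stricter way to write this rule is ^90\.204\. Here is a quick Python sketch comparing the two. It assumes, as the description above implies, that the IP rule behaves like a partial match (re.search here), and the IP addresses are made up.

```python
import re

# Made-up IP addresses for illustration.
ips = ["90.204.12.7", "90.204.255.1", "190.204.1.1", "10.90.204.1", "91.204.1.1"]

loose = r"90.204."       # dots match ANY character, and the match can start anywhere
strict = r"^90\.204\."   # literal dots, and the value must start with 90.204.

loose_hits = [ip for ip in ips if re.search(loose, ip)]
strict_hits = [ip for ip in ips if re.search(strict, ip)]

print(loose_hits)   # ['90.204.12.7', '90.204.255.1', '190.204.1.1', '10.90.204.1']
print(strict_hits)  # ['90.204.12.7', '90.204.255.1']
```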
When it comes to unwanted referrals, RegEx can help us exclude multiple domains and any variations in one go.
This is especially helpful when we want to exclude third-party payment processors or want to send users to a different domain to change their login details.
Once you click on the List unwanted referrals option, you will find the setup interface where you can choose the match type – which will be Referral domain matches RegEx.
So, the stripe|paypal\.com RegEx will exclude referrals from Stripe and PayPal. It doesn’t make sense to see them as referrers, since they are part of running your business, so they should be listed as unwanted referrals.
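One detail worth knowing: the pipe splits the whole expression, so stripe|paypal\.com reads as ‘stripe’ OR ‘paypal.com’, not ‘stripe.com’ OR ‘paypal.com’. This Python sketch shows what that means in practice. The referral domains are made up, and re.search standing in for a partial match is our assumption about how this rule is evaluated.

```python
import re

# Made-up referral domains for illustration.
domains = ["stripe.com", "checkout.stripe.com", "paypal.com", "www.paypal.com", "paypal.me", "example.com"]

# The pipe splits the whole expression: 'stripe' OR 'paypal.com'.
pattern = r"stripe|paypal\.com"
# Dropping the \.com makes the second alternative as broad as the first.
broader = r"stripe|paypal"

pattern_hits = [d for d in domains if re.search(pattern, d)]
broader_hits = [d for d in domains if re.search(broader, d)]

print(pattern_hits)
# ['stripe.com', 'checkout.stripe.com', 'paypal.com', 'www.paypal.com']
print(broader_hits)
# ['stripe.com', 'checkout.stripe.com', 'paypal.com', 'www.paypal.com', 'paypal.me']
```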
Create or Modify Events
You can create new events in GA4 based on the events you are already receiving, as well as modify existing ones, and RegEx can make your life easier in both cases. Go to Admin → click on Events under the property settings.
We will go with the Create event option here, as RegEx works the same for both, with two match types: matches regular expression and matches regular expression (ignore case). The ‘does not’ variations exist as well, but once again, there is no partial match option.
Let’s say we want to fire an event called measuremasters_visit, whenever someone visits the MeasureMasters page on our website.
We can do this by creating an event from the existing page_view event using the RegEx of https://measureschool\.com\/measure-masters\/.
Note that we need to include the ‘https://’ part, since the value in this field has to match the pattern exactly, starting from the very beginning.
The warning Google highlights here is a good reminder that you don’t have to use RegEx if it isn’t necessary. That might be the case in our example, but it’s just to show where you can use it.
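To see why the https:// part matters, here is a Python sketch that treats the condition as a full match (re.fullmatch), which is our reading of the exact-match behavior described above. The URLs are hypothetical, and re.IGNORECASE stands in for the ‘(ignore case)’ variant.

```python
import re

# \/ is an unnecessary but harmless escape of "/" in Python's re module.
pattern = r"https://measureschool\.com\/measure-masters\/"

# Hypothetical page URLs.
urls = [
    "https://measureschool.com/measure-masters/",
    "https://measureschool.com/measure-masters/?utm_source=newsletter",
    "https://MeasureSchool.com/measure-masters/",
    "measureschool.com/measure-masters/",
]

# 'matches regular expression': the whole URL must fit, case-sensitively.
exact = [u for u in urls if re.fullmatch(pattern, u)]

# 'matches regular expression (ignore case)'.
ignore_case = [u for u in urls if re.fullmatch(pattern, u, re.IGNORECASE)]

print(exact)
# ['https://measureschool.com/measure-masters/']
print(ignore_case)
# ['https://measureschool.com/measure-masters/', 'https://MeasureSchool.com/measure-masters/']
```

The URL with a query string doesn’t match either version; appending .* to the pattern would allow anything after the trailing slash.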
Custom Channel Groups
The last place in our list for now is when you create custom channel groups. Hopefully, Google will be adding RegEx to more places in GA4 in the future.
Let’s go to Admin → Data Settings → Channel Groups.
Google Analytics’s predefined channel group cannot be edited or deleted, which is a good thing so that no one messes with the traffic that’s coming in.
Clicking the Create new channel group button opens the interface where you can add a new channel, edit an existing one, copy one to create a new channel from it, or delete the existing ones, which come over from the default channel group.
As you can see, you can’t use RegEx on this screen itself, only when you’re creating a new channel. This time, we have the partially matches regex option as well, which is quite important for channel groups.
Let’s say you want to create a new channel for QR codes, where the medium contains qr or code in any variation.
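A small Python sketch of the QR code condition, using re.search as a stand-in for partially matches regex. The medium values are hypothetical. Since we can’t be sure every interface offers an ignore-case option, the character-class version spells out both cases for each letter instead.

```python
import re

# Hypothetical medium values.
mediums = ["qr", "QR", "qr_code", "QRCode", "print-code", "email", "cpc"]

# Case-sensitive: misses upper-case variants such as "QR" and "QRCode".
case_sensitive = [m for m in mediums if re.search(r"qr|code", m)]

# Character classes accept either case for each letter.
any_case = [m for m in mediums if re.search(r"[Qq][Rr]|[Cc][Oo][Dd][Ee]", m)]

print(case_sensitive)  # ['qr', 'qr_code', 'print-code']
print(any_case)        # ['qr', 'QR', 'qr_code', 'QRCode', 'print-code']
```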
Now we will see another channel group after we have saved it, and with that, we have covered all the areas where we can use RegEx in GA4.
While mastering RegEx can take time and practice, knowing about some common characters and their usage can still prove helpful.
Common RegEx Characters
The below list is not exhaustive, but these characters are commonly used in GA4:
- Pipe | – Acts as an OR between two or more values, e.g., mobile|desktop. Don’t leave a pipe dangling at the end of an expression (e.g., mobile|desktop|), because the empty alternative after it matches an empty string, so a partial match would then match every value.
- Dot asterisk .* – Together, these two act as a wildcard: the dot matches any single character, and the asterisk repeats it zero or more times. For example, organic.* matches anything that starts with organic, whatever comes after it. Remember how we talked about GA4’s matches RegEx acting as an exact match? This can help us deal with that. If we also use it at the beginning, it means anything before or after that word, so .*organic.* will match all values where the word organic is present.
- Backslash \ – Also known as the escape character, it tells RegEx to treat the next character literally rather than as part of the RegEx syntax, e.g., www\.measureschool\.com. Here, the backslash makes each dot a literal dot instead of ‘any character’.
- Dollar Sign $ – Signifies the end of the string, i.e., the pattern must occur at the end of the value. For example, if you only want PDF files to match, you can use \.pdf$.
- Caret ^ – This is the opposite of the dollar sign: the pattern must match at the beginning of the value. E.g., to filter all IP addresses that begin with 192.0, we can use ^192\.0\., which would match strings like “192.0.2.1”, “192.0.100.10”, and “192.0.2.0/24”.
- Question mark ? – Matches the preceding character 0 or 1 times, meaning it is optional. For instance, the expression ^https? means the URL must start with http, followed by an optional s, so both http and https match.
When placed after another quantifier (e.g., .*?), it also turns a greedy match into a lazy one, but now we are sliding into a lot more RegEx than is needed.
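Here is a compact Python sketch that exercises each character above. The full() and partial() helpers are our own stand-ins for matches regex and matches partial regex, and the sample values are made up. Python’s re module should behave the same for these basic characters, but it is not GA4’s engine, so test anything more advanced with a tool like RegEx101.

```python
import re

def full(pattern: str, value: str) -> bool:
    """Stand-in for 'matches regex': the pattern must cover the whole value."""
    return re.fullmatch(pattern, value) is not None

def partial(pattern: str, value: str) -> bool:
    """Stand-in for 'matches partial regex': the pattern may appear anywhere."""
    return re.search(pattern, value) is not None

# Pipe |: either alternative.
print(full(r"mobile|desktop", "desktop"), full(r"mobile|desktop", "tablet"))  # True False

# Dot asterisk .*: anything (or nothing) before and/or after a word.
print(full(r"organic.*", "google / organic"))    # False (doesn't start with organic)
print(full(r".*organic.*", "google / organic"))  # True

# Backslash \: an escaped dot is literal; an unescaped dot matches any character.
print(full(r"www\.measureschool\.com", "wwwxmeasureschool.com"))  # False
print(full(r"www.measureschool.com", "wwwxmeasureschool.com"))    # True

# Dollar sign $: the pattern must sit at the end of the value.
print(partial(r"\.pdf$", "/downloads/guide.pdf"), partial(r"\.pdf$", "/downloads/guide.pdf?v=2"))  # True False

# Caret ^: the pattern must sit at the start of the value.
print(partial(r"^192\.0\.", "192.0.2.1"), partial(r"^192\.0\.", "10.192.0.1"))  # True False

# Question mark ?: the preceding character is optional.
print(partial(r"^https?", "http://example.com"), partial(r"^https?", "ftp://example.com"))  # True False

# ? after a quantifier: greedy vs. lazy.
print(re.search(r"/.*/", "/shop/apparel/mens").group())   # /shop/apparel/
print(re.search(r"/.*?/", "/shop/apparel/mens").group())  # /shop/
```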
You might not be able to master RegEx yet, but you can certainly start using it at a basic level. Handy websites like RegEx101 are also a great way to test whether your expression matches what you want.
Summary
We’ve learned a lot in today’s post from understanding what RegEx is and why we should use it to where we can use it in GA4, which will hopefully expand in the future.
We also learned some really useful RegEx characters that we can quickly start using in GA4, especially the pipe, as we are often looking for two or more things to match.
RegEx can be quite powerful to get more done with less, but at the same time, it can get complex. If you can use other match types like contains, begins with, etc., there’s nothing wrong with doing so.
But practicing is the only thing that will help you get a grip on it, along with understanding its nuances and using resources like RegEx101 to ensure you’re on the right path.
Even ChatGPT can come in quite handy when it comes to RegEx. Check out this guide on How to Use ChatGPT in Digital Marketing to find out how.
Let us know in the comments below how you use RegEx in GA4 for your day-to-day analysis!