Last Modified on February 22, 2024
Google Analytics 4 (GA4) comes with many exciting features and new tech, but this can also mean learning new concepts and understanding them well enough to use them to your advantage.
With GA4’s predecessor, Universal Analytics (UA), being sunset on 1 July 2023, it’s even more important to know what to use, when, and how.
One of these new features is reporting identity, which affects how the data you collect is attributed to users in your reports, so understandably it also shapes how you analyze that data.
In this blog post, we will delve into the world of GA4 reporting identity, exploring its significance and impact on businesses. Here’s a quick snapshot of what we’ll learn today:
- What are GA4 Reporting Identities?
- Where Can You Find the Reporting Identities in GA4?
- Which Reporting Identity Should You Use?
- Limitations of Reporting Identities
We’ve got some exciting topics to cover, so let’s get going!
What are GA4 Reporting Identities?
As the name suggests, GA4 uses reporting identities to ‘identify’ the users in the reports, and these identities can affect what you see in the reports.
Google calls the underlying identification methods ‘identity spaces’, and the reporting identity you choose determines which of these spaces GA4 uses to identify your users.
Currently, there are four identity spaces in GA4. These are:
- User-ID
- Google Signals
- Device-ID
- Modeling
If you’ve used Google Analytics before, you might recognize some of these right away, but we’ll still look at each one.
Before we move on, why does Google need reporting identities?
In most cases, we don’t use just one device or platform when browsing the internet, which results in several sessions even though we’re still one user.
Reporting identity is how Google tries to stitch the journey together for the same user who uses different devices and platforms before they complete an action.
Note that we say ‘tries’: even so, some gaps are left unfilled and data can be lost for several reasons, which we won’t discuss here.
Moreover, since these identities are used in all the reports, they also help avoid counting the same user more than once across reports and sessions, painting a more realistic picture of your website users.
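To make the deduplication point concrete, here is a simplified sketch (not how GA4 actually stitches data) that counts distinct users in a handful of hypothetical sessions, once by device-ID alone and once preferring a user-ID when the visitor was logged in. The `SessionRecord` shape and the sample IDs are assumptions for illustration.

```typescript
// Hypothetical, simplified session records; GA4's real stitching is more involved.
interface SessionRecord {
  deviceId: string; // client ID (web) or app-instance ID (apps)
  userId?: string;  // your own ID, present only when the visitor is logged in
}

// Count distinct users, keyed by user-ID when allowed and present,
// otherwise by device-ID.
function countUsers(sessions: SessionRecord[], useUserId: boolean): number {
  const users = new Set<string>();
  for (const s of sessions) {
    const key =
      useUserId && s.userId !== undefined ? `user:${s.userId}` : `device:${s.deviceId}`;
    users.add(key);
  }
  return users.size;
}

const sessions: SessionRecord[] = [
  { deviceId: "laptop-1", userId: "acct_42" }, // person A on a laptop
  { deviceId: "phone-1", userId: "acct_42" },  // person A again, on a phone
  { deviceId: "laptop-1", userId: "acct_42" }, // person A, second laptop session
  { deviceId: "tablet-1" },                    // person B, never logged in
];

console.log(countUsers(sessions, false)); // 3: one "user" per device
console.log(countUsers(sessions, true));  // 2: person A's devices are stitched
```

The same person on two devices counts as two users when only device-IDs are available; a shared user-ID collapses them into one.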
Let’s have a brief look at what these identity spaces mean.
User-ID
User-IDs are generally assigned to logged-in users, and they are one of the most accurate ways of identifying and tracking users across devices and platforms, because the IDs are ones you assign yourself.
But these IDs must be tracked consistently across devices and platforms and sent to GA4 so that user journeys can be associated with them.
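As a minimal sketch of sending a user-ID, assuming the standard gtag.js snippet is on the page: the documented pattern is to pass `user_id` in the `config` command. Here `dataLayer` is a local array so the code runs outside a browser, `G-XXXXXXX` is a placeholder measurement ID, and `getSignedInUserId` is a hypothetical lookup into your own login system.

```typescript
// Stand-in for window.dataLayer, which the gtag.js snippet creates in a real page.
const dataLayer: unknown[] = [];

// Same shape as the standard gtag() helper: it pushes the raw `arguments`
// object (not an array) onto the dataLayer.
function gtag(..._args: unknown[]): void {
  dataLayer.push(arguments);
}

const MEASUREMENT_ID = "G-XXXXXXX"; // placeholder measurement ID

// Hypothetical: return the ID your own system assigns to a logged-in user,
// or undefined for anonymous visitors.
function getSignedInUserId(session: { accountId?: string }): string | undefined {
  return session.accountId;
}

function configureAnalytics(session: { accountId?: string }): void {
  gtag("js", new Date());
  const userId = getSignedInUserId(session);
  if (userId !== undefined) {
    // Send the same internal ID on every device/platform the user signs in on.
    gtag("config", MEASUREMENT_ID, { user_id: userId });
  } else {
    // Anonymous visitors are identified by device-level identifiers only.
    gtag("config", MEASUREMENT_ID);
  }
}

configureAnalytics({ accountId: "acct_8f3a2c" });
```

The key design point is consistency: the same internal ID should be sent from every device and platform the person signs in on, otherwise GA4 has nothing to stitch.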
Google Signals
If your users are signed into their Google accounts and have consented to share their information, Google Signals can track them across different devices.
Google Signals is designed to work with GA4 for better tracking and richer insights. It also enables cross-device remarketing and cross-device conversion export to Google Ads.
Device-ID
As the name suggests, this identity space is specific to the device. For websites, it comes from the client ID (a unique ID for each browser-device pair); for apps, it comes from the app-instance ID.
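If you want to see a client ID for yourself, the sketch below pulls it out of the `_ga` cookie. It assumes the commonly observed `GA1.<n>.<random>.<timestamp>` layout, which is a convention rather than a documented contract; in a browser, the supported route is `gtag('get', '<measurement ID>', 'client_id', callback)`. Both helper functions are hypothetical.

```typescript
// Hypothetical helper: read one cookie's value from a "k1=v1; k2=v2" string,
// such as document.cookie in a browser.
function readCookie(cookieHeader: string, name: string): string | null {
  for (const pair of cookieHeader.split(";")) {
    const [key, ...rest] = pair.trim().split("=");
    if (key === name) return rest.join("=");
  }
  return null;
}

// Hypothetical helper: extract the client ID from a _ga cookie value.
// Assumes the "GA1.<n>.<random>.<timestamp>" layout; the client ID is the
// last two dot-separated parts.
function clientIdFromGaCookie(cookieValue: string): string | null {
  const parts = cookieValue.split(".");
  if (parts.length < 4 || !parts[0]?.startsWith("GA")) return null;
  return parts.slice(-2).join(".");
}

const gaValue = readCookie("_ga=GA1.1.1234567890.1700000000; theme=dark", "_ga");
console.log(gaValue === null ? null : clientIdFromGaCookie(gaValue)); // 1234567890.1700000000
```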
Modeling
This is where Google uses machine learning to estimate the behavior of users who don’t accept analytics cookies. Modeling fills in these gaps using data from similar users who do accept them.
Let’s see how these spaces are used in reporting identities. There are three reporting identities available in GA4:
- Blended – Uses user-ID, Google Signals, device-ID, and modeling, in that order: GA4 uses the user-ID first; if that’s not available, it uses Google Signals (if enabled); if neither is available, it uses the device-ID; and if nothing is available, it falls back to modeling.
- Observed – Uses user-ID, Google Signals, and device-ID. As the name suggests, it relies only on observed identity spaces, and since modeling isn’t observed, it’s not included.
- Device-based – Uses only the device-ID and ignores user-IDs, Google Signals, and modeling, so users can’t be stitched across devices.
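The priority order above can be modeled as a simple fallback chain. This is a hypothetical sketch of the logic as described here, not Google’s implementation: real resolution happens per user across their whole history, and `signalsId` is a stand-in for whatever Google Signals uses internally.

```typescript
// Hypothetical model of the three reporting identities as fallback chains.
type ReportingIdentity = "blended" | "observed" | "deviceBased";
type IdentitySpace = "userId" | "googleSignals" | "deviceId" | "modeling";

interface UserIdentifiers {
  userId?: string;    // set by you for logged-in users
  signalsId?: string; // hypothetical stand-in for a Google Signals identifier
  deviceId?: string;  // client ID (web) or app-instance ID (apps)
}

// Priority order of identity spaces for each reporting identity.
const PRIORITY: Record<ReportingIdentity, IdentitySpace[]> = {
  blended: ["userId", "googleSignals", "deviceId", "modeling"],
  observed: ["userId", "googleSignals", "deviceId"],
  deviceBased: ["deviceId"],
};

// Returns the first identity space that can identify this user, or null.
function resolveIdentity(
  identity: ReportingIdentity,
  ids: UserIdentifiers,
  signalsEnabled: boolean,
): IdentitySpace | null {
  for (const space of PRIORITY[identity]) {
    if (space === "userId" && ids.userId) return space;
    if (space === "googleSignals" && signalsEnabled && ids.signalsId) return space;
    if (space === "deviceId" && ids.deviceId) return space;
    // Modeling has no identifier of its own: it is the last resort that
    // estimates behavior when none of the observed spaces applies.
    if (space === "modeling") return space;
  }
  return null;
}

const visitor: UserIdentifiers = { signalsId: "s-123", deviceId: "d-456" };
console.log(resolveIdentity("blended", visitor, true));     // googleSignals
console.log(resolveIdentity("blended", visitor, false));    // deviceId
console.log(resolveIdentity("deviceBased", visitor, true)); // deviceId
```

Modeling never produces an identifier in this sketch; it is simply what Blended falls back to when nothing was observed.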
To use the first two identities to their fullest, you should collect user-IDs and enable Google Signals. So, where do you find these reporting identities in GA4?
Where to Find the Reporting Identities in GA4?
In your Google Analytics account, click the Admin cog in the bottom left of the GA4 interface, then click Reporting Identity under the property settings column.
Once you do that, you’ll see the interface where you can choose how to identify your users. Device-based isn’t visible by default, though: you have to click the small ‘Show all’ link below the first two options.
We’re not sure why Google would hide it, but maybe that’s because they want us to use methods that involve Google signals and/or modeling.
Now, you can see that Device-based identity is available to choose from as well.
The small downward arrow on each of the three options shows more information about its identity methods, i.e., its identity spaces. The text before the arrow tells you whether any method is inactive; e.g., Blended may show ‘2 inactive methods’.
You can’t select specific methods when you click the downward arrow, though; it only shows information on how these methods evaluate identity.
But which one should you use then?
Which Reporting Identity Should You Use?
The answer comes down to whether your website has user-IDs set up, as they give more accurate user counts than the other methods.
If user-IDs are set up, you can go with either of the first two options, i.e., Blended or Observed.
Blended gives you the added benefit of modeling to fill reporting gaps, while Observed ignores modeling. Both, however, provide robust cross-platform and cross-device tracking of users.
Device-based identity can be used in instances where the user-IDs are not set up and/or the traffic volume is low because Google Signals can cause thresholding.
This also means your user counts will be less accurate in general, because Device-based identity isn’t as robust as the other two methods at stitching user journeys across platforms and devices: the same person on a phone and a laptop, for example, is counted as two users.
But what is thresholding? And if it’s a problem, why use Google Signals at all?
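The rule of thumb in this section can be written down as a small decision helper. It’s a hypothetical sketch that encodes only the criteria mentioned above (user-IDs, traffic volume, and whether you want modeling), not an official recommendation engine.

```typescript
// Hypothetical helper encoding this section's rule of thumb; not a GA4 API.
type ReportingIdentityChoice = "Blended" | "Observed" | "Device-based";

interface PropertySetup {
  hasUserIds: boolean;   // do you send user-IDs to GA4?
  lowTraffic: boolean;   // low volume makes Google Signals thresholding more likely
  wantModeling: boolean; // do you want modeled data to fill consent gaps?
}

function recommendIdentity(setup: PropertySetup): ReportingIdentityChoice {
  if (!setup.hasUserIds || setup.lowTraffic) {
    // No user-IDs to stitch with, or a thresholding risk from Google Signals.
    return "Device-based";
  }
  // Both remaining options use user-IDs first; they differ only in modeling.
  return setup.wantModeling ? "Blended" : "Observed";
}

console.log(recommendIdentity({ hasUserIds: true, lowTraffic: false, wantModeling: true })); // Blended
```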
Limitations of Reporting Identities
There are three major limitations when it comes to reporting identities.
Thresholding
Thresholding is applied when Google Signals is enabled and some of a report’s data could reveal potential user attributes, such as demographics and interests.
This is Google’s way of preventing you from inferring sensitive information about individual users, since that would go against privacy regulations.
So you miss out on some user data, and since data thresholds are system-defined, you can’t change them (at least not for now).
How do you know it’s been applied? You’ll see an orange triangle icon either at the top of the whole report or on the specific card it affects. If you click it, you’ll see another graph-like icon labeled ‘Thresholding applied’.
To deal with it, you can increase the date range so that there’s enough data that cannot be attributed to a small number of users, or remove the dimensions that are more related to demographics or user interests from the reports altogether.
A more reliable solution would be changing the reporting identity from Blended or Observed to Device-based, as it doesn’t use Google Signals.
So, thresholding is a limitation of any reporting identity that uses Google Signals. Switching between identities shouldn’t be an issue, as the change applies retroactively to your reports, but it does mean some back and forth in the admin settings.
Modeling
Modeling itself is not a limitation, but if your property doesn’t have enough data or doesn’t meet the other requirements, Google won’t be able to train a model to provide estimated user data.
This applies only to the Blended reporting identity. Even if you’ve selected Blended, if modeling isn’t available for your property, you won’t benefit from it, and Blended effectively behaves like Observed.
The other requirements Google has laid down for your property to use modeling are:
- Consent mode is activated on all pages of your website and/or all screens of your apps.
- For web pages, ensure that tags load before the consent dialogue appears and that Google tags load in all cases, regardless of user consent (advanced implementation).
- The property must collect at least 1,000 events per day with analytics_storage='denied' for at least 7 days.
- The property must have at least 1,000 daily users sending events with analytics_storage='granted' for at least 7 of the previous 28 days.
- Training the model successfully may require meeting the data threshold for more than 7 days within the aforementioned 28-day period, but even additional data may not suffice for Analytics to train the model.
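The two numeric prerequisites can be checked against your own daily counts with a sketch like the one below. It assumes you already have, per day, the number of events sent with `analytics_storage='denied'` and the number of users sending events with `analytics_storage='granted'`, and it applies both 7-day minimums to the same trailing 28-day window. Passing this check doesn’t guarantee that Google will train a model, as noted above.

```typescript
// Hypothetical per-day counts you'd pull from your own reports or exports.
interface DailyConsentStats {
  deniedEvents: number; // events sent with analytics_storage='denied'
  grantedUsers: number; // users sending events with analytics_storage='granted'
}

const MIN_DAILY_COUNT = 1000; // threshold for both events and users
const MIN_QUALIFYING_DAYS = 7;
const WINDOW_DAYS = 28;

// True if both numeric prerequisites hold within the trailing 28 days.
// Assumes `days` is ordered oldest to newest.
function meetsModelingPrerequisites(days: DailyConsentStats[]): boolean {
  const recent = days.slice(-WINDOW_DAYS);
  const deniedDays = recent.filter((d) => d.deniedEvents >= MIN_DAILY_COUNT).length;
  const grantedDays = recent.filter((d) => d.grantedUsers >= MIN_DAILY_COUNT).length;
  return deniedDays >= MIN_QUALIFYING_DAYS && grantedDays >= MIN_QUALIFYING_DAYS;
}

// Example: 8 qualifying days out of the last 28.
const sample: DailyConsentStats[] = Array.from({ length: 28 }, (_, i) => ({
  deniedEvents: i < 8 ? 1500 : 200,
  grantedUsers: i < 8 ? 1200 : 300,
}));
console.log(meetsModelingPrerequisites(sample)); // true
```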
Behavioral modeling starts from the date when a particular property becomes eligible. In the highly unlikely scenario where a property no longer satisfies the prerequisites for behavioral modeling, despite previously meeting them, estimated data will stop being available.
If the property subsequently fulfills the prerequisites again, estimated data will once again be accessible. The estimated data will only be available starting from the date when the property becomes eligible again.
So, just having the modeling option under the Blended reporting identity isn’t enough; there’s other work to do before you can use it. In practice, though, these requirements aren’t very difficult to meet.
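For reference, the consent mode part of that work looks roughly like this with gtag.js: a `default` command that denies storage before the consent dialogue appears, and an `update` command once the visitor accepts. This is a minimal sketch; `dataLayer` is a local stand-in for `window.dataLayer`, `G-XXXXXXX` is a placeholder, and `onAnalyticsConsentGranted` is a hypothetical hook you would wire to your consent banner.

```typescript
// Stand-in for window.dataLayer so the sketch runs outside a browser.
const dataLayer: unknown[] = [];
function gtag(..._args: unknown[]): void {
  dataLayer.push(arguments);
}

// 1. Before the consent dialogue appears: load the tag, but deny storage by default.
gtag("consent", "default", {
  analytics_storage: "denied",
  ad_storage: "denied",
});
gtag("js", new Date());
gtag("config", "G-XXXXXXX"); // placeholder measurement ID

// 2. Hypothetical banner callback, wired to your consent banner's "accept" button.
function onAnalyticsConsentGranted(): void {
  gtag("consent", "update", { analytics_storage: "granted" });
}
```

The order matters: the `default` command has to run before `config`, so that the tag starts in the denied state rather than briefly storing cookies.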
User-ID
As with modeling, the limitation isn’t user-IDs themselves but the fact that you have to set them up before you can use them.
Both Blended and Observed use user-IDs as their first-choice identity space. So, if user-IDs aren’t available and you don’t meet the requirements for modeling, you’re left with Google Signals and device-IDs.
Moreover, the user-IDs that you set up should not contain any personally identifiable information (PII) like name, phone number, address, etc.
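A cheap safeguard is to reject obviously PII-shaped values before they ever reach GA4. The sketch below is a hypothetical heuristic that only catches email addresses and common phone-number formats; it won’t catch names or addresses, so treat it as a tripwire rather than a compliance check. Using an opaque internal ID, such as an account key from your own database, avoids the problem in the first place.

```typescript
// Hypothetical pre-flight check: flags user-ID candidates that look like an
// email address or a phone number. Not exhaustive: names and street
// addresses are not detected.
const PII_PATTERNS: RegExp[] = [
  /[^\s@]+@[^\s@]+\.[^\s@]+/,        // something@domain.tld
  /\+\d[\d\s().-]{6,}\d/,            // international format, e.g. +44 20 7946 0958
  /\b\d{3}[\s.-]\d{3}[\s.-]\d{4}\b/, // 555-123-4567 style
];

function looksLikePii(candidate: string): boolean {
  return PII_PATTERNS.some((pattern) => pattern.test(candidate));
}

// Throws instead of returning a PII-like value; the value itself is not
// echoed into the error message, so it doesn't leak into logs.
function assertSafeUserId(candidate: string): string {
  if (looksLikePii(candidate)) {
    throw new Error("Refusing to send a user-ID that looks like PII");
  }
  return candidate;
}

console.log(assertSafeUserId("acct_8f3a2c")); // acct_8f3a2c
```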
You also need to comply with Google’s terms of service, which require you to inform your website visitors about how you use identifiers to track them.
Apart from these three limitations, even when you enable Google Signals to get demographic data, there are often a large number of ‘unknown’ values in the age and gender data.
For instance, the Demographics report in the Google Merchandise Store demo account shows a total of 61,167 users, of whom 46,224 have an ‘unknown’ gender.
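To put those numbers in proportion, the arithmetic below uses only the two figures quoted from that report: 46,224 of 61,167 users is roughly three in four. The `unknownShare` helper is a hypothetical name for illustration.

```typescript
// Share of users whose gender is reported as "unknown".
function unknownShare(totalUsers: number, unknownUsers: number): number {
  if (totalUsers <= 0) throw new Error("totalUsers must be positive");
  return unknownUsers / totalUsers;
}

const total = 61167;         // total users in the report
const unknownGender = 46224; // users with an 'unknown' gender
const share = unknownShare(total, unknownGender);

console.log(`${(share * 100).toFixed(1)}% unknown`);              // 75.6% unknown
console.log(`${total - unknownGender} users with a known gender`); // 14943 users with a known gender
```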
This is not specific to any one account; it’s normal to see these ‘unknown’ values. There are several reasons for this, such as people opting out of ads personalization, not being logged into their Google accounts, or not having a Google account at all.
So, if you’re looking for accuracy, you’ll be disappointed. This data is good enough to give you an idea of trends and patterns, but not of absolute numbers.
All in all, the limitations of reporting identities can be worked around with some effort, so you can choose the identity that fits your setup and make the most of it.
The most common one that you will face will be thresholding, and as you read above, there are several ways to deal with it.
Which reporting identity should I use?
It depends on whether your website has user-IDs set up or not. If user-IDs are set up, you can choose either the Blended or Observed reporting identities. If user-IDs are not set up or the traffic volume is low, you can use the Device-based reporting identity.
How can I deal with thresholding in reporting identities?
To deal with thresholding, you can increase the date range to ensure there’s enough data that cannot be attributed to a small number of users. Alternatively, you can switch to the Device-based reporting identity, which doesn’t use Google Signals.
How accurate are the demographic data obtained through Google Signals?
The demographic data obtained through Google Signals may have a significant number of ‘unknowns’ for age and gender. This is normal and can occur due to users opting out of ads personalization, not being logged into their Google accounts, or not having a Google account at all.
GA4 reporting identities determine how users are identified in your reports, and learning about them helps you know which one to use, when, and how it affects your data.
We also learned about different identity spaces, each one of them playing its part in providing user data and filling in the gaps.
Obviously, these identities are not without their limitations, but we covered those and now you know how you can try to circumvent them.
We also learned that one of the most important and accurate ways to track user journeys is with user-IDs, and from a privacy point of view they’re pretty safe as well, as long as they don’t contain any PII.
Sofiia’s handy guide on How to Configure User ID in Google Analytics 4 will help you get started with user-IDs.
What’s important is using the reporting identity that suits your situation and provides you with quality data.
So, what reporting identity do you prefer to use and why? How do you overcome their limitations? Let our readers know all about it in the comments below!