The ability to filter data by user is critical whenever sensitive information is shared, whether internally to an organization or even more so externally. This is often known as “row level security” (RLS) in a number of relational databases and business intelligence tools.
The idea is that each row in a data table is evaluated against rules stated somewhere else in that database (or application) to check which parts of that row, if any, can be returned to any given user. Because managing by individual users is bound to collapse over the long run, I like to think of RLS as also meaning “role level security”, because it’s less about any specific people, and more about what function they have in their organization and thus what data they should be granted access to.
However, that phrasing is not universal, which can make it tedious to compare how different BI products implement this, if at all. RLS is often filed under “data level security” since it’s embedded, or at least closely associated, with the source data itself.
Object level security goes beyond the confines of rows to include tables and columns. It used to be available only in enterprise products but is making its way in the self-service BI space. Meanwhile data masking is an adjacent concept, where data is obfuscated rather than completely blocked.
This entry gives an overview of how some of the best-known players approach data-based security. It’s impossible to be exhaustive as BI is a very busy space, but this is probably as good a review as you’ll find anywhere online. The Power BI section is very thorough and up-to-date with the latest developments as of early 2023.
1. Core Concepts & How They Work in Microsoft Power BI
I’ll start with Power BI because it’s the tool I’m most familiar with, and also arguably the BI platform with the most momentum in the last couple of years. It’s also better documented and supported by a much broader community than most of its competitors. I should still mention that Tableau has been renowned for its community for a long time. Frankly, some other long-established platforms barely have a pulse.
This section is going to be the most detailed by far. Since it lays out core concepts that will apply to a large extent with many other tools, I advise you to at least skim it regardless of what you actually use.
1.1. Create and Modify Roles in the Desktop, Assign Users to Roles in the Service, Test in the Desktop and Service
Row level security was added to Power BI in June 2016, barely a year into the platform’s life, and roles and rulesets could initially be defined in the service. This was however deprecated years ago in favor of Power BI Desktop, the tool of choice for Power BI modeling. Whereas many other platforms use SQL for this purpose, in Power BI the DAX language is king, and that’s how you define table filtering expressions by role.
Power BI Desktop also conveniently lets you impersonate these roles to verify in situ that they’ll get to see what they’re supposed to see, and only that. With many other platforms you need to use a separate browser (or incognito session) to perform such tests. In practice this only works with the simplest RLS implementations as we’ll see in a minute.
Defining roles user by user within the Power BI Desktop UI is sometimes called “static RLS”, as you hardcode expressions that you then need to update manually. With its February 2023 update, Power BI Desktop added a GUI to generate rules that looks similar to how filtering works in Power Query as well as the online RLS editor for datamarts. It’s a simpler way to generate static rules without having to write DAX, but its lack of support for dynamic rules limits its usefulness.
Indeed, an alternative method called “dynamic RLS” is much easier to maintain by adding a user table to the data model and/or role table, to filter users against their userprincipalname(), username(), or roles you assign to them. With Dynamic RLS you want to limit how many roles users need to be added to, hopefully down to a single role. Either way, make sure not to leave any of your intended users (who are only workspace viewers) without roles or they won’t be able to see any data at all. Yes, that means you’ll create a role without any table filter expression to convey that users belonging to that role can see everything.
As a side note, it is possible to implement RLS with guest users using Azure Active Directory B2B, though the whole thing with [email protected] vs name_guestdomain.com#EXTfirstname.lastname@example.org user principal names is very confusing and poorly documented.
Power BI Desktop is focused on encoding RLS business logic in DAX, but the administration of users and the groups they belong to cannot be done with it. Instead, that part belongs to the service – i.e. the app.powerbi.com web application – and is accessible by dataset owners. From there you can also view reports as any of the predefined roles.
If you create roles in the service first, you’ll have to recreate them in the Desktop. The whole thing is clunky to say the least.
1.2. Map Roles to Groups or People, But Really, to Groups
Unless you’re in a tiny organization with no staff turnover, I highly recommend mapping roles to groups rather than individual users, in order to minimize administration overhead as people come and go. In this context “group” means Azure Active Directory security groups and Active Directory distribution groups (aka distribution lists), not O365 groups (sorry, “Microsoft 365 groups“). The difference between AAD security groups and O/M365 groups is explained by Microsoft here, with more details relevant to Power BI conveniently summarized by Gilbert Quevauvilliers here.
Even better, instead of manually assigning people to their respective AAD security groups, you can use dynamic membership rules that will automatically pick up changes in user properties. Add or remove someone to/from a department? Boom, they’re automatically added/removed to/from the corresponding AAD security group(s), which then trickles down to RLS rules.
To avoid having to input data in two places, you could retrieve AAD group members via the Graph API as the authoritative source for your groups/roles. However, Microsoft recommends that you do so using Microsoft Graph Data Connect (via Azure Data Factory) rather than Power Query, for a variety of reasons.
1.3. Data Masking & Object Level Security for Confidentiality Beyond RLS
Bear in mind that RLS is not applied to workspace members that have edit permissions, which makes sense because having these permissions means they’re allowed to be dataset authors. And Power BI dataset authors can see all the data as exposed in the data sources they work with.
If you don’t want your BI developers to see all the source data, they need to be provided obfuscated sources that are otherwise structured exactly like the production sources. Microsoft SQL Server does that with dynamic data masking, as shown in the video below. If you’ve been left wanting in the past at the lack of granular control in DDM rules, it’s worth revisiting with SQL Server 2022:
“Starting with SQL Server 2022 (16.x), you can prevent unauthorized access to sensitive data and gain control by masking it to an unauthorized user at different levels of the database. You can grant or revoke UNMASK permission at the database-level, schema-level, table-level or at the column-level to a user or database role.”Documentation
I’ve also worked with clients who provided me with CSVs structured like the real data but filled with bogus numbers, that they later substituted with their production databases. In this case I use Power Query parameters to make the swap easy for them.
The core principle behind Power BI’s RLS is that it’s by definition all about filtering table rows, not model objects such as entire tables or columns, and definitely not measures. Analysis Services does support table and column-level security (CLS), and object-level security (OLS) was added to Power BI (Premium) in 2021, meeting clear demand for it. Here is the public preview announcement.
Object-level security is a more secure pattern as it not only hides the underlying data, but also the object’s names so that the existence of these objects is not revealed at all. With a bit of creativity you can use OLS to hide measures.
This entry shows how you can restrict access to either an entire table or to a column using Tabular Editor as an external tool, while Chris Webb explains a more user-friendly setup that combines OLS, RLS, and field parameters to prevent charts from breaking when they include a forbidden column.
Note that the combination of RLS and OLS has to be done in the same role as a user can’t belong to both an RLS and an OLS role (video).
If you’re willing to roll your own DAX, you can restrict access to measures by using a measure table (with a column mapping to roles) and dynamic measures. This is a very flexible way to set up measures (RLS or not) but be aware this technique can have an impact on the performance of larger models.
By tweaking your model you can also emulate column-level security (CLS) as explained by Greg Deckler here.
Another consequence of filtering table rows is that these filters affect top-level aggregates too. Say your sales rep for New England should also be able to see the company’s total revenue, but not details at lower granularities. For this purpose, you’ll want to generate either summary or aggregation tables, or unpivot sensitive columns, depending on your exact requirements and underlying data structure.
A related idea is to combine some form of data masking with RLS. With this approach you’ll want to rely on explicit measures and hide physical columns from end users, which is often a good practice anyway. See this entry and the video below:
As an aside, a neat trick that you can easily add since you already have a user table is highlighting data relevant to the selected user, e.g. the sales territory they’re in charge of:
At a conference sometime in 2021 Microsoft announced they’d add the ability to control access by report page via RLS, but they’ve since then walked that back and it’s no longer part of their backlog.
1.4. What About Automation?
1.4.1. Testing RLS Programmatically with the ExecuteQueries REST API and Power Automate
In Power Automate you can run a DAX query against a Power BI dataset while impersonating another user using their UPN.
This provides an automatable and auditable way to do the equivalent of “test as a role” in the Power BI service, which up to now was available solely as manual UI feature.
Read details on how to do this here. In this example they’re getting the UPNs out of an Excel file, if I were to do this I’d look into getting them in Power Automate from Graph or from the Azure AD connector. I believe the former doesn’t require the same admin access as the latter, but it’s a bit more involved as it requires writing HTTP POST calls instead of merely using built-in Power Automate GUI actions.
1.4.2. Rules Deployment Automation
Deployment pipelines ease managing Power BI artifacts across dev/test/production workspaces, but be careful, RLS/OLS role assignment is not copied through them. You’ll have to use TMSL/TOM scripting to push the deployment of your RLS/OLS rules in addition to moving datasets through pipeline stages.
On the other hand, you can easily align the aforementioned masked/unmasked SQL DDL schemas with Power BI workspaces by using auto-binding, meaning that you can, say, used masked data as the source for the dev version of a dataset while an unmasked schema could underpin the production version of that same dataset.
Since March 2023 the Power BI Scanner API has been able to return dataset sub-artifact metadata content including RLS configuration. This returns a JSON payload that enumerates roles with their names, permissions, members, and table permissions including filter expressions, as pictured in the screenshot on the right.
“The CustomData feature allows you to add a Row filter by passing a free text (string) using the CustomData connection string property. Unlike users and roles, CustomData can’t be set within a .pbix file. CustomData can be used in a role DAX query, and can be used without a role in a measure DAX query. The CustomData feature is part of token generation functionality for dashboard, report, and tile artifacts.”Row Level Security with Power BI Embedded
I haven’t seen much written about this capability in Power BI yet and haven’t tested it, I’d love to hear from people who have. Its main purpose is to provide an alternative to Active Directory-based USERNAME authentication.
1.6. All the Above Overhead Plus the Performance Cost Mean You Should Use RLS Only When You Need To
Since RLS adds a filter to all DAX queries, there’s obviously a performance cost. That’s really a valid observation regardless of the tool, as SQL-based BI platforms will similarly add a WHERE clause to the underlying queries to pass along the RLS logic. That performance tax might be so small as to be imperceptible, or it might be the straw that breaks the back of a model already straining to deliver acceptable response times to its end users.
RLS also introduces extra modeling complexities such as conflicting with UseRelationship or bidirectional cross-filtering subtleties. You may need to create pseudo dimensions just for RLS to work the way you want, for instance if you only want to apply RLS to specific fact tables.
Because RLS is based on the data model, when you’re building a Live Connection report you’ll have to set RLS rules in the parent Power BI or Analysis model, and you won’t be able to test RLS rules from Security settings, i.e. you’ll have to test the child report itself with a user who’s restricted by RLS. Speaking of which, dashboards aren’t listed in the Security test functionality either.
And if your users like automated insights, well tough luck, they don’t work with RLS/OLS.
For these reasons as well as the administrative overhead that comes with its implementation, Microsoft actually recommends not to use RLS unless you really need to. With creative use of query parameters, hidden filters, and workspaces by role, you may roll out content appropriate to each audience without RLS. Like everything else, this should be a cost vs. benefit assessment.
1.7. RLS Is One Part of a Bigger Puzzle and Only Applies to Viewers
RLS is just one piece of the broader Power BI security architecture, it doesn’t work in a vacuum. It bears repeating that it’s only applied to workspace members with the Viewer role (even if they have Build permissions to the dataset), while “Admin, Member, or Contributor have edit permission for the dataset and, therefore, RLS doesn’t apply to them.”
Make sure to educate yourself on the overall framework and the nooks and crannies of the Power BI platform:
- Power BI Security by James Serra
- Power BI Security Whitepaper
- Power BI implementation planning: Security
- Power BI datamarts – you can implement RLS there, upstream of datasets, i.e. “the configured RLS on datamarts automatically gets applied to downstream items, including the auto-generated datasets and reports.”
- An important caveat is that as of February 2023, metrics (formerly known as goals) don’t support RLS.
2. Salesforce’s Tableau: Access Rights are Defined for Each Data Source or Via Centralized RLS
The video below (and its slides), taken from the November 2019 Tableau conference, showed the state of RLS in Tableau at the time as well as plans to add to the product the ability to centralize how these policies are defined (i.e. do this once across data sources), via a user interface as well as a policy service API. The fact that Tableau needs this reflects their historical lack of support for star schemas (support for relationships was only added in 2020). This functionality was actually added two years later in Tableau 2021.3.
3. TIBCO SpotFire
RLS in SpotFire is done via personalized information links that can be set up for individual users or groups. As always, a user table is recommended to keep things manageable over time.
4. Google’s Looker
In Looker RLS is enforced via access filters (access_filter parameter), which are translated into WHERE clauses against the underlying SQL query sent to a source (Looker is a SQL-centric tool), while access_grant is their method to control broader access to LookML structures such as views or joins.
5. Google Data Studio: Do You Like Writing SQL Queries for Everything?
I initially brushed off Data Studio as one of so many promising but half-baked toys that Google would eventually abandon, like Needlebase, Fusion Tables, and countless others, and found it very limited at launch. But I hear that it’s been getting better, and it seems to have been getting some traction. Since Google is clearly very serious about BigQuery and is the leader in web analytics, I think it’s worth keeping an eye on Data Studio if only by association. After all, Power BI didn’t amount to much in its initial Office version (the PowerView / Silverlight days), and the new architecture launched in 2015 only started to come together as something cohesive by late 2016 / early 2017.
Google offers this walkthrough to implement “row level filtering” from a BigQuery source. Shockingly convoluted in comparison to more mature products. The approach suggested in the video below is more straightforward, but still requires handwriting SQL queries. Looks like Google has a SQL fetish!
ThoughtSpot admins have access to a Rule Builder where they can type expressions with guidance from a Rule Assistant to select filtering functions and variables, as well as check for syntax validity. RLS rules are defined at the table level, not for other objects such as searches, worksheets, or pinboards.
7. Amazon QuickSight
Like Google Data Studio, Amazon QuickSight doesn’t even register in Gartner’s magic quadrant. However it’s hard to completely ignore because a) it’s from Amazon, the #1 cloud platform by far, 2) it’s cheap, and 3) after looking into it I think it’s at least worth considering for embedding scenarios.
Amazon restricts RLS to its Enterprise edition, via a file or query you need to set up to state dataset rules, which can be stated as positive (i.e. granting access under specific conditions) or negative (i.e. denying) conditions. It’s all pretty straightforward, and this approach make it easily applicable to new reports and dashboards you may add after creating your RLS query/file.
8. Metabase: Enterprise Only
Metabase calls RLS “sandboxing“, which is one of the features that’s not available in the free open source version and requires the $10K+/year Enterprise version. Their implementation relies on either simple column mapping of the filtered table with the user table (pictured below) for all users in any specific group, or custom views that use a “saved question” to hardcode more complex conditions in a SQL statement. The latter lets you hide entire columns to certain users.
9. A Few Other Platforms
I might spin off some of these into their own section eventually, but in the interest of publishing the entry before dying of old age, here are just bullet points for now:
10. Conclusion: Bringing Light and Safety to Shadow IT?
This review showed that many tools have a broadly similar approach to row level security, though under different names and with varying levels of sophistication. Meanwhile, the ability to secure other objects such as columns is usually absent from the modern wave of user-facing “self-service” BI tools, and reserved to data warehouse and enterprise BI tools. For instance column-level security can be found in platforms such as:
- Snowflake (introduced in preview in July 2020)
- Azure Synapse (formerly Azure DataWarehouse) – see also its support for dynamic data masking
- SQL Server / Azure Analysis Services, whether on premises or in the cloud, since SSAS 2017
The fact that Power BI added OLS shows that platform’s shift towards enterprise governance concerns.
RLS in “governed self-service” BI tools can be seen as a stepping stone between the all-or-nothing workspace/report/dashboard approach to security common to pure self-service BI platforms on one end, and enterprise-grade object security at the data warehouse level on the other end of the spectrum. “Managed self-service” if you will, not as out of control as “dark shadow IT” while not as ponderous as traditional enterprise IT.
As always, it’s all about your requirements. Organizations should work backwards from their data: how sensitive is it, and what’s the worst that could happen assuming that data “got out.”
If you like exhaustive (and exhausting?) entries like this one, read: Third-Party Tools to Ease Power BI Development and Increase Analyst Productivity.