Row Level Security & Beyond in Business Intelligence Applications: A Practical Overview

W### T$$ F###

The ability to filter data by user is critical whenever sensitive information is shared, whether internally to an organization or even more so externally. This is often known as “row level security” (RLS) in a number of relational databases and business intelligence tools.

The idea is that each row in a data table is evaluated against rules stated somewhere else in that database (or application) to check which parts of that row, if any, can be returned to any given user. Because managing by individual users is bound to collapse over the long run, I like to think of RLS as also meaning “role level security”, because it’s less about any specific people, and more about what function they have in their organization and thus what data they should be granted access to.

However, that phrasing is not universal, which can make it tedious to compare how different BI products implement this, if at all. RLS is often filed under “data level security” since it’s embedded, or at least closely associated, with the source data itself.

Object level security goes beyond the confines of rows to include tables and columns. It used to be available only in enterprise products but is making its way in the self-service BI space. Meanwhile data masking is an adjacent concept, where data is obfuscated rather than completely blocked.

This entry gives an overview of how some of the best-known players approach data-based security. It’s impossible to be exhaustive as BI is a very busy space, but this is probably as good a review as you’ll find anywhere online. The Power BI section is very thorough and up-to-date with the latest developments as of mid-2022.

1. Core Concepts & How They Work in Microsoft Power BI

I’ll start with Power BI because it’s the tool I’m most familiar with, and also arguably the BI platform with the most momentum in the last couple of years. It’s also better documented and supported by a much broader community than most of its competitors. I should still mention that Tableau has been renowned for its community for a long time, but frankly, some other long-established platforms barely have a pulse.

This section is going to be the most detailed by far. Since it lays out core concepts that will apply to a large extent with many other tools, I advise you to at least skim it regardless of what you actually use.

1.1. Create and Modify Roles in the Desktop, Assign Users to Roles in the Service, Test in the Desktop and Service

Row level security was added to Power BI in June 2016, barely a year into the platform’s life, and roles and rulesets could initially be defined in the service. This was however deprecated years ago in favor of Power BI Desktop, the tool of choice for Power BI modeling. Whereas many other platforms use SQL for this purpose, in Power BI the DAX language is king, and that’s how you define table filtering expressions by role.

Power BI Desktop also conveniently lets you impersonate these roles to verify in situ that they’ll get to see what they’re supposed to see, and only that. With many other platforms you need to use a separate browser (or incognito session) to perform such tests. But in practice this only works with the simplest RLS implementations as we’ll see in a minute.

RLS Basics

Defining roles user by user within the Power BI Desktop UI is sometimes called “static RLS”, as you hardcode expressions that you then need to update manually.

An alternative method, called “dynamic RLS”, adds a user table to the data model, and optionally also a role table, to filter users against their userprincipalname(), username(), or roles you assign to them. Either way, make sure not to leave any of your intended users (who are only workspace viewers) without roles or they won’t be able to see any data at all. Yes, that means you may in some instances create a role without any table filter expression to convey that users belonging to that role can see everything.

As a side note, it is possible to implement RLS with guest users using Azure Active Directory B2B, though the whole thing with [email protected] vs name_guestdomain.com#EXT#@hostdomain.onmicrosoft.com user principal names is very confusing and poorly documented.

Dynamic row level security will ease maintenance
But dynamic RLS may require a few DAX backflips, depending on your specific requirements
Watch this video if you need to go in depth on dynamic RLS. Factless fact tables (i.e. bridge tables) are a common pattern to achieve it.

Power BI Desktop is focused on encoding RLS business logic in DAX, but the administration of users and the groups they belong to cannot be done with it. Instead, that part belongs to the service – i.e. the app.powerbi.com web application – and is accessible by dataset owners. From there you can also view reports as any of the predefined roles.

If you create roles in the service first, you’ll have to recreate them in the Desktop. The whole thing is clunky to say the least.

In the service, go to the dataset’s Security section, then select “Test as role”

1.2. Map Roles to Groups or People, But Really, to Groups

Unless you’re in a tiny organization with no staff turnover, I highly recommend mapping roles to groups rather than individual users, in order to minimize administration overhead as people come and go. In this context “group” means Azure Active Directory security groups and Active Directory distribution groups (aka distribution lists), not O365 groups (sorry, “Microsoft 365 groups“). The difference between AAD security groups and O/M365 groups is explained by Microsoft here, with more details relevant to Power BI conveniently summarized by Gilbert Quevauvilliers here.

Assign roles to groups rather than people, unless you hate yourself

To avoid having to input data in two places, you could retrieve AAD group members via the Graph API as the authoritative source for your groups/roles. However, Microsoft recommends that you do so via Microsoft Graph Data Connect (via Azure Data Factory) rather than Power Query, for a variety of reasons.

1.3. Testing RLS Programmatically with the ExecuteQueries REST API and Power Automate

Check and log that RLS-restricted models spit out what they’re supposed to

In Power Automate you can run a DAX query against a Power BI dataset while impersonating another user using their UPN.

This provides an automatable and auditable way to do the equivalent of “test as a role” in the Power BI service, which up to now was available solely as manual UI feature.

Read details on how to do this here. In this example they’re getting the UPNs out of an Excel file, if I were to do this I’d look into getting them in Power Automate from Graph or from the Azure AD connector. I believe the former doesn’t require the same admin access as the latter, but it’s a bit more involved as it requires writing HTTP POST calls instead of merely using built-in Power Automate GUI actions.

1.4. Embedded Dynamic Row Filters via CustomData() or JavaScript Filters

Injecting row filters from your app via CUSTOMDATA

This functionality comes from Analysis Services and was added to Power BI Premium/PPU/Embedded at the end of 2021. From the documentation:

“The CustomData feature allows you to add a Row filter by passing a free text (string) using the CustomData connection string property. Unlike users and roles, CustomData can’t be set within a .pbix file. CustomData can be used in a role DAX query, and can be used without a role in a measure DAX query. The CustomData feature is part of token generation functionality for dashboard, report, and tile artifacts.”

Row Level Security with Power BI Embedded

I haven’t seen much written about this capability in Power BI yet and haven’t tested it, I’d love to hear from people who have.

In embedded scenarios you could also use JavaScript filters to achieve something that looks similar to RLS when users consume reports, but they’d still have “access to the model schema tables, columns, and measures”.

1.5. Data Masking & Object Level Security for Confidentiality Beyond RLS

Bear in mind that RLS is not applied to workspace members that have edit permissions, which makes sense because having these permissions means they’re allowed to be dataset authors. And Power BI dataset authors can see all the data as exposed in the data sources they work with.

If you don’t want your BI developers to see all the source data, they need to be provided obfuscated data sources that are otherwise structured exactly like the production sources. Microsoft SQL Server does that with dynamic data masking, as shown in the video below. I’ve also worked with clients who provided me with CSVs structured like the real data but filled with bogus numbers, that they later substituted with their production databases. In this case I use Power Query parameters to make the swap easy for them.

BI developers need valid data so that they can model and visualize it, but it doesn’t need to be the actual confidential data
A third-party option: Redgate’s advanced Data Masker

The core principle behind Power BI’s RLS is that it’s by definition all about filtering table rows, not model objects such as entire tables or columns, and definitely not measures. Analysis Services does support table and column-level security (CLS), and object-level security (OLS) was added to Power BI (Premium) in 2021, meeting clear demand for it. Here is the public preview announcement.

Object-level security is a more secure pattern as it not only hides the underlying data, but also the object’s names so that the existence of these objects is not revealed at all. With a bit of creativity you can use OLS to hide measures.

This entry shows how you can restrict access to either an entire table or to a column using Tabular Editor as an external tool, while Chris Webb explains a more user-friendly setup that combines OLS, RLS, and field parameters to prevent charts from breaking when they include a forbidden column.

Note that the combination of RLS and OLS has to be done in the same role as a user can’t belong to both an RLS and an OLS role (video).

OLS + RLS
OLS & RLS can work together, provided you set your users right

If you’re willing to roll your own DAX, you can restrict access to measures by using a measure table (with a column mapping to roles) and dynamic measures. This is a very flexible way to set up measures (RLS or not) but be aware this technique can have an impact on the performance of larger models.

By tweaking your model you can also emulate column-level security (CLS) as explained by Greg Deckler here.

The traditional approach for data masking/CLS is to have some access logic in the data source

Another consequence of filtering table rows is that these filters affect top-level aggregates too. Say your sales rep for New England should also be able to see the company’s total revenue, but not details at lower granularities. For this purpose, you’ll want to generate either summary or aggregated tables, or unpivot sensitive columns, depending on your exact requirements and underlying data structure.

OLS demo

A related idea is to combine some form of data masking with RLS. With this approach you’ll want to rely on explicit measures and hide physical columns from end users, which is often a good practice anyway. See this entry and the video below:

RLS + data masking

Check out my entry on Power BI external tools, they can help you set up and review RLS rules and measures. I wrote in further detail about RLS, OLS, and DDM here.

As an aside, a neat trick that you can easily add since you already have a user table is highlighting data relevant to the selected user, e.g. the sales territory they’re in charge of:

Auto highlighter

At a conference sometime in 2021 Microsoft announced they’d add the ability to control access by report page via RLS, but they’ve since then walked that back and it’s no longer part of their backlog.

1.6. All the Above Overhead Plus the Performance Cost Mean You Should Use RLS Only When You Need To

Since RLS adds a filter to all DAX queries, there’s obviously a performance cost. That’s really a valid observation regardless of the tool, as SQL-based BI platforms will similarly add a WHERE clause to the underlying queries to pass along the RLS logic. That performance tax might be so small as to be imperceptible, or it might be the straw that breaks the back of a model already straining to deliver acceptable response times to its end users.

Some data models are so huge that RLS is not a viable option and radical architectural choices might be needed instead

RLS also introduces extra modeling complexities such as conflicting with UseRelationship or bidirectional cross-filtering subtleties. You may need to create pseudo dimensions just for RLS to work the way you want, for instance if you only want to apply RLS to specific fact tables.

Because RLS is based on the data model, when you’re building a Live Connection report you’ll have to set RLS rules in the parent Power BI or Analysis model, and you won’t be able to test RLS rules from Security settings, i.e. you’ll have to test the child report itself with a user who’s restricted by RLS. Speaking of which, dashboards aren’t listed in the Security test functionality either.

You’ll struggle implementing RLS if you’re afraid of the model view!

And if your users like automated insights, well tough luck, they don’t work with RLS/OLS.

For these reasons as well as the administrative overhead that comes with its implementation, Microsoft actually recommends not to use RLS unless you really need to. With creative use of query parameters, hidden filters, and workspaces by role, you may roll out content appropriate to each audience without RLS. Like everything else, this should be a cost vs. benefit assessment.

Measuring RLS performance in 27 easy steps…
…Because RLS can have a nontrivial performance cost in some scenarios

1.7. RLS Is One Part of a Bigger Puzzle and Only Applies to Viewers

RLS is just one piece of the broader Power BI security architecture, it doesn’t work in a vacuum. It bears repeating that it’s only applied to workspace members with the Viewer role (even if they have Build permissions to the dataset), while “Admin, Member, or Contributor have edit permission for the dataset and, therefore, RLS doesn’t apply to them.”

Make sure to educate yourself on the overall framework:

An important caveat is that as of May 2022, goals (aka metrics) don’t support RLS.

2. Salesforce’s Tableau: Access Rights are Defined for Each Data Source or Via Centralized RLS

Tableau recommends using an “entitlement table” to define which users or groups can access what, as detailed in this whitepaper. These entitlements are defined at the data source level.

The video below (and its slides), taken from the November 2019 Tableau conference, showed the state of RLS in Tableau at the time as well as plans to add to the product the ability to centralize how these policies are defined (i.e. do this once across data sources), via a user interface as well as a policy service API. The fact that Tableau needs this reflects their historical lack of support for star schemas (support for relationships was only added in 2020). This functionality was actually added two years later in Tableau 2021.3.

3. TIBCO SpotFire

RLS in SpotFire is done via personalized information links that can be set up for individual users or groups. As always, a user table is recommended to keep things manageable over time.

4. Google’s Looker

In Looker RLS is enforced via access filters (access_filter parameter), which are translated into WHERE clauses against the underlying SQL query sent to a source (Looker is a SQL-centric tool), while access_grant is their method to control broader access to LookML structures such as views or joins.

5. Google Data Studio: Do You Like Writing SQL Queries for Everything?

I initially brushed off Data Studio as one of so many promising but half-baked toys that Google would eventually abandon, like Needlebase, Fusion Tables, and countless others, and found it very limited at launch. But I hear that it’s been getting better, and it seems to have been getting some traction. Since Google is clearly very serious about BigQuery and is the leader in web analytics, I think it’s worth keeping an eye on Data Studio if only by association. After all, Power BI didn’t amount to much in its initial Office version (the PowerView / Silverlight days), and the new architecture launched in 2015 only started to come together as something cohesive by late 2016 / early 2017.

Google offers this walkthrough to implement “row level filtering” from a BigQuery source. Shockingly convoluted in comparison to more mature products. The approach suggested in the video below is more straightforward, but still requires handwriting SQL queries. Looks like Google has a SQL fetish!

6. ThoughtSpot

ThoughtSpot admins have access to a Rule Builder where they can type expressions with guidance from a Rule Assistant to select filtering functions and variables, as well as check for syntax validity. RLS rules are defined at the table level, not for other objects such as searches, worksheets, or pinboards.

Row Security by user or group

7. Amazon QuickSight

Like Google Data Studio, Amazon QuickSight doesn’t even register in Gartner’s magic quadrant. However it’s hard to completely ignore because a) it’s from Amazon, the #1 cloud platform by far, 2) it’s cheap, and 3) after looking into it I think it’s at least worth considering for embedding scenarios.

Amazon restricts RLS to its Enterprise edition, via a file or query you need to set up to state dataset rules, which can be stated as positive (i.e. granting access under specific conditions) or negative (i.e. denying) conditions. It’s all pretty straightforward, and this approach make it easily applicable to new reports and dashboards you may add after creating your RLS query/file.

RLS in QuickSight
CLS in QuickSight

8. Metabase: Enterprise Only

Metabase calls RLS “sandboxing“, which is one of the features that’s not available in the free open source version and requires the $10K+/year Enterprise version. Their implementation relies on either simple column mapping of the filtered table with the user table (pictured below) for all users in any specific group, or custom views that use a “saved question” to hardcode more complex conditions in a SQL statement. The latter lets you hide entire columns to certain users.

Screenshot taken from video below
Sandboxing in Metabase Enterprise

9. A Few Other Platforms

I might spin off some of these into their own section eventually, but in the interest of publishing the entry before dying of old age, here are just bullet points for now:

10. Conclusion: Bringing Light and Safety to Shadow IT?

This review showed that many tools have a broadly similar approach to row level security, though under different names and with varying levels of sophistication. Meanwhile, the ability to secure other objects such as columns is usually absent from the modern wave of user-facing “self-service” BI tools, and reserved to datawarehouse and enterprise BI tools, for instance column-level security can be found in platforms such as:

The fact that Power BI added OLS shows that platform’s shift towards enterprise governance concerns.

RLS in self-service BI tools can be seen as a stepping stone between the all-or-nothing workspace/report/dashboard approach to security common to self-service BI platforms on one end, and enterprise-grade object security at the datawarehouse level on the other end of the spectrum. “Managed self-service” if you will, not as out of control as “dark shadow IT” while not as ponderous as traditional enterprise IT.

As always, it’s all about your requirements. Organizations should work backwards from their data: how sensitive is it, and what’s the worst that could happen assuming that data “got out.”

If you like exhaustive (and exhausting?) entries like this one, read: Third-Party Tools to Ease Power BI Development and Increase Analyst Productivity. And if you’re looking for help with your BI projects, check out my BI & SaaS integration services.

Leave a Reply

Your email address will not be published. Required fields are marked *