The ability to filter data by user is critical whenever sensitive information is shared, whether internally to an organization or even more so externally. This is often known as “row level security” (RLS) in a number of relational databases and business intelligence tools.
The idea is that each row in a data table is evaluated against rules stated somewhere else in that database (or application) to check which parts of that row, if any, can be returned to any given user. Because managing by individual users is bound to collapse over the long run, I like to think of RLS as also meaning “role level security”, because it’s less about any specific people, and more about what function they have in their organization and thus what data they should be granted access to.
However, that phrasing is not universal, which can make it tedious to compare how different BI products implement this, if at all. RLS is often filed under “data level security” since it’s embedded, or at least closely associated, with the source data itself.
This entry gives a quick overview of how some of the best-known players approach data-based security. It’s impossible to be exhaustive as BI is a very busy space, but this is probably as good a review as you’ll find anywhere online.
1. Core Concepts & How They Work in Microsoft Power BI
I’ll start with Power BI because it’s the tool I’m most familiar with, and also arguably the BI platform with the most momentum in the last couple of years. It’s also vastly better documented and supported by a much broader community than most of its competitors save a couple. I should still mention that Tableau has been renowned for its community for a long time, but frankly, some other long-established platforms barely have a pulse.
This section is going to be the most detailed by far. Since it lays out core concepts that will apply to a large extent with many other tools, I advise you to read it regardless of what you actually use.
1.1. Create and Modify Roles in the Desktop, Assign Users to Roles in the Service
Row level security was added to Power BI in June 2016, barely a year into the platform’s life, and roles and rulesets could be initially be defined in the service. This was however deprecated years ago in favor of Power BI Desktop, the tool of choice for Power BI modeling. Whereas many other platforms use SQL for this purpose, in Power BI the DAX language is king, and that’s how you define table filtering expressions by role.
Power BI Desktop also very conveniently lets you impersonate these roles to verify in situ that they’ll get to see what they’re supposed to see, and only that. With many other platforms you need to use a separate browser (or incognito session) to perform such tests.
Defining roles within the Power BI Desktop UI is sometimes called “static RLS”, as you hardcode expressions that you then need to update in the same way. An alternative method, called “dynamic RLS”, adds a user table to the data model, and optionally also a role table, to filter users against their userprincipalname(), username(), or roles you assign them to.
The desktop app is focused on encoding business logic in DAX, but the administration of users and the groups they belong to cannot be done with it. Instead, that part belongs to the service – i.e. the app.powerbi.com web application – and is accessible by dataset owners. From there you can also view reports as any of the predefined roles.
1.2. Map Roles to Groups or People, But Really, to Groups
Unless you’re in a really small organization, I highly recommend mapping roles to groups rather than individual users, in order to minimize administration overhead as people come and go. In this context “group” means Active Directory security groups and Active Directory distribution groups (aka distribution lists), not O365 groups (sorry, “Microsoft 365 groups“). The difference between AAD security groups and O/M365 groups is explained by Microsoft here, with more details relevant to Power BI conveniently summarized by Gilbert Quevauvilliers here.
1.3. Data Masking for Confidentiality Beyond RLS
Bear in mind that RLS is not applied to workspace members that have edit permissions, which makes sense because having these permissions means they’re allowed to be dataset authors. And Power BI dataset authors can see all the data as exposed in the data sources they work with.
If you don’t want your BI developers to see all the source data, they need to be provided obfuscated data sources that are otherwise structured exactly like the production sources. Microsoft SQL Server does that with dynamic data masking, as shown in the video below. I’ve also worked with clients who provided me with CSVs structured like the real data but filled with bogus numbers, that they later substituted with their production databases. In this case I use Power Query parameters to make the swap easy for them.
The core principle behind Power BI’s RLS is that it’s all about filtering table rows, not model objects such as entire tables or columns, and definitely not measures. Analysis Services does support table and column-level security (CLS), and object-level security (OLS) will come to Power BI in 2021, meeting clear demand for it. Object-level security not only hides the underlying data, but also the object’s names so that the existence of these objects is not revealed at all.
If you’re willing to roll your own DAX, you can restrict access to measures by using a measure table (with a column mapping to roles) and dynamic measures. This is a very flexible way to set up measures (RLS or not), but be aware this technique can have an impact on the performance of larger models.
By tweaking your model you can also emulate column-level security as explained by Greg Deckler here.
Another consequence of filtering table rows is that these filters affect top-level aggregates too. Say your sales rep for New England should also be able to see the company’s total revenue, but not details at lower granularities. For this purpose, you’ll want to generate either summary or aggregated tables, or unpivot sensitive columns, depending on your exact requirements and underlying data structure.
A related idea is to combine some form of data masking with RLS. With this approach you’ll want to rely on explicit measures and hide physical columns from end users, which is often a good practice anyway. See this entry and the video below:
If you use Power BI, check out my entry on external tools, they can help you set up or review RLS rules and measures.
A neat trick that you can easily add since you already have a user table is highlighting data relevant to the selected user, e.g. the sales territory they’re in charge of:
1.4. All the Above Overhead Plus the Performance Cost Mean You Should Use RLS Only When You Need To
Since RLS adds a filter to all DAX queries, there’s obviously a performance cost. That’s really a valid observation regardless of the tool, as SQL-based BI platforms will similarly add a WHERE clause to the underlying queries to pass along the RLS logic. That performance tax might be so small as to be imperceptible, or it might be the straw that breaks the back of a model already straining to deliver acceptable response times to its end users.
Because of this as well as the administrative overhead that comes with its implementation, Microsoft actually recommends not to use RLS unless you really need to. With creative use of query parameters, hidden filters, and workspaces by role, you may roll out content appropriate to each audience without RLS. Like everything else, this should be a cost vs. benefit assessment.
2. Salesforce’s Tableau: Access Rights are Defined for Each Data Source But Centralized RLS Is On the Way
The video below (and its slides), taken from the November 2019 Tableau conference, shows the current state of RLS in Tableau as well as plans to add to the product the ability to centralize how these policies are defined (i.e. do this once across data sources), via a user interface as well as a policy service API. It’s not clear to me when these capabilities will ship in the product though. And the fact that Tableau needs this reflects their historical lack of support for star schemas (support for relationships was only added in 2020).
3. TIBCO SpotFire
RLS in SpotFire is done via personalized information links that can be set up for individual users or groups. As always, a user table is recommended to keep things manageable over time.
4. Google’s Looker
In Looker RLS is enforced via access filters (access_filter parameter), which are translated into WHERE clauses against the underlying SQL query sent to a source (Looker is a SQL-centric tool), while access_grant is their method to control broader access to LookML structures such as views or joins.
5. Google Data Studio: Do You Like Writing SQL Queries for Everything?
I initially brushed off Data Studio as one of so many half-baked toys that Google would eventually abandon, like Needlebase, Fusion Tables, and countless others, and found it very limited at launch. But I hear that it’s been getting better, and it seems to have been getting some traction. Since Google is clearly very serious about BigQuery and is the leader in web analytics, I think it’s worth keeping an eye on Data Studio if only by association. After all, Power BI didn’t amount to much in its initial Office version (the PowerView / Silverlight days), and the new architecture launched in 2015 only started to come together as something cohesive by late 2016 / early 2017.
Google offers this walkthrough to implement “row level filtering” from a BigQuery source. Shockingly convoluted in comparison to more mature products. The approach suggested in the video below is more straightforward, but still requires handwriting SQL queries. Looks like Google has a SQL fetish!
ThoughtSpot admins have access to a Rule Builder where they can type expressions with guidance from a Rule Assistant to select filtering functions and variables, as well as check for syntax validity. RLS rules are defined at the table level, not for other objects such as searches, worksheets, or pinboards.
7. Amazon QuickSight
Like Google Data Studio, Amazon QuickSight doesn’t even register in Gartner’s magic quadrant. However it’s hard to completely ignore because a) it’s from Amazon, the #1 cloud platform by far, 2) it’s cheap, and 3) after looking into it I think it’s at least worth considering for embedding scenarios.
Amazon restricts RLS to its Enterprise edition, via a file or query you need to set up to state dataset rules, which can be stated as positive (i.e. granting access under specific conditions) or negative (i.e. denying) conditions. It’s all pretty straightforward, and this approach make it easily applicable to new reports and dashboards you may add after creating your RLS query/file.
8. Metabase: Enterprise Only
Metabase calls RLS “sandboxing“, which is one of the features that’s not available in the free open source version and requires the $10K+/year Enterprise version. Their implementation relies on either simple column mapping of the filtered table with the user table (pictured below) for all users in any specific group, or custom views that use a “saved question” to hardcode more complex conditions in a SQL statement. The latter lets you hide entire columns to certain users.
9. A Few Other Platforms
I’ll probably spin off some of these into their own section eventually, but in the interest of publishing the entry before dying of old age, here are just bullet points for now:
10. Conclusion: Bringing Light and Safety to Shadow IT?
This review showed that many tools have a broadly similar approach to row level security, though under different names and with varying levels of sophistication. Meanwhile, the ability to secure other objects such as columns is usually absent from the modern wave of user-facing “self-service” BI tools, and reserved to datawarehouse and enterprise BI tools, for instance column-level security can be found in platforms such as:
- Snowflake (preview as of July 2020)
- Azure Synapse (formerly Azure DataWarehouse)
- SQL Server / Azure Analysis Services, whether on premises or in the cloud, since SSAS 2017
RLS in self-service BI tools can be seen as a stepping stone between the all-or-nothing workspace/report/dashboard approach to security common to self-service BI platforms on one end, and enterprise-grade object security at the datawarehouse level on the other end of the spectrum. “Managed self-service” if you will, not as out of control as “dark shadow IT” while not as ponderous as traditional enterprise IT.
As always, it’s all about your requirements. Organizations should work backwards from their data: how sensitive is it, and what’s the worst that could happen assuming that data “got out.”