In a previous entry I explained at length the various ways to secure data in business intelligence products. In this follow-up I want to elaborate on which method(s) to use for what purposes. This entry points to Power BI specifics in some instances, but should be relevant to some extent to many other BI and data storage products.
RLS vs. OLS vs. DDM: Three Shades of Need to Know
Row Level Security (RLS) is meant to restrict the depth of what any specific user or group may see in a report, i.e. which rows can they see within any of the tables they have access to. RLS’s main purpose is to reflect organizational boundaries by geography, business unit, product line, or more rarely time. In other words, RLS makes the most sense to show you the dimensional subset of facts that are relevant to you, without gaining access to either lateral – say your colleague’s detailed sales by account – or broader – say your entire department or company – data.
RLS can usually be implemented without revisiting the structure of your reports. At worst you may have empty charts if no row is returned in specific filtering conditions. On the other hand, RLS can have interesting data modeling ramifications and requires planning to be reasonably maintainable. It’s the most common pattern by far.
Object Level Security (OLS) intends to hide the breadth of what can be seen within a data model. The purpose is to prevent users from having cues that certain business logic even exists, with the sole appearance of certain column of table names possibly revealing trade secrets and business practices that are only shared on a “need to know basis.” Let’s say for instance that only some channel partners gain access to “back margins” under certain conditions. People who are not privy to these deal structures do not need to know that there’s a corresponding column in your data model.
Unlike RLS, OLS is a bit trickier to implement in reports, as blocking specific columns or tables from “existing” from the perspective of specific users, will outright break visuals that feature these entities. At least that’s how Power BI currently works. You may need to author different reports, or at least report variations using separate pages or bookmarks, to deliver a user-friendly experience. Field parameters provide the most elegant way to avoid charts from breaking without having to create bookmarked report variations by audience, but in any case you’ll have to jump through extra hoops to preserve the end user experience.
(Dynamic) Data Masking on the third hand doesn’t outright hide rows, tables, or columns. Instead, what it does is hide true values by obfuscating or randomizing specific data points, from Personal Identifiable Information (PII) to margin rates and other data that, again, are distributed on a need-to-know basis. The benefit is that the entire data model and structure is still accessible, which is essential for data engineers and BI developers to be able to work. This practice is very often used when hiring outside consultants, though it is common to just do it by providing fake data spreadsheets rather than leveraging DDM capabilities available in some relational databases such as MS SQL Server.
I like to think of Data Masking as a tool primarily targeting BI authors while RLS and OLS are meant for BI consumers. RLS and OLS do not even apply to developers in most tools, but thanks to DDM, giving developer full data model access doesn’t have to mean giving them full data access.
Sensitivity Labels & Encryption
Sensitivity labels are part of information protection policy and are quite different from all the previous approaches. They don’t hide data outright nor are they applied at a granular level in Power BI. Instead, they’re used to label entire assets and files – datasets, reports, dashboards, and dataflows – to prevent them from being shared outside of the confines of the Power BI service – think Excel or PowerPoint exports – beyond their intended audience.
While you can apply labels to SQL databases in SSMS, this type of labeling stays local to the database and is not seen by Purview. If you want something more global, Purview labeling would be the way to go. In MS SQL Server databases you can apply labels down to the column, so it’s a bit more granular than with files. In other words, because of these inconsistencies and missing integration points, that cake is not fully baked yet.
Encryption adds an extra layer of protection and separation of concerns at the storage level:
“Always Encrypted is a feature designed to protect sensitive data, such as credit card numbers or national identification numbers (for example, U.S. social security numbers), stored in Azure SQL Database, Azure SQL Managed Instance, and SQL Server databases. Always Encrypted allows clients to encrypt sensitive data inside client applications and never reveal the encryption keys to the Database Engine. This provides a separation between those who own the data and can view it, and those who manage the data but should have no access – on-premises database administrators, cloud database operators, or other high-privileged unauthorized users. As a result, Always Encrypted enables customers to confidently store their sensitive data in the cloud, and to reduce the likelihood of data theft by malicious insiders.”MS SQL Server documentation
Common Implementation Challenges
Things can get pretty complicated whenever requirements ask for the combination or RLS, OLS, and/or data masking, not just in function of the target audience, but also depending on the grain of the returned data. It’s pretty common to want to hide or obfuscate data at the smallest grain while still allowing the lowest resolution data points to be displayed. That sounds straightforward, except when it isn’t, such as in cases when there are not enough data points to prevent people from easily guessing missing values by subtracting the granular values they do see from the total.
Another challenge arises from deciding where to apply RLS/OLS/Data Masking. My general advice follows the recommended ETL pattern of doing transformations as upstream as possible and as downstream as necessary. In other words, if you can enforce these security mechanisms in source data systems, you’re better off doing so as various downstream tools will then all benefit. That means leveraging data warehouse capabilities rather than BI frontends if possible. Just because you can do it in Power BI doesn’t mean you should.
Third, combining these techniques adds its own layer of complexity, with the combination of RLS and OLS in particular requiring handling users and rules in specific ways.
Fourth, these rules need to evolve continuously as people get hired, fired, demoted, promoted, or moved to different jobs with different attributions. You’d better leverage the enterprise system of record where these things are recorded – typically an IT directory such as Active Directory and/or a HR app, otherwise you’ll have a maintenance nightmare on your hands and likely only partial implementation, not in technical terms, but in organizational terms.
“How come this guy who moved from sales to marketing 6 months ago can still see sales quotas?Irate boss finding out your security rules are not keeping up with the org chart
In conclusion, all this complexity introduces management and technical overhead and should be considered against the real impact of information leaks. It’s easy to underestimate security challenges, but it’s also easy to over-value internal data. There are tradeoffs here in terms of development speed, data pipeline and security maintenance, and report performance.
Thanks to Aswanth Venkatesan at Lumel for suggesting this topic. As always, constructive feedback is welcome in the comments below.