With the preview launch of Power BI datamarts in May 2022, Microsoft came with yet another data entity to grasp within the Power BI ecosystem. A year later, Microsoft made an even bigger splash by introducing Fabric, an all-in-one product offering Power BI and many Azure data services under one roof. In some ways Datamarts paved the way for Fabric’s integrated SaaS approach, though of course Fabric is much bigger in scope and ambition.
These new services have enormous potential thanks to their endless integration possibilities, but I’ve seen that even seasoned Power BI professionals feel overwhelmed. Let’s review how all these various entities stack up.
1. Power BI Data Entities
1.1. Power BI Datasets: The Semantic Layer for Analytics
Datasets are the core entity in Power BI. They host a data model – in brief, DAX expressions on top of a star schema – traditionally stored in memory within the Vertipaq columnar engine and/or via live queries to Direct Query sources. Keep reading as we’re about to mention a new storage option with important ramifications.
Datasets can be accessed via third-party tools through the XMLA endpoint (Premium only). They can include ETL via embedded Power Query queries – though that’s recommended only for smaller datasets – and are meant to support analysis, not storage. Think of the data stored in Import mode as a short-term cache.
Datasets are available in any edition of Power BI, including the free version (“My Workspace”). A Power BI Pro dataset is limited to 1GB (compressed in Vertipaq), with larger sizes available through Premium licensing.
1.2. Power BI Dataflows: Cloud Data Prep & Storage for the Rest of Us
Dataflows store data in an Azure Storage Gen2 data lake as the result of Power Query Online ETL processes. That data lake is effectively a collection of CSV files in folders, but it is a black box unless you elect the Bring Your Own Storage option (which works well but is a pain to set up). Doing so gives you nice and affordable snapshots which you can access via Azure Data Explorer or load into other dataflows. You can map your dataflow entities to the Common Data Model (CDM) if you’d like.
Dataflows are available with Power BI Pro and Premium. You’ll also find them in Power Apps, which is why the newer Dataflow connector can traverse both Power BI workspaces and Power Apps environments.
Microsoft briefly put on its roadmap that Azure SQL would be added as a dataflow sink in the fall 2022. This didn’t end up happening on that schedule, and instead Dataflows Gen2 with sink choice were eventually introduced as part of Fabric in May 2023. Streaming dataflows were briefly an option before they were deprecated in favor of Azure Stream Analytics.
1.3. Power BI Datamarts: No-Code Managed Databases for Analytics Payloads
Datamarts also bring in data via Power Query Online, but they store it into a SQL database instead of a data lake. In addition to this data ingestion and storage, you’ll find modeling capabilities – schema view, DAX expression authoring, and Row Level Security – which so far had required Power BI Desktop. This paved the way for the addition of dataset modeling in the service in the first half of 2023.
The built-in SQL database can be queried via an online query editor as well as a visual editor. The Power Query user interface provided here is however only a subset of what you’d find when you use PQ for ingestion. It’s contained to Schema view and doesn’t have a formula bar, advanced editor, or query settings pane listing all the query’s steps. In other words, the intent is not for you to write your own custom M code, just to generate SQL queries for you based on point and click GUI actions.
You can also connect to the underlying SQL database from the outside, including with desktop tools such as SSMS or Azure Data Studio. Datamarts auto-generate a Direct Query dataset, which should be seen as a quick data profiling and prototyping tool. And you can work using both DAX and/or SQL depending on your preferences.
Datamarts are available in Power Premium Gen 2 workspaces. They’re limited to 100GB of data storage. Query compute is free during preview but I bet this will be throttled and lead to extra compute costs like in competing cloud DWH products such as Redshift, Snowflake, or BigQuery.
In short, datamarts combine dataflows, datasets, and a database in a single service. For more technical details, see Mimoune Djouallah’s first take, James Fancke’s first impressions, and a wrap-up by James Serra. Since they’re heavily managed for you, it remains to be seen how much SQL performance tuning one might be able to do manually.
2. Azure PaaS Data Services Transcended into Fabric, their SaaS Overlord
We’ve already mentioned Azure SQL and data lake services. They’re the IT-centric platforms beneath Power BI entities such as dataflows and datamarts. Whereas Power BI and PowerApps are SaaS (Software as a Service), Azure services are PaaS/DaaS (Platform/Database as a Service) or even IaaS (Infrastructure as a Service) in case you’re managing your own SQL server in a virtual machine. The same Business/IT duality can be found with Power Automate and Logic Apps. The tradeoffs from on-premises bare metal to SaaS include cost, control, ease of use, and speed of implementation.
When you’re choosing the SaaS route, a bunch of choices are already made for you, which is both constraining in case you need more minute controls and visibility, but also liberating for business users trying to get things done without IT. To what extent Microsoft is empowering “citizen developers” or feeding the “shadow IT” monster that IT will eventually need to clean up is beyond the scope of this entry!
Then there’s Azure Synapse which is both the successor to Azure SQL Datawarehouse and in some ways a superset of all things data-related in the Microsoft stack, including Power BI, Azure Data Factory, and even Databricks or Cosmos DB.
But wait, there’s much more! In May 2023, Microsoft made a momentous platform announcement with Fabric, an all-encompassing solution putting these Azure resources one click away from Power BI. For an overview that puts datamarts in perspective of the broader platform, read: Microsoft Fabric decision guide: data warehouse or lakehouse.
OneLake is a logical representation of ADLS Gen 2, with a single OneLake always underpinning an entire tenant. It’s exposed to end users as part of the Fabric SaaS as well as through Windows desktop file explorer integration. Its main purpose is to provide storage to all compute payloads in Fabric without requiring duplicating data.
OneLake is not just convenient unified storage though. It also comes with the promise to provide great performance for Power BI datasets even with very high volumes and near real time data intake. The intent with Fabric is very much that the sum is worth much more than the parts.
2.2. Fabric Lakehouse: Where Have We Heard This Before?
The Lakehouse entity is meant primarily for data engineers conversant in notebooks, pipelines, and Apache Spark. That said, there’s also a read-only SQL endpoint. This pattern and term have been popularized by Databricks, and I can’t imagine they’re thrilled by Fabric’s arrival. This makes for an awkward coopetition situation between the two vendors, whether they protest to the contrary or not.
In any case, with a Lakehouse, you can dump a Parquet file into storage and be able to query it as if it was a database table in just a few clicks. You can also build shortcuts to other lakes such as Amazon S3.
2.3. Fabric Warehouse: For SQL Fans Coming from Synapse
If both the lakehouse and warehouse store their data in OneLake, and the lakehouse already has a SQL endpoint that happens to use the warehouse engine behind the scenes, then why have the warehouse as a separate payload? While the lakehouse’s SQL endpoint is read only, the user-facing warehouse has full-fledged read/write T-SQL support.
3. PowerApps Dataverse: No-Code Managed Database for Transactional Apps, with an Analytical Link
Dataverse is similar to Datamart in the sense that it offers a user-friendly way to create SQL databases via Power Query Online, but it is not tied to Power BI’s use cases or governance tools. Where Datamarts are geared towards analytical payloads (aka OLAP) by using SQL Server’s columnstore indexes, Dataverse is intended for transactional apps (aka OLTP). You can load Dataverse tables in Power BI entities.
Dataverse is part of paid Power Apps plans. Like in Power BI dataflows, there’s a bunch of existing tables made available as the Common Data Model, tying PowerApps to Dynamics 365. You can store up to 4TB of data and access your Dataverse databases via SSMS. In addition to storing data, Dataverse can also load external SQL, Excel, and SharePoint resources as virtual tables.
There’s a free version of Dataverse coming with Teams (formerly “Project Oakdale”) that does include Power Query Online but is otherwise fairly limited (no link with Synapse, no SSMS support, 2GB max, no security or auditing controls. etc.). And since you could also store data in SharePoint Lists, here’s how they stack up.
Dataverse has seen even more renaming confusion than your usual Microsoft product. In the past you may have heard of the Common Data Service (CDS) or very briefly, of Dataflex.
Note that Synapse Link will come to Fabric (aka “View in Microsoft Fabric”) and is in private preview as of mid 2023:
“Coming later this year to every Dynamics 365 customer, “View in Microsoft Fabric” automatically makes all your Dynamics 365 data available for analysis in Microsoft Fabric without having to copy data, build ETL pipelines, or use third-party integration tools. This feature is currently available in private preview.”Microsoft
4. Other Power BI Entities That Don’t Store Data
I see many people getting very confused about Microsoft’s terminology, which sometimes leads to intractable support requests in the Power BI community forum because of mistaken language. We get posts that start with “I uploaded my dashboard from the desktop”. No you didn’t!
- Reports are made of report pages which are the tabs you see in the visual view of Power BI Desktop. PBIX files in Import/DQ/Composite model combines a dataset with a report, while a PBIX in Live Connection mode is a “thin report” without a corresponding dataset since it has a parent dataset as its source.
- Dashboards are purely visual entities that only exist in the Power BI service and mobile app. There’s no data in a dashboard nor can they be created with Power BI Desktop. A dashboard is a flexible one-page canvas where end users can pin entire report pages, individual report visuals, and Excel workbooks. Please don’t confuse reports and dashboards, they’re very different!
- Paginated reports are built with Power BI Report Builder, which effectively amounts to embedding SSRS in Power BI.
- Workbooks are embedded Excel Online worksheets. I bet most people don’t even know about this feature, but it has its uses.
- Apps are a way to package and distribute the aforementioned Power BI artifacts to their intended audiences. They have nothing to do with Power Apps, in case you’re wondering. One can only have one app per workspace, with up to 10 audience groups per app.
5. Access: A Legacy Desktop Precursor to the No-Code Hype
We couldn’t be thorough without mentioning Access, Microsoft’s iconic desktop relational database. Think of PowerApps as its cloud successor, as it’s clear there is not going to be a Microsoft 365 version of Access. There is an Access connector for Dataverse to bridge those two worlds. Access had many flows, but along with Excel it basically created the entire “citizen developer” category.