With the preview launch of Power BI datamarts in May 2022, there is yet another data entity to contend with in the Power BI ecosystem. I plan to update this entry as datamarts mature in the months to come, but I wanted to post quick notes about how these seemingly redundant entities stack up:
1. Power BI Datasets: Data Modeling for Analytics
Datasets are the core entity in Power BI. They handle modeled data (in brief, star schema + DAX expressions) in memory within the Vertipaq columnar engine and/or via live queries to Direct Query sources. Datasets can be accessed via third-party tools through the XMLA endpoint (Premium only). Datasets can include ETL via embedded Power Query queries, but they’re meant for live analysis and visualization, not storage.
Datasets are available in any edition of Power BI, including the free version (“My Workspace”). A Power BI Pro dataset is limited to 1GB (compressed in Vertipaq).
2. Power BI Dataflows: Cloud Data Prep & Storage
Dataflows store data in an Azure Storage Gen2 data lake as the result of Power Query Online ETL processes. That data lake is effectively a collection of CSV files in folders, but it is a black box unless you elect the Bring Your Own Storage option. Doing so gives you nice and affordable snapshots which you can access via Azure Data Explorer or load into other dataflows. You can map your dataflow entities to the Common Data Model (CDM) if you’d like. Azure SQL will be added as a dataflow sink in the fall 2022, and more control will be given for the data lake destination.
Dataflows are available with Power BI Pro and Premium.
3. Power BI Datamarts: No-Code Managed Databases for Analytics Payloads
Datamarts also bring in data via Power Query Online, but they store it into a SQL database instead of a data lake. In addition to this data ingestion and storage, you’ll find modeling capabilities – schema view, DAX expression authoring, and Row Level Security – which up to now required Power BI Desktop.
The built-in SQL database can be queried via an online query editor as well as a visual editor. The Power Query user interface provided here is however only a subset of what you’d find when you use PQ for ingestion. It’s contained to Schema view and doesn’t have a formula bar, advanced editor, or query settings pane listing all the query’s steps. In other words, the intent is not for you to write your own custom M code, just to generate SQL queries for you based on point and click GUI actions.
You can also connect to the underlying SQL database from the outside, including with desktop tools such as SSMS or Azure Data Studio. Datamarts auto-generate a Direct Query dataset, which should be seen as a quick data profiling and prototyping tool. And you can work using both DAX and/or SQL depending on your preferences.
Datamarts are available in Power Premium Gen 2 workspaces. They’re limited to 100GB of data storage. Query compute is free during preview but I bet this will be throttled and lead to extra compute costs like in competing cloud DWH products such as Redshift, Snowflake, or BigQuery.
In short, datamarts combine dataflows, datasets, and a database in a single service. For more technical details, see Mimoune Djouallah’s first take, James Fancke’s first impressions, and a wrap-up by James Serra.
4. PowerApps Dataverse: No-Code Managed Databased for Transactional Apps
Dataverse is similar to Datamart in the sense that it offers a user-friendly way to create SQL databases via Power Query Online, but it is not tied to Power BI’s use cases or governance tools. Where Datamarts are geared towards analytical payloads (aka OLAP) by using SQL Server’s columnstore indexes, Dataverse is intended for transactional apps (aka OLTP). You can load Dataverse tables in Power BI entities.
Dataverse is part of paid Power Apps plans. Like in Power BI dataflows, there’s a bunch of existing tables made available as the Common Data Model, tying PowerApps to Dynamics 365. You can store up to 4TB of data and access your Dataverse databases via SSMS. In addition to storing data, Dataverse can also load external SQL, Excel, and SharePoint resources as virtual tables.
There’s a free version of Dataverse coming with Teams (formerly “Project Oakdale”) that does include Power Query Online but is otherwise fairly limited (no link with Synapse, no SSMS support, 2GB max, no security or auditing controls. etc.). And since you could also store data in SharePoint Lists, here’s how they stack up.
Dataverse has seen even more renaming confusion than your usual Microsoft product. In the past you may have heard of the Common Data Service (CDS) or very briefly, of Dataflex.
I swear I thought Microsoft announced they would merge their Power Query Online experiences, but I can’t find that page in their release plans. Please let me know in the comments if such an announcement was actually made (hopefully with a working hyperlink).
5. Access: A Legacy Desktop Precursor to the No-Code Hype
Access is Microsoft’s iconic desktop relational database. Think of PowerApps as its cloud successor, it seems clear there is not going to be an O365 version of Access. There is an Access connector for Dataverse to bridge those two worlds.
6. The Broader Set of Azure Data Services
We’ve mentioned Azure SQL and Datalake services, they are the IT-centric platforms beneath dataflows, datamarts, and dataverses. Whereas Power BI and PowerApps are SaaS (Software as a Service), Azure services are PaaS/DaaS (Platform/Database as a Service) or even IaaS (Infrastructure as a Service) in case you’re managing your own SQL server in a virtual machine. The same Business/IT duality can be found with Power Automate and Logic Apps.
When you’re choosing the SaaS route, a bunch of choices are already made for you, which is both constraining in case you need more minute controls and visibility, but also liberating for business users trying to get things done without IT. To what extent Microsoft is empowering “citizen developers” or feeding the “shadow IT” monster that IT will eventually need to clean up is beyond the scope of this entry!
Finally, there’s Azure Synapse which is both the successor to Azure SQL Datawarehouse and a superset of all things data-related in the Microsoft stack, including Power BI, Azure Data Factory, and even Databricks or Cosmos DB.
Yes, you can import an Excel table into a SharePoint list into a Dataverse table into a Power BI dataflow into a Power BI datamart into a Power BI dataset into another dataset via Live Connection. It all makes sense, I promise, and you can always hire me if you need help!
7. Other Power BI Entities That Don’t Store Data
I see many people getting very confused about Microsoft’s terminology, which sometimes leads to intractable support requests in the Power BI community forum because of mistaken language. We get posts that start with “I uploaded my dashboard from the desktop”. No you didn’t!
- Reports are made of report pages which are the tabs you see in the visual view of Power BI Desktop. PBIX files in Import/DQ/Composite model combines a dataset with a report, while a PIBX in Live Connection mode is a report without a corresponding dataset since it has a parent dataset as its source.
- Dashboards are purely visual entities that only exist in the Power BI service and mobile app, there’s no data in a dashboard nor can they be made with Power BI Desktop. A dashboard is a flexible one-page canvas where end users can pin entire report pages, individual report visuals, and Excel workbooks.
- Paginated reports are built with Power BI Report Builder, this amounts to SSRS embedded in Power BI.
- Workbooks are embedded Excel Online worksheets. I bet most people don’t even know about this.