Data-centric jobs are sitting by definition at the intersection of business and technology, and that affects who performs these jobs, their tools, and workflows. Traditionally data analysts tend to be more on the business side and may favor graphical user interfaces, while database administrators and data engineers may be more versed in writing code and using command lines. Commercial vendors like Microsoft tend to push a UI-first approach, while open-source solutions tend to be CLI/code/config centric. But the lines are blurred and it’s rarely as clearcut.
So beyond the clicky clicky draggy draggy approach described by Power BI or Tableau marketers, should you “learn to code”, and if yes, what does that imply? We’ll attempt to answer that question with a bias towards the Microsoft platform, while trying to also be valuable to people operating in other ecosystems. This post is written primarily from the perspective of the individual planning to acquire new skills but should also help team leaders map and plan upskilling efforts.
1. Aligning Skill Acquisition with Career Goals & Organizational Needs
Before committing resources to learning a new skill, you’ll want to make sure it’s going to be worth your while to do so, as well as align with the needs of the organizations you work with. Let’s see what’s in it for you, and what’s in it for them. But in the end, keep in mind you’re the one who’s going to live with your skills, whose acquisition is effectively you trading your finite time for them. It’s your opportunity cost, not any employer’s. If your employer wants you to specialize in typewriting because they think it’s their best interest, is it yours?
1.1. The Personal Perspective: What’s in a Person? Aligning Skill Development with Life Goals
Skill acquisition is an investment meant to turn into a lasting asset: how much time and money are you investing for what kind of return over time? And how much will you need to keep spending to maintain that investment worth its while?
Here’s a framework to evaluate whether investing in the acquisition and maintenance of any given skill might be worth it, based on the skill’s attributes:
- Income potential: some skills obviously command better pay than others, and more importantly, some skill combinations can be very valuable, and not all educational endeavors are worth the investment. But current earning power cannot be the only criteria…
- Difficulty: some skills are harder and slower to master, which doesn’t always translate into income potential. Also, difficulty is very subjective. It’s normal to run into (temporary) walls while learning something new, but at some point you might have to acknowledge that a specific skill might just never be your cup of tea.
- Versatility: do you want to be a hyper specialist (one-trick-pony), a hyper generalist (jack of all trades) or be somewhere in between? For me as an independent consultant, a key-shaped skill tree works, where I go more in depth in a few areas, and am happy to be beginner/intermediate in many other areas. See also the next section of what organizations require.
- Resilience: how exposed are you to downturns, trend changes, and technical disruptions? If you define yourself as a Product X person (say an “Excel guy” or “Power BI gal”) you might be a hot commodity for a while, but how long will that last? Your skillset might become so mainstream that it’s not enough to differentiate, or it might not even be desired (“I’m great with the typewriter!”). Generally speaking, I don’t like limiting myself as a person to a single product or language-centric definition, under the school of thought that “specialization is for insects.” Yes, you can be a COBOL person and still make a living with that, but you’ll be confined to a limited, shrinking number of organizations in specific industries where that’s still a thing.
- Half-life utility: some products age very rapidly, and it’s often hard to assess upfront which ones will. Remember the visual web page designers on the 90s such as FrontPage or Dreamweaver? Pretty soon nobody would build websites that way. On the other hand, knowing how to use a CMS, HTML, or CSS, is still a relevant – even if fairly basic – skillset to this day. Usually concepts and frameworks last more than mastering specific product features.
- Independence: are you able to move to another company? Another industry? Another country? How adaptable and portable is your skillset?
- Productivity: are you able to use your tools of choice in a productive manner, in a manner that is both fulfilling internally – there’s nothing quite like being in a state of perfect flow – and recognized externally, i.e. management/clients appreciate your output?
- Fun: some people can force themselves to plow through work they hate just to make a living, and sometimes I guess that’s what you have to do to put food on the table. But I don’t think that’s sustainable or the way to live a happy and fulfilling life, so I’d always recommend gaining skills that you like to apply and nurture. Cue in the famous concept of Ikigai.
- Tech and functional alignment: are your skills complementing each other and coherent with your present (and planned future) job, or are you trying to master bonsai propagation aboard an aircraft carrier?
“New world” countries like the US tend to have a more fluid view of what a career should look like. I grew up in Europe and was astonished when I discovered that Americans were not stuck to a single track for the rest of their life based on what they studied, what with their soldiers becoming MMA fighters then real estate executives.
That, alongside with geographic mobility, had so much appeal that I reshaped my life so that I could be as independent of the local circumstances of any company or country as realistically possible. I’ve found it very rewarding to work with tradies in Australia, winemakers in Argentina, miners in Mexico, healthcare officials in Morocco, software makers in Singapore, debt collectors in the US, facility managers in Canada, and plenty of other industry+geography combinations where I’ve been able to apply, recombine, and hone my skillset. So if like me you value variety, learn accordingly.
These days I think the ability to reinvent yourself is a matter of survival as the traditional model of working for one or two companies for life is almost extinct outside of maybe Japanese corporations or North Italian family businesses.
1.2. The Organizational Perspective: What’s in a Job? Mapping skills with needs.
Larger organizations tend to favor specialization, at least on paper, while smaller ones can’t afford it and will need people to wear several hats. In our data world, you may be at times, regardless of your job title saying “business analyst” or “Power BI Developer”:
- BI Developer
- Business analyst
- Data engineer
- Report visual designer
- BI reports & workspaces
- Low-code “developer”
- Functional consultant
- Frontend data-driven applications
- RPA flows
- Technical documentation writer
- Blog post author
- Visual/Video walkthrough editor
- Organizational facilitator
- CFO Whisperer
Even in MegaCorps with two-page job descriptions it’s likely that you’ll have to step outside of your formal narrow confines to get the job done. Do I particularly like SharePoint? Not really. Yet do I work with it all the time because it’s a key enabler to sideload some data that nobody will bother to put in a formal database? You bet! If you ever run a small business you’ll quickly understand that you just have to figure stuff out in order to keep making progress.
So while it is possible to be spread so thin that you’re pretty bad across the board, I think hyper-specialization is fairly rare and makes sense only for elite practitioners in niche jobs. Top athletes and other performers come to mind, but even they often end up mixing sports with coaching, business ventures, or media gigs. Among other reasons because their peak performance will last only a few years, and then what?
The consequence for larger, more structured teams is that they’ll still need some people with a healthy amount of flexibility and breadth, lest they become momentarily ultra-efficient yet very brittle. People will get sick, bored, or move elsewhere, and your elegant construction will collapse. Just like Just-in-Time supply chains proved too easy to disrupt, teams made exclusively of hyper-specialists with no skill overlap risk folding in the face of adversity.
2. Levels in Language Proficiency: From Hacking Restaurant Menus to Writing Masterpieces
Like with any skill, there are many levels of proficiency with languages, whether we’re talking human tongues or computer languages. Can you get away with only understanding 80% of what you’re reading, or do you need to write? If the latter, do you need to write single words, simple sentences, complex paragraphs, or entire books?
You’ll want to first define the destination (“I want to become a Python expert who can write complex programs from scratch in a text editor”) then the path to get there, especially if you’re aiming for advanced proficiency which will take years or even decades to achieve.
There are no right answers here, it again all depends on your career and life goals. In my case my primary focus is on the functional and business side. I aim to express myself with clarity, focus, and structure, in order to corral my clients towards the best outcomes. I want to be technically proficient to the extent these technical skills support the overall goal, but I don’t feel like I necessarily need to reach very advanced technical levels. I act primarily as an analyst and advisor. But if you’re a data engineer in a large organization, deeper technical specialization is likely to be the best choice. Petabyte-scale projects do require elite tech skills that are not even relevant for the rest of us.
3. Languages for the (Microsoft) BI Professional
Yes, this is the longest introduction you’ve ever seen, but what can I say, I like framing things properly. Let’s now dive into the long list of computer languages relevant to the BI space.
3.1. M and Power Query for ETL (Extract, Transform, Load) – The Best Place to Start, Bar None… If You’re Committed to Microsoft
M is the language used by the Power Query ETL engine, which has become universal across Microsoft data tools, from Power BI to Excel to Power Automate to Power Apps to Azure Data Factory’s wrangling data flows (ADF mapping dataflows have their own set of expression functions). If you’re in that ecosystem, you need to know Power Query.
Since Power Query has a powerful graphical interface that acts as a macro record of sorts, you might ask why you’d even need to know the underlying language. As a practitioner, I’ve found the answer to be an unequivocal yes, for the following reasons:
- To make your code more generic and less likely to break in case of structural changes such as different column names at the source.
- To make your code more flexible with functions and parameters, which is essential to handle APIs and other complex sources.
- To enable scenarios that are hard to build with the UI alone, such as sub-indexes.
- To add comments to make your code more future proof.
That said, very few people write M from scratch. It’s fine to start from the UI-generated code, from existing code samples, or from snippets from past projects. You’ll be more productive that way as long as you don’t do it blindly without understanding what the code is supposed to do. The best no-code tools read and write code, I see them as an exobrain meant to enhance but not replace mine.
By the way, I think M sounds like James Bond’s boss, and Q – the gadget guy – would have been a much better name, also because that’s the first letter in “query.” (Microsoft, I’m available for product branding consulting as there’s simply no way to do a worse job than you’ve been doing for pretty much forever.)
If you’re in a large organization where dedicated data engineers handle the entire data pipeline for you and make a clean data warehouse available to you, you might not need to use Power Query for production purposes, but it might still be handy for prototyping while testing new data sources.
The equivalent of using M in non-Microsoft shops is usually Python, SQL, or using a GUI tool such as Alteryx or Talend.
3.2. DAX (Data Analysis eXpressions) and Data Modeling – Essential But Be Careful About Rabbit Holes
When you start learning DAX, you’re quickly referred to 500-page books and arcane blog posts that ask you to understand the complicated internals of the Vertipaq engine. There is, let’s face it, a daunting learning curve past the initial quick wins. At a minimum, you’ll need to conquer row context and filter context/transition. You’ll also need to stop trying to build things exactly like you were in Excel with SUMIFS and VLOOKUPs, as this will lead to frustrating dead ends (How do I access the previous row?).
You can stick to fairly simple DAX to solve common calculations if you’ve put the emphasis on learning star schemas and implementing a clean model. But if you don’t know what dimension and fact tables are, or how relationships work, then all hell will break loose. So learn Kimball modeling basics first then get your DAX at least to an intermediate level. But thanks to the following shortcuts, you might not need to become a DAX black belt:
- Quick measures – this feature, introduced in 2017, generates DAX for you based on a number of common patterns. You can use them as is, or at least to help you draft and learn DAX since you get to see the generated code. There’s a gallery of community-provided quick measures too, where Greg Deckler has been pushing for their broader adoption.
- Natural language (GPT3). This new functionality got very hyped and went live in preview in October 2022 as a subset of Quick Measures. Early results are not so impressive. But now ChatGPT has started an even more massive hype cycle (DAX demo), and I do believe that there’s a lot of potential there.
- DAX-generating visuals like Inforiver. Why write DAX if a visual can generate the right aggregates by just dragging and dropping columns and rows, like you would do with a pivot table? A great example is the Top N + Others pattern.
DAX was created in 2009, is foundational to Power BI, and I think it’s safe to assume it will be around as a pillar of the Microsoft BI stack for at least another decade. We may write less DAX manually, but by all indications it’s here to stay so you’ll at the very least need to be able to read and edit it.
3.3. M or DAX, or M and DAX? When to Use Which?
It is a common and valid criticism of Power BI that you have to learn two if not three languages (more on that in a minute). I’m not going to waver here: in my opinion you do need at least basic M and intermediate DAX to be a competent Power BI developer. One leads to the other, as you’ll want to shape your tables into a decent data model, which will in turn help you keep your sanity while writing DAX.
There’s no single clear-cut answer to when to use M or DAX for any specific purpose, it’s more a series of rules of thumb:
- If you’re manipulating specific rows to clean up and shape data, that’s an M job.
- If you’re calculating table/model-wide aggregates, that’s usually a DAX job.
- Some things perform better in M, other things in DAX. For instance, it’s usually better to create conditional columns with the former rather than calculate columns with the latter, but that’s not always possible nor desirable. The “best practice” of not using DAX calculated columns based on performance concerns is irrelevant to smaller datasets that are so prevalent in the real world. Meanwhile table merges in PQ will bump into refresh performance issues at some point.
- Row level security uses DAX table filtering expressions.
- In many cases, you can use either language so pick what you know best and feel most comfortable with. You’ll see date dimensions produced in either M or DAX. In my case I’d never use DAX for this purpose, I have an extensive M solution that I polished over the years to handle things such as fiscal years, working days (including holidays), as well as different languages and locations. But if you’re comfortable doing that in DAX, knock yourself out.
3.4. MDX (Multidimensional eXpressions) – Thanks but No Thanks Except for Edge Cases
Think of MDX as DAX’s ancestor. It was created in 1997 to query data cubes, but while it’s powerful, it’s even more complex than DAX and is tied to multidimensional models. Microsoft has invested heavily in tabular models during the past decade, so while MDX still has its place in organizations heavily vested in multidimensional SSAS models, there’s no sense learning it otherwise these days. Microsoft stated explicitly that AAS and Power BI would stick to tabular.
You might still run into MDX in a Power BI tabular context though, for instance for Excel pivot tables connected to a Power BI dataset or when you use XMLA endpoints, though that’s running behind the scenes, so it’s not like you have to write your own MDX in these cases. File this under edge cases that may come handy for performance tuning if and when necessary.
MDX is not dead though: Kyligence uses it in its platform that combines current cloud patterns with tried-and-true OLAP cubes. This remains a valid approach at larger scales where maintaining MDX skills may be justified. It doesn’t hurt that Kyligence plays nice with Power BI and Excel.
3.5. SQL – the Unkillable Lingua Franca
Technically you do not need to know SQL syntax to query relational databases with tools that use Power Query (PQ) – Power BI, Excel, Dataverse et. al. – since PQ will generate the underlying SQL code for you. I discussed using PQ as a quick-and-dirty ORM in the Raw Data podcast.
However, at least rudimentary SQL skills will come handy whenever you’re discovering a new OLTP source. Being able to make spot queries against a source for auditing and troubleshooting is an essential skill.
Knowing SQL will also help you do advanced performance tuning by peaking at the xmSQL code generated by Power BI’s formula engine and sent to its storage engine. Yes, DAX uses its own SQL flavor behind the scenes!
With the addition of dynamic M query parameters, there are also scenarios where you might want to send hand-crafted SQL queries in Direct Query mode. And Power BI Premium’s datamarts include a built-in SQL server that you can query using T-SQL if you’d like.
Let’s now take a step back from Power BI tactical concerns. BI jobs, like all technical jobs, are often gated behind HR departments that have little clue about technology and will list a laundry list of requirements that may or may not be relevant, if they even make sense at all (required: 10 to 15 years of experience… with a tool that was created 4 years ago!). SQL will come up all the time, whether you actually need it or not for that position.
Beyond Power BI, SQL has known a resurgence in the “modern data stack” that’s currently getting oodles of VC money and startup hype. From dbt to Dataform to Seekwell, you’ll see SQL come up across the data stack, so it’s one of the key languages to make you employable with organizations that have very diverse architectures. And it’s the language you’ll use to query cloud data warehouses and lakehouses, from BigQuery to Databricks to Snowflake. It’s funny to see SQL touted by people in their twenties as the best thing since sliced bread considering the language was created more than 50 years ago, but that’s a testament to the lasting power of some technical skills, even with the barrage of new technology we’re constantly facing.
SQL is a bit like pidgin Latin in Europe for the best part of the past two millennia: if you want to travel around and want to get the best job opportunities, you pretty much need it.
Now, do you need to learn particular flavors of SQL beyond the ANSI SQL common core, such as T-SQL in the MSFT world? Probably not, unless you’re a data engineer and your company is deeply committed to a specific platform where it’s your job to optimize complex queries.
Having some sort of database desktop client will often come handy, at least for discovery purposes. With Microsoft it’s traditionally SSMS, complemented these days by the more lightweight Azure Data Studio. You’ll find plenty of free clients dedicated to other databases – MySQL, PostgreSQL et. al. – while DBeaver and Datagrip are versatile tools that can connect to plenty of engines. And if you’re doing modeling in your source data warehouse, have a look at SqlDBM.
Aside from knowing SQL syntax, I highly recommend also having at least a veneer of architectural knowledge for performance tuning purposes.
If you’re working at a Microsoft-centric company, you may also have heard about Kusto Query Language (KQL). It is not a SQL flavor, as among other things it can only read but not update or create anything, though it is similar in spirit to SQL. You’ll find it in Azure Data Explorer to, well, explore a variety of data sources.
3.6. R and Python – Very Useful in Many Different Ways, But Don’t Do It Just Because of Hype
With the recent craze, or should I say fad, about everyone turning into a data scientist after a 2-day bootcamp, R and Python have been all over the news so it’s hard to not feel FOMO if you don’t know either of these languages. Let me tell you, as someone firmly established as a Power BI consultant, that you absolutely do not need them to make a living as they’re not central to the Microsoft stack, nor essential to many other BI tools for that matter.
However, that doesn’t mean they’re useless at all, thanks to their pervasiveness which translates into the availability of endless learning resources, code samples, and libraries. These are good languages with lasting value. So where to start?
- Python is a general-purpose language that can be used from ETL to visualization to web scrapping to building websites. If Power Query doesn’t click for you and you want to know a language you can use in many different ways, you can’t go wrong with Python. Power Query is also not the fastest ETL engine so some people resort to using Python which you can actually call from within PQ (with some limitations). However, if you can generate SQL stored procedures or views (people increasingly like to do so with dbt), there’s a lot of functional overlap between Python and SQL and you may not need both (strictly speaking from an ETL perspective here).
- R is more of a specialty language with an emphasis on stats. If you’re being tasked with predictive analytics such as forecasts, this is a good way to go.
Either way one of the first things you’ll have to learn is how to handle data tables, so be ready to dive into pandas, data frames, or Dplyr.
3.7. Scala, Java – Toys for the Big Boys
Scala is less known by the general public, but fairly popular in data science realms. It’s favored in large scale projects using platforms such as Apache Spark, which in the Microsoft universe may lead you to Databricks. It’s a valid choice for data engineering in larger organizations, so whether you might want to learn it is very context dependent. You are definitely drifting away from core Microsoft technologies at this point.
Meanwhile Java is a general-purpose language that can run across operating systems and sees a fair amount of use in data applications. Not the hottest, newest technology, but it does the job and you’ll find it in plenty of established data stacks.
3.8. Excel – VBA, Office Scripts, Formulas, New Modern Excel
The world runs on Excel and poorly informed Python snobs who think it is limited to one million rows are idiots. Investing in spreadsheet skills should be safe well into the 25th century. But what does that mean? Here’s a quick overview:
- Traditional “advanced” excel: Pivot tables and charts, tables, data validation, conditional formatting, named ranges, etc. – These are core spreadsheeting skills, not really advanced to be honest, everyone should know how to do this. If you don’t, you’ll produce unmaintainable, error-laden nightmares.
- Old-school “power user” formulas: SUMIFS, VLOOKUP, INDEX/MATCH etc. – you’ll need them to maintain or update existing spreadsheets, but be careful, there’s usually a better way. and that is…
- Everything that overlaps with Power BI – PowerPivot/DAX, PowerQuery/M, custom datatypes, datasets as a data source / Analyze in Excel: no brainer, you’ll get rid of so many formulas and macros. A caveat: PowerPivot has not seen any updates in years so most people have moved on to Power BI for their data models.
- New-school “power user” formulas: dynamic arrays, LET, LAMBDA, etc. They’re breathing a new life into Excel and are totally worth learning as they’ll make you more productive. At the very least move from VLOOKUP to XLOOKUP.
- VBA: a lot of it can be replaced with PQ and OfficeScripts. VBA will likely never work in O365 which makes it a legacy skill that is not worth learning these days in most instances, unless you inherit a mission-critical macro-based spreadsheet.
- OfficeScripts: VBA’s cloud equivalent. Promising but immature and won’t fully do what VBA can. You can start using it but I’d wait before committing a lot of time. There are not even reference books on the topic yet. That said just the fact that you can trigger OfficeScripts from Power Automate makes it a very compelling addition and I’ve started using it to help orchestrate an Excel -> Dataverse -> Power BI solution.
- Hardcore Excel: cube formulas, Solver, arcane add-ins: still relevant for edge cases, usually in finance. You’ll know if you need this.
Interactive dashboard tools like Power BI don’t intend to fully replace the extreme freedom and flexibility granted by spreadsheets. The gap is narrowing thanks to custom visual vendors such as Inforiver and PowerOn, but there will always a place for full-fledged spreadsheets. Ignore requests from your users at your own peril.
3.9. C# – DAX & TOM Automation
Like Java or Python, C-Sharp is general in purpose so mentioning it here might seem out of place. But it turns out that Tabular Editor, one of the primary third-party tools for Power BI Desktop, uses this language to create macros that automate DAX code creation. You won’t need anything too complex though, and macros are typically the type of code that you edit from snippets more often than from scratch. Consider this if you’re writing DAX for a living. You can also write TOM scripts in C#, we’ll get back to this in a minute.
3.10. Typescript – Used in the Custom Visual SDK, Otherwise Feel Free to Ignore
3.11. PowerShell – Admin Automation
PowerShell can be used to automate processes against APIs like the Power BI REST API and the admin API, so if you do a significant amount of administration and resource provisioning, it’s worth picking to script these tedious tasks.
Here again you can often get away with educated copy/paste/edit (aka “StackOverflow coding”).
3.12 XMLA, TMSL, TOM, ASSL, AMO, LINQ, BIML – Enterprise BI Tools Marginally Relevant to Modern Self-Service
To make sense of the overlapping set of tools and protocols I’m about to list, understand that they were added historically through evolving compatibility levels over more than 10 years. Only compatibility levels 1500 and above include Power BI Premium, everything below is Analysis Services only. Power BI Premium is becoming a superset of Analysis Services, which is why I’m mentioning this alphabet soup at all.
XML for Analysis (XMLA) is a “protocol for communications between client applications and an Analysis Services instance.” Now that the XMLA endpoint in Power BI Premium supports read and write operations, this is the most relevant acronym in the enterprise stack inherited from SSAS. You can do interesting stuff with it such as refreshing individual tables – or even partitions within tables – within a data model.
Tabular Model Scripting Language (TMSL) is “the command and object model definition syntax for tabular data models” which you can use to modify tabular models sending JSON definition files to the XMLA endpoint via SSMS (JSON over XMLA… LOL!). Another, more modern inheritance from Analysis Services. This can be used to do advanced stuff in your models such as adding columns to a published data model, though most people won’t ever need this.
Tabular Model Definition Language (TMDL) is a new readable and editable format meant to ease collaboration. It’s going to be supported both by Microsoft (see this VS Code extension) and third-party tools such as Tabular Editor and pbi-tools. It’s not meant to replace TMSL but rather to offer a more developer-friendly experience.
Analysis Services Scripting Language (ASSL for XMLA) is TMSL’s ancestor, was based on XML scripts, and only works in SSAS at compatibility levels of 1103 and lower, not AAS nor Power BI Premium.
Analysis Management Objects (AMO) is equally irrelevant to Power BI as it’s limited to older SSAS/AAS models.
Tabular Object Model (TOM) is an AMO extension that does work with Power BI to “expose native tabular metadata, such as model, tables, columns, and relationships objects.” TOM doesn’t use TMSL to communicate with your server, instead it’s done with SQL Server Analysis Services Tabular Protocol (MS-SSAS-T). TOM gives you even more programmatic options than TMSL, but you’ll have to use C# and it relies on .Net libraries so you’ll have to jump through hoops to execute your scripts in the cloud. The TOM library can serialize to TMSL or TMDL, both formats deserialize to TOM.
Language-Integrated Query (LINQ) “is the name for a set of technologies based on the integration of query capabilities directly into the C# language.” In other words, it’s a way for app developers to generate SQL queries in the language they already use. A long time ago there was a LINQ-DAX bridge but that project looks dead.
Data Mining Extensions (DMX) “is a language that you can use to create and work with data mining models in Microsoft SQL Server Analysis Services. You can use DMX to create the structure of new data mining models, to train these models, and to browse, manage, and predict against them. DMX is composed of data definition language (DDL) statements, data manipulation language (DML) statements, and functions and operators.” It’s deprecated in SSAS 2017, discontinued in SSAS 2022, and never made it into Power BI.
Business Intelligence Markup Language (BIML) is an XML dialect to specify BI solutions (relational models, data transformations…), used primarily with SSIS and to a lesser extent with SSAS, so it’s not directly relevant to your average Power BI shop.
3.13. Summary In Order of Diminishing Mass Appeal
- DAX and M: must-haves in the Power BI realm, sadly proprietary and useless outside of the Microsoft world. It’s only in very structured enterprise shops where there’s already a mature data warehouse that you might get away with just DAX.
- SQL: you’re crippling yourself if you don’t have at least basic SQL knowledge. The #1 language you should consider if you don’t want to be tied to a single platform.
- Modern Excel: yes! Many people have a completely stale take on what it takes to be proficient with Excel, and many people look down on Excel out of crass ignorance. Also, good luck training your CFO to use your Jupyter notebooks, data science geniuses!
- Python: must-have with many non-Microsoft BI stacks, nice-to-have in a Microsoft shop.
- R, Scala, Java: useful depending on what your organization is already using. I wouldn’t learn these out of the blue.
- TypeScript, C#, PowerShell: learn the basics if you’re pursuing the respective use cases – visuals, DAX.modeling automation, admin – where they come useful, again don’t learn this just in case.
- XMLA, TMSL, TOM, MDX, LINQ, BIML, ASSL etc.: For the most part, this is legacy Microsoft enterprise BI stuff that was already waning in relevance years ago, relatively to the rising self-service Power BI stack. Most people who need these already know them so don’t learn them unless you really have to, with the exception of XMLA/TMSL/TOM if you’re in an enterprise shop/consultancy that intends to scale up Power BI Premium for use cases where SSAS/AAS were used up to now.
4. Other Tech Stuff You’ll Need to Know Alongside the Language(s) of Your Choice
4.1. File Formats: JSON, XML, CSV, Parquet, AVRO, YAML, Geospatial, etc.
Olivier, this is scope creep, file formats are not languages so why bring them up in this entry? Technically dear
reader voice in my head, you are right, but file formats are often used for declarative purposes which can blur the line between configuration and development. Think of how Docker Compose files use YAML, or how Vega is based on JSON statements.
Besides, as a card-carrying data analyst, you deal with the structure of various data-carrying file formats all the time, so you definitely want to be familiar with at least the big three: XML, JSON, and CSV. As you get involved with bigger projects, you’ll no doubt run into Parquet and other “big data” file formats (often columnar ones) that perform better than their common “small data” brethren. In non-Microsoft shops you might run into ORC, Iceberg, or Arrow.
A challenge that often needs to be solved is how to traverse XML/JSON trees and what to do with nested arrays, which can either be expanded as extra rows within the same table or turned into a separate table. Things get even worse with ragged hierarchies that include a mix of scalar values, lists, and records. The answer will vary depending on the data model at the end of your data pipeline, whether we’re talking dimension or fact tables, or what’s the grain of your tables, not to speak of finer details such as whether your dimensions are fixed or slowly changing.
In some niche areas you may also run into a slew of custom file formats. Geospatial is one such universe of its own, with shapefiles, geojson, topojson, KLM, and dozens more! You can either safely ignore these, or you may have to do a deep dive in case you’re asked to do advanced mapping with custom boundaries.
4.2. REST APIs, HTTP POST/GET, Odata, Curl, etc.
Here again we’re outside of what strictly constitutes a “language”, but there’s a bunch you’ll have to learn to deal with APIs and Odata feeds. It’s not enough to have generic Power Query or Python string and array manipulation knowledge, you’ll also have to know how to handshake with API sources by handling their authentication (welcome to Oauth hell), pagination, throttling, query filters, whether they return JSON or XML, and many other subtleties. Meanwhile, Odata feeds can be queried via URL parameters.
Even though REST APIs tend to look the same from afar, each of them ends up having its idiosyncrasies, shortcomings, poorly documented syntax, and a ton of other headaches. If you love APIs you also end up hating APIs!
4.3. RPA & Low Code Platforms
If you build operational or financial dashboards, you’ve likely found that scope tends to leak into adjacent scenarios such as Robot Process Automation (RPA) and low-code data input apps – think SharePoint Lists, PowerApps, and plenty of equivalent platforms from other vendors from Airtable to Notion to Bubble.
Some of these tools are strictly UI-driven but most can involve the use of formulas or code to create more advanced interfaces and business logic. And these Power/O365 siblings work well together to orchestrate Power BI itself: I have set up Adaptive Cards in Teams letting end users start a Power Automate flow that chains dataflow and dataset refreshes from SharePoint and Dataverse sources.
You might not learn an entire new language there as opposed to just the formulas you need, but you’ll be too limited if you only stick to point and click. You’ll need to understand basic programming concepts such as variables, loops, or conditional statements. And you’d better adopt basic “DevOps” discipline by at least maintaining separate development and production workspaces. Generally thinking, the best business intelligence practitioners can think both like a CEO and like a developer.
5. Conclusion: Remain Calm and Keep Learning
The bad news is that business intelligence platforms encompass a humongous number of technologies and concepts, especially when you’re using one of the Big 3 imperial cloud providers (Amazon, Microsoft, Google), what with their obsession with spinning new services all the time. And as much as you may think you have a well-defined, relatively narrow job, you don’t.
The good news is that as a business analyst of superior intellect, you’re better equipped than most to make decisions involving functional ambiguity, technical sprawl, overlapping solutions, vendor hype, peer pressure, uncertainty about the future, imposter syndrome, and other noise sources that would cloud the judgment of lesser beings. (Am I pandering to my audience too much?)
Pick your battles, always keep learning, double down on what works, be ready to dump what doesn’t, and have fun! Don’t learn for the sake of learning as that won’t stick. Instead, learn while doing by applying the skills as you acquire them. And try to pick complementary skills so that they make sense as a coherent stack. With persistence and a bit of luck you can craft a skillset that works for you with immediate rewards as well as a fair amount of future proofing.
Here’s a mind map I created that outlines the types of skills and technologies you may run into as a Power BI person. I also wrote an entry on BI job profiles and skills that is a direct complement to this one. Don’t hesitate to contact me if you need individual or team coaching!
Awesome and thorough post! Haven’t seen a more comprehensive review of what it takes to be a well rounded Power BI guru. My only thought is the knowledge of rest/Jason/oauth APIs are becoming more and more necessary to be proficient in today’s exploding use of SaaS. Traditional data people seem to have trouble picking up as it is so different from traditional tabular data. I enjoy reading your posts!
Thanks Greg! Agreed on SaaS APIs, in my case I already had the skillset from doing some web development.
Exceptional blog post, thank you for sharing this. It comes just at the right time… Im about to go for an interview for PowerBI Analyst with scope for R development. I dont know R but i do know that M-code and DAX bring functional languagea, the concept knowledge and skills are totally transferable.
I wanted to tweet your blog post and tag you with acknowledgements but it says your account is suspended 🙁
Thanks! The idiots at Twitter suspended my account for a sarcastic comment that they somehow took as a threat. I couldn’t connect with anyone with a functioning brain to review my case and gave up. Once your account is suspended you’re in an insane position where you don’t even have access to your past direct messages! I hope Elon Musk does buy the company, lots of idiots to be fired over there for sure. Feel free to tweet anyway, the link to my entry will suffice 🙂 Good luck with the interview!
thank you i really enjoyed.
but what about ssis ? for a power bi developer person
SSIS is a legacy on-premises tool, I wouldn’t learn it unless my employer asked me to because they’re not migrating sources to the cloud (where you’d typically use ADF instead of SSIS).
Thank you for sharing this, I thoroughly enjoyed reading it. I really appreciate how you kept everything interesting and relevant, but also very transparent. I’m just getting started in this space, but feel more confident in what direction to go with my research now. You’re definitely right about the hype about R and Python but the usefulness of them outside the Microsoft suite. And great point about overstack and other free resources for foundational code and code solutions, such a time saver.
Thanks for the detailed post. That was a great read. My path has been SQL, SSIS and SSRS, then Python and R and lately Power BI(DAX, M) together with Synapse and Dataverse stuff.
I would add to your list:
– Pandas. Most useful Python library for general Data manipulation and analysis
– Regex: has helped me countess times, usually together with Pandas
– FetchXML. Yet another Microsoft language, useful in the Dataverse
– Spark. For Data engineering. A useful cross-cloud language
Thanks for your comment Bernardo, good stuff.