Data Modeling in Power BI
I wanted to highlight in this post some of my best practice tips for Power BI data modeling.
Data modeling is one of the foundations of your Power BI reports, so it’s important to set it up correctly. It’s crucial to create a well-developed data model to help you fully communicate your organization’s information. Data modelling helps you optimize your tables and columns. Doing it right will save you so much time.
These tips and best practices will help you create a more organized model, making it easier for you to understand the relationships in your data set. Once you’ve fully grasped how to set up data models, you can easily create intuitive and meaningful Power BI reports.
Benefits of using a Good Data Model:
• Increased report performance
• Less complex DAX to be used and accurate results
• Queries will load faster
Disable Autodetect new relationships option
If you add a new table to the model that has field names (and values) matching with other tables in the model, then Power BI automatically creates a relationship between each of the existing tables and the new table, and this can be sometimes too many new relationships.
This can lead to an unexpected relationships and would cause data issues.
Disable Auto date/time option
The main purposes of the auto date/time are to support convenient time intelligence reporting based on the date columns loaded into the data model.
Instead of using this date hierarchy which is not complete, just create a Date Calendar Table. How to create a Date Calendar Table can be found here
For each date column that generates a hidden auto date/time table, it will result in an increased model size and also extend the data refresh time.
Enable Visual cross-filtering
When setting up your data model, start by organizing your tables. One of the best ways to do this is by using the Waterfall technique (layout). Another way to do this effectively is by using the Star Schema approach.
The Layout method – Waterfall or stacked around the fact table(s) as a STAR model, does not matter as long as you understand it
My personal preference is the Waterfall layout, as you can visualize how the data will be filtered (also called “Lookup” Layout)
The *STAR* Layout
One of the ways is by using the Star Layout
- Fact table at the center of the star
- Dimension (Lookup) tables at the end points of the star
The Waterfall Layout
The second approach and layout for data modelling in Power BI is called the Waterfall technique. You set up your Dimension Tables at the top and the Fact Tables at the bottom.
Here are some of the benefits of this approach:
- Arranges tables in rows (Dimension tables in the first row, Fact tables in the second row).
- Easy to visualize the dimensions “falling” to the Fact table.
- Displays all relationships and makes it easier to see errors.
When using this layout, you need to remember these details:
- Delete any relationships autogenerated by Power BI.
- Manually create all relationships.
- Use the “Manage Relationships” dialogue to maintain relationships.
Handling Multiple Fact Tables
So maybe you are thinking of having a multi-fact table model, how do you go about doing this?
You will need to do the following. I have provided an example from the few points above.
1. Identify the two fact tables — Actuals & Forecast
2. Agree on the granularity — Monthly, with the ability to drill
3. Identify the common dimensions — Country and Dates (create a FK)
4. Link the dimensions — Only Many to one (One to Many) cardinality
Hide Foreign Key (FK)
What is a foreign key? In your data model, you create relationships between a Primary Key (PK) and a Foreign Key (FK). An example of that is demonstrated below.
In the dimProduct table, you have a list of all products of your company. Each product is listed exactly one time, so you have unique values. This is your Primary Key.
In the fctSales table, a product could have multiple sales or no sales at all. This is the Foreign Key.
The Foreign Key should not be used in the Report View and should therefore be hidden (right click on it and select Hide in report view or click on the eye icon to hide).
Each model relationship is defined by a cardinality type. There are four cardinality type options, representing the data characteristics of the “from” and “to” related columns. The “one” side means the column contains unique values; the “many” side means the column can contain duplicate values.
The four options, together with their shorthand notations, are described in the following bulleted list:
- One to many (1:*)
- Many to one (*:1)
- One to one (1:1)
- Many to many (*:*)
The one-to-many and many-to-one cardinality options are essentially the same, and they’re also the most common cardinality types.
When configuring a one-to-many or many-to-one relationship, you’ll choose the one that matches the order in which you related the columns. Consider how you would configure the relationship from the Product table to the Sales table by using the ProductID column found in each table. The cardinality type would be one-to-many, as the ProductID column in the Product table contains unique values. If you related the tables in the reverse direction, Sales to Product, then the cardinality would be many-to-one.
Please check the below example of Many to one cardinality relationship.
A one-to-one relationship means both columns contain unique values. This cardinality type isn’t common, and it likely represents a suboptimal model design because of the storage of redundant data.
For more information on using this cardinality type, see One-to-one relationship guidance.
A many-to-many relationship means both columns can contain duplicate values. This cardinality type is infrequently used. It’s typically useful when designing complex model requirements. You can use it to relate many-to-many facts or to relate higher grain facts. For example, when sales target facts are stored at product category level and the product dimension table is stored at product level.
When you create a relationship in Power BI Desktop, the designer automatically detects and sets the cardinality type. Power BI Desktop queries the model to know which columns contain unique values. For import models, it uses internal storage statistics; for DirectQuery models it sends profiling queries to the data source. Sometimes, however, Power BI Desktop can get it wrong. It can get it wrong when tables are yet to be loaded with data, or because columns that you expect to contain duplicate values currently contain unique values. In either case, you can update the cardinality type as long as any “one” side columns contain unique values (or the table is yet to be loaded with rows of data).
Data type should match
The below example the relationship is based on ProductKey which is numeric (number)
Many to one (*:1)
The ProductKey column in the dimProduct table contains unique values.
Cross filter direction
Each model relationship is defined with a cross filter direction. Your setting determines the direction(s) that filters will propagate. The possible cross filter options are dependent on the cardinality type.
Bi-directional relationships can be used carefully i.e. using SnowFlake Schema
Alternatively, to avoid using bi-directional relationships, you can add a filter to the slicer visual to exclude items that are blank. It is worth noting that you would need to do this for each visual that uses fields across dimension tables.
Active or Inactive Relationship?
There can only be one active filter propagation path between two model tables. However, it’s possible to introduce additional relationship paths, though you must set these relationships as inactive. Inactive relationships can only be made active during the evaluation of a model calculation. It’ is achieved by using the USERELATIONSHIP DAX function.
An example of using Inactive relationships is so called Role Playing Dimensions
A role-playing dimension is a dimension that can filter related facts differently. For example, at Adventure Works, the date dimension table has three relationships to the reseller sales facts. The same dimension table can be used to filter the facts by order date, ship date, or delivery date.
Create Measure Table
In order to have a well structured report and keep your data model organized, it’s best to organize measures in separate tables. See below how to create dedicated Measures Tables:
1. Add a new table
2. Name the table and load it to the report
3. The new table has been added to the report
4. Allocate a new measure to this table
5. Hide Column1
Format your measures
DAX measures can become quite complex. To still be able to read complex DAX functions, you should consider adding some formatting to the formula in order to increase readability.
1. Add line breaks. Use Shift + Enter to add a line break (Alt + Enter)
2. Indent to create a structure. Use Tab key to indent
3. Add comments.
For more complex DAX measure, you can add comments to explain what the measure is calculating.
* For short use // or --
* For multiple code lines you can use block comment /* your dax code */
Try to use good variable names like _TotalSales
Organize your measures in folders
Often, you end up having a big number of measures in a report. To still have the overview, you can organize measures in folders. To create and manage folders, you need to go to the Model view.
The newly created measure folder will appear in your model.
Once a folder has been created, you can also use drag and drop to move measures in between folders.
Create Date Dimension Table
Date table will not be discussed in this post. How to create a date dimension can be found here
* Use CamelCase or underscore.
Examples: TotalSales (measure)
Sales (fact table) or Fact_Sales or fctSales
DimProduct (Dimension table)
Disconnected tables and their purpose
Disconnected tables are useful in a variety of ways, allowing a user to select a value that drives the behavior of a dynamic measure, or by grouping values and performing some kind of dynamic calculation.
Typical use of disconnected tables is toggle slicers, i.e. MTD/YTD
This table was created using the Enter Data feature and includes only one column named Period, which includes the following values:
The purpose of this post is to demonstrate effective data modeling and not DAX fundamentals but one cannot exist without the other. For our purposes, my model in this example contains five measures that calculate the Store Sales Qty with different time series variations. The Store Sales Qty measure simply returns the Sum of the Quantity field and the Store Sales Qty MTD calculates the month-to-date value for a given date. I’ve created another measure named Store Sales Qty Selected Period which is shown below. When a value in this table is selected, this measure returns the results of one of the other measures.