top of page

Why Power BI Says a Value Is a Duplicate When It’s Not

  • Writer: Vojtěch Šíma
    Vojtěch Šíma
  • Nov 1
  • 4 min read
tl;dr: Power Query is case sensitive, while DAX and the model are case insensitive by default. That mismatch can cause weird relationship key issues. Best fix: Avoid text columns as relationship keys. If you have to use them, normalize them in Power Query with Text.Lower or Text.Upper, or de-dup with a case-insensitive comparer like:  Table.Distinct(Source, {"letter"}, Comparer.OrdinalIgnoreCase).


Unexpected duplication

The column has duplicated values on the 'One' side of the relationship
The column has duplicated values on the 'One' side of the relationship

I bet you have seen this error message:

Column 'letter' in Table 'dim_letter' contains a duplicate value 'a' and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table.


And I bet you used Remove Duplicates in Power Query before you loaded the table. Seemingly, you did everything correctly, except you didn’t.


First thing you did wrong: you used a text column as a key. Not the best practice, but I’ll let it go.


The second thing you did was trust Power Query to remove the duplicates. It did, just differently than you expected.


Case-sensitivity mismatch

Power Query, or M, is case sensitive. So when you remove duplicates on a text column, values that differ only by casing are treated as different. “a” and “A” are distinct, and both stay.


DAX works differently, though. When you load the table, the VertiPaq engine builds a dictionary from your text column case-insensitively and keeps the first occurrence it sees. If it hits “a” first, later “A” or “a” entries map to “a.”


Same for longer strings. If “gOOD morning” appears first and the rest are “Good morning,” you will see “gOOD morning” everywhere.


Difference between M and DAX cases
Difference between M and DAX cases

Remove Duplicates in Power Query

When you remove duplicates in Power Query, you won’t remove values that have the same letters in the same order (fancy sentence for a 'word') but differ by case.

M remove duplicates
M remove duplicates

This is valid, as it has removed all distinct values from that column (in this case, the whole table, since I have only one column). In the Column distribution, you can see 4 distinct and 4 unique values.


After loading, we get 4 rows but only 3 distinct values. If a relationship were built on this column, that mismatch would throw an error and fail the semantic model refresh.

DAX distribution
DAX distribution

How to fix it

I’ve got three quick fixes.


First, don’t use a text column as a relationship key. Easy win.


Second, normalize the text with Text.Upper or Text.Lower before loading, then remove duplicates. Either add a new column to preserve the original for mapping (to a degree), or transform the existing one with Table.TransformColumns.


let
    Source = src,
    addTextLower = Table.AddColumn(Source, "letter_lowered", each Text.Lower([letter])),
    removeDuplicates = Table.Distinct(addTextLower, {"letter_lowered"})
in
    removeDuplicates

or


let
    Source = src,
    addTextLower = Table.TransformColumns(Source, {{"letter", each Text.Lower(_), type text}}),
    removeDuplicates = Table.Distinct(addTextLower, {"letter"})
in
    removeDuplicates

Third, use Table.Distinct with a case-insensitive comparer ( see below).


let    
    Source = src,
    removeDuplicates = Table.Distinct(Source, {"letter", Comparer.OrdinalIgnoreCase})
in
    removeDuplicates


Case-insensitivity validation in Power Query

Let’s look at how to spot this right in Power Query, so you can do something about it. Power Query’s built-in Keep Duplicates is case sensitive, and the code it spits out can be overwhelming for most of us (to modify it). There is a much easier way to check whether you have duplicates.


If you want a quick check, click the column you want to test and use Remove Duplicates. Then, in the produced Table.Distinct step, add one equationCriteria: Comparer.OrdinalIgnoreCase.


= Table.Distinct(Source, {"letter", Comparer.OrdinalIgnoreCase} )

This comparer switches the default case-sensitive check to case-insensitive, then removes any occurrence of that value after the first match, keeping the casing of the first encountered value.


Removed duplicate case-insensitively
Removed duplicate case-insensitively

If you want to quickly see how many rows were removed, add a new step like this:

rowsRemoved = Table.RowCount(Source) - Table.RowCount(removeDuplicates)

Where Source is your original step, and removeDuplicates is the step with your Table.Distinct function.


Set the Keep Duplicates button as case-insensitive

If you do not mind a little tinkering inside the code Power Query generates, you can make the built-in Keep Duplicates behave case-insensitively. You just need to add two tiny things to the generated step.


Here is the original code (formatter) Power Query produces when you click Keep Rows -> Keep Duplicates:


Keep Duplicates
Keep Duplicates
= let
    columnNames = {"letter"},
    addCount = Table.Group(
        Source,
        columnNames,
        {{"Count", Table.RowCount, type number}}
    ),
    selectDuplicates = Table.SelectRows(
        addCount,
        each [Count] > 1
    ),
    removeCount = Table.RemoveColumns(
        selectDuplicates,
        "Count"
    )
in
    Table.Join(
        Source,
        columnNames,
        removeCount,
        columnNames,
        JoinKind.Inner
    )

The way it is written right now will not let us keep duplicate rows with a case-insensitive approach, because Table.Group groups rows using a case-sensitive match. So we need small adjustments.


The first adjustment is that you have to add a whole new set of tables for each row, that's because TableGroup will trim the rows, and we will lose the case-insensitive repeated values.


The second adjustment is to set everything as case-insensitive with the comparer function, namely Comparer.OrdinalIgnoreCase.


Here's the whole ready-to-copy code (adjust columnNames and previousStep):

= let
        previousStep = Source,
        columnNames = {"letter"},
        addCount = Table.Group(
            previousStep, 
            columnNames, 
            {
                {"tbl", each _, type table}, 
                {"count", Table.RowCount, type number}
            }, 
            GroupKind.Global, 
            Comparer.OrdinalIgnoreCase
        ),
        selectDuplicates = Table.SelectRows(
            addCount, 
            each [count] > 1
        ),
        result = Table.Combine(selectDuplicates[tbl])
    in
        result

So it is mostly the same. The only change is that I added an extra aggregation column called tbl, which becomes the returned value:

{"tbl", each _, type table}, 

Second, I set the optional parameters to make sure we compare case-insensitively:

        GroupKind.Global, 
        Comparer.OrdinalIgnoreCase

I also added a previousStep variable so you can tweak it in one place instead of editing the Table.Group step directly:

previousStep = Source,

The rest stays the same. I trimmed the part that removed the Count column, since we access the new tbl column directly. I also added a result variable so in access variable name, not the expression directly. That is just for debugging and how I usually write code. You can place the Table.Combine call directly in in if you prefer.


Case-insensitive result
Case-insensitive result

Comments

Rated 0 out of 5 stars.
No ratings yet

Add a rating
bottom of page