Find Duplicate Rows in Excel Across Multiple Columns
Excel offers a number of ways to find rows that contain duplicate values. However, most of them focus on finding rows where the value in just one column is duplicated. In this lesson, we look at how to use the COUNTIFS function to find rows with duplicate entries in two or more columns. We then use the COUNTIFS function in combination with Excel's Conditional Formatting feature to highlight duplicate and triplicate rows.
Finding Duplicate Rows - Example data
This lesson uses an example of a product order table. Our data table lists orders of spare parts for several computers. We need to identify how often each spare part is being ordered for each machine. A sample of this data can be seen in the following picture:
As you can see in this example, there are several examples where the Product and Part Ordered are repeated. For example, a Desktop Monitor has been ordered twice, as have Backup Tapes for the Server.
The COUNTIFS syntax
COUNTIFS is a function that allows you to count only those rows in a spreadsheet where certain criteria are met. It is closely related to the COUNTIF function. COUNTIF allows you to use just one criteria, whereas COUNTIFS allows you to use multiple criteria. You can find out more about COUNTIF in our lesson on how to Use COUNTIF to count the cells in a range that match certain values. Note that the COUNTIFS function was introduced in Excel 2007, so the method we're going to look at in this lesson will only work if you have Excel 2007, Excel 2010 or Excel 2013. The screenshots and examples in this lesson have been produced using Excel 2013, but this method will work in all three versions.
COUNTIFS has the following syntax:
=COUNTIFS(criteria_range1, criteria1, [criteria_range2, criteria2]…)
As shown here, you can define multiple ranges that the COUNTIFS function should look at, and the criteria it should use when deciding whether to count each row as part of its calculation. Our example will require that we use two range/criteria pairs, but Excel allows you to have up to 127 pairs in a single formula.
Use COUNTIFS to find rows that contain duplicate values
The first step in identifying duplicate rows is to write a formula using COUNTIFS to count how often each row is repeated.
We'll start by adding a new column to our table, into which we will enter this formula.
The formula we will use is:
This to be entered into D2 in our table and then copied and pasted down the table. The results of this are shown here:
As you can see, there are four rows shown here where the Duplicate value is greater than one. If you look more closely, you'll see that there are actually only two rows which are duplicated (Desktop | Monitor, and Server | Backup Tapes). This indicates that our formula is working, since it has found each row which has a duplicate somewhere in the table.
Let's look a little more closely at the formula we've used:
- Note how we've used B:B to represent the Product column, rather than B2:B13. By doing this, we ensure that our formula will work no matter how many rows we add to the table. We've done the same with C:C to represent the Part Ordered column.
- We've used B2 and C2 to identify the criteria for the COUNTIFS function. Note that they are relative references, so sopying this formula from D2 to the rows below it ensures that the COUNTIFS function always uses the values in the current row as its criteria.
- The COUNTIFS function will only count rows in which both of our criteria are matched.
- Note that if you just wanted to count the rows where only the Product appears multiple times, you would use COUNTIF instead, like this:
- This version of the COUNTIFS formula allows you to only show a value in the Duplicates column if the row is a duplicate - in rows that only appear once, nothing will appear in the Duplicates column:
- =IF(COUNTIFS(B:B,B2,C:C,C2)>1,COUNTIFS(B:B,B2,C:C,C2),"") or, even simpler, try this:
At this point, you may consider your task complete - we've identified the duplicate rows, and we can now use this information to analyse our data further.
Use COUNTIFS to apply Conditional Formatting
However, it may be that we don't want to add another column to identify the duplicates - we only want to identify those rows which are duplicates. We can do this using the COUNTIFS function in combination with Excel's Conditional Formatting feature:
- Click on cell A1. This is important - if you don't, then the Conditional Formatting we are about to set up may be applied to the wrong cells.
- Create a new Conditional Formatting rule using the Conditional Formatting button on the Home ribbon toolbar:
- In the New Rule dialog box, change the values to look like the following:
- There are several points to note when creating the new rule:
- We are using the option to Use a formula to determine which cells to format. Although Excel offers a Conditional Formatting option to highlight duplicate cells, it won't help us in this scenario.
- The formula that you need to enter in the Rule Description (Format values where this formula is true) is a version of the formula we used in the previous section, but it's critical that you copy it as it's shown here, =COUNTIFS($B:$B,$B1,$C:$C,$C1)>1, otherwise you may get unpredictable results. The $ signs to make the references absolute are needed, as is the change of B2 and C2 in our original formula to $B1 and $C2.
- You can choose whatever formatting you want to appy - I've chosen to fill cells with green shading.
- Once you click OK to this rule, you will need to edit it further to define which cells it should be applied to. You can access the Manage Rules dialog box using the Conditional Formatting menu as shown above (it's the last item in the menu).
- You should then see the rule you just created in the list - if you don't, change the option shown above from Current Selection to This Workbook and it should appear.
- Change the Applies to value from what is shown above, =$A$2 to =$A:$D. This will tell Excel we want to apply this rule to all cells in columns A through to D.
- You can either type this formula in, or you can click and drag across the row headings to select the rows you want.
- You could also select a specific range of cells, although that will mean you may need to edit this range again in the future.
- You can then click Apply to see a preview of the rule changes in your spreadsheet, or you can click OK to save the changes and return to your spreadsheet, which should now look like this:
- If your spreadsheet looks like this instead, you probably missed the first step in the instructions, which was to click on A1 before creating the rule. In that case, you might see something like this:
- As you can see, the formatting has been applied to the wrong rows. If this happens, then return to the Conditional Formatting Rules Manager as show above, and edit the rule. You will see something like this:
- Note how the formula we entered has changed (you might see a different formula, but the key is that the criteria values for both ranges have changed). If you see this, it is because you did not have cell A1 selected when creating the rule.To fix this, you can:
- Delete and start the rule, making sure you select A1 before starting.
- Change the formula back to what it should be.
Extending your conditional formatting rules to highlight triplicates and more
In the example above, we created a conditional formatting rule to highlight any rows that are duplicates of other rows, based on a combination of the Product and Part Number fields. You could extend this to cover additional duplicate rule checks - and it's easy to do. To highlight rows that appear three times sor more, do the following:
- Create a new rule using the steps in the previous section. Make it the same as the rule in the example above, with one change to the formula:
- Don't forget to change the formatting you apply to rows that appear three times or more or you won't see any difference.
- Then, change the order of the rules in the Conditional Formatting Rules Manager, like this:
- There are two key points to note here:
- The sequence of the rules can matter. In this case, the triplicate checking rule appears before the duplicate checking rule. If you are trying to create multiple rules to apply to the same range of cells and the formatting is not being applied correctly, it is worth experimenting with the sequence of the rules (there are buttons to move rules up and down next to the Delete Rule button.
- The option to Stop if True can help solve problems in certain cases, if the second and subsequent rules are changing the formatting applied in an earlier rule. This means that the subsequent rules will not be run once the conditions of the first rule have been met.
- Here's how the spreadsheet we've been working with looks when you have both rules applied (I have tweaked the data from the original example to include Server | Backup Tapes a third time):
- As a final thought, you could also apply Excel's Autofilter tool to this table. Autofilter allows you to filter based on the colours in a cell, so you could filter out just the green or blue rows. Read our lesson on How to use Autofilter in Excel.
Checking for Duplicate Rows using COUNTIFS - Lesson Summary
In this lesson, we looked at two ways to identify duplicate rows in a spreadsheet, where a row should only be considered a duplicate if the values in two cells match (rather than just one value).
The COUNTIFS function, available in all versions of Excel since 2007, allows us to check for a match of more than one value across all the rows in our table. Unfortunately, it also requires that you create a new column in your table in which to put the COUNTIFS formula; this formula needs to be in every row of your table.
Conditional Formatting lets us take the COUNTIFS function we looked at and use it to apply formatting to rows that are duplicates or even triplicates. We can do this without having to create a new column in our table (note that while we left the Duplicate column in our examples above, we didn't need it there for the Conditional Formatting part of the lesson.