Delete Duplicate Rows in SQL Server

In this SQL Server tutorial, we will learn How to delete duplicate rows in SQL Server, Different methods to delete duplicate rows in SQL Server, and will cover the following topics.

  • How to find duplicate rows in SQL Server
  • Delete Duplicate Rows in SQL Server using Query
  • Delete Duplicate Rows in SQL Server based on one column
  • Delete Duplicate Rows in SQL Server using CTE
  • Delete Duplicate Rows in SQL Server using Row_Number
  • Delete Duplicate Rows in SQL Server using Rank

While developing SQL Server objects, we should adhere to specific best practices, which will eventually help to maintain data integrity and better performance. For example, a table should have primary keys, identity columns, indexes, and constraints.

Also, to ensure that there is no duplicate entry in the SQL Server is a best practice. So in this post, we will understand different ways to find and remove duplicate rows in SQL Server 2019.

Now to remove the duplicate rows from a table, first, we should have a table with duplicate entries. So to implement different ways to remove the duplicate entries, we are going to use the following sample table given below.

Sample table having duplicate rows in SQL Server
Sample Table

And the sample table has a total of 10 records, out of which 4 are duplicate rows.

Here I have used sql server 2019 express edition and sql server management studio.

How to find duplicate rows in SQL Server

Now before directly trying to delete the duplicate rows in SQL Server, let’s first try to find the duplicate entries. And for this implementation, we can use the “GROUP BY” and “HAVING” clauses in SQL Server 2019.

The GROUP BY clause is used to group multiple rows that have the same values in them. And HAVING is an optional condition used to restrict the rows affected by the GROUP BY clause.

Now to get the duplicate rows using the GROUP BY and HAVING clause, we can follow the following syntax.


SELECT col_1, col_2,..,COUNT(*) AS [Count]
FROM table_name
GROUP BY col_1, col_2,..
HAVING COUNT(*) > 1;

By using the above syntax, we are grouping the duplicate rows by using the GROUP BY clause. After this, we are using the COUNT function to get the count of multiple duplicate records.

Now let’s take an example to understand this implementation, and for this, consider the following query given below.

SELECT first_name, last_name, email, Country, COUNT(*) AS [Count]
FROM SampleTable
GROUP BY first_name, last_name, email, Country
HAVING COUNT(*) > 1;

Now for example demonstration, we are using the sample table. And by using the above query, we are grouping all the records where first_name, last_name, email, and Country values are the same. After this, we are using the Count() function to get the count of duplicated records.

So after successfull query execution, we will get the follwing output.

Finding duplicate rows in sql server
Finding duplicate rows in SQL Server

Read: SQL Server Create Temp Table

Delete Duplicate Rows in SQL Server using Query

So till now, we have learned how to find the number of duplicated records in a SQL Server table. But if we want to remove the duplicate rows or records, first, we need to keep one record out of duplicated ones in our table, and then remove the remaining duplicated values.

For example, in our sample table records at id 2 and 3 are duplicate records. So we only have to delete one of them. Now for this implementation, first we have to select the records, which we want in our table and then remove the remaining ones.

Now for demonstration, consider the following query given below used to delete duplicate records from the sample table.

DELETE FROM SampleTable
WHERE id NOT IN
(
SELECT MIN(id)
FROM SampleTable
GROUP BY first_name, last_name, email, Country
);

In the above example, first, we are using the SELECT statement to select all the duplicate records with a minimum id value. So that we can select the initial records. After this, we are using the DELETE statement to delete the remaining records from the sample table.

After successfully executing the above query, all the duplicate records will be deleted. And now if we query the table, we will get the following records.

removing duplicate rows in sql server
Remaining Table Records

Delete Duplicate Rows in SQL Server using Row_Number

The ROW_NUMBER in SQL Server is a function that returns a sequential integer value to each row within a result set’s partition. And it has the following syntax.

ROW_NUMBER()   
    OVER( [ PARTITION BY value_expression , ... [ n ] ] 
            order_by_clause )

The Row_Number() function is always used with OVER and the OVER function accepts to parameters. First is the Partition By clause used to partion the data, and second is the Order By clause used to arrange columns in a specific order.

Now we can use the Row_Number() function to assign ranks to duplicate rows in the SQL Server table. And based upon that rank, we can delete that row from the table.

For this demonstration, consider the following example given below.

DELETE duplicate_records
FROM
(
SELECT *, DupRank = ROW_NUMBER() OVER(
              PARTITION BY first_name, last_name, email, Country
              ORDER BY (SELECT NULL) )
FROM SampleTable
) AS duplicate_records
WHERE DupRank > 1 

In the above example, we are using the Row_Number() function to assign a rank to all the duplicate rows in the sample table. After this, we are using the DELETE statement to delete all the records where the rank is greater than 1.

In our case, there was a total of 4 duplicate records in the sample table, so the query will delete those 4 records from the table.

using row_number to delete duplicate rows
Using Row_Number() to delete duplicate rows

Read: SQL Server Replace Function + Examples

Delete Duplicate Rows in SQL Server using CTE

We can also delete duplicate rows or records in SQL Server, by using Common Table Expression. The Common Table Expression, commonly known as CTE in SQL Server, is used to temporarily store the result set with a name. And later it can be used within SELECT, INSERT, UPDATE, or DELETE statements.

Now to define a CTE in SQL Server, we can use the following syntax.


WITH expression_name [ ( column_name [,...n] ) ] 
AS 
( CTE_query_definition )

And to view the result of a common table expression, we use a SELECT query with the CTE expression name.

SELECT [column1,column2,..] from expression_name

Now to delete the duplicate records, first, we need to select them, and for this purpose, we are going to use the ROW_NUMBER() function.

For this demonstration, consider the following query given below.

WITH CTE( first_name, last_name, email, Country, duplicate_count )
AS (SELECT first_name, last_name, email, Country, 
           ROW_NUMBER() OVER(PARTITION BY first_name, last_name, email, Country                                                       ORDER BY id) AS duplicate_count
FROM SampleTable
)

SELECT * FROM CTE

In the above query, we are using the SELECT statement as CTE_query to define selected columns. And within the SELECT statement, we are using the Row_Number() function to partition and order the result set based upon the same values. In the end, we are using the CTE expression name to query the table and it will return the following output.

using cte to delete duplicate rows
Duplicate Records

From the above result, it is clearly visible the all the duplicate rows are assigned duplicate_count more than 1. And we can easily use this property to remove the duplicate rows.

WITH CTE(
	first_name, last_name, email, Country, duplicate_count)
AS (SELECT first_name, last_name, email, Country, 
           ROW_NUMBER() OVER(PARTITION BY first_name, last_name, email, Country ORDER BY id) AS duplicate_count
FROM SampleTable
)

DELETE FROM CTE WHERE duplicate_count > 1;

After executing the above query, all the duplicate rows will be deleted (dropped) from the table. In our case, there were 4 duplicate rows in the sample table.

using cte to drop duplicate rows in sql server
Using CTE to drop duplicate rows

Read: IDENTITY_INSERT in SQL Server

Delete Duplicate Rows in SQL Server using Rank

The Rank is also a function in SQL Server, that is used to generate unique values for each row. The Rank() function is very much similar to Row_Number() function, but the Rank() function leaves a gap between the groups.

Now for this implementation, first, we will use the Rank() function to assign ranks to duplicate rows in the table. Then we will use the DELETE statement to delete the duplicate rows whose rank is greater than 1.

For demonstration, consider the following code given below.

DELETE duplicate_records
FROM
(
SELECT *, DupRank = RANK() OVER(
              PARTITION BY first_name, last_name, email, Country
              ORDER BY Id ASC )
FROM SampleTable
) AS duplicate_records
WHERE DupRank > 1 

In this example, we are using the RANK() function to assign a rank value to each record in the table based upon partition. After this, we are using the DELETE statement to remove all the records where the rank value is greater than 1.

As we are using a sample table in the example which carries 4 duplicate rows. So the query will remove those 4 duplicate rows from the table.

using rank to delete duplicate rows in sql server
Using Rank() to delete duplicate rows

Delete Duplicate Rows in SQL Server based on one column

All the methods discussed to delete duplicate rows in SQL Server till now can work perfectly for multiple as well as single duplicate columns. But for most of the examples discussed until now are based on multiple columns. So in this section, we will use one of the methods to delete duplicate rows based upon one column.

For this implementaion, consider the following query given below to delete duplicate records in SQL Server.

DELETE duplicate_records
FROM
(
SELECT *, DupRank = RANK() OVER(
              PARTITION BY first_name
              ORDER BY Id ASC )
FROM SampleTable
) AS duplicate_records
WHERE DupRank > 1 

In the above-given query, we are using the Rank() function to partition the table based upon the first_name column. And then, the Rank() function will assign rank numbers to each row in the table based upon partition. In the end, we are using the DELETE statement to remove all the rows where the rank number is greater than 1.

Delete Duplicate Rows in SQL Server based on one column
Delete Duplicate Rows based on one column

As in our case, there was a total of 4 duplicate records where the first_name values were the same. So the query will remove 4 rows from the sample table.

You may like the following sql server tutorials:

So in this SQL Server 2019 tutorial, we have learned How to delete duplicate rows in SQL Server, Different methods to delete duplicate rows in SQL Server, and we also covered the following topics.

  • How to find duplicate rows in SQL Server
  • Delete Duplicate Rows in SQL Server using Query
  • Delete Duplicate Rows in SQL Server based on one column
  • Delete Duplicate Rows in SQL Server using CTE
  • Delete Duplicate Rows in SQL Server using Row_Number
  • Delete Duplicate Rows in SQL Server using Rank