Duplicate Rows in an SQL Database

There is often a need to find rows that have duplicate values. You can use this query to find duplicate rows in an SQL database when they have duplicate value in a single column or multiple columns, depending on your need.

Here’s an example of a simple query:

SELECT firstname, COUNT(*) occurrences
FROM [users]
GROUP BY
	firstname
HAVING 
	COUNT(*) > 1

This will give you a simple list that shows you any duplicate firstname values in the users table and the count of how many duplicates there are.

So now you know that there are duplicates, how can you make use of this data? In order to do something with the data, you’ll need some more information, such as the primary key for the rows.

You can’t add that to the query columns since that would then make every row unique and they’d no longer show up in your resultset as duplicate. What you need to do is wrap this query into a subquery that then pulls these records out as a predicate.

You can read more about how the GROUP BY works on Microsoft’s site here:

https://docs.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16

  SELECT * FROM [users] WHERE firstname IN (
	SELECT firstname FROM (
		SELECT firstname, COUNT(*) occurrences
		FROM [users]
		GROUP BY
			firstname
		HAVING 
			COUNT(*) > 1) a
	)

Your original duplicate query is buried in there from lines 3 to 8. We have to alias that query in order to use it as a resultset. I simply called it ‘a’ for this example. The second inner SELECT is needed to knock off the count column in order for it to work in the WHERE IN clause of the outer SELECT statement. This now gives us a list of very actionable data.