Take the tsql query below:
DECLARE @table TABLE(data VARCHAR(20))
INSERT INTO @table VALUES ('not duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('duplicate row')
INSERT INTO @table VALUES ('second duplicate row')
INSERT INTO @table VALUES ('second duplicate row')
SELECT data
INTO #duplicates
FROM @table
GROUP BY data
HAVING COUNT(*) > 1
-- delete all rows that are duplicated
DELETE FROM @table
FROM @table o INNER JOIN #duplicates d
ON d.data = o.data
-- insert one row for every duplicate set
INSERT INTO @table(data)
SELECT data
FROM #duplicates
I understand what it is doing, but the last part of logic (after --insert one row for every duplicate set), doesn't make sense. Where we have the set of code for --delete all rows that are duplicated, that gets rid of the duplicates so what's the part of the last section?
This query was found here
Thanks
-
Where we have the set of code for --delete all rows that are duplicated, that gets rid of the duplicates so what's the part of the last section?
First, it deletes all rows that ever had duplicates. That is, all rows, and original also. In the case above, only one row (
'not duplicate row') will remain in the table afterDELETE. All four other rows will be deleted.Then is populates the table with the deleted rows again, but now the duplicates are removed.
This is not the best way to delete duplicates.
The best way is:
WITH q AS ( SELECT data, ROW_NUMBER() OVER (PARTITION BY data ORDER BY data) AS rn FROM @table ) DELETE FROM q WHERE rn > 1Quassnoi : You mean, select all instances of rows that have duplicate values? -
The delete call deletes all matching records.
Because all duplicated rows have been deleted the last piece reinserts a single row.
-
Create table Test (Test1 int not null , Test2 varchar(10) null ) Insert Into Test Select 12, 'abc' UNion All Select 13 , 'def' Insert Into Test Select 12, 'abc' UNion All Select 13 , 'def' Select * From Test WITH t1 AS (SELECT ROW_NUMBER ( ) OVER ( PARTITION BY test1, test2 ORDER BY test1) AS RNUM FROM Test ) DELETE FROM t1 WHERE RNUM > 1
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.