Mysql中如何查找并删除重复数据

sunshine技术博客 sunshine技术博客

在数据库中我们经常会碰到重复数据了,下面我来介绍利用相关的mysql语句实现查找重复记录并且实现删除重复记录的sql语句.

考试系统中做了一个用户导入试题功能,导致用户导入了很多重复的试题,我需要查询及删除一下重复的记录,于是有了这篇文章.

(一)单个字段

1、查找表中多余的重复记录,根据(question_title)字段来判断,代码如下:

select * from questions where question_title in (select question_title from people group by question_title having count(question_title) > 1)

2、删除表中多余的重复记录,根据(question_title)字段来判断,只留有一个记录,代码如下:

  1. deletefromquestions
  2. wherepeopleIdin(selectpeopleIdfrompeoplegroupbypeopleIdhavingcount(question_title)>1)
  3. andmin(id)notin(selectquestion_idfromquestionsgroupbyquestion_titlehavingcount(question_title)>1)

(二)多个字段

删除表中多余的重复记录(多个字段),只留有rowid最小的记录,代码如下:

  1. DELETEFROMquestionsWHERE(questions_title,questions_scope)IN(SELECTquestions_title,questions_scopeFROMquestionsGROUPBYquestions_title,questions_scopeHAVINGCOUNT(*)>1)ANDquestion_idNOTIN(SELECTMIN(question_id)FROMquestionsGROUPBYquestions_scope,questions_titleHAVINGCOUNT(*)>1)

用上述语句无法删除,创建了临时表才删的,求各位达人解释一下,代码如下:

  1. CREATETABLEtmpASSELECTquestion_idFROMquestionsWHERE(questions_title,questions_scope)IN(SELECTquestions_title,questions_scopeFROMquestionsGROUPBYquestions_title,questions_scopeHAVINGCOUNT(*)>1)ANDquestion_idNOTIN(SELECTMIN(question_id)FROMquestionsGROUPBYquestions_scope,questions_titleHAVINGCOUNT(*)>1);
  2. DELETEFROMquestionsWHEREquestion_idIN(SELECTquestion_idFROMtmp);
  3. DROPTABLEtmp;

(三),代码如下:

  1. declare@maxinteger,@idinteger
  2. declarecur_rowscursorlocalforselect主字段,count(*)from表名groupby主字段havingcount(*)>;1
  3. opencur_rows
  4. fetchcur_rowsinto@id,@max
  5. while@@fetch_status=0
  6. begin
  7. select@max=@max-1
  8. setrowcount@max
  9. deletefrom表名where主字段=@id
  10. fetchcur_rowsinto@id,@max
  11. end
  12. closecur_rows
  13. setrowcount0

上面讲到了很多,下面我们一起来看实例删除重复记录实例.

例1,表中有主键(可唯一标识的字段),且该字段为数字类型,代码如下:

  1. /*表结构*/
  2. DROPTABLEIFEXISTS`t1`;
  3. CREATETABLEIFNOTEXISTS`t1`(
  4. `id`INT(1)NOTNULLAUTO_INCREMENT,
  5. `name`VARCHAR(20)NOTNULL,
  6. `add`VARCHAR(20)NOTNULL,
  7. PRIMARYKEY(`id`)
  8. )Engine=InnoDB;
  9. /*插入测试数据*/
  10. INSERTINTO`t1`(`name`,`add`)VALUES
  11. ('abc',"123"),
  12. ('abc',"123"),
  13. ('abc',"321"),
  14. ('abc',"123"),
  15. ('xzy',"123"),
  16. ('xzy',"456"),
  17. ('xzy',"456"),
  18. ('xzy',"456"),
  19. ('xzy',"789"),
  20. ('xzy',"987"),
  21. ('xzy',"789"),
  22. ('ijk',"147"),
  23. ('ijk',"147"),
  24. ('ijk',"852"),
  25. ('opq',"852"),
  26. ('opq',"963"),
  27. ('opq',"741"),
  28. ('tpk',"741"),
  29. ('tpk',"963"),
  30. ('tpk',"963"),
  31. ('wer',"546"),
  32. ('wer',"546"),
  33. ('once',"546");
  34. SELECT*FROM`t1`;
  35. +----+------+-----+
  36. |id|name|add|
  37. +----+------+-----+
  38. |1|abc|123|
  39. |2|abc|123|
  40. |3|abc|321|
  41. |4|abc|123|
  42. |5|xzy|123|
  43. |6|xzy|456|
  44. |7|xzy|456|
  45. |8|xzy|456|
  46. |9|xzy|789|
  47. |10|xzy|987|
  48. |11|xzy|789|
  49. |12|ijk|147|
  50. |13|ijk|147|
  51. |14|ijk|852|
  52. |15|opq|852|
  53. |16|opq|963|
  54. |17|opq|741|
  55. |18|tpk|741|
  56. |19|tpk|963|
  57. |20|tpk|963|
  58. |21|wer|546|
  59. |22|wer|546|
  60. |23|once|546|
  61. +----+------+-----+
  62. rowsinset(0.00sec)

查找id最小的重复数据(只查找id字段),代码如下:

  1. /*查找id最小的重复数据(只查找id字段)*/
  2. SELECTDISTINCTMIN(`id`)AS`id`
  3. FROM`t1`
  4. GROUPBY`name`,`add`
  5. HAVINGCOUNT(1)>1;
  6. +------+
  7. |id|
  8. +------+
  9. |1|
  10. |12|
  11. |19|
  12. |21|
  13. |6|
  14. |9|
  15. +------+
  16. rowsinset(0.00sec)

查找所有重复数据,代码如下:

  1. /*查找所有重复数据*/
  2. SELECT`t1`.*
  3. FROM`t1`,(
  4. SELECT`name`,`add`
  5. FROM`t1`
  6. GROUPBY`name`,`add`
  7. HAVINGCOUNT(1)>1
  8. )AS`t2`
  9. WHERE`t1`.`name`=`t2`.`name`
  10. AND`t1`.`add`=`t2`.`add`;
  11. +----+------+-----+
  12. |id|name|add|
  13. +----+------+-----+
  14. |1|abc|123|
  15. |2|abc|123|
  16. |4|abc|123|
  17. |6|xzy|456|
  18. |7|xzy|456|
  19. |8|xzy|456|
  20. |9|xzy|789|
  21. |11|xzy|789|
  22. |12|ijk|147|
  23. |13|ijk|147|
  24. |19|tpk|963|
  25. |20|tpk|963|
  26. |21|wer|546|
  27. |22|wer|546|
  28. +----+------+-----+
  29. rowsinset(0.00sec)

查找除id最小的数据外的重复数据,代码如下:

  1. /*查找除id最小的数据外的重复数据*/
  2. SELECT`t1`.*
  3. FROM`t1`,(
  4. SELECTDISTINCTMIN(`id`)AS`id`,`name`,`add`
  5. FROM`t1`
  6. GROUPBY`name`,`add`
  7. HAVINGCOUNT(1)>1
  8. )AS`t2`
  9. WHERE`t1`.`name`=`t2`.`name`
  10. AND`t1`.`add`=`t2`.`add`
  11. AND`t1`.`id`<>`t2`.`id`;
  12. +----+------+-----+
  13. |id|name|add|
  14. +----+------+-----+
  15. |2|abc|123|
  16. |4|abc|123|
  17. |7|xzy|456|
  18. |8|xzy|456|
  19. |11|xzy|789|
  20. |13|ijk|147|
  21. |20|tpk|963|
  22. |22|wer|546|
  23. +----+------+-----+
  24. rowsinset(0.00sec)

例2,表中没有主键(可唯一标识的字段),或者主键并非数字类型(也可以删除重复数据,但效率上肯定比较慢.

例2测试数据,代码如下:

  1. /*表结构*/
  2. DROPTABLEIFEXISTS`noid`;
  3. CREATETABLEIFNOTEXISTS`noid`(
  4. `pk`VARCHAR(20)NOTNULLCOMMENT'字符串主键',
  5. `name`VARCHAR(20)NOTNULL,
  6. `add`VARCHAR(20)NOTNULL,
  7. PRIMARYKEY(`pk`)
  8. )Engine=InnoDB;
  9. /*测试数据,与上例一样的测试数据,只是主键变为字符串形式*/
  10. INSERTINTO`noid`(`pk`,`name`,`add`)VALUES
  11. ('a','abc',"123"),
  12. ('b','abc',"123"),
  13. ('c','abc',"321"),
  14. ('d','abc',"123"),
  15. ('e','xzy',"123"),
  16. ('f','xzy',"456"),
  17. ('g','xzy',"456"),
  18. ('h','xzy',"456"),
  19. ('i','xzy',"789"),
  20. ('j','xzy',"987"),
  21. ('k','xzy',"789"),
  22. ('l','ijk',"147"),
  23. ('m','ijk',"147"),
  24. ('n','ijk',"852"),
  25. ('o','opq',"852"),
  26. ('p','opq',"963"),
  27. ('q','opq',"741"),
  28. ('r','tpk',"741"),
  29. ('s','tpk',"963"),
  30. ('t','tpk',"963"),
  31. ('u','wer',"546"),
  32. ('v','wer',"546"),
  33. ('w','once',"546");
  34. SELECT*FROM`noid`;
  35. +----+------+-----+
  36. |pk|name|add|
  37. +----+------+-----+
  38. |a|abc|123|
  39. |b|abc|123|
  40. |c|abc|321|
  41. |d|abc|123|
  42. |e|xzy|123|
  43. |f|xzy|456|
  44. |g|xzy|456|
  45. |h|xzy|456|
  46. |i|xzy|789|
  47. |j|xzy|987|
  48. |k|xzy|789|
  49. |l|ijk|147|
  50. |m|ijk|147|
  51. |n|ijk|852|
  52. |o|opq|852|
  53. |p|opq|963|
  54. |q|opq|741|
  55. |r|tpk|741|
  56. |s|tpk|963|
  57. |t|tpk|963|
  58. |u|wer|546|
  59. |v|wer|546|
  60. |w|once|546|
  61. +----+------+-----+
  62. rowsinset(0.00sec)

为表添加自增长的id字段,代码如下:

  1. /*为表添加自增长的id字段*/
  2. ALTERTABLE`noid`ADD`id`INT(1)NOTNULLAUTO_INCREMENT,ADDINDEX`id`(`id`);
  3. QueryOK,23rowsaffected(0.16sec)
  4. Records:23Duplicates:0Warnings:0
  5. SELECT*FROM`noid`;
  6. +----+------+-----+----+
  7. |pk|name|add|id|
  8. +----+------+-----+----+
  9. |a|abc|123|1|
  10. |b|abc|123|2|
  11. |c|abc|321|3|
  12. |d|abc|123|4|
  13. |e|xzy|123|5|
  14. |f|xzy|456|6|
  15. |g|xzy|456|7|
  16. |h|xzy|456|8|
  17. |i|xzy|789|9|
  18. |j|xzy|987|10|
  19. |k|xzy|789|11|
  20. |l|ijk|147|12|
  21. |m|ijk|147|13|
  22. |n|ijk|852|14|
  23. |o|opq|852|15|
  24. |p|opq|963|16|
  25. |q|opq|741|17|
  26. |r|tpk|741|18|
  27. |s|tpk|963|19|
  28. |t|tpk|963|20|
  29. |u|wer|546|21|
  30. |v|wer|546|22|
  31. |w|once|546|23|
  32. +----+------+-----+----+
  33. rowsinset(0.00sec)

MySQL中必须是有索引的字段才可以使用AUTO_INCREMENT,删除重复数据与上例一样,记得删除完数据把id字段也删除了,删除重复数据,只保留一条数据,代码如下:

  1. /*删除重复数据,只保留一条数据*/
  2. DELETEFROM`noid`
  3. USING`noid`,(
  4. SELECTDISTINCTMIN(`id`)AS`id`,`name`,`add`
  5. FROM`noid`
  6. GROUPBY`name`,`add`
  7. HAVINGCOUNT(1)>1
  8. )AS`t2`
  9. WHERE`noid`.`name`=`t2`.`name`
  10. AND`noid`.`add`=`t2`.`add`
  11. AND`noid`.`id`<>`t2`.`id`;
  12. QueryOK,8rowsaffected(0.05sec)
  13. /*删除id字段*/
  14. ALTERTABLE`noid`DROP`id`;
  15. QueryOK,15rowsaffected(0.16sec)
  16. Records:15Duplicates:0Warnings:0
  17. SELECT*FROM`noid`;
  18. +----+------+-----+
  19. |pk|name|add|
  20. +----+------+-----+
  21. |a|abc|123|
  22. |c|abc|321|
  23. |e|xzy|123|
  24. |f|xzy|456|
  25. |i|xzy|789|
  26. |j|xzy|987|
  27. |l|ijk|147|
  28. |n|ijk|852|
  29. |o|opq|852|
  30. |p|opq|963|
  31. |q|opq|741|
  32. |r|tpk|741|
  33. |s|tpk|963|
  34. |u|wer|546|//phpfensi.com
  35. |w|once|546|
  36. +----+------+-----+
  37. rowsinset(0.00sec)

相关广告
  • Mysql中如何查找并删除重复数据 Mysql中如何查找并删除重复数据 Mysql中如何查找并删除重复数据
相关阅读

Mysql中如何查找并删除重复数据

2019/10/10 17:36:42 | 谷歌SEO算法 | 微信公众号