Mysql中如何查找并删除重复数据
在数据库中我们经常会碰到重复数据了,下面我来介绍利用相关的mysql语句实现查找重复记录并且实现删除重复记录的sql语句.
考试系统中做了一个用户导入试题功能,导致用户导入了很多重复的试题,我需要查询及删除一下重复的记录,于是有了这篇文章.
(一)单个字段
1、查找表中多余的重复记录,根据(question_title)字段来判断,代码如下:
select * from questions where question_title in (select question_title from people group by question_title having count(question_title) > 1)
2、删除表中多余的重复记录,根据(question_title)字段来判断,只留有一个记录,代码如下:
- deletefromquestions
- wherepeopleIdin(selectpeopleIdfrompeoplegroupbypeopleIdhavingcount(question_title)>1)
- andmin(id)notin(selectquestion_idfromquestionsgroupbyquestion_titlehavingcount(question_title)>1)
(二)多个字段
删除表中多余的重复记录(多个字段),只留有rowid最小的记录,代码如下:
- DELETEFROMquestionsWHERE(questions_title,questions_scope)IN(SELECTquestions_title,questions_scopeFROMquestionsGROUPBYquestions_title,questions_scopeHAVINGCOUNT(*)>1)ANDquestion_idNOTIN(SELECTMIN(question_id)FROMquestionsGROUPBYquestions_scope,questions_titleHAVINGCOUNT(*)>1)
用上述语句无法删除,创建了临时表才删的,求各位达人解释一下,代码如下:
- CREATETABLEtmpASSELECTquestion_idFROMquestionsWHERE(questions_title,questions_scope)IN(SELECTquestions_title,questions_scopeFROMquestionsGROUPBYquestions_title,questions_scopeHAVINGCOUNT(*)>1)ANDquestion_idNOTIN(SELECTMIN(question_id)FROMquestionsGROUPBYquestions_scope,questions_titleHAVINGCOUNT(*)>1);
- DELETEFROMquestionsWHEREquestion_idIN(SELECTquestion_idFROMtmp);
- DROPTABLEtmp;
(三),代码如下:
- declare@maxinteger,@idinteger
- declarecur_rowscursorlocalforselect主字段,count(*)from表名groupby主字段havingcount(*)>;1
- opencur_rows
- fetchcur_rowsinto@id,@max
- while@@fetch_status=0
- begin
- select@max=@max-1
- setrowcount@max
- deletefrom表名where主字段=@id
- fetchcur_rowsinto@id,@max
- end
- closecur_rows
- setrowcount0
上面讲到了很多,下面我们一起来看实例删除重复记录实例.
例1,表中有主键(可唯一标识的字段),且该字段为数字类型,代码如下:
- /*表结构*/
- DROPTABLEIFEXISTS`t1`;
- CREATETABLEIFNOTEXISTS`t1`(
- `id`INT(1)NOTNULLAUTO_INCREMENT,
- `name`VARCHAR(20)NOTNULL,
- `add`VARCHAR(20)NOTNULL,
- PRIMARYKEY(`id`)
- )Engine=InnoDB;
- /*插入测试数据*/
- INSERTINTO`t1`(`name`,`add`)VALUES
- ('abc',"123"),
- ('abc',"123"),
- ('abc',"321"),
- ('abc',"123"),
- ('xzy',"123"),
- ('xzy',"456"),
- ('xzy',"456"),
- ('xzy',"456"),
- ('xzy',"789"),
- ('xzy',"987"),
- ('xzy',"789"),
- ('ijk',"147"),
- ('ijk',"147"),
- ('ijk',"852"),
- ('opq',"852"),
- ('opq',"963"),
- ('opq',"741"),
- ('tpk',"741"),
- ('tpk',"963"),
- ('tpk',"963"),
- ('wer',"546"),
- ('wer',"546"),
- ('once',"546");
- SELECT*FROM`t1`;
- +----+------+-----+
- |id|name|add|
- +----+------+-----+
- |1|abc|123|
- |2|abc|123|
- |3|abc|321|
- |4|abc|123|
- |5|xzy|123|
- |6|xzy|456|
- |7|xzy|456|
- |8|xzy|456|
- |9|xzy|789|
- |10|xzy|987|
- |11|xzy|789|
- |12|ijk|147|
- |13|ijk|147|
- |14|ijk|852|
- |15|opq|852|
- |16|opq|963|
- |17|opq|741|
- |18|tpk|741|
- |19|tpk|963|
- |20|tpk|963|
- |21|wer|546|
- |22|wer|546|
- |23|once|546|
- +----+------+-----+
- rowsinset(0.00sec)
查找id最小的重复数据(只查找id字段),代码如下:
- /*查找id最小的重复数据(只查找id字段)*/
- SELECTDISTINCTMIN(`id`)AS`id`
- FROM`t1`
- GROUPBY`name`,`add`
- HAVINGCOUNT(1)>1;
- +------+
- |id|
- +------+
- |1|
- |12|
- |19|
- |21|
- |6|
- |9|
- +------+
- rowsinset(0.00sec)
查找所有重复数据,代码如下:
- /*查找所有重复数据*/
- SELECT`t1`.*
- FROM`t1`,(
- SELECT`name`,`add`
- FROM`t1`
- GROUPBY`name`,`add`
- HAVINGCOUNT(1)>1
- )AS`t2`
- WHERE`t1`.`name`=`t2`.`name`
- AND`t1`.`add`=`t2`.`add`;
- +----+------+-----+
- |id|name|add|
- +----+------+-----+
- |1|abc|123|
- |2|abc|123|
- |4|abc|123|
- |6|xzy|456|
- |7|xzy|456|
- |8|xzy|456|
- |9|xzy|789|
- |11|xzy|789|
- |12|ijk|147|
- |13|ijk|147|
- |19|tpk|963|
- |20|tpk|963|
- |21|wer|546|
- |22|wer|546|
- +----+------+-----+
- rowsinset(0.00sec)
查找除id最小的数据外的重复数据,代码如下:
- /*查找除id最小的数据外的重复数据*/
- SELECT`t1`.*
- FROM`t1`,(
- SELECTDISTINCTMIN(`id`)AS`id`,`name`,`add`
- FROM`t1`
- GROUPBY`name`,`add`
- HAVINGCOUNT(1)>1
- )AS`t2`
- WHERE`t1`.`name`=`t2`.`name`
- AND`t1`.`add`=`t2`.`add`
- AND`t1`.`id`<>`t2`.`id`;
- +----+------+-----+
- |id|name|add|
- +----+------+-----+
- |2|abc|123|
- |4|abc|123|
- |7|xzy|456|
- |8|xzy|456|
- |11|xzy|789|
- |13|ijk|147|
- |20|tpk|963|
- |22|wer|546|
- +----+------+-----+
- rowsinset(0.00sec)
例2,表中没有主键(可唯一标识的字段),或者主键并非数字类型(也可以删除重复数据,但效率上肯定比较慢.
例2测试数据,代码如下:
- /*表结构*/
- DROPTABLEIFEXISTS`noid`;
- CREATETABLEIFNOTEXISTS`noid`(
- `pk`VARCHAR(20)NOTNULLCOMMENT'字符串主键',
- `name`VARCHAR(20)NOTNULL,
- `add`VARCHAR(20)NOTNULL,
- PRIMARYKEY(`pk`)
- )Engine=InnoDB;
- /*测试数据,与上例一样的测试数据,只是主键变为字符串形式*/
- INSERTINTO`noid`(`pk`,`name`,`add`)VALUES
- ('a','abc',"123"),
- ('b','abc',"123"),
- ('c','abc',"321"),
- ('d','abc',"123"),
- ('e','xzy',"123"),
- ('f','xzy',"456"),
- ('g','xzy',"456"),
- ('h','xzy',"456"),
- ('i','xzy',"789"),
- ('j','xzy',"987"),
- ('k','xzy',"789"),
- ('l','ijk',"147"),
- ('m','ijk',"147"),
- ('n','ijk',"852"),
- ('o','opq',"852"),
- ('p','opq',"963"),
- ('q','opq',"741"),
- ('r','tpk',"741"),
- ('s','tpk',"963"),
- ('t','tpk',"963"),
- ('u','wer',"546"),
- ('v','wer',"546"),
- ('w','once',"546");
- SELECT*FROM`noid`;
- +----+------+-----+
- |pk|name|add|
- +----+------+-----+
- |a|abc|123|
- |b|abc|123|
- |c|abc|321|
- |d|abc|123|
- |e|xzy|123|
- |f|xzy|456|
- |g|xzy|456|
- |h|xzy|456|
- |i|xzy|789|
- |j|xzy|987|
- |k|xzy|789|
- |l|ijk|147|
- |m|ijk|147|
- |n|ijk|852|
- |o|opq|852|
- |p|opq|963|
- |q|opq|741|
- |r|tpk|741|
- |s|tpk|963|
- |t|tpk|963|
- |u|wer|546|
- |v|wer|546|
- |w|once|546|
- +----+------+-----+
- rowsinset(0.00sec)
为表添加自增长的id字段,代码如下:
- /*为表添加自增长的id字段*/
- ALTERTABLE`noid`ADD`id`INT(1)NOTNULLAUTO_INCREMENT,ADDINDEX`id`(`id`);
- QueryOK,23rowsaffected(0.16sec)
- Records:23Duplicates:0Warnings:0
- SELECT*FROM`noid`;
- +----+------+-----+----+
- |pk|name|add|id|
- +----+------+-----+----+
- |a|abc|123|1|
- |b|abc|123|2|
- |c|abc|321|3|
- |d|abc|123|4|
- |e|xzy|123|5|
- |f|xzy|456|6|
- |g|xzy|456|7|
- |h|xzy|456|8|
- |i|xzy|789|9|
- |j|xzy|987|10|
- |k|xzy|789|11|
- |l|ijk|147|12|
- |m|ijk|147|13|
- |n|ijk|852|14|
- |o|opq|852|15|
- |p|opq|963|16|
- |q|opq|741|17|
- |r|tpk|741|18|
- |s|tpk|963|19|
- |t|tpk|963|20|
- |u|wer|546|21|
- |v|wer|546|22|
- |w|once|546|23|
- +----+------+-----+----+
- rowsinset(0.00sec)
MySQL中必须是有索引的字段才可以使用AUTO_INCREMENT,删除重复数据与上例一样,记得删除完数据把id字段也删除了,删除重复数据,只保留一条数据,代码如下:
- /*删除重复数据,只保留一条数据*/
- DELETEFROM`noid`
- USING`noid`,(
- SELECTDISTINCTMIN(`id`)AS`id`,`name`,`add`
- FROM`noid`
- GROUPBY`name`,`add`
- HAVINGCOUNT(1)>1
- )AS`t2`
- WHERE`noid`.`name`=`t2`.`name`
- AND`noid`.`add`=`t2`.`add`
- AND`noid`.`id`<>`t2`.`id`;
- QueryOK,8rowsaffected(0.05sec)
- /*删除id字段*/
- ALTERTABLE`noid`DROP`id`;
- QueryOK,15rowsaffected(0.16sec)
- Records:15Duplicates:0Warnings:0
- SELECT*FROM`noid`;
- +----+------+-----+
- |pk|name|add|
- +----+------+-----+
- |a|abc|123|
- |c|abc|321|
- |e|xzy|123|
- |f|xzy|456|
- |i|xzy|789|
- |j|xzy|987|
- |l|ijk|147|
- |n|ijk|852|
- |o|opq|852|
- |p|opq|963|
- |q|opq|741|
- |r|tpk|741|
- |s|tpk|963|
- |u|wer|546|//phpfensi.com
- |w|once|546|
- +----+------+-----+
- rowsinset(0.00sec)
热门评论