在sql中我们经常会碰到有重复的一些数据,下面我来介绍在mysql中删除重复记录的多种方法,有需要的朋友可参考参考.
删除重复记录方法一:
1.新建一个临时表,代码如下:
create table tmp as select * from youtable group by name(name为不希望有重复的列)
2.删除原来的表,代码如下:drop table youtable
3.重命名表,代码如下:
alter table tmp rename youtable
但是这个方法有个问题,由临时表转变过来的最终表,其表结构会和原来的不一致,需要手工更改。这个问题,待解决。
删除重复记录方法二:
1.新建一个临时表,代码如下:
CREATE TABLE tmp AS SELECT * FROM youtable GROUP BY name(name为不希望有重复的列)
2.清空原来的表,代码如下:
TRUNCATE TABLE youtable
3.把临时表插入到youtable,代码如下:
INSERT INTO tablename SELECT * FROM temp
4.删除临时表,代码如下:
DROP TABLE temp
删除重复记录方法三:代码如下:
delete table where ID not in(select min(ID) from table group by name(name:重复的字段))
删除重复记录方法四:
具体实现如下:
- TableCreateTable
- --------------------------------------------------------------------
- users_groupsCREATETABLE`users_groups`(
- `id`int(10)unsignedNOTNULLAUTO_INCREMENT,
- `uid`int(11)NOTNULL,
- `gid`int(11)NOTNULL,
- PRIMARYKEY(`id`)
- )ENGINE=InnoDBAUTO_INCREMENT=15DEFAULTCHARSET=utf8
users_groups.txt内容:
- 1,11,502
- 2,107,502
- 3,100,503
- 4,110,501
- 5,112,501
- 6,104,502
- 7,100,502
- 8,100,501
- 9,102,501
- 10,104,502
- 11,100,502
- 12,100,501
- 13,102,501
- 14,110,501
- mysql>loaddatainfile'c:\users_groups.txt'intotableusers_groupsfields
- terminatedby','linesterminatedby'n';
- QueryOK,14rowsaffected(0.05sec)
- Records:14Deleted:0Skipped:0Warnings:0
- mysql>select*fromusers_groups;
- queryresult(14records)
- iduidgid
- 111502
- 2107502
- 3100503
- 4110501
- 5112501
- 6104502
- 7100502
- 8100501
- 9102501
- 10104502
- 11100502
- 12100501
- 13102501
- 14110501
- 14rowsinset(0.00sec)
根据一位兄弟的建议修改,代码如下:
- mysql>createtemporarytabletmp_wrapselect*fromusers_groupsgroupbyuidhavingcount(1)>=1;
- QueryOK,7rowsaffected(0.11sec)
- Records:7Duplicates:0Warnings:0
- mysql>truncatetableusers_groups;
- QueryOK,14rowsaffected(0.03sec)
- mysql>insertintousers_groupsselect*fromtmp_wrap;
- QueryOK,7rowsaffected(0.03sec)
- Records:7Duplicates:0Warnings:0
- mysql>select*fromusers_groups;
- --phpfensi.com
- queryresult(7records)
- iduidgid
- 111502
- 2107502
- 3100503
- 4110501
- 5112501
- 6104502
- 9102501
- mysql>droptabletmp_wrap;
- QueryOK,0rowsaffected(0.05sec)
2、还有一个很精简的办法.
查找重复的,并且除掉最小的那个,代码如下:
- deleteusers_groupsasafromusers_groupsasa,
- (
- select*,min(id)fromusers_groupsgroupbyuidhavingcount(1)>1
- )asb
- wherea.uid=b.uidanda.id>b.id;
- (7row(s)affected)
- (0mstaken)
- queryresult(7records)
- iduidgid
- 111502
- 2107502
- 3100503
- 4110501
- 5112501
- 6104502
- 9102501
3、现在来看一下这两个办法的效率,运行一下以下SQL 语句,代码如下:
- createindexf_uidonusers_groups(uid);
- explainselect*fromusers_groupsgroupbyuidhavingcount(1)>1unionall
- select*fromusers_groupsgroupbyuidhavingcount(1)=1;
- explainselect*fromusers_groupsasa,
- (
- select*,min(id)fromusers_groupsgroupbyuidhavingcount(1)>1
- )asb
- wherea.uid=b.uidanda.id>b.id;
- queryresult(3records)
- idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
- 1PRIMARYusers_groupsindex(NULL)f_uid4(NULL)14
- 2UNIONusers_groupsindex(NULL)f_uid4(NULL)14
- (NULL)UNIONRESULT<union1,2>ALL(NULL)(NULL)(NULL)(NULL)(NULL)
- queryresult(3records)
- idselect_typetabletypepossible_keyskeykey_lenrefrowsExtra
- 1PRIMARY<derived2>ALL(NULL)(NULL)(NULL)(NULL)4
- 1PRIMARYarefPRIMARY,f_uidf_uid4b.uid1Usingwhere
- 2DERIVEDusers_groupsindex(NULL)f_uid4(NULL)14
很明显的第二个比第一个扫描的函数要少,当没有创建表或创建索引权限的时候,创建一个新表,然后将原表中不重复的数据插入新表,代码如下:
- mysql>createtabledemo_newasselect*fromdemogroupbysite;
- QueryOK,3rowsaffected(0.19sec)
- Records:3Duplicates:0Warnings:0
- mysql>showtables;
- +----------------+
- |Tables_in_test|
- +----------------+
- |demo|
- |demo_new|
- +----------------+
- 2rowsinset(0.00sec)
- mysql>select*fromdemoorderbyid;
- +----+------------------------+
- |id|site|
- +----+------------------------+
- |1|http://www.phpfensi.com |
- |2|http://phpfensi.com |
- |3|http://www.phpfensi.com |
- |4|http://www.phpfensi.com |
- |5|http://www.phpfensi.com |
- +----+------------------------+
- 5rowsinset(0.00sec)
- mysql>select*fromdemo_neworderbyid;
- +----+------------------------+
- |id|site|
- +----+------------------------+
- |1|http://www.phpfensi.com |
- |2|http://phpfensi.com |
- |3|http://www.phpfensi.com |
- +----+------------------------+
- 3rowsinset(0.00sec)
然后将原表备份,将新表重命名为当前表,代码如下:
- mysql>renametabledemotodemo_old,demo_newtodemo;
- QueryOK,0rowsaffected(0.04sec)
- mysql>showtables;
- +----------------+
- |Tables_in_test|
- +----------------+
- |demo|
- |demo_old|
- +----------------+
- 2rowsinset(0.00sec)
- mysql>select*fromdemoorderbyid;
- +----+------------------------+
- |id|site|
- +----+------------------------+
- |1|http://www.phpfensi.com |
- |2|http://phpfensi.com |
- |3|http://www.phpfensi.com |
- +----+------------------------+
- 3rowsinset(0.00sec)