在mysql使用时到超大千万级数据量时我们大多会考虑到分表查询,分库查询,下面我们来介绍一下关于分表查询时的优化.
我们的项目中有好多不等于的情况,今天写这篇文章简单的分析一下怎么个优化法,这里的分表逻辑是根据t_group表的user_name组的个数来分的.
因为这种情况单独user_name字段上的索引就属于烂索引,起不了啥名明显的效果.
1、试验PROCEDURE,代码如下:
- DELIMITER$$
- DROPPROCEDURE`t_girl`.`sp_split_table`$$
- CREATEPROCEDURE`t_girl`.`sp_split_table`()
- BEGIN
- declaredoneintdefault0;
- declarev_user_namevarchar(20)default'';
- declarev_table_namevarchar(64)default'';
- --Getallusers'name.
- declarecur1cursorforselectuser_namefromt_groupgroupbyuser_name;
- --Dealwitherrororwarnings.
- declarecontinuehandlerfor1329setdone=1;
- --Opencursor.
- opencur1;
- whiledone<>1
- do
- fetchcur1intov_user_name;
- ifnotdonethen
- --Gettablename.
- setv_table_name=concat('t_group_',v_user_name);
- --Createnewextratable.
- set@stmt=concat('createtable',v_table_name,'liket_group');
- prepares1from@stmt;
- executes1;
- dropprepares1;
- --Loaddataintoit.
- set@stmt=concat('insertinto',v_table_name,'select*fromt_groupwhereuser_name=''',v_user_name,'''');
- prepares1from@stmt;
- executes1;
- dropprepares1;
- endif;
- endwhile;
- --Closecursor.
- closecur1;
- --Freevariablefrommemory.
- set@stmt=NULL;
- END$$
- DELIMITER;
2、试验表,我们用一个有一千万条记录的表来做测试,代码如下:
- mysql>selectcount(*)fromt_group;
- +----------+
- |count(*)|
- +----------+
- |10388608|
- +----------+
- 1rowinset(0.00sec)
表结构,代码如下:
- mysql>desct_group;
- +-------------+------------------+------+-----+-------------------+----------------+
- |Field|Type|Null|Key|Default|Extra|
- +-------------+------------------+------+-----+-------------------+----------------+
- |id|int(10)unsigned|NO|PRI|NULL|auto_increment|
- |money|decimal(10,2)|NO||||
- |user_name|varchar(20)|NO|MUL|||
- |create_time|timestamp|NO||CURRENT_TIMESTAMP||
- +-------------+------------------+------+-----+-------------------+----------------+
- 4rowsinset(0.00sec)
索引情况.
- mysql>showindexfromt_group;
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- |Table|Non_unique|Key_name|Seq_in_index|Column_name|Collation|Cardinality|Sub_part|Packed|Null|Index_type|Comment|
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- |t_group|0|PRIMARY|1|id|A|10388608|NULL|NULL||BTREE||
- |t_group|1|idx_user_name|1|user_name|A|8|NULL|NULL||BTREE||
- |t_group|1|idx_combination1|1|user_name|A|8|NULL|NULL||BTREE||
- |t_group|1|idx_combination1|2|money|A|3776|NULL|NULL||BTREE||
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- 4rowsinset(0.00sec)--phpfensi.com
PS:idx_combination1 这个索引是必须的,因为要对user_name来GROUP BY,此时属于松散索引扫描,当然完了后你可以干掉她.
idx_user_name 这个索引是为了加快单独执行constant这种类型的查询.
我们要根据用户名来分表,代码如下:
- mysql>selectuser_namefromt_groupwhere1groupbyuser_name;
- +-----------+
- |user_name|
- +-----------+
- |david|
- |leo|
- |livia|
- |lucy|
- |sarah|
- |simon|
- |sony|
- |sunny|
- +-----------+
- 8rowsinset(0.00sec)
所以结果表应该是这样的,代码如下:
- mysql>showtableslike't_group_%';
- +------------------------------+
- |Tables_in_t_girl(t_group_%)|
- +------------------------------+
- |t_group_david|
- |t_group_leo|
- |t_group_livia|
- |t_group_lucy|
- |t_group_sarah|
- |t_group_simon|
- |t_group_sony|
- |t_group_sunny|
- +------------------------------+
- 8rowsinset(0.00sec)
3、对比结果,代码如下:
- mysql>selectcount(*)fromt_groupwhereuser_name='david';
- +----------+
- |count(*)|
- +----------+
- |1298576|
- +----------+
- 1rowinset(1.71sec)
执行了将近2秒.
- mysql>selectcount(*)fromt_group_david;
- +----------+
- |count(*)|
- +----------+
- |1298576|
- +----------+
- 1rowinset(0.00sec)
几乎是瞬间的.
- mysql>selectcount(*)fromt_groupwhereuser_name<>'david';
- +----------+
- |count(*)|
- +----------+
- |9090032|
- +----------+
- 1rowinset(9.26sec)
执行了将近10秒,可以想象,这个是实际的项目中是不能忍受的.
- mysql>select(selectcount(*)fromt_group)-(selectcount(*)fromt_group_david)astotal;
- +---------+
- |total|
- +---------+
- |9090032|
- +---------+
- 1rowinset(0.00sec)
几乎是瞬间的.
我们来看看聚集函数,对于原表的操作,代码如下:
- mysql>selectmin(money),max(money)fromt_groupwhereuser_name='david';
- +------------+------------+
- |min(money)|max(money)|
- +------------+------------+
- |-6.41|500.59|
- +------------+------------+
- 1rowinset(0.00sec)
最小,最大值都是FULL INDEX SCAN,所以是瞬间的.
- mysql>selectsum(money),avg(money)fromt_groupwhereuser_name='david';
- +--------------+------------+
- |sum(money)|avg(money)|
- +--------------+------------+
- |319992383.84|246.417910|
- +--------------+------------+
- 1rowinset(2.15sec)
其他聚集函数的结果就不是FULL INDEX SCAN了,耗时2.15秒,对于小表的操作,代码如下:
- mysql>selectmin(money),max(money)fromt_group_david;
- +------------+------------+
- |min(money)|max(money)|
- +------------+------------+
- |-6.41|500.59|
- +------------+------------+
- 1rowinset(1.50sec)
最大最小值完全是FULL TABLE SCAN,耗时1.50秒,不划算,以此看来,代码如下:
- mysql>selectsum(money),avg(money)fromt_group_david;
- +--------------+------------+
- |sum(money)|avg(money)|
- +--------------+------------+
- |319992383.84|246.417910|
- +--------------+------------+
- 1rowinset(1.68sec)
取得这两个结果也是花了快2秒,快了一点,我们来看看这个小表的结构,代码如下:
- mysql>desct_group_david;
- +-------------+------------------+------+-----+-------------------+----------------+
- |Field|Type|Null|Key|Default|Extra|
- +-------------+------------------+------+-----+-------------------+----------------+
- |id|int(10)unsigned|NO|PRI|NULL|auto_increment|
- |money|decimal(10,2)|NO||||
- |user_name|varchar(20)|NO|MUL|||
- |create_time|timestamp|NO||CURRENT_TIMESTAMP||
- +-------------+------------------+------+-----+-------------------+----------------+
- 4rowsinset(0.00sec)
明显的user_name属性是多余的,那么就干掉它,代码如下:
- mysql>altertablet_group_daviddropuser_name;
- QueryOK,1298576rowsaffected(7.58sec)
- Records:1298576Duplicates:0Warnings:0
现在来重新对小表运行查询,代码如下:
- mysql>selectmin(money),max(money)fromt_group_david;
- +------------+------------+
- |min(money)|max(money)|
- +------------+------------+
- |-6.41|500.59|
- +------------+------------+
- 1rowinset(0.00sec)
此时是瞬间的.
- mysql>selectsum(money),avg(money)fromt_group_david;
- +--------------+------------+
- |sum(money)|avg(money)|
- +--------------+------------+
- |319992383.84|246.417910|
- +--------------+------------+
- 1rowinset(0.94sec)
这次算是控制在一秒以内了.
mysql> Aborted
小总结一下:分出的小表的属性尽量越少越好,大胆的去干吧.