Study Notes

Optimizing MySQL ORDER BY RAND()

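Today let's look at a common MySQL idiom, ORDER BY RAND(). A frequent requirement is to pull a few random rows from the database to show in a sidebar or somewhere else on the page, and the usual way to do that is:

SELECT * FROM tablename ORDER BY RAND() LIMIT 1;

As the amount of data grows, though, this statement gets slower and slower.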

For example, I have a table named news:

mysql> select count(*) from news;
+----------+
| count(*) |
+----------+
|    16338 |
+----------+
1 row in set (0.00 sec)

Running the query

SELECT * FROM `news` ORDER BY RAND() LIMIT 1;

gives: 1 row in set (13.97 sec). Roughly 14 seconds per query is unacceptable for a website.
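One way to see where the time goes is to ask MySQL for the execution plan. The sketch below assumes the same news table; the exact plan output depends on the MySQL version and on the table's indexes, so treat the comment as the typical case rather than a guaranteed result.

```sql
-- Show how MySQL plans the slow query instead of running it.
EXPLAIN SELECT * FROM `news` ORDER BY RAND() LIMIT 1;

-- For this kind of query the Extra column typically reads
-- "Using temporary; Using filesort": RAND() is evaluated for every row,
-- the rows go into a temporary table, and the whole set is sorted
-- just to keep a single row.
```

Because of SELECT *, every column of every row is carried through that sort, which is likely part of why wide rows make the query so slow.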

A quick search on Baidu or Google turns up alternatives such as:

SELECT *
FROM `news` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `news`)-(SELECT MIN(id) FROM `news`))+(SELECT MIN(id) FROM `news`)) AS id) AS t2
WHERE t1.id >= t2.id
ORDER BY t1.id LIMIT 1;

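Result: 1 row in set (0.02 sec). The difference is dramatic. It is tempting to stop here, but this query only draws a random row from the whole table, with no filter condition. Some will say that is easy, just add the condition to the WHERE clause, for example: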

SELECT *
FROM `news` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `news`)-(SELECT MIN(id) FROM `news`))+(SELECT MIN(id) FROM `news`)) AS id) AS t2
WHERE t1.id >= t2.id AND category_id = 2
ORDER BY t1.id LIMIT 1;

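This is actually a trap. If the filter is added only to the outer WHERE clause, the query may not return enough rows when several rows are requested. The span between the maximum and minimum id is not the number of rows that satisfy the condition, so the random t2.id can land high in the id range, leaving few matching rows at or above it. Adding the same condition to the SELECT MIN and SELECT MAX subqueries does not help either, for example: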

SELECT *
FROM `news` AS t1 JOIN (SELECT ROUND(RAND() * ((SELECT MAX(id) FROM `news` WHERE category_id = 2)-(SELECT MIN(id) FROM `news` WHERE category_id = 2))+(SELECT MIN(id) FROM `news` WHERE category_id = 2)) AS id) AS t2
WHERE t1.id >= t2.id AND category_id = 2
ORDER BY t1.id LIMIT 1;

Execution time: 1 row in set (0.04 sec)
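The shortfall is easy to reproduce on a toy table. The sketch below uses a hypothetical demo_news table (not part of the original data) in which category 2 owns ids 1, 2, 3 and 100, and it fixes the random threshold at 50 instead of calling RAND() so that the outcome is repeatable.

```sql
-- Hypothetical demo table: category 2 has a large gap in its ids.
CREATE TABLE demo_news (
  id INT PRIMARY KEY,
  category_id INT NOT NULL
);
INSERT INTO demo_news (id, category_id)
VALUES (1, 2), (2, 2), (3, 2), (100, 2);

-- MIN(id) = 1 and MAX(id) = 100, so the random threshold ranges over 1..100.
-- It is fixed at 50 here (an assumption for the demo) in place of RAND().
SET @threshold := 50;

-- Ask for 3 rows; only id = 100 is at or above the threshold,
-- so a single row comes back.
SELECT *
FROM demo_news
WHERE id >= @threshold AND category_id = 2
ORDER BY id
LIMIT 3;
```

Any threshold above 3 returns that same single row, and such thresholds cover almost the whole 1..100 range, so on this data the query nearly always comes up short and nearly always returns id 100.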

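When the LIMIT value is large, it is easy to come up short, so a different approach is needed. Since the difference between the maximum and minimum id is not the number of matching rows, can we use that row count directly in place of the difference? Yes, and doing so is more likely to return enough suitable rows. The SQL is:
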
SELECT *
FROM `news` AS t1 JOIN (SELECT ROUND(RAND() * (SELECT COUNT(*) FROM `news` WHERE category_id = 2)+(SELECT MIN(id) FROM `news` WHERE category_id = 2)) AS id) AS t2
WHERE t1.id >= t2.id AND category_id = 2
ORDER BY t1.id LIMIT 1;

Execution time: 1 row in set (0.02 sec)

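As this shows, the new method is fast and far more likely to return the requested number of rows; try it yourself if you are not convinced. One caveat: the random id now falls only between MIN(id) and MIN(id) + COUNT(*), so when the matching ids are spread over a much wider range, rows with higher ids are rarely or never picked.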
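If an even spread over the matching rows matters more than raw speed, a different technique is to pick a random row position rather than a random id. The sketch below is not the method described above; it assumes the same news table and the category_id = 2 filter, and the names @cnt, @off and pick_one are made up for illustration. It uses a prepared statement because MySQL does not accept an arbitrary expression in LIMIT or OFFSET.

```sql
-- Count the rows that match the filter.
SET @cnt := (SELECT COUNT(*) FROM `news` WHERE category_id = 2);

-- Random position in 0 .. @cnt - 1; RAND() returns a value in [0, 1).
-- The CAST keeps the variable an integer, which the placeholder below needs.
SET @off := CAST(FLOOR(RAND() * @cnt) AS UNSIGNED);

-- LIMIT and OFFSET accept ? placeholders inside a prepared statement.
PREPARE pick_one FROM
  'SELECT * FROM `news` WHERE category_id = 2 ORDER BY id LIMIT 1 OFFSET ?';
EXECUTE pick_one USING @off;
DEALLOCATE PREPARE pick_one;
```

Every matching row is equally likely here, whatever gaps exist in the ids. The trade-off is that MySQL has to step over @off rows before returning one, so on very large result sets this is likely slower than the id-threshold queries.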

 

 

 

 
