PHP + MySQL - Selecting random rows effectively
Page 1 of 1
X_Dror




Posts: 4957
Location: Jerusalem, Israel
PostPosted: Wed, 29th Dec 2010 21:47    Post subject: PHP + MySQL - Selecting random rows effectively
Hi all,

I got a PHP with a local MySQL database on it.

The PHP server has a webservice that one of its function need to select about 100-200 random rows from a table. The table PK is the ID column which is also AUTO_INCREMENT.

At the moment my table is quite small (20 record) but I expect it to go to a few thousands at least.
For now I made thos simple query -

Code:
SELECT * FROM TABLE ORDER BY RAND() LIMIT 200


But I understand that this query can get really nasty performance wise when the table will get bigger.

I read this article for tips -

http://akinas.com/pages/en/blog/mysql_random_row/

but they only support selecting one row at a time, and I need 200 hundred at a time.

As far as I understand I got 2 options -

1. Finding a more efficient single query, that maybe you guys could help me find.
2. Using 2 sequental queries - First select the IDs of the existing records, randomize them on the PHP side, and select the whole table again with the random order I generated in PHP.

What would you suggest me to do?
Is it not that bad to make more calls to the DB if it's local?

Thanks!
Back to top
Spazmotic
VIP Member



Posts: 3107

PostPosted: Wed, 29th Dec 2010 21:54    Post subject:
If you're using an ID table, just use the PHP script to randomly generate the numbers, then query the results using the random result as the ID, can also run that within a loop so you can do it 200 times with minimal coding effort.

Or yes, for more efficiently, make all of the randomized calls and store them within an Array or such, then call them out within a single longer query maybe.
Back to top
Werelds
Special Little Man



Posts: 15098
Location: 0100111001001100
PostPosted: Wed, 29th Dec 2010 22:26    Post subject:
Are you handling your database properly when one of those records should be deleted? As in not *actually* deleting any rows, but rather using a flag to mark them as deleted? If not, do that first, otherwise your index will be fucked up Wink

After that, a simple COUNT() query combined with PHP's mt_rand() will do the trick. This is assuming you can ensure your db's integrity. I did some tests selecting 100 from slightly over 145000 rows this summer for work, and this proved to be faster than doing it through MySQL on a large set like that. Just make sure that you don't get a number twice, build in some assertions.

Not sure if retrieving all id's and storing those, then randoming from those would be that efficient, that'll spike resource usage on both ends if you've got a big database Smile
Back to top
Spazmotic
VIP Member



Posts: 3107

PostPosted: Wed, 29th Dec 2010 22:32    Post subject:
^-- Bit better way to describe what I meant.. Didn't know Deletions would fuck up an auto increment though, but makes sense.
Back to top
Werelds
Special Little Man



Posts: 15098
Location: 0100111001001100
PostPosted: Wed, 29th Dec 2010 22:44    Post subject:
Well not the auto increment itself, as that'll just keep counting properly. If you have 1-5 by auto inc and then delete 3, it'll still count from 6 onward, but the indexing table MySQL keeps won't be correct anymore. Not an issue with smaller databases, but when you've got the kind of database I deal with it gets noticeable over time unless you manually have it rebuild

Basically, whatever you can do in PHP, you should do in PHP. 9 out of 10 times it'll be faster than building some massive query for it. Same goes for JOINs: avoid them whenever you can, because they are one of, if not the slowest constructs you can use (because with a join it compares the ENTIRE tables, while most of the time you're only selecting part of it) Smile
Back to top
X_Dror




Posts: 4957
Location: Jerusalem, Israel
PostPosted: Sat, 1st Jan 2011 15:59    Post subject:
Thanks a lot for the answers!

Just as you suggested I added a boolean IS_DELETED column to my table in order to support the logical deletion.

However now it means that if I want to query valid rows I would need to add an additional criterion - 'IS_DELETED = 0'. Wouldn't this affect my queries? Should I index that column as well?

Besides that issue, what I need to do is to select all the valid IDs from my table like this

Code:

SELECT ID FROM TABLE
                                                                WHERE IS_DELETED = 0
                                                                 LIMIT 200


Randomize the IDs array. Re query it with all the fields I need and that's it.

Is that considered efficient enough?
Back to top
Werelds
Special Little Man



Posts: 15098
Location: 0100111001001100
PostPosted: Fri, 4th Feb 2011 10:20    Post subject:
Sorry, somehow I completely missed this last question here mate. Don't know if you still need an answer, but here goes:

X_Dror wrote:
Wouldn't this affect my queries?

Yes, this will affect your existing queries. Assuming you've set up your code like you should have though, that shouldn't be a lot of work though Smile

X_Dror wrote:
Should I index that column as well?

In one word: yes.
I don't have any link at hand right now, but creating effective indexes takes a bit of work. Basically you don't want indexes on just one field, unless there is a query which uses *only* that field - an auto increment column is typically the only field you use as the only index in a query (SELECT * FROM `table` WHERE `id` = x), but in most of your queries you're using composite indexes. Example: in the query (SELECT * FROM `table` WHERE `deleted` = 0 AND `date` < 1234567890) you use deleted and date as index; the first thing MySQL does is look for a composite index of the two together.

Just write your app and all the queries first - then go back, analyse them, and create the indexes Smile


As far as your query/code goes: yep, that'll be the fastest way Smile
Back to top
Page 1 of 1 All times are GMT + 1 Hour
NFOHump.com Forum Index - Programmers Corner
Signature/Avatar nuking: none (can be changed in your profile)  


Display posts from previous:   

Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB 2.0.8 © 2001, 2002 phpBB Group