[PHP] - Storing user submitted images... Multiple folders?
redrum87
August 10th, 2006, 05:15 AM
I need to store lots of small user-submitted images for a project I'm working on. My question is: is it a bad idea to store them in multiple folders, either for faster retrieval or just so there aren't 50 bazillion files in one folder? If I were to do this, I think I would have to store the name of each image's folder in the DB.
The user base for the site will be college students, so a pretty logical way to divide the folders would be by school. An example directory tree might be...
user_images
|
|_MIT
|
|_UCLA
|
|_UMiami
...and in the DB, I would store just the name of the folder, not the full path. There's a slight kink, though. The system I'm using (which bwh2 suggested to me) stores the md5 hash of each image to prevent two people from uploading the same image. Folders would break that a little bit: people from the same school would not be able to upload the same image, but people from two different schools would, taking up precious storage space. Is there a better way?
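One way to keep the md5 check working with per-school folders, sketched here under assumptions: run the hash lookup against all stored images rather than one folder, and reuse the existing folder/file when the hash matches. The `$index` array is a hypothetical stand-in for an images table keyed by hash, and `locateOrRegister()` is an illustrative name, not part of any existing system.

```php
<?php
// Minimal sketch: one global md5 lookup, so a duplicate is caught no matter
// which school folder the first copy was saved in.
// $index is a hypothetical stand-in for an images table keyed by md5 hash.

/**
 * Returns where the image bytes live. If identical bytes were already
 * registered (under any folder), the existing location is returned instead.
 */
function locateOrRegister(array &$index, string $bytes, string $folder): array
{
    $hash = md5($bytes); // the hash does not depend on the uploader's school

    if (isset($index[$hash])) {
        return $index[$hash]; // duplicate: reuse the existing copy
    }

    // New bytes: record the location. Writing the file itself is omitted,
    // and the .jpg extension is an assumption for the example.
    $location = ['folder' => $folder, 'file' => $hash . '.jpg'];
    $index[$hash] = $location;
    return $location;
}

$index = [];
$first = locateOrRegister($index, 'same image bytes', 'MIT');
$second = locateOrRegister($index, 'same image bytes', 'UCLA');
var_dump($first === $second); // bool(true): the UCLA upload reuses the MIT copy
```

In a real upload handler, `md5_file()` on the uploaded temp file gives the same hash without reading the whole image into a string.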
redrum87
August 11th, 2006, 11:54 PM
I really need a reply. Did anyone read this and just not have a suggestion, or did I not supply enough information?
hl
August 12th, 2006, 12:34 AM
You can check whether the hash already exists in the DB and, if it does, write the new image as a PHP file that includes the contents of the existing file, so it works as a pointer.
This probably isn't the best approach, of course; I'm a bit too tired to think right now. :P
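The same pointer idea can also live entirely in the DB instead of in a PHP file that includes another file; this swaps hl's PHP-include pointer for a row that references an existing file. A minimal sketch, assuming two hypothetical tables modeled as arrays: `$files` (one row per unique md5) and `$uploads` (one row per user upload, with a `file_id` pointer).

```php
<?php
// Sketch: duplicate uploads become "pointers" (rows referencing an existing
// file) instead of second copies. $files and $uploads are hypothetical
// stand-ins for two DB tables.

function recordUpload(array &$files, array &$uploads, int $userId, string $bytes): int
{
    $hash = md5($bytes);

    // Look for an existing file row with the same hash (map of id => md5).
    $fileId = array_search($hash, array_column($files, 'md5', 'id'), true);

    if ($fileId === false) {
        // New bytes: create a file row. The disk write is omitted, and
        // count()+1 as an ID only works because rows are never deleted here.
        $fileId = count($files) + 1;
        $files[] = ['id' => $fileId, 'md5' => $hash];
    }

    // Every upload gets its own row; duplicates just share a file_id.
    $uploads[] = ['user_id' => $userId, 'file_id' => $fileId];
    return $fileId;
}

$files = [];
$uploads = [];
recordUpload($files, $uploads, 1, 'photo bytes');
recordUpload($files, $uploads, 2, 'photo bytes'); // same bytes, second user
echo count($files), ' file(s), ', count($uploads), " upload(s)\n";
```

Each user still "owns" an upload record, while the image bytes are stored once.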
bwh2
August 14th, 2006, 02:46 PM
i personally don't see any clear benefit from having university-named subdirectories.
i don't know enough about how servers work to say with any certainty either way, but you might want to do some quick research on concurrent read/write access to directories, that is, how many users can simultaneously read from and/or write to a single directory. if there are limitations, you might need to look at the subdirectory method, or possibly a subdomain method.
the directory name being semantic is the wrench. even if you have subdirectories, you don't have many problems if you have non-semantic directories (e.g. "user_images/98327972/myhash.jpg" ). but you will have problems if a Lehigh student views an image location and sees "user_images/Lafayette/myhash.jpg". that would start riots.
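One hedged way to get non-semantic subdirectories like the `user_images/98327972/myhash.jpg` example is to derive the folder from the hash itself. In this sketch the folder is the first two hex characters of the md5, giving at most 256 folders; the prefix length and `.jpg` extension are assumptions for illustration, and `hashedPath()` is a hypothetical helper. A side effect is that identical images always map to the same path, so the duplicate check keeps working.

```php
<?php
// Sketch: derive a non-semantic subdirectory from the image's md5 hash.
// Two hex characters give at most 16 * 16 = 256 folders (an assumption;
// use more characters if you expect more images).

function hashedPath(string $bytes, string $root = 'user_images', int $prefixLen = 2): string
{
    $hash = md5($bytes);
    $dir = substr($hash, 0, $prefixLen);
    return "$root/$dir/$hash.jpg"; // .jpg extension assumed for the example
}

echo hashedPath('example bytes'), "\n";
```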
redrum87
August 14th, 2006, 03:24 PM
That would start riots indeed. I'll look into this, and if I find anything, I'll come back to this thread.
bwh2
August 14th, 2006, 03:27 PM
i'm not sure what kind of site you're working on, but you might want to think about this also... if users are uploading images of themselves and friends, odds are you won't get too many overlaps across schools anyhow. surely the large majority of duplicates would come from the same school. so if you allowed for duplicates across different schools, but not within the same school, you might be ok.
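If duplicates are allowed across schools but not within one, the uniqueness check can be keyed on the (school, hash) pair instead of the hash alone. A minimal sketch, where `$seen` is a hypothetical stand-in for a unique index on `(school_id, md5)`:

```php
<?php
// Sketch: reject duplicates within a school but allow them across schools.
// $seen stands in for a unique index on (school_id, md5); names are hypothetical.

function isDuplicateForSchool(array &$seen, int $schoolId, string $bytes): bool
{
    $key = $schoolId . ':' . md5($bytes); // composite key: school + content
    if (isset($seen[$key])) {
        return true;
    }
    $seen[$key] = true; // remember this image for this school only
    return false;
}
```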
redrum87
August 14th, 2006, 07:16 PM
and i intend to keep the kind of site i'm working on a secret until it's complete :)
with this site, duplicate images are quite possible across multiple schools. some alphanumeric string is probably better than the school name... and while typing this it just hit me: i can use the school_id instead of the school name... tada!
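A minimal sketch of building paths from the numeric `school_id`, assuming a `user_images` root; `schoolImagePath()` is a hypothetical helper. The `basename()` call is an extra precaution so a stored file name can't climb out of the folder.

```php
<?php
// Sketch: folder named by school_id rather than school name.
// The numeric ID reveals nothing about which school it is.

function schoolImagePath(int $schoolId, string $file, string $root = 'user_images'): string
{
    // basename() strips any directory parts such as "../" from the file name.
    return sprintf('%s/%d/%s', $root, $schoolId, basename($file));
}

echo schoolImagePath(42, 'abc.jpg'), "\n";
```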
think that'll work?
bwh2
August 14th, 2006, 07:26 PM
it'll work as long as you map it correctly in the db. i'm just confused as to what benefits it offers.
redrum87
August 14th, 2006, 10:14 PM
Yeah, it may not have any benefit. I'm still looking into how many users can read from and write to a single directory. I just know that on my Windows client machine, the more files there are in a directory, the longer it takes to open. I know very little about how file systems work, but I'm thinking a Linux server (which is where this site will be hosted) might handle that better. Still, my only reason for thinking a directory with lots of files will be slow to access is my own experience. Any insights into this would be helpful.
bwh2
August 14th, 2006, 10:30 PM
well, i probably wouldn't be using PHP to scan the directories. i would be using mysql to search the db, which would call up the appropriate image(s). i'm sort of saying this more based on experience and common sense than actual testing, but i feel like mysql queries would run faster than PHP scanning a directory. especially since you can do things like index columns in a mysql table, but you can't really index a directory in PHP.
my thought is also this. let's say you have 10 images on a page that all come from different schools. if you have the university level folders, you would have 10 folders with 1 image each. in looping through the mysql rows for those 10 images, you would need PHP to test whether or not each directory structure exists. so if you have 10 different folders, you're running 10 if statements. if you have one folder, you're running 1 if statement. so it's obvious which is faster. not that 10 if statements like that would kill you, but clearly you want to control your overhead.
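One way to keep that overhead off page rendering, sketched under assumptions: check for (and create) the folder only when an image is saved, and when displaying images build the path straight from the DB row without touching the filesystem. `saveUpload()` and `imageUrl()` are hypothetical helpers.

```php
<?php
// Sketch: directory checks happen once per upload, never per displayed image.

function saveUpload(string $root, string $folder, string $file, string $bytes): string
{
    $dir = $root . '/' . $folder;
    // The second is_dir() covers a race where another request created it first.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create $dir");
    }
    $path = $dir . '/' . basename($file);
    if (file_put_contents($path, $bytes) === false) {
        throw new RuntimeException("Could not write $path");
    }
    return $path;
}

// Rendering: the DB row already says where the file is, so no checks run here.
function imageUrl(array $row): string
{
    return 'images/' . $row['folder'] . '/' . $row['file'];
}
```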
my main concern though with not having subfolders is the ability to manage the data. let's say you have something like 200,000 images uploaded. it's going to be a lot more manageable to break them up into batches of a thousand or two thousand (pick your number) than it would be to have one folder with 200,000 images. so you might also want to consider some sort of auto increment directory creation. you could do something fixed width like 00000001, 00000002, 00000003.... 99999999. that would give you plenty of room for growth and still be manageable.
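The fixed-width batch folders can be derived from the image's auto-increment ID, so no extra state is needed to decide where a file goes. A minimal sketch, assuming IDs start at 1 and a batch size of 1,000 (pick your number); `batchFolder()` is a hypothetical helper.

```php
<?php
// Sketch: numbered batch folders of a fixed size, named 00000001, 00000002, ...

function batchFolder(int $imageId, int $batchSize = 1000): string
{
    if ($imageId < 1) {
        throw new InvalidArgumentException('Image IDs start at 1');
    }
    // IDs 1..1000 -> batch 1, 1001..2000 -> batch 2, and so on.
    $batch = intdiv($imageId - 1, $batchSize) + 1;
    return sprintf('%08d', $batch);
}

echo batchFolder(1), "\n";    // 00000001
echo batchFolder(1000), "\n"; // 00000001
echo batchFolder(1001), "\n"; // 00000002
```

With eight digits and 1,000 images per folder, this scheme covers just under 100 billion images.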
JoshuaJonah
August 14th, 2006, 11:15 PM
Definitely use a database with entries for the images. That way you can also store as much information about the photos as you need.
redrum87
August 14th, 2006, 11:17 PM
you hit my concern dead on. i'm very worried about having a folder with a bazillion images in it, versus a bunch of folders in batches of a few thousand or so.
i wouldn't be using PHP to index a directory. i would just have the name of the folder where the image is stored in the db. so, in the images table, there would be a file column and a folder column. in the php code, it would just say something similar to this:
<?php // $row_getImages is one row fetched from the images table (query not shown) ?>
<img src="images/<?php echo htmlspecialchars($row_getImages['folder'] . '/' . $row_getImages['file']); ?>" alt="<?php echo htmlspecialchars($row_getImages['title']); ?>" />
so it would just be all MySQL in terms of getting the paths. right?
bwh2
August 14th, 2006, 11:23 PM
right. and you could just throw the directories in a separate table and join them on directory_id.
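A minimal sketch of the separate directories table joined on `directory_id`. The SQL in the comment shows what a real app might run; the table and column names are hypothetical, and the arrays are made-up sample rows standing in for the two tables so the join logic runs on its own.

```php
<?php
// Sketch: images reference a directories table by directory_id.
// Equivalent SQL (hypothetical table/column names):
//
//   SELECT i.file, i.title, d.name AS folder
//   FROM images i
//   JOIN directories d ON d.id = i.directory_id;

$directories = [1 => '00000001', 2 => '00000002']; // id => folder name
$images = [
    ['file' => 'a1b2.jpg', 'title' => 'Quad', 'directory_id' => 1],
    ['file' => 'c3d4.jpg', 'title' => 'Library', 'directory_id' => 2],
];

// Join each image row to its directory name.
$joined = array_map(
    fn (array $img): array => [
        'folder' => $directories[$img['directory_id']],
        'file' => $img['file'],
        'title' => $img['title'],
    ],
    $images
);

foreach ($joined as $row) {
    echo 'images/', $row['folder'], '/', $row['file'], "\n";
}
```

Storing each folder name once means renaming or moving a folder touches one row instead of every image row.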
redrum87
August 14th, 2006, 11:30 PM
hm, wasn't even thinking about that, but yes, that's necessary to prevent duplicate data. alright. i think my problem is solved. :)
bwh2
August 14th, 2006, 11:32 PM
cool beans. i'm looking forward to seeing this project when it goes live.
redrum87
August 14th, 2006, 11:44 PM
i'm hoping for september 1st if all goes well, but it might be earlier or later than that. it'll most likely be in my footer. i'm only testing it at my college (UCF), and if that works out i'll scale up to all of Florida, and if that works out i'll scale up to the USA, etc... until it dominates the universe... or something...
i'm going to have a suggest box like facebook does, and if i get enough people wanting a school added, i'll add it.