thousands of files in one directory

I'm serving user-uploaded pictures from the filesystem using nginx. I'm not using a database for performance reasons. I want this to be able to scale like crazy, like into the hundreds of thousands of pictures. If I select the filesystem carefully, is there any problem with dumping all the files into the same directory?

I was thinking of creating multiple buckets to distribute the files based on hashing their id. But then, how many buckets do I need? Sub-buckets?

ReiserFS and ext3 both support tree-based directory lookups (ReiserFS with B+ trees, ext3 with its hashed-B-tree `dir_index` feature). I read that ext3 supports around 10**20 files per directory, but I couldn't find any data for ReiserFS.

Anybody have experience doing this kind of thing?

5 Replies

It may be best to avoid MurderFS until its future is a bit more certain.

You could look at key-value stores (CouchDB, etc.) and keep a quick lookup table that maps each image to its file location (create 100 dirs and randomize which dir each image goes into). Or create dirs as you go, making sure each dir only holds up to X images.
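A minimal sketch of the create-dirs-as-you-go idea (the `uploads` base directory and the 100-bucket count are just assumptions for illustration, not anything from a real setup):

```python
import os
import random

BASE = "uploads"   # hypothetical base directory
NUM_DIRS = 100     # fixed bucket count, per the suggestion above

def store_path(filename):
    """Pick one of NUM_DIRS two-digit buckets at random and return
    the path the uploaded image should be written to.  The chosen
    path would be recorded somewhere (e.g. CouchDB) so it can be
    looked up again quickly."""
    bucket = "%02d" % random.randrange(NUM_DIRS)
    dirpath = os.path.join(BASE, bucket)
    os.makedirs(dirpath, exist_ok=True)   # create dirs as you go
    return os.path.join(dirpath, filename)
```

Since placement is random, the lookup table is what maps an image back to its bucket; without it you'd have to search all 100 directories.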

Also, it might be worth looking at pastebin's source code (http://pastebin.com/pastebin.php?help=1).

@funkytastic:

> Anybody have experience doing this kind of thing?
I'd avoid extremes such as that. Even if the filesystem technically supports that many files in a single directory, various admin tools you may wish to use when working with that tree are likely to bog down, sometimes severely.

I'd certainly suggest sharding the set of files among one or more levels depending on your expected scale. If you're in control of the filenames (say assigning uuids or something), just create a few levels based on initial characters. For example, with a uuid scheme, using 2-character directories (00-ff) with 2 levels you can support a million files with an average leaf directory size of about 16, assuming even uuid distribution.

If you're only going to be in the low hundreds of thousands, a single level of directories would still average only ~400 files per leaf directory for every hundred thousand files.

If you don't have control over the filenames, you may want to hash the filename and then use characters from the hash since otherwise common naming patterns could significantly skew the tree.
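Both cases above can be sketched roughly like this (an illustration only: the 2-level, 2-character layout matches the uuid example, and `md5` is just one arbitrary choice of hash for the uncontrolled-filename case):

```python
import hashlib
import os
import uuid

def uuid_shard(base, file_id):
    """You control the name: two 2-hex-char levels (00-ff each),
    e.g. base/ab/cd/abcd....jpg, giving 256 * 256 = 65536 leaf
    directories, so a million files averages ~16 per leaf."""
    name = file_id.hex                     # 32 evenly distributed hex chars
    return os.path.join(base, name[:2], name[2:4], name + ".jpg")

def hashed_shard(base, filename):
    """You don't control the name: hash it first so common naming
    patterns (IMG_0001.jpg, IMG_0002.jpg, ...) can't skew the tree,
    then take the directory levels from the hash."""
    h = hashlib.md5(filename.encode("utf-8")).hexdigest()
    return os.path.join(base, h[:2], h[2:4], filename)

# e.g. uuid_shard("images", uuid.uuid4())
```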

– David

Linode is a good place to house your website, but for scalability of mass image hosting you'd be better served pushing your images to Amazon S3 or Rackspace Cloud Files.

It's going to be cheaper for you in the long run for raw file storage (though potentially more for actual bandwidth), and Rackspace has a CDN built in, with no extra bandwidth charge from Cloud Files to the CDN edge the way Amazon charges.

Amazon has better access controls.

This is the more scalable way to do it, and the infrastructure is already there; you don't have to reinvent it.

Thanks for pointing out rackspace cloud files. I hadn't heard of it. I just signed up and it looks good so far. This will certainly simplify things!
