Ubuntu copying directory strange slowness

Part of my app deployment process involves making a copy of a directory that's about 50MB or so. On two of my nodes, this occurs in about a second. On a third machine, it takes about 55 seconds. Here's some info.

The disk is 20GB, ext3, running on Ubuntu 10.04. I have a folder called

/srv/www/server.com/html/shared

The total size of the /shared directory is around 8GB or so, most of it in

/srv/www/server.com/html/shared/data

I mention the data directory because it's 8GB on this server but isn't present on the servers that don't have this problem.

The 50MB folder I'm trying to copy is also in shared:

/srv/www/server.com/html/shared/cached-copy

The following command takes about 55 sec:

cp -RPp /srv/www/server.com/html/shared/cached-copy ~/testing1

That's the exact same command that finishes in a second on other servers; the only real difference is that the other servers don't have the 8GB data directory.

But then, oddly, the following will complete in a second or so:

cp -RPp ~/testing1 ~/testing2

This is reproducible, in any order, so I don't think it's related to file caching.
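For anyone wanting to repeat this comparison, here is a minimal sketch of a timing helper. The function name `time_copy` is made up for illustration, and the commented-out cache-dropping line assumes root access and a kernel that exposes /proc/sys/vm/drop_caches.

```shell
# Time a recursive copy and print the elapsed whole seconds.
# time_copy is a hypothetical helper name, not part of any tool.
time_copy() (
    src=$1
    dst=$2
    start=$(date +%s)
    cp -RPp "$src" "$dst" || exit 1
    end=$(date +%s)
    echo $((end - start))
)

# Optional, as root, to start each run from a cold page cache:
# sync && echo 3 > /proc/sys/vm/drop_caches
# time_copy /srv/www/server.com/html/shared/cached-copy ~/testing1
# time_copy ~/testing1 ~/testing2
```

Dropping the page cache before each run should make the two timings comparable regardless of the order they are run in.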

Any ideas what might be happening, or how I could debug this?

Thanks,

Greg

8 Replies

Possibly /srv and /home are on different partitions on the slow system? Also, there might be different filesystem options. What does mount tell you for each system?
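A quick way to check the partition question without reading through the mount output is to compare device numbers. This is only a sketch: `same_filesystem` is a hypothetical helper name, and it assumes GNU coreutils `stat`, which Ubuntu ships.

```shell
# Report whether two paths live on the same filesystem by comparing
# the device numbers that GNU stat reports for them.
# same_filesystem is a hypothetical helper name.
same_filesystem() (
    dev_a=$(stat -c '%d' "$1") || exit 2
    dev_b=$(stat -c '%d' "$2") || exit 2
    if [ "$dev_a" = "$dev_b" ]; then
        echo "same"
    else
        echo "different"
    fi
)

# Example, using the directories from the question:
# same_filesystem /srv "$HOME"
```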

@Vance:

Possibly /srv and /home are on different partitions on the slow system? Also, there might be different filesystem options. What does mount tell you for each system?

I think it's all one partition.

On slow system:

~$ mount
/dev/xvda on / type ext3 (rw,noatime,errors=remount-ro)
proc on /proc type proc (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
none on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
devtmpfs on /dev type devtmpfs (rw,mode=0755)
none on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
none on /dev/shm type tmpfs (rw,nosuid,nodev)
none on /var/run type tmpfs (rw,nosuid,mode=0755)
none on /var/lock type tmpfs (rw,noexec,nosuid,nodev)
none on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)

The output from 'mount' is identical on the other systems…

Hmm, if you want to go digging, it looks like you can enable debugging if your kernel has CONFIG_JBD_DEBUG configured. I don't know if the Linode kernels do.
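One way to find out is to look the option (spelled CONFIG_JBD_DEBUG in the kernel config) up in the config file for the running kernel. A rough sketch follows; `kernel_option` is a hypothetical helper name, and whether a config file exists at /boot/config-$(uname -r) on a Linode kernel is an assumption.

```shell
# Print the value of a kernel config option from a config file.
# Prints "not set" when no "OPTION=value" line is found, which also
# covers options that are commented out or a file that cannot be read.
# kernel_option is a hypothetical helper name.
kernel_option() (
    option=$1
    config=$2
    line=$(grep "^${option}=" "$config") || { echo "not set"; exit 0; }
    echo "${line#*=}"
)

# Example, assuming the distribution installs a config file for the kernel:
# kernel_option CONFIG_JBD_DEBUG "/boot/config-$(uname -r)"
```

Some kernels expose their config at /proc/config.gz instead; in that case it would need to be decompressed to a file first.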

But re-reading your original post, 55 seconds to copy 50 MB is a really long time. It might be worth putting in a support ticket; this could be a hardware problem.

Do you folks think it could be something like… The directory used to have an obscene number of files in it, so its inode is really big, and it takes an excessively long time to work with? I don't think so -- it shouldn't take that long, and a modern ext3 fs should have dir_index enabled, which I think would handle it reasonably well -- but I'm not sure, and nobody else has any ideas…

Or perhaps one of the parent directories?

ls -ld /srv/www/server.com/html/shared /srv/www/server.com/html/shared/cached-copy would show the size of the directories themselves. Edit: but I may be totally out to lunch anyway.
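The same number can be read with `stat`, which prints just the size of each directory's own entry table rather than the total size of the files inside it. This is a sketch assuming GNU coreutils `stat`; `dir_entry_size` is a hypothetical helper name.

```shell
# Print the size in bytes of each directory's own entry table,
# followed by the directory name, using GNU stat.
# dir_entry_size is a hypothetical helper name.
dir_entry_size() {
    stat -c '%s %n' "$@"
}

# Example, using the directories from the question:
# dir_entry_size /srv/www/server.com/html/shared \
#     /srv/www/server.com/html/shared/cached-copy
```

A directory that reports a size far above a few kilobytes has held many entries at some point, which is the situation being guessed at here.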

@Vance:

But re-reading your original post, 55 seconds to copy 50 MB is a really long time. It might be worth putting in a support ticket; this could be a hardware problem.

That's what I originally thought…but the Linode folks say all is well on the host.

@mnordhoff:

Do you folks think it could be something like… The directory used to have an obscene number of files in it, so its inode is really big, and it takes an excessively long time to work with? I don't think so -- it shouldn't take that long, and a modern ext3 fs should have dir_index enabled, which I think would handle it reasonably well -- but I'm not sure, and nobody else has any ideas…

Well, there was certainly some activity with a lot of files. For example, the data directory I was talking about in the OP held about 150K files in one directory; I later split this into 27 separate directories (e.g. data/A, data/B, etc.) and moved the files into their appropriate places.

I'm experimenting with a few things now - restoring a new node from this node's backup, for example, to see if it suffers from the same problem.

I'm secretly hoping rebooting this node might fix it…waiting for a slow time to take a few-minute maintenance window.

Not at all sure if this offers any clues, but I restored a backup of the slow node onto a new node. The new node can do this directory copy in a few seconds. One difference I see is on the old (slow) node:

~$ df -i
Filesystem      Inodes   IUsed    IFree IUse% Mounted on
/dev/xvda      4580736  417217  4163519   10% /
devtmpfs         63626    2036    61590    4% /dev
none             63678       1    63677    1% /dev/shm
none             63678      26    63652    1% /var/run
none             63678       2    63676    1% /var/lock
none             63678       1    63677    1% /lib/init/rw

and on the new node:

~$ df -i
Filesystem      Inodes   IUsed    IFree IUse% Mounted on
/dev/xvda      1286752  408301   878451   32% /
devtmpfs         63626    2036    61590    4% /dev
none             63678       1    63677    1% /dev/shm
none             63678      20    63658    1% /var/run
none             63678       2    63676    1% /var/lock
none             63678       1    63677    1% /lib/init/rw

Note the different number of inodes. Not sure why that would make a difference, but it's something I noticed.

Next step is to reboot the slow node and see if that changes anything. If not, I might just rsync the incremental changes over to this new node and use that…

Huh. Well, I rebooted the slow node, it forced a fsck on startup, and after the restart it was still slow.

Ran a few experiments, and I ended up doing the following:

1. make a copy of shared/cached-copy directory as shared/copy1

2. rm -rf shared/cached-copy

3. mv shared/copy1 shared/cached-copy
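The three steps above can be sketched as a small shell function. `recreate_dir` and the temporary directory name are hypothetical; the function chains the steps so the original is only removed if the copy succeeded.

```shell
# Rebuild a directory by copying it, removing the original, and moving
# the copy into place. Each step runs only if the previous one succeeded.
# recreate_dir and the ".recreate.<pid>" suffix are hypothetical names.
recreate_dir() (
    target=${1%/}
    [ -d "$target" ] || exit 1
    tmp_copy="${target}.recreate.$$"
    cp -RPp "$target" "$tmp_copy" &&
        rm -rf "$target" &&
        mv "$tmp_copy" "$target"
)

# Example, run from /srv/www/server.com/html:
# recreate_dir shared/cached-copy
```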

After running a few random copies of other folders to ensure the disk cache cleared, I find that the new cached-copy directory behaves much better than the old one, copying in seconds.

Not sure why this helped, but at least at the moment it looks to be working better now. Hopefully the problem won't come back…
