UNIX Consulting and Expertise

Solaris file system says it’s full, but there’s plenty of free space?


A fairly common problem with Solaris UFS filesystems is df showing lots of free space while you can’t actually write to the filesystem. Having recently been playing with multi-terabyte filesystems, and forcing these sorts of issues for debugging, I thought I’d share some information about the tools you can use and what they can report.

As an example, let’s look at a 2TB filesystem:

# df -kh
Filesystem             size   used  avail capacity  Mounted on
/dev/dsk/c9t60060E80141189000001118900001400d0s0
                       1.9T   532G   1.4T    28%    /fatty

The first thing we can do is not only check the amount of free disk space, but also check inode usage:

df -F ufs -o i

# df -F ufs -o i
Filesystem             iused   ifree  %iused  Mounted on
/dev/dsk/c9t60060E80141189000001118900001400d0s0
                     2096192       0   100%   /fatty

If we have multi-terabyte filesystems, our number of bytes per inode (nbpi) could be set too high if we’re using lots of small files – in which case it’s very easy to run out of inodes. We can see on this filesystem that we’ve used up all our inodes. Trying to write to this filesystem will result in “No space left on device” error messages – which is always good for some head-scratching fun, as we can see that we’ve got 1.4TB of space free.
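A quick back-of-envelope sketch shows why nbpi matters so much at this scale: the inode count is roughly the filesystem size divided by nbpi. Assuming a 1MB nbpi (which, as we’ll see, is what this filesystem was built with):

```shell
# Sketch: approximate inode count = filesystem bytes / nbpi
fs_bytes=$((2 * 1024 * 1024 * 1024 * 1024))   # 2TB filesystem
nbpi=$((1024 * 1024))                          # 1MB bytes-per-inode
echo "approx inodes: $((fs_bytes / nbpi))"     # ~2 million inodes
```

That works out to 2,097,152 inodes – which lines up almost exactly with the 2,096,192 iused figure in the df output above.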

To get an idea of how inodes, block size and things have been specified we need to find out how the filesystem was built:

/usr/sbin/mkfs -m <disk_device>

I’ve wrapped the line here to make it a bit more readable, but here’s the output from querying our full multi-terabyte filesystem.

# /usr/sbin/mkfs -m /dev/dsk/c9t60060E80141189000001118900001400d0s0
mkfs -F ufs -o nsect=128,ntrack=48,bsize=8192,fragsize=8192,cgsize=143,free=1,rps=1,nbpi=1161051, \
opt=t,apc=0,gap=0,nrpos=1,maxcontig=128 /dev/dsk/c9t60060E80141189000001118900001400d0s0 4110401456

This shows the command line that would recreate the filesystem, so we can see what parameters were specified when it was built.

Things we care about here are:

  • fragsize – the smallest amount of disk space that can be allocated to a file. If we have loads of files smaller than 8KB, then this should be smaller than 8KB.
  • nbpi – the number of bytes per inode
  • opt – how filesystem performance is being optimised: t means we’re optimising to spend the least time allocating blocks, and s means we’re minimising the space fragmentation on the disk

On a multi-terabyte filesystem, nbpi cannot be set to less than 1MB, and fragsize will always be set equal to bsize. So we’d want to optimise for time as opposed to space, as we’ll only ever allocate in 8KB blocks.

fstyp is the command we can use to do some really low-level querying of a UFS filesystem.

We can invoke it with:

fstyp -v <disk_device>

Make sure you pipe it through more, or redirect the output to a file, because there’s a lot of it. fstyp will report on the statistics of all the cylinder groups for a filesystem, but it’s really just the first section reported from the superblocks that we’re interested in.

# fstyp -v /dev/dsk/c9t60060E80141189000001118900001400d0s0 | more
ufs
magic   decade  format  dynamic time    Fri Dec  5 17:26:27 2008
sblkno  2       cblkno  3       iblkno  4       dblkno  11
sbsize  8192    cgsize  8192    cgoffset 8      cgmask  0xffffffc0
ncg     4679    size    256900091       blocks  256857968
bsize   8192    shift   13      mask    0xffffe000
fsize   8192    shift   13      mask    0xffffe000
frag    1       shift   0       fsbtodb 4
minfree 1%      maxbpg  2048    optim   time
maxcontig 128   rotdelay 0ms    rps     1
csaddr  11      cssize  81920   shift   9       mask    0xfffffe00
ntrak   48      nsect   128     spc     6144    ncyl    669011
cpg     143     bpg     54912   fpg     54912   ipg     448
nindir  2048    inopb   64      nspf    16
nbfree  187148663       ndir    2       nifree  0       nffree  0
cgrotor 462     fmod    0       ronly   0       logbno  23
version 1
fs_reclaim is not set

bsize and fsize show us the block and fragment size, respectively.
nbfree and nffree show us the number of free blocks and fragments, respectively. If nbfree is 0, you’re in trouble – no free blocks means no more writing to the filesystem, regardless of how much space is actually still available.
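Since the fstyp output runs to thousands of lines, it can be handy to pull out just those counters. A minimal sketch, assuming the name/value pair layout shown above (sb_counters is a hypothetical helper name, and the filesystem-wide totals come from the superblock summary at the top of the output, hence the head):

```shell
# Scan name/value pairs on each line and print the free-space counters.
sb_counters() {
    awk '{
        for (i = 1; i < NF; i += 2)
            if ($i == "nbfree" || $i == "nffree" || $i == "nifree")
                printf "%s=%s\n", $i, $(i + 1)
    }'
}
# Usage (device path as in the example above):
# fstyp -v /dev/dsk/c9t60060E80141189000001118900001400d0s0 | head -20 | sb_counters
```

On the output above this would report nbfree=187148663, nifree=0 and nffree=0 – plenty of blocks, no inodes.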

What usually happens when writing lots of small (i.e. < 8KB) files to a filesystem is that the number of free blocks (nbfree) falls to 0, but you’ve got plenty of fragments left. If block size = fragment size, that’s not an issue – but if fragments are, say, 2KB, then you’re not going to be able to write to the filesystem any more (“file system full” error messages) even though df is showing lots of free disk space.

A big part of tuning your filesystem is knowing what’s going onto it. For multi-terabyte filesystems, you should be placing larger files on there – so setting block size equal to fragment size won’t be wasting space.

If you’ve got lots of smaller files, you’ll need to think about what the average file size is – if it’s less than 8KB, you’ll want to make sure that fragment size is also less than 8KB. Otherwise you’ll be wasting space by writing 8KB blocks all the time when you could get away with 2KB fragments.
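The amount of waste adds up fast. A rough sketch, assuming a hypothetical million 2KB files on an 8KB fragsize filesystem (every file occupies at least one fragment, so each one carries ~6KB of slack):

```shell
# Rough estimate of space wasted by small files on large fragments.
fragsize=8192                  # 8KB fragments (= bsize on multi-terabyte UFS)
avg_file=2048                  # hypothetical 2KB average file size
nfiles=1000000                 # hypothetical million small files
wasted=$(( (fragsize - avg_file) * nfiles / 1024 / 1024 ))
echo "wasted: ${wasted} MB"    # ~5.7GB of slack for a million 2KB files
```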

Anyway, back to the problem at hand – our 2TB filesystem that’s run out of inodes. In this particular case, we’ll need to rebuild the filesystem and allocate more inodes. The question is – how do we work out what the value should be?

This simple shell script will analyse the files from the directory you execute it in, and will come back with the average file size:

#!/bin/sh
# List each file with its size, then report the total and average file size.
find . -type f -exec ls -l {} \; | \
awk 'BEGIN { tsize = 0; fcnt = 0; } \
    { printf("%03d File: %-60s size: %d bytes\n", ++fcnt, $9, $5); \
      tsize += $5; } \
END { printf("Total size = %d Average file size = %.02f\n", \
      tsize, (fcnt ? tsize / fcnt : 0)); }'

Running it we can see:

(lots of output)
....
Total size = 2147483647 Average file size = 258286.18

Now, if our average file size is 252K, then our inode density of 1161051 (roughly 1 inode per 1MB) is going to be hopelessly inadequate. This is borne out by looking again at our df output – we can see that we’ve run out of inodes when the filesystem is only approximately a quarter full, which matches up to our average file size being roughly a quarter of the inode density.
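We can sanity-check that reasoning with some quick arithmetic: each inode carries (on average) one average-sized file, so the filesystem should fill its inodes at roughly avg_file_size / nbpi of total capacity:

```shell
# Sketch: predicted capacity at which inodes run out, as a percentage.
avg=258286      # average file size in bytes, from the script above
nbpi=1161051    # bytes-per-inode, from the mkfs -m output
echo "expect inode exhaustion at ~$(( avg * 100 / nbpi ))% capacity"
```

That predicts exhaustion at around 22% – in the same ballpark as the 28% capacity df was reporting when the inodes ran out.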

However, at this point, we’re stuffed – we can’t set nbpi to be less than 1MB on a Solaris UFS filesystem that’s larger than 1TB. Our only options are:

  • chop the filesystem up into smaller ones
  • migrate to ZFS
  • create bigger files ;-)
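If we do go the first route and carve the space into sub-1TB slices, a reasonable starting point is to round the measured average file size up to the next power of two and pass that to newfs -i as the new bytes-per-inode. A sketch (the device path is a placeholder, not a real slice):

```shell
# Round the average file size up to the next power of two for use as nbpi.
avg=258286      # average file size from the script above
nbpi=1
while [ "$nbpi" -lt "$avg" ]; do nbpi=$((nbpi * 2)); done
echo "suggested rebuild: newfs -i ${nbpi} /dev/rdsk/<new_slice>"
```

For our 252K average this suggests an nbpi of 262144 (256K) – comfortably under the 1MB default, so the rebuilt slices shouldn’t run dry on inodes first.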