Jacek Konieczny jajcus
Tue Aug 2 08:54:15 PDT 2005
On Mon, Aug 01, 2005 at 10:55:50AM -0400, cbbrowne at ca.afilias.info wrote:
> It is certainly supposed to fail when disk errors occur so that you don't
> have empty logs.

Unfortunately it doesn't.

I have just examined the problem and found the bug.

> If you can point out locations of insufficient error checking, that would
> be helpful.

First I ran slon under strace on a small filesystem, and I got empty
files again. The strace output shows:

write(1, "2005-08-02 09:37:26 CEST DEBUG2 "..., 95) = 95
write(12, "-- Slony-I log shipping archive\n"..., 521) = -1 ENOSPC (No space left on device)
close(12)                               = 0

It seems all the data is written only on the final flush in fclose(), and
that is where the write fails with ENOSPC.

And here is the code that closes the log:

int close_log_archive () {
        int rc = 0;
        if (archive_dir) {
                rc = fprintf(archive_fp, "\n------------------------------------------------------------------\n-- End Of Archive Log\n------------------------------------------------------------------\ncommit;\n");
                rc = fclose(archive_fp);
                rc = rename(archive_tmp, archive_name);
        }
        return rc;
}

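As you can see, the return codes of fprintf() and fclose() are overwritten by
the following assignments, so they are effectively ignored. Only the result of
rename() is returned.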

Also, in sync_event(), the close_log_archive() result is not always checked:

        /*
         * Add the final commit to the archive log, close it and rename
         * the temporary file to the real log chunk filename.
         */
        if (archive_dir)
        {
          close_log_archive();
        }

> I have tried it out with a full disk and found it stopping; it sounds very
> rather peculiar that that wouldn't happen for you.

It will stop only when the queries written to the log file are bigger than the
stdio file buffer -- fprintf() has to flush then, and it fails. On small or
empty logs everything stays in the buffer until fclose(), so slon will not
notice that the data was not written properly.
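
The buffering behaviour described above can be reproduced outside slon. This
is only a minimal sketch, not slon code: where_does_write_fail() is a
hypothetical helper, and it assumes a Linux system where /dev/full exists (a
device on which every write fails with ENOSPC).

```c
#include <stdio.h>

/*
 * Hypothetical helper: write a short string through stdio to `path`
 * and report which call notices the failure.
 *
 * Returns  0 if every call succeeded,
 *          1 if fprintf() reported the error,
 *          2 if the error only showed up in fclose(),
 *         -1 if the file could not be opened at all.
 */
int where_does_write_fail(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* A short string normally just lands in the stdio buffer. */
    if (fprintf(fp, "%s", text) < 0) {
        fclose(fp);
        return 1;
    }

    /* fclose() flushes the buffer; this is where write() really happens. */
    if (fclose(fp) != 0)
        return 2;

    return 0;
}
```

On such a system, where_does_write_fail("/dev/full", "commit;") should report
the failure at fclose() and not at fprintf(), because the short string fits in
the stdio buffer.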

In most cases calling fflush() after each query write (and checking its result)
should help, but the best solution would be to check that all operations in
close_log_archive() (including fclose() and rename()) succeeded and to roll
back the transaction if they did not.
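
A rough sketch of what such a checked close could look like follows. It is not
a patch against slon: close_log_archive_checked() is a hypothetical name, it
takes the stream and file names as parameters instead of using the globals
archive_fp, archive_tmp and archive_name, and the trailer text is shortened.

```c
#include <stdio.h>

/*
 * Hypothetical checked variant of close_log_archive().
 * Returns 0 only if the trailer was written, the stream was flushed and
 * closed, and the temporary file was renamed. Returns -1 otherwise, and
 * in that case the temporary file is never renamed to the final name.
 */
int close_log_archive_checked(FILE *fp, const char *tmp_name,
                              const char *final_name)
{
    int failed = 0;

    if (fprintf(fp, "\n-- End Of Archive Log\ncommit;\n") < 0)
        failed = 1;

    /* Force buffered data out so that a full disk is detected here. */
    if (fflush(fp) != 0)
        failed = 1;

    /* Always call fclose() to release the stream, even after an error. */
    if (fclose(fp) != 0)
        failed = 1;

    if (failed)
        return -1;

    /* Publish the archive only after all data reached the file. */
    if (rename(tmp_name, final_name) != 0)
        return -1;

    return 0;
}
```

The caller (sync_event() in this case) would then have to test the result and
roll back the transaction on -1 instead of discarding the return value.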

Greets,
	Jacek

