NAME

cdiff - compare and/or merge two text files


SYNOPSIS

    B<cdiff> [B<-skipto> I<pattern>] [B<-merge[1|2]>|B<-trim>] F<file1> F<file2>


DESCRIPTION

cdiff compares two text files, ignoring differences in layout, and produces output that shows the differences. It can also merge the two files into a single version, choosing between variant readings in the files based on a dictionary of words (see the -merge option below).

The layout of the output is taken from file2.

Characters deleted from file1 are shown in [square brackets], while inserted characters are shown in {curly brackets}.

Suppose file1 contains the following text:

    CHAPTER I
    Call me Ishmael.  Some years ago--never mind how long precisely--
    having little or no money in my purse and nothing particular
    to interest me on shore,I thought I would sail about a little
    and see the watery part of the wor1d.  It is a way I has
    of driving of the spleen, and regulating the circulation.

while file2 contains this:

    Chapter 1
      Call me ishmael. Some years ago- never mind how long
    precisely- having little or no money in my purse, and
    nothing particular to interest me on shore, I thought
    I would sail about a little see
    the vatery part of the world.
    It is a way I have of driving off the
    spleen and regulating the circulation.

The result of cdiff file1 file2 is:

    C[HAPTER I]{hapter 1}
      Call me [I]{i}shmael. Some years ago-[-]{ }never mind how long
    precisely-[-] having little or no money in my purse{,} and
    nothing particular to interest me on shore, I thought
    I would sail about a little [and] see
    the [w]{v}atery part of the wor[1]{l}d.
    It is a way I ha[s]{ve} of driving of{f} the 
    spleen[,] and regulating the circulation.
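
The bracket notation can be read mechanically. The following is a minimal Python sketch, not part of cdiff itself, that reconstructs the file1 and file2 readings from one line of output. The name split_readings is hypothetical, and the sketch assumes the compared text contains no literal square or curly brackets of its own.

```python
import re

# One marked-up difference: "[deleted]" optionally followed by "{inserted}",
# or a lone "{inserted}".
DIFF = re.compile(r"\[([^\]]*)\](?:\{([^}]*)\})?|\{([^}]*)\}")

def split_readings(line):
    """Return (file1_text, file2_text) rebuilt from a cdiff output line."""
    def old(m):
        # Keep only the deleted (file1) part of each marker.
        return m.group(1) or ""

    def new(m):
        # Keep only the inserted (file2) part of each marker.
        return m.group(2) or m.group(3) or ""

    return DIFF.sub(old, line), DIFF.sub(new, line)

one, two = split_readings("Call me [I]{i}shmael. Some years ago-[-]{ }never mind")
print(one)
print(two)
```

Applied to every line of the output, the second reading should reproduce file2 exactly, since the layout is taken from file2.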

Adding the -trim option prefixes each line with its line number and shows only those lines that contain differences and lie within 50 lines of a chapter heading (this is designed for Project Gutenberg's legal team):

    1:C[HAPTER I]{hapter 1}
    3:      Call me [I]{i}shmael. Some years ago-[-]{ }never mind how long 
    4:    precisely-[-] having little or no money in my purse{,} and
    6:    I would sail about a little [and] see
    7:    the [w]{v}atery part of the wor[1]{l}d.
    8:    It is a way I ha[s]{ve} of driving of{f} the 
    9:    spleen[,] and regulating the circulation.
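
The filtering can be pictured with a short Python sketch. It is only an illustration under stated assumptions: the function name trim, the rule that a chapter heading is a line beginning with the word "chapter", and the reading of "within 50 lines" as 50 lines on either side are all guesses, not documented behaviour of cdiff. Because the output layout follows file2, headings are looked up in file2 by line index.

```python
import re

HEADING = re.compile(r"^\s*chapter\b", re.IGNORECASE)  # assumed heading rule
CHANGE = re.compile(r"\[[^\]]*\]|\{[^}]*\}")           # any [..] or {..} marker

def trim(diff_lines, file2_lines, window=50):
    """Keep changed lines near a heading, prefixed with 1-based line numbers."""
    headings = [i for i, text in enumerate(file2_lines) if HEADING.match(text)]
    kept = []
    for i, text in enumerate(diff_lines):
        near = any(abs(i - h) <= window for h in headings)
        if near and CHANGE.search(text):
            kept.append(f"{i + 1}:{text}")
    return kept

file2 = ["Chapter 1", "", "  Call me ishmael.", "nothing particular"]
diff = ["C[HAPTER I]{hapter 1}", "", "  Call me [I]{i}shmael.", "nothing particular"]
for line in trim(diff, file2):
    print(line)
```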

The -merge option merges the files by taking the "obvious" choice where possible, but leaving the choices in otherwise. The "obvious" choices are:

  1. One variant is the same as the other but with some letters in upper case. In this case, take the upper case version.

  2. Punctuation has been inserted or deleted. In this case we include the punctuation in the output, on the assumption that punctuation is more likely to be missing.

  3. One or more words have been inserted or deleted. As for punctuation, the words are included in the output.

  4. If the choice is between two sets of words, where one variant's words are all in the dictionary but the other variant has words not in the dictionary, then choose the dictionary words.

  5. Otherwise, if one variant has more words in it, return that one.

  6. Otherwise, if one variant has fewer digits, return that one. (This is mostly for chapter headings: if there is a choice between roman numerals and arabic numbers, return the former.)

  7. Otherwise, if the -merge1 option was given, choose the variant from file1; if the -merge2 option was given, choose the variant from file2; if neither was given, leave the choice in with [...]{...} brackets.
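
The rules form a cascade in which the first rule that applies decides. The Python sketch below is one possible reading of the list above, not the actual cdiff code. The function choose, its prefer argument (1 or 2, standing in for -merge1 and -merge2) and the way words and punctuation are tokenised are all assumptions made for illustration.

```python
import string

def _words(text):
    """Split on whitespace, dropping surrounding punctuation and case."""
    stripped = (w.strip(string.punctuation).lower() for w in text.split())
    return [w for w in stripped if w]

def _no_punct(text):
    return "".join(c for c in text if c not in string.punctuation)

def _pick(a, b, key):
    """Return whichever of a, b has the larger key, or None on a tie."""
    ka, kb = key(a), key(b)
    if ka == kb:
        return None
    return a if ka > kb else b

def choose(old, new, dictionary, prefer=None):
    """Pick between the file1 reading (old) and the file2 reading (new)."""
    # Rule 1: same text apart from case -> the variant with more capitals.
    if old.lower() == new.lower():
        pick = _pick(old, new, lambda s: sum(c.isupper() for c in s))
        if pick is not None:
            return pick
    # Rule 2: same text apart from punctuation -> keep the punctuation.
    if _no_punct(old) == _no_punct(new):
        pick = _pick(old, new, len)
        if pick is not None:
            return pick
    # Rule 3: words present on one side only -> keep the words.
    if not old.strip() or not new.strip():
        return old if old.strip() else new
    # Rule 4: only one variant consists entirely of dictionary words.
    old_ok = all(w in dictionary for w in _words(old))
    new_ok = all(w in dictionary for w in _words(new))
    if old_ok != new_ok:
        return old if old_ok else new
    # Rule 5: the variant with more words.
    pick = _pick(old, new, lambda s: len(_words(s)))
    if pick is not None:
        return pick
    # Rule 6: the variant with fewer digits (hence the negated count).
    pick = _pick(old, new, lambda s: -sum(c.isdigit() for c in s))
    if pick is not None:
        return pick
    # Rule 7: fall back to the preferred file, or leave the choice marked up.
    if prefer == 1:
        return old
    if prefer == 2:
        return new
    return f"[{old}]{{{new}}}"

words = {"has", "have", "world"}
print(choose("wor1d", "world", words))
print(choose("has", "have", words))
```

In this sketch an undecided pair comes back in the same [old]{new} form that cdiff leaves in its output, so the result can be passed on for a human being to resolve.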

The result of cdiff -merge file1 file2 is:

    CHAPTER I
      Call me Ishmael. Some years ago- never mind how long
    precisely-- having little or no money in my purse, and
    nothing particular to interest me on shore, I thought
    I would sail about a little and see
    the watery part of the world.
    It is a way I [has]{have} of driving [of]{off} the 
    spleen and regulating the circulation.

Note that there are only two variants left, and these are expanded to whole words: [has]{have} and [of]{off}.

If you want to see which choices are made automatically, run:

    cdiff -merge2 file1 file2 > file3

and then compare file3 against file2, perhaps with:

    cdiff file2 file3


OPTIONS

cdiff takes the following options:

-skipto pattern
Skip to the first line matching the given pattern in both files before starting the comparison. Lines in file2 up to the matching line will be included in the output unchanged.
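
As a rough illustration of the behaviour described for -skipto, the Python sketch below splits both files at the first matching line. The function skip_to is hypothetical. It assumes the pattern is a regular expression (Python syntax is used here), that the comparison starts at the matching line itself, and that a file with no match contributes nothing to the comparison. None of these details are confirmed by this page.

```python
import re

def skip_to(pattern, lines1, lines2):
    """Return (passthrough, rest1, rest2) for the given pattern.

    passthrough: lines of file2 before its first match, output unchanged
    rest1/rest2: the lines of each file from the first match onwards
    """
    rx = re.compile(pattern)

    def first_match(lines):
        for i, text in enumerate(lines):
            if rx.search(text):
                return i
        return len(lines)  # assumption: no match leaves nothing to compare

    i1, i2 = first_match(lines1), first_match(lines2)
    return lines2[:i2], lines1[i1:], lines2[i2:]

file1 = ["Title page", "CHAPTER I", "Call me Ishmael."]
file2 = ["Produced by", "Header", "Chapter 1", "Call me ishmael."]
head, rest1, rest2 = skip_to(r"(?i)^chapter", file1, file2)
print(head)
print(rest1)
print(rest2)
```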

-trim
Add line numbers to the output and include only those lines which have changes on them and which are within 50 lines of a chapter break.

-merge[1|2]
Merge the two files into one, automatically choosing which variant to include in each case. If the choice cannot be made automatically, then -merge1 chooses the text from file1, -merge2 chooses from file2 and -merge (or -merge3) leaves in the brackets for a human being (or some other program) to make the choice.


REQUIREMENTS

cdiff requires perl and wdiff to be available. wdiff also requires the diff utility.


AUTHOR

Martin Ward, <martin@gkc.org.uk>.


SEE ALSO

wdiff, diff


COPYRIGHT

This program is distributed under the Artistic License.