Software written by Martin Ward


FermaT Transformation System
Bible Reference
Kansas City Tape Decoder
Plot a Frequency Spectrum of a wav File
Miscellaneous UK101 BASIC and 6502 Assembler Programs
SpamFilter
Clonedir: Create a copy of a directory and keep it up to date
Short File Name Restorer for Windows FAT16 and FAT32
Reflow: Break lines into paragraphs in a readable way
Compare two text files and show differences at the character level
Check the punctuation in a text file
Fix end-of-line hyphens in a text file
Convert paragraph breaks from indentation to a blank line
Download selected news from selected groups based on regular expressions

FermaT

Bible Reference

A perl script to look up Bible references and search for verses.

Usage: ref [bible] [book] [chapter:[verse[-verse|,verse]]] {/pattern/in}*

You can specify any of the bible, book, chapter, or a list or range of verses, and/or one or more patterns. The BIBLES environment variable gives the directory where bible versions are stored, and the BIBLE variable gives the name of the default bible to search.

Examples:

  ref John 3:16         print this famous verse
  ref all John 3:16     print this verse from every available bible
  ref jn 3:             print all of chapter 3 of John's gospel
  ref 3:16              list chapter 3 verse 16 in each book that has one
  ref 3 john            print a whole book
  ref 3 john 2          some books don't have chapter divisions
  ref AV 1 jn /Jesus/   list verses in 1 John in the AV which contain Jesus
  ref RSV /Jesus wept/  find the shortest verse in the RSV!
  ref /Adam/ /Eve/      list verses which mention both Adam and Eve
  ref /adam/i           case-insensitive search, finds Adam, adamant etc.
  ref RSV /baalmeon/in  finds the name Ba'al-me'on, ignoring the punctuation
  ref                   print the whole of the default bible
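The chapter:[verse[-verse|,verse]] part of the usage line can be parsed with a single regular expression. As an illustration only (ref is a perl script and its own parser may work differently), here is a Python sketch; the function name parse_chapter_verse and the shape of the returned tuple are hypothetical.

```python
import re

# chapter:[verse[-verse|,verse]], e.g. "3:16", "3:", "3:16-18", "3:16,18"
REF_RE = re.compile(r"^(\d+):(?:(\d+)(?:([-,])(\d+))?)?$")

def parse_chapter_verse(spec):
    """Return (chapter, verses): verses is None for a whole chapter,
    otherwise a list of verse numbers. Hypothetical helper, not ref's code."""
    m = REF_RE.match(spec)
    if not m:
        raise ValueError(f"not a chapter:verse reference: {spec!r}")
    chapter = int(m.group(1))
    if m.group(2) is None:
        return chapter, None                  # "3:" means all of chapter 3
    first = int(m.group(2))
    if m.group(3) == "-":                     # a range of verses
        return chapter, list(range(first, int(m.group(4)) + 1))
    if m.group(3) == ",":                     # two separate verses
        return chapter, [first, int(m.group(4))]
    return chapter, [first]                   # a single verse

print(parse_chapter_verse("3:16-18"))         # (3, [16, 17, 18])
```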

Bible etext files formatted for ref (download and unzip to the BIBLES directory):

Matthew Thorley has added a few features and made his version available on GitHub.

Kansas City Tape Decoder

Download tape-read

This perl script can read and decode wav files recorded from cassette tapes in Kansas City format, as used by the Compukit UK101, Ohio Superboard and many other home computers.

It has also been used by several people as an FSK decoder to decode Yamaha and Roland synthesiser tapes.

The program uses two perl modules, Audio::Wav and Math::FFT, both of which can be downloaded from CPAN.

By default, a '0' bit is represented as four cycles of a 1200 Hz sine wave, and a '1' bit as eight cycles of 2400 Hz. This gives a data rate of 300 baud. The carrier wave is a stream of 1 bits (2400Hz). Each frame starts with one start bit (a '0') followed by eight data bits (least significant bit first) followed by two stop bits ('1's). So each frame is 11 bits, for a data rate of 27 bytes per second.
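To make the frame layout concrete, here is a short Python sketch of how one byte becomes a frame and then a sequence of tones under these defaults. It illustrates the format only (tape-read decodes rather than encodes), and the function names are hypothetical.

```python
def frame_bits(byte, data_bits=8, stop_bits=2):
    """Bits of one frame: a 0 start bit, the data bits LSB first, then 1 stop bits."""
    bits = [0]                                           # start bit
    bits += [(byte >> i) & 1 for i in range(data_bits)]  # least significant bit first
    bits += [1] * stop_bits                              # stop bits
    return bits

def bit_tones(bits, hi=2400, lo=1200, baud=300):
    """Map each bit to (frequency in Hz, cycles per bit): 1 -> hi, 0 -> lo."""
    return [(hi, hi / baud) if b else (lo, lo / baud) for b in bits]

frame = frame_bits(ord("A"))    # 'A' = 0x41 = 01000001 in binary
print(frame)                    # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1]
print(300 / len(frame))         # about 27.3 bytes per second
```

With hi=1200, lo=600, baud=1200 the same mapping gives one cycle per '1' bit and half a cycle per '0' bit, which is the CUTS case described next.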

These defaults can be changed via the options. For example, a CUTS tape represents a '1' bit as one cycle of a 1200Hz sine wave and a '0' bit as half a cycle of a 600Hz sine wave, giving a data rate of 1200 baud.

The program uses Fourier analysis to determine the points where the signal changes from low to high frequency, and vice versa.
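Because only two frequencies matter, the idea can be illustrated without a full FFT. The Python sketch below swaps in the Goertzel algorithm, which computes the power in a single DFT bin, and compares the high and low tones over one bit-length window. This is an assumption-laden illustration, not the script's method: tape-read uses Math::FFT and computes several FFT steps per bit (the steps option).

```python
import math

def tone_power(samples, rate, freq):
    """Power of one frequency in a block of samples (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def classify_bit(samples, rate, hi=2400, lo=1200):
    """Return 1 if the high tone dominates the window, otherwise 0."""
    return 1 if tone_power(samples, rate, hi) > tone_power(samples, rate, lo) else 0

# One bit at 300 baud and 48000 samples/s is 160 samples.
rate = 48000
window = [math.sin(2 * math.pi * 2400 * i / rate) for i in range(160)]
print(classify_bit(window, rate))   # 1
```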

The options are:

hi=N       High frequency (1 bit/carrier/stop bit) (default=2400Hz)
lo=N       Low frequency (0 bit/start bit) (default=1200Hz)
baud=N     Baud rate (default=300)
CUTS       CUTS format (short for: hi=1200 lo=600 baud=1200)
frame=Nxy  Format: N=data bits, x=parity (E/O/N), y=stop bits (default=8N2)
max=N      Stop after reading N samples from the file
steps=N    Compute N Fast Fourier Transform steps per bit (default=10)
window=xxx FFT window function (none/bartlett/welch/hann) (default=hann)
resample=N Resample wav file so that one bit is N samples (default=0)
keep=Y/N   Keep all data, including short isolated sections? (default=N)
graph=Y/N  Plot a graph of the frequency spectrum against time (default=N)
channel=x  Channel to use (L=Left, R=Right, A=Average) (default=A)
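The frame=Nxy value packs three fields into one word, and the frame length (and so the byte rate) follows from it. A hypothetical Python parser, assuming single-digit fields as in the default 8N2:

```python
import re

def parse_frame(spec="8N2"):
    """Split a frame=Nxy value into (data_bits, parity, stop_bits, frame_length)."""
    m = re.fullmatch(r"(\d)([EON])(\d)", spec.upper())
    if not m:
        raise ValueError(f"bad frame spec: {spec!r}")
    data_bits, parity, stop_bits = int(m.group(1)), m.group(2), int(m.group(3))
    # One start bit, the data bits, a parity bit unless N, then the stop bits.
    frame_len = 1 + data_bits + (0 if parity == "N" else 1) + stop_bits
    return data_bits, parity, stop_bits, frame_len

print(parse_frame("8N2"))   # (8, 'N', 2, 11)
```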

If you have a poor quality recording, or a high bit rate recording (e.g. a CUTS tape), try resampling to, say, 128 or 256 samples per bit using the option resample=128 (the number of samples should be a power of two), and set the number of steps to 8 or 16.

Download some sample wav files here: sample-wav-files.zip.

1200TARG.wav is a CUTS format file, decode it with:

perl tape-read CUTS 1200TARG.wav

or equivalently:

perl tape-read baud=1200 lo=600 hi=1200 1200TARG.wav

This should produce a binary file 1200TARG-001.txt which should be identical to the file 1200TARG.txt in the archive.

All the other files are in the (default) UK101 format: 300 baud, lo=1200, hi=2400. The script automatically detects the sample rate of the wav file.

bad-example.wav is a really poor file which the script can still process
good-example.wav is a high quality wav file
prog_01c.wav is the rest of the program for which good-example.wav is the first part. This has been resampled to 8 bit mono at 6000 Hz, and can still be processed.

In theory, the lowest possible sample rate for a 2400Hz signal is 4800Hz (the Nyquist limit). In practice, sampling at 4800Hz causes the 2400Hz signal to disappear periodically (when the sample points coincide with the zero crossings of the signal). But you can get quite close to the limit.
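The disappearing signal at exactly the Nyquist rate is easy to reproduce. In this Python sketch, each sample at 4800Hz lands exactly half a cycle of the 2400Hz tone after the previous one, so with zero phase every sample falls on a zero crossing; a quarter-cycle phase shift makes every sample land on a peak instead. The helper name is illustrative.

```python
import math

def sample_tone(freq, rate, n, phase=0.0):
    """Sample a sine wave of the given frequency n times at the given rate."""
    return [math.sin(2 * math.pi * freq * i / rate + phase) for i in range(n)]

# Zero phase: every sample is a zero crossing, so the tone vanishes
# (the values are zero apart from floating-point rounding).
print(max(abs(x) for x in sample_tone(2400, 4800, 8)))

# Quarter-cycle phase: every sample lands on a peak.
print(max(abs(x) for x in sample_tone(2400, 4800, 8, phase=math.pi / 2)))   # 1.0
```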

What this means in practice is:

If you have a poor quality tape, convert to a wav file at the highest resolution and sampling rate that your sound card or recorder can handle. Save the result as a wav file (don't convert to MP3!) If you have a really bad tape recorded at a high bitrate, then resample so that each bit takes 32 or 64 samples.
If you have a good quality tape, but not much disk space or CPU power, you can probably get away with recording in 8 bit mono at 8 kHz.

MP3 encoding is OK for good quality 300 baud recordings, but will destroy 1200 baud recordings. If you need to compress the wav files, use a "lossless" compression format, such as FLAC.

CUTS tapes can be hard to decode because the zero bit is only half a cycle of the low frequency, so the script has to analyse a two-bit-wide window. Using the welch window function can help, since it gives more weight to the centre of the window. For poor quality CUTS tapes the following method has worked successfully:

Save the CUTS tapes at low volume (to avoid clipping) and a high sample rate (44.1 kHz or higher).
Examine the wav file with a suitable editor to work out the exact bit rate (1 cycle of the high frequency = 1 bit).
In one example, 20 cycles of the high frequency took about 787 samples, so the frequency is 44100*20/787 = 1120 Hz (approx). The low frequency is then 1120/2 = 560 Hz and the bit rate is 1120 baud.

Use these parameters (adjusted accordingly):

tape-read hi=1120 lo=560 baud=1120 resample=128 steps=32 window=welch file.wav

This will take a while to process, but should give about the best possible result.
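The arithmetic in the steps above can be scripted. A small Python sketch: cuts_params is a hypothetical helper, and it truncates to whole hertz to match the 1120 in the example (44100 × 20 / 787 is about 1120.7).

```python
def cuts_params(cycles, samples, rate=44100):
    """Derive hi, lo and baud from a measured run of high-frequency cycles.

    For CUTS, one cycle of the high tone is one bit, so baud equals hi,
    and the low tone is half the high one.
    """
    hi = int(rate * cycles / samples)
    return {"hi": hi, "lo": hi // 2, "baud": hi}

params = cuts_params(cycles=20, samples=787)
cmd = ("tape-read hi={hi} lo={lo} baud={baud} "
       "resample=128 steps=32 window=welch file.wav").format(**params)
print(cmd)
```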

Weber Kai has created a modified version of the program for processing MSX tapes, which is available on GitHub. He has used it to extract MSX code from an old Brazilian radio programme.

Plot a Frequency Spectrum of a wav File

Download fftplot

This perl script will read a wav file and create a gif file with a plot of the frequency spectrum.
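As a rough illustration of the underlying computation (not fftplot's code, which is perl and also draws the plot), this Python sketch reads a wav file with the standard wave module and computes a magnitude spectrum of its first samples. It assumes a mono 16-bit file and uses a naive DFT rather than an FFT to stay dependency-free; the frequency resolution is the sample rate divided by the number of samples.

```python
import cmath
import struct
import wave

def spectrum(path, n=512):
    """Return (frequencies, magnitudes) for the first n samples of a
    mono 16-bit wav file, using a naive O(n^2) DFT."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("this sketch only handles mono 16-bit files")
        rate = w.getframerate()
        raw = w.readframes(n)
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2):                      # bins up to the Nyquist frequency
        x = sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, s in enumerate(samples))
        freqs.append(k * rate / n)
        mags.append(abs(x))
    return freqs, mags
```

The strongest bin, freqs[mags.index(max(mags))], gives a first guess at the dominant tone of an unknown format.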

This can be useful for analysing an unknown computer tape format.

Miscellaneous UK101 and Superboard BASIC and 6502 Assembler Programs

Download UK101 Programs

Download UK101 ROMs

Download UK101 manual

A collection of my Compukit UK101 BASIC and assembler programs, recovered from tapes which have lain in my attic for the last 30 years.

Tim Baldwin has written an excellent Compukit UK101 simulator/emulator which is implemented in Java and runs on Windows, Linux and Mac systems.

Some of my programs use the "enhanced" 48x32 character screen: these will need modified ROMs which can be downloaded here along with a suitable properties file for the UK101 simulator.

My Real Time Star Trek game now has its own page (copied from Tim's sourceforge site).

SpamFilter

Download

A perl module and sample scripts for filtering email using several popular mail filters. The module presents a uniform interface for passing a message through each filter and determining which filters consider the message to be spam.

The spamcheck script passes a copy of the given message to each filter and counts how many filters consider it to be spam. It adds an X-Spam-Votes: header with the total.

I currently delete everything with four or more votes and quarantine everything with one to three votes using these procmail rules:

:0fw: spamcheck.lock
| spamcheck

# Record the votes in the procmail log file:
:0
* ^X-Spam-Votes: \/.*$
{ LOG="Spam-Votes: ${MATCH}" }

# Junk anything that 4 or more scanners give a positive result on.

:0
* ^X-Spam-Votes: [456789]
/dev/null

# Filter anything which any scanner considers to be spam:

:0
* ^X-Spam-Votes: [123]
SPAM.ASSASSIN
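The voting scheme and the procmail rules above can be summarised in a few lines of Python. This is a sketch of the logic only, not the SpamFilter module's interface: the filters are hypothetical callables that return True for spam, standing in for the real filters.

```python
from email import message_from_string

def spamcheck(raw_message, filters):
    """Return the parsed message with an X-Spam-Votes header added,
    counting how many of the (hypothetical) filter callables flag it."""
    votes = sum(1 for is_spam in filters if is_spam(raw_message))
    msg = message_from_string(raw_message)
    msg["X-Spam-Votes"] = str(votes)
    return msg

def action(msg):
    """Mirror the procmail rules: 4+ votes discard, 1 to 3 quarantine."""
    votes = int(msg["X-Spam-Votes"])
    if votes >= 4:
        return "discard"
    if votes >= 1:
        return "quarantine"
    return "deliver"

raw = "Subject: hello\n\nbody\n"
print(action(spamcheck(raw, [lambda m: True, lambda m: False])))   # quarantine
```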

The isspam and notspam scripts can be used to train your filters. Any spam message which is missed by any filter can be passed to isspam while false positives should be passed to notspam.

The spam filters it currently knows about are:

Email martin@gkc.org.uk with codes for any additional filters you know about!

Clonedir

Download

A perl script for copying a directory tree to another location (eg a separate hard drive for backups). It looks at the size and modification times of the files to decide whether to copy them or not. As a result, after the first "clone", keeping the copy up to date is a very quick operation.
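The size-and-modification-time rule can be sketched in Python as follows. This is an illustration, not clonedir itself: it only copies new or changed files (it does not delete files removed from the source), and comparing whole seconds is an assumption; filesystems with coarser timestamps (FAT stores modification times to 2 seconds) would need more slack.

```python
import os
import shutil

def needs_copy(src, dst):
    """True if dst is missing or differs from src in size or modification time."""
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)

def clone(src_root, dst_root):
    """Copy new or changed files from src_root to dst_root; return the paths copied."""
    copied = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            if needs_copy(src, dst):
                shutil.copy2(src, dst)   # copy2 also preserves the modification time
                copied.append(os.path.join(rel, name))
    return copied
```

Because copy2 preserves the modification time, a second run finds nothing to copy, which is why keeping the clone up to date is quick.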

Note: if you keep a backup of your Windows partition on a Linux partition, or if you use NTBackup or XCOPY to back up your Windows partition, then see also sfn-fix (below).

Short File Name Restorer

Download

The Windows FAT16 and FAT32 file systems don't really have long file names: the long names are hacked on top of the "real" file name, which has to keep to the old 8.3 format. When files are backed up and restored by many backup programs (including my clonedir above and Microsoft's NTBackup and XCOPY), they can end up with different short file names. This wouldn't be so bad if everybody always referred to files by their long names, but the Windows Registry is stuffed full of references to files by their short names.

ITS Systems has an article on the subject (plugging their own backup software), as does PC World.

Microsoft's workaround is to "Adhere to a pure 8.3 short file naming convention...". They don't say what to do about directories such as "My Documents" or even "Program Files", which don't adhere to the convention!

My solution, sfn-fix, is a perl script which uses a saved copy of the output of "mdir -/ C:" (a Linux utility for listing a directory on a Windows filesystem) to give the restored files their original short file names. An mdir listing includes both long and short names, and the -/ option does a recursive directory listing.

Note that Windows ME and Linux use different file name mangling conventions for creating short file names after the first nine files in a directory with the same first six characters and extension. Only short file names of the form xxxxxx~n.xxx (where n is a digit) can be restored, but this should be enough to keep the Registry happy.
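A quick way to see which names fall into the restorable class is a regular expression for the xxxxxx~n.xxx form. This Python sketch is illustrative only; treating the extension as optional (so that directory names such as PROGRA~1 qualify) is an assumption.

```python
import re

# Six characters, "~", a single digit, then an optional extension of up to 3 characters.
RESTORABLE = re.compile(r"^[^~.]{6}~[0-9](\.[^.]{1,3})?$")

def restorable(short_name):
    """True if short_name has the xxxxxx~n.xxx form described above."""
    return bool(RESTORABLE.match(short_name))

print(restorable("PROGRA~1"), restorable("ABCDE~10.TXT"))   # True False
```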

Reflow

Download

Text::Reflow v1.04 is a perl module which takes some ascii text, in a file, string or array (with paragraphs separated by blank lines), and reflows the paragraphs. If two or more lines in a row are "indented" then they are assumed to be a quoted poem and are passed through unchanged. It uses Knuth's paragraphing algorithm (the same algorithm used by TeX) to choose optimal line breaks based on keeping the lines the same length while avoiding breaks within a proper noun or after certain connectives ("a", "the", etc.) and encouraging breaks at punctuation.

The result is a file with a fairly "ragged" right margin but which is easier to read than a file with a strict right margin since it is less likely that phrases are broken across the line.
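The core of optimal (as opposed to greedy) line breaking can be shown in a short Python sketch. It keeps only the basic idea, a dynamic program that minimises the sum of squared gaps at line ends; Text::Reflow's proper-noun, connective and punctuation penalties are left out, so this is a simplified illustration, not the module's algorithm.

```python
def reflow(words, width):
    """Break a list of words into lines, minimising the total squared gap
    at the ends of lines (the last line is free). A word longer than
    width is given a line of its own."""
    n = len(words)
    best = [float("inf")] * (n + 1)   # best[i]: least cost of setting words[i:]
    nxt = [n] * (n + 1)               # nxt[i]: index of the first word on the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1        # words[i..j] joined by single spaces
            if length > width and j > i:
                break                          # too long, and longer lines are too
            gap = max(0, width - length)
            cost = 0 if j == n - 1 else gap * gap
            if cost + best[j + 1] < best[i]:
                best[i], nxt[i] = cost + best[j + 1], j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines

print(reflow("aaa bb cc ddddd".split(), 6))   # ['aaa', 'bb cc', 'ddddd']
```

A greedy filler would produce "aaa bb" / "cc" / "ddddd", leaving one very short line; the dynamic program spreads the slack more evenly.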

The package includes a simple perl script for reflowing files.

The -skipindented option causes all indented lines to be passed through unchanged.

The -veryslow option reflows each paragraph 16 times with different optimal line widths and picks the "best" result. With this option the paragraphs are easier to read, but the line width may vary from one paragraph to the next.

The -skipto pattern option skips to the first line which starts with the given pattern: this is to avoid reflowing header material such as the Project Gutenberg header.

Most of the text files on my G. K. Chesterton site are reflowed with this script.

Cdiff

Download

Cdiff will compare two text files, ignoring differences in layout, and produce an output file which shows the differences. It can also be used to merge the two files into a single version, choosing between variant readings in the files based on a dictionary of words.
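The "ignoring differences in layout" part can be illustrated by splitting both texts on whitespace before comparing them. This Python sketch uses the standard difflib module and works at the word level rather than the character level; it is a substitute technique for illustration, not Cdiff's algorithm, and it does not attempt the dictionary-based merge.

```python
import difflib

def word_diff(a, b):
    """List word-level differences between two texts, ignoring layout.

    Splitting on whitespace means line breaks and spacing changes are ignored.
    """
    wa, wb = a.split(), b.split()
    changes = []
    for op, i1, i2, j1, j2 in difflib.SequenceMatcher(None, wa, wb).get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(wa[i1:i2]), " ".join(wb[j1:j2])))
    return changes

print(word_diff("The quick brown\nfox", "The  quick red fox"))   # [('replace', 'brown', 'red')]
```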

See this separate page for full documentation.

Check-punct

Download

Check-punct is another perl script which checks an ascii file for bad spacing around punctuation and other errors such as mismatched quotes and parentheses. It is particularly useful for checking scanned documents.
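A few of these checks are easy to sketch. The Python function below (hypothetical, not Check-punct's code) reports a space before closing punctuation, unbalanced round and square brackets, and lines with an odd number of double quotes. The quote check is a crude heuristic: dialogue that legitimately spans lines will also be reported.

```python
import re

PAIRS = {")": "(", "]": "["}

def check_punct(text):
    """Report simple punctuation problems as (line_number, message) pairs."""
    problems, stack = [], []
    for lineno, line in enumerate(text.splitlines(), 1):
        # A space before a closing mark, as in "word ,", is suspicious.
        for m in re.finditer(r" [,.;:!?)]", line):
            problems.append((lineno, f"space before {m.group()[1]!r}"))
        # Track opening brackets so mismatches can span lines.
        for ch in line:
            if ch in "([":
                stack.append((ch, lineno))
            elif ch in PAIRS:
                if stack and stack[-1][0] == PAIRS[ch]:
                    stack.pop()
                else:
                    problems.append((lineno, f"unmatched {ch!r}"))
        if line.count('"') % 2:
            problems.append((lineno, "odd number of double quotes"))
    for ch, lineno in stack:
        problems.append((lineno, f"unclosed {ch!r}"))
    return problems
```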

Fix-hyphens

Download

Another perl script, which uses a dictionary to find words that have been broken by an end-of-line hyphen and deletes the hyphen and line break. It also fixes most "Larson" encodes, _*emphasis_ and simple HTML codes (<i>, <b> and accented characters such as é).
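The dictionary test is what distinguishes a broken word from a genuine hyphenated compound. A minimal Python sketch, assuming the dictionary is a set of lower-case words (the function name is hypothetical):

```python
import re

def fix_hyphens(text, dictionary):
    """Join words split by an end-of-line hyphen when the joined word is
    in the dictionary; otherwise leave the hyphen and line break alone."""
    def join(m):
        word = m.group(1) + m.group(2)
        return word if word.lower() in dictionary else m.group(0)
    return re.sub(r"(\w+)-\n(\w+)", join, text)

words = {"implementation"}
print(fix_hyphens("the imple-\nmentation is", words))   # the implementation is
```

"well-\nknown" is left unchanged because "wellknown" is not in the dictionary.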

The -head option will try to delete page headers from scanned documents.

Convert-paras

Download

Convert paragraph breaks from indentation to a blank line. The default is to treat a line starting with a tab character as a new paragraph. The -n option looks for n spaces at the start of a line.

The -skipto pattern option skips to the first line which starts with the given pattern: this is to avoid converting header material such as the Project Gutenberg header.
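The conversion and the -skipto behaviour can be sketched in Python. This is an illustration under stated assumptions, not Convert-paras itself: indent is a tab by default (the -n option would correspond to " " * n), the skipto pattern is treated as a regular expression matched at the start of a line, and the matching line is processed along with everything after it.

```python
import re

def convert_paras(lines, indent="\t", skipto=None):
    """Turn indented paragraph starts into blank-line paragraph breaks.

    Lines before the first line matching skipto (at its start) are passed
    through unchanged.
    """
    out = []
    active = skipto is None
    for line in lines:
        if not active and re.match(skipto, line):
            active = True
        if not active:
            out.append(line)                  # header material: leave as-is
            continue
        if line.startswith(indent):
            if out and out[-1].strip():
                out.append("")                # blank line marks the new paragraph
            line = line[len(indent):]
        out.append(line)
    return out

print(convert_paras(["  a", "b", "  c"], indent="  "))   # ['a', 'b', '', 'c']
```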

Newsgrep

Download

A perl script to download selected news from selected groups based on regular expressions. Based on "newscan" by John F. McGowen but completely rewritten to use the Net::NNTP module from CPAN and to be vastly more efficient over a slow link such as a modem.

The .newsgreprc file in your home directory lists which groups you are interested in and which articles to select.

A sample .newsgreprc file:

# Sample .newsgreprc file (this line is a comment)
# List your nntp server
NNTP nntphost.at.your.isp

# Mailbox where you want the news to be stored:
MBOX ~/NEWS

# SELECT command selects a news group
# WHERE/REQUIRE/UNLESS commands select articles from the newsgroup
# based on the given perl regular expressions
# Any UNLESS match means that we don't want this article
# All REQUIRE patterns must match.
# If there are no WHERE patterns, then we want everything that is left.
# Otherwise, at least one WHERE pattern must match.

SELECT comp.lang.perl.announce
REQUIRE  /^Approved:/

SELECT comp.compilers
WHERE  /^Approved:/
UNLESS /^Subject:.* Frequently Asked Questions/

SELECT rec.games.bridge
# I am interested in articles that mention the word gib or GIB or Gib...
WHERE /\bgib\b/i

# The script will record which articles have already been seen
# at the end of this file:


Last modified: 20th March 2002

Martin Ward, De Montfort University, Leicester.
Email: martin@gkc.org.uk
