This is a bunch of stuff I needed at some point for manipulating sequence
clusters. See the README for details. The tools included are:
filter - remove unwanted sequences from a clustering
hist - produce a histogram of cluster sizes from a "label"-formatted clustering.
clusc - compare clusterings, calculating numerous pair-based and entropy-based indices.
add_single - add singletons to a clustering.
ace2contigs - parse an ACE assembly file, and output the contigs in a FASTA file.
ace2fasta - parse an ACE assembly, and output each assembly in a separate FASTA file.
ace2clusters - parse an ACE assembly, and output clusters in TGICL format.
clusterlibs - given a table of regular expressions and library names, along with a
clustering (TGICL-format), output a table of cluster sizes per library.
xcerpt - extract sequences from a list of sequence labels.
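
To give a rough idea of what a cluster-size histogram (as in hist) and a pair-based comparison index (as in clusc) compute, here is a minimal Python sketch. It is not the tools' implementation. It assumes a clustering is already in memory as a list of sets of sequence labels, does no parsing of the "label" or TGICL formats, and uses the Rand index as one example of a pair-based index. The function names size_histogram and rand_index are hypothetical.

```python
from collections import Counter
from itertools import combinations


def size_histogram(clusters):
    """Map each cluster size to the number of clusters of that size."""
    return Counter(len(c) for c in clusters)


def rand_index(a, b):
    """Fraction of label pairs on which two clusterings agree.

    A pair agrees if both clusterings put the two labels together,
    or both keep them apart. Assumes each label is in exactly one cluster.
    """
    # Assumption: clusterings are lists of sets of sequence labels.
    cluster_of_a = {x: i for i, c in enumerate(a) for x in c}
    cluster_of_b = {x: i for i, c in enumerate(b) for x in c}
    if set(cluster_of_a) != set(cluster_of_b):
        raise ValueError("clusterings must cover the same labels")
    pairs = list(combinations(sorted(cluster_of_a), 2))
    if not pairs:
        return 1.0  # fewer than two labels: nothing to disagree on
    agree = sum(
        (cluster_of_a[x] == cluster_of_a[y]) == (cluster_of_b[x] == cluster_of_b[y])
        for x, y in pairs
    )
    return agree / len(pairs)


if __name__ == "__main__":
    a = [{"s1", "s2"}, {"s3"}]
    b = [{"s1"}, {"s2", "s3"}]
    print(dict(size_histogram(a)))
    print(rand_index(a, b))
```

Enumerating all pairs is quadratic in the number of sequences, so this is only suitable for small illustrative inputs.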