IMP: CoordGen8 - Coordinate Generation Utility: User’s Manual

H. David Sheets, Dept. of Physics, Canisius College, 2001 Main St., Buffalo, NY 14208. sheets@canisius.edu

The IMP software series is a set of tools for the analysis of biological shape using landmark-based geometric morphometric methods, and CoordGen7 (COORDinate GENerator) is a software tool for generating various superimpositions of the landmark configurations.  CoordGen7 can generate, display and save data sets in Partial Procrustes Superimposition, Bookstein Coordinates, Sliding Baseline Registration and RFTRA superimposition.  The input files needed for CoordGen7 may be in one of three formats: the TPS file format (F.J. Rohlf, 1993-present, see the morphometrics web site at SUNY Stony Brook, http://life.bio.sunysb.edu/morph/) as produced by the TPSDig program (Rohlf, 1998); the x1y1x2y2...CS file format used by IMP, in which each specimen occupies one row of a matrix (see further discussion below); or the MorphoJ format (Chris Klingenberg, http://www.flywings.org.uk/MorphoJ_page.htm), which is a text file with a header and a specimen label on each line.  Output files may be written in any of these three formats.

Additionally, CoordGen can rescale the data using the endpoints of a ruler included in the measurements of landmark locations, display the landmark configurations in each of the superimpositions, and generate reference forms from the data in each superimposition.   There are also tools meant to look for errors in digitization (including a simple PCA plot, and an approach to determining which landmarks make large contributions to the total variance), to calculate variance in shape and in centroid size, and to work with data in the Procrustes Size Preserving (Procrustes SP) superimposition.   The program can compute and export the variance-covariance matrix of all specimens and the matrix of pairwise Procrustes distances between all specimens, which may be useful in other programs.  This version of CoordGen also incorporates the older IMP TradMorphGen program for calculating lengths between landmarks and angles, and the SemiLand program for carrying out semi-landmark analysis.

The image above shows 16 landmark coordinate measurements from 46 specimens of Serrasalmus elongatus (a piranha) in a Partial Procrustes Superimposition (CS=1).

 

The image above shows 16 landmark coordinate measurements from 46 specimens of Serrasalmus elongatus (a piranha) in Bookstein Coordinates (registration to a baseline).

CoordGen7 forms the “entry-point” to the rest of the tools in the IMP series.  All other software tools in IMP use the same file format, and this file format also loads directly into Excel, SPSS, SAS and other software packages, since the data are stored in a data matrix such that each row is a specimen, and each column is a coordinate.   The MorphoJ file format may also be helpful when exporting data into other software packages, as it includes a header line and specimen labels.

This manual will not serve as an introduction to Geometric Morphometrics.  If you are completely new to the field, spend some time browsing the website at Stony Brook run by F. James Rohlf (http://life.bio.sunysb.edu/morph/) or the reference list at the end of this document to locate further resources.

 

Credits:

            Conceptualization and GUI Design: H. D. Sheets, D.L. Swiderski, M.L. Zelditch

            Coding and Software Design:

H. D. Sheets - sheets@canisius.edu, Dept. of Physics, Canisius College, 2001 Main St., Buffalo, NY 14208,   716-888-2587

Referencing:

                        The IMP series is completely free, but I would appreciate being referenced if you use it in published work.

                        Please reference my name and address, and ideally the website address, 

http://www.canisius.edu/~sheets/morphsoft.html

                        or the textbook,   Geometric Morphometrics for Biologists: A Primer. Zelditch M.L., Swiderski D.L., Sheets H.D. and Fink W.L. 2004. Elsevier. (2nd edition under contract, scheduled for completion January 2012).

 


 

 

Topics Covered in this Manual

Input File Formats (IMP, tps, MorphoJ)
Loading and Saving Data
Output File Formats
Detecting Digitizing Errors
Detecting Noisy or Difficult to Digitize Landmarks
Variance
Procrustes SP results
Summary Information about Data sets
Calling Other Tools
            TMorphGen
            SemiLand
Miscellaneous Buttons and Commands

Using CoordGen7 

CoordGen7 is a Coordinate Generating program meant to generate data files of landmark-based geometric morphometric data in different types of superimpositions: Bookstein Coordinates (BC), Sliding Baseline Registration (SBR), Procrustes Superimposition (PS) and Resistant Fitting Theta Rho Analysis (RFTRA). The program also displays data sets and allows for the generation of mean or reference specimens in each type of superimposition. The output files can be generated in the TPS format used by James Rohlf's software, the x1y1x2y2...CS format favored by Zelditch, Sheets et al., or the MorphoJ format used by Chris Klingenberg.  The program will also compute summary statistics for the data set, and has tools meant to allow detection of digitization errors.  It will also save matrices of all inter-specimen distances.

 

Input File Formats

The program can currently load three distinct data types:

            TPS files

            This is the file format used by the tps software series, including the extremely useful tpsDig tool for digitizing images.  The following tps file variants can be loaded into CoordGen, all containing landmark data:

            -with a ruler visible in the image

            -with no ruler, but already properly scaled

            -with a scaling factor for each specimen

            -with curves, which will be converted into semi-landmarks

            -with curves and a ruler

            X1Y1 formats

            These are input files with one specimen per line, with each column being a variable (landmark coordinate).  The first column is the x coordinate of the first landmark, the second column is the y coordinate of the first landmark, followed by the (x,y) pair for each consecutive landmark.   The specimen name may appear at the end of a line after a percent sign (%specimen 411); see details below.

            -X1Y1 raw data with a ruler in the file

            -X1Y1 raw data with no ruler

            -BC Files (X1Y1..CS)- this is the default IMP format used elsewhere in IMP.  The name BC refers to our early practice of saving all data in Bookstein Coordinates, with the centroid size (CS) as the last data column.  If specimen labels are used, they follow a % at the end of each line of the file.

            MorphoJ format

            In the MorphoJ format, each specimen is on a single line.  The first entry on each line is the specimen name, followed by each landmark coordinate in (x,y) pairs.  Data are scaled to the size of the specimen.

Formatting, reformatting and viewing of data files may be done using Word or Excel, or any other program that is capable of outputting ASCII text files. The 'Save As' option in Word or Excel will let you specify MS-DOS text files, which work well.

Loading and Saving Data

Most load options are available using buttons on the main screen, others must be accessed through the file menu.

Loading TPS files

To load a TPS file, the TPS format rules must be followed; see the documentation with F. James Rohlf's software (see the website listed earlier, or the example below). The only critical factor for CoordGen is that the symbol sequence LM=XX be on the line immediately before the set of values representing the coordinates of each specimen, where the x and y coordinates are paired on each line. XX is the number of landmarks. Any information after the number of landmarks following LM= is ignored when reading coordinates, so this is a good place to put a specimen label or other comments.
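As a rough illustration of the LM=XX rule, the following Python sketch reads the landmark blocks of a tps-style text. It is not taken from CoordGen; the function name parse_tps is hypothetical, and the sketch assumes 2D data with one x y pair per line and simply skips every other keyword line.

```python
def parse_tps(text):
    """Return a list of (comment, landmarks) tuples from tps-style text.

    Hypothetical helper for illustration only: it looks for LM=XX lines,
    reads the XX coordinate rows that follow, and keeps whatever text
    trails the landmark count as a comment.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    specimens = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.upper().startswith("LM="):
            pieces = line[3:].split(None, 1)
            n_landmarks = int(pieces[0])
            comment = pieces[1] if len(pieces) > 1 else ""
            coords = []
            for row in lines[i + 1:i + 1 + n_landmarks]:
                x, y = row.split()[:2]
                coords.append((float(x), float(y)))
            specimens.append((comment, coords))
            i += 1 + n_landmarks
        else:
            i += 1  # ID=, scale and image lines are ignored in this sketch
    return specimens
```

Calling parse_tps on the contents of a file returns the specimens in file order, which matches the IMP habit of referring to specimens by their ordinal position.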

If you are loading a file that has a ruler in the image, the software assumes that the ruler is the last two landmarks in the list, and that the ruler has a length of 10 units. If this does not match your data, use the ruler endpoints boxes to specify where the ends of the ruler are, and the ruler length box to specify the ruler length. Then use the carry out rescaling button to re-adjust the value of the landmark positions and calculate the centroid size.
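The rescaling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not CoordGen's code: the function names are hypothetical, indices are 0-based (negative values count from the end), and the sketch assumes the two ruler points are dropped from the configuration once the scale factor has been applied.

```python
import math

def centroid_size(points):
    """Square root of the summed squared distances of landmarks from their centroid."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points))

def rescale_with_ruler(points, ruler_a=-2, ruler_b=-1, ruler_length=10.0):
    """Rescale landmarks so the ruler spans ruler_length units.

    Defaults mirror the manual: ruler = last two points, length = 10 units.
    Assumption: the ruler endpoints are removed from the returned list.
    """
    n = len(points)
    a, b = ruler_a % n, ruler_b % n  # resolve from-the-end indices
    measured = math.hypot(points[b][0] - points[a][0],
                          points[b][1] - points[a][1])
    factor = ruler_length / measured
    return [(x * factor, y * factor)
            for i, (x, y) in enumerate(points) if i not in (a, b)]
```

After rescaling, centroid_size(rescale_with_ruler(points)) gives the centroid size in ruler units.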

If you have no ruler in the image, CoordGen can load the data using the Load tps (no ruler/no scale factor) option, which assumes the data to be in properly scaled units and to need no further rescaling.  It is also possible to load a tps file with a scale factor (this option is on the File menu).

When you load a tps file, CoordGen will attempt to extract some labeling information from the tps file, collecting information from three different locations in the tps file.

            a.) on the LM= N line of the tps file, CoordGen will read any information following the value of N and include this information in the specimen label.  If the LM line reads “LM=12 NmFaS#3112”, then the letters “NmFaS#3112” will appear in the specimen label.

            b.) if there is an ID=M line in the data file, CoordGen will append the ID number to the information on the LM= line.  So if in the file above, the ID line was ID=7, the “7” would be appended to the label, to read “NmFaS#3112 7”

            c.) if there is a line with the image file name, IM=filename, in the tps file, CoordGen will append the image file name to the data label.  If the file discussed above has the line IM=”NMFAS_3112.tif”, the entire data label would read

            “NmFaS#3112 7 NMFAS_3112.tif”

This process does have the possibility of generating unwieldy specimen labels, but allows for a lot of labelling options.   All IMP series programs keep specimens in the same order as they were input into the program and typically refer to specimens by their ordinal number in the file, although some functions will use the label.
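The three-part label described above can be mimicked with a short Python sketch. The function name tps_label is hypothetical, and the sketch makes one assumption beyond the text: it accepts either IM= or IMAGE= as the image keyword and strips surrounding quotation marks from the file name.

```python
def tps_label(lines):
    """Build a specimen label from the LM=, ID= and image lines of one specimen."""
    parts = {"LM": "", "ID": "", "IM": ""}
    for line in lines:
        if "=" not in line:
            continue  # coordinate rows carry no keyword
        key, value = line.split("=", 1)
        key, value = key.strip().upper(), value.strip()
        if key == "LM":
            pieces = value.split(None, 1)  # landmark count, then free text
            parts["LM"] = pieces[1].strip() if len(pieces) > 1 else ""
        elif key == "ID":
            parts["ID"] = value
        elif key in ("IM", "IMAGE"):  # assumption: either keyword is accepted
            parts["IM"] = value.strip('"\u201c\u201d')
    return " ".join(p for p in (parts["LM"], parts["ID"], parts["IM"]) if p)
```

For the example lines used in this section, the sketch yields the label NmFaS#3112 7 NMFAS_3112.tif.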

Loading X1Y1 data

These are all variants on the standard IMP format, which were developed using earlier versions of Matlab.   In each case, a specimen occupies one row of a data matrix, and each column is a landmark coordinate (or the centroid size).  This format is easily loaded into Excel or Systat.

X1Y1 Raw Data With Ruler

This file format consists of the set of landmark measurements for each specimen arranged as one row of the data matrix.  The first column is the x coordinate of the first landmark, the second column is the y coordinate of the first landmark, followed in successive columns by the x and y coordinates of all other landmarks in order.  This option again assumes that the endpoints of a ruler are included in the landmarks; the endpoints of the ruler are used to rescale the rest of the data.

X1Y1 Raw Data (no ruler)

This is identical to the format used for the X1Y1 Raw Data with Ruler as discussed above, except that there is no ruler.  The data is assumed to be correctly scaled.

X1Y1...CS Files, Format and Input

In this file format (loaded using the button labeled “Load BC File (X1Y1..CS) IMP standard format”), the coordinates of all landmarks corresponding to a single specimen lie on a single row or line of the file. The X coordinate of each landmark is listed first, followed by the Y coordinate, with a space between each. The Centroid Size (CS) of the data is the last item in the list.  Following centroid size there may be a percent sign (%) followed by a data label.   All text and numbers following the % are treated as labels. So for data in 2 dimensions with k landmarks, there are k pairs of X Y values plus CS, or 2k+1 values, on a single row or line. Each line of the file is a separate specimen.   All of the IMP software using this system is written to avoid changing the ordering of specimens, so that the order of all specimens in an X1Y1...CS file is fixed, and the same as the digitizing file it was created from.   Most (but not all) of the IMP programs will recognize and load the data labels.

CoordGen will load X1Y1...CS files, so that you can convert from one superimposition to another, or generate reference forms or plots from the data.

Sample X1Y1...CS file format

0 0 1 0 0 1 1 1 4 % specimen 1

This is an X1Y1 file for a square in BC with corners at (0,0), (1,0), (0,1) and (1,1) which had a centroid size of 4 before being placed in BC registration.
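For readers who want to read this format in their own scripts, here is a minimal Python sketch of a line parser. The function name parse_x1y1cs_line is hypothetical; the sketch assumes 2D data and checks only that the line holds 2k+1 numeric values.

```python
def parse_x1y1cs_line(line):
    """Split one X1Y1...CS line into (landmarks, centroid_size, label)."""
    data, _, label = line.partition("%")  # everything after % is the label
    values = [float(v) for v in data.split()]
    if len(values) < 3 or len(values) % 2 == 0:
        raise ValueError("expected 2k+1 values: k (x, y) pairs plus CS")
    centroid_size = values[-1]
    coords = values[:-1]
    landmarks = list(zip(coords[0::2], coords[1::2]))
    return landmarks, centroid_size, label.strip()
```

Applied to a line holding the unit square with CS of 4 and a label after the percent sign, the function returns the four corner points, 4.0, and the label text.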

Output File formats (Landmark data)

The output file format button will allow you to choose the output format you want, either TPS for use with Rohlf's software or X1Y1...CS format for this software suite.  You can then save using any number of different superimpositions.   We have often saved data in Bookstein Coordinates (BC) when saving in IMP format, as the baseline endpoints will be at (0,0) and (1,0), making it easy to see column positions of LMs in Excel.   Most IMP software allows you to specify the superimposition to use (recomputing it each time), so the superimposition you save data in is not critical when working with IMP.     MorphoJ output format is also available on the File menu.

The program will output data in a variety of superimpositions when working with tps or IMP file formats.   The IMP format loads easily into programs like Excel, SPSS, etc.  Note that in all the IMP output formats, the last column is always the CS value.   Data labels are at the end of the line, separated by a percent sign (%).

When you save data in MorphoJ format (using the File menu), the data are first placed in a Procrustes Superimposition and then multiplied by the CS so all the interlandmark distances are correct.  Since the centroid size of each specimen is now equal to the measured CS, these data are appropriate for use in MorphoJ, where the program will carry out a Procrustes superimposition.  They are not properly scaled shape data for use elsewhere though, since the centroid size is not one.

Other Output Files

The option to Save All Pairwise Procrustes Distances on the File menu will save a matrix of all Partial Procrustes Distances between all specimens to a file.   The rows and columns of this matrix correspond to the ordinal position of the specimens in the input data file.  This matrix may be used as the input to a Non-metric multidimensional scaling (NMDS) program such as PAST (Hammer, 20XX) or to a Permutation MANOVA program (Anderson 200X).
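To make the content of this matrix concrete, the following Python sketch computes pairwise partial Procrustes distances for 2D landmark data. It is an illustration rather than CoordGen's implementation: the function names are hypothetical, landmarks are treated as complex numbers, reflections are not allowed, and the distance between two centered, unit-size configurations z and w is taken as sqrt(2 - 2|z*w|), the usual closed form for the best rotation in two dimensions.

```python
import math

def unit_centered(points):
    """Center a configuration on its centroid and scale it to centroid size 1."""
    z = [complex(x, y) for x, y in points]
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]

def partial_procrustes_distance(a, b):
    """Distance after optimal rotation, with both configurations at unit size."""
    za, zb = unit_centered(a), unit_centered(b)
    inner = abs(sum(p.conjugate() * q for p, q in zip(za, zb)))
    inner = min(inner, 1.0)  # guard against rounding slightly above 1
    return math.sqrt(2.0 - 2.0 * inner)

def distance_matrix(specimens):
    """Symmetric matrix; row and column order follow the input order."""
    n = len(specimens)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = partial_procrustes_distance(specimens[i], specimens[j])
    return d
```

Two configurations that differ only by position, rotation and scale give a distance of essentially zero, which is a quick sanity check on any implementation.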

Detecting Digitization Errors

One of the common errors in digitizing is to swap the order of two landmarks in the order of digitization, a problem which is difficult to see in a Procrustes plot.  Fortunately, errors like this are easy to spot on a PCA plot, since this error will produce a large variance in the data, and result in the specimen with the error being well separated from the others on a PCA plot.  CoordGen has a quick PCA tool, which shows only the first two PCA axes scores of all specimens, with specimens numbered on the plot, as a tool for locating digitizing errors.  This tool is accessed through the “Call Other Tools” menu.  For a more complete PCA tool, see IMP PCAGen7 or tpsRelWarp.

In this example data set of landmarks on piranha, specimen 13 has reversed LM ordering.  It is not particularly visible in the Procrustes Plot of the data.

 

The quick PCA immediately shows specimen 13 as an outlier.

CoordGen will also display single specimens at a time, using the show individual specimen controls.  You can specify the specimen to show by listing its ordinal value, and can plot it along with the mean (in Procrustes format).  This is intended to allow specimen by specimen examination of your data set.

When specimen 13 (blue) is plotted with the mean (red), with numbered landmarks on each, it becomes clear that the numbering of landmarks 4 and 5 is reversed on specimen 13 relative to the mean.  The zoom function may be necessary to clearly see LM numbers in this view.
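The idea behind the quick PCA check can be sketched in plain Python. This is not CoordGen's routine: the function name pca_scores is hypothetical, the rows are assumed to be specimens that have already been superimposed (one flattened coordinate row per specimen), and the leading axes are found by simple power iteration with deflation, a technique chosen here only to keep the example free of external libraries.

```python
import math

def pca_scores(rows, n_axes=2, iterations=500):
    """Scores of each specimen on the first n_axes principal components."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    axes = []
    for k in range(n_axes):
        v = [1.0 / (j + 1 + k) for j in range(p)]  # arbitrary starting vector
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            for u in axes:  # remove components along axes already found
                dot = sum(wi * ui for wi, ui in zip(w, u))
                w = [wi - dot * ui for wi, ui in zip(w, u)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm == 0.0:
                break
            v = [wi / norm for wi in w]
        axes.append(v)
    return [[sum(c[j] * u[j] for j in range(p)) for u in axes] for c in centered]
```

A specimen with swapped landmarks should show up with a score far from the rest on the first axis; listing the scores with the ordinal specimen numbers makes it easy to go back and inspect that specimen.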

Detecting “Noisy” or Difficult to Digitize Landmarks

In some studies and organisms, some landmarks may be difficult to reliably locate and digitize, relative to the other landmarks.   When working with shape data, landmark locations are relative to all other landmarks, not absolute or independent quantities.  This makes it a bit difficult to determine the cause of excess variation or randomness, since variance is a property of the whole set of landmarks, not of each landmark as an individual. 

What we can do is compute the variation in LM position around the mean shape, omitting one LM at a time in the calculation.  When we omit a landmark that is difficult to reliably digitize, we would expect the variance around the mean to decrease, relative to the variation seen when other landmarks are omitted.  CoordGen can do this calculation of variance, systematically omitting landmarks, and then display a list of variance values as shown below for the piranha example above, sorted in order of increasing variance.  This option is available on the Call Other Tools menu, under the LM Contributions to Variance option.

The variation in the data set will be reported both as a function of the omitted landmark, and in the form of a histogram.

LM Omitted    Variance
4             0.0027743
5             0.0027828
10            0.0035901
16            0.00363
11            0.0037165
9             0.0037207
3             0.0037374
1             0.0037899
15            0.0038604
14            0.003873
2             0.0038964
13            0.0039513
12            0.0039574
6             0.0042497
8             0.0042787
7             0.0045109

 

 

In this example, rather than an unreliable landmark, the positions of LM 4 and 5 were swapped for specimen 13.  In the listing above this makes LMs 4 and 5 appear to be large contributors to the variance (since variance drops drastically when they are omitted).  CoordGen will also plot a histogram of the variance values and the number of occurrences of each value, which one can use to determine if the “noisiest” landmark is really different from the other landmarks, or part of a smooth distribution.

In the piranha example, the two variance values on the left side of the plot are not part of a smooth distribution of variance values; rather, they are isolated low values, indicating these two landmarks do not have the same distribution of error as the other landmarks do.

Evidence like this in a real study, and not due to landmark mis-labeling, might be evidence that one or more LMs were unusually hard to accurately digitize.
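The leave-one-landmark-out calculation described in this section can be sketched as follows. This is a simplified illustration, not CoordGen's code: the function names are hypothetical, landmarks are complex numbers, the superimposition is a basic iterative Procrustes fit (center, scale to unit size, rotate onto the current mean), and landmark numbers are reported starting from 1.

```python
import math

def _preshape(z):
    """Center on the centroid and scale to unit centroid size."""
    mean = sum(z) / len(z)
    z = [p - mean for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]

def _rotate_onto(z, ref):
    """Rotate z to best match ref (least squares, no reflection)."""
    s = sum(p.conjugate() * q for p, q in zip(z, ref))
    r = s / abs(s) if abs(s) > 0 else 1
    return [p * r for p in z]

def procrustes_variance(specimens, iterations=10):
    """Summed squared distances about the mean shape, divided by n-1."""
    shapes = [_preshape(z) for z in specimens]
    mean = shapes[0]
    for _ in range(iterations):
        shapes = [_rotate_onto(z, mean) for z in shapes]
        k = len(mean)
        mean = _preshape([sum(z[j] for z in shapes) / len(shapes) for j in range(k)])
    shapes = [_rotate_onto(z, mean) for z in shapes]
    ss = sum(abs(p - q) ** 2 for z in shapes for p, q in zip(z, mean))
    return ss / (len(shapes) - 1)

def variance_by_omitted_landmark(specimens):
    """List of (landmark number, variance without it), in increasing variance."""
    k = len(specimens[0])
    results = []
    for omit in range(k):
        reduced = [[p for j, p in enumerate(z) if j != omit] for z in specimens]
        results.append((omit + 1, procrustes_variance(reduced)))
    return sorted(results, key=lambda item: item[1])
```

Landmarks at the top of the returned list are those whose removal lowers the variance the most, so they are the first candidates to check for digitizing problems.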

 

Variance

CoordGen7 will also compute variance and RMS scatter statistics.   Variance is calculated as the summed squared Procrustes distances about the mean form (GPA Procrustes mean form) divided by n-1.  The RMS scatter (root mean square) is the square root of this variance, and is a linear measure of typical variability in the data.  The “Calculate Variance” option on the Call Other Tools menu will compute the variance and display it in the file menu.  The “Summary Statistics” option under Call Other Tools will show the variance, the RMS scatter, the specimen farthest from the mean (and the distance for this specimen), as well as the mean centroid size, the standard deviation in centroid size, and the maximum and minimum observed centroid sizes.
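These summary quantities are simple to reproduce once the Procrustes distance of each specimen from the mean form and the centroid sizes are known. The Python sketch below is an illustration only: the function name and dictionary keys are hypothetical, and it assumes the standard deviation of centroid size is the sample (n-1) standard deviation, which the manual does not state.

```python
import math
import statistics

def summary_statistics(distances, centroid_sizes):
    """Variance, RMS scatter and centroid size summaries for one data set.

    distances: Procrustes distance of each specimen from the mean form.
    centroid_sizes: centroid size of each specimen, in the same order.
    """
    n = len(distances)
    variance = sum(d * d for d in distances) / (n - 1)
    farthest = max(range(n), key=lambda i: distances[i])
    return {
        "variance": variance,
        "rms_scatter": math.sqrt(variance),
        "farthest_specimen": farthest + 1,  # ordinal position, counting from 1
        "farthest_distance": distances[farthest],
        "mean_cs": statistics.mean(centroid_sizes),
        "sd_cs": statistics.stdev(centroid_sizes),  # assumption: sample SD
        "min_cs": min(centroid_sizes),
        "max_cs": max(centroid_sizes),
    }
```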

 

Procrustes SP results

In working with forensic data, particularly impression evidence such as fingerprints, bitemarks and footwear impressions, it is often desirable not to remove size information from the data.  We have been using a method dubbed Procrustes Size Preserving (SP) in which the specimens are matched using only rotation and translation, not rescaling.
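A minimal Python sketch of a size-preserving fit of one specimen onto a reference is given below. It is an illustration under stated assumptions rather than the routine used in CoordGen: the function name is hypothetical, both configurations are moved to a common origin at their centroids, only a rotation is then applied, and reflections are not allowed.

```python
def align_size_preserving(specimen, reference):
    """Translate and rotate specimen onto reference without rescaling.

    Both arguments are lists of (x, y) landmarks in the same order.
    Returns the aligned specimen, centered on the origin, at its original size.
    """
    z = [complex(x, y) for x, y in specimen]
    m = [complex(x, y) for x, y in reference]
    z_mean, m_mean = sum(z) / len(z), sum(m) / len(m)
    z = [p - z_mean for p in z]  # translation only
    m = [q - m_mean for q in m]
    s = sum(p.conjugate() * q for p, q in zip(z, m))
    r = s / abs(s) if abs(s) > 0 else 1  # unit-length complex number = pure rotation
    return [((p * r).real, (p * r).imag) for p in z]
```

Because the rotation factor has length one, interlandmark distances in the returned configuration are identical to those in the input specimen.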

Note that it is also possible to work in Procrustes Form Space, which is Procrustes shape space with the log of centroid size added as an initial variable.   The relative merits of Procrustes SP and Procrustes Form Space have not been studied in great detail.    Procrustes SP was meant for use with forensic data, where any size changes are expected to be small, typically under 5% and rarely reaching 15% variation in size.  In biological studies of growth, substantially higher variability in size is common.   This indicates that Procrustes Form Space is probably preferable when larger variation in size is common, while Procrustes SP may be suitable for systems with little size variation.

CoordGen has a “Procrustes SP” menu, which may be used to plot data in Procrustes SP, save data in this format and save the mean of the data in this format.

The “Procrustes SP Variance” option will compute a set of summary statistics in Procrustes SP superimposition, including the minimum and maximum Procrustes SP distance from the mean form, the variance about the mean (summed squared distances divided by n-1) and the RMS scatter about the mean (root mean square variation, or square root of the variance).  The “Both Variances (Summary)” option lists the variance and RMS scatter in both Procrustes and Procrustes SP superimpositions, as well as the standard deviation of the centroid size.

A matrix of all pairwise distances based on the Procrustes SP superimposition may also be saved from this menu.

 

Calling Other Tools

CoordGen7 now incorporates two other pieces of software which had been independent programs in earlier versions of IMP.  These two programs are TMorphGen (Traditional Morphometrics measurement generator), which can be used to compute the lengths and angles of traditional morphometric variables from landmark coordinates, and SemiLand, which is a tool for semi-landmark alignment.

When either of these programs is called from CoordGen, any data loaded into CoordGen is passed on to the called program.  CoordGen will remain open.  Separate manuals are included for these two programs.

 

Miscellaneous Buttons and Commands

Clear Axis

This button will remove the axes from the diagram. Some people like axes, some don't!

Copy Image to Clipboard (Not available on Mac)

This button copies the graph onto the Windows clipboard, so that it can be pasted from there into other programs. Word seems to be a nice choice for just saving images for later use. I don't know exactly how other software will react. Try it, and let me know what happens. You can copy images into drawing programs if necessary. Miriam likes Arts and Letters Express; CorelDraw also seems possible.

On a Mac, you can use the Grab utility to capture images on the screen.

Copy Image to File

This button will allow you to produce an image output file for editing elsewhere. Available formats are tiff, jpeg, eps (encapsulated PostScript), png and svg.

Print Image

Print the image to the default Windows printer. This uses the default Windows print drivers. It may or may not work particularly well; I haven't tested it extensively. I recommend using Copy Image to Clipboard or to File to get good quality prints. This print button is meant as a fast and dirty way to get hardcopies, not for publication quality images.

Baseline Windows

Bookstein Coordinates and SBR both require that you specify the choice of baseline points. The software is set to use the first and seventh landmarks as the baseline (after all, this works great for piranha!). Fill in your choice of baseline points to be used. The software numbers the landmarks according to their ordinal position in the input file. Use the display buttons (Display BC) to see if you have the landmarks you wanted. You may output files in more than one choice of baseline; this may be helpful in some analyses.
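For reference, the Bookstein Coordinates transformation itself is short enough to sketch in Python. The function name is hypothetical and the code is an illustration, not CoordGen's source; landmark numbers count from 1 as in the program, and the defaults follow the first and seventh landmark baseline mentioned above.

```python
def bookstein_coordinates(points, base_a=1, base_b=7):
    """Register a 2D configuration so the baseline ends sit at (0,0) and (1,0).

    points: list of (x, y) landmarks; base_a and base_b are 1-based landmark numbers.
    """
    z = [complex(x, y) for x, y in points]
    a, b = z[base_a - 1], z[base_b - 1]
    # Subtracting a translates; dividing by (b - a) rotates and rescales in one step.
    return [(((p - a) / (b - a)).real, ((p - a) / (b - a)).imag) for p in z]
```

Trying two different baselines on the same specimen is an easy way to see how strongly the choice of baseline affects the apparent scatter at the remaining landmarks.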

Display Buttons

There are display buttons for the BC, SBR and Procrustes superimposition methods. These will let you see what the data file looks like. I recommend always saving a BC file; it is the easiest to convert to the other formats later on in your analyses. A BC file can always be loaded back into CoordGen later to generate the other file formats.

The landmark points of the data specimens will be shown in blue, the current reference specimen will be shown in red.

Reference Specification

You will need to specify how to form the reference specimen. The default is to use all specimens to calculate the reference. You may specify that the program is to use the average of the N smallest or the N largest if you prefer. Set the N value by using the N= window. The default N is 5.
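The selection logic for the reference can be sketched as follows. This is a hypothetical helper, not CoordGen's code: it assumes the specimens have already been superimposed, that each is stored as a flat list of coordinates, and that the reference is the plain coordinate-wise average of the chosen specimens.

```python
def reference_form(specimens, centroid_sizes, mode="all", n=5):
    """Average of all specimens, or of the n smallest or n largest by centroid size."""
    order = sorted(range(len(specimens)), key=lambda i: centroid_sizes[i])
    if mode == "all":
        chosen = order
    elif mode == "smallest":
        chosen = order[:n]
    elif mode == "largest":
        chosen = order[-n:]
    else:
        raise ValueError("mode must be 'all', 'smallest' or 'largest'")
    width = len(specimens[0])
    return [sum(specimens[i][j] for i in chosen) / len(chosen) for j in range(width)]
```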

Note that the Procrustes Reference generated by this program is not aligned to the principal component axis (PCA) of the mean form as of October 26, 2000. This will be done in later versions of this program. Other software in this software suite does carry out a PCA orientation of the reference. This difference only matters when calculating the partial warp scores using the 2 component uniform model. Don't worry about it at the moment.

Save Data Buttons

The program will save data in any of the superimpositions available. I advise looking at the data in a given superimposition prior to saving it, just to keep track of what baseline setting and reference specification are in use at the time, since you may well change these while using the program.

The file format button allows files to be output in TPS or X1Y1...CS format as discussed earlier.

The Save BC labeled button saves a BC format file with the comment information (all characters on the same line as LM= in the input TPS file) on the start of each line. This may be a helpful file in keeping track of the ordering of the data sets.

Save Reference Buttons

These buttons save the reference form currently in use. This is helpful if you later want to compute partial warp forms using a common reference.

To compute a grand consensus mean over many data sets, save each one as an X1Y1...CS file in BC, then use a word processor to concatenate them all into one giant X1Y1...CS file, load this into CoordGen and generate a reference form based on all data.

Or you could generate a mean of all your juvenile specimens, or...you get the idea.

Exit

Exits the program. There is no "Clear" button, successive files replace their predecessor.

Figure Options

This pull-down menu has a number of options that let you alter the display, i.e., symbol sizes, colors, etc.

 

 

References

 

C. P. Klingenberg. 2011. MorphoJ: an integrated software package for geometric morphometrics. Molecular Ecology Resources 11: 353-357.