Collins Leeds Method (CLM)
The Collins Leeds Method (CLM) Tool uses a grid (often called a matrix) to show your matches arrayed in groups (clusters). Membership in clusters is based on matches sharing other matches in common (ICW) with your selected kit. Cluster members share a common line of descent within a few generations, often even the same ancestors. You may optionally show ICW matches that do not meet the criteria you have set for inclusion in clusters.
FTDNA, 23andMe and MyHeritage data are currently supported. Other sources are in development. CLM can operate on the database or CSV files output by the Client.
This article focuses on how to operate the CLM Tool and how to read the HTML and Excel cluster maps it creates. Consult our <resource tba> for examples of strategies for setting up and interpreting cluster graphs.
The CLM tool setup screen has two levels of features: basic and advanced (highlighted below in yellow). If your subscription is limited to DNAGedcom, you will see only the basic features. The advanced features are only displayed if you have enabled the Genetic.Family Bridge on the Options page of the Client and are logged into the Client with a Genetic.Family account.
|Kit and Match Selection|
|Kit Filter||Prefilter list of kits in the database|
|DNA Kit||Selects the target kit for clustering|
|DNA Kit||Sets the kit data source to the DNAGedcom Client database|
|Match/ICW files||Sets the kit data source to previously output CSV files|
|CM Range||Selects matches based on total cM shared|
|Inclusion Threshold||Selects matches based on criteria for cluster membership|
|Surname List||Filters matches for matching surnames|
|Matches||Determines how matches appear within clusters|
|Clusters||Determines the order clusters are arrayed|
|Unclustered matches||Allows other relationships for cluster members to be shown|
|Painted Midline||Creates a diagonal midline to make match pairs easier to locate|
|Open HTML||Automatically opens the saved HTML output on completion|
|Custom Tags||Includes or excludes matches based on their associated match or tree-based tags|
|Tag Weighting||Groups tags within superclusters based on weighting criteria|
*See Tagging in Genetic.Family Help.
CLM Setup Details
Typing a name or part of a name in the Kit Filter field limits the kits listed in the DNA Kit dropdown to those whose names contain that input.
For example, to find all kits for John Collins, you may type the full name or just “John”.
After you select a kit, there are two methods for clustering your matches: DNA Kit or Match/ICW files.
DNA Kit versus Output Files Selection
The DNA Kit dropdown lists all the kits in your database previously gathered with the DNAGedcom Client. You can currently cluster any AncestryDNA, MyHeritage or FTDNA kit in your database. Alternatively, you can cluster a set of Match/ICW CSV output files.
Kits in the database can be distinguished from files created from the database this way:
Database Kit: (FTDNA) John Doe
Files: FTDNA match/ICW files
A file selection form will appear automatically when you select an available Match/ICW File option listed at the bottom of the DNA Kit dropdown list.
Using a Match and ICW file for a kit instead of the data in the database has a few advantages:
- You can modify the Match or ICW files to eliminate unwanted matches. Note: a match only needs to be removed from one file or the other to be excluded (removing it from both is fine, too). For example, if you have previously tagged all the maternal matches in an AncestryDNA kit, you could include or exclude matches based on that tag in order to focus on only maternal or paternal matches.
- You can save your Match and ICW files before doing a gather to get a “snapshot” of your current data, which you can run again in the future to compare against the latest results.
TIP: Because the Client overwrites output files in its default folder, if you want to cluster different versions of Match/ICW CSV files for the same kit, you will need to store them in a separate directory.
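If you prefer to script the snapshot step, a minimal Python sketch follows. It is not part of the Client. The function name, the folder layout, and the idea of using a dated subfolder are all assumptions made for illustration.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_csv_files(output_dir, snapshot_root, label=None):
    """Copy every CSV in output_dir into a labeled subfolder of snapshot_root.

    Hypothetical helper: the Client does not provide this function.
    The label defaults to today's date (YYYY-MM-DD).
    """
    label = label or date.today().isoformat()
    target = Path(snapshot_root) / label
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for csv_file in sorted(Path(output_dir).glob("*.csv")):
        destination = target / csv_file.name
        shutil.copy2(csv_file, destination)  # copy2 also keeps file timestamps
        copied.append(destination)
    return copied
```

Calling `snapshot_csv_files(client_output_folder, snapshot_folder)` before each gather would leave one dated folder per snapshot. Both folder arguments are placeholders for paths on your own machine.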
The CM Range sets the upper and lower limits for the total cM of selected matches, whether you use the DNA Kit option or the Match/ICW option.
The default range is 50 to 400 cM, but try moving it up or down to target matches in certain relationship ranges. The wider the range, the longer the clustering process will take.
Different ranges will yield different results for different kits, so running it for various ranges may offer more insight. The output files will include the range values at the end of output file names.
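To illustrate what a cM range does, here is a small Python sketch that applies the default range to a made-up match list. The column names (`name`, `total_cm`) are hypothetical and will not match the Client's actual CSV headers. Treating both limits as inclusive is also an assumption.

```python
import csv
import io

# Made-up sample data; real Match files use different column names.
SAMPLE = """name,total_cm
Ann,1750.0
Ben,212.5
Cy,50
Dee,49.9
Eve,400
"""


def select_by_cm(rows, low=50.0, high=400.0, cm_column="total_cm"):
    """Keep rows whose total cM falls inside [low, high] (inclusive by assumption)."""
    return [row for row in rows if low <= float(row[cm_column]) <= high]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
default_names = [row["name"] for row in select_by_cm(rows)]
close_names = [row["name"] for row in select_by_cm(rows, low=400, high=2000)]
```

With this sample, the default range keeps Ben, Cy and Eve, while a 400 to 2000 cM range keeps Ann and Eve. This shows how moving the range targets a different set of relationships.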
The Surname List filters matches for matching surname(s). Enter whole or partial surname strings separated by commas.
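As a rough picture of how a comma-separated list of whole or partial surnames could be applied, consider the sketch below. It assumes that a partial string matches anywhere inside a surname and that case is ignored. Neither detail is confirmed by this article, and the function names are hypothetical.

```python
def parse_surname_list(text):
    """Split a comma-separated entry into lowercase search strings."""
    return [part.strip().lower() for part in text.split(",") if part.strip()]


def matches_surname_list(match_surnames, surname_list_text):
    """Return True if any search string occurs inside any of the match's surnames.

    An empty list filters nothing (assumption).
    """
    wanted = parse_surname_list(surname_list_text)
    if not wanted:
        return True
    return any(w in surname.lower() for surname in match_surnames for w in wanted)


keeps_collins = matches_surname_list(["Collins", "Leeds"], "coll, smith")
keeps_jones = matches_surname_list(["Jones"], "coll, smith")
```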
The Inclusion Threshold determines how many people within a set of matches a person must match to be included in a cluster.
By default, it’s 1/2, meaning everybody in the cluster matches at least half of the other people. Try setting it to 2/3 to get more but tighter clusters.
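The threshold arithmetic can be sketched as follows. This is only an illustration of the fraction test described above, not the tool's actual clustering algorithm. The `icw` dictionary (who shares a match with whom) and the function name are assumptions.

```python
from fractions import Fraction


def meets_threshold(candidate, members, icw, threshold=Fraction(1, 2)):
    """Check whether candidate matches at least `threshold` of the other members.

    icw maps a match name to the set of names it has in common (hypothetical shape).
    """
    others = [m for m in members if m != candidate]
    if not others:
        return True
    shared = sum(1 for m in others if m in icw.get(candidate, set()))
    return Fraction(shared, len(others)) >= threshold


members = ["Ann", "Ben", "Cy", "Dee"]
icw = {"Eve": {"Ann", "Ben"}}  # Eve matches 2 of the 4 members

in_at_half = meets_threshold("Eve", members, icw)
in_at_two_thirds = meets_threshold("Eve", members, icw, Fraction(2, 3))
```

Here Eve matches exactly half of the cluster, so she qualifies at 1/2 but not at 2/3, which is why a higher threshold produces tighter clusters.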
The Matches setting determines the order in which matches are listed within each cluster:
- By Inclusion – those who match the most kits within the cluster will be in the upper-left corner.
- By cM – those who match the primary kit by the highest cM will be in the upper-left corner.
The default is By Inclusion. Switching to By cM will show matching kits in descending order based on total cM.
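The difference between the two orders can be seen in a short sketch. The field names (`total_cm`, `in_cluster_matches`) are hypothetical stand-ins for the values the tool uses internally.

```python
# Made-up members of one cluster.
members = [
    {"name": "Ann", "total_cm": 120.0, "in_cluster_matches": 3},
    {"name": "Ben", "total_cm": 310.0, "in_cluster_matches": 2},
    {"name": "Cy", "total_cm": 95.0, "in_cluster_matches": 4},
]


def sort_members(members, mode="inclusion"):
    """Order cluster members from the upper-left corner downward."""
    if mode == "inclusion":
        key = "in_cluster_matches"  # most matches within the cluster first
    elif mode == "cm":
        key = "total_cm"  # highest cM to the primary kit first
    else:
        raise ValueError(f"unknown sort mode: {mode}")
    return sorted(members, key=lambda m: m[key], reverse=True)


by_inclusion = [m["name"] for m in sort_members(members, "inclusion")]
by_cm = [m["name"] for m in sort_members(members, "cm")]
```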
There are two types of clusters: full clusters (solid color square) and superclusters (a set of full clusters sharing a significant number of cross-cluster matches bounded by light border spanning the clusters).
The Clusters setting determines the order in which the clusters appear on the chart diagonal:
- By Size – The clusters will appear from largest to smallest.
- By cM – The clusters will appear in order of the maximum cM match to the primary kit.
Within clusters, matches can be sorted:
- By Size, Superclustered – matches arrayed in descending order of how many kits matched. As each cluster is placed, others that have a significant number of cross-cluster matches will appear near it within the lighter colored boundary of the supercluster.
- By cM, Superclustered – matches arrayed in descending order based on total cM shared with the target, but as each cluster is placed, others that have a significant number of cross-cluster matches will appear near it within the lighter colored boundary of the supercluster.
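The superclustered orders can be pictured with a simple greedy sketch: place the largest remaining cluster, then pull in any cluster with enough cross-cluster matches to it. This is an illustration only. The article does not define what counts as a "significant number", so `min_links` is a made-up parameter, and the tool's real placement logic may differ.

```python
def cross_cluster_links(cluster_a, cluster_b, icw_pairs):
    """Count matches in common between members of two different clusters."""
    return sum(
        1 for a in cluster_a for b in cluster_b if frozenset((a, b)) in icw_pairs
    )


def order_superclustered(clusters, icw_pairs, min_links=2):
    """Largest cluster first, with related clusters placed right after it."""
    remaining = sorted(clusters, key=len, reverse=True)
    ordered = []
    while remaining:
        seed = remaining.pop(0)
        ordered.append(seed)
        related = [
            c for c in remaining
            if cross_cluster_links(seed, c, icw_pairs) >= min_links
        ]
        for cluster in related:
            remaining.remove(cluster)
            ordered.append(cluster)
    return ordered


a = ["a1", "a2", "a3", "a4"]
b = ["b1", "b2", "b3"]
c = ["c1", "c2"]
icw_pairs = {frozenset(("a1", "c1")), frozenset(("a2", "c2"))}

plain_order = sorted([c, a, b], key=len, reverse=True)
super_order = order_superclustered([c, a, b], icw_pairs)
```

In this example, a plain By Size order is a, b, c, while the superclustered order moves c next to a because the two share cross-cluster matches.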
Include Unclustered Matches
If selected, those matches that do not fit into any cluster will also paint. This can be helpful, because the cross-cluster matches are still drawn, so you can see partial relations. The density of unclustered matches can sometimes indicate endogamy or, more recently, pedigree collapse within a tree or simply other connections among families over time.
Try it out with one of the Superclustered options under Cluster Sort to see where they fall on the chart.
If Painted Midline is selected, self-matches (the diagonal) will be painted black in both the Excel and HTML reports.
This can be helpful when reordering clusters in Excel to make sure you get the whole cluster.
Open HTML When Done
If selected, the HTML file (saved to the default database directory) will automatically open in your default browser.
These options require a Genetic.Family account, and the local web service must be started with the Enable button on the DNAGedcom Client Settings page. Otherwise, they are not displayed.
If you choose to Include Tags, only matches carrying one of the selected tags will appear in the list.
If you do not select any tags, nothing will be excluded. If you pick all tags, matches that have not been tagged at all will not appear, so be careful when choosing your tags.
If you choose to Exclude Tags, matches carrying the selected tags will be excluded from the match list, even if they also carry a tag selected under Include Tags.
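The include and exclude rules above can be summarized in a few lines of Python. The function name and the tag labels are hypothetical. The logic simply restates the rules: exclusion wins over inclusion, and an empty Include selection filters nothing.

```python
def passes_tag_filter(match_tags, include_tags=(), exclude_tags=()):
    """Decide whether a match stays in the list under the selected tags."""
    tags = set(match_tags)
    if tags & set(exclude_tags):
        return False  # excluded even if an included tag is also present
    if include_tags and not (tags & set(include_tags)):
        return False  # some Include tags are selected, but this match has none of them
    return True


untagged_no_filter = passes_tag_filter([])
untagged_with_include = passes_tag_filter([], include_tags=["maternal"])
both_selected = passes_tag_filter(
    ["maternal", "brick wall"], include_tags=["maternal"], exclude_tags=["brick wall"]
)
```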
Tag Grouping Options
Selecting one of these options will modify how Clusters or Superclusters are built.
- Weight Tags Cluster – This will adjust the weight a match has when determining inclusion. For example, if a Cluster has 9 people, a Match who matches 4 within it would be excluded under 1/2 inclusion. However, if some of those 4 are tagged with the same tag as the Match, it could be included.
- Force Tags Clustering – When building a Cluster, this will force in any single Match that matches someone within that cluster and shares a tag.
- Supercluster Tags – When building Superclusters, Clusters that share common tags will be weighted closer together, in addition to the usual cross-cluster comparison. Clusters that share tags and also have some cross-cluster matches are even more likely to be placed together.
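The Weight Tags Cluster example (a 9-person cluster, a Match who matches 4 of them) can be worked through in code. The article does not say how much extra weight a shared tag carries, so the `tag_bonus` of one half used here is purely an assumption chosen to show the effect. The function name is hypothetical as well.

```python
from fractions import Fraction


def weighted_inclusion(shared_members, member_tags, candidate_tags,
                       cluster_size, tag_bonus=Fraction(1, 2)):
    """Score a candidate: 1 per shared member, plus a bonus when tags are shared."""
    score = Fraction(0)
    for member in shared_members:
        score += 1
        if set(member_tags.get(member, ())) & set(candidate_tags):
            score += tag_bonus  # hypothetical extra weight for a shared tag
    return score / cluster_size


shared = ["m1", "m2", "m3", "m4"]  # the 4 cluster members the Match matches
member_tags = {"m1": {"maternal"}, "m2": {"maternal"}}

unweighted = weighted_inclusion(shared, member_tags, {"maternal"}, 9, tag_bonus=0)
weighted = weighted_inclusion(shared, member_tags, {"maternal"}, 9)
```

Without weighting the score is 4/9, which falls short of 1/2. With two tagged members counted at one and a half each, the score becomes 5/9 and the Match would be included.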
CLM generates two versions of the Cluster Report. The primary output is an HTML file that is automatically displayed in your default browser and saved in the Db folder displayed on the Settings page. An Excel file is also generated in the same folder.
Reading the HTML Cluster Report
Here is an example of HTML output (with paint midline off):
Viewing your clusters
HTML format cluster maps are drawn with filled in squares, each representing a match between two people. Clusters, generally, appear whenever groups of people match each other and with the person whose kit is being analyzed. Mousing over any colored square will show the kit names of the two matches.
You will notice three types of color-coded squares (or four, if you opted to show the diagonal).
- Solid color squares indicate a match between two members of the same cluster.
- Pale squares with two colors indicate matches between members of the two corresponding solid colored clusters. Bi-colored squares typically signal the presence of a supercluster (a set of closely related clusters).
- Pale squares showing grey plus one pale color indicate a match between a cluster member and another kit.
Opting to color the diagonal (solid black squares) provides a reference point for more easily locating a kit in the left column and the top row.
Note: A small green leaf in a square indicates ancestors in common.
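The three square types can be restated as a small decision function. This is a reading aid, not the tool's drawing code. Cluster numbers and the use of `None` for an unclustered match are assumptions, and the function returns `None` for a pair of two unclustered matches because this article does not describe how that case is drawn.

```python
def square_style(row_cluster, col_cluster):
    """Describe the square for a match between two different people.

    Each argument is a cluster number, or None for a match in no cluster.
    """
    if row_cluster is None and col_cluster is None:
        return None  # not described in this article
    if row_cluster is None or col_cluster is None:
        return "pale, grey plus one cluster color"
    if row_cluster == col_cluster:
        return "solid cluster color"
    return "pale, two cluster colors"


same_cluster = square_style(3, 3)
cross_cluster = square_style(3, 4)
with_unclustered = square_style(3, None)
```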
Clusters consist of a set of matching solid color squares and are surrounded by a black line. Because membership in clusters is based on matches sharing other matches in common (ICW) with your selected kit, clusters indicate a shared line of descent. Often, although not necessarily, cluster members share the same ancestors within that line.
Superclusters are sets of adjacent, related clusters with matches in common between two or more of the clusters. They are surrounded by a dark grey line. Superclusters show a set of clusters sharing the same or closely related lines of descent.
It is essential to remember that while CLM cluster maps provide valuable clues to shared descent, they are not proof per se of specific ancestors. For guidance in interpreting clusters, we recommend studying <????? URL to list Resources>
If you have a live internet connection and click a cluster, a popup menu will appear enabling you to see a list of cluster members or Chromosome Browser details (excluding Ancestry).
Following the horizontal line back to the match listing in the leftmost columns will lead you to the following information for matching kits:
- A link to a publicly shared tree, if available and linked to the DNA kit. Private trees have a red line drawn through them. Unlinked trees may exist but are not indicated.
- The total Shared cM between the match and the kit being analyzed
- The Match’s kit name – If available, clicking the name will open the Match page in a new tab.
- The color code assigned to the Match’s cluster
Viewing Genetic.Family Integration
If the DNAGedcom Client is running and the Genetic.Family Bridge is enabled, you can also click a cluster to:
- view cluster member details
- add tags to cluster members
- filter your CLM match list by the tags you create
- display Chromosome Browser details
For more information about tagging, see CLM Tagging
To view Genetic.Family match details, click on any cluster and choose View Cluster to display a popup list of its members.
From here, you can:
- Tag people or clusters with symbols and colors or tree tags
- View the following info for each person:
  - ? – a link to the Match’s Tree
  - Shared cM
  - Name (This is a link to the match itself)
  - A list of tags
  - Tag Add/Remove buttons
Reading the Excel Cluster Report
The Excel Report is a workbook with three tabbed worksheets that are selected by clicking either the Chart, Data, or Ancestors tab at the bottom (left) of the Excel window.
Unlike the HTML format, Excel data can be easily annotated, highlighted, or extracted as data. Here is a helpful Blog post from Dana Leeds on how to use the Excel file to label and organize your results.
The Chart Tab
This worksheet (seen in the view above) is where you will find in Excel format the same clusters depicted in the HTML format report. They are also drawn with a dark black border around each cluster and a gray border around each Supercluster. However, all matches outside of clusters are depicted with a gray box instead of bi-colored boxes. A grey box indicates a match between two people not within the same cluster, indicating at least a possible link between clusters. Grey boxes within superclusters are evidence of links between clusters.
Some important items to note:
- The colors of the clusters are arbitrary! They repeat after the first 10, so even if two groups of squares are the same color, they don’t belong to the same cluster if they don’t touch!
- Although a cluster indicates a line of descent in common, the same line may be represented in several clusters, not necessarily members of superclusters.
- The Cross-Cluster matches (inside or outside of superclusters) are sometimes as important as the clusters themselves. Treat them as possible clues to how the clusters are related. However, cross cluster matches based on ICW relationships do not always signal the presence of shared ancestors.
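The note about repeating colors is easy to see with a little arithmetic. The sketch assumes clusters are numbered from 1 and that colors are handed out in sequence from a palette of 10. Both are assumptions made for illustration.

```python
PALETTE_SIZE = 10  # colors repeat after the first 10


def color_index(cluster_number):
    """Position in the palette for a 1-based cluster number (assumed scheme)."""
    return (cluster_number - 1) % PALETTE_SIZE


same_color = color_index(1) == color_index(11)
same_cluster = 1 == 11
```

Clusters 1 and 11 would land on the same palette position even though they are different clusters, so rely on the borders and the Cluster number on the Data tab rather than on color alone.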
The Data Tab
This worksheet provides a list of clustered matches and includes:
- Cluster number
- cM (in relation to the kit being analyzed)
- A link to the Match Page on the vendor’s website
- A link to the Match’s Tree page, if available
The Ancestors Tab
This worksheet provides a list of specific ancestors appearing in the trees of matches associated with clusters.