Chromosome Matrix App (CMA)
This article focuses on how to operate the CMA Tool and how to view the HTML and Excel cluster maps it creates.
Overview
The CMA Tool uses a matrix to show your matches in clustered groups sorted by chromosome and segment address. CMA requires that match, ICW, and chromosome data be gathered with the DNAGedcom Client. FTDNA, 23andme, Gedmatch, and MyHeritage data are currently supported. CMA can analyze data in the database or, alternatively, CSV files output by the Client. CMA creates two kinds of output: HTML and Excel formatted matrices.
The HTML output includes a side-by-side chromosome browser view of segments and live links to trees associated with DNA profiles on the vendor sites. The Excel version provides an editable view of all matches with cluster color-coding, sorted by chromosome and segment address.
As in CLA, CMA clusters consist of matches who share a common line of descent within a few generations, often even the same ancestors. Membership in clusters is based on matches sharing other matches in common (ICW) with your selected kit. (Although triangulation can sometimes be inferred, clusters are not based on triangulation.)
Special Features
Unlike CLM and other matrix tools that create a single set of clusters for the kit being studied, CMA clusters ICW matches that share segments on the same chromosome(s).
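To make the per-chromosome idea concrete, here is a minimal Python sketch. It is not CMA's actual algorithm, which this article does not describe: it groups matches by chromosome, then joins any two matches on that chromosome that are ICW with each other, using a simple union-find. The input formats, the function name `cluster_by_chromosome`, and the connected-components rule are all illustrative assumptions; real cluster tools generally apply stricter membership criteria.

```python
from collections import defaultdict


def cluster_by_chromosome(segments, icw_pairs):
    """Group matches into clusters separately for each chromosome.

    segments: list of (match_name, chromosome) tuples (hypothetical format).
    icw_pairs: list of (match_a, match_b) tuples for matches that are ICW.
    Returns {chromosome: sorted list of clusters}, each cluster a sorted
    list of names. Connected components stand in for CMA's real rules.
    """
    by_chrom = defaultdict(set)
    for name, chrom in segments:
        by_chrom[chrom].add(name)

    result = {}
    for chrom, names in by_chrom.items():
        parent = {n: n for n in names}

        def find(x):
            # Follow parent links to the set's representative (path halving).
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        # Only ICW pairs where both matches have a segment on this chromosome count.
        for a, b in icw_pairs:
            if a in names and b in names:
                parent[find(a)] = find(b)

        groups = defaultdict(list)
        for n in names:
            groups[find(n)].append(n)
        # Singletons are treated as unclustered and left out.
        result[chrom] = sorted(sorted(g) for g in groups.values() if len(g) > 1)
    return result
```

Note that the same ICW pair can produce a cluster on one chromosome and nothing on another, which is the key difference from a single whole-kit cluster set.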
CMA HTML enables you to dynamically toggle between two views:
- Chromosome browser view: matches are sorted in order by the beginning location of shared segments
- Cluster View: matches are sorted by cluster membership and then, within each cluster, by chromosome address
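The two views differ mainly in their sort order. A minimal sketch, assuming each row carries an integer chromosome number, a segment start position, and an optional cluster number (the `Row` type and its field names are hypothetical, not CMA's data model):

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    name: str
    chromosome: int          # assumed integer chromosome number
    start: int               # segment start position
    cluster: Optional[int]   # None means the match is unclustered


def browser_view(rows):
    # Chromosome browser view: order by chromosome, then segment start.
    return sorted(rows, key=lambda r: (r.chromosome, r.start))


def cluster_view(rows):
    # Cluster view: keep clustered rows only, order by cluster number,
    # then by chromosome address within each cluster.
    clustered = [r for r in rows if r.cluster is not None]
    return sorted(clustered, key=lambda r: (r.cluster, r.chromosome, r.start))
```

Dropping unclustered rows in `cluster_view` mirrors what Toggle Cluster does in the HTML report.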
The presence of chromosome segment data gives CMA a depth of detail that can take you even further in your genetic genealogy-based family research projects. Segment data can help reveal:
- why matches from a known ancestral line do or do not cluster with other descendants from the same line.
- whether a cluster or a specific match is more likely maternal or paternal.
- chromosome-specific patterns of interrelated matches that suggest endogamy or pedigree collapse on specific ancestral lines.
CMA Setting Summary
CMA Settings
Setting | Purpose
Kit and Match Selection |
Kit Filter | Prefilters the list of kits in the database
DNA Kit and CMA Source option | Selects the target kit for clustering
DNA Kit | Sets the kit data source to the DNAGedcom Client database
Match File/Chromosome/ICW File | Sets the kit data source to previously output CSV files
cM Range | Selects matches based on total cM shared
Min SNPs | Sets the minimum number of SNPs a shared segment needs for a match to be selected
Chromosome | Clusters all chromosomes or a single chromosome of your choice
Include Non-ICW Segments | Allows other relationships for cluster members to be shown
Open HTML When Done | Automatically opens the saved HTML output on completion
*See Tagging in Genetic.Family Help.
CMA Setup Details
Basic Options:
Kit Filter
Typing a name or part of a name in this field limits the kits listed in the DNA Kit dropdown to those whose names contain that input.
For example, to find all kits for John Collins, you may type the full name or just “John”.
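The filter behaves like a substring search over kit names. A minimal sketch, assuming the match is case-insensitive (the article does not say) and using a hypothetical `filter_kits` function:

```python
def filter_kits(kit_names, text):
    """Return the kit names that contain `text`.

    Case-insensitive matching is an assumption for illustration.
    """
    needle = text.casefold()
    return [k for k in kit_names if needle in k.casefold()]
```

Because it is a substring search, typing “John” also keeps kits such as “Johnny Smith”, so typing more of the name narrows the list further.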
Select DNA Kit and Data Source
The DNA Kit dropdown lists all the kits in your database previously gathered with the DNAGedcom Client. CMA supports MyHeritage, FTDNA, 23andme, and Gedmatch kits.
After you select a kit, there are two methods for clustering your matches: DNA Kit or Match File/Chromosome/ICW File. Instead of clustering the matches as listed in the database, you can cluster the matches listed in a combination of Match, Chromosome, and ICW CSV files created from the database by the Client. A file selection form appears automatically when you select one of the available sets of CSV files listed at the bottom of the DNA Kit dropdown list.
For advanced users, the CSV option allows clustering of specially edited versions of the input files.
Match File/ICW File Method
Using a Match and ICW file for a kit instead of the data in the database has a few advantages:
- You can modify the Match or ICW files to eliminate unwanted matches. Note: a match only needs to be removed from one file or the other to be excluded (removing it from both is fine, too). For example, if you have previously tagged all the maternal matches in a kit, you could include or exclude matches based on that tag to focus on only maternal or only paternal matches.
- You can save your Match and ICW files before running a new gather to keep a “snapshot” of your current data, which you can run again in the future to compare against the latest results.
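Excluding tagged matches can be scripted instead of done by hand. A minimal sketch using Python's standard `csv` module; the column names `name` and `tag` and the function `drop_matches_by_tag` are hypothetical, so check the actual headers in your own Client export before adapting it:

```python
import csv
import io


def drop_matches_by_tag(match_csv_text, tag_column, excluded_tag):
    """Return the CSV text without rows whose `tag_column` equals `excluded_tag`.

    Column names are assumptions; the Client's real headers may differ.
    """
    reader = csv.DictReader(io.StringIO(match_csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Keep every row except those carrying the excluded tag.
        if row[tag_column] != excluded_tag:
            writer.writerow(row)
    return out.getvalue()
```

Since a match only needs to be missing from one file to be excluded, filtering just the Match file this way is enough.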
cM Range
This setting sets the lower and upper limits for the total cM shared by selected matches, whether you use the DNA Kit option or the Match/ICW option.
The default range is 50 to 400 cM, but try moving it up or down to target matches in particular relationship ranges. The wider the range, the longer the clustering process takes.
Different ranges yield different results for different kits, so running CMA with several ranges may offer more insight. The range values are appended to the end of the output file names.
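Conceptually, the cM Range acts as a simple window on each match's total shared cM. A minimal sketch, assuming inclusive bounds and matches given as `(name, total_cm)` pairs (both assumptions; `select_by_total_cm` is a hypothetical name):

```python
def select_by_total_cm(matches, low=50.0, high=400.0):
    """Keep matches whose total shared cM lies within [low, high].

    The 50-400 defaults mirror CMA's default range; inclusive bounds
    are an assumption for illustration.
    """
    return [(name, cm) for name, cm in matches if low <= cm <= high]
```

For example, with matches at 35, 50, 212.5, 400, and 875 cM, the default range keeps the middle three, while narrowing it to 100 to 300 cM keeps only the 212.5 cM match.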
Min SNPs
This dropdown lets you raise or lower the number of SNPs a shared segment must contain for the match to be selected. Chromosomes vary in their average SNP density and length. The default of 500 is an average value likely to produce matches on any chromosome. However, vendors' SNP criteria for matches vary between 500 and 700. Adjusting the minimum also gives you more precise control when screening matches on specific chromosomes with regions known to be more or less SNP-dense than average.
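A minimal sketch of a per-segment SNP threshold. The `Segment` type is hypothetical, and whether CMA compares with `>=` or `>` is an assumption:

```python
from typing import NamedTuple


class Segment(NamedTuple):
    match: str        # match kit name
    chromosome: int
    start: int        # start position
    end: int          # end position
    cm: float         # segment length in cM
    snps: int         # number of SNPs in the segment


def keep_segments_with_min_snps(segments, min_snps=500):
    """Keep segments with at least `min_snps` SNPs (500 mirrors CMA's default)."""
    return [s for s in segments if s.snps >= min_snps]
```

Raising the threshold toward 700 discards more short or SNP-poor segments, which trades fewer false positives for fewer matches.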
Chromosome
This dropdown menu lets you cluster all chromosomes or a single chromosome of your choice.
Include Non-ICW Segments
If selected, matches that do not fit into any cluster are also painted. This can be helpful because cross-cluster matches are still drawn, so you can see partial relationships. The density of unclustered matches can sometimes indicate endogamy, more recent pedigree collapse within a tree, or simply other connections among families over time.
Open HTML When Done
If selected, the HTML file (saved to the default database directory) will automatically open in your default browser.
CMA Output
CMA generates two versions of the Cluster Report. The primary output of CMA is an HTML file that is automatically displayed in your default browser and saved in the Db folder shown on the Settings page. An Excel file is also generated in the same folder.
Reading the HTML Cluster Report
Viewing your clusters
HTML format cluster maps are drawn with filled-in squares, each representing a match between two people. Clusters generally appear wherever groups of people match each other and the person whose kit is being analyzed. Mousing over any colored square shows the kit names of the two matches.
You will notice two types of color-coded squares:
- Solid color squares: Matches are assigned colors by row. Solid color squares for matches meeting cluster criteria are assigned cluster numbers.
- Black squares: The diagonal of solid black squares provides a reference point for easily locating a kit in the left column and the same person in the top row.
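The matrix layout can be illustrated with a tiny text rendering. This is a hypothetical sketch, not CMA's rendering code: `D` marks the diagonal reference squares, `X` marks a pair of matches that are ICW, and `.` marks an empty cell.

```python
def build_matrix(names, icw_pairs):
    """Return one string per row of an N x N match matrix.

    'D' = diagonal reference square (black in the HTML report),
    'X' = the two matches are ICW, '.' = no shared match.
    """
    index = {n: i for i, n in enumerate(names)}
    size = len(names)
    grid = [["." for _ in range(size)] for _ in range(size)]
    for i in range(size):
        grid[i][i] = "D"
    for a, b in icw_pairs:
        if a in index and b in index:
            i, j = index[a], index[b]
            grid[i][j] = grid[j][i] = "X"  # the matrix is symmetric
    return ["".join(row) for row in grid]
```

When rows are ordered so that ICW matches sit next to each other, their `X` cells form blocks along the diagonal, which is what a cluster looks like on the map.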
Each cluster consists of a set of numbered, solid color squares surrounded by a black line. Because membership in clusters is based on matches sharing other matches in common (ICW) with your selected kit, clusters indicate a shared line of descent. Often, although not necessarily, cluster members share the same ancestors within that line.
Colored squares without numbers that fall between two adjacent clusters indicate a supercluster. A CMA supercluster consists of a set of clustered matches sharing segments in close proximity on the same chromosome, indicating that a larger region of that side of the chromosome comes from the same general line.
It is essential to remember that while CMA cluster maps provide valuable clues to shared descent, they are not in themselves proof of specific ancestors, although cluster members are usually descendants of the same individual. Although superclustered matches share ancestry from the same broad line, their most recent common ancestors (MRCAs) may differ from cluster to cluster within the supercluster.
If you have a live internet connection and click a cluster, a popup menu will appear enabling you to see a list of cluster members or Chromosome Browser details (excluding Ancestry).
The HTML Cluster Report lets you array all matches by segment or, by clicking Toggle Cluster (top center of the graph), show only cluster members, with unclustered matches removed from view.
Match Details
Match details appear in the leftmost columns and in the last column on the right, where a chromosome browser shows segments visually. Details include:
- A tree icon linked to a publicly shared tree, if available and linked to the DNA kit. Unlinked trees may exist but are not indicated.
- Segment details, including start, end, cM, and SNPs
- The Match’s kit name – If available, clicking the name will open the Match page in a new tab.
- The color code assigned to the Match’s cluster
Reading the Excel Cluster Report
The Excel Report is a workbook with one worksheet per chromosome; select a worksheet by clicking its numbered chromosome tab along the bottom of the Excel window. Worksheets show the same data as the unclustered HTML view.
Unlike the HTML format, Excel data can easily be annotated, highlighted, or extracted. For example, where white rows alternate with color-coded rows, a closer look at segment details provides clues to which side of the tree those matches may be coming from: segments overlapping by 7 cM or more would likely belong to ICW matches and would therefore cluster IF they come from the same ancestral line as the clustered matches.
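The Excel sheets give start and end positions plus a cM value per segment, but not the cM value of an overlap. A rough sketch, under the stated assumption that cM is spread evenly across each segment's positions (real recombination rates vary along the chromosome, so treat the result as an estimate only); the function names are hypothetical:

```python
def overlap_cm(seg_a, seg_b):
    """Estimate the cM overlap of two segments on the same chromosome.

    Each segment is (start, end, cm). cM is assumed to be spread evenly
    over each segment's positions; the smaller of the two estimates is
    returned to stay conservative.
    """
    a_start, a_end, a_cm = seg_a
    b_start, b_end, b_cm = seg_b
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    est_a = a_cm * overlap / (a_end - a_start)
    est_b = b_cm * overlap / (b_end - b_start)
    return min(est_a, est_b)


def likely_icw(seg_a, seg_b, threshold_cm=7.0):
    # 7 cM mirrors the rule of thumb in the text above.
    return overlap_cm(seg_a, seg_b) >= threshold_cm
```

For a 20 cM segment spanning 10,000,000 to 30,000,000 and a 16 cM segment spanning 20,000,000 to 40,000,000, the shared half-span gives estimates of 10 and 8 cM, so the sketch reports 8 cM and flags the pair as a likely ICW match.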
Some important items to note:
- The colors of the clusters are arbitrary! They were chosen to visually distinguish clusters from each other. They repeat after 10, so even if two clusters are the same color, they don’t belong to the same cluster if they don’t touch!
- Although a cluster indicates a line of descent in common, the same line may have several clusters.
- The Cross-Cluster Matches (where there is strong indication a match also shares matches with another cluster) are sometimes as important as the clusters themselves. Treat them as clues for how the clusters are related.
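Because colors cycle, color alone cannot identify a cluster. A minimal sketch of the wrap-around, assuming clusters are numbered from 1 and colors are handed out round-robin from a 10-color palette (the exact assignment CMA uses is not documented here, and `color_slot` is a hypothetical name):

```python
PALETTE_SIZE = 10  # the report's colors repeat after 10


def color_slot(cluster_number):
    """Map a 1-based cluster number to a color slot from 0 to 9 (assumed round-robin)."""
    return (cluster_number - 1) % PALETTE_SIZE
```

Under this assumption, clusters 1 and 11 share a color slot, so only their position on the map (whether they touch) tells them apart.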