I made a language error, too. While my comment about zero-value crowd-sourced reporting is valid, the real exacerbating issue at more distant cousinships is a corollary: the "crowd" will seldom report a relationship where no DNA match has been reported by the testing/comparison companies. You don't know what you don't know. For example, last year AncestryDNA raised its matching threshold from 6cM to 8cM. Similarly, FTDNA recently stopped showing very small segments, no doubt because those small segments in the chromosome browser confused more people than they helped...and it conserves CPU cycles.
As relationships become more distant and the expected sharing amounts decrease, those amounts fall ever nearer the reporting companies' minimum thresholds, and the crowd-sourced values skew the ranges all the more.
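To illustrate the truncation effect (toy numbers, not real match data): simulate a batch of 4th-cousin pairs under some assumed sharing distribution, throw away everything below an 8cM reporting cutoff, and the average of what survives runs well above the true average, while the pairs who share nothing never show up at all. A minimal Python sketch, where the 50/50 zero-sharing split and the exponential mean are assumptions chosen only for illustration:

```python
# Toy illustration (not real data): how a reporting threshold inflates
# crowd-sourced averages for distant cousins.
import random

random.seed(1)

THRESHOLD_CM = 8.0    # e.g., AncestryDNA's current match cutoff
N_PAIRS = 100_000     # simulated 4th-cousin pairs

true_sharing = []
for _ in range(N_PAIRS):
    # Rough stand-in for 4C sharing: many pairs share no detectable DNA,
    # the rest share a small, exponentially distributed amount.
    if random.random() < 0.5:
        true_sharing.append(0.0)
    else:
        true_sharing.append(random.expovariate(1 / 13.0))  # mean ~13 cM

reported = [cm for cm in true_sharing if cm >= THRESHOLD_CM]

print(f"mean over all simulated pairs:   {sum(true_sharing) / len(true_sharing):5.1f} cM")
print(f"mean over reported matches only: {sum(reported) / len(reported):5.1f} cM")
print(f"pairs that never appear at all:  {1 - len(reported) / len(true_sharing):.0%}")
```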
I have less faith than you do in the average contributor to the crowd-sourced information. Some comments from a few years ago here. With the low end of the centiMorgan ranges being underreported by the very nature of the beast, even a single erroneous value at the ridiculously high end will distort the data. In his data summary, Blaine himself said that he manually caught and removed obvious errors in contributions, like the longest segment showing a greater cM value than the total amount of sharing reported; that some submitted relationship descriptions were indecipherable; that there were instances of text and no numerals being entered for cM values; and that he had at least one submission indicating 7cM of total sharing for a parent/child relationship. He also writes, "Some relationships were almost certainly entered incorrectly, which might be due to misunderstandings of 'removed' relationships in genealogy. Other relationship errors were clearly due to misattributed parentage events resulting in the believed relationship being incorrect."
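And a single high-end mistake doesn't just nudge the numbers. With made-up values (hypothetical 4C totals, not Project data):

```python
# Toy numbers only: the effect of one wildly wrong high-end entry on a
# small crowd-sourced sample of 4th-cousin totals (values in cM).
clean = [12, 15, 18, 20, 22, 25, 28, 30, 35, 40]
with_outlier = clean + [229]  # e.g., a 2C total mislabeled as 4C

for label, data in (("clean", clean), ("with one bad entry", with_outlier)):
    mean = sum(data) / len(data)
    print(f"{label:>18}: n={len(data):2d}  mean={mean:5.1f} cM  max={max(data)} cM")
```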
So, no. Pareto's law: only about 20% of the people using DNA for genealogy will have more than a rudimentary knowledge of it. The Shared cM Project is open to anyone for contributions--I've provided upwards of 50 entries myself--but no one has to show bona fides to indicate they know what they're doing. And a single contributor who mistakes a 2C1R for a 4C is introducing an error on the order of 800% in expected sharing.
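That 800% figure is just the standard expected-sharing arithmetic: the expected amount roughly halves with each additional meiosis, and a 4C (10 meioses) is three meioses more distant than a 2C1R (7 meioses). Assuming the usual 2 x (1/2)^m approximation and ~6800cM across both copies of the autosomes (round figures; individual results vary widely):

```python
# Back-of-the-envelope check on the "800%" figure.
TOTAL_CM = 6800                       # assumed total autosomal cM, both copies
cm_2c1r = TOTAL_CM * 2 * 0.5 ** 7     # 2C1R: 7 meioses  -> ~106 cM expected
cm_4c   = TOTAL_CM * 2 * 0.5 ** 10    # 4C:  10 meioses  -> ~13 cM expected
print(f"2C1R: {cm_2c1r:.1f} cM   4C: {cm_4c:.1f} cM   ratio: {cm_2c1r / cm_4c:.0f}x")
```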
Too, any evaluation of cousinship distance based on DNA ultimately comes down to probabilities. Probabilities can't be estimated unless some baseline or benchmark has been established. So in that regard I'd say that every question about autosomal DNA sharing amounts involves some median, average, baseline, or benchmark. We can't really avoid it. Hyperbole, but until the 1950s the Rapa Nui peoples of Easter Island were the most endogamous society on earth; they were simply too isolated geographically to have a diverse genetic pool. That all began to change with commercial air travel, but prior to that, if a single Rapa Nui had shown a verifiable 4th cousin relationship displaying 140cM of DNA sharing due to generations upon generations of pedigree collapse, that wouldn't be of any useful or practical value as a benchmark for anywhere else among the world's 7.8 billion people. Is such a result possible? Sure enough. But the odds against it in the general population would be astronomical, and I'll bet a Benjamin Franklin that you could split Blaine's histogram for Grouping #9 right down the middle of the chart, throw out everything on the right-hand side, and arrive at a more realistic value range.
AncestryDNA uses a hybrid approach to relationship estimation once the segment processing has gone through BEAGLE and Timber. They use actual data derived from "thousands of pairs of individuals with known family relationships," but they rely more heavily on a common bioinformatics practice: generating vast numbers of simulated individuals with controlled pedigrees in order to see what the distributions look like. For this they use the catchy term "in silico," meaning experimentation performed computationally (i.e., using silicon chips) as opposed to in vitro or in vivo. They incorporate no pedigree collapse into the modeling: all pairs of individuals in the simulations share exactly two ancestors or no ancestors. From those data they define IBD (identical by descent) intervals that correspond to the maximum-probability relationship estimates; they use full cousinship levels only, with no half or once-removed relationships.
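For what it's worth, here's the general flavor of that kind of simulation, very much a crude sketch and not AncestryDNA's actual pipeline: gene-drop labeled founder DNA through a fixed pedigree and tally how much two simulated descendants share. The chromosome lengths, the no-interference crossover model, and the 1cM grid are all simplifying assumptions of mine.

```python
import random

# Rough autosomal genetic lengths in cM (approximate, assumed values).
CHROM_CM = [286, 269, 223, 214, 204, 192, 187, 168, 166, 181,
            158, 175, 126, 119, 141, 134, 128, 117, 108, 108, 62, 74]

def meiosis(hap_a, hap_b):
    """Build one transmitted chromosome: walk a 1cM grid, switching between
    the two parental copies with probability 1% per cM (no interference)."""
    out, src = [], random.randrange(2)
    for a, b in zip(hap_a, hap_b):
        if random.random() < 0.01:
            src = 1 - src
        out.append(a if src == 0 else b)
    return out

def descendant(generations):
    """Chromosomes inherited along the descent line by someone 'generations'
    below a founder couple. Labels D1/D2/M1/M2 mark the couple's four founder
    haplotypes; None marks DNA from spouses outside the pedigree."""
    genome = []
    for length in CHROM_CM:
        from_dad = meiosis(['D1'] * length, ['D2'] * length)  # couple -> child
        from_mom = meiosis(['M1'] * length, ['M2'] * length)
        hap = meiosis(from_dad, from_mom)                     # child -> grandchild
        for _ in range(generations - 2):                      # later generations:
            hap = meiosis(hap, [None] * length)               # other parent unrelated
        genome.append(hap)
    return genome

def shared_cm(genome1, genome2):
    """Total cM at which the two descendants carry the same founder label."""
    return sum(1 for c1, c2 in zip(genome1, genome2)
                 for a, b in zip(c1, c2) if a is not None and a == b)

random.seed(7)
GENERATIONS = 5  # 5 generations below the couple on each side = 4th cousins
trials = [shared_cm(descendant(GENERATIONS), descendant(GENERATIONS))
          for _ in range(500)]
none_shared = sum(1 for t in trials if t == 0)
print(f"mean sharing across {len(trials)} simulated 4C pairs: "
      f"{sum(trials) / len(trials):.1f} cM")
print(f"pairs sharing nothing at all: {none_shared / len(trials):.0%}")
```

Run enough of those for each relationship class and you have an empirical distribution to define IBD intervals from, which is the basic idea behind using simulated pairs.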
There are additional steps after that to help ensure that closer relatives (those sharing more than 90cM) aren't impacted by computational phasing errors from the BEAGLE procedure, and to evaluate "IBD2," or places where segments are shared on both sides of the family (essentially FIR matches in addition to HIR). The number of shared segments, in addition to the amount of sharing, also factors in (though at the level of 5C and beyond, more than a single shared segment is unusual). The graph they present in their white paper, intended only as an example, goes out only as far as nine meioses, or birth events--equivalent to a 3C1R--and shows a range of 40-75cM, though they note that only IBD greater than or equal to 40cM is displayed. A step above that--at 8 meioses, equivalent to 3rd cousins and Blaine's Grouping #7--the range shown is 75-90cM (my extrapolation of Blaine's data there shows, at a 68% CI, a min/max of 29/115cM, so pretty close).
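For context on those meiosis counts, here are the bare theoretical point expectations under the same 2 x (1/2)^m approximation and ~6800cM assumption as above (observed crowd-sourced averages run higher, partly for the reporting-threshold reasons already discussed):

```python
# Theoretical expected sharing per meiosis count (point estimates only).
TOTAL_CM = 6800  # assumed total autosomal cM, both copies
for meioses, label in [(8, "3C"), (9, "3C1R"), (10, "4C")]:
    expected = TOTAL_CM * 2 * 0.5 ** meioses
    print(f"{meioses:2d} meioses ({label:>4}): ~{expected:5.1f} cM expected")
```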
Beyond a general description of the process and those broad ranges, I don't believe that AncestryDNA provides any additional insight into their actual data and operations.