Cahier de Généalogie

Shapes in the Great Tree


The tree, with its branches and roots, is an old, powerful and pervasive metaphor in genealogy. But traditional genealogical trees, even the largest and deepest ones, are only very partial and often biased local maps, spanning a handful of recent human generations, a tiny part of a long human history. And humanity itself is only a short and very recent branch of the long evolution of life on Earth, a continuous process spanning billions of years.

Large genealogical databases such as WikiTree interconnect tens of millions of profiles from all over the world. Applications and tools have been developed that allow users to explore this huge family network locally, monitor its growth and expansion, compute distances and shortest paths, and discover patterns of clustering and crossovers. But we still can't really picture the global geometry of the genealogical network connecting all humans, nor, beyond them, the shape of the Great Tree of life, to which the ancient Chinese words of wisdom seem to apply: the Great Image has no shape.

The Tree of Life is not a tree

We are all cousins. Not only all of us humans, but all of us living creatures on this planet. At least that's the main lesson to take home from any introduction to phylogenetics, even if many details of the story are still unclear. Somewhere on Earth, billions of years ago, lived some of our single-celled common ancestors, and part of their genetic code has been copied over and over through a number of generations that defies our imagination.

The story of evolution has not been as simple as the attractive, highly graphic Tree of Life proposed by Evogeneao might suggest. Descending lines have kept separating, evolving in niches, and re-crossing each other. And this has been true at all times and scales of evolution, across and within species, including ours. The more we learn about the origins of the human family, from both archaeological findings and genetic studies, the more obvious it becomes that we must forget the idea of a single forking point in evolution from which humanity would have started. Different tribes left their ancestral dwellings in Africa at different times, migrated all over the world, separated for tens of thousands of years, evolved differently, and met again to either fight each other over territories and resources, or ally and mate if they could, and probably both, most of the time. This story has happened again and again until today, through successive migrations and invasions, and is likely to carry on as long as there is life on this planet.

Our family stories follow, on a smaller scale, similar patterns of separation and reconnection. What genealogists call "pedigree collapse" (as if it were a catastrophic event) or "endogamy" (as if it were a shameful disease of populations closed in on themselves by geography, language, religion or social class) is in fact a general rule, the consequence of a simple mathematical fact: ancestral branches necessarily cross each other at some point in time. If we don't see it in our particular branches, it's just because we can't trace them far enough back in time. If we could identify all our ancestors over just thirty generations (a very small part of the whole human story), it would become obvious to anyone. Each generation back doubles the number of ancestor positions, so thirty generations without pedigree collapse would mean 2^30 (about 1.07 billion) ancestors in a single generation somewhere in the Middle Ages, far exceeding the world population at that time. More on this, with mathematical details, can be found on this page (in French).
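
The doubling argument can be checked with a few lines of Python. This is only a sketch: it counts ancestor positions (2 parents, 4 grandparents, and so on) assuming no pedigree collapse at all, and the medieval world population used for comparison (400 million around 1200) is a rough assumption for illustration, since historical estimates vary. With 25 to 30 years per generation, thirty generations take us back roughly 750 to 900 years.

```python
def ancestor_slots(generations: int) -> int:
    """Ancestor positions `generations` steps back, assuming no pedigree
    collapse, i.e. every position is held by a distinct person."""
    return 2 ** generations


def first_generation_exceeding(population: int) -> int:
    """Smallest number of generations back at which the ancestor
    positions outnumber the given population."""
    generations = 0
    while ancestor_slots(generations) <= population:
        generations += 1
    return generations


# Assumption: a rough figure for the world population around 1200;
# historical estimates vary, this value is for illustration only.
ASSUMED_WORLD_POPULATION_1200 = 400_000_000

for g in (10, 20, 30):
    print(f"{g} generations back: {ancestor_slots(g):,} ancestor positions")
print(first_generation_exceeding(ASSUMED_WORLD_POPULATION_1200))
```

With this assumed population, the ancestor positions already outnumber all humans alive 29 generations back, and at 30 generations they pass one billion.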

Connecting trees
They can't see the Tree for the branches

Many newcomers to genealogy, and even many longtime genealogists, don't seem to be fully aware of this state of affairs. Or maybe they don't buy the narrative built up by centuries of scientific enquiry. In any case, they have a hard time understanding, and accepting, what a Single Tree approach to genealogy is all about. Aware of this difficulty, WikiTree's communication tries hard to shift the user's perspective from "my tree" to "our tree", but such efforts are unfortunately often misunderstood.

"A tree we all share" can be mistaken for a mere shift from individual to collaborative management of the user's genealogical tree. That shift is indeed an important aspect of the game, and not the easiest one to accept. Conflicts will happen, and conversation and mediation will be needed, since "my tree" is now open not only to "my family", but also to unknown distant cousins, or even total strangers called "data doctors", "connectors" or "sourcerers", all self-authorized to meddle with "my tree" for the sake of obscure concepts such as data quality and accuracy, or graph connectivity. These social management issues can hide the main point: there is no such thing as "my tree" or "my family" neatly cut off from the rest of the world, and building a single interconnected family network is what the project is all about. We are not planting a dumb forest where each tree lives a separate life. To keep the metaphor rolling, it's important for newcomers to understand that the family data they bring and manage are just small branches, which will really come to life and make sense only when connected, through as many paths as possible, to the Single Tree.

To help newcomers understand that crucial point, WikiTree makes a simple and somewhat obvious claim in its introduction to the Single Tree (see above link): "We are all related". Such a claim can in fact be counter-productive. The page uses both "related" and "connected" without clearly explaining the difference, and goes on to focus on common ancestors. "You only have to go back a few hundred years before many of us start to share common ancestors." True again, but "many of us" does not mean "all of us", and "a few" can still be too many for many (most) of us. It is completely correct that all living humans have common ancestors (see above). But it's equally true that identifying and naming a single one of them with a reasonable level of confidence, for two random humans living today, is anything but obvious. More often than not, the documented ancestors won't go far enough back in time. Supposing both of us, dear reader, have European origins, it's pretty certain that many of our common ancestors were living around 1200, but the rare branches we can both follow that far back, if any, will represent only a ridiculously small part of our millions of ancestors at that time, and will be known with a poor level of confidence, given the number of generations needed to reach them. Moreover, those branches are likely to turn out to be part of the aristocracy (at large), a thin minority who left some record of their life events. The commoners of those times, an overwhelming majority of the ancestors of most of us, have not left a single document, and what is left of them is randomly spread in bits of our genetic code. Showing off a handful of dubious notable "deep ancestors" gives a totally biased view of the big picture, like pretending that thousands of light-years away in deep space dwell only a few giant stars, just because at such distances we can't see the millions of fainter ones.
More on this here: Anne de Bretagne et Bételgeuse (in French).

That said, connected we are, for sure, and we don't need to look deep into the mists of centuries to find out how. According to the WikiTree Connection Finder, about 85% of profiles in WikiTree belong to the main connected component of the family network, aka the "Single Tree", and this proportion has been growing steadily over the years. Those who are not yet connected will eventually be, provided someone takes the time to look at them seriously, or by sheer serendipity as the network expands. But, as seen above, most of us are not stricto sensu related in the database, that is, pairwise identified as direct cousins. At the end of 2021, the cool "My Cousins" WikiTree application listed fewer than twenty of my living direct cousins in the database (my branch belongs to a rather remote suburb of the tree). Yet at the same time the Connection Finder says I'm connected to over 24 million profiles...
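
To make the notion of a "main connected component" concrete, here is a minimal sketch on a tiny hypothetical family graph. It is not how the WikiTree Connection Finder works internally (its implementation is not described here); it simply treats profiles as nodes and any recorded relationship (parent, child, sibling, spouse) as an undirected link, finds the connected components with a breadth-first search, and reports the share of profiles in the largest one. All names are invented.

```python
from collections import deque


def build_graph(edges, isolated=()):
    """Undirected adjacency sets from (profile, profile) relationship pairs."""
    adjacency = {profile: set() for profile in isolated}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def components(adjacency):
    """Connected components, found by breadth-first search."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component, queue = {start}, deque([start])
        while queue:
            for neighbour in adjacency[queue.popleft()]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        result.append(component)
    return result


def main_component_share(adjacency):
    """Fraction of all profiles that belong to the largest component."""
    largest = max(components(adjacency), key=len)
    return len(largest) / len(adjacency)


# Hypothetical toy data: a five-profile family, a separate couple, a loner.
edges = [("Anna", "Bert"), ("Anna", "Carl"), ("Bert", "Carl"),
         ("Carl", "Dora"), ("Dora", "Emil"), ("Finn", "Gerd")]
family = build_graph(edges, isolated=["Hugo"])
print(main_component_share(family))

# A newly documented marriage links the separate couple to the main component.
family = build_graph(edges + [("Emil", "Gerd")], isolated=["Hugo"])
print(main_component_share(family))
```

Here the main component holds 5 of the 8 profiles (62.5%); a single added marriage between the hypothetical Emil and Gerd brings it to 7 of 8 (87.5%), which is the kind of step by which profiles not yet connected join the Single Tree.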

Oak Years
Credit: Eva Ekeblad, WikiTree image

A slice in the tree: seven generations around 1900

Let's face it: the intertwined roots of the Great Tree are lost in the depths of time, and its branches keep growing and crossing each other towards an unpredictable future. Both tips are forever out of reach, and all we can explore in our genealogical research is but a thin slice of a few centuries in a story of billions of years. But all the life streams flowing from those distant sources pass through every slice of time, and if we focus on the geometry of such a slice while keeping in mind the big picture it is cut from, what can we see, and what does it tell us about the tree at larger scales?

Consider such a slice of the human race, centered on the year 1900 and spanning two centuries, roughly seven generations. The middle generation was born in the very late 1800s or early 1900s, in the Belle Époque. Their parents were typically born in the 1870s, grandparents in the 1840s, great-grandparents in the very early 1800s. Their children were born during or just after World War I, grandchildren are baby-boomers of the 1950s, and great-grandchildren belong to the so-called Generation Y, born in the late 1900s. Most living genealogists belong to the two latter generations, although some of the youngest belong to the following one, born in the early 2000s.

When using WikiTree to explore those seven generations, the first figure to be aware of is the distribution of profiles over time in the database. As of 2021, over 95% of WikiTree profiles were born after 1700, and almost 80% of them after 1800. The second figure to bear in mind is that 1800s profiles in WikiTree, in the current state of affairs, represent less than 1% of the world population during the same period. Nevertheless, even in this very partial state, the shortest paths linking two of those post-1800 profiles, as computed by the Connection Finder, rarely venture before 1700, and often stay after 1800, using a lot of transversal links via spouses, siblings, in-laws, and cousins of cousins. Adding even a small fraction of the remaining 99% can only provide more transversal bridges and shortcuts to the existing paths, and reduce the mean distances. It is not too bold a conjecture to say that all people in those seven generations could be connected without any need to go further back in the past. The Connect 1900 project is exploring this conjecture, adding transversal relationships (siblings, spouses and in-laws) to the traditional vertical exploration of the tree, and looking in all four directions to expand "circles", as defined by the 100 Circles project.
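
The effect of transversal links can be illustrated with a toy example. The sketch below, again hypothetical and not the Connection Finder's actual algorithm, runs a breadth-first search for a shortest path between two profiles born around 1900, first with parent-child links only, then with sibling and spouse links added. All profile names and birth years are invented for illustration.

```python
from collections import deque

# Hypothetical profiles and birth years, invented for illustration.
BIRTH_YEAR = {
    "A": 1900, "PA": 1870, "GA": 1840, "GGA": 1800, "SA": 1898,
    "B": 1902, "PB": 1872, "GB": 1842, "GGB": 1805, "SB": 1901,
    "X": 1690,  # distant common ancestor of A and B
}

# Parent-child links only: the "vertical" view of the tree.
VERTICAL = [("A", "PA"), ("PA", "GA"), ("GA", "GGA"), ("GGA", "X"),
            ("B", "PB"), ("PB", "GB"), ("GB", "GGB"), ("GGB", "X"),
            ("SA", "PA"), ("SB", "PB")]
# Transversal links: A and SA are siblings, B and SB too, and SA married SB.
TRANSVERSAL = [("A", "SA"), ("B", "SB"), ("SA", "SB")]


def build_graph(edges):
    """Undirected adjacency sets from (profile, profile) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search: one shortest path as a list, or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


for label, edges in (("vertical only", VERTICAL),
                     ("with transversal links", VERTICAL + TRANSVERSAL)):
    path = shortest_path(build_graph(edges), "A", "B")
    print(label, len(path) - 1, "steps, earliest birth",
          min(BIRTH_YEAR[p] for p in path), "->", " - ".join(path))
```

With vertical links only, the shortest path has 8 steps and climbs back to a common ancestor born in 1690; once the siblings SA and SB are recorded as married, a 3-step path appears that never leaves the years around 1900. Since adding links can never lengthen an existing shortest path, each new transversal link either leaves a distance unchanged or shortens it.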

Watch out for crossovers

Circular exploration of the generations around 1900 uncovers a geometry of endogamic clusters linked by rare but crucial (pun intended) crossovers. The endogamic clusters are geographical, social, ethnic, religious, economic... or any mixture of these ingredients. The main geographical clusters in WikiTree are in North America, hence the most visible crossovers are immigrants to America from everywhere else, an obvious bias in the geometry of WikiTree. Behind the main American cluster hide national and regional clusters in Europe, South Africa, Australia and New Zealand. A notable and particular cluster is the European aristocracy, known to be very endogamic, but nevertheless crossed by many connection paths, even those linking commoners.

In any connection path linking two profiles belonging to distant clusters, watch out for crossovers, which are often easy to spot in the path. Without them, the Great Tree would look like a forest of separate trees, and each crossover tells a singular story of how we are tied into a single big family: migrants, runaways fleeing misery, war, or simply family quarrels. But also carefully thought-out alliances between name and fortune, which allowed the old European aristocracy to survive the end of the Ancien Régime and remain alive and wealthy to this day. And of course artists, writers, musicians, actors or scientists, all people of cosmopolitan culture, marrying (often several times) across borders. Less noticeable at first sight are the children of generations of farmers who left their ancestral countryside, looking for a better life in the expanding towns and industries of the late 1800s, and met there other migrants of similarly poor social extraction, coming from other rural regions of the same country, or from abroad.

Circles in a Circle
Vassily Kandinsky, 1923 - Circles in a Circle
Source: Wikimedia Commons

The Shape of Things to Come

Of course we can see no further into the future than into the past, and probably not even as far. But, as in the famous novel by H. G. Wells, we can try to anticipate what the next slices of the Great Tree could look like, based on past and recent trends. One of the most noticeable of those trends in the 1900s and early 2000s has been the continuing decrease of both geographical and social endogamy. This trend was already visible around 1900 in the Western World, as said in the previous section, but it has now become global. Here is a short list of relevant resources about it.

  • World Migration Report 2022 of the International Organization for Migration (United Nations): the number of migrants has kept growing steadily worldwide since the beginning of the 21st century, both within countries and across international borders.
  • Trends and patterns in intermarriage, a 2017 study by the Pew Research Center: the proportion of new marriages in the USA between people of different ethnic origins grew from 3% in the late 1960s to 17% in 2015.
  • 2021 Gallup poll on approval of interracial marriage in the USA, showing that approval grew from 4% in the late 1950s to an amazing 94% in 2021.
  • Urbanization: the share of the world population living in towns has grown from less than 20% in 1900 to over 50% today. And we have seen that life in towns, along with higher levels of education, is a major cause of exogamic encounters.

Even though most unions in the world are still endogamic today, the proportion of crossovers has been growing steadily, each one shortening the distances between endogamic clusters, and globally we are getting closer and closer to each other in the great family network.

Figures in WikiTree mirror the global trends. We have been monitoring the distribution of distances in WikiTree for over a year now in the framework of the 100 Circles project. For all focus profiles, the mean distance to the 20+ million connected profiles has decreased slowly but steadily over time, and we see no reason for this trend to change as the database keeps growing. Typical shortest distances between random post-1800 profiles are currently in the 20 to 30 range. Various napkin computations converge to indicate that this value could eventually fall below 20 for most profiles. See Ten circles meet each other for more details.
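
Circles and mean distances can both be read off a single breadth-first search from a focus profile. The sketch below assumes that "circle n" means the set of profiles at distance n from the focus profile, and uses a deliberately artificial chain of ten profiles, labeled 0 to 9, rather than real data; it then adds one hypothetical crossover link between profile 0 and profile 5 and recomputes.

```python
from collections import Counter, deque


def distances_from(adjacency, focus):
    """Breadth-first search distances from the focus to every reachable profile."""
    distance = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


def circles_and_mean(adjacency, focus):
    """Circle sizes (distance -> number of profiles) and mean distance."""
    others = [d for profile, d in distances_from(adjacency, focus).items()
              if profile != focus]
    circles = dict(sorted(Counter(others).items()))
    return circles, sum(others) / len(others)


def chain(n):
    """Artificial network: profiles 0..n-1, each linked to the next one."""
    adjacency = {i: set() for i in range(n)}
    for i in range(n - 1):
        adjacency[i].add(i + 1)
        adjacency[i + 1].add(i)
    return adjacency


network = chain(10)
print(circles_and_mean(network, 0))

# One hypothetical crossover linking two distant parts of the chain.
network[0].add(5)
network[5].add(0)
print(circles_and_mean(network, 0))
```

On this toy chain the mean distance from profile 0 falls from 5.0 to about 2.56 after a single crossover, and the circles become fewer and more populated. Real networks are far less regular, but each added crossover can only shorten distances or leave them unchanged.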

