The Russian Math Behind the Trillion Dollar Algorithm
Discover how a century-old Russian math feud led to the creation of Markov chains, shaping everything from Google's trillion-dollar algorithm to modern AI. Dive deeper into this fascinating story and explore the math behind the technology that powers our world today!
Mathematics often seems like an abstract concept, but its applications can have profound impacts on the world around us. From predicting the next word in a sentence to understanding how Google’s search engine works, math provides the foundation for many systems we take for granted. One particularly fascinating chapter in the history of mathematics is the Russian feud between two mathematicians, Pavel Nekrasov and Andrey Markov. Their rivalry in the early 20th century not only revolutionized probability theory but also laid the groundwork for the modern algorithms that power technologies like Google and predictive text.
The Roots of the Russian Feud: A Nation Divided
In 1905, Russia was in the midst of political turmoil. Socialist groups, seeking to overthrow the Tsarist regime, were demanding complete political reform. On the other side stood the Tsarists, who wanted to maintain the status quo. This division didn't just affect politics; it seeped into every facet of Russian society, including the mathematical community. Mathematicians, too, began to take sides.
Pavel Nekrasov, a deeply religious and powerful figure, aligned himself with the Tsarist camp. He believed that mathematics could be used to explain divine will and free will itself. His intellectual adversary, Andrey Markov, rejected such notions. A staunch atheist, Markov had no patience for what he saw as unscientific ideas. Their bitter feud focused on one central idea: the law of large numbers, which had been the cornerstone of probability theory for over 200 years.
The Law of Large Numbers and the Coin Flip
The law of large numbers, first proven by Jacob Bernoulli in 1713, states that the average result of an independent random experiment (like flipping a coin) will approach the expected value as the number of trials increases. For instance, if you flip a coin 10 times, you might get 6 heads and 4 tails. But as you increase the number of flips, the ratio of heads to tails will converge toward 50/50.
To illustrate this, imagine you flip a coin 100 times and get 51 heads and 49 tails. This result aligns almost exactly with the expected outcome of 50 heads and 50 tails. The law of large numbers, in simple terms, says that as you conduct more independent trials, the average result will stabilize near the expected value.
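A few lines of Python make this concrete. Here is a minimal sketch, using only the standard library, that simulates fair coin flips and prints how the fraction of heads settles toward 0.5 as the number of trials grows:

```python
import random

def fraction_of_heads(num_flips: int) -> float:
    """Simulate fair coin flips and return the fraction that came up heads."""
    heads = sum(random.random() < 0.5 for _ in range(num_flips))
    return heads / num_flips

# With more independent trials, the observed average stabilizes
# near the expected value of 0.5, as the law of large numbers predicts.
for n in (10, 100, 10_000, 1_000_000):
    print(f"{n:>9} flips -> fraction of heads: {fraction_of_heads(n):.4f}")
```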
The Divergence: Independent vs. Dependent Events
For two centuries, probability theory relied on the assumption that events must be independent for the law of large numbers to hold true. However, Markov challenged this notion. He believed that dependent events—where the outcome of one event influences the outcome of the next—could also follow the law of large numbers. To prove this, he used a simple example involving text.
Markov noted that the occurrence of certain letters in a sequence of text (like vowels and consonants) is not independent. For instance, the likelihood that the next letter is a vowel depends heavily on whether the current letter is one. To test this hypothesis, Markov analyzed the first 20,000 letters of Alexander Pushkin's famous verse novel "Eugene Onegin." He discovered that certain pairs of letters, such as vowel-vowel or consonant-vowel pairs, appeared with much higher or lower frequencies than would be expected if the letters were independent. This provided evidence that the events were, indeed, dependent.
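We can replay a toy version of this experiment in a few lines of Python. The sketch below uses a short English sentence as a stand-in for Pushkin's Russian text (the corpus and helper names are illustrative, not Markov's actual data or procedure) and tallies how often each vowel/consonant pair occurs:

```python
from collections import Counter

def transition_counts(text: str) -> Counter:
    """Count how often each vowel/consonant pair occurs among consecutive letters."""
    vowels = set("aeiou")
    letters = [c for c in text.lower() if c.isalpha()]
    labels = ["V" if c in vowels else "C" for c in letters]
    return Counter(zip(labels, labels[1:]))

sample = "whatever is worth doing at all is worth doing well"
counts = transition_counts(sample)
total = sum(counts.values())
for (a, b), n in sorted(counts.items()):
    print(f"{a} -> {b}: {n / total:.3f}")
```

If consecutive letters were truly independent, each pair frequency would simply be the product of the individual vowel and consonant frequencies; on real text they are not, which is exactly the dependence Markov measured.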
The Birth of Markov Chains
Markov’s groundbreaking work on dependent events led to the creation of Markov chains, a mathematical system that models sequences of dependent events. In a Markov chain, the outcome of each event depends only on the current state, not on previous events. This concept would later become foundational in fields ranging from physics to economics to computer science.
Markov's chain model works by defining a set of states (e.g., vowels and consonants) and the probabilities of transitioning from one state to another. For example, if you're in the "vowel" state, there is a certain probability that the next letter will be another vowel and a complementary probability that it will be a consonant. As the chain runs and repeatedly transitions between states, the proportion of time spent in each state converges to a stable distribution, echoing what the law of large numbers guarantees for independent trials.
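Here is a minimal sketch of that convergence, with made-up transition probabilities for a two-state vowel/consonant chain (the numbers are illustrative, not Markov's measured values):

```python
import numpy as np

# Hypothetical transition matrix: P[i][j] is the probability of
# moving from state i to state j (state 0 = vowel, state 1 = consonant).
P = np.array([
    [0.2, 0.8],   # from a vowel:     20% vowel next, 80% consonant next
    [0.6, 0.4],   # from a consonant: 60% vowel next, 40% consonant next
])

state = np.array([1.0, 0.0])  # start certain we are on a vowel
for _ in range(20):
    state = state @ P          # apply one transition of the chain

# Converges to about [0.4286, 0.5714] regardless of the starting state.
print("long-run distribution:", state.round(4))
```

Run the same loop starting from `np.array([0.0, 1.0])` and the chain settles on the same distribution; this insensitivity to the starting point is the Markov-chain analogue of the stabilization the law of large numbers describes.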
The Monte Carlo Method: From Solitaire to Nuclear Physics
Markov's ideas would have far-reaching consequences. One of the most significant applications came in the 1940s, when the mathematician Stanislaw Ulam, working on the Manhattan Project, used chains of random events to simulate the behavior of neutrons inside a nuclear bomb. The problem was that calculating all the possible outcomes of neutron interactions directly was virtually impossible. Ulam had a flash of insight drawn from Solitaire, a game he had played to pass the time while recovering from encephalitis: rather than computing his odds of winning, he could estimate them simply by playing many hands and counting the wins. What if random sampling could approximate the neutron problem in the same way?
This led to the creation of the Monte Carlo method, a statistical technique for solving complex problems by generating random samples. Ulam shared his idea with his colleague, John von Neumann, who recognized its potential. Together, they used Monte Carlo simulations to model neutron behavior inside nuclear reactors and bombs. This method became a cornerstone of nuclear physics and is now widely used in various fields, from finance to climate modeling.
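The classic toy illustration of the Monte Carlo idea is estimating pi: scatter random points in a unit square and count how many land inside the quarter circle it contains. This is a minimal sketch of the sampling principle, not the neutron calculations Ulam and von Neumann actually ran:

```python
import random

def estimate_pi(samples: int) -> float:
    """Estimate pi by random sampling: the quarter circle of radius 1
    covers pi/4 of the unit square, so scale the hit rate by 4."""
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4 * inside / samples

print(estimate_pi(1_000_000))  # approaches 3.14159... as samples grow
```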
Markov Chains and the Rise of Google
Fast forward to the 1990s, when the internet was rapidly growing and the need for an efficient search engine became crucial. Early search engines like Yahoo and Lycos used simple keyword matching to rank pages, but they were easily manipulated. Enter Google.
Larry Page and Sergey Brin, the founders of Google, used a revolutionary idea based on Markov chains. They realized that instead of ranking web pages by how often a keyword appeared, they could model the web as a Markov chain, where each page is a "state" and the links between pages represent transitions. A page would be considered more "important" if many pages linked to it, and especially if those linking pages were themselves important.
This idea became the foundation of PageRank, Google's ranking algorithm. By modeling the web as a Markov chain, PageRank could score each page by its importance within the web's link structure; combining that score with relevance to the search term produced a search engine that consistently returned better, more accurate results.
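The heart of PageRank can be sketched as a power iteration on a tiny, made-up four-page web. The damping factor of 0.85 matches the value reported in the original PageRank paper, but everything else below is an illustrative simplification, not Google's production algorithm:

```python
import numpy as np

# A hypothetical four-page web: links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n = len(links)

# Column-stochastic matrix: a "random surfer" on a page follows one of
# its outgoing links uniformly at random.
M = np.zeros((n, n))
for page, outgoing in links.items():
    for target in outgoing:
        M[target, page] = 1.0 / len(outgoing)

damping = 0.85               # value used in the original PageRank paper
rank = np.full(n, 1.0 / n)   # start with equal rank everywhere
for _ in range(100):
    # With probability 0.15 the surfer jumps to a random page;
    # otherwise they follow a link from their current page.
    rank = (1 - damping) / n + damping * M @ rank

print(rank.round(3))  # page 2, with the most inbound links, ranks highest
```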
Predictive Text and Modern AI
The power of Markov chains doesn't stop at web search. Markov models have long been used for predictive text, and modern artificial intelligence follows the same spirit. Large language models like GPT-3 still predict the next word from what came before, but they use more advanced techniques, such as attention mechanisms, to condition on far more context than a Markov chain's single current state.
In predictive text, the model looks at the sequence of words (or tokens) you've typed so far and predicts the next word based on probabilities learned from vast amounts of text data. While Markov chains used to be limited to simple word pairs or letter pairs, modern models like GPT-3 consider the entire context of a sentence to make predictions.
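A word-level Markov chain predictor fits in a dozen lines (the tiny corpus and function names below are illustrative). It records which words follow which in a training text and samples the next word in proportion to those counts; a model like GPT-3 is vastly more sophisticated, but the prediction loop has a recognizably similar shape:

```python
import random
from collections import defaultdict

def build_model(text: str) -> dict:
    """Map each word to the list of words that followed it in the text."""
    words = text.split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def predict_next(model: dict, word: str) -> str:
    """Sample a next word in proportion to how often it followed `word`."""
    followers = model.get(word)
    return random.choice(followers) if followers else "?"

corpus = "the cat sat on the mat and the cat ran on the grass"
model = build_model(corpus)
print(predict_next(model, "the"))  # most likely "cat", since "the cat" is common
```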
The Trillion Dollar Algorithm
The impact of Markov chains extends far beyond Google. From AI-driven content generation to weather prediction to the behavior of financial markets, Markov chains are used to model systems where the future depends only on the present, not the past. And Google, the company built on a Markov-chain-based ranking algorithm, is now valued at well over a trillion dollars, illustrating the profound effect these mathematical ideas have had on the modern world.
As we've seen, a century-old mathematical feud in Russia, sparked by the differing views of Pavel Nekrasov and Andrey Markov, has had far-reaching consequences. From the birth of Markov chains to the development of the Monte Carlo method, these ideas laid the groundwork for some of the most influential algorithms in modern technology. Today, their legacy lives on in everything from search engines to artificial intelligence.
Conclusion: The Power of Mathematics
What started as a feud over free will and probability has evolved into one of the most important areas of mathematics, shaping everything from computer algorithms to the global economy. Markov chains have become a foundational tool in understanding complex systems and making predictions about the future. Whether it's predicting the next word in a sentence or simulating the behavior of neutrons in a nuclear reactor, Markov chains provide a powerful and efficient way to model and solve problems that would otherwise be too complex to handle. As we move forward into the age of AI and machine learning, the Russian math that started over a century ago continues to shape the future in profound ways.

