1 : Ramsey R(k, k)

Let us repeat the key paragraph nearly verbatim. Erdős defines R(k, l) as the least integer such that any graph G on n ≥ R(k, l) vertices either contains a complete graph of order k or its complement G′ contains a complete graph of order l.

Theorem 1.

Let k ≥ 3, then

$$2^{k/2} < R(k,k) \leq \binom{2k-2}{k-1} < 4^{k-1}.$$

Proof. The second inequality was proved by Szekeres, thus we only consider the first one. Let \(N \leq {2}^{k/2}\). Clearly the number of graphs on N vertices equals \({2}^{N(N-1)/2}\). (We consider the vertices of the graph as distinguishable.) The number of different graphs containing a complete graph of order k is less than

$$\binom{N}{k}\frac{2^{N(N-1)/2}}{2^{k(k-1)/2}} < \frac{N^{k}}{k!}\,\frac{2^{N(N-1)/2}}{2^{k(k-1)/2}} < \frac{2^{N(N-1)/2}}{2} \qquad (*)$$

since by a simple calculation, for \(N \leq {2}^{k/2}\) and k ≥ 3,

$$\displaystyle{2{N}^{k} < k!{2}^{k(k-1)/2}.}$$

But it follows immediately from (∗) that there exists a graph such that neither it nor its complementary graph contains a complete subgraph of order k, which completes the proof of the Theorem. □

Erdős used a counting argument above; in more modern language we would speak of the random graph G ∼ G(N, p) with \(p = \frac{1}{2}\). The probability that G contains a complete graph of order k is less than

$$\binom{N}{k}2^{-k(k-1)/2} < \frac{N^{k}}{k!}\,2^{-k(k-1)/2} < \frac{1}{2}$$

(calculations as in the original paper) and so the probability that G or its complement \(\overline{G}\) contains a complete graph of order k is less than one, so that with positive probability neither does, and therefore there exists a G as desired. Erdős has related that after lecturing on his result the probabilist J. Doob remarked “Well, that’s very nice but it really is a counting argument.” For this result the proofs are nearly identical, the probabilistic proof having the minor advantage of avoiding the annoying \({2}^{N(N-1)/2}\) factors. Erdős writes interchangeably in the two styles. As the methodology has progressed the probabilistic ideas have become more subtle and today it is quite rare to see a paper written in the counting style. We’ll take the liberty of translating Erdős’s later results into the more modern style.
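As a quick illustration, not in the original paper, one can ask for the largest N that this union bound certifies for a given k. The hypothetical helper below does so with Python's `math.comb`, checking \(2\binom{N}{k}{2}^{-k(k-1)/2} < 1\).

```python
from math import comb

def certified_lower_bound(k):
    """Largest N with 2 * C(N, k) * 2^(-C(k, 2)) < 1, so that with positive
    probability neither G(N, 1/2) nor its complement contains a K_k."""
    N = k
    while 2 * comb(N + 1, k) < 2 ** (k * (k - 1) // 2):
        N += 1
    return N

for k in (3, 5, 10, 20):
    print(k, certified_lower_bound(k), round(2 ** (k / 2), 1))
# The certified bound comfortably exceeds 2^(k/2) for every k >= 3.
```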

The gap between \({2}^{k/2}\) and \({4}^{k}\) for R(k, k) remains one of the most vexing problems in Ramsey Theory and in the Probabilistic Method. All improvements since this 1947 paper have been only to smaller order terms so that even today \(\lim R{(k,k)}^{1/k}\) could be anywhere from \(\sqrt{2}\) to 4, inclusive. Even the existence of the limit has not been shown!

2 : Sidon Conjecture

Let S be a set of positive integers. Define \(f(n) = f_{S}(n)\) as the number of representations \(n = x + y\) where x, y are distinct elements of S. We call S a basis if f(n) > 0 for all sufficiently large n. Sidon, in the early 1930s, asked if there existed “thin” bases; in particular he asked if for all positive \(\varepsilon\) there existed a basis with \(f(n) = O({n}^{\varepsilon })\). Erdős heard of this problem at that time and relates that he told Sidon that he thought he could get a solution in “a few days”. It took somewhat longer. In 1941 Erdős and Turán made the stronger conjecture that there exists a basis with f(n) bounded from above by an absolute constant—a conjecture that remains open today. In 1955 Erdős [1] resolved the Sidon conjecture with the following stronger result.

Theorem 2.

There exists S with \(f_{S}(n) = \Theta (\ln n)\).

Proof. The proof is probabilistic. Define a random set by \(\Pr [x \in S] = p_{x}\), the events being mutually independent over integers x, setting

$$\displaystyle{p_{x} = K{\bigg(\frac{\ln x} {x}\bigg)}^{1/2}}$$

where K is a large absolute constant. (For the finitely many x for which this is greater than one simply place x ∈ S.) Now f(n) becomes a random variable. For each x < y with \(x + y = n\) let \(I_{xy}\) be the indicator random variable for x, y ∈ S. Then we may express \(f(n) =\sum I_{xy}\). From Linearity of Expectation

$$\displaystyle{E[f(n)] =\sum E[I_{xy}] =\sum p_{x}p_{y} \sim {K}^{{\prime}}\ln n}$$

by a straightforward calculation.

Let’s write \(\mu =\mu (n) = E[f(n)]\). The key ingredient is now a large deviation result. One shows, say, that

$$\begin{aligned}
\Pr[f(n) < \tfrac{1}{2}\mu] &< e^{-c\mu}\\
\Pr[f(n) > 2\mu] &< e^{-c\mu}
\end{aligned}$$

where c is a positive absolute constant, not dependent on n, K or μ. This makes intuitive sense: as f(n) is the sum of mutually independent rare indicator random variables it should be roughly a Poisson distribution and such large deviation bounds hold for the Poisson. Now pick K so large that K′ is so large that \(c\mu > 2\ln n\). Call n a failure if either f(n) > 2μ or f(n) < μ ∕ 2. Each n has failure probability less than \(2{n}^{-2}\). By the Borel-Cantelli Lemma (as \(\sum {n}^{-2}\) converges) almost surely there are only finitely many failures and so almost surely this random S has the desired properties. □
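Here is a minimal simulation sketch, not from the paper; the constant K, the seed, and the cutoff N are arbitrary illustrative choices. Sampling the random set S and computing f(n) for a few n shows the ratio f(n)∕ln n hovering around a constant, as the theorem predicts.

```python
import math
import random

random.seed(1)
K = 5.0        # "large absolute constant" (illustrative value)
N = 200_000    # we only sample S up to N, so keep n well below N

# Random set S: Pr[x in S] = K * sqrt(ln x / x), capped at 1.
S = [x for x in range(2, N)
     if random.random() < min(1.0, K * math.sqrt(math.log(x) / x))]
members = set(S)

def f(n):
    """Number of representations n = x + y with x < y and x, y in S."""
    return sum(1 for x in S if x < n - x and (n - x) in members)

for n in (1_000, 10_000, 100_000):
    print(n, f(n), round(f(n) / math.log(n), 2))  # ratio stays near a constant
```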

While the original Erdős proof was couched in different, counting, language the use of large deviation bounds can be clearly seen and, on this count alone, this paper marks a notable advance in the Probabilistic Method.

3 : High Girth, High Chromatic Number

Tutte was the first to show the existence of graphs with arbitrarily high chromatic number and no triangles; this was extended by Kelly to arbitrarily high chromatic number and no cycles of sizes three, four or five. A natural question occurred—could graphs be found with arbitrarily high chromatic number and arbitrarily high girth—i.e., no small cycles? To many graph theorists this seemed almost paradoxical. A graph with high girth would locally look like a tree and trees can easily be colored with two colors. What reason could force such a graph to have high chromatic number? As we’ll see, there is a global reason: χ(G) ≥ n ∕ α(G). To show χ(G) is large one “only” has to show the nonexistence of large independent sets.

Erdős [2] proved the existence of such graphs by probabilistic means. Fix l, k; a graph is wanted with χ(G) > l and no cycles of size ≤ k. Fix \(\varepsilon < \frac{1}{k}\), set \(p = {n}^{\varepsilon -1}\) and consider G ∼ G(n, p) as n → ∞. There are small cycles; the expected number of cycles of size ≤ k is

$$\displaystyle{\sum _{i=3}^{k}\frac{(n)_{i}} {2i} {p}^{i} =\sum _{ i=3}^{k}O\big({(np)}^{i}\big) = o(n)}$$

as \(k\varepsilon < 1\). So almost surely the number of edges in small cycles is o(n). Also fix positive \(\eta <\varepsilon /2\) and set \(u = \lfloor {n}^{1-\eta }\rfloor\). A set of u vertices will contain, on average, \(\mu \sim {u}^{2}p/2 = \Omega ({n}^{\alpha })\) edges where \(\alpha = 1 +\varepsilon -2\eta > 1\). Further, the number of such edges is given by a Binomial Distribution. Applying large deviation results, the probability of the u points having fewer than half their expected number of edges is \({e}^{-c\mu }\). As α > 1 this is smaller than exponential, hence \(o({2}^{-n})\), so that almost surely every set of u points has at least μ ∕ 2 edges. We need only that μ ∕ 2 > n.
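The constraints on the parameters are easy to misread, so a numeric sanity check may help. This sketch is mine, with illustrative choices ε = 0.15 < 1∕k and η = 0.01 < ε∕2 for k = 5: the expected number of short cycles stays o(n) while μ∕2 eventually exceeds n.

```python
from math import floor, prod

def check_parameters(n, eps, eta, k):
    """Return (expected short cycles)/n and (mu/2)/n for G(n, p), p = n^(eps-1)."""
    p = n ** (eps - 1)
    # Expected cycles of length 3..k: sum of (n)_i / (2i) * p^i.
    cycles = sum(prod(n - j for j in range(i)) / (2 * i) * p ** i
                 for i in range(3, k + 1))
    u = floor(n ** (1 - eta))
    mu = u * u * p / 2        # expected number of edges inside a u-set
    return cycles / n, (mu / 2) / n

for n in (10**4, 10**6, 10**8):
    print(n, check_parameters(n, eps=0.15, eta=0.01, k=5))
# The first coordinate tends to 0 (few edges lie on short cycles) while the
# second grows without bound, if slowly, so eventually mu/2 > n as required.
```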

Now Erdős introduces what is now called the Deletion Method. This random graph G almost surely has only o(n) edges in small cycles and every u vertices have at least n edges. Take a specific graph G with these properties. Delete all the edges in small cycles giving a graph \({G}^{-}\). Then certainly \({G}^{-}\) has no small cycles. As fewer than n edges have been deleted, every u vertices of \({G}^{-}\), which had more than n edges in G, still have an edge. Thus the independence number \(\alpha ({G}^{-}) \leq u\). But

$$\displaystyle{\chi ({G}^{-}) \geq \frac{n} {\alpha ({G}^{-})} \geq \frac{n} {u} \sim {n}^{\eta }}$$

As n can be arbitrarily large one can now make \(\chi ({G}^{-}) > l\), completing the proof.

The use of counting arguments became a typographical nightmare. Erdős considered all graphs with precisely m edges where \(m = \lfloor {n}^{1+\varepsilon }\rfloor\). He needed that almost all of them had the property that every u vertices (u as above) had more than n edges. The number of graphs failing that for a given set of size u was then

$$\begin{aligned}
\sum_{i=1}^{n}\binom{\binom{u}{2}}{i}\binom{\binom{n}{2}-\binom{u}{2}}{m-i} &< (n+1)\binom{\binom{u}{2}}{n}\binom{\binom{n}{2}-\binom{u}{2}}{m}\\
&< u^{2n}\binom{\binom{n}{2}-\binom{u}{2}}{m} < \binom{\binom{n}{2}}{m}u^{2n}\left(1 - \frac{\binom{u}{2}}{\binom{n}{2}}\right)^{m}\\
&< \binom{\binom{n}{2}}{m}u^{2n}\left(1 - \frac{u^{2}}{n^{2}}\right)^{m} < \binom{\binom{n}{2}}{m}u^{2n}e^{-mu^{2}/n^{2}}.
\end{aligned}$$

Now the number of possible choices for the u points is

$$\binom{n}{u} < n^{u} < u^{n}$$

and so the number of graphs without the desired property is

$$\binom{\binom{n}{2}}{m}u^{3n}e^{-{n}^{1+\varepsilon -2\eta }} = o\!\left(\binom{\binom{n}{2}}{m}\right)$$

as desired. Today, with large deviation results assumed beforehand, the proof can be given in one relatively leisurely page.

Many consider this one of the most pleasing applications of the Probabilistic Method as the result seems not to call for probability in the slightest and earlier attempts had been entirely constructive. The further use of large deviations and the introduction of the Deletion Method greatly advanced the Probabilistic Method. And, most important, the theorem gives an important truth about graphs. In a rough sense the truth is a negative one: chromatic number cannot be determined by local considerations only.

4 : Ramsey R(3, k)

Ramsey Theory was one of Paul Erdős’s earliest interests. The involvement can be dated back to the winter of 1932/33. Working on a problem of Esther Klein, Erdős proved his famous result that in every sequence of \({n}^{2} + 1\) real numbers there is a monotone subsequence of length n + 1. At the same time, and for the same problem, George Szekeres rediscovered Ramsey’s Theorem. Both arguments appeared in their 1935 joint paper [10]. Bounds on the various Ramsey functions, particularly the function R(l, k), have fascinated Erdős ever since. We have already spoken of his 1947 paper on R(k, k). In his 1961 paper Erdős [3] proves

$$R(3,k) > c\,\frac{k^{2}}{\ln^{2}k}.$$

The upper bound \(R(3,k) = O({k}^{2})\) was already apparent from the original Szekeres proof so the gap was relatively small. Only in 1994 was the correct order \(R(3,k) = \Theta \big(\frac{{k}^{2}}{\ln k}\big)\) finally shown.

Erdős shows that there is a graph on n vertices with no triangle and no independent set of size x where \(x = \lceil A{n}^{1/2}\ln n\rceil \), and A is a large absolute constant. This gives R(3, x) > n from which the original statement follows easily. We’ll ignore A in our informal discussion. He takes a random graph G(n, p) with \(p = c{n}^{-1/2}\). The probability that some x-set is independent is at most

$$\binom{n}{x}(1-p)^{x(x-1)/2} < \left[n{e}^{-p(x-1)/2}\right]^{x}$$

which is very small. Unfortunately this G will have lots (\(\Theta ({n}^{3/2})\)) of triangles. One needs to remove an edge from each triangle without making any of the x-sets independent.

The Erdős method may be thought of algorithmically. Order the edges \(e_{1},\ldots,e_{m}\) of G ∼ G(n, p) arbitrarily. Consider them sequentially and reject \(e_{i}\) if it would make a triangle with the edges previously accepted; otherwise accept \(e_{i}\). The graph \({G}^{-}\) so created is certainly triangle free. What about the sets of x vertices? Call a set S of x vertices good (in G, not \({G}^{-}\)) if it contains an edge e which cannot be extended to a triangle with third vertex outside of S. Suppose S is good and let e be such an edge. Then S cannot be independent in \({G}^{-}\). If e is accepted we’re clearly OK. The only way e could be rejected is if e is part of a triangle \(e,e_{1},e_{2}\) where the other edges have already been accepted. But then \(e_{1},e_{2}\) must (as S is good) lie in S and again S is not independent.
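The greedy pass is easy to code. Below is a minimal sketch, with my own names and arbitrary parameters; the exponential check that no x-set of \({G}^{-}\) is independent is of course omitted.

```python
import random
from itertools import combinations

def greedy_triangle_free(n, p, seed=0):
    """Sample G(n, p), scan its edges in random order, and reject any edge
    that would close a triangle with the previously accepted edges."""
    rng = random.Random(seed)
    edges = [e for e in combinations(range(n), 2) if rng.random() < p]
    rng.shuffle(edges)
    adj = [set() for _ in range(n)]
    accepted = []
    for u, v in edges:
        if adj[u] & adj[v]:        # a common neighbor would form a triangle
            continue
        adj[u].add(v)
        adj[v].add(u)
        accepted.append((u, v))
    return accepted

n = 400
accepted = greedy_triangle_free(n, p=n ** -0.5)
print(len(accepted), "edges accepted; triangle-free by construction")
```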

Call S bad if it isn’t good. Erdős shows that almost always there are no bad S. Let’s say something occurs with high probability if its failure probability is \(o\big(1/\binom{n}{x}\big)\). It suffices to show that a given S = {1, …, x} is good with high probability. This is the core of the argument. We expose (to use modern terminology) G in two phases. First we examine the pairs {s, t} with s ∈ S, \(t\notin S\). For each \(t\notin S\) let d(t) be the number of edges to S. Set

$$Z =\sum_{t\notin S}\binom{d(t)}{2}.$$

Each d(t) has Binomial Distribution B(x, p) and so expectation \(xp = \Theta (\ln n)\), so that one can get fairly easily \(E[Z] = \Theta (n{\ln }^{2}n)\). Note this is the same order as \({x}^{2}\). It is definitely not easy to show, for appropriate A, c (Erdős takes \(c = {A}^{-1/2}\) and A large), that \(Z < \frac{1}{2}\binom{x}{2}\) with high probability. The requirement “with high probability” is quite severe. But note, at least, that this is a pure probability statement. Let’s accept it and move on. Call a pair {i, j} ⊂ S soiled if it lies in a triangle with third vertex outside of S. At most Z pairs are soiled so with high probability at least \(\frac{1}{2}\binom{x}{2}\) pairs are unsoiled. Now we expose the edges of G inside S. If any of the unsoiled pairs is an edge of G then S is good and so the failure probability is at most

$$(1-p)^{\frac{1}{2}\binom{x}{2}} < {e}^{-\Omega (p{x}^{2})} = o\!\left(\binom{n}{x}^{-1}\right)$$

and so S is good with high probability.

Sounds complicated. Well, it is complicated and it is simultaneously a powerful application of the Probabilistic Method and a technical tour de force. The story has a coda: the Lovász Local Lemma, developed in the mid-1970s, gave a new sieve method for showing that a set of bad events could simultaneously not hold. This author applied it to the random graph G(n, p) with \(p = c{n}^{-1/2}\) with the bad events being the existence of the various potential triangles and the independence of the various x-sets. The conditions of the Local Lemma made for some calculations but it was relatively straightforward to duplicate this result. Still, the ideas behind this proof, the subtle extension of the Deletion Method notion, are too beautiful to be forgotten.

5 : No Local Coloring

With his 1957 paper previously discussed Erdős had already shown that chromatic number cannot be considered simply a local phenomenon. With this result he puts the nail in the coffin.

Theorem 3 ([4]). 

For any k ≥ 3 there is an \(\varepsilon > 0\) so that the following holds for all sufficiently large n: There exists a graph G on n vertices which cannot be k-colored and yet the restriction of G to any \(\varepsilon n\) vertex subgraph can be 3-colored.

Often probabilistic theorems are best understood as negative results, as counterexamples to natural conjectures. A priori, for example, one might conjecture that if every, say, n ∕ (ln n) vertices could be 3-colored then G could be 4-colored. This theorem disproves that conjecture.

We examine the random graph G ∼ G(n, p) with \(p = c/n\). As in the 1957 paper

$$\Pr[\alpha (G) \geq x] < \binom{n}{x}(1-p)^{\binom{x}{2}} < \left[(ne/x){e}^{-p(x-1)/2}\right]^{x}.$$

When c is large and, say, \(x = 10n(\ln c)/c\), the bracketed quantity is less than one so the entire quantity is o(1); a.s. α(G) ≤ x and so \(\chi (G) \geq c/(10\ln c)\). Given k, Erdős may now simply select c so that, with \(p = c/n\), χ(G) > k a.s.
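A quick numeric check of this first-moment computation (my own sketch; n and the values of c are arbitrary) shows the bracketed quantity \((ne/x){e}^{-p(x-1)/2}\) already far below one for moderate c.

```python
from math import e, exp, log

def bracket(c, n):
    """(ne/x) * exp(-p(x-1)/2) with p = c/n and x = 10*n*ln(c)/c."""
    p = c / n
    x = 10 * n * log(c) / c
    return (n * e / x) * exp(-p * (x - 1) / 2)

for c in (50, 200, 1000):
    print(c, bracket(c, n=10**6))
# Each value is far below 1, so Pr[alpha(G) >= x] = o(1) by the union bound.
```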

Now for the local coloring. If some set of \(\leq \varepsilon n\) vertices cannot be 3-colored then there is a minimal such set S with, say, \(\vert S\vert = i \leq \varepsilon n\). In the restriction \(G\vert _{S}\) every vertex v must have degree at least 3—otherwise one could 3-color S − { v} by minimality and then color v differently from its neighbors. Thus \(G\vert _{S}\) has at least 3i ∕ 2 edges. The probability of G having such an S is bounded by

$$\sum_{i=4}^{\varepsilon n}\binom{n}{i}\binom{\binom{i}{2}}{3i/2}{p}^{3i/2} \leq \sum_{i=4}^{\varepsilon n}{\bigg[\frac{ne}{i}{\bigg(\frac{ei}{3}\bigg)}^{3/2}{\bigg(\frac{c}{n}\bigg)}^{3/2}\bigg]}^{i}$$

employing the useful inequality \(\binom{a}{b} \leq {(ea/b)}^{b}\). Picking \(\varepsilon =\varepsilon (c)\) small, the bracketed term is always less than one, the entire sum is o(1), a.s. no such S exists, and a.s. every \(\varepsilon n\) vertices may be 3-colored.

Erdős’s monumental study with Alfréd Rényi “On the Evolution of Random Graphs” [8] had been completed only a few years before. The behavior of the basic graph functions such as chromatic and clique number was fairly well understood throughout the evolution. The argument for local coloring required a “new idea” but the basic framework was already in place.

6 : Coloring Hypergraphs

Let \(A_{1},\ldots,A_{m}\) be n-sets in an arbitrary universe \(\Omega \). The family \(\mathcal{A} =\{ A_{1},\ldots,A_{m}\}\) is 2-colorable (Erdős used the term “Property B”) if there is a 2-coloring of the underlying points \(\Omega \) so that no set \(A_{i}\) is monochromatic. In 1963 Erdős gave perhaps the quickest demonstration of the Probabilistic Method.

Theorem 4 ([5]). 

If \(m < {2}^{n-1}\) then \(\mathcal{A}\) is 2-colorable.

Proof. Color \(\Omega \) randomly. Each \(A_{i}\) has probability \({2}^{1-n}\) of being monochromatic; the probability that some \(A_{i}\) is monochromatic is then at most \(m{2}^{1-n} < 1\), so with positive probability no \(A_{i}\) is monochromatic. Take that coloring. □
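The proof is nearly a one-line program. The family below is my own arbitrary example: 100 random 10-sets, so \(m{2}^{1-n} = 100/512\) and a single uniformly random coloring already succeeds with probability at least 1 − 100∕512.

```python
import random

def two_color(sets, universe, tries=100, seed=0):
    """Try random 2-colorings until no set in the family is monochromatic."""
    rng = random.Random(seed)
    for _ in range(tries):
        color = {x: rng.randrange(2) for x in universe}
        if all(len({color[x] for x in A}) == 2 for A in sets):
            return color
    return None

rng = random.Random(1)
universe = range(30)
sets = [rng.sample(universe, 10) for _ in range(100)]  # m = 100 < 2^9 = 512
print(two_color(sets, universe) is not None)           # True
```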

In 1964 Erdős [6] showed this result was close to best possible.

Theorem 5.

There exists a family \(\mathcal{A}\) with \(m = c{n}^{2}{2}^{n}\) which is not 2-colorable.

Here Erdős turns the original probability argument inside out. Before, the sets were fixed and the coloring was random; now, essentially, the coloring is fixed and the sets are random. He sets \(\Omega =\{ 1,\ldots,u\}\) with u a parameter to be optimized later. Let \(A_{1},\ldots,A_{m}\) be random n-sets of \(\Omega \). Fix a coloring χ with a red points and \(b = u - a\) blue points. As \(A_{i}\) is random

$$\Pr[\chi (A_{i})\text{ constant}] = \frac{\binom{a}{n} + \binom{b}{n}}{\binom{u}{n}} \leq \frac{2\binom{u/2}{n}}{\binom{u}{n}}.$$

The second inequality, which follows from the convexity of \(\binom{x}{n}\), indicates that it is the equicolorings that are the most troublesome. As the \(A_{i}\) are independent

$$\Pr[\text{no } A_{i} \text{ monochromatic}] \leq \left[1 - \frac{2\binom{u/2}{n}}{\binom{u}{n}}\right]^{m}.$$

Now suppose

$${2}^{u}\left[1 - \frac{2\binom{u/2}{n}}{\binom{u}{n}}\right]^{m} < 1.$$

The expected number of χ with no \(A_{i}\) monochromatic is then less than one. Therefore there is a choice of \(A_{1},\ldots,A_{m}\) for which no such χ exists, i.e., \(\mathcal{A}\) is not 2-colorable. Solving, one may take

$$m =\bigg\lceil \frac{u\ln 2}{-\ln \left[1 - 2\binom{u/2}{n}\big/\binom{u}{n}\right]}\bigg\rceil$$

Estimating \(-\ln (1-\varepsilon ) \sim \varepsilon\), this is roughly \(cu\binom{u}{n}\big/\binom{u/2}{n}\). This leads to an interesting calculation problem (as do many problems involving the Probabilistic Method!)—find u so as to minimize m. The answer turns out to be \(u \sim {n}^{2}/2\), at which value \(m \sim (e\ln 2){n}^{2}{2}^{n-2}\).
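This optimization is pleasant to redo numerically. In the sketch below (my own; n = 20 and the search range are arbitrary) the minimizing u indeed sits near n²∕2, and the ratio m∕(n²2ⁿ) is near e ln 2∕4 ≈ 0.47, up to lower-order corrections.

```python
from math import comb, log

def m_needed(u, n):
    """Least m (up to rounding) with 2^u * (1 - 2*C(u/2, n)/C(u, n))^m < 1."""
    q = 2 * comb(u // 2, n) / comb(u, n)
    return u * log(2) / -log(1 - q)

n = 20
best = min(range(2 * n, 4 * n * n, 2), key=lambda u: m_needed(u, n))
print(best, n * n // 2)                         # optimal u is close to n^2/2
print(m_needed(best, n) / (n * n * 2 ** n))     # ratio near e*ln(2)/4 = 0.471
```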

Erdős has defined m(n) as the least m for which there is a family of n-sets which cannot be 2-colored. His results give \(\Omega ({2}^{n}) = m(n) = O({n}^{2}{2}^{n})\). Beck has improved the lower bound to \(\Omega ({n}^{1/3}{2}^{n})\) but the actual asymptotics of m(n) remain elusive.

7 : Unrankable Tournaments

Let T be a tournament with players \(1,\ldots,n\) (each pair play one game and there are no ties) and σ a ranking of the players, technically a permutation on \(\{1,\ldots,n\}\). Call game {i, j} a nonupset if i beats j and σ(i) < σ(j); an upset if i beats j but σ(j) < σ(i). The fit f(T, σ) is the number of nonupsets minus the number of upsets. One might have thought—in preprobabilistic days!—that every tournament T had a ranking σ with a reasonably good fit. With J.W. Moon, Erdős [7] easily destroyed that conjecture.

Theorem 6.

There is a T so that for all σ

$$\displaystyle{f(T,\sigma ) \leq {n}^{3/2}{(\ln n)}^{1/2}.}$$

Thus, for example, there are tournaments so that under any ranking at least 49% of the games are upsets. Erdős and Moon take the random tournament: for each pair {i, j} one “flips a fair coin” to see who wins the game. For any fixed σ each game is equally likely to be an upset or a nonupset and the different games are independent. Thus \(f(T,\sigma ) \sim S_{m}\), where \(m = \binom{n}{2}\) and \(S_{m}\) is the number of heads minus the number of tails in m flips of a fair coin. Large deviation theory gives

$$\displaystyle{\Pr [S_{m} >\alpha ] < {e}^{- \frac{{\alpha }^{2}} {2m} }.}$$

One now uses very large deviations. Set \(\alpha = {n}^{3/2}{(\ln n)}^{1/2}\) so that the above probability is less than \({n}^{-n} < 1/n!\). This super small probability is used because there are n! possible σ. Now with positive probability no σ has f(T, σ) > α. Thus there is a T with no σ having f(T, σ) > α.
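A small simulation, my own sketch with arbitrary n and sample size, illustrates the random variable in play; note it only samples rankings, it does not maximize f(T, σ) over all n! of them, which is what the theorem bounds.

```python
import math
import random

rng = random.Random(0)
n = 200
# Random tournament: beats[(i, j)] is True when i beats j, for i < j.
beats = {(i, j): rng.random() < 0.5
         for i in range(n) for j in range(i + 1, n)}

def fit(sigma):
    """Nonupsets minus upsets of the random tournament under ranking sigma."""
    pos = {player: rank for rank, player in enumerate(sigma)}
    total = 0
    for (i, j), i_beats_j in beats.items():
        winner, loser = (i, j) if i_beats_j else (j, i)
        total += 1 if pos[winner] < pos[loser] else -1
    return total

alpha = n ** 1.5 * math.sqrt(math.log(n))
best = max(fit(rng.sample(range(n), n)) for _ in range(200))
print(best, "<=", round(alpha))   # sampled fits sit far below the alpha bound
```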

The use of extreme large deviations has become a mainstay of the Probabilistic Method. But I have a more personal reason for concluding with this example. Let g(n) be the least integer so that every tournament T on n players has a ranking σ with f(T, σ) ≥ g(n). Then \(g(n) \leq {n}^{3/2}{(\ln n)}^{1/2}\). Erdős and Moon showed g(n) > cn, leaving open the asymptotics of g(n). In my doctoral dissertation I showed \(g(n) > c_{1}{n}^{3/2}\) and later (but see de la Vega [11] for the “book proof”) that \(g(n) < c_{2}{n}^{3/2}\). Though at the time I was but an \(\varepsilon\), Paul responded with his characteristic openness and soon [9] I had an Erdős number of one. Things haven’t been the same since.