We started by giving several examples of uses of probability as well as puzzles. Here they are briefly described with pointers to more details elsewhere.
A young man lives in Manhattan near a subway express station. He has two girlfriends, one in Brooklyn, one in the Bronx. To visit the girl in Brooklyn, he takes a train on the downtown side of the platform; to visit the girl in the Bronx, he takes a train on the uptown side of the same platform. Since he likes both girls equally well, he simply takes the first train that comes along. In this way, he lets chance determine whether he rides to the Bronx or to Brooklyn. The young man reaches the subway platform at a random moment each Saturday afternoon. Brooklyn and Bronx trains arrive at the station equally often—every 10 minutes. Yet for some obscure reason he finds himself spending most of his time with the girl in Brooklyn: in fact on the average he goes there 9 times out of 10. Can you think of a good reason why the odds so heavily favor Brooklyn? Read the solution to the puzzle from the Wikipedia article linked to from the same page.
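One possible schedule producing the 9-to-1 split can be checked with a short simulation (warning: this encodes the standard resolution of the puzzle; the one-minute offset below is just one concrete choice):

```python
import random

def simulate(trips=100_000):
    """Both lines run every 10 minutes, but the Bronx train arrives one
    minute after the Brooklyn train (a concrete choice of offset).
    The young man arrives at a uniformly random moment and boards
    whichever train comes first."""
    brooklyn = 0
    for _ in range(trips):
        t = random.uniform(0, 10)        # arrival time within a 10-minute cycle
        wait_brooklyn = (10 - t) % 10    # Brooklyn trains at minutes 0, 10, 20, ...
        wait_bronx = (11 - t) % 10       # Bronx trains at minutes 1, 11, 21, ...
        if wait_brooklyn < wait_bronx:
            brooklyn += 1
    return brooklyn / trips

print(simulate())   # prints roughly 0.9
```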
The wrong way to think about this is as follows: since we see 1 euro in the open hand then the contents of the other hand are either 1/2 euro or 2 euros, each happening with probability 1/2. Therefore, if we switch to the other hand the expected amount we get will be $\frac12 \cdot \frac12 + \frac12\cdot 2 = \frac54$ which is $\gt 1$, hence we should switch.
In reality it does not make any difference if we switch or not (statistically -- of course it can matter at a specific instance of the game). So where is the error in the above calculation?
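To see the claim empirically, here is a minimal simulation; for concreteness it assumes the two hands always hold 1 and 2 euros, placed in a random order:

```python
import random

def play(switch, rounds=100_000):
    """Average payoff of 'always keep' vs 'always switch',
    when the two hands hold 1 and 2 euros in a random order."""
    total = 0.0
    for _ in range(rounds):
        hands = [1.0, 2.0]
        random.shuffle(hands)
        opened, other = hands          # we see the first hand
        total += other if switch else opened
    return total / rounds

print(play(switch=False))   # about 1.5
print(play(switch=True))    # also about 1.5
```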
We randomly choose one of the two pieces of paper, look at the number written underneath, and win a candy if we correctly state whether the number we see is the larger or the smaller of the two.
There is an easy way to win the candy with probability $1/2$. We just always say "larger". Or always say "smaller". Or we flip a fair coin and say "larger" or "smaller" depending on what the coin came up with, heads or tails. (This way we don't even take into account the number we saw written under the piece of paper we chose.)
Is there a way to play this game so that the probability of getting the candy is greater than $1/2$?
The surprising answer is yes (but we cannot guarantee how much larger than 1/2 this probability of winning is).
We can do this as follows: Let $X$ be the number we saw after we chose a piece of paper (so $X$ is a random variable taking the two values $a$ and $b$ with equal probability). Let also $Y$ be a random variable whose density on the real axis is everywhere strictly positive. For instance we may take $Y$ to follow the standard normal distribution $\mathcal{N}(0, 1)$. We run the experiment that produces $Y$ and then say "larger" if $X \ge Y$ and "smaller" if $X \lt Y$. One can easily check that the probability of success is strictly larger than 1/2: if $Y$ happens to fall strictly between the two numbers $a$ and $b$ (an event of positive probability, since the density of $Y$ is everywhere positive) we surely win, and otherwise we win with probability exactly $1/2$.
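A minimal sketch of this strategy in code (the hidden numbers $-0.3$ and $0.5$ below are arbitrary placeholders; any distinct pair works, though the size of the advantage depends on them):

```python
import random

def guess_game(a=-0.3, b=0.5, rounds=100_000):
    """Win frequency of the threshold strategy: draw Y ~ N(0, 1) and
    say 'larger' if the revealed number X satisfies X >= Y, else 'smaller'."""
    wins = 0
    for _ in range(rounds):
        x = random.choice([a, b])      # the paper we happen to pick
        y = random.gauss(0.0, 1.0)     # our own random threshold
        said_larger = x >= y
        if said_larger == (x == max(a, b)):
            wins += 1
    return wins / rounds

print(guess_game())   # strictly above 0.5; roughly 0.65 for this particular pair
```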
Read about it here.
So our algorithm for checking the identity of the two strings is the following. For a large number $k$, we (the local side) choose points $X_1, X_2, \ldots, X_k$ independently and uniformly from $S$ and let the remote computer evaluate the numbers $b(X_1), b(X_2), \ldots, b(X_k)$. Then the remote computer sends these values to us and we check whether $a(X_j) = b(X_j)$ for $j=1, \ldots, k$. If the two strings are different then the polynomial $P(x)$ is not identically zero, so the probability that all $k$ checks nevertheless pass is at most $1/2^k$. We pick the number $k$ large enough that we can tolerate failure probability $1/2^k$ (for instance, choosing $k$ roughly 300 guarantees failure probability below $10^{-90}$). Let us point out that the two sides may still have to do work $O(N)$ in order to evaluate the two polynomials, but the slow part, the communication, is only $O(k)$.
As a demonstration of the principle of checking identities up to a probability of error, and thus saving communication, we mentioned the well-established program rsync. Read about it here.
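Here is a minimal sketch of the random evaluation check described above (an illustration of the idea, not rsync's actual algorithm): the bytes of each string are taken as the coefficients of a polynomial, which both sides evaluate at the same random points modulo a large prime; the modulus $2^{61}-1$ and the value $k=20$ below are arbitrary choices for this sketch. With such a large modulus a single test already fails only with probability about $N/p$; with points drawn from a small set $S$ as in the notes, one would instead rely on the $1/2^k$ bound above.

```python
import random

P = (1 << 61) - 1   # a large prime modulus (an arbitrary choice for this sketch)

def poly_eval(data: bytes, x: int) -> int:
    """Evaluate the polynomial whose coefficients are the bytes of `data`
    at the point x, modulo P (Horner's rule)."""
    acc = 0
    for c in data:
        acc = (acc * x + c) % P
    return acc

def probably_same(a: bytes, b: bytes, k: int = 20) -> bool:
    """Compare two strings by their values at k random points.
    Unequal strings of length N slip through a single test with
    probability at most N / P, so k tests make a false 'same' negligible."""
    if len(a) != len(b):               # lengths are cheap to exchange exactly
        return False
    points = [random.randrange(1, P) for _ in range(k)]
    return all(poly_eval(a, x) == poly_eval(b, x) for x in points)

print(probably_same(b"hello world", b"hello world"))   # True
print(probably_same(b"hello world", b"hello worle"))   # False (with overwhelming probability)
```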
It is easy to see, by taking the vectors $v_j$ to be the coordinate vectors $e_j$, that these bounds cannot be improved: in that case, no matter what the signs are, we have $\Abs{v}=\sqrt{n}$.
Today we gave one more example of this method. It differs from the ones we saw last time in that the random object constructed is not guaranteed to have the properties we want. Instead, we need to make some modifications to it, and then it works. These modifications need to be controlled, and this is again done using the average value argument.
• If the graph $G$ has $n$ vertices and $nd/2$ edges then it contains an independent set of size $\ge n/(2d)$.
Proof: Take a random subset $S$ of the vertices by keeping each vertex with probability $p$, independently. Let $X$ be its size and $Y$ be the number of edges of $G$ with both endpoints in $S$. Since each of the $nd/2$ edges survives in $S$ with probability $p^2$, we have $$ \Mean{Y} = \frac{nd}{2}p^2. $$ Since $\Mean{X} = pn$ we get $\Mean{X-Y}=np-\frac{nd}{2}p^2$. Choosing $p=1/d$ to make this as large as possible, we obtain $\Mean{X-Y}=n/(2d)$, so there is at least one choice of $S$ with $X-Y \ge n/(2d)$. Deleting one endpoint of each edge inside $S$ leaves an independent set of size $\ge X-Y \ge n/(2d)$. (A sketch of this deletion step in code is given below.)
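Here is a minimal sketch of the deletion argument in code; the graph representation (a list of edges on vertices $0,\dots,n-1$) is just a convenient choice for the illustration, and a single run only achieves $n/(2d)$ in expectation, so one would repeat it until a large enough set appears.

```python
import random

def independent_set_by_deletion(n, edges, rng=random):
    """Keep each vertex with probability p = 1/d (d = average degree),
    then delete one endpoint of every edge that survived inside the sample.
    The expected size of the result is at least n / (2d)."""
    d = 2 * len(edges) / n               # average degree (assumed >= 1)
    p = min(1.0, 1.0 / d)
    S = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in S and v in S:            # an edge inside S: remove one endpoint
            S.discard(u)
    return S                             # now an independent set

# Example: a cycle on 12 vertices (d = 2); expected output size >= 3.
edges = [(i, (i + 1) % 12) for i in range(12)]
print(independent_set_by_deletion(12, edges))
```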
• The next thing we saw is the so-called derandomization by conditional probabilities, a general method for turning some probabilistic proofs of existence into constructive methods (algorithms) for finding the good object whose existence we have already proved. Read about it here (section 1.3).
• We observed next that the average value argument cannot possibly help us construct an interesting object if the number of quantities we want to control (to have in a certain range, for example) is more than one. For this we need deviation inequalities: inequalities that give us upper bounds on the probability that a certain random variable is far from its mean. The most well known (and usually weak) such inequalities are Markov's inequality and Chebyshev's inequality, which we usually encounter in any standard course on Probability. When, however, the random variable $X$ in question is a sum of many independent parts, then we have the phenomenon of concentration of $X$ close to its mean $\Mean{X}$. We gave (without proof) an example of such an inequality:
• If $S = X_1+\cdots+X_n$ and the $X_j$ are independent and equal to $\pm 1$ with equal probability then
$$
\Prob{\Abs{S} \ge a} \le 2 e^{-a^2/(2n)}.
$$
We then compared the bound given by this theorem (which we will prove next time) to the one given by Chebyshev's inequality
and observed that it is dramatically better; a quick comparison is included below.
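For a quick comparison (an illustration, not from the lecture itself), take $a = t\sqrt{n}$ and recall that $\mathrm{Var}(S) = n$: Chebyshev gives
$$
\Prob{\Abs{S} \ge t\sqrt{n}} \le \frac{1}{t^2},
\qquad\text{while the theorem above gives}\qquad
\Prob{\Abs{S} \ge t\sqrt{n}} \le 2 e^{-t^2/2},
$$
so, for instance, for $t=10$ the first bound is $0.01$ while the second is about $4\cdot 10^{-22}$.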
• Today we started by proving the theorem that we stated last time. To prove it the key is to
apply Markov's inequality to the variable $e^{\lambda S}$ for an appropriately chosen parameter $\lambda$. Since $S=X_1+\cdots+X_n$
and the $X_j$ are independent we can write
$$
\Mean{e^{\lambda S}} = \Mean{e^{\lambda X_1}} \cdots \Mean{e^{\lambda X_n}} = \left( \Mean{e^{\lambda X_1}} \right)^n,
$$
and using the inequality $\frac{e^{-\lambda}+e^\lambda}{2} \le e^{\lambda^2/2}$ (easy to prove using the power
series expansion for the exponential function) we deduce, choosing now $\lambda=a/n$, the desired inequality of the theorem; the computation is written out below.
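In more detail, the upper tail computation described above, written out (the lower tail is symmetric, which accounts for the factor $2$; here $\lambda \gt 0$):
$$
\Prob{S \ge a} = \Prob{e^{\lambda S} \ge e^{\lambda a}}
\le e^{-\lambda a}\, \Mean{e^{\lambda S}}
\le e^{-\lambda a}\, e^{n\lambda^2/2}
= e^{-a^2/(2n)} \qquad \text{for } \lambda = a/n.
$$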
• We used the same theorem then to prove that if $A$ is an $n\times m$ matrix with 0 or 1 entries then there is a vector of signs
$b \in \Set{-1, 1}^m$ such that $\Linf{A b} \le \sqrt{4 m \log n}$. The trick is again to take $b$ to be a random vector of signs (independently
for all coordinates and with equal probability for $\pm 1$). It follows that $\Mean{(Ab)_i} = 0$ for all $i=1,\ldots,n$, and we would
like all random variables $(Ab)_i$, $i=1,\ldots,n$, to be close to their expected value. The probability that this fails for a given $i$
is bounded using this theorem. It turns out that the sum of the probabilities of these "bad events" is smaller
than 1, so that with positive probability none of these bad events happens (the computation is sketched below). You can read more details in Theorem 4.11 of the book "Mitzenmacher and Upfal,
Probability and Computing, 2nd ed.".
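In outline, the union bound computation goes as follows: each $(Ab)_i$ is a sum of at most $m$ independent $\pm 1$ variables, so the theorem (with $m$ in the role of $n$ and $a = \sqrt{4m\log n}$) gives
$$
\Prob{\Abs{(Ab)_i} \ge \sqrt{4 m \log n}} \le 2 e^{-\frac{4m\log n}{2m}} = \frac{2}{n^2},
\qquad\text{hence}\qquad
\sum_{i=1}^{n} \Prob{\Abs{(Ab)_i} \ge \sqrt{4 m \log n}} \le \frac{2}{n} \lt 1
$$
for $n \ge 3$.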
• Then we stated the following large deviation inequality without proof. If $X_1, \ldots, X_n$ are independent indicator (0 or 1-valued)
random variables and $S = X_1+\cdots+X_n$ with $\mu = \Mean{S}$ then for any $\delta \in (0, 1)$ we have
$$
\Prob{\Abs{S-\mu} \ge \delta \mu} \le 2 e^{-\mu \frac{\delta^2}{3}}.
$$
• We used this result next to estimate the number $n$ of repetitions of a randomized algorithm (one that produces the correct result with
probability $p\gt 1/2$) that we need in order to be able to select, as the correct answer, the result that appears in more than half of
the repetitions. Writing $X_j = 1$ if the $j$-th repetition of the randomized algorithm gave the correct result and $X_j = 0$ otherwise, and
$S=X_1+\cdots+X_n$, we observe that $\Mean{S} = np$ and we would like the following bad event not to happen: $\Set{S \le n/2}$. This
bad event is contained in the bad event $\Set{\Abs{S-np} \ge n(p-\frac12)} = \Set{\Abs{S-np} \ge \frac{p-\frac12}{p} np}$, whose probability,
by our inequality (applied with $\mu = np$ and $\delta = \frac{p-1/2}{p}$), is at most $2 \exp\left(-n \frac{(p-1/2)^2}{3p}\right)$, and we
would like this to be at most a desired $\epsilon$. Solving this inequality for $n$ we get that
$$
n \ge \frac{3p}{(p-\frac12)^2} \log\frac{2}{\epsilon}
$$
is enough. (A small numerical check follows below.)
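As a small numerical illustration of the bound just derived (the values $p=0.6$ and $\epsilon=0.01$ below are arbitrary):

```python
import math
import random

def repetitions_needed(p, eps):
    """Smallest n with n >= (3p / (p - 1/2)^2) * log(2/eps), the bound derived
    above, so that the majority answer is wrong with probability at most eps."""
    assert 0.5 < p < 1 and 0 < eps < 1
    return math.ceil(3 * p / (p - 0.5) ** 2 * math.log(2 / eps))

def majority_fails(p, n, trials=10_000, rng=random):
    """Empirical probability that at most half of n runs are correct,
    when each run is correct independently with probability p."""
    bad = 0
    for _ in range(trials):
        correct = sum(rng.random() < p for _ in range(n))
        if correct <= n / 2:
            bad += 1
    return bad / trials

n = repetitions_needed(p=0.6, eps=0.01)
print(n)                          # 954 repetitions suffice for eps = 0.01
print(majority_fails(0.6, n))     # essentially 0 in practice, far below 0.01
```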
• Last, we talked about a theorem of Erdős on the existence of additive bases of order 2 for the natural numbers.
This was proved using the probabilistic method, repeatedly applying the previous theorem for bounding large deviations.
You can read about this result in section 3.2 in this paper.
Thursday, 15 May 2025