• Today we started by proving the theorem that we stated last time. The key is to apply Markov's inequality to the variable $e^{\lambda S}$ for an appropriately chosen parameter $\lambda$. Since $S=X_1+\cdots+X_n$ and the $X_j$ are independent we can write $$ \Mean{e^{\lambda S}} = \Mean{e^{\lambda X_1}} \cdots \Mean{e^{\lambda X_n}} = \left( \Mean{e^{\lambda X_1}} \right)^n, $$ and using the inequality $\frac{e^{-\lambda}+e^\lambda}{2} \le e^{\lambda^2/2}$ (easy to prove using the power series expansion of the exponential function) we deduce the desired inequality of the theorem by choosing $\lambda=a/n$.
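In more detail, for $a > 0$ and any $\lambda > 0$ the chain of estimates is $$ \Prob{S \ge a} = \Prob{e^{\lambda S} \ge e^{\lambda a}} \le e^{-\lambda a} \Mean{e^{\lambda S}} = e^{-\lambda a} \left( \frac{e^{-\lambda}+e^{\lambda}}{2} \right)^n \le e^{-\lambda a + n\lambda^2/2} = e^{-a^2/(2n)}, $$ where the last equality uses the choice $\lambda = a/n$. By symmetry the same bound holds for $\Prob{S \le -a}$, which accounts for the factor $2$ in the theorem.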
• We used the same theorem then to prove that if $A$ is an $n\times m$ matrix with 0 or 1 entries then there is a vector of signs $b \in \Set{-1, 1}^m$ such that $\Linf{A b} \le \sqrt{4 m \log n}$. The trick is again to take $b$ to be a random vector of signs (independently for all coordinates and with equal probability for $\pm 1$). It follows that $\Mean{(Ab)_i} = 0$ for all $i=1,\ldots,n$, and we would like all the random variables $(Ab)_i$, $i=1,\ldots,n$, to be close to their expected value. The probability that this fails for a given $i$ is bounded by the theorem above. It turns out that the sum of the probabilities of these "bad events" is smaller than 1, so that with positive probability none of these bad events happens. You can read more details in Theorem 4.11 of the book "Mitzenmacher and Upfal, Probability and Computing, 2nd ed.".
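As a small illustration (my own, not from the lecture), the following Python simulation draws a random 0/1 matrix and checks how often a random sign vector $b$ already satisfies the bound; the dimensions chosen are arbitrary.

import numpy as np

# Illustration: how often does a random sign vector b satisfy max_i |(Ab)_i| <= sqrt(4 m log n)?
rng = np.random.default_rng(0)
n, m = 200, 500                           # arbitrary dimensions for this experiment
A = rng.integers(0, 2, size=(n, m))       # random matrix with 0/1 entries
bound = np.sqrt(4 * m * np.log(n))

trials, successes = 1000, 0
for _ in range(trials):
    b = rng.choice([-1, 1], size=m)       # independent uniform random signs
    if np.max(np.abs(A @ b)) <= bound:
        successes += 1

print(f"bound sqrt(4 m log n) = {bound:.2f}")
print(f"fraction of random b with ||Ab||_inf <= bound: {successes / trials:.3f}")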
• Then we stated the following large deviation inequality without proof. If $X_1, \ldots, X_n$ are independent indicator (0 or 1-valued) random variables and $S = X_1+\cdots+X_n$ with $\mu = \Mean{S}$ then for any $\delta \in (0, 1)$ we have $$ \Prob{\Abs{S-\mu} \ge \delta \mu} \le 2 e^{-\mu \frac{\delta^2}{3}}. $$
• We used this result next to estimate the number $n$ of repetitions of a randomized algorithm, which produces the correct result with probability $p\gt 1/2$, that we need to make in order to safely select as the correct result the one which appears more than half of the time. Writing $X_j = 1$ if the $j$-th repetition of the randomized algorithm gave the correct result and $0$ otherwise, and $S=X_1+\cdots+X_n$, we observe that $\Mean{S} = np$ and we would like the bad event $\Set{S \le n/2}$ not to happen. This bad event is contained in the bad event $\Set{\Abs{S-np} \gt n(p-\frac12)} = \Set{\Abs{S-np} \gt \frac{p-\frac12}{p}\, np}$ whose probability, by our inequality applied with $\delta = \frac{p-\frac12}{p}$ and $\mu = np$, is at most $2 \exp\left(-\frac{n(p-\frac12)^2}{3p}\right)$, and we would like this to be at most a desired $\epsilon$. Solving this inequality for $n$ we get that $$ n \ge \frac{3p}{(p-\frac12)^2} \log\frac{2}{\epsilon} $$ is enough.
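A quick numerical sanity check of this bound (my own sketch; the function names and the values of $p$ and $\epsilon$ are arbitrary choices):

import math, random

def repetitions_needed(p, eps):
    # n >= 3p/(p - 1/2)^2 * log(2/eps), from the bound derived above
    return math.ceil(3 * p / (p - 0.5) ** 2 * math.log(2 / eps))

def majority_is_correct(p, n, rng):
    # simulate n independent runs of an algorithm that is correct with probability p
    correct = sum(rng.random() < p for _ in range(n))
    return correct > n / 2

p, eps = 0.6, 1e-3
n = repetitions_needed(p, eps)
rng = random.Random(0)
failures = sum(not majority_is_correct(p, n, rng) for _ in range(2000))
print(f"n = {n} repetitions; observed failure rate {failures / 2000:.4f} (target at most {eps})")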
• Lastly we talked about a theorem of Erdős on the existence of additive bases of order 2 for the natural numbers. This was proved using the probabilistic method, repeatedly using the previous theorem to bound large deviations. You can read about this result in Section 3.2 of this paper.
Today we gave one more example of this method. It differs from the ones we saw last time in that the random object constructed is not guaranteed to have the properties that we want. Instead we need to make some modifications to it, and then it works. These modifications need to be controlled, and this is again done using the average value argument.
• If the graph $G$ has $n$ vertices and $nd/2$ edges then it contains an independent set of size $\ge n/(2d)$.
Proof: Take a random subset $S$ of the vertices by keeping each vertex independently with probability $p$. Let $X$ be its size and $Y$ be the number of edges of $G$ with both endpoints in $S$. Since each edge survives with probability $p^2$, $$ \Mean{Y} = \frac{nd}{2}p^2. $$ Since $\Mean{X} = np$ we have $\Mean{X-Y}=np-\frac{nd}{2}p^2$. Choosing $p=1/d$ to make this as large as possible we obtain $\Mean{X-Y}=n/(2d)$, so there is a choice of $S$ with $X-Y \ge n/(2d)$. Deleting one endpoint of each edge inside $S$ destroys all these edges and removes at most $Y$ vertices, so we are left with an independent set of size $\ge X-Y \ge n/(2d)$.
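Here is a short Python sketch of this deletion argument (my own illustration; the random graph and the function name are just for this example):

import random

def independent_set_by_deletion(n, edges, d, rng):
    # Random construction from the proof: keep each vertex independently with
    # probability p = 1/d, then delete one endpoint of every edge that survived.
    p = 1.0 / d
    S = {v for v in range(n) if rng.random() < p}
    for u, v in edges:
        if u in S and v in S:
            S.discard(u)              # removing one endpoint destroys this edge
    return S                          # what remains is an independent set

# Example: a random graph with n vertices and nd/2 edges (illustrative values).
rng = random.Random(1)
n, d = 200, 8
edges = set()
while len(edges) < n * d // 2:
    u, v = rng.sample(range(n), 2)
    edges.add((min(u, v), max(u, v)))

best = max((independent_set_by_deletion(n, edges, d, rng) for _ in range(50)), key=len)
print(f"independent set of size {len(best)}; the guaranteed bound n/(2d) is {n / (2 * d):.1f}")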
• The next thing we saw is the so-called derandomization by conditional probabilities (also known as the method of conditional expectations), a general method for turning some probabilistic proofs of existence into constructive methods (algorithms) for finding the good object whose existence we have already proved. Read about it here (section 1.3).
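One classical instance of this method (my own illustration, not necessarily the example discussed in class) is derandomizing the fact that a uniformly random 2-colouring of the vertices of a graph cuts at least half of the edges in expectation:

def derandomized_cut(n, edges):
    # Method of conditional expectations: a uniformly random 2-colouring cuts each
    # edge with probability 1/2, so the expected cut has size |edges|/2.  We fix the
    # vertices one by one, each time choosing the side that keeps the conditional
    # expectation of the number of cut edges at least as large.
    side = {}
    for v in range(n):
        # among the already-placed neighbours of v, count how many sit on each side
        nbrs = [side[u] for e in edges for u in e if v in e and u != v and u in side]
        on0, on1 = nbrs.count(0), nbrs.count(1)
        # placing v on side 1 cuts the edges to side-0 neighbours (on0 of them), and vice versa
        side[v] = 1 if on0 >= on1 else 0
    cut = sum(1 for u, w in edges if side[u] != side[w])
    return side, cut

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]   # small example graph
side, cut = derandomized_cut(4, edges)
print(f"cut size {cut} >= |E|/2 = {len(edges) / 2}")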
• We observed next that the average value argument cannot possibly help us construct an interesting object if the number of quantities we want to control (to have in a certain range, for example) is more than one. For this we need deviation inequalities: inequalities that give us upper bounds on the probability that a certain random variable is far from its mean. The best known (and usually weak) such inequalities are Markov's inequality and Chebyshev's inequality, which we encounter in any standard probability course. When, however, the random variable $X$ in question is a sum of many independent parts, then we have the phenomenon of concentration of $X$ close to its mean $\Mean{X}$. We gave (without proof) an example of such an inequality:
• If $S = X_1+\cdots+X_n$ and the $X_j$ are independent and equal to $\pm 1$ with equal probability then
$$
\Prob{\Abs{S} \ge a} \le 2 e^{-a^2/(2n)}.
$$
We then compared the inequality given by this theorem (which we will prove next time) to the result given by Chebyshev's inequality
and observed that the former is dramatically better.
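For a concrete comparison (my own illustration), one can put the two bounds next to an empirical estimate of the tail; since $\mathrm{Var}(S) = n$, Chebyshev's inequality gives $\Prob{\Abs{S} \ge a} \le n/a^2$:

import numpy as np

n, trials = 1000, 10000
rng = np.random.default_rng(0)
S = rng.choice([-1, 1], size=(trials, n)).sum(axis=1)   # samples of S = X_1 + ... + X_n

for a in (100, 150, 200):
    chebyshev = n / a**2                       # Var(S) = n, so P(|S| >= a) <= n / a^2
    chernoff = 2 * np.exp(-a**2 / (2 * n))     # the bound stated above
    empirical = np.mean(np.abs(S) >= a)
    print(f"a={a}: empirical {empirical:.5f}, new bound {chernoff:.2e}, Chebyshev {chebyshev:.4f}")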
So our algorithm for checking the identity of the two strings is the following. For a large number $k$ we let the remote computer evaluate the numbers $b(X_1), b(X_2), \ldots, b(X_k)$, where the $X_j$ are produced by us (locally), chosen independently and uniformly from $S$. Then the remote computer sends these values to us and we check whether $a(X_j) = b(X_j)$ for $j=1, \ldots, k$. If the two strings are different then the polynomial $P(x)$ is not identically zero, so the probability that all $k$ checks nevertheless pass is at most $1/2^k$. We pick the number $k$ large enough that we can tolerate failure probability $1/2^k$ (for instance, choosing $k$ roughly $300$ makes the failure probability less than $10^{-90}$). Let us point out that the two sides may still need to do work $O(N)$ in order to evaluate the two polynomials, but the slow part, the communication, is only $O(k)$.
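A minimal sketch of this kind of probabilistic identity check, assuming the common setup in which each string is encoded as the coefficient vector of a polynomial over a large prime field and the points are drawn from that field (the modulus, the parameter $k$ and all names below are my own choices for the sketch, not necessarily those of the lecture):

import random

PRIME = (1 << 61) - 1        # a large prime modulus, chosen for this sketch

def poly_eval(coeffs, x, p=PRIME):
    # evaluate the polynomial with the given coefficient list at x modulo p (Horner's rule)
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

def probably_equal(string_a, string_b, k=30, rng=random.Random(0)):
    # Interpret the bytes of each string as polynomial coefficients and compare the two
    # polynomials at k random points; in the real protocol the k values of the remote
    # polynomial are what gets sent over the network, not the string itself.
    a, b = list(string_a.encode()), list(string_b.encode())
    for _ in range(k):
        x = rng.randrange(PRIME)
        if poly_eval(a, x) != poly_eval(b, x):
            return False         # a disagreement proves that the strings differ
    return True                  # if the strings differ, this happens with probability <= (N/PRIME)^k

print(probably_equal("long text " * 1000, "long text " * 1000))                 # True
print(probably_equal("long text " * 1000, "long text " * 999 + "lonG text "))   # almost surely False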
As a demonstration of the principle of checking identities up to a small probability of error, and thus saving communication, we mentioned the well-established program rsync. Read about it here.
It is easy to see, by taking the vectors $v_j$ to be the coordinate vectors $e_j$, that these bounds cannot be improved: in that case, no matter what the signs are, we have $\Abs{v}^2 = n$ (the $e_j$ being orthonormal), i.e. $\Abs{v}=\sqrt{n}$.
We started by giving several examples of uses of probability as well as puzzles. Here they are
briefly described with pointers to more details elsewhere.
The wrong way to think about this is as follows: since we see 1 euro in the open hand, the contents of the other hand are either 1/2 euro or 2 euros, each happening with probability 1/2. Therefore, if we switch to the other hand the expected gain will be $\frac12 \cdot \frac12 + \frac12\cdot 2 = \frac54$, which is $\gt 1$, hence we should switch.
In reality it does not make any difference if we switch or not (statistically -- of course it can matter at a specific instance of the game). So where is the
error in the above calculation?
We randomly choose a piece of paper, look at the number underneath, and win a candy if we correctly state whether the number we see is the larger or the smaller of the two.
There is an easy way to win the candy with probability $1/2$.
We just always say "larger". Or always say "smaller". Or
we flip a fair coin and say "larger" or "smaller" depending on what the coin came up with, heads or tails.
(This way we don't even take into account the number we saw written under the piece of paper we chose.)
Is there a way to play this game so that the probability of getting the candy is greater than $1/2$?
The surprising answer is yes (but we cannot guarantee how much larger than 1/2 this probability of winning is).
We can do this as follows: Let $X$ be the number we saw after we chose a piece of paper (so $X$ is a random variable
taking the two values $a$ and $b$ with equal probability). Let also $Y$ be a random variable whose density on the real axis is everywhere strictly positive.
For instance we may take $Y$ to follow the standard normal distribution $\mathcal{N}(0, 1)$.
We run the experiment that produces $Y$ and then say "larger" if $X \ge Y$ or "smaller" if $X \lt Y$. One can easily check that the probability of success is strictly larger than $1/2$: if $Y$ happens to fall between the two numbers $a$ and $b$ (an event of positive probability, since the density of $Y$ is everywhere positive) we always win, and otherwise we win with probability exactly $1/2$.
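A quick Monte Carlo check of this strategy (my own illustration; the two hidden numbers below are arbitrary):

import random

def play_once(a, b, rng):
    # The two numbers a != b are hidden; we see one of them uniformly at random
    # and guess "larger"/"smaller" by comparing it with an independent N(0,1) sample.
    seen, hidden = (a, b) if rng.random() < 0.5 else (b, a)
    y = rng.gauss(0.0, 1.0)
    guessed_larger = seen >= y
    return guessed_larger == (seen > hidden)     # did we guess correctly?

rng = random.Random(0)
a, b = 0.3, 1.1                                  # arbitrary hidden numbers
wins = sum(play_once(a, b, rng) for _ in range(200_000))
print(f"winning frequency: {wins / 200_000:.4f}  (always > 1/2 on average)")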
Thursday, 8 May 2025
Today we examined first some randomized algorithms (algorithms that use probability
while running, but may fail to work with a very small probability) and then saw a few proofs (probabilistic constructions of objects with some desired properties).
The next examples are typical uses of the probabilistic method relying only on the expected value (mean) of one random variable. They are probabilistic constructions: setting up the right probabilistic experiment, we construct an object with certain properties based on the expected value of some quantity being in the right range.
Tuesday, 6 May 2025
A young man lives in Manhattan near a subway express station. He has two girlfriends, one in Brooklyn, one in the Bronx. To visit the girl in Brooklyn, he takes a train on the downtown side of the platform; to visit the girl in the Bronx, he takes a train on the uptown side of the same platform. Since he likes both girls equally well, he simply takes the first train that comes along. In this way, he lets chance determine whether he rides to the Bronx or to Brooklyn. The young man reaches the subway platform at a random moment each Saturday afternoon. Brooklyn and Bronx trains arrive at the station equally often—every 10 minutes. Yet for some obscure reason he finds himself spending most of his time with the girl in Brooklyn: in fact on the average he goes there 9 times out of 10. Can you think of a good reason why the odds so heavily favor Brooklyn?
Read the solution to the puzzle from the Wikipedia article
linked to from the same page.
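A tiny simulation (my own illustration) of one timetable that is consistent with the 9-to-1 answer: assume that in every 10-minute cycle the Bronx train arrives at minute 0 and the Brooklyn train at minute 9, so the Bronx train follows the Brooklyn train by just one minute (this particular offset is an assumption for the sketch, not part of the puzzle statement):

import random

rng = random.Random(0)
trials = 100_000
brooklyn = 0
for _ in range(trials):
    t = rng.uniform(0, 10)                       # uniformly random arrival on the platform
    wait_brooklyn = (9 - t) % 10                 # minutes until the next Brooklyn train
    wait_bronx = (10 - t) % 10                   # minutes until the next Bronx train
    if wait_brooklyn < wait_bronx:               # he boards whichever train comes first
        brooklyn += 1
print(f"fraction of Brooklyn visits: {brooklyn / trials:.3f}   (expected 0.9)")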
Convince yourselves that the probabilities of winning with these three strategies are $1/3$, $1/2$ and $2/3$, respectively.
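If (as these numbers suggest) the three strategies are "always stay", "decide by a coin flip" and "always switch" in the Monty Hall game, a short simulation reproduces them (my own illustration, in case the reference is indeed to that game):

import random

def monty_hall(strategy, rng):
    # strategy: "stay", "coin", or "switch"
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # the host opens a door that is neither the chosen one nor the prize
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    other = next(d for d in doors if d != choice and d != opened)
    if strategy == "switch" or (strategy == "coin" and rng.random() < 0.5):
        choice = other
    return choice == prize

rng = random.Random(0)
for strategy in ("stay", "coin", "switch"):
    wins = sum(monty_hall(strategy, rng) for _ in range(100_000))
    print(f"{strategy:6s}: {wins / 100_000:.3f}")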