\( \newcommand{\Ds}{\displaystyle} \newcommand{\PP}{{\mathbb P}} \newcommand{\RR}{{\mathbb R}} \newcommand{\KK}{{\mathbb K}} \newcommand{\CC}{{\mathbb C}} \newcommand{\ZZ}{{\mathbb Z}} \newcommand{\NN}{{\mathbb N}} \newcommand{\TT}{{\mathbb T}} \newcommand{\QQ}{{\mathbb Q}} \newcommand{\Abs}[1]{{\left|{#1}\right|}} \newcommand{\v}[1]{{{\mathbf #1}}} \newcommand{\Floor}[1]{{\left\lfloor{#1}\right\rfloor}} \newcommand{\Ceil}[1]{{\left\lceil{#1}\right\rceil}} \newcommand{\sgn}{{\rm sgn\,}} \newcommand{\Set}[1]{{\left\{{#1}\right\}}} \newcommand{\Norm}[1]{{\left\|{#1}\right\|}} \newcommand{\Prob}[1]{{{{\mathbb P}}\left[{#1}\right]}} \newcommand{\Mean}[1]{{{{\mathbb E}}\left[{#1}\right]}} \newcommand{\cis}{{\rm cis}\,} \newcommand{\one}{{\mathbf 1}} \renewcommand{\Re}{{\rm Re\,}} \renewcommand{\Im}{{\rm Im\,}} \renewcommand{\arg}{{\rm arg\,}} \renewcommand{\Arg}{{\rm Arg\,}} \renewcommand{\deg}{{\rm deg\,}} \newcommand{\ft}[1]{\widehat{#1}} \newcommand{\FT}[1]{\left(#1\right)^\wedge} \newcommand{\Lone}[1]{{\left\|{#1}\right\|_{1}}} \newcommand{\Linf}[1]{{\left\|{#1}\right\|_\infty}} \newcommand{\inner}[2]{{\langle #1, #2 \rangle}} \newcommand{\Inner}[2]{{\left\langle #1, #2 \right\rangle}} \)

Probability in Data Science

A short course

Spring 2024-25

KU Eichstätt - Ingolstadt


Teacher: Mihalis Kolountzakis

Class diary

In reverse chronological ordering

Thursday, 15 May 2025

• Today we started by proving the theorem that we stated last time. The key is to apply Markov's inequality to the variable $e^{\lambda S}$ for an appropriately chosen parameter $\lambda>0$: $$ \Prob{S \ge a} = \Prob{e^{\lambda S} \ge e^{\lambda a}} \le e^{-\lambda a} \Mean{e^{\lambda S}}. $$ Since $S=X_1+\cdots+X_n$ and the $X_j$ are independent we can write $$ \Mean{e^{\lambda S}} = \Mean{e^{\lambda X_1}} \cdots \Mean{e^{\lambda X_n}} = \left( \Mean{e^{\lambda X_1}} \right)^n, $$ and using the inequality $\frac{e^{-\lambda}+e^\lambda}{2} \le e^{\lambda^2/2}$ (easy to prove by comparing the power series expansions of the two sides) we deduce, choosing $\lambda=a/n$, the desired inequality of the theorem (the same bound for $\Prob{S \le -a}$ gives the factor 2).

• We then used the same theorem to prove that if $A$ is an $n\times m$ matrix with 0 or 1 entries then there is a vector of signs $b \in \Set{-1, 1}^m$ such that $\Linf{A b} \le \sqrt{4 m \log n}$. The trick is again to take $b$ to be a random vector of signs (independently for all coordinates and with equal probability for $\pm 1$). It follows that $\Mean{(Ab)_i} = 0$ for all $i=1,\ldots,n$, and we would like all the random variables $(Ab)_i$, $i=1,\ldots,n$, to be close to their expected value. The probability that this fails for a given $i$ is bounded by the theorem above. It turns out that the sum of the probabilities of these "bad events" is smaller than 1, so that with positive probability none of the bad events happens. (A small numerical sketch of this random-signs argument appears at the end of today's notes.) You can read more details in Theorem 4.11 of the book "Mitzenmacher and Upfal, Probability and Computing, 2ed".

• Then we stated the following large deviation inequality without proof. If $X_1, \ldots, X_n$ are independent indicator (0 or 1-valued) random variables and $S = X_1+\cdots+X_n$ with $\mu = \Mean{S}$ then for any $\delta \in (0, 1)$ we have $$ \Prob{\Abs{S-\mu} \ge \delta \mu} \le 2 e^{-\mu \frac{\delta^2}{3}}. $$

• We used this result next to estimate the number $n$ of repetitions of a randomized algorithm, which produces the correct result with probability $p\gt 1/2$, that we need to perform in order to select as the correct result the one which appears in more than half of the repetitions. Writing $X_j = 1$ if the $j$-th repetition of the randomized algorithm gave the correct result and 0 otherwise, and $S=X_1+\cdots+X_n$, we observe that $\Mean{S} = np$ and we would like the bad event $\Set{S \le n/2}$ not to happen. This bad event is contained in the bad event $\Set{\Abs{S-np} \ge n(p-\frac12)} = \Set{\Abs{S-np} \ge \frac{p-\frac12}{p}\, np}$ whose probability, by our inequality with $\delta = \frac{p-\frac12}{p}$, is at most $2 \exp\left(-\frac{n(p-\frac12)^2}{3p}\right)$, and we would like this to be at most a desired $\epsilon$. Solving this inequality for $n$ we get that $$ n \ge \frac{3p}{(p-\frac12)^2} \log\frac{2}{\epsilon} $$ is enough.

• Lastly we talked about a theorem of Erdős on the existence of additive bases of order 2 for the natural numbers. This was proved using the probabilistic method, where we repeatedly use the previous large deviation bound. You can read about this result in Section 3.2 of this paper.
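As a small numerical illustration of the random-signs argument above (a sketch not shown in class, with arbitrarily chosen dimensions $n$ and $m$), the following Python lines draw random sign vectors $b$ for a random 0/1 matrix $A$ and report how often $\Linf{Ab} \le \sqrt{4 m \log n}$. The theorem only needs this to happen with positive probability; in practice it holds for the vast majority of random sign vectors.

import numpy as np

rng = np.random.default_rng(0)

n, m = 200, 500                        # arbitrary dimensions for the demo
A = rng.integers(0, 2, size=(n, m))    # a random 0/1 matrix
bound = np.sqrt(4 * m * np.log(n))

trials, good = 1000, 0
for _ in range(trials):
    b = rng.choice([-1, 1], size=m)         # independent random signs
    if np.max(np.abs(A @ b)) <= bound:      # is ||Ab||_infty <= sqrt(4 m log n) ?
        good += 1

print(f"bound sqrt(4 m log n) = {bound:.1f}")
print(f"fraction of sign vectors within the bound: {good / trials:.3f}")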

Tuesday, 13 May 2025

We quickly summarized what we talked about last time, namely applications of the probabilistic method, in its simplest "average-value argument" form, to prove some theorems that at first sight have nothing to do with probability.

Today we gave one more example of this method. It differs from the ones we saw last time in that the random object constructed is not guaranteed to have the properties that we want. Instead, we need to modify it somewhat, and then it works. These modifications need to be controlled, and this is again done using the average value argument.

• If the graph $G$ has $n$ vertices and $nd/2$ edges (so that $d$ is the average degree) then it contains an independent set of size $\ge n/(2d)$.

Proof: Take a random subset $S$ of the vertices by keeping each vertex independently with probability $p$. Let $X$ be its size and $Y$ the number of edges of $G$ with both endpoints in $S$. It turns out that $$ \Mean{Y} = \frac{nd}{2}p^2. $$ Since $\Mean{X} = pn$ we have $\Mean{X-Y}=np-\frac{nd}{2}p^2$. Choosing $p=1/d$ to make this as large as possible, we obtain $\Mean{X-Y}=n/(2d)$, so there is a choice of $S$ with $X-Y \ge n/(2d)$. Deleting from $S$ one endpoint of each of its $Y$ edges we are left with an independent set of size $\ge X-Y \ge n/(2d)$.

• The next thing we saw is the so-called derandomization by conditional probabilities, which is a general method for turning some probabilistic proofs of existence into constructive methods (algorithms) that find the good object whose existence we have already proved. Read about it here (section 1.3).

• We observed next that the average value argument cannot possibly help us construct an interesting object if the number of quantities we want to control (to have in a certain range, for example) is more than one. For this we need deviation inequalities: inequalities that give us upper bounds on the probability that a certain random variable is far from its mean. The best known (and usually weak) such inequalities are Markov's inequality and Chebyshev's inequality, which we encounter in any standard probability course. When, however, the random variable $X$ in question is a sum of many independent parts, then we have the phenomenon of concentration of $X$ close to its mean $\Mean{X}$. We gave (without proof) an example of such an inequality:

• If $S = X_1+\cdots+X_n$ and the $X_j$ are independent and equal to $\pm 1$ with equal probability then $$ \Prob{\Abs{S} \ge a} \le 2 e^{-a^2/(2n)}. $$

We then compared the inequality given by this theorem (which we will prove next time) with the bound given by Chebyshev's inequality and observed that it is dramatically better.
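As a rough numerical illustration of this comparison (a sketch not shown in class, with arbitrarily chosen values of $t$): since $\Mean{S}=0$ and ${\rm Var}(S)=n$, Chebyshev's inequality gives $\Prob{\Abs{S} \ge t\sqrt{n}} \le 1/t^2$, while the theorem above gives $2e^{-t^2/2}$. The following few lines of Python print both bounds.

import math

# S = X_1 + ... + X_n with X_j = +-1 equally likely, so Var(S) = n.
# For a = t*sqrt(n): Chebyshev gives 1/t^2, the exponential bound gives 2*exp(-t^2/2).
for t in [2, 4, 6, 8, 10]:
    chebyshev = 1 / t**2
    exponential = 2 * math.exp(-t**2 / 2)
    print(f"t = {t:2d}:   Chebyshev <= {chebyshev:.2e}   exponential bound <= {exponential:.2e}")

Already for $t=6$ the exponential bound is about $3\cdot 10^{-8}$, while Chebyshev only gives about $0.028$.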

Thursday, 8 May 2025

Today we examined first some randomized algorithms (algorithms that use probability while running, but may fail to work with a very small probability) and then saw a few proofs by the probabilistic method (probabilistic constructions of objects with some desired properties).
  1. Suppose we have two long strings of numbers $a = a_0 a_1 a_2 \ldots a_N$ and $b = b_0 b_1 b_2 \ldots b_N$ (imagine a large number $N$) and we want to decide if the two strings are the same. String $a$ resides on our local computer but string $b$ on a remote, slowly accessible computer. We could transfer $b$ to our local computer and check that $a_j = b_j$ for $j=0, \ldots, N$, at a cost of $O(N)$ units of communication (which we are assuming are very costly, and which we would like to minimize). Instead of transferring the whole string $b$ we do the following. We define the two polynomials $a(x) = \sum_{j=0}^N a_j x^j$ and $b(x) = \sum_{j=0}^N b_j x^j$. Obviously the two strings are identical if and only if the two polynomials are identical, or, equivalently, if the polynomial $P(x) = a(x)-b(x)$, of degree $\le N$, is identically 0. Let the random number $X$ be uniform in the set $S = \Set{1, 2, \ldots, 2N}$. If the two strings are different then $P$ is not identically zero, so it has at most $N$ roots, hence at most $N$ roots in $S$, and it follows that $\Prob{P(X) = 0} \le \frac12$.

    So our algorithm for checking the identity of the two strings is the following. For a number $k$ we let the remote computer evaluate the numbers $b(X_1), b(X_2), \ldots, b(X_k)$, where the $X_j$ are produced by us (locally), chosen independently and uniformly from $S$. Then the remote computer sends these values to us and we check whether $a(X_j) = b(X_j)$ for $j=1, \ldots, k$. If the two strings are different, the polynomial $P(x)$ is not identically zero, so the probability that all $k$ checks pass is at most $1/2^k$. We pick the number $k$ large enough so that we can tolerate failure probability $1/2^k$ (for instance, $k = 333$ already guarantees failure probability below $10^{-100}$). Let us point out that the two sides may still have to do work $O(N)$ in order to evaluate the two polynomials, but the slow part, the communication, is only $O(k)$. (A small Python sketch of this check appears right after this list.)

    As a demonstration of the principle of checking identities up to a probability of error, and thus saving communication, we mentioned the well-established program rsync. Read about it here.

  2. Suppose someone claims that $A B = C$, for three given $N \times N$ matrices of real numbers. We would like to verify this claim, but we cannot afford to recalculate the product $A B$ from the matrices $A, B$, which takes about $N^3$ time. We can however do this much faster, using only a few matrix-vector multiplications (time $O(N^2)$ each) and with a very small probability of failure. Read about it here. We also showed and ran a small python program that implements this algorithm. It is here.
  3. We saw the algorithm called "reservoir sampling". See the description of the problem here and the solution we gave here.
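Returning to item 1 above, here is a minimal Python sketch of the randomized identity check (written for these notes, not a program we ran in class). It works with exact Python integers; a real implementation would evaluate the polynomials modulo a large prime to keep the numbers small.

import random

def poly_eval(digits, x):
    """Evaluate sum_j digits[j] * x**j by Horner's rule."""
    value = 0
    for d in reversed(digits):
        value = value * x + d
    return value

def probably_equal(a, b, k=30):
    """Declare the digit strings a, b equal if they agree at k random points."""
    N = max(len(a), len(b)) - 1
    points = [random.randint(1, 2 * N) for _ in range(k)]   # uniform in {1, ..., 2N}
    return all(poly_eval(a, x) == poly_eval(b, x) for x in points)

# toy usage: two long digit strings differing in one position
a = [1, 2, 3] * 1000
b = list(a)
b[1234] += 1
print(probably_equal(a, a))   # always True when the strings are equal
print(probably_equal(a, b))   # False, except with probability at most 2**(-30)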
The next examples are typical uses of the probabilistic method relying only on the expected value (mean) of a single random variable. They are probabilistic constructions: setting up the right probabilistic experiment, we construct an object with certain properties based on the expected value of some quantity being in the right range.
  1. We proved that for every $n$ we can find an edge-coloring with two colors, red and blue, of $K_n$, the complete graph on $n$ vertices, such that the number of monochromatic triangles is $\le \frac14 {n \choose 3}$. A set of three vertices is called a monochromatic triangle if the three edges connecting them are colored with the same color. The way to prove this is to take a random coloring of the edges, with both colors having the same probability and independently for all edges. Then we proved that the expected number of monochromatic triangles is precisely $\frac14 {n \choose 3}$. Therefore there is at least one coloring which leads to the number of monochromatic triangles being at most $\frac14 {n \choose 3}$.
  2. For any graph of $n$ vertices and $e$ edges we can find a bipartite subgraph with at least $e/2$ edges. The way to prove this is to separate the vertices of the graph at random into two "sides" L and R and keep only those edges that go from an L-vertex to an R-vertex (crossing edges). The expected value of the number of crossing edges, it is easy to see, is precisely $e/2$, therefore there is a separation of the vertices into the L and R sides that leads to at least $e/2$ crossing edges.
  3. Given $n$ unit-length vectors $v_1, \ldots, v_n \in \RR^n$ we saw that we can always choose signs $\epsilon_1, \ldots, \epsilon_n = \pm 1$ such that the vector $v = \epsilon_1 v_1 + \cdots + \epsilon_n v_n$ has Euclidean norm $\le \sqrt{n}$. It is also possible to choose the signs in such a way that $v$ has Euclidean norm $\ge \sqrt{n}$. To prove this we chose random signs $\epsilon_j$ and we computed the expected value $$ \Mean{\Abs{v}^2} = \Mean{\Inner{v}{v}} = \sum_{i,j} \Mean{\epsilon_i \epsilon_j} \Inner{v_i}{v_j} = \sum_{j=1}^n \Abs{v_j}^2 = n, $$ since $\Mean{\epsilon_i \epsilon_j} = 0$ for $i \neq j$ by independence. This means that there is a choice of the signs so that $\Abs{v}^2 \le n$ and also another choice of the signs that leads to $\Abs{v}^2 \ge n$. (A quick numerical check of this computation appears after this list.)

    It is easy to see, by taking the vectors $v_j$ to be the coordinate vectors $e_j$, that these bounds cannot be improved: in that case, no matter what the signs are, we have $\Abs{v}=\sqrt{n}$.
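As a quick numerical check of the computation in item 3 (a sketch not shown in class, using randomly generated unit vectors): the empirical average of $\Abs{v}^2$ over many random choices of signs is indeed $n$, and individual sign choices land both below and above $n$, consistent with the two existence claims.

import numpy as np

rng = np.random.default_rng(1)

n = 50
V = rng.normal(size=(n, n))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # rows of V are unit vectors v_1, ..., v_n

trials = 20000
sq_norms = []
for _ in range(trials):
    eps = rng.choice([-1.0, 1.0], size=n)        # random signs eps_1, ..., eps_n
    v = eps @ V                                  # v = eps_1 v_1 + ... + eps_n v_n
    sq_norms.append(v @ v)                       # |v|^2

print(f"empirical mean of |v|^2 : {np.mean(sq_norms):.2f}   (theory: {n})")
print(f"smallest and largest |v|^2 seen: {min(sq_norms):.1f}, {max(sq_norms):.1f}")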

Tuesday, 6 May 2025

We started by giving several examples of uses of probability as well as some puzzles. Here they are, briefly described, with pointers to more details elsewhere.

  1. How to conduct a survey when one answer might be sensitive? Suppose you want to measure statistically the percentage of a population who have committed something bad. The participants (the randomly sampled members of the population who are asked if they committed this bad act) are willing to help you measure this number but everyone wants deniability: if they answer YES to the question ("Have you done something bad?") they want to be able to deny this if confronted with this later on. How do we achieve this?

    Read more here.

  2. The young man with two lady friends. Here is the problem, quoted from here:
    A young man lives in Manhattan near a subway express station. He has two girlfriends, one in Brooklyn, one in the Bronx. To visit the girl in Brooklyn, he takes a train on the downtown side of the platform; to visit the girl in the Bronx, he takes a train on the uptown side of the same platform. Since he likes both girls equally well, he simply takes the first train that comes along. In this way, he lets chance determine whether he rides to the Bronx or to Brooklyn. The young man reaches the subway platform at a random moment each Saturday afternoon. Brooklyn and Bronx trains arrive at the station equally often—every 10 minutes. Yet for some obscure reason he finds himself spending most of his time with the girl in Brooklyn: in fact on the average he goes there 9 times out of 10. Can you think of a good reason why the odds so heavily favor Brooklyn?
    Read the solution to the puzzle from the Wikipedia article linked to from the same page.
  3. We discussed the Monty Hall Problem, a classic. Read about it here but do not read too much. Think of the three strategies we talked about:
    1. Choose a box at random and stay.
    2. Choose a box at random and then, after one empty box is out of the way, choose again at random from the remaining two.
    3. Choose a box at random and then always switch to the other closed box.
    Convince yourselves that the probabilities of winning with these three strategies are 1/3, 1/2 and 2/3 respectively.
  4. The host of this game holds (hidden) in his hands two amounts of money: some unknown amount and its double. We do not know which amount is in which hand. After we choose a hand at random, the host opens the hand we chose and we see 1 euro. We are then given the option to change our selection before the contents of the other hand are revealed. Should we change or not?

    The wrong way to think about this is as follows: since we see 1 euro in the open hand, the contents of the other hand are either 1/2 euro or 2 euros, each happening with probability 1/2. Therefore, if we switch to the other hand the expected gain will be $\frac12 \cdot \frac12 + \frac12\cdot 2 = \frac54$, which is $\gt 1$, hence we should switch.

    In reality it does not make any difference whether we switch or not (statistically -- of course it can matter in a specific instance of the game). So where is the error in the above calculation?

  5. Underneath two pieces of paper are written two different real numbers $a$ and $b$, unknown to us.

    We randomly choose a piece of paper, look at the number underneath, and win a candy if we correctly state whether the number we see is the larger or the smaller of the two.

    There is an easy way to win the candy with probability $1/2$. We just always say "larger". Or always say "smaller". Or we flip a fair coin and say "larger" or "smaller" depending on what the coin came up with, heads or tails. (This way we don't even take into account the number we saw written under the piece of paper we chose.)

    Is there a way to play this game so that the probability of getting the candy is greater than $1/2$?

    The surprising answer is yes (though we cannot guarantee by how much this winning probability exceeds $1/2$).

    We can do this as follows. Let $X$ be the number we see after we choose a piece of paper (so $X$ is a random variable taking the two values $a$ and $b$ with equal probability). Let also $Y$ be a random variable, independent of our choice, whose density on the real axis is everywhere strictly positive. For instance we may take $Y$ to follow the standard normal distribution $\mathcal{N}(0, 1)$. We run the experiment that produces $Y$ and then say "larger" if $X \ge Y$ or "smaller" if $X \lt Y$. Indeed, if $Y$ happens to fall strictly between $a$ and $b$ (an event of positive probability) we always answer correctly, while otherwise we answer correctly with probability exactly $1/2$; so the overall probability of success is strictly larger than $1/2$.

  6. If we have at our disposal a random number generator which produces a random variable $X$ uniformly distributed in $[0, 1]$, we saw how to choose a function $\phi(x)$ so that the random variable $Y = \phi(X)$ follows a desired target distribution (e.g. the standard normal distribution $\mathcal{N}(0, 1)$). A small sketch of this, for a distribution whose distribution function can be inverted explicitly, appears at the end of these notes.

    Read about it here.

  7. How to convince your color-blind friend that the two, otherwise identical, balls you are holding in your hands are green and red?

    Read about it here.

  8. We mentioned very briefly the Miller-Rabin primality test: given an integer $n$ decide if $n$ is prime or not.

    Read about it here.
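To illustrate item 6 above, here is a minimal sketch (not shown in class) of the inverse-transform idea: if $F$ is the distribution function of the target distribution and $X$ is uniform in $[0,1]$, then $Y = F^{-1}(X)$ has distribution function $F$. The exponential distribution, whose distribution function inverts in closed form, makes a clean example; for the standard normal one would use a numerical inverse such as scipy.stats.norm.ppf.

import math
import random

def exp_via_inverse_cdf(lam=1.0):
    """One sample from the exponential distribution with rate lam via Y = F^{-1}(X)."""
    u = random.random()                  # X uniform in [0, 1)
    return -math.log(1.0 - u) / lam      # F(y) = 1 - exp(-lam*y), so F^{-1}(u) = -log(1-u)/lam

samples = [exp_via_inverse_cdf(2.0) for _ in range(100000)]
print(f"empirical mean: {sum(samples) / len(samples):.3f}   (theory: 1/2)")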