Phase 1: Logic & Set Theory
0. Mathematics, Axioms, and Science
Mathematics begins with axioms — assumptions we agree to accept without proof — and builds everything else from them using precise rules of logic.
This process of axiomatization is what gives mathematics its unique clarity: starting from simple, explicit principles, we construct entire theories.
Why is this important?
- Axioms are not “true” in an absolute sense. They are starting points.
- What matters is whether the consequences drawn from them are logically consistent and useful for understanding the world.
- In practice, this makes mathematics a language in which science can express and test ideas.
Mathematics provides the rigorous framework for turning assumptions into predictions.
For example, in Euclidean geometry we assume as an axiom:
Through two distinct points, there is exactly one straight line.
But this depends on the “space” we are working in.
- On a flat plane (Euclidean space), the axiom works perfectly: two points uniquely determine a straight line.
- On the surface of a sphere (non-Euclidean geometry), the analogue of a “straight line” is a geodesic — the shortest path between two points on the surface.
On a sphere, geodesics turn out to be arcs of great circles (circles centered at the sphere’s center, like the equator or any meridian).
Intuitively:
- If you fly from New York to Paris, the shortest route is not a line of latitude on a flat map, but rather a curved arc of a great circle (airplanes follow these routes because they minimize distance).
- The airplane could in principle fly in a perfectly straight line, but only if it were allowed to go through the Earth’s interior. Since it’s constrained to stay on the surface of the sphere, the “straight line” within that space becomes a geodesic (the great circle).
Now consider two antipodal points — points exactly opposite each other on the sphere, such as the North and South Poles.
- There isn’t a unique great circle through them — in fact, infinitely many great circles (for the poles, all the meridians) pass through both points.
- So the Euclidean axiom fails in this curved space.
The key idea is that “straight line” really means shortest possible path given the geometry of the space.
On a plane, that’s the familiar straight line; on a sphere, it’s a great circle.
The notion of a geodesic generalizes this idea: whatever the space, it tells us the natural way to connect two points as efficiently as possible.
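To make the contrast concrete, here is a small Python sketch. It treats the Earth as a perfect sphere of radius 6371 km and uses rough, illustrative coordinates for New York and Paris (both are assumptions for illustration): it compares the geodesic distance along the surface with the Euclidean chord through the interior.

```python
import math

EARTH_RADIUS_KM = 6371.0  # idealized spherical Earth

def great_circle_distance(lat1, lon1, lat2, lon2):
    """Length of the geodesic (great-circle arc) between two surface points."""
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Central angle via the spherical law of cosines.
    central = math.acos(
        math.sin(phi1) * math.sin(phi2)
        + math.cos(phi1) * math.cos(phi2) * math.cos(lam2 - lam1)
    )
    return EARTH_RADIUS_KM * central

def chord_distance(lat1, lon1, lat2, lon2):
    """The true Euclidean straight line -- it cuts through the sphere's interior."""
    def to_xyz(lat, lon):
        phi, lam = math.radians(lat), math.radians(lon)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))
    return EARTH_RADIUS_KM * math.dist(to_xyz(lat1, lon1), to_xyz(lat2, lon2))

# Approximate coordinates (illustrative values, not precise geodata).
ny, paris = (40.7, -74.0), (48.9, 2.35)
arc = great_circle_distance(*ny, *paris)
chord = chord_distance(*ny, *paris)
print(f"geodesic on the surface: {arc:.0f} km; straight line through the Earth: {chord:.0f} km")
```

The chord is always shorter than the arc, but it is not available to an airplane constrained to the surface — which is exactly why the geodesic is the "straight line" of that space.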
This illustrates a key point:
axioms are not “universally true” in some absolute sense.
They are conventions that define the system we are working in.
Within their proper context (e.g., Euclidean geometry), their consequences are logically consistent (no contradictions arise) and useful (for designing buildings, maps, and bridges). But in other contexts (e.g., spherical geometry), we need a different set of axioms.
In the same way, in probability we may assume:
Outcomes of an experiment are independent and identically distributed (i.i.d.).
Real-world data often violates this (dependence in time series, non-identical distributions in heterogeneous populations).
But within the i.i.d. framework, the consequences are logically consistent and extremely useful: they give us fundamental results like the Law of Large Numbers and the Central Limit Theorem.
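A quick simulation illustrates the Law of Large Numbers under the i.i.d. assumption. This is a minimal Python sketch: we draw from Uniform(0, 1), whose true mean is 0.5, and watch the sample mean settle as the sample size grows.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    # n i.i.d. draws from Uniform(0, 1); the true mean is 0.5.
    draws = [random.random() for _ in range(n)]
    return sum(draws) / n

# As n grows, the sample mean drifts toward the true mean 0.5.
for n in (10, 1_000, 100_000):
    print(n, round(sample_mean(n), 4))
```

Under dependence or heterogeneity this convergence can fail, which is precisely why the i.i.d. axiom matters.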
The analogy with geometry is this:
- In flat Euclidean space, the “straight line” is the natural geodesic.
- On a sphere, the “straight line” is reinterpreted as a great circle.
Similarly, in probability:
- For many problems, assuming i.i.d. is the natural starting point (the “straight line” of probability).
- In more complex contexts (time series, causal inference), we need to redefine the structure (like choosing geodesics on a sphere) with weaker or different axioms — e.g., mixing conditions, stationarity, or potential outcomes.
Why we begin with Logic and Set Theory
Logic gives us the rules of the game:
how to combine statements, reason consistently, and structure proofs.
Without logic, axioms and theorems would collapse into ambiguity.
Set Theory provides the building blocks:
almost every object in modern mathematics (numbers, functions, probability spaces, datasets) can be described in terms of sets.
Together, Logic and Set Theory form the grammar and vocabulary of mathematics.
They are not yet “about the real world” — but they give us the precise tools to state assumptions and derive consequences.
Intuition for research
When you see theoretical work in physics, economics, or causal machine learning, it often starts by axiomatizing the problem:
- Define the objects (e.g. random variables, causal spaces).
- Specify assumptions as axioms (e.g. independence, stability, interventions).
- Derive results rigorously from there.
By starting here, we are learning how to speak the language of mathematics before applying it to inference, probability, and causality.
Thus, before diving into analysis and probability, we establish a foundation in logic and set theory. This is important to formalize assumptions, express mathematical objects precisely, and build proofs with rigor.
Logic gives us the language to connect premises and conclusions, while Set Theory gives us the structure to define universes of discourse, events, functions, probability spaces, etc.
Therefore: yes, learning Logic and Set Theory is essential!
1. Sentential Logic
1.1 Statements, Propositions, Predicates, and Connectives
💡Motivation
Learning statements and connectives is like learning the alphabet of mathematics.
- If you cannot distinguish valid statements, you cannot even start a proof.
Example in causal inference:
- \(p\): “The model includes all relevant confounders.”
- \(q\): “The conditional independence assumption holds.”
- Then “If the model includes all relevant confounders, then the conditional independence assumption holds” is \(p \to q\).
Without connectives, we’d stay in informal language. With them, we can formalize statements and reason rigorously about consequences, such as proving that a set of assumptions implies consistency of an estimator, showing that ignorability implies identification of a treatment effect, or demonstrating that conditional independence leads to factorization of a probability distribution into simpler components.
A Statement (or Proposition) is a declarative sentence that is either true or false.
- Example: “3 is even” (false), “Barcelona is in Spain” (true).
A Predicate is like a “template” for a statement: it depends on a variable and becomes a statement once you specify the value. For example:
\(P(x): x > 0\) (this is a Predicate or a “template”)
- \(P(2)\) → “2 > 0” (This is a Statement, with logic value “true”).
- \(P(-1)\) → “-1 > 0” (This is a Statement, with logic value “false”).
Connectives are logical “operators” that allow us to combine simple statements into more complex ones.
They let us reason systematically instead of relying on vague language.
But why do we want to combine simple statements into more complex ones?
Because in mathematics (and especially in statistics), we rarely deal with isolated facts.
We deal with assumptions and want to know what conclusions they imply.
Connectives give us the structure to move from “pieces of information” to a logical argument.
Example from statistics:
- \(p\): “The sample is i.i.d.”
- \(q\): “The variance of the sample is finite.”
- \(r\): “The sample mean converges to the true mean (Law of Large Numbers).”
In words:
“If the sample is i.i.d. and the variance is finite, then the sample mean converges.”
Symbolically:
\[
(p \land q) \to r
\]
This is exactly how many theorems are written: conditions (joined with \(\land\)) imply a result (\(\to\)).
So connectives are not just about making compound sentences longer.
They are about giving us a precise language of assumptions and consequences.
This is the foundation of mathematical reasoning: once we can formalize, we can prove.
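To see the mechanics, here is a minimal Python sketch (the helper `implies` is our own name for material implication, not a standard library function) that enumerates every truth assignment of \((p \land q) \to r\):

```python
from itertools import product

def implies(a, b):
    # Material implication: a -> b is false only when a is True and b is False.
    return (not a) or b

# Evaluate (p AND q) -> r under every truth assignment of p, q, r.
rows = [(p, q, r, implies(p and q, r))
        for p, q, r in product([True, False], repeat=3)]
for row in rows:
    print(row)
```

Exactly one of the eight rows is false: the one where both conditions hold but the conclusion fails. That single failure mode is the logical content of a theorem of the form "conditions imply result."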
Let’s see them one by one with examples:
- Negation (\(\lnot p\))
Meaning: “not \(p\).”
- If \(p\) is: “The number 5 is even” (false),
- then \(\lnot p\) is: “The number 5 is not even” (true).
- Conjunction (\(p \land q\))
Meaning: “\(p\) and \(q\)” → true only when both are true.
- \(p\): “2 is even” (true)
- \(q\): “3 is prime” (true)
- \(p \land q\): “2 is even and 3 is prime” (true).
- If either part were false, the whole conjunction would be false.
- Disjunction (\(p \lor q\))
Meaning: “\(p\) or \(q\)” → true if at least one is true.
- \(p\): “Barcelona is in Spain” (true)
- \(q\): “Lisbon is in Brazil” (false)
- \(p \lor q\): “Barcelona is in Spain or Lisbon is in Brazil” (true).
- Conditional (\(p \to q\))
Meaning: “if \(p\) then \(q\)” → false only if \(p\) is true and \(q\) is false.
- \(p\): “It rains.”
- \(q\): “The ground is wet.”
- \(p \to q\): “If it rains, then the ground is wet.”
- True if it rains and the ground is wet (promise kept).
- False if it rains but the ground is not wet (promise broken).
- True if it doesn’t rain (vacuously true).
- Biconditional (\(p \leftrightarrow q\))
Meaning: “\(p\) if and only if \(q\)” → true when \(p\) and \(q\) have the same truth value.
- \(p\): “Today is Saturday.”
- \(q\): “Tomorrow is Sunday.”
- \(p \leftrightarrow q\): “Today is Saturday if and only if tomorrow is Sunday.” → true.
- If one part is true and the other false, the biconditional is false.
Thus:
- Predicate: general template (open sentence, truth depends on a variable) -> becomes true or false only when a variable is given a value.
- Proposition/statement: instance of that template (closed sentence, definite truth) -> something that is already true or false.
- Connectives: operators that take simple propositions and form compound propositions.
1.2 Truth Tables
💡Motivation
Now that we know connectives allow us to combine assumptions and conclusions into structured arguments, we need a way to check the consistency and behavior of these arguments.
Why? Because once statements get more complex, our intuition alone isn’t reliable.
For example, the conditional \(p \to q\) often confuses beginners. Why? Because in everyday language, “if… then…” feels different from how logic defines it.
The Concept of “Promise”
We naturally read a conditional like \(p \to q\) as: “If it rains, then the ground gets wet.”
Think of \(p \to q\) as a promise:
- \(p\): It rains.
- \(q\): The ground gets wet.
So \(p \to q\) means: “I promise that if it rains, then the ground will get wet.”
We also naturally think that:
1. If it rains and the ground is wet → ✅ promise kept.
2. If it rains and the ground is not wet → ❌ promise broken.
So far, this matches intuition.
But what if it doesn’t rain?
- Logic says: the conditional is still true, no matter whether the ground is wet or dry.
- Why? Because the promise “if it rains, then the ground gets wet” has not been broken — the condition (\(p\)) never happened in the first place.
- This is what we call vacuous truth.
This feels counterintuitive, and that’s exactly why we need a systematic tool.
Why Truth Tables?
Truth tables are the grammar checker of logic. They resolve this confusion:
- They let us mechanically verify whether compound statements are valid, equivalent, tautological, or contradictory.
- They are a first step toward building formal proofs, because they allow us to test the “mechanics” of logical structure before tackling deeper arguments.
Think of them as a microscope for logical statements:
when connectives weave sentences together, truth tables let us zoom in and see all possible truth scenarios at once.
They also let us test whether two statements are logically equivalent. For example:
- “If not \(p\), then not \(q\).”
- “If \(q\), then \(p\).”
Are these equivalent? Intuition may fail, but a Truth Table makes it crystal clear!
Example in statistics:
- If the data behave nicely (i.i.d. + finite variance), then the sample mean is reliable (it will converge to the true mean) (\(p \to q\)).
- Contrapositive: If the sample mean does not converge to the true mean, then the data were not i.i.d. with finite variance (\(\lnot q \to \lnot p\)).
A truth table shows how the truth value of a compound statement depends on its parts.
Example: Prove that an implication (\(p \to q\)) is equivalent (\(\equiv\)) to its contrapositive (\(\lnot q \to \lnot p\)).
We want to show: \(p \to q \;\equiv\; \lnot q \to \lnot p\)
| \(p\) | \(q\) | \(p \to q\) | \(\lnot q\) | \(\lnot p\) | \(\lnot q \to \lnot p\) |
|---|---|---|---|---|---|
| T | T | T | F | F | T |
| T | F | F | T | F | F |
| F | T | T | F | T | T |
| F | F | T | T | T | T |
Since the last two columns match, the implication is equivalent to its contrapositive.
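The same table can be checked mechanically. A minimal Python sketch (again with a hand-rolled `implies` helper, since Python has no implication operator) brute-forces all four assignments:

```python
from itertools import product

def implies(a, b):
    # Material implication: false only when a is True and b is False.
    return (not a) or b

# p -> q should match (not q) -> (not p) in every row of the truth table.
rows = [(p, q, implies(p, q), implies(not q, not p))
        for p, q in product([True, False], repeat=2)]
assert all(fwd == contra for _, _, fwd, contra in rows)
print("p -> q is equivalent to (not q) -> (not p) in all four cases")
```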
Example: Analysis of a Conditional
Take \(p\): It rains, and \(q\): The ground gets wet.
The statement \(p \to q\) is a promise: “If it rains, then the ground gets wet.”
| \(p\) | \(q\) | \(p \to q\) | Explanation |
|---|---|---|---|
| T | T | T | ✅ promise kept |
| T | F | F | ❌ promise broken |
| F | T | T | vacuously true (condition never triggered) |
| F | F | T | vacuously true (condition never triggered) |
Key intuition:
- An implication is only false when the condition happens but the promised result fails.
- In all other cases, the promise has not been broken, so the implication is true.
This is why statements like “All unicorns have horns” are technically true: since there are no unicorns, the condition never triggers, and the statement cannot be falsified.
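Python’s built-in `all()` mirrors this behavior: over an empty collection it returns `True`, because no counterexample exists. A tiny sketch (the `unicorns` list and its `has_horn` key are our own illustrative naming):

```python
unicorns = []  # the set of unicorns is empty

# "All unicorns have horns" quantifies over an empty collection.
# all() over an empty iterable is True: there is no counterexample to find.
claim = all(u["has_horn"] for u in unicorns)
print(claim)  # True
```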
Exercises
Exercise 1. Show that \(p \to q\) is equivalent to \(\lnot p \lor q\).
Step 1. Write the two statements side by side: \(p \to q\) and \(\lnot p \lor q\).
Step 2. Build a truth table with columns for \(p\), \(q\), \(\lnot p\), \(p \to q\), and \(\lnot p \lor q\).
Step 3. Compare the last two columns.
Step 4. Conclude: if they match in all rows, the statements are logically equivalent.
Solution
Step 1: Write the two statements side by side: \(p \to q\) and \(\lnot p \lor q\).
What we want to check is whether these two always mean the same thing.
- \(p \to q\): “If \(p\), then \(q\).”
- \(\lnot p \lor q\): “Either not \(p\), or \(q\).”
At first sight they look different, but we suspect they are really saying the same thing.
To confirm, we will compare their truth tables row by row.
Step 2: Build a truth table with columns for \(p\), \(q\), \(\lnot p\), \(p \to q\), and \(\lnot p \lor q\).
(Didactic hint: to fill this correctly, recall the “grammar” of each connective.)
- Negation \(\lnot p\): flips the truth value (T→F, F→T).
- Disjunction \(A \lor B\): true if at least one of \(A,B\) is true.
- Conditional \(p \to q\): think “promise” — it is false only when the promise is broken (i.e., \(p\) is T but \(q\) is F); otherwise true (including when \(p\) is F: vacuous truth).
| \(p\) | \(q\) | \(\lnot p\) | \(p \to q\) | \(\lnot p \lor q\) |
|---|---|---|---|---|
| T | T | F | T | T |
| T | F | F | F | F |
| F | T | T | T | T |
| F | F | T | T | T |
How we filled it:
- \(\lnot p\) is just the flip of \(p\).
- \(p \to q\) is only F in the row \((p,q)=(T,F)\) (promise broken).
- \(\lnot p \lor q\) is true when either \(\lnot p\) is T or \(q\) is T.
Step 3: Compare the last two columns.
The columns for \(p \to q\) and \(\lnot p \lor q\) are identical (T, F, T, T).
Therefore, the statements have the same truth value in every case.
Step 4: Conclude.
We conclude:
\(p \to q \equiv \lnot p \lor q\)
In words: an implication “if \(p\), then \(q\)” always carries the same meaning as saying “either \(p\) does not happen, or \(q\) happens.”
Exercise 2. Analyze the statement: “If a number is divisible by 4, then it is even.”
Step 1. Define \(p\): “number is divisible by 4”, and \(q\): “number is even.”
Step 2. Translate into logic: \(p \to q\).
Step 3. Think through cases:
- A number divisible by 4 (say 12) → it is even → promise kept.
- A number divisible by 4 but not even → this would break the promise, but no such number exists (the impossible case).
- A number not divisible by 4 (say 9) → nothing promised, implication vacuously true.
Step 4. Verify with a truth table if needed.
Step 5. Conclude: the statement holds logically.
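Step 4 can also be done by brute force over a finite range of integers (a Python sketch; the range is an arbitrary finite sample, since we cannot check all integers):

```python
def implies(a, b):
    # Material implication: false only when a is True and b is False.
    return (not a) or b

# Empirically check "divisible by 4 -> even" on a finite slice of the integers.
check = all(implies(n % 4 == 0, n % 2 == 0) for n in range(-1000, 1001))
print(check)  # True
```

Note that for every `n` not divisible by 4 the implication holds vacuously, exactly as in Step 3.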
Biconditional: \(p \leftrightarrow q\) (“\(p\) if and only if \(q\)”)
Rule: \(p \leftrightarrow q\) is true exactly when \(p\) and \(q\) have the same truth value
(both true or both false).
Equivalent form:
\[ p \leftrightarrow q \;\equiv\; (p \to q) \land (q \to p) \]
Example:
\(p\): “Today is Saturday.”
\(q\): “Tomorrow is Sunday.”
\(p \leftrightarrow q\): “Today is Saturday if and only if tomorrow is Sunday.”
Case analysis:
- If both \(p\) and \(q\) are true → the biconditional is true (both directions of the promise hold).
- If \(p\) is true but \(q\) is false → false, because one direction of the “if and only if” fails.
- If \(p\) is false but \(q\) is true → false, for the same reason.
- If both \(p\) and \(q\) are false → true, because they match in value (both false).
This explains why the truth table looks like this:
| \(p\) | \(q\) | \(p \leftrightarrow q\) | Explanation |
|---|---|---|---|
| T | T | T | both true → promise kept |
| T | F | F | mismatch → one direction fails |
| F | T | F | mismatch → one direction fails |
| F | F | T | both false → they match |
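The equivalent form \(p \leftrightarrow q \equiv (p \to q) \land (q \to p)\) can be verified mechanically (a minimal Python sketch with hand-rolled helpers, since Python has no native implication or biconditional operator):

```python
from itertools import product

def implies(a, b):
    # Material implication: false only when a is True and b is False.
    return (not a) or b

def iff(a, b):
    # p <-> q holds exactly when p and q share a truth value.
    return a == b

# p <-> q should agree with (p -> q) AND (q -> p) in every row.
agree = all(iff(p, q) == (implies(p, q) and implies(q, p))
            for p, q in product([True, False], repeat=2))
print(agree)  # True
```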
Intuition and goal of the biconditional
The biconditional expresses equivalence: \(p\) and \(q\) “stand or fall together.”
It is stronger than a one-way implication: both \(p \to q\) and \(q \to p\) must hold.
- If you read \(p \leftrightarrow q\) aloud, it means:
“\(p\) is true exactly when \(q\) is true.” or “\(p\) holds exactly when \(q\) holds.”
This is why mathematicians often use “iff” (“if and only if”) in definitions and theorems:
- It guarantees not only that \(p\) implies \(q\), but also that \(q\) implies \(p\).
Why is this important?
- It formalizes definitions in mathematics.
- Example: “A number \(n\) is even iff \(n = 2k\) for some integer \(k\).”
- This captures both directions: every even number has that form, and every number of that form is even.
- It allows us to state equivalence theorems.
- Example: “A sequence is Cauchy iff it is convergent (in \(\mathbb{R}\)).”
- The biconditional captures the deep connection: each property implies the other.
- It makes reasoning reversible.
- With an implication, you can only go forward (\(p \to q\)).
- With a biconditional, you can go forward and backward: knowing either \(p\) or \(q\) tells you the other.
By mastering the biconditional, the reader understands why mathematicians love the phrase “if and only if”: it’s the precise way of stating true equivalence between concepts.
Summary Table of Connectives
| Connective | Symbol | Rule (when true) |
|---|---|---|
| Negation | \(\lnot p\) | when \(p\) is false |
| Conjunction | \(p \land q\) | when \(p\) and \(q\) are true |
| Disjunction | \(p \lor q\) | when at least one of \(p, q\) is true |
| Conditional | \(p \to q\) | false only if \(p\) true and \(q\) false |
| Biconditional | \(p \leftrightarrow q\) | when \(p\) and \(q\) have same truth value |
1.3 Tautologies, Contradictions, and Logical Equivalence
Tautology
Definition:
A tautology is a statement that is true in all possible cases.
Why it matters:
- Tautologies act like universal truths: they don’t depend on data or assumptions.
- They are often the “glue” of proofs, showing that certain forms are always valid.
- Many rules of inference (like modus ponens) are based on tautologies.
Example (logic):
\[ (p \land q) \to p \]
This means: If both \(p\) and \(q\) are true, then \(p\) is true.
- Always true, regardless of whether \(p\) or \(q\) are true or false.
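This “true in all possible cases” condition is directly checkable by machine. A minimal Python sketch (the `is_tautology` helper is our own naming) enumerates every truth assignment:

```python
from itertools import product

def is_tautology(formula, n_vars):
    # A tautology evaluates to True under every possible truth assignment.
    return all(formula(*vals) for vals in product([True, False], repeat=n_vars))

# (p AND q) -> p, using the identity a -> b == (not a) or b.
always_true = is_tautology(lambda p, q: (not (p and q)) or p, 2)
# p -> q alone is not a tautology: it fails when p is True and q is False.
sometimes_false = is_tautology(lambda p, q: (not p) or q, 2)
print(always_true, sometimes_false)  # True False
```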
Example (statistics):
The Law of Total Probability is tautological:
\[ P(A) = P(A \cap B) + P(A \cap \lnot B). \]
This identity always holds by construction, no matter what events \(A\) and \(B\) are.
Contradiction
Definition:
A contradiction is a statement that is false in all possible cases.
Why it matters:
- Contradictions are the engine of proof by contradiction.
- If assuming something leads to a contradiction, then the assumption must be false.
- They represent “impossible situations” in logic.
Example (logic):
\[ p \land \lnot p \]
This means: \(p\) is true and \(p\) is false at the same time.
- Always false, no matter what truth value \(p\) has.
Example (statistics):
Suppose we assume:
1. “The variance of this distribution is finite.”
2. “The variance of this distribution is infinite.”
Together, these form a contradiction, so at least one assumption must be wrong.
Logical Equivalence
Definition:
Two statements are logically equivalent if they have the same truth value in all possible cases.
Why it matters:
- Logical equivalence lets us replace one statement with another in a proof.
- Many powerful proof strategies rely on equivalence (contrapositive law, De Morgan’s laws, distributive laws).
- Often the equivalent form is much easier to work with.
Example (logic):
\[ p \to q \;\equiv\; \lnot p \lor q \]
This means: “If \(p\), then \(q\)” is the same as “Either not \(p\), or \(q\).”
- This equivalence makes it easier to manipulate conditionals in proofs.
Example (causal inference):
\[ \text{Ignorability} \to \text{Identifiability} \]
is logically equivalent to
\[ \lnot \text{Identifiability} \to \lnot \text{Ignorability}. \]
Switching to the contrapositive often makes a proof or argument simpler.
Summary:
- Tautologies give us universal truths to rely on.
- Contradictions allow us to eliminate false assumptions through contradiction proofs.
- Logical equivalence lets us restate problems in easier forms without changing meaning.
2. Quantificational Logic
In propositional (sentential) logic, we treated statements as indivisible units: each one was either true or false.
But in mathematics and statistics, we often want to say things about all numbers in a set, or claim that at least one number has a certain property.
This is where quantifiers come in.
2.1 Predicates and Quantifiers
As we saw above, a predicate is like a sentence with a “blank” — it becomes a full statement only once you plug in a value.
- Example: \(P(x): x > 0\).
- If \(x = 2\), then \(P(2)\) is the proposition “2 > 0” (true).
- If \(x = -3\), then \(P(-3)\) is the proposition “-3 > 0” (false).
We use quantifiers to talk about how many elements satisfy a predicate:
Universal quantifier (\(\forall\)):
\(\forall x\; P(x)\) means “for all \(x\), \(P(x)\) is true.”
Existential quantifier (\(\exists\)):
\(\exists x\; P(x)\) means “there exists at least one \(x\) such that \(P(x)\) is true.”
Examples:
- \(\forall x \in \mathbb{Z},\; x^2 \geq 0\). (Every integer squared is nonnegative.)
- \(\exists x \in \mathbb{Z},\; x^2 = 9\). (There exists an integer whose square is 9.)
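Over a finite universe, Python’s built-in `all()` and `any()` play the roles of \(\forall\) and \(\exists\). A sketch, with one caveat: we can only check a finite slice of \(\mathbb{Z}\), not the whole set, so these are empirical checks rather than proofs.

```python
# A finite slice of the integers, standing in for ℤ.
universe = range(-100, 101)

forall_sq_nonneg = all(x**2 >= 0 for x in universe)  # ∀x, x² ≥ 0
exists_sq_nine = any(x**2 == 9 for x in universe)    # ∃x, x² = 9
exists_sq_two = any(x**2 == 2 for x in universe)     # ∃x, x² = 2

print(forall_sq_nonneg, exists_sq_nine, exists_sq_two)  # True True False
```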
2.2 Universe of Discourse
The universe of discourse is the set of objects we allow \(x\) to vary over.
The truth of a statement depends on it!
Example:
- \(\forall x \in \mathbb{R},\; x^2 \geq 0\) is true.
- \(\forall x \in \mathbb{Z},\; x^2 = 2\) is false (no integer squared equals 2).
If we didn’t specify whether \(x\) ranges over \(\mathbb{R}\) or \(\mathbb{Z}\) (both interpreted as the universe of discourse of each statement), the meaning would be ambiguous.
2.3 Truth of Quantified Statements
How to evaluate quantified statements:
- \(\forall x\; P(x)\) is true if every \(x\) in the universe makes \(P(x)\) true.
- \(\exists x\; P(x)\) is true if at least one \(x\) makes \(P(x)\) true.
Negations of Quantifiers
Negating quantified statements flips the quantifier:
\[ \lnot (\forall x\, P(x)) \equiv \exists x\, \lnot P(x) \]
\[ \lnot (\exists x\, P(x)) \equiv \forall x\, \lnot P(x) \]
Examples:
- “Not all students passed” means “There exists a student who did not pass.”
- “There does not exist a unicorn” means “For all \(x\), \(x\) is not a unicorn.”
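Both negation rules can be checked on a finite universe with `all()` and `any()` (a Python sketch; the finite range is an illustrative stand-in for the full universe of discourse):

```python
universe = range(-10, 11)

def P(x):
    # The predicate "x is positive".
    return x > 0

# ¬(∀x P(x)) ≡ ∃x ¬P(x)
neg_forall = (not all(P(x) for x in universe)) == any(not P(x) for x in universe)
# ¬(∃x P(x)) ≡ ∀x ¬P(x)
neg_exists = (not any(P(x) for x in universe)) == all(not P(x) for x in universe)

print(neg_forall, neg_exists)  # True True
```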
2.4 Multiple Quantifiers
Often statements involve more than one quantifier.
The order matters!
\(\forall x \in \mathbb{R},\; \exists y \in \mathbb{R}: y > x\)
→ True, because for every real number \(x\), we can pick \(y = x+1\).
\(\exists y \in \mathbb{R},\; \forall x \in \mathbb{R}: y > x\)
→ False, because no single real number is greater than all real numbers.
Tip: Think of quantifiers as a kind of game:
- For \(\forall x\), your opponent chooses the worst possible \(x\).
- For \(\exists y\), you get to respond by picking a suitable \(y\).
The order decides who gets to “move” first, and the outcome can change completely.
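The game metaphor translates directly into nested `all()`/`any()` calls. One caveat: a finite set has a maximum, so \(\forall x\, \exists y: y > x\) cannot be illustrated faithfully on a finite universe; the sketch below uses the predicate \(y \neq x\) instead, which exhibits the same order effect.

```python
universe = range(5)

# ∀x ∃y: y ≠ x — True: whoever picks x, we answer with a different element.
forall_exists = all(any(y != x for y in universe) for x in universe)

# ∃y ∀x: y ≠ x — False: our fixed y is defeated when the opponent picks x = y.
exists_forall = any(all(y != x for x in universe) for y in universe)

print(forall_exists, exists_forall)  # True False
```

The outer call is the player who moves first; swapping `all` and `any` swaps who commits first, and the truth value flips.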
2.5 Why This Matters
Quantifiers appear in almost every mathematical theorem.
Analysis (limits):
\[ \forall \epsilon > 0,\; \exists \delta > 0:\; |x - a| < \delta \;\to\; |f(x) - L| < \epsilon \]
(“For every tolerance \(\epsilon\), there exists a closeness \(\delta\) that guarantees the function stays within that tolerance.”)
Statistics:
- \(\forall n,\; \exists \hat{\theta}_n:\; \hat{\theta}_n \to \theta\) (There exists an estimator consistent for \(\theta\).)
- \(\exists\) an unbiased estimator of \(\mu\) (the sample mean).
Causal Inference:
- \(\forall\) randomized experiments, \(\exists\) an unbiased estimator of the treatment effect.
Quantifiers are the way mathematics formalizes sweeping claims like “always” and “sometimes,” which are at the heart of proofs and assumptions in Causal ML.
2.6 Mathematical Interpretation
Quantifiers can look intimidating at first, but the real skill is learning how to read them.
Every quantified statement has two parts:
1. the quantifier (\(\forall\) or \(\exists\)), and
2. the predicate (a property of \(x\) that is claimed to hold).
2.6.1 Single Quantifier Examples
Universal (\(\forall\)):
\[ \forall x \in \mathbb{R},\; x^2 \geq 0 \]
Read: “For every real number \(x\), the square of \(x\) is nonnegative.”
Interpretation: This is true, because no real number squared gives a negative result.
Statistics example:
\[ \forall n \in \mathbb{N},\; \operatorname{Var}(\bar{X}_n) \geq 0 \]
Read: “For every sample size \(n\), the variance of the sample mean is nonnegative.”
Interpretation: Always true, because variances can never be negative.
Existential (\(\exists\)):
\[ \exists x \in \mathbb{Z},\; x^2 = 4 \]
Read: “There exists an integer whose square is 4.”
Interpretation: True, since \(x = 2\) and \(x = -2\) work.
Statistics example:
\[ \exists \;\text{an estimator } \hat{\theta}\; : \; \mathbb{E}[\hat{\theta}] = \theta \]
Read: “There exists an estimator whose expected value equals the true parameter.”
Interpretation: This is the definition of an unbiased estimator (e.g., the sample mean for \(\mu\)).
2.6.2 Multiple Quantifier Examples
When quantifiers are combined, order matters.
Example 1:
\[ \forall x \in \mathbb{R},\; \exists y \in \mathbb{R}:\; y > x \]
Read: “For every real number \(x\), there exists a real number \(y\) that is greater than \(x\).”
True, because if someone hands you any \(x\), you can always respond with \(y = x+1\).
Statistics example:
\[ \forall \epsilon > 0,\; \exists N \in \mathbb{N}:\; n > N \;\to\; |\bar{X}_n - \mu| < \epsilon \]
Read: “For every tolerance \(\epsilon\), there exists a large enough sample size \(N\) such that if \(n > N\), the sample mean is within \(\epsilon\) of \(\mu\).”
Interpretation: This is the definition of consistency (Law of Large Numbers).
Example 2:
\[ \exists y \in \mathbb{R},\; \forall x \in \mathbb{R}:\; y > x \]
Read: “There exists a real number \(y\) such that \(y\) is greater than every real number \(x\).”
False, because no single real number is larger than all others.
Statistics example (false statement):
\[ \exists N \in \mathbb{N},\; \forall n > N:\; \bar{X}_n = \mu \]
Read: “There exists a finite sample size \(N\) such that for all \(n > N\), the sample mean equals the population mean exactly.”
False, because sampling variation never completely disappears — the sample mean only converges in probability, not with exact equality at some \(N\).
2.6.3 How to Think About Multiple Quantifiers
A useful way to think is as a game:
- \(\forall x\) = your opponent picks a value of \(x\), possibly trying to make you fail.
- \(\exists y\) = you get to respond by picking \(y\) to satisfy the condition.
So the statement
\[ \forall x \in \mathbb{R},\; \exists y \in \mathbb{R}:\; y > x \]
means: No matter what \(x\) your opponent picks, you can always respond with a suitable \(y\).
But the reverse order
\[ \exists y \in \mathbb{R},\; \forall x \in \mathbb{R}:\; y > x \]
means: You must pick one \(y\) that beats all possible \(x\). This is impossible, so the statement is false.
Why this section is important
Quantifiers are everywhere in math, stats, and causal ML.
- Universal quantifiers express generality:
- “For all sample sizes \(n\), \(\operatorname{Var}(\bar{X}_n) \geq 0\).”
- “For all \(\epsilon > 0\), there exists an \(N\) such that …” (limits, consistency).
- Existential quantifiers express possibility:
- “There exists an unbiased estimator of \(\mu\).”
- “There exists a consistent estimator for every parameter.”
- With multiple quantifiers, the order of ‘who chooses first’ changes the meaning dramatically.
This interpretative skill is essential for reading theorems correctly and avoiding misinterpretation.
2.7 Exercises
Goal of these exercises:
- Practice evaluating truth values.
- Practice negating quantified statements.
- Practice reading and interpreting symbolic logic in plain English.
At this stage, we are not proving statements — only learning to understand and translate them correctly.
Truth values (universe of discourse: \(\mathbb{Z}\)):
For each statement, decide whether it is true or false and explain why in words.
- \(\forall x,\; x^2 \geq 0\)
- \(\exists x,\; x^2 = 2\)
Hint: In the first, think: “Is there any integer whose square is negative?”
In the second, think: “Is there an integer whose square equals 2?”
Negation practice:
Write the logical negation of each statement and simplify.
- \(\forall x \in \mathbb{R},\; x^2 \geq 0\)
- \(\exists x \in \mathbb{N},\; x^2 = 2\)
Hint: Use the rules:
\[ \lnot (\forall x\, P(x)) \equiv \exists x\, \lnot P(x), \qquad \lnot (\exists x\, P(x)) \equiv \forall x\, \lnot P(x). \]
Quantifier order:
Carefully interpret the following statements in plain English.
Are they true or false?
- \(\forall x \in \mathbb{R},\; \exists y \in \mathbb{R}: y > x\)
- \(\exists y \in \mathbb{R},\; \forall x \in \mathbb{R}: y > x\)
Translate into symbols:
Express the following in logical notation.
- “Every dataset has at least one outlier.”
- “There exists a consistent estimator for every parameter.”
Interpret the following statistical statements (no proof needed):
- \(\forall n \in \mathbb{N},\; \operatorname{Var}(\bar{X}_n) \geq 0\)
(For every sample size \(n\), the variance of the sample mean is nonnegative.)
- \(\exists n \in \mathbb{N},\; \forall \epsilon > 0:\; |\bar{X}_n - \mu| < \epsilon\)
(There exists a fixed sample size \(n\) such that the sample mean is always arbitrarily close to \(\mu\). Is this realistic?)
- \(\forall \epsilon > 0,\; \exists N \in \mathbb{N}:\; n > N \;\to\; |\bar{X}_n - \mu| < \epsilon\)
(Interpretation: This is the formal definition of consistency / the Law of Large Numbers.)