Random Vector

A concise explanation of a random vector as a collection of random variables representing a single observation with multiple features.

Definition

A random vector is an ordered collection of \(p\) random variables defined on the same probability space:

\[ X = (X^{(1)}, X^{(2)}, \dots, X^{(p)}) \]

where each component \(X^{(j)}\) is a random variable.

It represents a single observation with multiple features.

Interpretation

A random vector models the outcome of one observation generated by a data-generating process (DGP).

Each component corresponds to a feature
All components are generated together
Their joint behavior is described by a joint distribution \(F\)

\[ X \sim F \]
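
As one concrete reading of \(F\), assuming every component is real-valued, it can be taken to be the joint cumulative distribution function. The evaluation points \(t_1, \dots, t_p\) are notation introduced here for illustration only:

\[ F(t_1, \dots, t_p) = P(X^{(1)} \le t_1, \dots, X^{(p)} \le t_p) \]

Because \(F\) is a joint distribution, it carries the dependence between components and not only their individual (marginal) behavior.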

Example

In a coffee dataset, one observation may include:

rating
acidity
body

We represent this as:

\[ X = (\text{rating}, \text{acidity}, \text{body}) \]

Each component is a random variable, and together they form a random vector.
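
A minimal sketch of this example in Python, assuming purely for illustration that \(F\) is a multivariate normal distribution. The mean and covariance values below are hypothetical and do not come from any real coffee dataset:

```python
import numpy as np

# Hypothetical parameters of F, chosen only for illustration.
mean = np.array([7.5, 5.0, 6.0])  # rating, acidity, body
cov = np.array([
    [1.00, 0.30, 0.20],
    [0.30, 0.80, 0.10],
    [0.20, 0.10, 0.50],
])  # nonzero off-diagonal entries make the components dependent

rng = np.random.default_rng(seed=0)

# A single draw from F produces all three components together.
x = rng.multivariate_normal(mean, cov)
rating, acidity, body = x

print(x.shape)  # (3,)
```

The draw returns one vector of length 3, which mirrors the idea that the components are generated together and not one at a time.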

From Random Vector to Dataset

In practice, we work with \(n\) random vectors, each distributed like \(X\), and observe one realization of each:

\[ X_1, X_2, \dots, X_n \]

Each \(X_i\) is a random vector
Each realization \(x_i\) is a row in the dataset

So:

Rows → realizations \(x_i\) of the random vectors
Columns → components \(X^{(j)}\) of the random vector
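
The row and column correspondence can be sketched with a NumPy array. The distribution and its parameters are again hypothetical (a multivariate normal with made-up values), used only to produce some numbers:

```python
import numpy as np

n, p = 5, 3  # 5 observations, 3 features (rating, acidity, body)

# Hypothetical parameters, chosen only for illustration.
mean = np.array([7.5, 5.0, 6.0])
cov = np.diag([1.0, 0.8, 0.5])

rng = np.random.default_rng(seed=0)

# Each of the n draws is one realization; stacking them gives the dataset.
data = rng.multivariate_normal(mean, cov, size=n)

print(data.shape)  # (5, 3)

x_2 = data[1]         # one row: the realization of the second random vector
acidity = data[:, 1]  # one column: the second component across all observations
```

Indexing a row returns a vector of length \(p\), while indexing a column returns a vector of length \(n\).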

Connection to Random Sample

A random sample is a collection of \(n\) independent random vectors that all share the same distribution \(F\):

\[ X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F \]

This means:

Each observation has the same distribution \(F\)
Observations are independent of one another
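
Taken together, and assuming real-valued components so that \(F\) can be read as a joint cumulative distribution function, these two conditions imply that the joint distribution of the whole sample factorizes into a product of identical terms (with \(\le\) taken componentwise):

\[ P(X_1 \le x_1, \dots, X_n \le x_n) = \prod_{i=1}^{n} F(x_i) \]

Independence here holds across observations. The components within a single \(X_i\) may still depend on one another.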

Key Idea

A random vector represents:

One observation
Multiple features
Joint randomness across variables

One-line Summary

A random vector is a collection of random variables that represents a single observation with multiple features, all generated jointly by one data-generating process.