Random Vector

A concise explanation of a random vector as a collection of random variables representing a single observation with multiple features.

Definition

A random vector is an ordered collection of random variables defined on the same probability space:

\[ X = (X^{(1)}, X^{(2)}, \dots, X^{(p)}) \]

where each component \(X^{(j)}\) is a random variable.

It represents a single observation with multiple features.


Interpretation

A random vector models the outcome of one observation generated by a data-generating process (DGP).

  • Each component corresponds to a feature
  • All components are generated together
  • Their joint behavior, including any dependence between components, is described by a joint distribution \(F\)

\[ X \sim F \]
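
One concrete way to read \(F\), assuming it denotes the joint cumulative distribution function (the lowercase \(x^{(j)}\) are fixed real numbers, a notation introduced here only for illustration):

\[ F(x^{(1)}, \dots, x^{(p)}) = P\big(X^{(1)} \le x^{(1)}, \dots, X^{(p)} \le x^{(p)}\big) \]

The marginal distribution of each component can be recovered from \(F\), but the marginals alone do not in general determine \(F\), which is why the joint behavior has to be specified.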


Example

In a coffee dataset, one observation may include:

  • rating
  • acidity
  • body

We represent this as:

\[ X = (\text{rating}, \text{acidity}, \text{body}) \]

Each component is a random variable, and together they form a random vector.
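
As a minimal sketch of the coffee example, the code below draws one realization of \(X\). It assumes, purely for illustration, that \(F\) is a multivariate normal; the mean vector, the covariance matrix, and the name FEATURES are hypothetical and do not come from any real coffee data.

```python
import numpy as np

# Hypothetical feature names and parameters of F (illustrative only).
FEATURES = ("rating", "acidity", "body")
mean = np.array([85.0, 7.5, 7.0])
cov = np.array([
    [4.00, 0.60, 0.50],
    [0.60, 0.25, 0.10],
    [0.50, 0.10, 0.25],
])  # symmetric and positive definite; off-diagonals encode dependence

rng = np.random.default_rng(seed=0)

# One draw from F gives one realization x of the random vector X.
x = rng.multivariate_normal(mean, cov)

for name, value in zip(FEATURES, x):
    print(f"{name}: {value:.2f}")
```

All three values come from a single joint draw, so the off-diagonal entries of the covariance matrix influence how they move together.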


From Random Vector to Dataset

In practice, we work with \(n\) random vectors, each distributed like \(X\), and observe one realization of each:

\[ X_1, X_2, \dots, X_n \]

  • Each \(X_i\) is a random vector
  • Each realization \(x_i\) is a row in the dataset

So:

  • Rows → realizations of random vectors
  • Columns → components of the random vector
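
The row and column correspondence can be sketched with an \(n \times p\) array. The choice of a standard normal for \(F\) and the variable names below are assumptions made only for illustration.

```python
import numpy as np

n, p = 5, 3  # n observations, p features
rng = np.random.default_rng(seed=0)

# Assumption: F is a p-dimensional standard normal (illustrative choice).
data = rng.standard_normal(size=(n, p))

x_2 = data[1]           # row: the realization x_2 of the random vector X_2
feature_1 = data[:, 0]  # column: the first component across all n observations

print(data.shape)       # (5, 3)
print(x_2.shape)        # (3,)
print(feature_1.shape)  # (5,)
```

Indexing a row returns one full observation, while indexing a column returns the values of one feature across observations.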

Connection to Random Sample

A random sample is a collection of independent and identically distributed (i.i.d.) random vectors:

\[ X_1, \dots, X_n \overset{\text{i.i.d.}}{\sim} F \]

This means:

  • Each observation has the same distribution
  • Observations are independent
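
Taken together, the two properties mean that the joint distribution of the whole sample factorizes. Reading \(F\) as the joint cumulative distribution function of a single observation, and interpreting \(\le\) componentwise (both are assumptions about notation), this can be written as:

\[ P(X_1 \le x_1, \dots, X_n \le x_n) = \prod_{i=1}^{n} F(x_i) \]

Independence here is across observations (rows). The components within a single \(X_i\) may still depend on one another.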

Key Idea

A random vector represents:

  • One observation
  • Multiple features
  • Joint randomness across variables

One-line Summary

A random vector is a collection of random variables representing a single observation with multiple features, all generated jointly by a single data-generating process.