Normal Equation: Derivation Using Vector Derivatives
In OLS (Ordinary Least Squares), the approximate solution $\hat{\textbf{x}}$ must satisfy the following:
$$\begin{aligned}
\hat{\textbf{x}} &= \underset{\textbf{x}}{\text{arg min}} \Vert \textbf{b}-A\textbf{x} \Vert\\
&= \underset{\textbf{x}}{\text{arg min}} \Vert \textbf{b}-A\textbf{x} \Vert ^2\\
&= \underset{\textbf{x}}{\text{arg min}} \, (\textbf{b}-A\textbf{x} )^T(\textbf{b}-A\textbf{x})\\
&= \underset{\textbf{x}}{\text{arg min}} \, f_\text{loss} (\textbf{x})\end{aligned}$$
where
- $\textbf{x}$ : column vector, $\textbf{x} \in \mathbb{R}^n$, ($n \times 1$)
- $A$ : $m \times n$ matrix
- $\textbf{b}$ : column vector, $\textbf{b} \in \mathbb{R}^m$, ($m \times 1$)
- $m \gg n$
That is, the problem reduces to finding the $\textbf{x}$ that minimizes the following function $f_\text{loss}(\textbf{x})$.
$$\begin{aligned}
f_\text{loss}(\textbf{x})&= (\textbf{b}-A\textbf{x})^T(\textbf{b}-A\textbf{x})\\
&= \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - \textbf{b}^T A \textbf{x} + \textbf{x}^T A^T A \textbf{x}
\end{aligned}$$
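As a quick sanity check, the minimal sketch below (assuming NumPy; the small random example is purely illustrative) confirms numerically that $\Vert \textbf{b}-A\textbf{x} \Vert^2$ equals the expanded quadratic form above.

```python
import numpy as np

# Hypothetical small example: m = 5 observations, n = 2 unknowns (in practice m >> n).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))
b = rng.normal(size=5)
x = rng.normal(size=2)

# Squared residual norm ||b - Ax||^2 ...
lhs = np.sum((b - A @ x) ** 2)

# ... equals the expanded form b^T b - x^T A^T b - b^T A x + x^T A^T A x.
rhs = b @ b - x @ (A.T @ b) - b @ (A @ x) + x @ (A.T @ A @ x)

print(np.isclose(lhs, rhs))  # expected: True
```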
At the minimum of $f_\text{loss}(\textbf{x})$, the gradient with respect to $\textbf{x}$ is zero. Using this fact, the normal equation can be obtained through vector differentiation as follows:
$$\begin{aligned}
0&=\dfrac{\partial f_\text{loss}(\textbf{x})}{\partial \textbf{x}}\\
&=\dfrac{\partial}{\partial \textbf{x}}\left( \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - \textbf{b}^T A \textbf{x} + \textbf{x}^T A^T A \textbf{x}\right)\\
&=\dfrac{\partial}{\partial \textbf{x}}\left( \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - (\textbf{x}^T A^T \textbf{b})^T + \textbf{x}^T A^T A \textbf{x}\right)\\
&=\dfrac{\partial}{\partial \textbf{x}}\left( \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - (\textbf{x}^T A^T \textbf{b}) + \textbf{x}^T A^T A \textbf{x}\right)\\
&=0-A^T\textbf{b}-A^T\textbf{b}+2A^TA\hat{\textbf{x}}\\
&=-2A^T\textbf{b}+2A^TA\hat{\textbf{x}}\\
&\quad \\
2A^T\textbf{b}&=2A^TA\hat{\textbf{x}}\\
A^T\textbf{b}&=A^TA\hat{\textbf{x}}\\
A^TA\hat{\textbf{x}}&=A^T\textbf{b}\\
\hat{\textbf{x}}&=(A^TA)^{-1}A^T\textbf{b}\end{aligned}$$
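The last step assumes $A^TA$ is invertible, i.e., $A$ has full column rank. In practice it is numerically safer to solve the linear system $A^TA\hat{\textbf{x}}=A^T\textbf{b}$ than to form $(A^TA)^{-1}$ explicitly. The sketch below (assuming NumPy; the random system is illustrative) solves the normal equation this way and compares the result with NumPy's built-in least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 3                      # overdetermined: m >> n
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Normal equation: solve (A^T A) x = A^T b rather than forming (A^T A)^{-1} explicitly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's SVD-based least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # expected: True
```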
Vector derivative (summary)
$f(\textbf{x})$ | $\frac{d f(\textbf{x})}{d \textbf{x}}$ |
---|---|
$\textbf{x}^T A$ | $A$ |
$\textbf{x}^T \textbf{b}$ | $\textbf{b}$ |
$\textbf{x}^T \textbf{x}$ | $2\textbf{x}$ |
$\textbf{x}^T A \textbf{x}$ | $2A\textbf{x}$ ($A$ symmetric; $(A+A^T)\textbf{x}$ in general) |
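As a rough check of the quadratic-form rule, the sketch below (assuming NumPy; variable names are illustrative) compares the analytic gradient $2S\textbf{x}$ for a symmetric matrix $S$ with a central-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
x = rng.normal(size=n)
M = rng.normal(size=(n, n))
S = M + M.T                        # symmetric matrix, as assumed by the 2Ax rule

def quad(v):
    return v @ S @ v               # f(x) = x^T S x

# Central-difference numerical gradient of f at x.
eps = 1e-6
num_grad = np.array([(quad(x + eps * e) - quad(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

analytic = 2 * S @ x               # table entry: d(x^T A x)/dx = 2 A x (A symmetric)
print(np.allclose(num_grad, analytic, atol=1e-5))  # expected: True
```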
Transpose properties
- $(A^T)^T =A$
- $(A+B)^T =A^T+B^T$
- $(A-B)^T =A^T-B^T$
- $(kA)^T =kA^T, \text{ where }k\text{ is scalar.}$
- $k^T = k, \text{ where }k\text{ is scalar.}$
- $(AB)^T=B^TA^T$
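A quick numerical check of the last identity, $(AB)^T=B^TA^T$ (assuming NumPy; the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

# (AB)^T should equal B^T A^T.
print(np.allclose((A @ B).T, B.T @ A.T))  # expected: True
```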