Normal Equation: Derivation Using Vector Derivatives
In OLS (Ordinary Least Squares), the approximate solution $\hat{\textbf{x}}$ must satisfy the following:
$$\begin{aligned}
\hat{\textbf{x}} &= \underset{\textbf{x}}{\text{arg min}} \Vert \textbf{b}-A\textbf{x} \Vert\\
&= \underset{\textbf{x}}{\text{arg min}} \Vert \textbf{b}-A\textbf{x} \Vert ^2\\
&= \underset{\textbf{x}}{\text{arg min}} \, (\textbf{b}-A\textbf{x} )^T(\textbf{b}-A\textbf{x})\\
&= \underset{\textbf{x}}{\text{arg min}} \, f_\text{loss} (\textbf{x})\end{aligned}$$
where
- $\textbf{x}$ : column vector, $\textbf{x} \in \mathbb{R}^n$, ($n \times 1$)
- $A$ : $m \times n$ matrix
- $\textbf{b}$ : column vector, $\textbf{b} \in \mathbb{R}^m$, ($m \times 1$)
- $m \gg n$
That is, the problem reduces to finding the $\textbf{x}$ that minimizes the following function $f_\text{loss}(\textbf{x})$.
$$\begin{aligned}
f_\text{loss}(\textbf{x})&= (\textbf{b}-A\textbf{x})^T(\textbf{b}-A\textbf{x})\\
&= \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - \textbf{b}^T A \textbf{x} + \textbf{x}^T A^T A \textbf{x}
\end{aligned}$$
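As a quick sanity check, the minimal sketch below (assuming NumPy; the small random example is purely illustrative) confirms numerically that $\Vert \textbf{b}-A\textbf{x} \Vert^2$ equals the expanded quadratic form above.

```python
import numpy as np

# Hypothetical small example: m = 5 observations, n = 2 unknowns (in practice m >> n).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 2))
b = rng.normal(size=5)
x = rng.normal(size=2)

# Squared residual norm ||b - Ax||^2 ...
lhs = np.sum((b - A @ x) ** 2)

# ... equals the expanded form b^T b - x^T A^T b - b^T A x + x^T A^T A x.
rhs = b @ b - x @ (A.T @ b) - b @ (A @ x) + x @ (A.T @ A @ x)

print(np.isclose(lhs, rhs))  # expected: True
```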
At the minimum of $f_\text{loss}(\textbf{x})$, the gradient with respect to $\textbf{x}$ is zero. Using this fact, the normal equation can be obtained through vector differentiation as follows:
$$\begin{aligned}
0&=\dfrac{\partial f_\text{loss}(\textbf{x})}{\partial \textbf{x}}\\
&=\dfrac{\partial}{\partial \textbf{x}}\left( \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - \textbf{b}^T A \textbf{x} + \textbf{x}^T A^T A \textbf{x}\right)\\
&=\dfrac{\partial}{\partial \textbf{x}}\left( \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - (\textbf{x}^T A^T \textbf{b})^T + \textbf{x}^T A^T A \textbf{x}\right)\\
&=\dfrac{\partial}{\partial \textbf{x}}\left( \textbf{b}^T\textbf{b} - \textbf{x}^T A^T \textbf{b} - (\textbf{x}^T A^T \textbf{b}) + \textbf{x}^T A^T A \textbf{x}\right)\\
&=0-A^T\textbf{b}-A^T\textbf{b}+2A^TA\hat{\textbf{x}}\\
&=-2A^T\textbf{b}+2A^TA\hat{\textbf{x}}\\
&\quad \\
2A^T\textbf{b}&=2A^TA\hat{\textbf{x}}\\
A^T\textbf{b}&=A^TA\hat{\textbf{x}}\\
A^TA\hat{\textbf{x}}&=A^T\textbf{b}\\
\hat{\textbf{x}}&=(A^TA)^{-1}A^T\textbf{b}\end{aligned}$$
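The last step assumes $A^TA$ is invertible, i.e., $A$ has full column rank. In practice it is numerically safer to solve the linear system $A^TA\hat{\textbf{x}}=A^T\textbf{b}$ than to form $(A^TA)^{-1}$ explicitly. The sketch below (assuming NumPy; the random system is illustrative) solves the normal equation this way and compares the result with NumPy's built-in least-squares routine.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 100, 3                      # overdetermined: m >> n
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# Normal equation: solve (A^T A) x = A^T b rather than forming (A^T A)^{-1} explicitly.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Reference solution from NumPy's SVD-based least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_lstsq))  # expected: True
```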
Vector derivative (summary)
$f(\textbf{x})$ | $\frac{d f(\textbf{x})}{d \textbf{x}}$ |
---|---|
$\textbf{x}^T A$ | $A$ |
$\textbf{x}^T \textbf{b}$ | $\textbf{b}$ |
$\textbf{x}^T \textbf{x}$ | $2\textbf{x}$ |
$\textbf{x}^T A \textbf{x}$ | $2A\textbf{x}$ ($A$ symmetric; $(A+A^T)\textbf{x}$ in general) |
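As a rough check of the quadratic-form rule, the sketch below (assuming NumPy; variable names are illustrative) compares the analytic gradient $2S\textbf{x}$ for a symmetric matrix $S$ with a central-difference approximation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
x = rng.normal(size=n)
M = rng.normal(size=(n, n))
S = M + M.T                        # symmetric matrix, as assumed by the 2Ax rule

def quad(v):
    return v @ S @ v               # f(x) = x^T S x

# Central-difference numerical gradient of f at x.
eps = 1e-6
num_grad = np.array([(quad(x + eps * e) - quad(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

analytic = 2 * S @ x               # table entry: d(x^T A x)/dx = 2 A x (A symmetric)
print(np.allclose(num_grad, analytic, atol=1e-5))  # expected: True
```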
Transpose properties
- $(A^T)^T =A$
- $(A+B)^T =A^T+B^T$
- $(A-B)^T =A^T-B^T$
- $(kA)^T =kA^T, \text{ where }k\text{ is scalar.}$
- $k^T = k, \text{ where }k\text{ is scalar.}$
- $(AB)^T=B^TA^T$
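A quick numerical check of the last identity, $(AB)^T=B^TA^T$ (assuming NumPy; the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.normal(size=(3, 4))
B = rng.normal(size=(4, 2))

# (AB)^T should equal B^T A^T.
print(np.allclose((A @ B).T, B.T @ A.T))  # expected: True
```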