[Data Analysis and Statistics] 7. Logistic Regression, Exponential Family

2020. 11. 10. 12:35


Classification

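Assign each input to the category it belongs to
- Spam filtering
- Medical diagnosis support
- Image classification (Computer Vision)
For an arbitrary dataset
- linear regression : fits a model of the form y = ax + b
  - When the targets are discrete categories, points lying far from the bulk of their category can pull the fitted line and inflate the error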

Logistic Regression

Assume the data falls into two classes
- The model output is restricted to the range [0, 1]
- Each example's class label is either 0 or 1
$H_\theta(x)=g(\theta^Tx),\ g(k)=[1+e^{-k}]^{-1}$
- g(k) is called the sigmoid or logistic function
- $g(k)\to1$ as $k\to\infty$, and $g(k)\to0$ as $k\to-\infty$
- $\theta^Tx=0$ : the decision boundary separating the two groups of data
$P(y=1|x, \theta)=H_\theta(x)$ , $P(y=0|x, \theta)=1-H_\theta(x)$
log-likelihood $l(\theta) = \log(L(\theta))\\=\sum_{i=1}^m[y^i\log(H_\theta(x^i))+(1-y^i)\log(1-H_\theta(x^i))]$
- maximizing the log-likelihood : assuming m=1 and using $g'(k)=g(k)(1-g(k))$, $\frac{\partial l(\theta)}{\partial\theta_j}=[\frac{y}{H_\theta(x)}-\frac{(1-y)}{1-H_\theta(x)}]\frac{\partial H_\theta(x)}{\partial\theta_j}\\=[\frac{y}{H_\theta(x)}-\frac{(1-y)}{1-H_\theta(x)}]H_\theta(x)(1-H_\theta(x))x_j\\=(y-H_\theta(x))x_j$
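The derivation above can be sanity-checked numerically. The following is a minimal sketch in plain Python, assuming a single example (m = 1) as in the notes; the values of `theta`, `x`, and `y` are made up for illustration, and the helper names are not from the original post. It compares the closed form $(y-H_\theta(x))x_j$ with a finite-difference estimate of the derivative.

```python
import math

def sigmoid(k):
    """Logistic function g(k) = 1 / (1 + e^{-k}), branching on sign to avoid overflow."""
    if k >= 0:
        return 1.0 / (1.0 + math.exp(-k))
    e = math.exp(k)
    return e / (1.0 + e)

def hypothesis(theta, x):
    """H_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_likelihood(theta, x, y):
    """Log-likelihood of a single example (m = 1)."""
    h = hypothesis(theta, x)
    return y * math.log(h) + (1 - y) * math.log(1.0 - h)

def gradient(theta, x, y):
    """Closed-form gradient: (y - H_theta(x)) * x_j for each j."""
    h = hypothesis(theta, x)
    return [(y - h) * xj for xj in x]

def numeric_gradient(theta, x, y, eps=1e-6):
    """Central finite differences, used only to check the closed form."""
    grads = []
    for j in range(len(theta)):
        up = list(theta)
        up[j] += eps
        down = list(theta)
        down[j] -= eps
        diff = log_likelihood(up, x, y) - log_likelihood(down, x, y)
        grads.append(diff / (2.0 * eps))
    return grads

# Illustrative values only (assumptions, not data from the post).
theta = [0.5, -0.25, 0.1]
x = [1.0, 2.0, -3.0]   # x[0] = 1 acts as the intercept feature
y = 1

analytic = gradient(theta, x, y)
numeric = numeric_gradient(theta, x, y)
print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places, which is what the last line of the derivation predicts.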

Finding a maximum or minimum
Newton's method searches for an x such that f(x) = 0
- $x_{t+1}=x_t-\frac{f(x_t)}{f'(x_t)}$
To find an extremum of $J(\theta)$, apply the same iteration to $J'(\theta)$ : $\theta_{t+1} = \theta_t-\frac{J'(\theta_t)}{J''(\theta_t)}$
For a multivariate function
- $\vec{x}_{t+1}=\vec{x}_t-H^{-1}\nabla f(\vec{x}_t)$
  - $(H_x)_{i,j} = \frac{\partial^2f}{\partial x_i\partial x_j}$ : Hessian Matrix
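The updates above can be sketched as follows. This is a minimal illustration under stated assumptions: the test functions ($x^2-2$, $e^\theta-2\theta$, and a small quadratic in two variables) and all function names are chosen here for demonstration and do not come from the original notes. The 2x2 Hessian is inverted by hand to keep the example dependency-free.

```python
import math

def newton_root(f, f_prime, x0, steps=20):
    """Iterate x_{t+1} = x_t - f(x_t) / f'(x_t) to approach a root of f."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / f_prime(x)
    return x

# Root finding: f(x) = x^2 - 2 has a root at sqrt(2).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)

# Optimization: J(t) = e^t - 2t, so J'(t) = e^t - 2 and J''(t) = e^t.
# Running Newton's method on J' finds the minimizer t = ln 2.
theta_star = newton_root(lambda t: math.exp(t) - 2.0,
                         lambda t: math.exp(t), x0=0.0)

def newton_step_2d(grad, hess, x):
    """One multivariate step x - H^{-1} grad f(x), with a hand-written 2x2 inverse."""
    g0, g1 = grad(x)
    (a, b), (c, d) = hess(x)
    det = a * d - b * c
    dx0 = (d * g0 - b * g1) / det
    dx1 = (-c * g0 + a * g1) / det
    return (x[0] - dx0, x[1] - dx1)

# f(x, y) = x^2 + x*y + y^2 - 3x, minimized at (2, -1).
grad_f = lambda p: (2.0 * p[0] + p[1] - 3.0, p[0] + 2.0 * p[1])
hess_f = lambda p: ((2.0, 1.0), (1.0, 2.0))
minimum = newton_step_2d(grad_f, hess_f, (0.0, 0.0))

print(root, theta_star, minimum)
```

Because the two-variable example is exactly quadratic, a single Newton step lands on the minimizer; for non-quadratic functions the step is repeated until the gradient is close to zero.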

Binary Classification / Logistic Regression : classify by whether $\theta^Tx$ is greater or less than 0
- model $H_\theta(x)=g(\theta^Tx),\ g(z)=\frac{1}{1+e^{-z}}$
- log-likelihood $l(\theta) = \log(L(\theta))\\=\sum_i[y^i\log H_\theta(x^i)+(1-y^i)\log(1-H_\theta(x^i))]$
- gradient ascent $\theta_j=\theta_j+\frac{\alpha}{m}\sum_i(y^i-H_\theta(x^i))x_j^i$
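As a rough illustration of the gradient ascent rule above, here is a small sketch in plain Python. The dataset, the learning rate `alpha=0.1`, and the step count are assumptions picked for the example, not values from the notes.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z}), branching on the sign of z to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(theta, x):
    """H_theta(x) = g(theta^T x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def log_likelihood(theta, xs, ys):
    """Sum over examples of y log H + (1 - y) log(1 - H)."""
    total = 0.0
    for x, y in zip(xs, ys):
        h = predict(theta, x)
        total += y * math.log(h) + (1 - y) * math.log(1.0 - h)
    return total

def gradient_ascent(xs, ys, alpha=0.1, steps=5000):
    """Batch update: theta_j += (alpha / m) * sum_i (y^i - H(x^i)) * x_j^i."""
    m, n = len(xs), len(xs[0])
    theta = [0.0] * n
    for _ in range(steps):
        residuals = [y - predict(theta, x) for x, y in zip(xs, ys)]
        theta = [theta[j] + alpha / m * sum(r * x[j] for r, x in zip(residuals, xs))
                 for j in range(n)]
    return theta

# Made-up, non-separable 1-D data; the leading 1.0 is the intercept feature.
xs = [[1.0, v] for v in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0)]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

theta = gradient_ascent(xs, ys)
print(theta, log_likelihood(theta, xs, ys))
```

An input is then classified as 1 when `predict(theta, x)` exceeds 0.5, which is the same test as $\theta^Tx>0$.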

Classification into multiple classes
- $score_j = \theta_j^Tx$
- Assign the input to the class with the highest score
score normalization : turn the scores into probabilities in [0, 1] that sum to 1
- $p(y^i|x^i, \theta)=\prod_{l}[\frac{e^{\theta_l^Tx^i}}{\sum_je^{\theta_j^Tx^i}}]^{1\{y^i=l\}}$
- log-likelihood $l(\theta)=\log(p(y|x, \theta))=\sum_i\log(p(y^i|x^i, \theta))$
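The score normalization above is commonly called softmax. The sketch below shows one way to compute it in plain Python; the parameter vectors in `thetas` and the input `x` are hypothetical values used only to exercise the functions. Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the result.

```python
import math

def class_scores(thetas, x):
    """score_j = theta_j^T x for every class j."""
    return [sum(t * xi for t, xi in zip(theta, x)) for theta in thetas]

def softmax(scores):
    """Normalize scores to probabilities: e^{s_l} / sum_j e^{s_j}."""
    shift = max(scores)  # shifting by the max avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(thetas, x):
    """Return (index of the most probable class, all class probabilities)."""
    probs = softmax(class_scores(thetas, x))
    label = max(range(len(probs)), key=lambda j: probs[j])
    return label, probs

def log_likelihood(thetas, xs, ys):
    """Sum over examples of log p(y^i | x^i, theta)."""
    return sum(math.log(softmax(class_scores(thetas, x))[y])
               for x, y in zip(xs, ys))

# Hypothetical parameters for three classes and one two-feature input.
thetas = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]

label, probs = classify(thetas, x)
print(label, probs)
print(log_likelihood(thetas, [x], [label]))
```

Here the scores are 2.0, 0.5, and -2.5, so the first class receives the highest probability.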

Gradient descent for linear, logistic, and multiclass regression uses an update rule of the same form
This is because the output distributions behind all three models belong to the same exponential family
- exponential family : $p(y, \eta) = b(y)\exp(\eta^TT(y)-a(\eta))$
ex. linear regression (Gaussian with unit variance)
- $p(y, \mu)=\frac{1}{\sqrt{2\pi}}\exp[-\frac{1}{2}(y-\mu)^2]\\=\frac{1}{\sqrt{2\pi}}\exp[-\frac{1}{2}y^2]\cdot\exp[\mu y-\frac{1}{2}\mu^2]$
- Matching this against the form $p(y, \eta) = b(y)\exp(\eta^TT(y)-a(\eta))$ gives
- $b(y)=\frac{1}{\sqrt{2\pi}}\exp[-\frac{1}{2}y^2],\ T(y)=y,\ \eta=\mu,\ a(\eta)=\frac{1}{2}\mu^2=\frac{1}{2}\eta^2$
ex. logistic regression (Bernoulli)
- $P(y=1)=\phi$ , $P(y=0)=1-\phi$
- $P(y)=\phi^y(1-\phi)^{1-y}=\exp[y\log\phi+(1-y)\log(1-\phi)]\\=\exp[y\log(\frac{\phi}{1-\phi})+\log(1-\phi)]$
- $b(y)=1,\ T(y)=y,\ \eta=\log\frac{\phi}{1-\phi},\ a(\eta)=-\log(1-\phi)=\log(1+e^\eta)$
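The two decompositions above can be checked numerically. The sketch below, with function names and test values chosen here as assumptions, evaluates $b(y)\exp(\eta T(y)-a(\eta))$ for scalar $\eta$ and compares it with the usual Gaussian density and Bernoulli probability.

```python
import math

def exp_family(y, eta, b, T, a):
    """Evaluate b(y) * exp(eta * T(y) - a(eta)) for a scalar natural parameter."""
    return b(y) * math.exp(eta * T(y) - a(eta))

def identity(y):
    """T(y) = y in both examples."""
    return y

# Gaussian with unit variance: eta = mu, a(eta) = eta^2 / 2.
def gauss_b(y):
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)

def gauss_a(eta):
    return 0.5 * eta * eta

def gaussian_pdf(y, mu):
    return math.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2.0 * math.pi)

# Bernoulli: eta = log(phi / (1 - phi)), a(eta) = log(1 + e^eta), b(y) = 1.
def bern_b(y):
    return 1.0

def bern_a(eta):
    return math.log(1.0 + math.exp(eta))

def bernoulli_pmf(y, phi):
    return phi ** y * (1.0 - phi) ** (1 - y)

mu, y_real = 1.3, 0.4          # illustrative values
phi = 0.3                      # illustrative value
eta_bern = math.log(phi / (1.0 - phi))

gauss_pair = (exp_family(y_real, mu, gauss_b, identity, gauss_a),
              gaussian_pdf(y_real, mu))
bern_pairs = [(exp_family(y, eta_bern, bern_b, identity, bern_a),
               bernoulli_pmf(y, phi)) for y in (0, 1)]
print(gauss_pair)
print(bern_pairs)
```

Each printed pair should contain two matching numbers, which confirms that both distributions fit the exponential family form with the stated $b$, $T$, $\eta$, and $a$.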
Gradient descent for the exponential family
- $\theta_j=\theta_j-\frac{\alpha}{m}\sum_i[H_\theta(x^i)-y^i]x_j^i$
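To illustrate that one update rule serves both models, the sketch below runs the same `gradient_descent` function with two different hypothesis functions: the identity for linear regression and the sigmoid for logistic regression. The datasets, learning rate, and step count are assumptions made up for this example.

```python
import math

def dot(theta, x):
    return sum(t * xi for t, xi in zip(theta, x))

def linear_h(theta, x):
    """Linear regression hypothesis: H_theta(x) = theta^T x."""
    return dot(theta, x)

def logistic_h(theta, x):
    """Logistic regression hypothesis: H_theta(x) = 1 / (1 + e^{-theta^T x})."""
    return 1.0 / (1.0 + math.exp(-dot(theta, x)))

def gradient_descent(h, xs, ys, alpha=0.1, steps=5000):
    """Shared update: theta_j -= (alpha / m) * sum_i (H(x^i) - y^i) * x_j^i."""
    m, n = len(xs), len(xs[0])
    theta = [0.0] * n
    for _ in range(steps):
        errors = [h(theta, x) - y for x, y in zip(xs, ys)]
        theta = [theta[j] - alpha / m * sum(e * x[j] for e, x in zip(errors, xs))
                 for j in range(n)]
    return theta

# Made-up regression data lying exactly on y = 1 + 2x (first feature is the intercept).
lin_xs = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
lin_ys = [1.0, 3.0, 5.0, 7.0]

# Made-up, non-separable binary data.
log_xs = [[1.0, float(v)] for v in range(6)]
log_ys = [0, 0, 1, 0, 1, 1]

theta_linear = gradient_descent(linear_h, lin_xs, lin_ys)
theta_logistic = gradient_descent(logistic_h, log_xs, log_ys)
print(theta_linear)
print(theta_logistic)
```

Only the hypothesis function passed in changes between the two calls; the loop body is identical, which is the practical consequence of both models sharing the exponential family form.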
