
Yolo v1 paper architecture

zz0622

2023. 4. 16. 15:39

Yolo v1 paper review


The output is 7 x 7 x 30 because:

$grid \times grid \times \{(x, y, w, h, c) \times bbox_{candidate} + class_{size}\} = 7 \times 7 \times (5 \times 2 + 20)$
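
As a quick check of the arithmetic, the sketch below recomputes the output depth. The values S = 7 and B = 2 come from this post; C = 20 is the PASCAL VOC class count used in the paper, and the variable names are only illustrative.

```python
# Output tensor size for YOLO v1 (illustrative variable names).
S = 7           # grid size: the image is split into S x S cells
B = 2           # bounding-box candidates per cell
C = 20          # number of classes (PASCAL VOC, as in the paper)
BOX_VALUES = 5  # x, y, w, h, confidence

depth = BOX_VALUES * B + C    # values predicted per grid cell
output_shape = (S, S, depth)  # shape of the final prediction tensor
total_values = S * S * depth  # size of the flattened output

print(output_shape, total_values)
```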


Localization Loss

$$
\begin{aligned}
&\lambda_{coord}
\sum_{i=0}^{S^2}\sum_{j=0}^{B}
\mathbb{1}_{ij}^{obj}\left[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2\right] \\
&\qquad + \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
\mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2\right]
\end{aligned}
$$
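$\lambda_{coord}$ : weight applied to the coordinate loss of cells that contain an object (=5)
$S^2$ : number of grid cells (=7x7 = 49)
$B$ : number of bounding boxes per grid cell (=2)
$\mathbb{1}_{ij}^{obj}$ : 1 when the $j$-th bbox of the $i$-th grid cell is assigned (responsible) to detect the object, 0 otherwise
ex) Only the one box with the highest IoU is assigned, so only that index contributes to the loss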
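
The "responsible" assignment can be sketched as picking, among the B predicted boxes of a cell, the one with the highest IoU against the ground truth. The function names `iou` and `responsible_index` and the (x_center, y_center, width, height) box layout are assumptions made for this illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x_center, y_center, width, height)."""
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def responsible_index(pred_boxes, truth_box):
    """Index j of the predicted box with the highest IoU against the truth."""
    ious = [iou(p, truth_box) for p in pred_boxes]
    return max(range(len(ious)), key=lambda j: ious[j])


# Two candidate boxes (B = 2) in one cell and one ground-truth box.
preds = [(0.5, 0.5, 0.2, 0.2), (0.5, 0.5, 0.4, 0.4)]
truth = (0.5, 0.5, 0.4, 0.5)
best = responsible_index(preds, truth)  # only this box gets the coordinate loss
print(best)
```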

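The square root is applied to $w,h$ because, with the raw values, errors in large bboxes would be treated as more important than errors in small bboxes. Taking the square root prevents this, so that the same deviation counts more for a small box than for a large one.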
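
A small numeric sketch of this effect: the same absolute width error of 0.05 is applied to a large box and a small box. The box widths and helper names are made up for illustration.

```python
import math


def squared_error(truth, pred):
    """Plain squared error on the raw value."""
    return (truth - pred) ** 2


def sqrt_squared_error(truth, pred):
    """Squared error on square roots, as in the w, h term of the loss."""
    return (math.sqrt(truth) - math.sqrt(pred)) ** 2


# Same absolute error (0.05) on a large and a small normalized width.
large_truth, large_pred = 0.80, 0.85
small_truth, small_pred = 0.10, 0.15

raw_large = squared_error(large_truth, large_pred)
raw_small = squared_error(small_truth, small_pred)
sqrt_large = sqrt_squared_error(large_truth, large_pred)
sqrt_small = sqrt_squared_error(small_truth, small_pred)

# Raw errors are equal; with the square root the small box is penalized more.
print(raw_large, raw_small)
print(sqrt_large, sqrt_small)
```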

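Many grid cells contain no object, which pushes their confidence scores toward 0. This can overpower the gradient from the grid cells that do contain objects and make the model unstable. $\lambda_{coord}$ addresses this problem by putting more weight on the cells that contain objects.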

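For the indicator function, the notation $\mathbb{1}_{ij}^{responsible}$ seems more accurate than $\mathbb{1}_{ij}^{obj}$, because its value depends not on whether an object is present but on whether the box has been assigned responsibility.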

Confidence Loss

$$
\begin{aligned}
&\sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{obj}(C_i - \hat{C}_i)^2 \\
&\qquad + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{B}\mathbb{1}_{ij}^{noobj}(C_i - \hat{C}_i)^2
\end{aligned}
$$
$\lambda_{noobj}$ : weight applied to grid cells that do not contain an object (=0.5)
$\mathbb{1}_{ij}^{noobj}$ : indicator function that returns 1 when the $j$-th bounding box of the $i$-th grid cell is not assigned (responsible) to predict an object, and 0 otherwise
$C_i$ : 1 if an object is present, 0 otherwise

Probability that the grid cell contains an object * IoU ⇒ confidence score

$C_i = Pr(Obj) * IOU_{pred}^{truth}$
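
A minimal sketch of the confidence loss for a handful of boxes, assuming the target confidence is the IoU for a responsible box and 0 for every other box. The function name and the flat-list layout are hypothetical.

```python
def confidence_loss(pred_conf, target_conf, responsible, lambda_noobj=0.5):
    """Weighted sum of squared confidence errors over all boxes.

    pred_conf, target_conf: one confidence value per box.
    responsible: one bool per box (True if assigned to an object).
    """
    loss = 0.0
    for c_hat, c, is_resp in zip(pred_conf, target_conf, responsible):
        # Responsible boxes use weight 1, all others are down-weighted.
        weight = 1.0 if is_resp else lambda_noobj
        loss += weight * (c - c_hat) ** 2
    return loss


# Box 0 is responsible (target = IoU 0.8), box 1 is not (target = 0).
example_loss = confidence_loss(
    pred_conf=[0.6, 0.2],
    target_conf=[0.8, 0.0],
    responsible=[True, False],
)
print(example_loss)
```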

Classification Loss

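$+\sum_{i=0}^{S^2}\mathbb{1}_{i}^{obj}\sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2$

$\mathbb{1}_{i}^{obj}$ : 1 if an object appears in grid cell $i$, 0 otherwise
$p_i(c)$ : ground-truth class probability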
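
The classification term can be sketched as a per-cell sum of squared errors over the class probabilities. Following the paper, this sketch only counts cells that contain an object; the function name and list-of-lists layout are assumptions for illustration.

```python
def classification_loss(pred_probs, true_probs, cell_has_object):
    """Sum of squared class-probability errors over cells with an object.

    pred_probs, true_probs: one list of class probabilities per grid cell.
    cell_has_object: one bool per grid cell.
    """
    loss = 0.0
    for p_hat, p, has_obj in zip(pred_probs, true_probs, cell_has_object):
        if not has_obj:
            continue  # cells without an object do not contribute
        loss += sum((pc - pc_hat) ** 2 for pc, pc_hat in zip(p, p_hat))
    return loss


# Two cells, three classes; only the first cell contains an object.
cls_loss = classification_loss(
    pred_probs=[[0.7, 0.2, 0.1], [0.3, 0.3, 0.4]],
    true_probs=[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    cell_has_object=[True, False],
)
print(cls_loss)
```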

At test time, this value is used to compute the class-specific confidence score, which encodes

how well the predicted box fits the object and the probability of that class appearing in the box

$$
Pr(Class_i \mid Obj) \cdot Pr(Obj) \cdot IoU_{pred}^{truth} = Pr(Class_i) \cdot IoU_{pred}^{truth}
$$

$\Rightarrow \; Class_i \subset Obj \quad \therefore \; P(Class_i \mid Obj) = {P(Class_i \cap Obj) \over P(Obj)} = {P(Class_i) \over P(Obj)}$
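
As a rough illustration of the test-time step, the sketch below multiplies each box confidence by the cell's conditional class probabilities to get one score per (box, class) pair. The function name and input layout are hypothetical.

```python
def class_specific_scores(class_probs, box_confidences):
    """Class-specific confidence scores for every box in one grid cell.

    class_probs: Pr(Class_i | Obj) for the cell, one value per class.
    box_confidences: Pr(Obj) * IoU for each of the cell's boxes.
    Returns scores[box][class].
    """
    return [[conf * p for p in class_probs] for conf in box_confidences]


# One cell with two classes and two boxes.
scores = class_specific_scores(
    class_probs=[0.5, 0.25],
    box_confidences=[0.8, 0.4],
)
print(scores)
```

These scores are what a later filtering step such as thresholding and NMS would operate on.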

Full Loss function

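
For reference, the full loss can be written out by summing the localization, confidence, and classification terms from the sections above. This is assembled from those terms in the form given in the paper, where the class term is counted only for cells that contain an object ($\mathbb{1}_{i}^{obj}$).

$$
\begin{aligned}
Loss =\; & \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2\right] \\
& + \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}\left[(\sqrt{w_i} - \sqrt{\hat{w}_i})^2 + (\sqrt{h_i} - \sqrt{\hat{h}_i})^2\right] \\
& + \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}(C_i - \hat{C}_i)^2 \\
& + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj}(C_i - \hat{C}_i)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2
\end{aligned}
$$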

Additional methods

Leaky ReLU

$\phi(x) = \begin{cases} x, & \text{if } x > 0 \\ 0.1x, & \text{otherwise} \end{cases}$
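
The activation above is simple enough to sketch directly. The function name and the `negative_slope` parameter name are illustrative choices; the slope value 0.1 comes from the formula.

```python
def leaky_relu(x, negative_slope=0.1):
    """Leaky ReLU: identity for positive inputs, a small slope otherwise."""
    return x if x > 0 else negative_slope * x


print(leaky_relu(2.0), leaky_relu(-3.0))
```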

Detection often requires high-resolution images, so a 448 x 448 input image is used
batch size 64, momentum 0.9, decay 0.0005, scheduled $lr : 10^{-2}\rightarrow10^{-3}\rightarrow10^{-4}$
dropout layer after the first FC layer
data augmentation: random scaling (up to 20%), exposure and saturation (up to a factor of 1.5 in HSV color space)
NMS (non-maximum suppression)
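
The NMS step in the list above can be sketched as a greedy loop: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The corner-format boxes (x1, y1, x2, y2), the function names, and the 0.5 IoU threshold are assumptions for this illustration; the post does not state the threshold.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou_xyxy(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # box 1 overlaps box 0 heavily and is suppressed
print(kept)
```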

Limitation of Yolov1

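MSE is used as the loss for both localization and classification because it is easy to optimize, but it does not align perfectly with the goal of maximizing mAP
Applying MSE to both classification, which follows a Bernoulli distribution, and localization, which follows a Gaussian distribution, is not ideal → this issue was fixed in later versions
Each grid cell predicts only one object, so multiple objects inside the same grid cell cannot all be detected
As a result, small objects, especially ones clustered together, are often missed
Because bounding boxes are learned from data, bboxes with unusual aspect ratios or configurations are not predicted well
A small error in a large box generally has little effect, but a small error in a small box has a much greater effect on IoU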
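
The last point can be checked numerically: shifting a large box and a small box by the same number of pixels changes the IoU very differently. The box sizes, the 5-pixel shift, and the helper names are made up for this sketch.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square box and the same box shifted along x."""
    truth = (0.0, 0.0, size, size)
    pred = (shift, 0.0, size + shift, size)
    return iou_xyxy(truth, pred)


large_box_iou = shifted_iou(100.0, 5.0)  # 5 px error on a 100 px box
small_box_iou = shifted_iou(10.0, 5.0)   # 5 px error on a 10 px box
print(large_box_iou, small_box_iou)
```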
