self-supervised learning

응엉잉 2022. 7. 15. 13:51

The goal is to obtain good, task-agnostic representations from an unlabelled dataset through unsupervised learning, and then to adapt those representations to downstream tasks

A task is defined from the data itself, and the model is trained on it in a supervised manner

= without labels (y), choose something within the input (x) that can serve as the target

= pretext task

A model trained on a pretext task can be transferred to a downstream task

Since the purpose of learning is to solve the downstream task well, the model is evaluated by its downstream task performance

self-supervised learning methods : self-prediction and contrastive learning

1. self-prediction

Given an individual data sample, the task is to predict one part of the sample from another part (the rest)

ex. predicting the next time step in time-series data

1) Autoregressive generation

Predict future behavior from past behavior

Data with an inherent sequential order can be modeled with regression: each element is predicted from the elements before it

ex. audio : WaveNet / language : GPT, XLNet / image : PixelCNN
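A minimal sketch of how an unlabelled sequence turns into supervised training pairs for next-step prediction. The function name `make_autoregressive_pairs` and the `context_size` parameter are made up for illustration; real models such as WaveNet or GPT build their inputs in more elaborate ways.

```python
def make_autoregressive_pairs(sequence, context_size):
    """Split one unlabelled sequence into (context, target) training pairs.

    The target is simply the next element, so the supervision signal
    comes from the data itself rather than from human labels.
    """
    pairs = []
    for t in range(context_size, len(sequence)):
        context = sequence[t - context_size:t]  # past behavior
        target = sequence[t]                    # future behavior to predict
        pairs.append((context, target))
    return pairs


series = [0.1, 0.4, 0.9, 1.6, 2.5]
pairs = make_autoregressive_pairs(series, context_size=2)
for context, target in pairs:
    print(context, "->", target)
```

Any regressor could then be fit on these pairs; the point is only that the target comes from the input itself.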

2) masked generation

Mask part of the information and predict the missing region from the unmasked region

ex. language : BERT, image : denoising autoencoder
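The masking step can be sketched roughly as follows. This is a simplified illustration, not BERT's actual procedure: the `[MASK]` placeholder, the `mask_ratio` value and the helper name `mask_tokens` are assumptions chosen for this example.

```python
import random

MASK = "[MASK]"  # placeholder symbol, chosen for this sketch


def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Hide a random subset of tokens and return (masked input, targets).

    targets maps each masked position to the original token, which is
    what the model would be asked to reconstruct.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


tokens = "the model learns from unlabelled text".split()
masked, targets = mask_tokens(tokens, mask_ratio=0.3)
print(masked)
print(targets)
```

The model sees `masked` as input and is trained to recover the entries of `targets` from the unmasked context.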

3) innate relationship prediction

Predict a relationship based on the belief that the essential information stays the same even when a transformation (segmentation, rotation) is applied to a sample. It requires domain knowledge, so it is mostly used for images

ex. predicting which rotation was applied to an image
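The rotation example could look roughly like this. It is a toy sketch on a nested-list "image"; the helper names `rotate90` and `make_rotation_samples` are made up for illustration, and the four-way labelling (0, 90, 180, 270 degrees) is an assumption about how the task is set up.

```python
def rotate90(image):
    """Rotate a 2D list-of-lists image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_rotation_samples(image):
    """Return [(rotated_image, label)] for 0, 90, 180 and 270 degrees.

    The label (0..3) is the number of quarter turns applied, so it is
    known for free and no human annotation is needed.
    """
    samples = []
    current = [list(row) for row in image]
    for label in range(4):
        samples.append((current, label))
        current = rotate90(current)
    return samples


image = [[1, 2],
         [3, 4]]
samples = make_rotation_samples(image)
for rotated, label in samples:
    print(label, rotated)
```

A classifier trained to recover `label` from `rotated` has to pick up on the content of the image, which is the representation the pretext task is after.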

4) hybrid self-prediction

Combines several of the methods above

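2. contrastive learning

Given the data samples in a batch, the task is to predict the relationship among them

In the embedding space, similar sample pairs are pulled close together and dissimilar sample pairs are pushed far apart

anchor : the current data point that serves as the reference for judging similar / dissimilar

- positive point : a sample similar to the anchor; it forms a positive pair with the anchor

- negative point : a sample dissimilar to the anchor; it forms a negative pair with the anchor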

1) inter-sample classification

Given candidate positive and negative pairs, deciding which candidate is similar to the anchor data point (= forms a positive pair) becomes the classification task.

Positive pairs can be obtained by creating distorted versions of the original sample through augmentation

ex. MoCo, SimCLR
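A toy sketch of how augmentation yields positive pairs and how picking the positive becomes a classification problem. Everything here is an illustrative assumption: additive jitter stands in for real image augmentations, raw feature vectors stand in for learned embeddings, and the helper names are made up. It is not how MoCo or SimCLR are implemented.

```python
import random


def augment(sample, rng, noise=0.05):
    """Return a distorted copy of a feature vector (additive jitter)."""
    return [x + rng.uniform(-noise, noise) for x in sample]


def make_views(batch, seed=0):
    """Create two augmented views per sample.

    views_a[i] and views_b[i] form a positive pair; views_a[i] and
    views_b[j] with j != i are treated as negative pairs.
    """
    rng = random.Random(seed)
    views_a = [augment(s, rng) for s in batch]
    views_b = [augment(s, rng) for s in batch]
    return views_a, views_b


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


batch = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
views_a, views_b = make_views(batch)

# Classification task: for each anchor in views_a, pick the candidate in
# views_b that is closest to it.
predictions = []
for i, anchor in enumerate(views_a):
    distances = [squared_distance(anchor, cand) for cand in views_b]
    predictions.append(distances.index(min(distances)))
    print(f"anchor {i}: predicted positive = {predictions[-1]}")
```

In this toy setup the samples are far apart relative to the jitter, so the nearest candidate is always the true positive; with real data an encoder has to be trained to make that happen.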

Losses for inter-sample classification

- Triplet loss : minimizes the distance between the anchor and the positive while pushing the negative farther away from the anchor

- N-pair loss : generalizes the triplet loss so that the anchor is compared against multiple negative samples

- Noise Contrastive Estimation (NCE) : performs logistic regression to distinguish target data from noise. Given the distribution p of target samples and the distribution q of noise, the loss is computed simply as a binary cross entropy.

- InfoNCE : can be viewed as NCE extended to multiple noise samples. The loss was named in the CPC (Contrastive Predictive Coding) paper, which can be regarded as the foundation of contrastive learning, and it uses a categorical cross-entropy loss to distinguish the target data from unrelated noise samples (negative samples). Although it reduces to a categorical cross-entropy, minimizing it has the effect of increasing (a lower bound on) the mutual information between the context vector and the input vector of the positive sample. [to be expanded later]

- Soft-Nearest Neighbors Loss : when labels (y) are available, extends the loss to include multiple positive samples
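Two of the losses above, triplet loss and InfoNCE, can be sketched for a single anchor as follows. This is a minimal illustration under stated assumptions: squared Euclidean distance and a `margin` of 1.0 for the triplet loss, dot-product similarity and a `temperature` of 0.1 for InfoNCE. None of these choices come from the original notes.

```python
import math


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge on d(anchor, positive) - d(anchor, negative) + margin."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """Categorical cross-entropy where the positive is the correct class."""
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # log-sum-exp with the max subtracted for numerical stability
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[-1.0, 0.0], [0.0, 1.0]]

print(triplet_loss(anchor, positive, negatives[0]))
print(info_nce_loss(anchor, positive, negatives))
# Treating a dissimilar sample as the positive gives a much larger loss.
print(info_nce_loss(anchor, negatives[0], [positive, negatives[1]]))
```

The triplet loss looks at one negative at a time, while InfoNCE scores the positive against all negatives at once through a softmax, which is the sense in which it handles multiple noise samples.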

2) feature clustering

Cluster the data samples using the learned features, assign a pseudo label to each resulting class, and run intra-sample classification based on those labels
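The pseudo-labelling step might look roughly like the following. This sketch assumes plain k-means as the clustering algorithm (the notes do not name one), uses a naive initialization from the first k samples, and uses hand-written feature vectors as stand-ins for learned encoder outputs.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def mean(points):
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def kmeans_pseudo_labels(features, k, n_iter=10):
    """Cluster feature vectors and return one pseudo label per sample."""
    centroids = [list(f) for f in features[:k]]  # naive init: first k samples
    labels = [0] * len(features)
    for _ in range(n_iter):
        # Assignment step: the nearest centroid becomes the pseudo label.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return labels


features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2]]
pseudo_labels = kmeans_pseudo_labels(features, k=2)
print(pseudo_labels)
```

The resulting `pseudo_labels` can then be used as classification targets in place of human annotations.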

3) multiview coding

Applies the InfoNCE loss to the encodings of different views of the same input

ex. Contrastive Multiview Coding (CMC)
