warm up

The original warm up

https://arxiv.org/pdf/1706.02677.pdf

Warm up that starts low and gradually increases

https://arxiv.org/pdf/1812.01187.pdf

The reverse, starting high and coming down? Not certain. Variance-related warm up

https://arxiv.org/pdf/1908.03265.pdf
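The variance-related paper above (arXiv 1908.03265, RAdam) ties warmup to the high variance of Adam's adaptive learning rate in the first steps. As we understand its formula, RAdam scales the adaptive step by a rectification term r_t. For the first few iterations r_t is undefined (the paper falls back to a plain momentum step there), and after that it rises from small values toward 1, so it behaves like a built-in low-to-high warmup. The sketch below only evaluates r_t with the default beta2 = 0.999; the function name is our own and the formula should be checked against the paper.

```python
import math


def radam_rectifier(t, beta2=0.999):
    """Rectification term r_t as we read it from RAdam (arXiv 1908.03265).

    Returns None when rho_t <= 4, where the paper skips the adaptive step.
    The function name is hypothetical, not a library API.
    """
    # Length of the approximated simple moving average as t -> infinity.
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** t
    # Length of the approximated SMA at step t (starts near 1, grows toward rho_inf).
    rho_t = rho_inf - 2.0 * t * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None  # variance is not tractable yet: no adaptive step
    return math.sqrt(((rho_t - 4.0) * (rho_t - 2.0) * rho_inf)
                     / ((rho_inf - 4.0) * (rho_inf - 2.0) * rho_t))


if __name__ == "__main__":
    for t in (1, 5, 10, 100, 1000, 10000):
        print(t, radam_rectifier(t))
```

Printed over t, r_t climbs from roughly 0.05 near t = 10 toward 1, which is the warmup-like behaviour the paper attributes to the rectifier.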

Other: https://papers.nips.cc/paper_files/paper/2019/hash/dc6a70712a252123c40d2adba6a11d84-Abstract.html

Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

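Warm up that starts low and gradually rises is meant to improve training stability.

(At the start the weights are randomly initialized, so they are far from the optimum and numerical stability is poor. So we first stabilize training with a small lr and then move on to the regular lr.)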
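A minimal sketch of the low-to-high idea, assuming a plain linear ramp from a small start_lr up to the regular base_lr over warmup_steps steps and a constant lr afterwards. The function and parameter names are hypothetical, and in practice the lr is usually decayed after warmup rather than held flat.

```python
def linear_warmup_lr(step, base_lr, warmup_steps, start_lr=0.0):
    """Ramp the lr linearly from start_lr to base_lr, then hold base_lr.

    Names are illustrative only, not a specific library's API.
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return base_lr  # warmup finished: use the regular lr
    frac = step / warmup_steps  # fraction of warmup completed, in [0, 1)
    return start_lr + (base_lr - start_lr) * frac


if __name__ == "__main__":
    for s in range(0, 12, 2):
        print(s, round(linear_warmup_lr(s, base_lr=0.1, warmup_steps=10, start_lr=0.01), 4))
```

The small early steps keep updates from the random initialization modest while gradients are large and erratic; once step reaches warmup_steps the function simply returns base_lr.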

Warm up that starts from a high lr and decreases is meant to escape local minima early on. => But the optimizer can keep remembering a wrong direction for a long time.
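One plausible reading of "the optimizer can remember a wrong direction for a long time" (our interpretation) is momentum. With a heavy-ball buffer v_t = momentum * v_{t-1} + g_t (the form PyTorch's SGD uses when dampening=0), a gradient from step 0 still carries weight momentum**t at step t. The hypothetical 1-D scenario below uses one large early gradient pointing the wrong way (+10) followed by small correct ones (-1) and counts how long the buffer keeps pointing the wrong way.

```python
def momentum_buffer_trace(grads, momentum=0.9):
    """Heavy-ball momentum buffer: v_t = momentum * v_{t-1} + g_t (dampening = 0)."""
    v = 0.0
    trace = []
    for g in grads:
        v = momentum * v + g
        trace.append(v)
    return trace


def steps_until_forgotten(momentum=0.9, threshold=0.01):
    """Steps until one gradient's weight in the buffer (momentum**t) drops below threshold."""
    weight, steps = 1.0, 0
    while weight >= threshold:
        weight *= momentum
        steps += 1
    return steps


if __name__ == "__main__":
    # Hypothetical: one large wrong-way gradient (+10), then small right-way gradients (-1).
    trace = momentum_buffer_trace([10.0] + [-1.0] * 9)
    wrong = sum(1 for v in trace[1:] if v > 0)
    print("buffer still points the wrong way for", wrong, "of the next 9 steps")
    print("steps until an early gradient's weight < 1%:", steps_until_forgotten())
```

With momentum 0.9 the buffer keeps its wrong sign for 6 further steps, and a single early gradient keeps at least 1% of its weight for 43 steps. A large lr multiplies each of those stale updates, which is one way the early direction can linger.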

So is that why we start from 0, climb up to the peak lr, and then decay it?
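The "start at 0, climb to the peak, then come down" pattern can be sketched as linear warmup followed by cosine decay, the combination also described in the Bag of Tricks paper linked above. This is a minimal sketch: warmup_steps, total_steps and min_lr are illustrative names, and other decay shapes (linear, step) fit the same outline.

```python
import math


def warmup_cosine_lr(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup from 0 to base_lr, then cosine decay down to min_lr.

    Parameter names are illustrative, not a specific library's API.
    """
    if step < warmup_steps:
        # Warmup phase: climb linearly from 0 toward base_lr.
        return base_lr * step / warmup_steps
    # Decay phase: map the remaining steps onto [0, 1] and follow half a cosine.
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in range(21):
        print(f"step {s:2d}: lr = {warmup_cosine_lr(s, 0.1, 5, 20):.4f}")
```

In PyTorch this could be plugged into `torch.optim.lr_scheduler.LambdaLR`, whose `lr_lambda` returns a multiplier of the initial lr, e.g. `lr_lambda=lambda s: warmup_cosine_lr(s, base_lr, 5, 20) / base_lr`. Hugging Face `transformers` also ships a ready-made `get_cosine_schedule_with_warmup`.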

알고 쓰자 데이터 사이언스
