
Torchvision read_video worker error [DataLoader]

by jinwooahn 2023. 2. 9.

A problem that shows up when reading raw video data in a Dataset (memory leakage)

https://github.com/pytorch/pytorch/issues/13246

 

DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes · Issue #13246 · pytorch/pytorch

Editor note on the issue: there is a known workaround further down the thread, which is to NOT use Python lists, but instead something else, e.g., torch.tensor directly.

The root cause is in Python data structures such as list and dict themselves (CPython's reference counting writes to every object a forked worker touches, so the copy-on-write pages get duplicated in each worker), so it is hard to call this a PyTorch bug.
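For reference, the workaround the issue's editor note points at is to avoid holding the data in a Python list at all and to keep it in a torch tensor (or numpy array) instead: the tensor's storage is a single buffer, so there is no per-element Python object for reference counting to touch. A minimal sketch (the class name and the 24,000,000-element toy data below are just for illustration):

import torch
from torch.utils.data import Dataset


class TensorDataIter(Dataset):
    def __init__(self):
        # One contiguous tensor instead of a list of 24M Python ints,
        # so forked workers can read it without copy-on-write growth.
        self.data = torch.arange(24000000)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx]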

 

Workaround 1

Multiprocessing Manager (the approach used by AVT)

from torch.utils.data import Dataset, DataLoader
import torch
from multiprocessing import Manager


class DataIter(Dataset):
    def __init__(self):
        # Keep the list in a multiprocessing.Manager server process, so
        # DataLoader workers fetch items over IPC instead of inheriting
        # (and copy-on-write duplicating) the whole list.
        manager = Manager()
        self.data = manager.list([x for x in range(24000000)])

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        data = self.data[idx]
        return torch.tensor(data)


train_data = DataIter()
train_loader = DataLoader(train_data, batch_size=300,
                          shuffle=True,
                          drop_last=True,
                          pin_memory=False,
                          num_workers=18)

for i, item in enumerate(train_loader):
    if i % 1000 == 0:
        print(i)
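Note that the Manager-backed list lives in a separate server process and every self.data[idx] access is pickled over IPC, so this trades slower item access for stable per-worker memory.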

 

 

Workaround 2

Memory blows up because references to every object touched in the list keep accumulating, so return deep copies from __getitem__.

from copy import deepcopy

def __getitem__(self, idx):
    seg_tag = deepcopy(self.segment_tags[idx])            # str
    traj_info = deepcopy(self.traj_infos[seg_tag])         # dict
    traj_feat = deepcopy(self.traj_features[seg_tag])      # (n_traj, 2048)
    traj_embd = deepcopy(self.traj_embeddings[seg_tag])    # (n_traj, 256)

    return seg_tag, traj_info, traj_feat, traj_embd