How to handle Huggingface "UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor."


When you use the image processor classes that Huggingface provides, the warning above occasionally shows up.

For example, suppose there is a function that reads a video and preprocesses it:

self.video_processor = VivitImageProcessor.from_pretrained("google/vivit-b-16x2-kinetics400")
...

video = read_video_pyav(container=container, indices=indices) #[T, H, W, C]
inputs = self.video_processor(list(video), return_tensors="pt")["pixel_values"] #[1, T, C, H, W]

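The warning is raised on the last line of the block above.

video is a numpy array, and when you wrap it in a list and pass it to the processor, the result also comes back inside a list by default.

For example, if T (the number of frames) is 32,

you get a list of length 32 like [ndarray(3, 224, 224), ndarray(3, 224, 224), ...], holding one preprocessed image per frame. With return_tensors="pt", all of this is converted to a tensor by looping over the list internally, and that loop takes a long time, which is why the warning appears.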
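The difference between the two conversion paths can be checked without a processor at all. The following is a rough sketch that uses random arrays as a stand-in for the per-frame output (the frame count and shapes are assumed from the example above, and the timings will vary by machine and PyTorch version):

```python
import time
import warnings

import numpy as np
import torch

# Stand-in for the processor's per-frame output (assumed shapes, random data).
frames = [np.random.rand(3, 224, 224).astype(np.float32) for _ in range(32)]

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence the UserWarning for this comparison
    start = time.perf_counter()
    slow = torch.tensor(frames)  # list of ndarrays -> tensor (the path that warns)
    slow_seconds = time.perf_counter() - start

start = time.perf_counter()
stacked = np.array(frames)        # one contiguous ndarray of shape (32, 3, 224, 224)
fast = torch.from_numpy(stacked)  # shares memory with the ndarray, no extra copy
fast_seconds = time.perf_counter() - start

print(slow.shape, fast.shape)
print(f"list -> tensor: {slow_seconds:.4f}s, array -> tensor: {fast_seconds:.4f}s")
```

Both paths should produce the same values; only the conversion cost differs.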

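So it is much faster to first convert everything, including the outer list, into a single numpy array and then turn that into a tensor with torch.from_numpy().

In other words, change the block above to the following: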

self.video_processor = VivitImageProcessor.from_pretrained("google/vivit-b-16x2-kinetics400")
...

video = read_video_pyav(container=container, indices=indices) #[T, H, W, C]
inputs = torch.from_numpy(np.array(self.video_processor(list(video))["pixel_values"][0])) #[T, C, H, W] (the [0] drops the batch dim; add .unsqueeze(0) for [1, T, C, H, W]) -> this is muuuch faster
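If you need this in several places, the conversion could be wrapped in a small helper. The sketch below is only an illustration: stack_pixel_values is a hypothetical name, and it assumes that the processor, when called without return_tensors, returns "pixel_values" as a list of videos, each of which is a list of per-frame arrays. A synthetic nested list stands in for the real processor output here.

```python
import numpy as np
import torch


def stack_pixel_values(pixel_values):
    """Stack (nested) lists of equally shaped frame arrays into one tensor."""
    stacked = np.array(pixel_values)  # a single contiguous ndarray
    return torch.from_numpy(stacked)


# Stand-in for processor(list(video))["pixel_values"]: one video with 32 frames.
pixel_values = [[np.zeros((3, 224, 224), dtype=np.float32) for _ in range(32)]]

batch = stack_pixel_values(pixel_values)      # keeps the batch dim: [1, T, C, H, W]
single = stack_pixel_values(pixel_values[0])  # first video only: [T, C, H, W]
print(batch.shape, single.shape)
```

Passing the whole nested list keeps the leading batch dimension, while indexing with [0] first returns a single video without it.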

