NVLink in GPUs

GPU 연결에 따라 총 모델 훈련 시간이 차이가 난다!

NVLink로 연결되어 있는 GPU가 가장 inter-connected 하다. NVLink는 기존의 PCI-E 기반 솔루션보다 더욱 유연한 통신을 제공하는 고속 GPU 연결 장치이다.

확인 방법:

GPU가 같은 노드에 있을 때 아래와 같이 실행하면 GPU가 어떻게 inter-connected 되는지 보여줌

nvidia-smi topo -m

NVLink로 연결되어 있는 GPU들:

        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV2     0-23            N/A
GPU1    NV2      X      0-23            N/A

NVLink로 연결되어 있지 않은 GPU들:

        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      PHB     0-11            N/A
GPU1    PHB      X      0-11            N/A

각 내용에 대한 설명:

  X = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

1. DDP와 같이 GPU간의 연결이 많지 않은 경우에는 큰 영향이 없지만, ZERO-DP와 같이 GPU간 통신을 자주 할 경우에는 큰 영향을 받는다.

2. NV다음에 오는 #의 숫자가 클수록 더 통신이 빠르다고 생각하면 된다.

3. NVLink가 있을 때와 없을 때 GPT2 모델의 훈련 속도 차이는 아래와 같다.

4. NVLink를 사용하지 않으려면 'NCCL_P2P_DISABLE=1' 옵션을 사용한다.

참고:

https://huggingface.co/docs/transformers/v4.18.0/en/performance#nvlink

'Tips' 카테고리의 다른 글

[DEBUG] Torch Distributed Error: Received 1 death signal, shutting down workers (2)	2023.09.01
[Sox] 오디오 파일의 Sample Rate 변환 (0)	2023.07.24
[VSCode] Vscode에서 검색한 내용 복사하기 (0)	2023.05.08
[환경설정] conda disk quota exceeded 해결 (0)	2023.04.24

GPU 연결에 따라 총 모델 훈련 시간이 차이가 난다!

확인 방법:

'Tips' 카테고리의 다른 글

티스토리툴바