NVLink in GPUs

GPU 연결에 따라 총 모델 훈련 시간이 차이가 난다!

NVLink로 연결되어 있는 GPU가 가장 inter-connected 하다. NVLink는 기존의 PCI-E 기반 솔루션보다 더욱 유연한 통신을 제공하는 고속 GPU 연결 장치이다.

확인 방법:

GPU가 같은 노드에 있을 때 아래와 같이 실행하면 GPU가 어떻게 inter-connected 되는지 보여줌

nvidia-smi topo -m

NVLink로 연결되어 있는 GPU들:

         GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV2     0-23            N/A
GPU1    NV2      X      0-23            N/A

NVLink로 연결되어 있지 않은 GPU들:

         GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      PHB     0-11            N/A
GPU1    PHB      X      0-11            N/A

각 내용에 대한 설명:

   X = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

1. DDP와 같이 GPU간의 연결이 많지 않은 경우에는 큰 영향이 없지만, ZERO-DP와 같이 GPU간 통신을 자주 할 경우에는 큰 영향을 받는다.

2. NV다음에 오는 #의 숫자가 클수록 더 통신이 빠르다고 생각하면 된다.

3. NVLink가 있을 때와 없을 때 GPT2 모델의 훈련 속도 차이는 아래와 같다.

4. NVLink를 사용하지 않으려면 'NCCL_P2P_DISABLE=1' 옵션을 사용한다.

참고:

https://huggingface.co/docs/transformers/v4.18.0/en/performance#nvlink

'Tips' 카테고리의 다른 글

[DEBUG] Torch Distributed Error: Received 1 death signal, shutting down workers (2)	2023.09.01
[Sox] 오디오 파일의 Sample Rate 변환 (0)	2023.07.24
[VSCode] Vscode에서 검색한 내용 복사하기 (0)	2023.05.08
[환경설정] conda disk quota exceeded 해결 (0)	2023.04.24

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

NVLink in GPUs

GPU 연결에 따라 총 모델 훈련 시간이 차이가 난다!

확인 방법:

'Tips' 카테고리의 다른 글

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

	GPU0 GPU1 CPU Affinity NUMA Affinity
	GPU0 X NV2 0-23 N/A
	GPU1 NV2 X 0-23 N/A

	GPU0 GPU1 CPU Affinity NUMA Affinity
	GPU0 X PHB 0-11 N/A
	GPU1 PHB X 0-11 N/A

	X = Self
	SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
	NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
	PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
	PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
	PIX = Connection traversing at most a single PCIe bridge
	NV# = Connection traversing a bonded set of # NVLinks