VMware vSphere Bitfusion Example Guide
Modified on 17 SEP 2021
VMware vSphere Bitfusion 4.0
You can find the most up-to-date technical documentation on the VMware website at:
https://docs.vmware.com/
VMware, Inc.
3401 Hillview Ave.
Palo Alto, CA 94304 www.vmware.com
Copyright © 2020-2021 VMware, Inc. All rights reserved. Copyright and trademark information.
Contents
About vSphere Bitfusion Example Guide
Updated Information
1 Introduction to Using AI and ML Applications with vSphere Bitfusion
2 Installing and Running AI and ML Applications with vSphere Bitfusion
    Installing NVIDIA CUDA
    Install NVIDIA cuDNN
    Install Python on CentOS and Red Hat Linux
    Installing TensorFlow
    Installing PyTorch and YOLO
About vSphere Bitfusion Example Guide
The vSphere Bitfusion Example Guide provides information about using vSphere Bitfusion to run TensorFlow, PyTorch, and YOLO on VMware vSphere.
At VMware, we value inclusion. To foster this principle within our customer, partner, and internal community, we create content using inclusive language.
The vSphere Bitfusion Example Guide describes how to install TensorFlow, PyTorch, and YOLO, and then run tests and benchmarks by using vSphere Bitfusion. This guide serves as a basis for understanding how to use artificial intelligence (AI) and machine learning (ML) applications and frameworks with vSphere Bitfusion.
Intended Audience
This information is intended for anyone who wants to use vSphere Bitfusion with machine learning platforms. The information is written for experienced Linux system administrators who are familiar with virtual machine technology and data center operations using VMware vSphere.
Updated Information
The vSphere Bitfusion Example Guide is updated with each release of the product or when necessary.
This table provides the update history of the vSphere Bitfusion Example Guide.
Revision       Description
17 SEP 2021    - Minor update to Install YOLO.
               - Minor update to Run YOLO Tests.
17 AUG 2021    Initial release.
1 Introduction to Using AI and ML Applications with vSphere Bitfusion
To use AI and ML applications with vSphere Bitfusion, you must install and configure several components.
To use TensorFlow, PyTorch, and YOLO with vSphere Bitfusion, and perform benchmarks and tests, you must complete the following tasks.
1 Install prerequisites.
a Install vSphere Bitfusion.
See the VMware vSphere Bitfusion Installation Guide.
b Install NVIDIA CUDA.
c Install NVIDIA cuDNN.
d If you are using CentOS or Red Hat Linux, you must install Python 3.
2 Install TensorFlow and benchmarks.
a Install TensorFlow.
b Install TensorFlow benchmarks.
c Run the TensorFlow benchmarks to measure the performance of your system.
3 Install PyTorch and YOLO.
a Install YOLO and YOLO tests.
b Run the YOLO tests to measure the performance of your system.
2 Installing and Running AI and ML Applications with vSphere Bitfusion
To use AI and ML applications with vSphere Bitfusion, you install and configure several software packages and programming frameworks.
This chapter includes the following topics:
- Installing NVIDIA CUDA
- Install NVIDIA cuDNN
- Install Python on CentOS and Red Hat Linux
- Installing TensorFlow
- Installing PyTorch and YOLO
Installing NVIDIA CUDA
Compute Unified Device Architecture (CUDA) is a parallel computing platform and programming model developed by NVIDIA for general computing on graphics processing units (GPUs). CUDA dramatically speeds up computing applications by using the processing power of GPUs. For example, the TensorFlow and PyTorch benchmarks use CUDA.
Install NVIDIA CUDA on Ubuntu
To run AI and ML workflows in vSphere Bitfusion, you must install CUDA on the Ubuntu Linux operating system of your vSphere Bitfusion client.
Prerequisites
Verify that you have installed the vSphere Bitfusion client on an Ubuntu Linux operating system.
Procedure
1 Navigate to a directory on the virtual machine in which to download the NVIDIA CUDA distribution.
cd <download_directory>
2 Download and move the cuda-ubuntu2004.pin file.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
3 Download the NVIDIA CUDA distribution for Ubuntu 20.04 by using the wget command.
wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-ubuntu2004-11-0-local_11.0.3-450.51.06-1_amd64.deb
4 Install the CUDA 11 package for Ubuntu 20.04 by using the dpkg -i command.
sudo dpkg -i cuda-repo-ubuntu2004-11-0-local_11.0.3-450.51.06-1_amd64.deb
5 Install the keys to authenticate the software package by using the apt-key command.
The apt-key command manages the list of keys used by apt to authenticate packages.
Packages which have been authenticated using these keys are considered to be trusted.
sudo apt-key add /var/cuda-repo-ubuntu2004-11-0-local/7fa2af80.pub
6 Update and install the CUDA software package.
sudo apt-get update
sudo apt-get install cuda
7 (Optional) To confirm your GPU partition size or verify the resources available on your vSphere Bitfusion deployment, run the NVIDIA System Management Interface (nvidia-smi) monitoring application.
bitfusion run -n 1 nvidia-smi
8 Navigate to the directory that contains the CUDA Matrix Multiplication (matrixMul) sample files.
cd /usr/local/cuda/samples/0_Simple/matrixMul
9 Run the make and bitfusion run commands against the matrixMul sample file.
sudo make
bitfusion run -n 1 ./matrixMul
What to do next
Install and configure NVIDIA cuDNN. See Install NVIDIA cuDNN.
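The optional nvidia-smi check in the procedure can also be scripted. The following Python sketch is illustrative only and is not part of the vSphere Bitfusion tooling. It assumes that the nvidia-smi options --query-gpu and --format=csv,noheader,nounits behave under bitfusion run as they do locally, and the function names and sample output are hypothetical.

```python
import csv
import io

# nvidia-smi query arguments. The choice of fields is an assumption of this sketch.
QUERY_ARGS = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def bitfusion_query_command(num_gpus=1):
    """Return the argv list that wraps the nvidia-smi query in bitfusion run."""
    return ["bitfusion", "run", "-n", str(num_gpus), "--"] + QUERY_ARGS


def parse_gpu_memory(csv_text):
    """Parse 'name, memory.total' rows into (name, MiB) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank lines and any text that is not a two-field row
        name, total = (field.strip() for field in row)
        if total.isdigit():
            rows.append((name, int(total)))
    return rows


# Illustrative text only, not captured from a real deployment.
sample_output = "Tesla T4, 15109\n"
print(" ".join(bitfusion_query_command()))
print(parse_gpu_memory(sample_output))
```

To use the sketch, run the printed command on the client and pass its output to parse_gpu_memory. With the nounits option, nvidia-smi reports memory in MiB.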
Install NVIDIA CUDA on CentOS or Red Hat Linux
To run AI and ML workflows in vSphere Bitfusion, you must install CUDA on the CentOS or Red Hat Linux operating system of your vSphere Bitfusion client.
Prerequisites
Verify that you have installed the vSphere Bitfusion client on a CentOS or a Red Hat Linux operating system.
Procedure
1 Navigate to a directory on the virtual machine in which to download the NVIDIA CUDA distribution.
cd <download_directory>
2 To download the NVIDIA CUDA 11 package for CentOS 8 or Red Hat Linux 8, run the wget command.
wget https://developer.download.nvidia.com/compute/cuda/11.0.3/local_installers/cuda-repo-rhel8-11-0-local-11.0.3_450.51.06-1.x86_64.rpm
3 To install the CUDA package, run the rpm -i command.
sudo rpm -i cuda-repo-rhel8-11-0-local-11.0.3_450.51.06-1.x86_64.rpm
4 Run the yum clean all and yum -y install commands as shown to update your environment and install the CUDA software package.
sudo yum clean all
sudo yum -y install cuda
5 (Optional) To confirm your GPU partition size or verify the resources available on your vSphere Bitfusion deployment, run the NVIDIA System Management Interface (nvidia-smi) monitoring application.
bitfusion run -n 1 nvidia-smi
6 Navigate to the directory that contains the CUDA Matrix Multiplication (matrixMul) sample files.
cd /usr/local/cuda/samples/0_Simple/matrixMul
7 Run the make and bitfusion run commands against the matrixMul sample file.
sudo make
bitfusion run -n 1 ./matrixMul
What to do next
Install and configure NVIDIA cuDNN. See Install NVIDIA cuDNN.
Install NVIDIA cuDNN
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for use with deep neural networks.
Prerequisites
- Create an NVIDIA developer account from which to download the cuDNN package that matches your NVIDIA CUDA version and is appropriate for your Linux distribution. See https://developer.nvidia.com/cudnn.
- Verify you have installed a vSphere Bitfusion client.
- Verify you have installed NVIDIA CUDA.
Procedure
1 Install the NVIDIA cuDNN package by running the command sequence for your Linux distribution.
- Ubuntu 20.04
sudo dpkg -i libcudnn8_8.0.5.39-1+cuda11.0_amd64.deb
- CentOS 8 and Red Hat Linux 8
sudo rpm -ivh libcudnn8-8.0.5.39-1.cuda11.0.x86_64.rpm
2 (Optional) To verify that NVIDIA cuDNN is installed, run ldconfig -p | grep cudnn.
What to do next
- If you are using CentOS or Red Hat Linux, you must first install Python 3. See Install Python on CentOS and Red Hat Linux.
- If you are using Ubuntu Linux, you can install TensorFlow, PyTorch, and YOLO.
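The optional ldconfig check in the cuDNN procedure can be expressed as a short Python sketch. The code is a hedged illustration, not a VMware or NVIDIA tool. The function names are hypothetical, and the sample text shows only the general shape of ldconfig -p output, so the paths on your system can differ.

```python
import subprocess


def read_ldconfig_cache():
    """Return the text printed by 'ldconfig -p'.

    Assumes that ldconfig is on the PATH of the current user.
    """
    result = subprocess.run(
        ["ldconfig", "-p"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def find_cudnn_libraries(ldconfig_output):
    """Return (library name, path) pairs for cache entries that mention cudnn."""
    libraries = []
    for line in ldconfig_output.splitlines():
        if "cudnn" not in line or "=>" not in line:
            continue
        name_part, _, path = line.partition("=>")
        fields = name_part.split()
        if not fields:
            continue
        libraries.append((fields[0], path.strip()))
    return libraries


# Illustrative text only. Real output depends on the distribution and package.
sample = (
    "\tlibcudnn.so.8 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn.so.8\n"
    "\tlibc.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libc.so.6\n"
)
print(find_cudnn_libraries(sample))
```

On a client, calling find_cudnn_libraries(read_ldconfig_cache()) returns an empty list when no cuDNN library is registered with the dynamic linker.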
Install Python on CentOS and Red Hat Linux
For CentOS and Red Hat Linux, you must install Python 3.
If you are using Ubuntu, you do not have to perform this procedure. Ubuntu comes preinstalled with Python 3.
Procedure
1 Update all currently installed packages by running the yum update command.
sudo yum update
2 To install Python 3, run the dnf command.
sudo dnf install python3
3 (Optional) To verify that you are using Python 3, run the python3 -V command.
python3 -V
Python 3.6.8
4 (Optional) Take a snapshot of your virtual machine.
What to do next
Install TensorFlow, PyTorch, and YOLO. See Installing TensorFlow and Installing PyTorch and YOLO.
Installing TensorFlow
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries, and community resources that lets researchers push the state of the art in ML and lets developers easily build and deploy ML-powered applications.
TensorFlow can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. The platform is a symbolic math library based on dataflow and differentiable programming.
Install TensorFlow
TensorFlow is the machine learning framework you use with vSphere Bitfusion.
Install TensorFlow by using pip3, which is the package installer for Python 3. The procedure is applicable for Ubuntu 20.04, CentOS 8, and Red Hat Linux 8.
Prerequisites
- Verify you have installed a vSphere Bitfusion client.
- Verify you have installed NVIDIA CUDA and NVIDIA cuDNN on your Linux operating system.
Procedure
1 If you install TensorFlow on Ubuntu 20.04, install additional Python resources.
sudo apt-get -y install python3-testresources
2 Install pip3 by running the command sequence for your Linux distribution and version.
- Ubuntu 20.04
sudo apt-get install -y python3-pip
- CentOS 8 and Red Hat Linux 8
sudo yum install -y python36-devel
sudo pip3 install -U pip setuptools
3 Install TensorFlow by using the pip3 install command.
sudo pip3 install tensorflow-gpu==2.4
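After the installation, a quick way to confirm that TensorFlow can reach a GPU is to list the devices it detects. The following sketch is an assumption-laden illustration and not a step from this guide. The function name is hypothetical, and on a vSphere Bitfusion client a GPU is expected to appear only when the script runs under bitfusion run.

```python
def visible_gpus():
    """Return the GPU device names that TensorFlow reports.

    Returns None if TensorFlow cannot be imported in this environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this Python environment.")
elif not gpus:
    print("TensorFlow is installed, but no GPU is visible.")
else:
    print("Visible GPUs:", gpus)
```

If you save the sketch under a hypothetical name such as check_tf_gpu.py, you can run it with bitfusion run -n 1 -- python3 check_tf_gpu.py.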
Install TensorFlow Benchmarks
The TensorFlow benchmarks are open-source ML applications designed to test the performance of the TensorFlow framework.
You clone the TensorFlow benchmarks repository to your local environment and check out a branch. In Git, a branch is a separate line of development.
Prerequisites
Verify that you have installed TensorFlow.
Procedure
1 Install git.
- Ubuntu 20.04
sudo apt install -y git
- CentOS 8 and Red Hat Linux 8
sudo yum -y update
sudo yum install git
2 Create the ~/bitfusion directory and make it your working directory.
mkdir -p ~/bitfusion
cd ~/bitfusion
3 Clone the Git repository of TensorFlow benchmarks to your local environment.
git clone https://github.com/tensorflow/benchmarks.git
4 Navigate to the benchmarks directory and list branches of the repository.
cd benchmarks
git branch -a
master
remotes/origin/HEAD -> origin/master
...
remotes/origin/cnn_tf_v1.13_compatible
...
remotes/origin/cnn_tf_v2.1_compatible
...
5 Check out the cnn_tf_v2.1_compatible branch and list the local branches of the TensorFlow benchmarks repository.
git checkout cnn_tf_v2.1_compatible
Branch cnn_tf_v2.1_compatible set up to track remote branch cnn_tf_v2.1_compatible from origin.
Switched to a new branch 'cnn_tf_v2.1_compatible'
git branch
cnn_tf_v2.1_compatible
master
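The branch listing in the procedure is long, so a small filter can help you find the compatibility branches. The following Python sketch is illustrative. The function name is hypothetical, and the pattern assumes that the branches follow the cnn_tf_v<major>.<minor>_compatible naming shown in the listing.

```python
import re

# Matches names such as cnn_tf_v2.1_compatible, as shown in the branch listing.
BRANCH_PATTERN = re.compile(r"cnn_tf_v(\d+)\.(\d+)_compatible")


def compatible_branches(branch_listing):
    """Return the compatibility branch names, ordered by version number."""
    found = {}
    for line in branch_listing.splitlines():
        match = BRANCH_PATTERN.search(line)
        if match:
            version = (int(match.group(1)), int(match.group(2)))
            found[version] = match.group(0)
    return [found[version] for version in sorted(found)]


# Abbreviated listing that uses only the branch names shown in this guide.
listing = """\
  master
  remotes/origin/HEAD -> origin/master
  remotes/origin/cnn_tf_v1.13_compatible
  remotes/origin/cnn_tf_v2.1_compatible
"""
print(compatible_branches(listing))
```

You can feed the function the output of git branch -a. The sketch only lists branches and does not decide which branch suits a given TensorFlow release.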
Run TensorFlow Benchmarks
You can run the TensorFlow benchmarks to test the performance of your vSphere Bitfusion and TensorFlow deployment.
By running the TensorFlow benchmarks and using various configurations, you can understand how ML workloads respond in your vSphere Bitfusion environment.
Procedure
1 To navigate to the ~/bitfusion/ directory, run cd ~/bitfusion/.
2 To use the tf_cnn_benchmarks.py benchmark script, run the bitfusion run command.
By running the commands in the example, you use the entire memory of a single GPU and pre-installed ML data in the /data directory.
bitfusion run -n 1 -- python3 \
./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
--data_format=NCHW \
--batch_size=64 \
--model=resnet50 \
--variable_update=replicated \
--local_parameter_device=gpu \
--nodistortions \
--num_gpus=1 \
--num_batches=100 \
--data_dir=/data \
--data_name=imagenet \
--use_fp16=False
3 To use the tf_cnn_benchmarks.py benchmark script, run the bitfusion run command with the -p 0.67 parameter.
By running the commands in the example, you use 67% of the memory of a single GPU and pre-installed ML data in the /data directory. The -p 0.67 parameter lets you run another job in the remaining 33% of the GPU memory.
bitfusion run -n 1 -p 0.67 -- python3 \
./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
--data_format=NCHW \
--batch_size=64 \
--model=resnet50 \
--variable_update=replicated \
--local_parameter_device=gpu \
--nodistortions \
--num_gpus=1 \
--num_batches=100 \
--data_dir=/data \
--data_name=imagenet \
--use_fp16=False
4 To use the tf_cnn_benchmarks.py benchmark script, run the bitfusion run command with synthesized data.
By running the commands in the example, you use the entire memory of a single GPU and no pre-installed ML data. TensorFlow can generate synthetic data that stands in for a set of images.
bitfusion run -n 1 -- python3 \
./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py \
--data_format=NCHW \
--batch_size=64 \
--model=resnet50 \
--variable_update=replicated \
--local_parameter_device=gpu \
--nodistortions \
--num_gpus=1 \
--num_batches=100 \
--use_fp16=False
Results
You can now run TensorFlow benchmarks with vSphere Bitfusion by using shared GPUs from a remote server. The benchmarks support many models and parameters, which lets you explore a wide range of machine learning configurations. For more information, see the VMware vSphere Bitfusion User Guide.
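The three benchmark commands in the procedure differ only in the -p parameter and in the data flags. The following Python sketch builds the same argument lists from a few parameters, which can make it easier to try other configurations. It is an illustration under assumptions. The function names are hypothetical, and passing the same value to -n and --num_gpus is a choice of this sketch, based on both being 1 in the examples.

```python
import shlex

# Script path as used in the commands of this guide, relative to ~/bitfusion.
BENCHMARK_SCRIPT = "./benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py"


def benchmark_command(num_gpus=1, memory_fraction=None, data_dir=None,
                      model="resnet50", batch_size=64, num_batches=100):
    """Build the argv list for one tf_cnn_benchmarks run under bitfusion."""
    cmd = ["bitfusion", "run", "-n", str(num_gpus)]
    if memory_fraction is not None:
        if not 0 < memory_fraction <= 1:
            raise ValueError("memory_fraction must be in the range (0, 1]")
        cmd += ["-p", str(memory_fraction)]
    cmd += [
        "--", "python3", BENCHMARK_SCRIPT,
        "--data_format=NCHW",
        f"--batch_size={batch_size}",
        f"--model={model}",
        "--variable_update=replicated",
        "--local_parameter_device=gpu",
        "--nodistortions",
        f"--num_gpus={num_gpus}",
        f"--num_batches={num_batches}",
    ]
    if data_dir is not None:
        # Without a data directory, the command matches the synthetic data example.
        cmd += [f"--data_dir={data_dir}", "--data_name=imagenet"]
    cmd.append("--use_fp16=False")
    return cmd


def as_shell(cmd):
    """Join an argv list into a single shell-safe command line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


print(as_shell(benchmark_command()))
print(as_shell(benchmark_command(memory_fraction=0.67, data_dir="/data")))
```

The sketch only prints the command lines. To run one, paste it into a shell in the ~/bitfusion directory or pass the list to subprocess.run.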
Installing PyTorch and YOLO
PyTorch is an open source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing. It is free and open-source software released under the Modified BSD license.
You can use PyTorch to implement an object detector based on You Only Look Once (YOLO) v3.
YOLO is an object detector that uses features learned by a deep convolutional neural network to detect an object.
Install YOLO
The YOLO implementation that this guide uses is a minimal PyTorch implementation, with support for training, inference, and evaluation.
PyTorch is a machine learning (ML) library that you can use with vSphere Bitfusion. The YOLO tests are open-source ML applications designed to test the performance of your vSphere Bitfusion deployment.
The procedure is applicable for Ubuntu 20.04, CentOS 8, and Red Hat Linux 8.
Prerequisites
- Verify you have installed a vSphere Bitfusion client.
- Verify you have installed NVIDIA CUDA and NVIDIA cuDNN on your Linux operating system.
- Verify that your virtual machine has at least 150 GB of free space.
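The free-space prerequisite can be checked before you start the downloads. The following Python sketch is illustrative. The function name is hypothetical, and the sketch assumes that 150 GB means 150 x 10^9 bytes and that the data is stored on the file system that holds your home directory.

```python
import os
import shutil

REQUIRED_GB = 150  # free space that the prerequisites of this guide call for


def has_free_space(path, required_gb=REQUIRED_GB):
    """Return (enough, free_gb) for the file system that contains path."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 10**9, free_bytes / 10**9


enough, free_gb = has_free_space(os.path.expanduser("~"))
print("Free space: %.1f GB, enough for YOLO: %s" % (free_gb, enough))
```

If the ~/bitfusion directory is on a different disk or mount point, pass that path instead of the home directory.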
Procedure
1 Create a bitfusion folder and navigate to the folder.
mkdir -p ~/bitfusion
cd ~/bitfusion
2 If you are using an Ubuntu Linux operating system, install additional resources.
a Download package information from all your configured sources.
sudo apt update
b Install zip.
sudo apt install -y zip
c Install Python test resources.
sudo apt install -y python3-testresources
d Install the libgl1-mesa-glx package.
sudo apt install -y libgl1-mesa-glx
3 Install git.
- Ubuntu 20.04
sudo apt install -y git
- CentOS 8 and Red Hat Linux 8
4 Install pip3 by running the command sequence for your Linux distribution and version.
- Ubuntu 20.04
sudo apt install -y python3-pip
- CentOS 8 and Red Hat Linux 8
sudo yum install -y python36-devel
sudo pip3 install -U pip setuptools
5 Install YOLO and YOLO tests.
a Download the YOLO repository by using the git clone command.
git clone https://github.com/eriklindernoren/PyTorch-YOLOv3
b Navigate to the weights folder.
cd PyTorch-YOLOv3/weights
c Run the download_weights.sh installer script.
bash download_weights.sh
d Navigate to the data folder.
cd ../data
e Run the get_coco_dataset.sh installer script.
bash get_coco_dataset.sh
f Navigate to the main folder by using the cd .. command.
g Install and use Poetry to complete the YOLO installation process.
Poetry is a tool for dependency management and packaging in Python.
pip3 install poetry --user
export PATH=~/.local/bin:$PATH
poetry install
Run YOLO Tests
By running the YOLO tests, you can check the performance of ML workloads in your vSphere Bitfusion environment.
Prerequisites
- Verify you have installed a vSphere Bitfusion client.
- Verify you have installed CUDA and cuDNN on your Linux distribution.
- Verify you have installed YOLO and YOLO test scripts.
Procedure
1 Navigate to the PyTorch-YOLOv3 folder.
2 (Optional) Verify that you installed YOLO successfully.
a Run a YOLO test by using the CPUs of the virtual machine of your vSphere Bitfusion client.
poetry run yolo-test --weights weights/yolov3.weights
b After YOLO starts, press Control + C on your keyboard to cancel the testing process.
YOLO tests that use CPU compute power require a lot of time to finish.
3 To run the YOLO test with the yolov3.weights file by using GPUs, use the bitfusion run command.
By running the following command, you use the entire memory of a single GPU.
bitfusion run -n 1 -- poetry run yolo-test --weights weights/yolov3.weights
Results
You can now run YOLO tests with vSphere Bitfusion by using shared GPUs from a remote server. The tests help you to understand how to use YOLO within the machine learning discipline.
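To compare runs, you can measure how long a test command takes. The following Python sketch wraps a command in a wall-clock timer. It is an illustration, not part of the YOLO or vSphere Bitfusion tooling. The timed_run function is hypothetical, and the command list repeats the GPU command from the procedure.

```python
import subprocess
import time

# GPU test command from the procedure. Run it from the PyTorch-YOLOv3 folder.
YOLO_GPU_COMMAND = [
    "bitfusion", "run", "-n", "1", "--",
    "poetry", "run", "yolo-test", "--weights", "weights/yolov3.weights",
]


def timed_run(cmd, cwd=None):
    """Run cmd and return (exit code, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, cwd=cwd)
    elapsed = time.perf_counter() - start
    return completed.returncode, elapsed


# Example call, commented out because it needs a configured Bitfusion client:
# code, seconds = timed_run(YOLO_GPU_COMMAND, cwd="PyTorch-YOLOv3")
print(" ".join(YOLO_GPU_COMMAND))
```

The elapsed time includes the time that the process needs to start, so treat it as a rough comparison between configurations and not as a precise benchmark.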