Experiences with Remote Access to High Performance Computing Systems for Computer Engineering Technology



Jeffrey J. Evans1, Gene L. Harding2

Department of Electrical and Computer Engineering Technology Purdue University

{jje, glhardin}@purdue.edu

1 Jeffrey J. Evans, 401 N. Grant Street, West Lafayette, IN 47907
2 Gene L. Harding, 1733 Northside Blvd, South Bend, IN 46634

Abstract - The growth in computational power has shifted CPU design away from clock acceleration and toward parallelization techniques (multi-core processors) to improve computation and energy performance. Lab-based learning of computer performance in multiple-processor systems is complicated when most labs are equipped with nearly dated single-core machines. An alternative is to study multi-processor performance using high performance computing (HPC) clusters. However, not all labs can afford their own HPC machine.

A lab-based course in Computer Architecture for Electrical and Computer Engineering Technology students has been offered for four semesters and includes a high performance computing laboratory experience. Recently the course was offered at a satellite campus located roughly 120 miles from the main campus and its HPC system.

This paper documents early experiences with using HPC hardware and concepts in a laboratory environment to demonstrate multiprocessor performance dynamics. Combined with the topics of instruction pipelining, the memory hierarchy, and I/O performance modeling, students have initially reacted positively, gaining an appreciation of performance speedup and its limiting factors. Suggestions for similar HPC implementations on small and medium scales are also provided.

Index Terms – Computer Engineering Technology, Computer Performance, High Performance Computing, Remote Access.

INTRODUCTION

Teaching and learning workstation-class computer architecture is challenging due to Moore’s law and most recently the trend toward parallelization techniques to achieve balance between computation and energy performance. This trend is motivated by the recent crossover of PC sales volume away from desktop systems in favor of laptop systems. Parallelization is now being realized by “multi-core” CPU designs, with dual cores rapidly becoming mainstream. Laboratory experiences focused on systems level performance can suffer due to the lack of state-of-the-art hardware and the time lag between the release of multi-core products and widely available open source tools to study and measure performance.

Our course in computer architecture for technology students has an overarching focus on modeling and analyzing computer and subsystem performance (CPU, memory, disk, file, etc.). The course contains a significant laboratory component, where students study and demonstrate computer and subsystem performance using open source tools and programs of their own design. To address the topic of multiple CPUs, laboratory experiences have been formulated that utilize a medium scale computational Linux cluster. While useful for local students, tension exists when trying to teach these concepts at smaller satellite campus locations, where space and other resources are not as plentiful.

To help address this challenge, students from a satellite location were given access to the same resources available to main campus students. All students were able to access the Linux cluster to explore operating system commands, programming and scripting, and to perform experiments on the machine. The majority of students in our program do not own PCs with Linux or another Unix variant, so this approach encourages them to learn about Linux without forcing them to install it on their personal computers. Student perceptions and laboratory performance were favorable, as many were able to complete most of their activities prior to coming to lab, which made their in-lab time less stressful and more enjoyable.

The remainder of this paper is organized as follows. The next section briefly describes our computer architecture course and the primary components that motivate using high performance computing. Section three describes the high performance computing system and how remote access is achieved. Specific experiments, student results and perceptions are presented in section four. Conclusions and areas of future work are offered in section five.


COURSE DESCRIPTION

ECET 325, Computer Architecture, Modeling, and Performance Analysis, a junior level course, has been offered in the spring and fall semesters at the main campus since the spring of 2005. The course is an elective for students in the electrical engineering technology (EET) program, and required for students enrolled in the computer engineering technology (CpET) option of the EET program. ECET 325 is a four credit-hour course, with three hours per week devoted to lecture and two hours per week in the laboratory.

The course’s primary objective is to motivate students to appreciate the probabilistic nature in which computers operate, and how certain probabilistic tendencies of programs can be taken advantage of in hardware subsystems. The pedagogical approach is illustrated in figure 1. The course begins with the development of how the user (the student) interacts with the machine. This concept is formalized with lectures on probability, random variables, and stochastic processes. In particular, the Poisson arrival process is presented in the context of user input, namely mouse clicks and keyboard input. The exponential service time probability distribution is applied in the context of network access time (the time spent waiting for a web page to materialize). Laboratory exercises reinforce these concepts by having a student “browse” the web while a partner documents mouse click and wait timing.

FIGURE 1 PEDAGOGICAL APPROACH
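The browsing exercise described above can also be mimicked in software. The following is a minimal sketch, not part of the course materials: the function name, the click rate, and the mean wait are assumptions chosen for illustration. It relies on the fact that the gaps between events in a Poisson arrival process are exponentially distributed.

```python
import random
import statistics

def simulate_browsing(click_rate, mean_wait, n_clicks, seed=0):
    """Simulate a browsing session (hypothetical helper).

    click_rate: average clicks per second (Poisson arrival rate, assumed value)
    mean_wait:  average page-load wait in seconds (exponential service time)
    Returns (gaps between clicks, page waits), both lists of seconds.
    """
    rng = random.Random(seed)
    # In a Poisson process the gaps between events are exponentially distributed.
    gaps = [rng.expovariate(click_rate) for _ in range(n_clicks)]
    # expovariate takes a rate, so the rate is 1 / mean.
    waits = [rng.expovariate(1.0 / mean_wait) for _ in range(n_clicks)]
    return gaps, waits

gaps, waits = simulate_browsing(click_rate=0.2, mean_wait=1.5, n_clicks=10_000)
print(f"mean gap between clicks: {statistics.mean(gaps):.2f} s (theory: 5 s)")
print(f"mean page wait:          {statistics.mean(waits):.2f} s (theory: 1.5 s)")
```

Comparing the sample means against the theoretical values is the same check a student pair would make against their hand-recorded click and wait timings.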

Each major teaching module attempts to reinforce probabilistic concepts that are developed in the first weeks of the semester. Other modules focus on CPU instructions and pipelining, the memory hierarchy, operating system support, and systems modeling using queuing theory. Another module addresses general computer performance, focusing on Amdahl’s Law, where execution time after some improvement is a function of two things: the amount of time a program spends performing operations affected by the improvement, and the amount of time spent performing operations not affected by it.
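The two-part view of execution time can be written compactly. The symbols below are chosen here for illustration and are not necessarily the notation used in the course: $T_{\text{old}}$ is the original execution time, $T_{\text{affected}}$ the part of it spent in operations the improvement touches, and $a$ the factor by which those operations are made faster.

$$T_{\text{new}} = \frac{T_{\text{affected}}}{a} + T_{\text{unaffected}}, \qquad T_{\text{unaffected}} = T_{\text{old}} - T_{\text{affected}}$$

As a worked example with made-up numbers, a program that runs for 100 s, of which 80 s benefits from a fourfold improvement, would take $80/4 + 20 = 40$ s afterward. The 20 s that the improvement does not touch sets a floor that no value of $a$ can lower.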

Students enrolled in ECET 325 spend two hours each week in the laboratory, under the supervision of the instructor. Lab experiences focus on analyzing system and subsystem performance. This is accomplished by exposing students to software tools that are part of the Linux operating system such as sar, vmstat, and iostat, and similar freeware tools for the Microsoft Windows environment. Students are also expected to create and augment their own tools outside the lab for measuring the performance of basic integer, floating point, and double precision math operations. Moreover, they use these skills to measure and analyze the performance of various sorting algorithms, exposing them to introductory algorithm analysis.

REMOTE ACCESS TO HPC RESOURCES

Students are granted access to a 128-node computational Linux cluster, part of the author’s Adaptive Computing Systems Laboratory (ACSL) [1]. Figure 2 shows one half of the system, including all interconnection network elements (top left). In figure 2, four 16-node cluster segments are shown, where a segment is a group of nodes physically connected to the same network switch. Two 16-node cluster segments (32 compute nodes) are located in a separate laboratory and the last pair of 16-node segments is located in a basement storage area. The ACSL cluster consists of Pentium III class 933 MHz machines, each equipped with 512 MB of RAM and 40 GB of disk storage. A similar machine is used for logging into the system and system control. This machine has 250 GB of disk storage. Another machine with 250 GB of disk storage is used for data backup purposes.

FIGURE 2 ACSL COMPUTATIONAL CLUSTER (64 OF 128 NODES)

Students are sent an email during the third week of the semester giving them their user name and initial password. The email includes a URL to a comprehensive set of instructions on how to change their password, set up their secure shell (ssh) environment and key, and test their ability to access the protected compute nodes.

Remote access to the machine serves a number of objectives. First, students can log in to the system and explore Linux operating system commands, C program development, and scripting at their convenience. The system is generally available on a 24/7 basis. Second, students are exposed to the area of high performance computing (HPC) by way of laboratory experiences where they learn to submit instructor-provided parallel programs to the machine, then analyze and visualize (plot) results using techniques previously learned. The HPC experiments are used to illustrate and re-emphasize Amdahl’s law, demonstrating those parts of a program that can experience speedup and those that cannot. Students are given a simple parallel program using the Message-Passing Interface (MPI) for interprocess(or) communication [2]. Students focus individually on submitting, running, and evaluating the run time of the program using different numbers of compute nodes. This is different from other approaches to teaching HPC concepts, which have traditionally focused more on the details of parallel programming [3], [4] and architectures [5].

In the fall of 2006, ECET 325 was delivered for the first time at a satellite (remote) campus located in South Bend, 120 miles from the main campus. A local faculty member taught the course at South Bend, where the student demographics and lab setup were different. Since this campus has both traditional and non-traditional students (i.e., not immediately out of high school), the course offerings are in the evenings. Moreover, the lab time was three hours instead of two, and the labs were generally also available during the day. Since it was the first time ECET 325 was offered at South Bend, both faculty and students experienced the “growing pains” of a new course offering, and the extra lab time proved useful.

Students at the remote location were given the same access to the machine as those at the main (local) campus, using identical means for getting them started. Approximately 85 percent of both local and remote students correctly followed the provided instructions, resulting in “clean” setups. In cases where students struggled, it was generally because they either overlooked some detail(s) in the instructions, or executed the steps out of order. In one case a local student ultimately needed to delete his private ~/.ssh directory and start over; he had resorted to “experimenting” with an assortment of steps and execution orders that he could not remember or repeat.

EXPERIMENTS AND STUDENT PERCEPTIONS

I. Experiments

The first few offerings of ECET 325 required students to install two operating systems on an “experimental” machine that would remain theirs in the lab for the duration of the semester. This practice was curtailed at the main campus, in favor of spending the time acquainting them with basic command line operations in Windows and Linux. (Because of limited staffing resources, however, the former approach was adopted at the remote campus.) More recently, student interest has grown to where the instructor has provided the information and means for students to install Linux on their personal computers.

Beginning in the fourth week and continuing for the duration of the semester, students are assigned laboratory experiences that either encourage the use of or directly exploit the cluster system in some way. Several labs require developing simple software tools to measure CPU performance of basic mathematical operations. Students were required to make their programs portable between Windows, Linux, and Sun Solaris (local students only) operating system environments. Program requirements and the range of portability demand careful consideration of programming language and compilation tools. To accommodate this, students were encouraged to download free ‘C’ compilation tools and use them in a command line environment, which is what they experience when accessing the HPC system.

Over time the students add complexity to their tools while also making them easier to use. For example, their initial programs are primarily coded to execute a fixed number of times, compute average execution time, and send output to the screen only. Upgrades include writing their output to a file, then upgrading again to accept command line arguments to specify the output file and number of iterations. Students are required to create help, or usage instructions when command line arguments are improperly entered or omitted. The point of this requirement is to encourage the development of reusable components and to motivate students to develop habits that allow them to “forget” the details of their program by creating useful ways of reminding themselves of program operation. For example, if a well-formed usage page is presented when one omits required command line arguments it becomes straightforward to remind oneself how a program is supposed to function, particularly after it has not been used for an extended period of time. This practice has resulted in students gaining a significant appreciation of the complexities of creating detailed and meaningful command line interfaces and help systems, which helps them in later courses where graphical user interfaces are developed.
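The progression described above (fixed iterations, then file output, then command line arguments with a usage message) can be sketched as follows. This is an illustrative outline in Python rather than the ‘C’ the students use, and the tool name `optime`, its argument order, and the output wording are all hypothetical.

```python
import sys
import time

# Hypothetical tool name and argument layout, shown whenever arguments are unusable.
USAGE = (
    "usage: optime OUTPUT_FILE ITERATIONS\n"
    "  OUTPUT_FILE  file that receives the result\n"
    "  ITERATIONS   positive number of timed repetitions"
)

def parse_args(argv):
    """Return (output_path, iterations), or None if the arguments are unusable."""
    if len(argv) != 2:
        return None
    try:
        iterations = int(argv[1])
    except ValueError:
        return None
    if iterations <= 0:
        return None
    return argv[0], iterations

def average_time(operation, iterations):
    """Average wall-clock seconds per call of operation()."""
    start = time.perf_counter()
    for _ in range(iterations):
        operation()
    return (time.perf_counter() - start) / iterations

def main(argv):
    parsed = parse_args(argv)
    if parsed is None:
        print(USAGE, file=sys.stderr)  # remind the user how the tool works
        return 1
    path, iterations = parsed
    x, y = 3.14159, 2.71828
    # In Python the figure is dominated by call overhead; a C tool would time
    # the bare multiplication more directly.
    avg = average_time(lambda: x * y, iterations)
    with open(path, "w") as out:
        out.write(f"float multiply: {avg:.3e} s average over {iterations} runs\n")
    return 0

exit_code = main([])  # no arguments: prints the usage text and returns 1
```

Keeping argument checking in its own function means the usage text is produced in exactly one place, which is the habit the requirement is meant to build.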

One of the teaching modules in ECET 325 focuses on developing Amdahl’s law, which quantifies the relationship of program execution time improvement to the portion of the system that is changed, such as a specific hardware improvement. Amdahl’s law expresses this relationship in terms of the aspects of a program that can be improved versus those that cannot. The resulting improvement in a program’s execution time can be stated as an overall “speedup” by

$$S = \frac{1}{(1 - f) + \dfrac{f}{a}}, \qquad (1)$$

where S is the speedup, f is the fraction of a program that is improved or enhanced, and a is the amount of the improvement expressed as a ratio to one (e.g., a = 10 for a tenfold improvement to that portion of the system). Speedup, then, is also a ratio; it is closely related to execution time but is not expressed in units of time. This abstraction sometimes confuses students who are accustomed to reporting differences in time using units of time.
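A short numerical sketch may help separate the unitless speedup from the execution times it relates. The function name and the sample values (a 120 s program, half of which is made ten times faster) are assumptions for illustration only.

```python
def speedup(f, a):
    """Overall speedup when a fraction f of execution time is improved a-fold."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must lie between 0 and 1")
    if a <= 0:
        raise ValueError("a must be positive")
    return 1.0 / ((1.0 - f) + f / a)

old_time = 120.0           # seconds, an illustrative value
s = speedup(f=0.5, a=10)   # half of the program made ten times faster
new_time = old_time / s    # speedup is old time divided by new time

print(f"speedup = {s:.3f} (a ratio, no units)")
print(f"execution time: {old_time:.1f} s -> {new_time:.1f} s")
```

Even with a tenfold improvement to half the program, the overall speedup stays below two, because the untouched half still takes its full 60 s.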

This concept is easily extended to address sections of a program that can only be executed serially compared to those that can be performed in parallel by

$$S = \frac{1}{s + \dfrac{p}{n}}, \qquad (2)$$

where s is the portion of the program that can only be performed serially, p is the portion that can be performed in parallel, and n is the number of processors.
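To see how this form behaves as processors are added, the sketch below evaluates it over a geometric progression of node counts. It assumes that s and p are fractions with s + p = 1, and the serial fraction of 0.10 is a made-up value, not a measurement from the course.

```python
def parallel_speedup(s, n):
    """Speedup on n processors when a fraction s of the work is serial.

    Assumes the serial and parallel fractions sum to one, so p = 1 - s.
    """
    p = 1.0 - s
    return 1.0 / (s + p / n)

serial_fraction = 0.10  # assumed for illustration
for n in (1, 2, 4, 8, 16, 32):
    print(f"n = {n:2d}  speedup = {parallel_speedup(serial_fraction, n):5.2f}")

# As n grows, p / n vanishes and the speedup approaches 1 / s.
print(f"upper bound as n grows: {1.0 / serial_fraction:.1f}")
```

Each doubling of n yields a smaller gain than the one before it, which is the limiting behavior the serial portion imposes.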

Near the end of the semester students are given a compiled parallel program and example scripts that they modify in order to execute the parallel program on the HPC machine. Each student runs several iterations of the parallel program on a geometric progression of machines (e.g., 1, 2, 4, … 32). This activity takes place outside of the organized lab meeting. In the lab, the run time results are then statistically analyzed and plotted. Since the HPC cluster is a Linux-based system, students are required to visualize their results using GnuPlot [6]. This is initially met with some resistance; however, once they realize the similarities between GnuPlot and more familiar programs such as MATLAB, they become more curious. There is also the added motivation of waiving the final laboratory report once their data source and output files have been successfully demonstrated in the lab.
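One way the analysis step could look is sketched below: repeated run times per node count are reduced to a mean, a sample standard deviation, and a speedup relative to one node, then written as whitespace-separated columns, a layout GnuPlot can read. The function names are hypothetical and the timings are invented, not data from the course.

```python
import statistics

def summarize(run_times):
    """Map node count -> (mean, sample standard deviation, speedup vs. one node).

    run_times maps a node count to a list of at least two run times in seconds
    and must include an entry for one node as the baseline.
    """
    base = statistics.mean(run_times[1])
    summary = {}
    for nodes in sorted(run_times):
        mean = statistics.mean(run_times[nodes])
        summary[nodes] = (mean, statistics.stdev(run_times[nodes]), base / mean)
    return summary

def to_columns(summary):
    """Format one line per node count as whitespace-separated columns."""
    lines = ["# nodes mean_s stdev_s speedup"]
    for nodes, (mean, stdev, speedup) in summary.items():
        lines.append(f"{nodes} {mean:.3f} {stdev:.3f} {speedup:.3f}")
    return "\n".join(lines) + "\n"

# Made-up timings in seconds, three runs per node count.
run_times = {
    1: [100.2, 99.8, 100.0],
    2: [52.1, 51.9, 52.0],
    4: [28.3, 27.9, 28.1],
}
print(to_columns(summarize(run_times)), end="")
```

If the output were saved to a file, here called `results.dat` purely as an example, a GnuPlot command such as `plot "results.dat" using 1:4 with linespoints` would draw speedup against node count.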

II. Student Results and Perceptions

The educational intent of granting 24/7 accessibility to an HPC cluster is twofold. First, students can work with a Linux system without being required to install the operating system on their home computer. Second, they can gain an appreciation of Amdahl’s law by investigating multiprocessor performance improvement using programs written for parallel execution. Moreover, Amdahl’s law is a critical component of a major learning objective and assessment point for the course, namely “Upon completion of this course each student should be able to demonstrate the ability to identify and performance analyze the CPU component of a computer”, and our assessment threshold is 70 percent.

Local students benefit since the HPC machine is close by, and because the majority of them live on campus, they enjoy the high level of network reliability provided by the University. Remote students, however, do not have an HPC machine at their location, and since the student population is more non-traditional (commuter), overall network reliability was a concern. Fortunately the course population at both locations is manageable (roughly 16 per class), and the South Bend campus had very little trouble with remote access. The only significant network problem was caused by a change in a host campus (Indiana University-South Bend) router, and the professor was able to reprogram the local hub, with phone help from the local network administrator, within about a half hour. This event did, however, highlight the importance of having an on-call administrator.

As previously mentioned, a two-week HPC lab exercise at the end of the semester reaffirms Amdahl’s law, which is taught earlier in the semester. Student performance on this lab, performance on all labs, the learning objective, and overall course performance were analyzed. Students located at the main campus scored slightly lower on the lab than in previous semesters. Surprisingly, nearly half of the class failed to complete all the procedures that would have exempted them from preparing a final lab report, whereas the remainder of the class not only completed the work, but also gained all available extra credit. The lab scores exhibited a distinctly bimodal distribution, with the report scores centered around a value well below that of the other group.

Because of the “first time” challenges involved with executing the lab at the remote campus, the faculty member there was very generous with the lab grading. Most of the students were very diligent about completing the work, and a few made almost heroic efforts on occasion to accomplish the labs. Thus, the lab grades are higher than they will likely be the next time this course is offered. This likewise means the course total is probably three or four points higher than it would otherwise be. On the other hand, the exams, which comprise 50% of the course grade, did not benefit from such generosity in the grading. The learning objective evaluation is based on the final exam grade, and is more consistent with the corresponding evaluation at the main campus. Moreover, it was encouraging to observe that, for this specific objective, the average for questions on the final exam was more than eight points higher than the average for the three mid-course exams. Apparently, the students’ grasp of those concepts did improve during the course.

Table 1 shows the variables comparing local (main campus) and remote student performance. It is worth noting that the learning objective figure reflects a combination of knowledge and analysis skills including the determination of cycles per instruction for a program and instruction pipeline performance in addition to applying Amdahl’s law.

TABLE I

STUDENT PERFORMANCE (FALL 2006)

Item                 Avg. Score (Local)   Avg. Score (Remote)
Lab 10 - Clusters    89.2                 95.7
Lab Total            76.4                 95.5
Course Total         72.5                 85.3
Learning Objective   67.9                 71.6

The local student learning objective performance was slightly below the desired threshold of 70 percent, and the remote students were just slightly above it. While this was disappointing, lab performance was encouraging. Those students who successfully completed their work, avoiding the final report, took advantage of using the HPC machine during evenings and weekends. This practice has the advantage of the student obtaining full or nearly full use of the machine. As the remaining students learned, delaying their effort had the ramification of creating longer job queues (student jobs waiting to be executed), thus delaying their output, sometimes for hours.

CONCLUSIONS AND FUTURE WORK

High performance computing (HPC) clusters have been used in a junior level computer architecture course for electrical and computer engineering technology students. Our Linux HPC cluster serves several purposes and gives students access to Linux-based machines without forcing them to install Linux on their own personal computers. While student performance results were mixed, there were clear indications that the HPC concepts experienced in the lab contributed in positive ways to developing the desired outcome for the assessed learning objective. Moreover, students who were initially reluctant to interact with different machines and operating systems became comfortable with them by the end of the semester.

One of the valuable lessons learned related to the need for clear, precise, and timely account setup and lab instructions. Lecture and lab instruction can become out of phase with each other. The courses themselves (local vs. remote) can also become out of phase due to calendar schedule (semester start, end, holidays, etc.). Another reality that cannot be avoided however is that in nearly all cases there are several “correct” methods of obtaining “correct” results, and correct results are not necessarily identical from group to group. To help address this, a web page was developed and improved to provide more accurate details for student account setup and verification. This allows the initial email sent to students to be shorter and less complex. By providing this level of detail on the web page, a student that follows the instructions given is guaranteed that the results are correct and repeatable. A second web page was also developed to encourage students to test remote access to help avoid problems later.

The HPC cluster used in our course was constructed using recycled machines. Most institutions implement programs to “rotate” campus computers for labs, etc., and those that are designated “past” their useful life can be useful for this purpose. The cluster uses an open source operating system (Linux), and relies on open source systems and resource management software. There are other, more portable system software options also available that transform any local area network of computers, regardless of operating system, into a cluster by simply booting a CD on each machine. A remote access scenario for student learning of HPC concepts was chosen to (1) relieve the remote site instructor of the responsibility of obtaining and maintaining such a machine, and (2) determine if there were any major obstacles to student learning using remote access.

The use of HPC concepts and systems to teach Amdahl’s law has yielded encouraging initial results. Future plans include gradually elevating the sophistication and complexity of the HPC lab experiences by expanding student work so each uses up to 64 nodes (1/2 of the system). In doing so it is envisioned that more sophisticated programs will be used and analyzed. All of these will also come from the open source community.

ACKNOWLEDGMENT

The authors wish to thank the Rosen Center for Advanced Computing (RCAC) at Purdue University for their donation of compute nodes, racks, and cabling, enabling us to create our HPC machine for our students. The authors also thank the reviewers for their insightful suggestions to improve the quality of this paper.

REFERENCES

[1] Adaptive Computing Systems Laboratory (ACSL), http://m1236-silver.tech.purdue.edu/acsl/. On-line document, 2007.

[2] Gropp W., Lusk E., and Skjellum A., Using MPI: parallel programming with the message-passing interface, 2nd Edition. MIT Press, 1999.

[3] Luque E., Suppi R., and Sorribes J., A quantitative approach for teaching parallel computing. Proceedings of the twenty-third SIGCSE technical symposium on Computer science education, 1992, p 286-298.

[4] Prins P. R., Teaching parallel computing using Beowulf clusters: a laboratory approach. Journal of Computing in Small Colleges, 2004, vol 20, no 2, p 55-61.

[5] Miller R., Schaller N., The right stuff? Teaching parallel computing. Proceedings from the Eighth International Parallel Processing Symposium, 1994, p 956-961.
