
Intelligent Network Interfaces

Final Project Report

CS 6432: High-Performance Parallel Computing: Tools and Applications Winter 1998

Kelly Norton

Craig Ulmer


Intelligent Networks

Background

A number of recent studies identify the Network Interface (NI) as a key element for the success of Networks of Workstations (NOWs) [1,2]. While link speeds in high-performance networks have risen steadily over the last decade, the ability of hosts to utilize that bandwidth has suffered because NIs have traditionally been dumb devices that require direct OS management. For example, Gigabit Ethernet is commonly marketed as a “backbone” network since Gigabit NI cards using current OSs struggle to fill the available link bandwidth.

Intelligent NI devices and interface standards are emerging to better utilize the increased link bandwidth of high-speed networks. A number of specialized networks now offload network processing onto intelligent NIs, including Myricom’s Myrinet, FORE Systems’ ATM interface [3], and more recently the Intelligent I/O (I2O) standard for I/O devices. These systems generally decouple the NI from the host processor, allowing for improved performance through asynchronous computation. A few OS software interfaces anticipate this new style of NI co-processing, including the Virtual Interface Architecture (VIA) [4], Active Messages II [5], and I2O’s message passing interface. These software interfaces expect the NI to support multiple virtual endpoint interfaces, maintain network “state”, and handle higher-level operations than were previously placed at the NI level. Additionally, the multimedia and real-time communities require more support for Quality of Service (QoS) within the network. These increased service demands all call for an evaluation of NI processing power, as well as an exploration of efficient implementation strategies in the NI.

This project deals with intelligent NI issues by exploring two probable interfaces for intelligent NIs. The first NI examined in this project is Myricom’s Myrinet card. This part of the project deals with implementing functionality in the NI, particularly multicast through NI recycling. While multicast may not be a particularly relevant topic in terms of cluster communications, it gives an indication of the processing power available at the NI, deals with non-trivial buffer management, and provides a background for scatter/gather operations. The second NI examined in this project is the Intel I2O development card, featuring an embedded microprocessor, large memory, and multiple network connections. Since very little is known about this card, this part of the project deals primarily with first-time use of the board, documenting our first-hand impressions of its suitability for useful intelligent NI work.

Intelligent Network Interfaces: Myrinet

Background

Myricom’s Myrinet provides a programmable interface to its high-speed point-to-point network through the firmware of its LANai Network Interface (NI) processor. While this processor must manage a number of interface details such as send/receive queues, source routing, and DMA operations between NI and host memory systems, it may be programmed to include functionality previously reserved for either the network switch or host CPU. Various Myrinet research projects use the LANai processor to perform tasks such as dynamic flow control credit management [6], network retransmission of lost packets [7], and virtual circuit caching.

Multicast over Myrinet

A practical use for the Myrinet NI is to implement multicasting functionality where the NI handles the distribution of multicast messages. This practice is generally referred to as NI forwarding, since the NI receives multicast messages from either the host or the network, and then forwards copies to a subset of the multicast receiver list. By allowing the NI to process multicast messages, the host no longer has to spend time injecting a large number of messages into the NI, thereby freeing the host CPU for useful computation cycles. A number of issues complicate NI forwarding, including deadlock and guaranteed delivery. At least two studies [6,8] report details of implementation challenges with NI forwarding.

Walton and Gerla [8] at the University of California, Los Angeles implemented IP multicasting over Myrinet with both ring and tree topologies. Since multicasting is handled at the NI, they observed that the multicast structures could result in a deadlock situation where one NI waits to communicate with another NI, which in turn waits on the first interface. The literature suggests an ‘up-down’ avoidance scheme, but Walton and Gerla opt for cell dropping to remove deadlock. Unfortunately, loss rates of up to 87% were observed in the worst-case traffic for the tree structures.

Verstoep, Langendoen, and Bal [6] at the Vrije Universiteit, Amsterdam report more promising results through extending the Illinois Fast Messages package for multicast traffic (FM/MC). FM/MC provides NI forwarding, multicast trees, and credit-based flow control for buffer space reservation. Since each multicast message must reserve flow control credits for each destination, FM/MC’s multicast traffic is guaranteed to provide reliable delivery of a message using FM’s underlying principles. FM/MC further deals with flow control by dynamically distributing credits and dividing credits into multicast and unicast categories.

Recycling

In the ATM community, the concept of cell recycling [9] provides an efficient means for delivering multicast messages in a copy-twice routing. Cell recycling allows distribution of multicast tree information and decreases the fanout any individual node experiences from the multicast. The cell recycling idea may be abstracted to Myrinet, observing that recycling for a subnetwork occurs at the NI instead of the subnetwork’s switch. While previous studies term this handling of multicast messages as forwarding, recycling may be a more accurate description since a single switch is reused for passing spawned children.

Tree-Based NI Recycling

Concept

As in other point-to-point multicast projects, tree-based NI recycling places the burden of handling multicast messages at the NI level. The network separates multicast trees by a unique multicast tree identifier, and distributes tree information across relevant nodes. An incoming multicast message is examined by the NI and recycled into the network based on the node’s share of the multicast tree information. By maintaining only pointers to a particular node’s children, distributed multicast trees allow for rapid handling of incoming messages, as well as prevent all nodes from having to maintain an up-to-date representation of the full tree structures.
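The recycling step described above can be illustrated with a small host-side simulation. This is only a sketch under stated assumptions: the table layout `children_tables[node][tree_id]`, the function name, and the example tree identifier are hypothetical and are not taken from the LANai control program.

```python
from collections import deque

def recycle_multicast(children_tables, tree_id, root):
    """Simulate NI recycling: each NI forwards only to its own children.

    children_tables[node][tree_id] is that node's local share of the tree
    (an ordered list of child node ids); no node holds the full tree.
    Returns the order in which the nodes' NIs see the message.
    """
    received = []
    in_flight = deque([root])          # messages currently in the network
    while in_flight:
        node = in_flight.popleft()
        received.append(node)          # this node's NI now holds the message
        # Look up only this node's children for the tree and re-inject copies.
        for child in children_tables.get(node, {}).get(tree_id, []):
            in_flight.append(child)
    return received

# Hypothetical 5-node tree with identifier 7: 0 -> (1, 2), 1 -> (3, 4).
tables = {0: {7: [1, 2]}, 1: {7: [3, 4]}}
order = recycle_multicast(tables, tree_id=7, root=0)
```

Nodes without an entry for the tree identifier simply consume the message, which mirrors the point that a node needs only pointers to its own children.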

Advantages of NI Multicast

It is not obvious that a NI recycling system provides many advantages over a one-to-many unicast system.

At the simplest level, a tree-based NI recycling system provides a simple interface for subscriber-based service. In a dynamic implementation, nodes may add or remove themselves from a multicast tree with relatively few changes to the network “state”. Such subscriptions allow nodes to multicast to a set of receivers without having to explicitly maintain connections at the host level. A second reason for implementing NI recycling comes from measuring host-level involvement in network activity. For the case of a unicast system mimicking an n-message multicast, the host-level software must inject n messages into the NI. In contrast, the host in an NI recycling system need only inject one message into the NI to handle all transmissions. Since injection time takes away from useful host time, it is only natural to offload work onto an intelligent NI. A final point arguing for NI recycling is that it may take better advantage of destination localization for a large number of nodes. In the case of multiple switches, a recycling multicast can send a single message over a long distance, then recycle it to the faraway nodes without additional distance costs.


[Figure: Localization Across Switches. Nodes 1 through 6 attached to switches S1 and S2.]

Tree Creation

A node that desires to create a new multicast tree must first register the tree with all involved nodes. A unique tree identifier is picked from a set of id’s allocated to the root node (e.g., in an n-node system, node zero may choose 0, n, 2n, ... while node 1 may choose 1, n+1, 2n+1, ... ). Once the root node selects a unique identifier for the tree, it expands the list of subscribing nodes into a top-down, left-flush, filled tree.

[Figure: Tree Creation from Flat List. The node list 0 1 2 3 4 expands into a tree rooted at node 0; the node list 2 4 0 1 expands into a tree rooted at node 2.]

The root of the tree sends messages to all tree members informing their NI of the children for which the node is responsible. Since a tree’s creation involves communication with all subscribers, the cost of establishing a new tree with n members is n unicast messages.
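As an illustration of the tree creation steps above, the following sketch allocates identifiers per root and expands a flat subscriber list into a left-flush, filled tree. The function names are hypothetical, and the fan-out of 2 is an assumption suggested by the figure rather than a value stated in the text.

```python
def tree_ids_for_root(root, num_nodes, count):
    """First `count` identifiers reserved for `root`: root, root+n, root+2n, ..."""
    return [root + k * num_nodes for k in range(count)]

def expand_tree(subscribers, fanout=2):
    """Expand a flat subscriber list into a top-down, left-flush, filled tree.

    The first list entry is the root. Returns {node: [children]}, where each
    child list is what the root would send to that node's NI.
    The default fan-out of 2 is an assumption, not a value from the report.
    """
    children = {}
    for i, node in enumerate(subscribers):
        first = fanout * i + 1                     # index of the leftmost child
        children[node] = subscribers[first:first + fanout]
    return children

ids = tree_ids_for_root(root=1, num_nodes=8, count=3)
tree_a = expand_tree([0, 1, 2, 3, 4])
tree_b = expand_tree([2, 4, 0, 1])
```

Each entry of the returned dictionary corresponds to one registration message, which is consistent with a creation cost of n unicast messages for n members.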

Reliable Multicast

A simple approach to implementing reliable point-to-point multicast is to use a fully reliable, credit-reserving tree [6]. In this scheme, a sender reserves flow control credits for all tree destinations, much like the unicast scheme would require. However, a single multicast message is issued from the root and the NIs handle all recycling. At each destination the NI passes the message to the host level, from which a reply message is generated for the restoration of the originator’s credits.

[Figure: Reliable Multicast. Two panels, Distribute and Acknowledge, on a five-node tree (nodes 0 through 4).]

While injection of both request and replies in this system is greatly reduced, the reception work performed by the originator is still substantial since it must handle a reply message from each of the tree’s nodes. The reply messages further congest the network, but should be of the same magnitude as the one-to-many unicast case.

Semi-Reliable Multicast

For multicasts that do not require explicit replies from the receivers, the fully credit-reserving, reliable multicast wastes limited network resources. The alternative of using unreliable delivery [8] is even less satisfying, since complete tree coverage is not guaranteed. A more realistic solution for reducing network traffic while providing full tree coverage is a semi-reliable multicast system with NI deliverable guarantees.

An NI deliverable system guarantees that every tree destination will receive a copy of the multicast message, but the message may not be transported to the host under heavy loading conditions. There are two observations about multicast trees that make the semi-reliable optimization possible. First, the NI always forwards messages from left child to right child. Therefore, if the right child receives a multicast message, it knows that the left child has already been sent the message. If NI deliverable is the only necessary multicast guarantee, then only the rightmost sibling of each branch need reply to the originator to signify delivery. The second observation is that whether a node is a replying node for semi-reliable messages is known at tree creation or modification time. There is therefore little overhead at the NI level to determine whether an incoming multicast message requires a reply or not.
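One literal reading of the first observation is that the last child in each node’s ordered child list is a replying node. The sketch below computes that set once, as would happen at tree creation time. The function name and the dictionary representation of the tree are hypothetical, and the report may select replying nodes differently.

```python
def replying_nodes(children):
    """Return the nodes that reply under the semi-reliable scheme.

    `children` maps each node to its ordered child list (left to right).
    Because an NI forwards left to right, receipt by the last child implies
    that the earlier siblings were already sent the message.
    """
    return sorted(kids[-1] for kids in children.values() if kids)

# Five-node tree: 0 -> (1, 2), 1 -> (3, 4).
tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
repliers = replying_nodes(tree)
```

Since the result depends only on the tree shape, each node can be told at registration whether it is a replier, so no per-message computation is needed at the NI.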

[Figure: Semi-Reliable Multicast. Two panels, Distribute and Acknowledge, on a five-node tree (nodes 0 through 4).]

In order to prevent semi-reliable messages from overwriting each other, a general-purpose semi-reliable multicast flow control (SMC-FC) credit is used to limit the number of outstanding messages. This credit differs from normal flow control credits in that a particular SMC-FC credit cannot be released until all of its multicast tree’s responders send replies. Since the SMC-FC credit scheme limits the number of outstanding semi-reliable multicast messages, the buffer space requirements for the semi-reliable messages are bounded.

The general procedure for a semi-reliable multicast is as follows:

• Originating node reserves an SMC-FC unit for a multicast to a particular tree. Once reserved, the SMC-FC stores the number of responses required for release of the unit.

• Originating node reserves a regular flow control credit for each destination that will be providing a response.

• Originator injects message into the network.

• At each receiving node, the NI receives the message and recycles it for children if necessary.

• Messages are passed to the host, marked with whether a response is required.

• Responding destinations send responses to the originator, with a pointer to which SMC-FC the multicast message was originally issued with.

• The originator releases the SMC-FC unit when all responses are received.
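The bookkeeping implied by the procedure above can be sketched as a small pool of SMC-FC units, each counting the replies it is still owed. The class name, method names, and pool size are hypothetical; this is a host-side illustration under those assumptions, not the implementation used in the project.

```python
class SmcFcPool:
    """Fixed pool of semi-reliable multicast flow control (SMC-FC) units."""

    def __init__(self, num_units):
        self.free = list(range(num_units))   # indices of unreserved units
        self.pending = {}                    # unit index -> replies still owed

    def reserve(self, replies_required):
        """Reserve a unit; return its index, or None if all are outstanding."""
        if replies_required < 1:
            raise ValueError("a unit must wait for at least one reply")
        if not self.free:
            return None
        unit = self.free.pop()
        self.pending[unit] = replies_required
        return unit

    def on_reply(self, unit):
        """Record one response; return True if this released the unit."""
        self.pending[unit] -= 1
        if self.pending[unit] == 0:
            del self.pending[unit]
            self.free.append(unit)
            return True
        return False
```

A response carries the index of the unit its multicast was issued with, so the originator can call `on_reply` directly. Because the pool is fixed in size, the number of outstanding semi-reliable multicasts, and therefore the buffer space they need, stays bounded.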

Investigation of I2O as an Intelligent NI

Consideration of a New Platform

While the first segment of our project sought to extend the functionality of an available platform for low-latency communication over a network of workstations, the second segment explored another possible platform for these high-performance communications. Recently, many corporate giants began work on a new I/O standard that would accommodate “Smart I/O Devices”, devices which use onboard processors to offload work from the host processor. Our intention was to acquire a development card and software for evaluation of the I2O platform as a means of interconnecting workstations for high-performance computing, which we were finally able to do. Below is a description of the development environment and the supporting hardware.


Intel’s IQ960-RD66 Evaluation Card

We selected the evaluation card based on its acceptance by the I2O Special Interest Group (SIG) as a valid I2O development platform. This card consists of an Intel i960 RISC processor as the central processor, two SCSI device interfaces, two 10/100 Mbps Ethernet network interfaces, a PCI-to-PCI bridge, and all supporting hardware controllers. The basic layout of the board is shown below:

Some Additional Specifications:

• This particular version of the i960 runs at 66MHz.

• All communication, on card and host/card, is done through memory-mapped I/O transactions.

• The card comes equipped with 4MB of DRAM on the board and a SIMM slot that will accept 32MB of additional memory.

• The Ethernet adapters and SCSI connectors are located on a daughter board that connects to the main card via a PCI interface.

Defining I2O

The I2O (Intelligent Input/Output) specification defines a standard architecture for intelligent I/O that is independent of both the specific device being controlled and the host operating system (OS).

The I2O specification addresses two key problem areas in I/O processing:

• Performance hits caused by I/O interrupts to the CPU.

• The necessity to create, test and support unique drivers for every combination of I/O device and OS on the market.

[Figure: Block diagram of the IQ960-RD66 card. Labeled blocks: Host System, Primary PCI Interface, i960 Rx Processor, Local Bus, DRAM, Flash ROM, DMA Channels, Secondary PCI Interface, Ethernet Interfaces, SCSI Interfaces.]


Approaching from the performance concerned side of the industry, we are generally more concerned with the first of the two problems above. I2O defines a standard architecture for intelligent I/O, an approach to I/O in which low-level interrupts are offloaded from the CPU to I/O processors (IOPs).

We also found that the requirements for developing networking solutions using I2O standards were quite significant. Since I2O is being fueled by industry support, the cost of obtaining much of the technical information and tools is tremendous. We found that the following items were required for development:

• i960 Development Platform

• Tornado I2O Software Development Environment

• Membership to the I2O Special Interest Group (SIG)

These requirements constitute a hefty price for development according to the I2O standard. We were also concerned that the real-time operating system, IxWorks, may present too much overhead for the type of time-critical, “bare-bones” processing we would like to perform on network interfaces. Therefore, we also considered the less expensive alternative of using Intel’s i960 development tools with the i960 Development Platform card.

Intel’s i960 Development Environment

Intel’s i960 processor has provided the processing power for many embedded systems applications since the first chip in the family was released 10 years ago. Given the processor’s long-lived success, a wide selection of tools is now available for development on any i960 platform. Intel distributes and supports two different tool sets for development on any of its i960-based platforms. The tools include:

• CTOOLS (includes gcc960 C compiler)

• QuickVal

In addition to the well supported development environments, there are examples of source code available and their associated libraries and functions. Intel supplies, with the evaluation board, source code for a monitor program (mon960) and source code for several examples. This includes examples for ethernet tests and host to card communication libraries.


Results and Conclusions

Multicast over Myrinet

The latency timings for the multicast system are recorded in Tables 1 and 2. For comparison, Myrinet’s end-to-end latency is often rated at 4-20 µs for various LCPs.

Message Type              Message Run Time   Average Multicast Time   Average Message Time
Unicast                   1.914 s            38.279 µs                12.760 µs
Multicast Reliable        1.962 s            39.242 µs                13.081 µs
Multicast Semi-Reliable   2.085 s            41.705 µs                13.902 µs

Table 1: Latency Timings for 7 nodes/3 subscribers

Message Type              Message Run Time   Average Multicast Time   Average Message Time
Unicast                   5.072 s            101.434 µs               14.491 µs
Multicast Reliable        6.138 s            122.765 µs               17.538 µs
Multicast Semi-Reliable   2.685 s            53.706 µs                7.672 µs

Table 2: Latency Timings for 7 nodes/7 subscribers

Message Run Time (MRT): Total amount of time from the start of the first send to the receipt of the last reply.

Average Multicast Time: Expected average latency for each multicast (MRT/Iterations).

Average Message Time: Expected per-message latency (MRT/(Iterations*Num_MCNodes)).

In these measurements, the most striking detail is that the Multicast Semi-Reliable category achieves high performance with larger multicast trees (nearly halving all delays relative to unicast). It is important to note that the 7-node multicast tree requires only two replies for the entire tree, which has a drastic effect from the host’s perspective.
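The two derived columns follow directly from the Message Run Time definitions. The short calculation below reproduces one row as a check; the iteration count of 50,000 is not stated in the report and is inferred here from the ratio of run time to average multicast time, so treat it as an assumption. The function name is hypothetical.

```python
def average_times_us(run_time_s, iterations, num_mc_nodes):
    """Return (average multicast time, average message time) in microseconds."""
    multicast_us = run_time_s / iterations * 1e6   # MRT / Iterations
    message_us = multicast_us / num_mc_nodes       # MRT / (Iterations * Num_MCNodes)
    return multicast_us, message_us

# Table 2, Multicast Semi-Reliable row; 50,000 iterations is inferred, not stated.
mc_us, msg_us = average_times_us(2.685, 50_000, 7)
```

The results agree with the tabulated 53.706 µs and 7.672 µs to within the rounding of the reported run time.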

The Round-Trip timings are recorded in Tables 3 and 4. These measurements reflect the amount of time each message spent in flight and show the large amount of overlap in communication times.

Message Type              Average Multicast RTT   Average Message RTT
Unicast                   111.2 µs                67.9 µs
Multicast Reliable        332.1 µs                225.5 µs
Multicast Semi-Reliable   326.9 µs                326.9 µs

Table 3: Round-Trip Times for 7 nodes/3 subscribers

Message Type              Average Multicast RTT   Average Message RTT
Unicast                   185.4 µs                120.8 µs
Multicast Reliable        1,072.2 µs              640.4 µs
Multicast Semi-Reliable   458.3 µs                429.2 µs

Table 4: Round-Trip Times for 7 nodes/7 subscribers

Examining the RTT measurements reveals unusually high travel times for both forms of multicast messages, especially the reliable form. A possible explanation is that the multicast injections spin on flow control units until all are available. The unicast injection grabs each flow control unit as it needs it, and therefore does not wait at the host interface as long as the multicast messages. As expected, the reliable multicast suffers worse than the semi-reliable because the semi-reliable spins on fewer reservations.


It should be noted for the semi-reliable multicast tests that all messages were received at the host level, without any packet loss.

NI recycling provides a number of options for obtaining multicast support for point-to-point networks. While fully reliable protocols may provide the basic functionality and adequate performance for multicast messages, small optimizations in message passing, such as the semi-reliable protocol, can result in reduced network overheads. Care must be taken in selecting the right communication tool for the right situation, and such dynamic choices support the need for more intelligent networks and network interfaces.

Intelligent I/O

The i960 card’s degree of usefulness was the primary consideration in our work. We also hoped to implement some simple network functions for benchmarking. While it was quite easy to compile and run code without PCI bus interaction, it became apparent that PCI communication increased the complexity to a point where we couldn’t explore the available source code before the quarter’s deadline. Based on the information that we did acquire, we feel that the card is useful and is worthy of future research in the field of intelligent network interfaces. As part of the section on future work, we explore the distinct features of this card and ideas for exploiting these features.

Unfortunately, the very short time scale available for examining the i960 card limited what we could implement. Anticipating this, we accumulated a collection of notes and pointers that identify problems, concerns, and starting points for future developers on the card. We also began working through the mass of source code that exists for the mon960 monitor program, since this code contains the only examples of performing network functions. All of the notes have been compiled into a notebook that will be available to anyone working with the card, alongside several useful manuals that we printed for our own reference. We have also attached a copy of our notes as an appendix to this paper.


Evaluation (What and Why)

Multicast over Myrinet

This study of multicast over Myrinet leads to several interesting points about what can be expected from an intelligent NI. The first observation is that the LANai processor features enough computational power to make NI-based forwarding possible without huge performance hits for unicast traffic. These communications are additionally possible with reliability, and optimizations may reduce network congestion. Multicasting in itself, however, should not be the most important lesson extracted from this part of the project. A key detail of NI recycling is that the NI must be able to manage multiple writers for multiple queues. The multicasting LCP handles consistency between writers through the use of multiple queues and various scheduling algorithms. The fact that the LCP is able to manage these queues without performance hits differs from the typical view that the LCP can handle only limited functionality. Observing that NI performance always lags host CPU performance, it is encouraging to see that with a little planning, even the first generation of NIs can provide useful offloading to improve cluster communication times.

Intelligent I/O

The primary motivation for this part of the project was to evaluate the i960 card as a viable platform for intelligent network interfaces. We were drawn to the card by information about I2O, but our attention later shifted to the card itself since I2O proved quite expensive and inaccessible. The process began by gathering information about our options, which we found to be quite narrow with I2O.

Fortunately, the equipment arrived just in time for us to take a serious look at its feasibility for our work.



Relation To Class:

CS6432 has generally explored methods for exploiting the parallelism in certain computational tasks. Typically, this is done by dividing a particular application into segments that can run concurrently, allowing a number of processors to contribute to the completion of the task. Obviously, the processors allocated to a task must have a medium for communication. Our project explores the lower-level problem of exploiting parallelism in the underlying hardware structure upon which many conventional parallel applications run. Therefore, the topic of Intelligent Network Interfaces falls within the scope of the class in a couple of different ways:

• General Quest For Parallelism

• Underlying Structure Provides Communication for Typical Parallel Applications

General Quest For Parallelism

Exploiting parallelism is probably the biggest concern for hardware engineers who seek to increase performance within a system. In recent years, clock speeds and technology advancements have tended to overshadow the work done to improve system parallelism. However, I/O has remained slow in comparison with the ever-increasing clock rates of processors, and so remains a bottleneck. This is especially true in the case of network interfaces, where we must also consider the penalty the network suffers from NI/host communication. The logical solution is to reduce the amount of communication and transfer some of the processing load to the network interface. This applies especially to collective communication, where packets may not even be intended for the current host and need only be re-injected into the network. Both Myrinet and the proposed i960-based interface seek to exploit this type of hardware parallelism by offloading true network functions from the host processor, allowing those CPU cycles to be devoted elsewhere.

Communication Structure for Parallel Applications

The use of networks of workstations has proved viable as a platform for implementing parallel applications. The one weakness that such systems suffer is that their interconnection networks were designed to provide reliable communication even at great distances. In such a system, the majority of the delay is experienced when the network interface must interact with the host processor. For example, TCP/IP protocols spend most of their time processing host operating system kernel calls to set up communication. Since popular libraries, like MPI, use these protocols as their underlying communication structure, performance improvements can be far-reaching. We feel that providing network interfaces with a greater level of independence from the host processor is the key to making cluster machine performance approach that of traditional parallel machines.


Future Enhancement Through Distinct Features

As it stands, Myrinet has provided a platform for evaluating the usefulness of intelligent network interfaces. Therefore, we must consider how the distinct features of the i960 card can be used in new, useful research efforts. The primary features which distinguish the i960 card from Myricom’s card are:

• Dual Ethernet Adapters: The daughter board, which attaches to the main processor board via a secondary PCI bus, has two separate 10/100 Mbps Ethernet controllers and adapters.

• Intel i960 Processor: The I/O processor on the main board is an Intel i960R processor, from a family that has been on the market for over 10 years and has extensive compiler and debugger support.

• High Memory Capacity: In addition to the 4MB of DRAM on the card, there is a SIMM slot that will accept up to 32MB of additional DRAM.

• SCSI Device Connectors: The main card accepts up to 4 devices on the secondary PCI bus. The daughter card that we have includes two SCSI interfaces in addition to the two Ethernet adapters.

Considering these four distinctions, we identified a few ideas for future work to exploit these features.

1. Using the dual Ethernet adapters, it may be advantageous to consider implementing dual networks. With two networks, we could eliminate many of the problems that occur in today’s switch-based networks. For example, real-time or quality of service applications could benefit by allocating one of the networks to control signals and the other to data signals. This eliminates the problems associated with sending control messages on a saturated network. Another idea is to create two data networks, but to arrange the two in different topologies. For instance, a switch-based network would be useful for handling all point-to-point communication and a hub-based network could handle all collective communication. This exploits the distinct advantages of both types of networks, and avoids the worst-case communication for each.

2. The ability to access SCSI devices at the card level opens up some original scenarios. This type of access could provide an efficient means of network caching or perhaps an interesting approach to server-type applications.

3. The additional memory on the card certainly gives us a new variable to play with since Myrinet cards are not adjustable in this area. One idea to consider is the possibility of implementing things like Virtual Interface Architecture (VIA) which calls for multiple send/receive queues in memory as a means to set up virtual end-points.


References

[1] Kimberly K. Keeton, Thomas E. Anderson, and David A. Patterson. “LogP Quantified: The Case for Low-Overhead Local Area Networks,” Hot Interconnects III: A Symposium on High Performance Interconnects, Stanford University, Stanford, CA, August 10-12 1995.

[2] Shubhendu S. Mukherjee and Mark D. Hill. “Making Network Interfaces Less Peripheral,” IEEE Computer, 1998.

[3] Marcel-Catalin Rosu, Karsten Schwan, and Richard Fujimoto. “Supporting Parallel Applications on Clusters of Workstations: The Intelligent Network Interface Approach,” Proceedings of the 6th IEEE International Symposium on High Performance Distributed Computing (HPDC ’97), Portland, OR, August 1997.

[4] VIA Sig. “Virtual Interface Specification,” available at http://www.viarch.org, December 1997.

[5] Brent N. Chun, Alan M. Mainwaring, and David E. Culler. “Virtual Network Transport Protocols for Myrinet,” Proceedings of Hot Interconnects V: A Symposium on High Performance Interconnects, August 1997.

[6] Kees Verstoep, Koen Langendoen, Henri Bal. “Efficient Reliable Multicast on Myrinet,” ICPP, Vol. 3 1996, pp. 156-165.

[7] Scott Pakin, Mario Lauria, Andrew Chien. “High Performance Messaging on Workstations: Illinois Fast Messages (FM) for Myrinet,” Proceedings of Supercomputing ’95, 1995.

[8] Simon Walton and Mario Gerla. “Practical Multicasting on a Nonbroadcast Subnetwork,” ICNP ’97, Atlanta, GA, October 1997.

[9] Jonathan S. Turner. “An Optimal Nonblocking Multicast Virtual Circuit Switch,” Proceedings of Infocom, June 1994, pp. 298-305.


Appendix A: i960 Board Notes

Notes:

These are the notes that I compiled for the i960 evaluation board. If you have any questions you can contact me at [email protected]. I tried to rearrange some of the notes a little, but they are quite sporadic and don’t really fit into categories. I’ve also printed out some of the manuals that are on the Intel 960 CD. The manuals range from about 150 pages to 400 pages, so it is definitely worth not reprinting the ones that I have already printed.

-Kelly Norton

The Board

The board is an IQ80960RD66. The only difference we could find between this one and the RP board is the clock speed; the RD runs at 66MHz. The basic layout of the board is shown below. There are more detailed views in the little black book for the card:

[Figure: Block diagram of the board. Labeled blocks: Host System, Primary PCI Interface, i960 Rx Processor, Local Bus, DRAM, Flash ROM, DMA Channels, Secondary PCI Interface, Ethernet Interfaces, SCSI Interfaces.]

Talking To The Board:

There are two ways to talk to the board:

1. PCI

This is the preferred method, since it is fast and you don’t have to deal with terminal emulation and xmodem protocols. The one thing to look out for here is that you have to install the PCI virtual device driver for NT before this will work. I’m not going to go into detail here, because when you install QuickVal, the release notes that appear at the beginning of the installation will tell you how. Just be sure not to skip over them like I did the first time.

2. Serial

You might want to start here just to make sure that the card is responding and reaching a useful state. We used a freeware program called Tera Term Pro, but any terminal program that will allow you to talk to COM ports will work. The documentation suggests that you change the Flow Control to XON/XOFF, but this disabled xmodem in Tera Term Pro, and leaving it unchanged didn’t seem to matter. One important thing to remember: when you connect to the card, it will not respond until you have hit the enter key about 5-6 times. At that point, you should get a (mon960) prompt. Look at the little black book for basic commands to perform a download and run some code.

Either way, make sure you type quit at the end of the session. If you don’t, you may have to reboot the machine to get the card to respond again.

Development Tools:

You actually have three development environments if you are not attempting to use Tornado, which is for I2O development: CTOOLS, QuickVal, and the command line stuff.

CTOOLS seems to be the best for debugging, but I didn’t do much debugging. I preferred to use the command line tools: gcc960, gld960, and gdb960.

Here’s an example of compiling and running the code contained in blink.c:

c:/users/kelly>gcc960 -ARP -Fcoff -g -c blink.c

The compiler sets the architecture as i960RP and the format as COFF.

c:/users/kelly>gld960 -ARP -Fcoff -Tcyrx -o blink blink.o

This is the linker.

c:/users/kelly>gdb960 -t mon960 -pci blink

This is a command line debugging environment. Here we set the target as the monitor program mon960 to download blink via PCI.

For doing real stuff, you probably want to use CTOOLS. There is also a version 6.0 that is advertised on the Intel site. For the URL, check below under Intel’s i960 web site.

Using I2O

If you ever have any interest in developing stuff in I2O, you will have to first change the Flash ROM on the board. Remove the one that says “mon960” and replace it with the one that says “IXWorks”. The whole kit came with one of those little chip pullers, so don’t kill your fingernails or slice your hand off with your pocket knife.

Another thing to remember is that you can’t run any of the I2O stuff until you have the correct OS module for NT. It is supposed to be packaged in 5.0, but for the meantime the only way to get it is to join the I2O SIG.

The Tornado software is only an evaluation copy. The “real” copy costs about $12,000 right now, so if you have the money, you are doing better than us. This is the software needed for I2O development.

Source Code in c:\intel960\src

Some things to note about the source code in this directory.

We are using the 960RD, which is basically a 960RP running faster. When you deal with the source code for the mon960 you will notice that all of the files have an include statement for “this_hw.h”. This is basically a header file which identifies all of the addresses for devices on the card. You should copy cyr.h to this_hw.h before you compile. Typically, all files beginning with “cyrp” are intended for this board. The rest that have cyt, cyj, etc. are intended for other boards.

A few files of interest:

Cyr_eep.c contains eeprom functions.

Cyr_dma.c contains dma functions.

Cyr_uart.c contains serial port functions.

Most of these are quite obvious. Look for them in c:\intel960\src\mon960\common\

The real trick here is to figure out which files to link in. Most will compile with little or no problem. For some, you may have to take out the include statement that points to mon960.h. There are a lot of files here as well. This is where I spent the bulk of my time, trying to resolve dependencies between files. Perhaps someone a little more talented in writing C code for hardware applications could do this much quicker than I could.

HOST/CARD Communication.

We had a lot of trouble here.

There is an example in the QuickVal kit, but we could only run it; we couldn’t compile it. The code uses if statements to determine if you are running WinNT or Win95. The unfortunate thing is that the code hostcode.c includes the header file dos_pci.h, but we couldn’t find this file. I also tried in vain to remove all of the if (winNt) else (win95) statements. So perhaps if you can effectively remove all of the Win95 code, you may not need to include dos_pci.h.

There are also directories in c:\intel960\src entitled hdil and hdilcomm. These are the host/card libraries used by the debugging programs. You will probably want to use some of this code to do the communication.

Ethernet Adapters

To do work here, I would suggest copying all of the source code out of all of the subdirectories in c:\intel960\src into the directory that you are working in. There is code for loopback tests on the ethernet adapters in the mon960 code. See these files:

Cyr_test.c contains the main function to test the adapters.

Cyr_ethb.c is the beef of the loopback test.

I’ve attempted to resolve some of the linker dependencies below by finding out where some of the extern functions are located, so that these files can be linked in:

prtf(…) is in the file IO.c

bcopy(…) is in the file bcopy.c (it essentially does a memmove(…))

sgets(…) is in the cyt_io.c file. This doesn’t make sense, because cyt_* files should be for another type of board.

atod(…) is in the file convert.c

enable_interrupt(…), disable_interrupt(…), get_mask(…), set_mask(…) are in RP.s, which is in assembly, and I couldn’t get it to compile. I get errors saying that some of the opcodes aren’t supported. Never did figure this one out.
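As a rough illustration of the note above that bcopy essentially does a memmove, here is a minimal stand-in written in standard C. The name bcopy_sketch is hypothetical, and the argument order (source first, destination second) is an assumption based on the usual BSD bcopy convention; check bcopy.c in the mon960 source before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the monitor's bcopy.
 * Assumption: BSD-style argument order (src, dst, len), which is the
 * reverse of memmove's (dst, src, len).
 */
static void bcopy_sketch(const void *src, void *dst, size_t len)
{
    assert(src != NULL && dst != NULL);
    /* memmove, unlike memcpy, is defined for overlapping regions. */
    memmove(dst, src, len);
}
```

The overlap handling is the reason to map onto memmove rather than memcpy: copying a buffer onto a shifted position within itself still gives the expected result.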

Addresses:

For the Ethernet addresses, look at the little black book about the daughter card. It describes how to figure out the Ethernet addresses. It says that the last byte and a half or so should be the serial number for the card, but I couldn’t find a serial number anywhere on the card. You probably want to try to find it before you put the card in the machine. Since I couldn’t find the serial number, I just entered 0 and it stored that in the eeprom.
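To make the “last byte and a half” idea concrete, here is a small sketch in standard C that places a serial number in the low 12 bits of a 6-byte address. Everything in it is an assumption for illustration: the function name mac_from_serial is hypothetical, the exact width of 12 bits is a reading of “a byte and a half or so”, and the base address passed in is a placeholder. The real layout is whatever the daughter card manual specifies.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN 6

/*
 * Hypothetical helper: copy a base address and overwrite its low 12 bits
 * (low nibble of byte 4 plus all of byte 5) with the card serial number.
 * Assumption: the serial field is exactly 12 bits wide.
 */
static void mac_from_serial(const uint8_t base[MAC_LEN], uint16_t serial,
                            uint8_t out[MAC_LEN])
{
    assert(serial <= 0x0FFFu);            /* must fit in 12 bits */
    memcpy(out, base, MAC_LEN);
    out[4] = (uint8_t)((out[4] & 0xF0u) | (serial >> 8));
    out[5] = (uint8_t)(serial & 0xFFu);
}
```

Under these assumptions, entering a serial number of 0 simply leaves the low 12 bits of the address cleared.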

LEDS

I started here, and thought it was a pretty good place to get started. A few things: to change the LEDs, you just write to the LED register, at address E004 0000H. When you write, remember that a 0 turns the LED on and a 1 turns the LED off. For a good starting point, look at the examples in QuickVal. One of the examples has code to flash the LEDs.
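The active-low behavior is easy to get backwards, so here is a minimal sketch in C of one way to wrap the register write. The address comes from the notes above; the helper names led_value and led_write, and the 8-bit register width, are assumptions for illustration and are not taken from the QuickVal examples. The register pointer is passed in as a parameter so the logic can also be tried on a host machine with an ordinary variable standing in for the hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* LED register address from the notes (E004 0000H). Only valid on the board. */
#define LED_REG_ADDR 0xE0040000u

/*
 * Hypothetical helper: turn a "which LEDs should be lit" mask into the
 * value to write. The register is active-low: 0 = LED on, 1 = LED off.
 * Assumption: the register is 8 bits wide.
 */
static uint8_t led_value(uint8_t lit_mask)
{
    return (uint8_t)~lit_mask;
}

/* Write the active-low pattern to the given register location. */
static void led_write(volatile uint8_t *reg, uint8_t lit_mask)
{
    assert(reg != NULL);
    *reg = led_value(lit_mask);
}
```

On the board, the call would look something like led_write((volatile uint8_t *)LED_REG_ADDR, 0x05) to light LEDs 0 and 2, which writes 0xFA to the register. On a host, pass the address of a local uint8_t instead.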

Jumpers

If you look in the Hardware Section of the little black book about the card, it gives a description of the jumpers that are just beside the processor on the card. If I remember correctly, the description in the book has them backwards. If you change them, only consider the stuff that is printed directly on the board. This may be kind of hard to see when the board is installed, but arranging the jumpers according to the manual caused our machine to lock up. That wasn’t a good thing.

I hope some of these notes help. Again, if you have any questions let me know. There are a lot of things that are just not worth trying to explain on paper and a few things I probably forgot.