Example Network Processors - Tzi-cker Chiueh


2.4.4 Example Network Processors

Intel’s Internet Exchange Architecture (IXA) [6] includes an IXE component as the switching fabric, an IXF component for framing and formatting, an LXT component for physical-layer processing, and an Internet exchange processor (IXP) for packet processing. The IXP consists of a StrongARM core and six microengines, together with interfaces to the SRAM, the SDRAM, the PCI bus, and a proprietary bus, the IX bus. The StrongARM core performs supervisory processing such as maintaining the routing table. Each of the six microengines is a RISC core augmented with special instructions optimized for network processing, such as bit extraction, table lookup, and single-cycle shifting, and with support for hardware multithreading. Each microengine has four program counters, which allow four threads to time-share the microengine’s data path. There are two banks of single-ported general-purpose registers for ALU operations, and four single-ported transfer registers to read/write SRAM and SDRAM. The IX bus allows IXPs to interface with IXFs and IXEs, and supports 5 Gbps at 80 MHz.
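The benefit of letting several thread contexts time-share one data path can be illustrated with a toy model. The Python sketch below assumes (these figures are illustrative, not Intel's) that every thread alternates 4 cycles of ALU work with one memory reference that takes 20 cycles to complete, and that a thread issuing a reference yields the data path to the next ready thread at no switch cost. Under these assumptions utilization is roughly min(1, N × 4 / 24) for N threads, so four threads keep the data path busy about two-thirds of the time instead of one-sixth.

```python
COMPUTE_CYCLES = 4   # assumed ALU cycles between memory references
MEM_LATENCY = 20     # assumed cycles before a memory reference completes


def datapath_utilization(num_threads: int, total_cycles: int = 10_000) -> float:
    """Fraction of cycles in which one of the thread contexts uses the data path."""
    ready_at = [0] * num_threads                 # first cycle each thread may run again
    work_left = [COMPUTE_CYCLES] * num_threads   # ALU cycles before the next reference
    current = None                               # thread that currently owns the data path
    last = num_threads - 1                       # round-robin search starts after this one
    busy = 0
    for cycle in range(total_cycles):
        if current is None:
            # Round-robin scan for a thread whose memory reference has completed.
            for offset in range(1, num_threads + 1):
                t = (last + offset) % num_threads
                if ready_at[t] <= cycle:
                    current = t
                    break
        if current is None:
            continue  # every thread is waiting on memory, so the data path idles
        busy += 1
        work_left[current] -= 1
        if work_left[current] == 0:
            # Issue a memory reference and swap out; the thread becomes
            # runnable again MEM_LATENCY cycles after this one ends.
            ready_at[current] = cycle + 1 + MEM_LATENCY
            work_left[current] = COMPUTE_CYCLES
            last = current
            current = None
    return busy / total_cycles


if __name__ == "__main__":
    for n in (1, 2, 4, 6):
        print(f"{n} thread(s): {datapath_utilization(n):.3f}")
```

In this model, adding contexts helps only until their combined compute time covers the memory latency; past that point (six threads here) the data path is saturated.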

Agere’s PayloadPlus architecture [9] includes a fast pattern processor (FPP), a routing switch processor (RSP), an Agere system interface (ASI), and a functional programming language (FPL) for programming the FPP and RSP. The FPP sits between the physical interface and the RSP, and performs packet reassembly, protocol recognition and the associated computation, and checksum and CRC calculation. The FPP is based on a pipelined, multithreaded architecture. It allocates a thread and a context to each incoming packet, and operates on one 64-byte block at a time, processing each block in the context of the packet it belongs to. To program the FPP, system designers use FPL, a declarative language, to specify the set of protocols to recognize and the actions to take for each of them. FPP programs are represented as trees in which internal nodes correspond to pattern recognition functions and leaves correspond to actions. The RSP sits between the FPP and the switch fabric controller, and consists of three VLIW engines: the Traffic Management Compute engine, which enforces packet discarding policies and maintains queue statistics; the Traffic Shaper Compute engine, which ensures QoS and CoS for each connection queue; and the Stream Editor Compute engine, which performs the necessary packet modifications. The three engines process each packet together as a linear pipeline. The ASI interfaces with the host processor for configuration and program download, and also coordinates data movement between the FPP and the RSP.
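The source does not show FPL syntax, so the following Python sketch only mirrors the structure described above: a packet is cut into 64-byte blocks, and a tree whose internal nodes match header fields and whose leaves name actions is walked over the first block. The field offsets assume an untagged Ethernet frame carrying IPv4; the `Node`/`Leaf` layout and the action names (`forward_tcp`, `forward_udp`, and so on) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union

BLOCK_SIZE = 64  # the FPP works on one 64-byte block at a time


def split_into_blocks(packet: bytes) -> list[bytes]:
    """Cut a packet into consecutive 64-byte blocks (the last may be shorter)."""
    return [packet[i:i + BLOCK_SIZE] for i in range(0, len(packet), BLOCK_SIZE)]


@dataclass
class Leaf:
    action: str  # hypothetical action name


@dataclass
class Node:
    offset: int                  # byte offset of the field to match
    length: int                  # field width in bytes
    children: dict[bytes, Tree]  # field value -> subtree
    default: Tree                # subtree taken when no child matches


Tree = Union[Node, Leaf]


def classify(tree: Tree, block: bytes) -> str:
    """Walk the recognition tree over one block and return the leaf's action."""
    while isinstance(tree, Node):
        value = block[tree.offset:tree.offset + tree.length]
        tree = tree.children.get(value, tree.default)
    return tree.action


# The EtherType occupies bytes 12-13; the IPv4 protocol field is byte 9 of
# the IP header, i.e. byte 23 of an untagged Ethernet frame.
ipv4 = Node(offset=23, length=1,
            children={bytes([6]): Leaf("forward_tcp"), bytes([17]): Leaf("forward_udp")},
            default=Leaf("forward_ipv4_other"))
root = Node(offset=12, length=2,
            children={b"\x08\x00": ipv4, b"\x86\xdd": Leaf("forward_ipv6")},
            default=Leaf("drop"))

if __name__ == "__main__":
    frame = bytes(12) + b"\x08\x00" + bytes(9) + bytes([17]) + bytes(100)
    blocks = split_into_blocks(frame)
    print(len(blocks), classify(root, blocks[0]))
```

Because all headers needed here fit in the first block, the sketch classifies only that block; a protocol whose fields straddle blocks would need state carried in the packet's context, which is what the FPP's per-packet context provides.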

C-Port’s digital communications processor (DCP) [10] includes 16 channel processors (CPs), five specialized processors, and a 160 Gbps internal bus. Each CP connects to the physical link interface and consists of a RISC core and two serial data processors (SDPs). The SDPs perform low-level bit-manipulation tasks, whereas the RISC core performs higher-level tasks such as packet scheduling and traffic statistics collection. The five specialized processors handle classification table access, packet buffering, routing table lookup, interfacing with the switch fabric, and supervisory processing. C-Port provides a special communications programming interface called C-Ware to simplify the task of programming the DCP.
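Routing table lookup, which the DCP offloads to a specialized processor, amounts to longest-prefix matching on the destination address. The source does not describe the algorithm C-Port's lookup processor uses; as a plain illustration, the Python sketch below uses a one-bit-per-level binary trie, which is simple but may touch up to 32 nodes per IPv4 lookup. References [7] and [13] discuss faster approaches based on CPU caching and content-addressable memory. The class name `PrefixTrie` and the next-hop labels are hypothetical.

```python
import ipaddress
from typing import Optional


class PrefixTrie:
    """Binary trie for IPv4 longest-prefix match, one address bit per level."""

    def __init__(self) -> None:
        self.root: dict = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop  # the prefix ends at this node

    def lookup(self, address: str) -> Optional[str]:
        addr = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.get("hop")  # a /0 default route, if one was inserted
        for i in range(32):
            bit = (addr >> (31 - i)) & 1
            if bit not in node:
                break
            node = node[bit]
            best = node.get("hop", best)  # remember the longest match so far
        return best


if __name__ == "__main__":
    table = PrefixTrie()
    table.insert("0.0.0.0/0", "default")
    table.insert("10.0.0.0/8", "port1")
    table.insert("10.1.0.0/16", "port2")
    for dst in ("10.1.2.3", "10.200.0.1", "192.0.2.1"):
        print(dst, "->", table.lookup(dst))
```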

2.4.5 Conclusion

In this chapter, we present the set of tasks that a modern network processor needs to perform, describe a set of architectural features specifically designed for network packet processing, and survey several commercial network processor architectures as examples. Most existing network processors include special instructions to speed up packet processing and use a parallel multithreaded architecture to exploit multiple levels of parallelism. However, these architectures cannot scale to the OC-768 link rate and beyond, so further research into network processor architecture is warranted. We believe the following research directions are worth exploring:

• Scalable packet classification mechanisms that support variable-length, application-level classification patterns

• Integrated packet scheduling for both the switch fabric and the output links, to achieve per-connection QoS in an input-queued network device architecture

• Novel memory management schemes that exploit the abundant internal bandwidth of intelligent RAM architectures [12] to cost-effectively satisfy the memory bandwidth requirements of terabit links

• Architectural support for active networking and other high-level network functionalities
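In an input-queued architecture, a scheduler must compute a conflict-free matching between inputs and outputs in every cell time, and per-connection QoS has to be layered on top of that matching. As a reference point, the Python sketch below shows one iteration of iSLIP [2], a round-robin matching algorithm for switches with one virtual output queue per input/output pair. It is simplified: full iSLIP repeats the grant and accept steps over still-unmatched ports (updating pointers only in the first iteration), and the request step is reduced here to a boolean matrix, which is a hypothetical representation.

```python
def islip_iteration(requests, grant_ptr, accept_ptr):
    """One iSLIP iteration on an N x N switch.

    requests[i][j] is True if input i has a cell queued for output j.
    grant_ptr[j] and accept_ptr[i] are round-robin pointers, updated in place.
    Returns a dict mapping each matched input to its output.
    """
    n = len(requests)

    # Grant step: each output grants the requesting input that appears
    # next in round-robin order, starting at its grant pointer.
    grants = {}  # input -> outputs that granted it
    for j in range(n):
        for k in range(n):
            i = (grant_ptr[j] + k) % n
            if requests[i][j]:
                grants.setdefault(i, []).append(j)
                break

    # Accept step: each input accepts the granting output that appears
    # next in round-robin order, starting at its accept pointer.
    match = {}
    for i, outputs in grants.items():
        for k in range(n):
            j = (accept_ptr[i] + k) % n
            if j in outputs:
                match[i] = j
                # Pointers advance one past the partner, and only on acceptance;
                # this is what desynchronizes the outputs over time.
                accept_ptr[i] = (j + 1) % n
                grant_ptr[j] = (i + 1) % n
                break
    return match
```

With inputs 0, 1, and 2 requesting outputs {0, 1}, {0}, and {1, 2} and all pointers at zero, a first call matches only inputs 0 and 2, but the updated pointers let a second call on the same requests match all three inputs.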

References

1. Nick McKeown and Thomas E. Anderson. “A quantitative comparison of scheduling algorithms for input-queued switches.” Computer Networks and ISDN Systems, vol. 30, no. 24, pp. 2309–2326, December 1998.

2. Nick McKeown. “iSLIP: A scheduling algorithm for input-queued switches.” IEEE/ACM Transactions on Networking, vol. 7, no. 2, April 1999.

3. S. Keshav. “On the efficient implementation of fair queueing.” Journal of Internetworking: Research and Experience, vol. 2, no. 3, September 1991.

4. David L. Tennenhouse and David J. Wetherall. “Towards an active network architecture.” Computer Communication Review, vol. 26, no. 2, pp. 5–18, April 1996.

5. Patrick Crowley, Marc E. Fiuczynski, Jean-Loup Baer, and Brian N. Bershad. “Characterizing processor architectures for programmable network interfaces.” In Proceedings of the 2000 International Conference on Supercomputing, Santa Fe, NM, May 2000.

6. Intel Internet Exchange Architecture. http://developer.intel.com/design/ixa/whitepapers/ixa.htm.

7. Tzi-cker Chiueh and Prashant Pradhan. “High-performance IP routing table lookup using CPU caching.” In Proceedings of IEEE INFOCOM 1999, New York City, April 1999.

8. Tzi-cker Chiueh and Prashant Pradhan. “Cache memory design for internet processors.” In Proceedings of the Sixth Symposium on High-Performance Computer Architecture (HPCA-6), Toulouse, France, January 2000.

9. Agere Systems. The PayloadPlus Architecture. http://www.lucent.com/micro/netcom/docs/fppproductbrief.pdf.

10. David Husak and Robert Gohn. “Network processor programming models: the key to achieving faster time-to-market and extending product life.” http://www.cportcorp.com/products/pdf/net_proc_prog_models.pdf.

11. XStream Logic Corporation. “XStream Logic packet processor core.” http://www.xstreamlogic.com/architectural_files/v3_document.htm.

12. David Patterson et al. “A case for intelligent RAM: IRAM.” IEEE Micro, April 1997.

13. Anthony J. McAuley, Paul F. Tsuchiya, and Daniel V. Wilson. “Fast multilevel hierarchical routing table using content-addressable memory.” U.S. Patent serial number 034444. Assignee Bell Communications Research, Inc., Livingston, NJ, January 1995.

2.5 Stream Processors and Their Applications for the Wireless Domain

In document Digital Systems and Applications 2e pdf (Page 174-176)