AM-PM: Alternate Marking for Network Telemetry
Tal Mizrahi, Gidi Navon, Carmi Arad, Giuseppe Fioccola,
Background
What is network telemetry?
Performance measurement
Delay
Packet loss
Queue status
…
Why do we need telemetry?
Detection
Failures
‘Elephant’ flows
Congestion /
Network Telemetry
Operations, Administration, Maintenance (OAM)
Network measurement / monitoring:
Control Message
This Talk is Based on…
RFC 8321
Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi, “Alternate Marking method for passive and hybrid performance monitoring”, RFC 8321, 2018.
draft-mizrahi-ippm-multiplexed-alternate-marking (internet draft)
T. Mizrahi, C. Arad, G. Fioccola, M. Cociglio, M. Chen, L. Zheng, and G. Mirsky. “Compact Alternate Marking Methods for Passive Performance Monitoring”, draft-mizrahi-ippm-compact-alternate-marking, work in progress, IETF, 2018.
+ 2 papers
… in review
Old-School Passive Monitoring
Counters
Per port
Queue State
Latency
Per flow
Per queue
Carrier Network OAM
OAM Protocols
IETF ICMPv4
IEEE 802.1ag ITU-T Y.1731
IETF ICMPv6 IETF IPPM
IP OAM Higher Layers Layer 3 Layer 2 Layer 1 ITU-T Y.1711 MPLS OAM IEEE 802.3ah
MPLS / PWE3 OAM Ethernet OAM ITU-T G.8113.1 MPLS-TP OAM IETF MPLS-TP OAM IETF LSP-Ping MPLS OAM IETF PWE3 VCCV IETF BFD
Active measurement / monitoring:
Control Message
Fate Sharing
Piggybacked Measurement
Measurement info is piggybacked onto data packets
Piggybacked Metadata – IOAM / INT
IOAM / INT Domain
Analytics Server
Switches push local
metadata into header:
delay, queue state, …
Telemetry
Info
IOAM
In situ OAM
INT
In-band Network Telemetry
AM-PM: What Can We Do with ONE Bit Per Packet?
Measurement
Marking Bit
000 11111
00000 111
Marking Bit
00000001000000000
AM-PM: Pulse Marking – Delay Measurement
Analytics Server
Time Sent: March 8th, 16:02, 123400789 nsec (UTC)
Time Received: March 8th, 16:02, 123500789 nsec (UTC)
Network Delay: 100 μsec
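The arithmetic on this slide can be sketched as follows. This is a minimal illustration, assuming both measurement points have synchronized clocks and report a nanosecond timestamp for the same pulse-marked packet; the function name is hypothetical.

```python
def one_way_delay_ns(sent_ns: int, received_ns: int) -> int:
    """Delay of one pulse-marked packet, from two synchronized timestamps."""
    if received_ns < sent_ns:
        raise ValueError("receive time precedes send time; check clock sync")
    return received_ns - sent_ns


# Nanosecond parts of the two timestamps shown on the slide.
delay_ns = one_way_delay_ns(123_400_789, 123_500_789)
delay_us = delay_ns / 1_000  # convert nanoseconds to microseconds
print(f"Network delay: {delay_us:.0f} usec")
```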
Servers Servers
Checks when packet sent
Checks when
AM-PM: Pulse Marking – Loss Measurement
Counter: 2100
Counter: 2000
Packets lost: 100
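A minimal sketch of pulse-based loss computation, assuming each measurement point records its cumulative packet counter whenever it sees a pulse-marked packet. The function name and the initial snapshot of 0 are illustrative assumptions; only the values 2100 and 2000 come from the slide.

```python
def loss_per_interval(sender_snapshots, receiver_snapshots):
    """Packets lost between consecutive pulses, from cumulative counter snapshots."""
    if len(sender_snapshots) != len(receiver_snapshots):
        raise ValueError("both measurement points must report the same pulses")
    losses = []
    for i in range(1, len(sender_snapshots)):
        sent = sender_snapshots[i] - sender_snapshots[i - 1]
        received = receiver_snapshots[i] - receiver_snapshots[i - 1]
        losses.append(sent - received)
    return losses


# Snapshots taken at two pulses; the first snapshot (0) is assumed.
losses = loss_per_interval([0, 2100], [0, 2000])
print(losses)
```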
Servers Servers
Records counter value
Packets Sent: 10,000
Packets Lost: 500
Packets Received: 9,500
Servers Servers
AM-PM: Alternate Marking – Loss Measurement
Counts number of packets received
Counts number of packets sent
Consistent counting:
• Export the counter of each color when it is not in use.
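The consistent-counting rule can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the slides: packets are not reordered, no entire block is lost, and a run of equal color bits stands in for the per-color counter that is exported once that color is no longer in use.

```python
from itertools import groupby


def block_counts(colors):
    """Packet count of each block, where a block is a run of equal color bits."""
    return [len(list(run)) for _, run in groupby(colors)]


# Sender alternates the color every 5 packets (illustrative block size).
sent = [0] * 5 + [1] * 5 + [0] * 5
dropped = {3, 7, 8}  # indices of packets lost in the network (made up)
received = [color for i, color in enumerate(sent) if i not in dropped]

# A block's count is stable once the color has flipped, so it is safe to compare.
loss_per_block = [s - r for s, r in zip(block_counts(sent), block_counts(received))]
print(loss_per_block)
```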
...
Servers Servers
AM-PM: Double Marking
Pulse bit: Delay
Step bit: Loss
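A minimal sketch of how a sender might assign the two bits, assuming the step bit flips every `block_size` packets and the pulse bit is set on one packet per block. The function name and both parameters are hypothetical; real implementations may switch colors on a timer rather than on a packet count.

```python
def double_mark(index: int, block_size: int, pulse_offset: int):
    """Return (step_bit, pulse_bit) for the packet at position `index`."""
    step = (index // block_size) % 2  # alternates per block: used for loss
    pulse = 1 if index % block_size == pulse_offset else 0  # one per block: delay
    return step, pulse


marks = [double_mark(i, block_size=4, pulse_offset=2) for i in range(8)]
print(marks)
```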
AM-PM: Multiplexed Marking
Servers Servers
ONE bit per packet
Accurate loss and delay measurement!
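One way to picture carrying both signals in a single bit is to send the XOR of a step value and a pulse value, so that the pulse shows up as one inverted bit inside a block. The sketch below assumes that encoding, assumes the pulse never sits at a block edge, and ignores loss and reordering near the pulse; all names and parameters are hypothetical.

```python
def mux_mark(index: int, block_size: int, pulse_offset: int) -> int:
    """Single marking bit: step value XOR pulse value."""
    step = (index // block_size) % 2
    pulse = 1 if index % block_size == pulse_offset else 0
    return step ^ pulse


def find_pulses(bits):
    """Positions whose bit differs from two equal neighbors (the inverted bit)."""
    return [
        i for i in range(1, len(bits) - 1)
        if bits[i - 1] == bits[i + 1] != bits[i]
    ]


bits = [mux_mark(i, block_size=8, pulse_offset=4) for i in range(16)]
print(bits)
print(find_pulses(bits))
```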
AM-PM: Hashed Marking
Servers Servers
H(Pkt) = Hash of the packet header
If H(Pkt) = 0 → Pulse
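A minimal sketch of hash-based selection, assuming every measurement point applies the same hash to the same invariant header bytes, so each independently picks the same packets as pulses. The choice of SHA-256, the `bits` parameter, and the sample headers are illustrative assumptions, not taken from the slides.

```python
import hashlib


def header_hash(header: bytes, bits: int) -> int:
    """Hash of the packet header, reduced to the range [0, 2**bits)."""
    digest = hashlib.sha256(header).digest()
    return int.from_bytes(digest[:4], "big") % (1 << bits)


def is_pulse(header: bytes, bits: int = 8) -> bool:
    """Treat the packet as a pulse when its reduced hash equals zero."""
    return header_hash(header, bits) == 0


# Made-up headers; with bits=8 roughly 1 in 256 packets is selected on average.
headers = [f"10.0.0.1>10.0.0.2:id={i}".encode() for i in range(10_000)]
selected = [h for h in headers if is_pulse(h)]
print(len(selected), "of", len(headers), "packets selected as pulses")
```

A larger `bits` value selects fewer packets, which trades measurement frequency for lower processing load.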
Measurement within a specific domain
How do we get a spare bit in the header?
Measurement
Unused fields in the header:
- Spare bit in the IP DSCP
- Unused IP flag
Dedicated bit(s) in the header.
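As an illustration of the DSCP option, the sketch below sets and reads one bit in the 8-bit IPv4 TOS / IPv6 Traffic Class byte, whose upper six bits are the DSCP and lower two bits are ECN. Which DSCP bit is actually spare depends on the deployment; the mask chosen here is a hypothetical example.

```python
MARK_MASK = 0x04  # hypothetical choice: least-significant DSCP bit of the TOS byte


def set_mark(tos: int, mark: bool) -> int:
    """Return the TOS byte with the marking bit set or cleared; other bits unchanged."""
    if mark:
        return tos | MARK_MASK
    return tos & ~MARK_MASK & 0xFF


def get_mark(tos: int) -> bool:
    """Read the marking bit from the TOS byte."""
    return (tos & MARK_MASK) != 0


tos = 0xB8  # example TOS byte with the marking bit initially clear
marked = set_mark(tos, True)
print(hex(marked), get_mark(marked))
```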
Ongoing AM-PM work in the IETF:
QUIC
MPLS
NSH
BIER
Geneve
Large Scale Deployment in Telecom Italia
• Mobile backhaul network ~ 1000 eNodeBs.
• AM-PM one bit (step-based) loss measurement.
• Uses unused bit in DSCP.
AM-PM Evaluation using Marvell Prestera Switches
loss and delay
congestion is detected
Traffic Generator
Management Monitored data flow Background traffic
Switch 1
Software Implementation using P4
S1  S2  S3  H1  H2
Server
• Technion undergraduate project.
• Implemented in P4.
• Tested in Mininet.
[Figure: TCAM switch lookup key with Time.Sec and Time.Frac (with wildcard bits) followed by field2, field3, field4, … from the header / metadata; timeline showing a periodic range over 1 second and actionTime]
Implementing Multiplexed Marking:
Time-multiplexed Parsing
• TimeFlip is used to divide time into time slots.
•
Time
0
1
000 001 010 011 100 101 110 111
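Dividing time into slots can be pictured as reading selected bits of a timestamp, much like a TCAM entry that matches some timestamp bits and wildcards the rest. The sketch below is an interpretation of the 3-bit values shown on the slide, assuming the most significant bit selects one of two slots; the function and parameter names are hypothetical.

```python
def time_slot(timestamp: int, low_bits: int, slot_bits: int = 1) -> int:
    """Slot index read from the timestamp bits just above the `low_bits` lowest bits."""
    return (timestamp >> low_bits) & ((1 << slot_bits) - 1)


# 3-bit time values 000..111: the top bit splits them into slot 0 and slot 1.
slots = [time_slot(t, low_bits=2) for t in range(8)]
print(slots)
```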
IOAM / INT Domain
Analytics Server
Combining IOAM/INT with AM-PM
AM-PM Pulse as a trigger for exporting telemetry info.
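A minimal sketch of the trigger idea, assuming each hop exports a local telemetry record only for pulse-marked packets instead of pushing metadata into every packet. The packet fields, record layout, and function name are illustrative assumptions.

```python
def export_on_pulse(packets, hop_id):
    """Return the records a hop would export: one per pulse-marked packet."""
    records = []
    for pkt in packets:
        if pkt["pulse"]:
            records.append(
                {"hop": hop_id, "id": pkt["id"], "queue_depth": pkt["queue_depth"]}
            )
    return records


# Made-up traffic: only the second packet carries the pulse mark.
traffic = [
    {"id": 0, "pulse": False, "queue_depth": 3},
    {"id": 1, "pulse": True, "queue_depth": 7},
    {"id": 2, "pulse": False, "queue_depth": 5},
]
records = export_on_pulse(traffic, hop_id="S1")
print(records)
```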
Per-hop Telemetry
Challenge: export telemetry info without the expensive overhead of IOAM/INT.
Network Telemetry: An Evolution
Ping
Traceroute
Passive Monitoring
Carrier OAM
References
[1] Fioccola, G., Capello, A., Cociglio, M., Castaldelli, L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi, “Alternate Marking method for passive and hybrid performance monitoring”, RFC 8321, 2018.
[2] Mizrahi, T., Arad, C., Fioccola, G., Cociglio, M., Chen, M., Zheng, L., and G. Mirsky, “Compact Alternate Marking Methods for Passive Performance Monitoring”, draft-mizrahi-ippm-compact-alternate-marking, work in progress, IETF, 2018.
[3] Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, P., Chang, R. and D. Bernier, “Data Fields for In-situ OAM”, draft-ietf-ippm-ioam-data-00, work in progress, 2017.
[4] C. Kim et al., “In-band network telemetry (INT)”, P4 consortium, 2015.
[5] Mizrahi, T., Vovnoboy, V., Nisim, M., G. Navon, and A. Soffer, “Network Telemetry Solutions for Data Center and Enterprise Networks”, Marvell white paper, 2018.
[6] Mizrahi, T., Rottenstreich, O. and Y. Moses, “TimeFlip: Scheduling Network Updates with Timestamp-based TCAM Ranges”, IEEE INFOCOM, 2015.
[7] Mizrahi, T., Navon, G., Fioccola, G., Cociglio, M., Chen, M., and G. Mirsky, “AM-PM: Efficient Network Telemetry using Alternate Marking”, submitted, 2018.