Detecting unexpected traffic flows (routing violations)

Unexpected traffic flows are flows that conflict with the operator's routing policy. The typical example is traffic between two non-customer neighbors of a network (e.g. transit provider to a settlement-free peer), or any other flow that causes a revenue leak. Unexpected traffic flows can occur for a number of reasons, including the pointing of default routes, erroneous routing configurations, and the advertisement of more specific routes in combination with BGP scoping services. A detailed description of unexpected traffic triggered by the latter can be found here.

Detecting unexpected traffic flows should be an integral part of the day-to-day monitoring of an Internet service provider backbone that transits public Internet IPv4/IPv6 traffic.

pmacct is able to create traffic matrices of custom spatial and temporal resolution by integrating telemetry data (IPFIX, NetFlow, sFlow) with BGP prefix information. As a result, it can efficiently support such monitoring activity. Assuming traffic is captured inbound, as it enters the observed routing domain, the following two elements can be used for the purpose:

  1. local preference value
  2. input interface, physical or logical

Say a purposefully built traffic matrix encompasses the following aggregation method:

pre_tag_map: /path/to/inbound.map
aggregate: tag, local_pref

Say the Local Preference values used in the observed routing domain are essentially three:

  1. LP=80 for upstream transits
  2. LP=100 for peers, private and public
  3. LP=120 for customers

Say a pre_tag_map is filled as follows:

! Transit port
id=80       ip=<x.x.x.x>      in=<I>
! Peer port
id=100      ip=<y.y.y.y>      in=<J>
! Customer port
id=120      ip=<z.z.z.z>      in=<K>
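
Keeping the tag of each port equal to the Local Preference of its role is easy to get wrong when the map is edited by hand. As a minimal sketch, the entries above could be rendered from a small inventory. The inventory, the addresses (documentation range 192.0.2.0/24), the interface indexes and the function name are all hypothetical and only stand in for the <x.x.x.x>/<I> placeholders.

```python
# Tag per port role, mirroring the Local Preference values listed earlier.
ROLE_TAG = {"transit": 80, "peer": 100, "customer": 120}

# Hypothetical inventory: (role, exporter IP, input interface index).
PORTS = [
    ("transit", "192.0.2.1", 10),
    ("peer", "192.0.2.2", 20),
    ("customer", "192.0.2.3", 30),
]


def render_pre_tag_map(ports):
    """Return pre_tag_map text: a '!' comment line plus one entry per port."""
    lines = []
    for role, exporter_ip, ifindex in ports:
        lines.append(f"! {role.capitalize()} port")
        id_field = f"id={ROLE_TAG[role]}"
        ip_field = f"ip={exporter_ip}"
        lines.append(f"{id_field:<12}{ip_field:<18}in={ifindex}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_pre_tag_map(PORTS))
```

The rendered text has the same shape as the map above, with the placeholders replaced by the inventory values.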

The resulting traffic matrix will be along these lines:

mysql> SELECT agent_id AS tag, local_pref FROM <table>;
+-----+------------+
| tag | local_pref |
+-----+------------+
| 100 |        120 |
| 100 |        120 |
| 100 |        120 |
| 120 |         80 |
| 100 |        100 | <-- violation
| 100 |        120 |
| 100 |        120 |
| 100 |        120 |
| 100 |        120 |
| 120 |        100 |
+-----+------------+
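
Each row can be read as "where the traffic came in" (tag) versus "what kind of neighbor it is routed to" (local_pref). The sketch below spells that reading out for the three classes used in this example. It assumes that local_pref reflects the route selected towards the destination of the flow; the function and dictionary names are illustrative only.

```python
# Both the tag and the Local Preference use the same three values here.
ROLE_BY_VALUE = {80: "transit", 100: "peer", 120: "customer"}


def describe(tag, local_pref):
    """Translate a (tag, local_pref) row into 'ingress role -> egress role'."""
    ingress = ROLE_BY_VALUE.get(tag, "unknown")
    egress = ROLE_BY_VALUE.get(local_pref, "unknown")
    return f"{ingress} -> {egress}"


if __name__ == "__main__":
    # The four distinct rows of the matrix above.
    for row in [(100, 120), (120, 80), (100, 100), (120, 100)]:
        print(row, describe(*row))
```

Read this way, the flagged row (100, 100) is traffic entering on a peer port and leaving towards a peer, while every other row has a customer on at least one side.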

This is just a typical scenario in which the Local Preference is set as LPtransit < LPpeer < LPcustomer. As just described, the ingress interfaces can be tagged in exactly the same fashion, INtransit < INpeer < INcustomer. To detect a violation, a quick check can be run against the traffic matrix to see whether any flow is transiting the observed routing domain between two peers, two upstream providers, or a combination of these:

IF (IN < INcustomer AND LP < LPcustomer) THEN
{
  // Violation occurred
}
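
The check above could be sketched in Python as follows, applied to the (tag, local_pref) rows of the example matrix. The thresholds are the customer values of this example (120 for both); the function and constant names are illustrative and not part of pmacct.

```python
IN_CUSTOMER = 120  # tag assigned to customer ports in the pre_tag_map
LP_CUSTOMER = 120  # Local Preference assigned to customer routes


def is_violation(tag, local_pref,
                 in_customer=IN_CUSTOMER, lp_customer=LP_CUSTOMER):
    """True when neither the ingress port nor the selected route is a customer."""
    return tag < in_customer and local_pref < lp_customer


# (tag, local_pref) rows copied from the example traffic matrix.
ROWS = [
    (100, 120), (100, 120), (100, 120), (120, 80), (100, 100),
    (100, 120), (100, 120), (100, 120), (100, 120), (120, 100),
]

violations = [row for row in ROWS if is_violation(*row)]

if __name__ == "__main__":
    print(violations)
```

Only the (100, 100) row is flagged. Rows whose tag or local_pref is unset (for instance when no map entry or BGP route matched) would also satisfy the comparison, so they may need to be filtered out or investigated separately.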

Further enriching the traffic matrix, i.e. specifying the aggregation method below, shows not only that a violation has occurred but also which tributary traffic aggregates are responsible for it.

pre_tag_map: /path/to/inbound.map
aggregate: tag, src_as, peer_src_as, peer_src_ip, local_pref
bgp_peer_src_as_type: bgp

+-----+--------+-------------+----------------+------------+
| tag | as_src | peer_as_src | peer_ip_src    | local_pref |
+-----+--------+-------------+----------------+------------+
| 100 |     J0 |          K0 | x.x.x.x        |        120 |
| 100 |     J1 |          K1 | x.x.x.x        |        120 |
| 100 |     J2 |          K2 | x.x.x.x        |        120 |
| 120 |     J3 |          K3 | x.x.x.x        |         80 |
| 100 |     J4 |          K4 | x.x.x.x        |        100 | <-- violation
| 100 |     J5 |          K5 | x.x.x.x        |        120 |
| 100 |     J6 |          K6 | x.x.x.x        |        120 |
| 100 |     J7 |          K7 | x.x.x.x        |        120 |
| 100 |     J8 |          K8 | x.x.x.x        |        120 |
| 120 |     J9 |          K9 | x.x.x.x        |        100 |
+-----+--------+-------------+----------------+------------+
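
With the enriched matrix, the violating rows can be grouped by their source to see which aggregates contribute most. The following is a minimal sketch, assuming the rows have been fetched as dictionaries keyed by the column names above plus the bytes counter; the AS numbers, addresses and byte counts are made-up documentation values, and the function name is hypothetical.

```python
from collections import defaultdict


def top_violators(rows, in_customer=120, lp_customer=120):
    """Sum bytes of violating rows per (as_src, peer_as_src, peer_ip_src)."""
    totals = defaultdict(int)
    for row in rows:
        if row["tag"] < in_customer and row["local_pref"] < lp_customer:
            key = (row["as_src"], row["peer_as_src"], row["peer_ip_src"])
            totals[key] += row["bytes"]
    # Largest contributors first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample rows (documentation ASNs and addresses).
SAMPLE = [
    {"tag": 100, "as_src": 64496, "peer_as_src": 64500,
     "peer_ip_src": "192.0.2.2", "local_pref": 120, "bytes": 9000},
    {"tag": 100, "as_src": 64497, "peer_as_src": 64501,
     "peer_ip_src": "192.0.2.2", "local_pref": 100, "bytes": 1500},
    {"tag": 100, "as_src": 64497, "peer_as_src": 64501,
     "peer_ip_src": "192.0.2.2", "local_pref": 100, "bytes": 500},
    {"tag": 80, "as_src": 64498, "peer_as_src": 64502,
     "peer_ip_src": "192.0.2.1", "local_pref": 100, "bytes": 4000},
    {"tag": 120, "as_src": 64499, "peer_as_src": 64503,
     "peer_ip_src": "192.0.2.3", "local_pref": 80, "bytes": 7000},
]

if __name__ == "__main__":
    for key, total in top_violators(SAMPLE):
        print(key, total)
```

Sorting by bytes is only one choice; packets or flow counts could be summed the same way if volume is not the main concern.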

pmacct Wiki: DetectingRoutingViolations (last edited 2015-10-24 16:36:30 by paolo)