Fundamentals of Border Gateway Protocol (BGP) - Part 2
Mar 12, 2019Part 1 of our blog series on Border Gateway Protocol (BGP) gave you an overview of BGP and then delved into BGP message types and neighbor states. Now, in this post, you'll learn about one of the most challenging aspects of BGP, how it makes its path selection decision. While routing protocols such as RIP, OSPF, and EIGRP each have their own metrics used to pick the "best" path to a destination network, BGP uses a collection of path attributes (PAs).
BGP Path Attributes
When your BGP speaker receives a BGP prefix, there are going to be many path attributes tagged to it, and we know that these are going to be critical when it comes to BGP doing things like choosing a very best path to a destination. Interestingly, not all of these path attributes are created equal.
All BGP path attributes fall into one of four main categories. Note that this list also provides example attributes in each category. Do not be too concerned with these specific attribute values now, as you will understand many of them fully when you complete this blog series.
- Well-Known Mandatory (for example: Origin, AS Path, and Next Hop)
- Well-Known Discretionary (for example: Local Preference)
- Optional Transitive (for example: Community)
- Optional Non-Transitive (for example: Cluster List)
Notice that two of the categories begin with the term well-known. Well-known means that all routers must recognize this path attribute. The two other categories begin with the term optional. Optional means that the BGP implementation on the device doesn't have to recognize that path attribute at all.
Then we have the terms mandatory and discretionary associated with the well-known term. Mandatory means that the update must contain that attribute. If the attribute does not exist, a notification error message will result, and the peering will be torn down. Discretionary, of course, would mean the attribute does not have to be in the update.
With the optional attribute categories, we have transitive and non-transitive. If transitive, the device needs to pass that path attribute on toward its next neighbor. If non-transitive, it can just ignore that attribute value.
Example 1 shows the examination of several of the path attributes for a prefix that has been received by the TPA1 router from the ATL router. Note that we use the show ip bgp command in order to see this information that is stored in the BGP routing database. Specifically, this output shows the attributes of Next Hop, Metric (MED), LocPrf (Local Preference), Weight, and Path (AS Path).
Example 1: Viewing Some of the BGP Path Attributes for a Prefix
The Origin Attribute
The origin attribute in BGP is an attempt to record where a prefix came from. There are three possibilities when it comes to the origin for this attribute: IGP, EGP, and Incomplete. You can see from the legend in Example 1 that the codes Cisco uses for these origins are i, e, and ?. For the prefix shown in Example 1, you can see that the origin is IGP. This indicates that the prefix made its way into this topology thanks to the network command inside of the configuration of that source device. We will cover the network command in all of its glory later on in this chapter. The term IGP here assumes that the prefix came from an Interior Gateway Protocol entry. Let's say we have a prefix in our OSPF routing table, and then we use the network command inside of BGP to put it into the BGP ecosystem. Of course, IGPs are not the only source for prefixes that might carry this attribute. For example, you might create a local loopback interface on the device, then use the network command to have this local prefix advertised into BGP.
EGP is referencing the now antiquated Exterior Gateway Protocol, the predecessor to BGP. As a result, you are never going to see this origin code.
Incomplete means that BGP is unsure of exactly how the prefix was injected into the topology. The most common scenario here is that the prefix was redistributed into Border Gateway Protocol from some other protocol, typically an IGP.
It is a fair question to ask why the origin code is of such importance. The answer lies in the fact that it is a key factor when BGP is using its algorithm to select a best path to a destination in the network. It can break “ties” between multiple alternative paths in the network. We also give this attribute great attention, because it is indeed one of the well-known, mandatory attributes that must exist in our updates.
The AS Path Attribute
AS Path is a well-known mandatory attribute. It is absolutely critical for the best path decision, as well as loop prevention inside of Border Gateway Protocol.
Examining our topology shown in Figure 1, consider a prefix originated at TPA. This update is sent to TPA1, and TPA does not add (called prepending in BGP) its own AS of 100 in the AS Path, since the neighbor it is sending the update to is in its own AS per the iBGP peering.
Figure 1: A Sample BGP Topology
When TPA1 sends this update to ATL, it will prepend the AS number of 100 to the update. Following this logic, ATL can update ATL2 and will not prepend its own AS number. It is not until ATL2 sends this on to some other AS when it will prepend the AS of 200. This means that when we examine a sample AS path as shown in Example 2, the rightmost AS in the path is the AS that first originated the prefix (100), and the leftmost AS is the AS that delivered the prefix to the local device (342).
Example 2: A Sample BGP AS Path
The Next Hop Attribute
It's really no surprise that a BGP prefix has an attribute called Next Hop. After all, a router is going to need to know where to send traffic for that prefix. The Next Hop attribute fulfills this need. An interesting point here, however, is the fact that Next Hop in BGP does not work exactly as it does in most IGPs. Also of note is that the rules change when you are examining iBGP versus eBGP.
When thinking about an Interior Gateway Protocol, when a device sends an update to its neighbor, the default next hop value is the interface IP address that is sending the update. This continues to be reset by each router as the update traverses the topology. The next hop takes on a simple “hop-by-hop” paradigm.
With BGP, when we have an eBGP peering and the prefix is sent, the Next Hop is indeed going to be (by default) the IP address of the eBGP speaker sending the update. However, this eBGP speaker's IP address is going to be retained as the Next Hop as the prefix is passed on from iBGP speaker to iBGP speaker. Very often we see the Next Hop attribute populated with an IP address that is not the device that handed us the update. It is really an address that represents the neighboring AS, which provided us with the prefix. So, it is correct to think of BGP as an “AS-to-AS” protocol instead of a “hop-to-hop” protocol.
This can cause some interesting issues. The main consideration here is that you must ensure all of your BGP speakers can reach the Next Hop value in the attribute so that they consider the path valid. Smartly, BGP speakers will consider a prefix invalid if they cannot reach the Next Hop value.
Fortunately, there is a workaround you can use in specific cases. You can take an iBGP device and instruct it to set itself as the Next Hop value whenever you need to. This is done with a manipulation of the peering using the neighbor command, as shown in Example 3.
Example 3: Manipulating the Default Next Hop Behavior in BGP
The BGP Weight Attribute
Weight is a very interesting attribute with BGP, because it is specific to Cisco. The good news is that since Cisco is such a giant in the industry, many other vendors will support the use of Weight as an attribute.
Weight is also one of the most unique attributes, because the value is not passed to other routers. Weight is a value that is assigned to our prefixes as a locally significant value. Weight is a simple number in the range of 0 through 65535, and the higher the weight value, the higher the preference for that path. When the prefix is locally generated, it will get a weight of 32768. Otherwise, the default weight is 0 for a prefix.
How might Weight be used? It seems strange at first, since it is not passed to other BGP speakers. However, the answer is simple. Let’s say your router receives the same prefix from two different autonomous systems that it peers with. If the administrator wants to prefer one of the paths for whatever reason, they can manipulate the local Weight value on the preferred path and instantly influence BGP's best path decision process.
BGP Best Path Selection
As stated earlier, we know that with IGPs, we have a metric value that is key for determining the best path to a destination. In the case of OSPF, that metric is based on cost, which is based on bandwidth. With BGP, there are many path attributes that a prefix can have. These all lend themselves to the BGP Best Path Selection Algorithm. Figure 2 shows the steps (beginning at the top) that are used in the Cisco BGP selection of best paths.
Figure 2: The BGP Best Path Selection Algorithm
As you examine these path decision criteria, you might immediately question why it has to be this complicated. Remember, when we're dealing with something like the Internet, we want there to be as many “tuning knobs” as possible for BGP policy. We want to be able to control as much as possible how prefixes are being shared and preferred throughout such a large and complex network.
NOTE: Other vendors follow this similar order and consideration of path attributes, but there might be slight differences. For example, Juniper would have Juniper-only considerations in their algorithm, but the main order of the well-known, mandatory attributes would be about the same. For example, the shortest AS Path would come before something like the Origin type.
When preparing for a certification exam that covers BGP, you should make yourself aware of key facts about this algorithm. For example, it is critical to know that Cisco considers the highest Weight value before looking at any other path attribute.
It is also very important to realize that before this analysis of the best path can even take place against a prefix, there must be some checks the prefix must pass in order to be even compared against other paths. For example, as discussed earlier, the Next Hop must be accessible.
Once a prefix is considered valid, the analysis starts from the top down of these best path criteria. If there is a difference in Weight values for multiple paths of a prefix, the path with the highest Weight is preferred path, and any further analysis stops. Notice also in this algorithm (as it is often referred to) that as you near the bottom of the list, the values compared are almost silly. They exist there just to break any ties that are resulting from earlier comparisons. After all, at least one path must be considered best.
I hope you've enjoyed this look at BGP path attributes. In Part 3 of this series, you'll learn about how BGP neighborships are formed, within an autonomous system, between autonomous systems, and even between routers that are not directly connected. See you then.