Border Gateway Protocol (BGP)
Contents
- Border Gateway Protocol (BGP)
- Subpages
- About BGP
- Sniffing
- Abbreviations
- Path attributes (PA)
- Best path selection
- Neighbors
- Messages
- Timers
- Network advertisement
- Synchronization (with IGP)
- Hard restart
- Graceful restart (GR)
- Graceful shutdown procedure
- Route aggregation
- eBGP multihop
- iBGP
- Route-map
- Enable IPv6 unicast routing (Cisco)
- Multiprotocol BGP (IPv4/IPv6)
- Equal cost multipath (ECMP)
- Security
Subpages
About BGP
Wiki EN - Path-Vector routing protocol
- In general, path-vector routing protocols (RPs) have longer convergence times than link-state RPs (like OSPF).
- Two flavors
eBGP External BGP Wiki EN - Exterior Gateway Protocol (EGP)
- There is also an older, obsolete protocol that is itself named EGP; it should not be confused with the EGP class of protocols
iBGP Internal BGP Wiki EN - Interior Gateway Protocol (IGP)
Youtube - Kevin Wallace Training, LLC - BGP Deep Dive
- Many thanks for the introduction!
- Web-Tools
- Interesting reads
- Forms neighborships
- Explicitly configured neighbors
Neighborships communicate over TCP-Session tcp/179
- No dynamic neighbor discovery using multicast (as used by e.g. OSPF)
- Advertises address prefix and prefix length as NLRI
- Advertises a collection of path attributes used for path selection
- Does not consider bandwidth or cost for path selection
- Scales well
- OSPF may not scale to the size required in large enterprises.
Standards
- Best Current Practices (BCP)
- IANA
Implementations
- Licenses:
- Source code: various Licenses
- Composite binary: GPLv2 or later
- Evolved from
Wiki EN - Quagga, GPLv2
Evolved from GNU Zebra, GPL
Sniffing
In Wireshark I find the following display filter very handy; it filters out KEEPALIVE messages (type 4):
bgp.type != 4
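The same idea can be sketched outside of Wireshark. The following is only an illustration, assuming a byte string that contains back-to-back BGP messages; the helper names `non_keepalive_types` and `make_message` are hypothetical. It relies on the fixed 19-byte BGP header (16-byte marker, 2-byte length, 1-byte type).

```python
import struct

BGP_HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type
KEEPALIVE = 4


def non_keepalive_types(stream: bytes) -> list:
    """Return the type codes of all messages in `stream` except KEEPALIVEs."""
    types = []
    offset = 0
    while offset + BGP_HEADER_LEN <= len(stream):
        # The length field covers the whole message, header included.
        length, msg_type = struct.unpack_from("!HB", stream, offset + 16)
        if length < BGP_HEADER_LEN:
            raise ValueError("invalid BGP message length")
        if msg_type != KEEPALIVE:
            types.append(msg_type)
        offset += length
    return types


def make_message(msg_type: int, body: bytes = b"") -> bytes:
    """Build a BGP message with an all-ones marker (hypothetical test helper)."""
    header = b"\xff" * 16 + struct.pack("!HB", BGP_HEADER_LEN + len(body), msg_type)
    return header + body


# KEEPALIVE, empty UPDATE (type 2), KEEPALIVE
capture = make_message(4) + make_message(2, b"\x00\x00\x00\x00") + make_message(4)
print(non_keepalive_types(capture))  # [2]
```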
Abbreviations
- ACL
- Access Control List
- AF
- Address Family
- AFI
- Address Family Identifier
- AIGP
- Accumulated IGP Metric
- AS
- Autonomous System
- ASN
- Autonomous System Number
- BGP
- Border Gateway Protocol
- CE
- Customer Edge
- DD
- Database Description packets
- DUAL
- Diffusing Update ALgorithm
- eBGP
- External BGP
- EGP
- Exterior Gateway Protocol
- EIGRP
- Enhanced Interior Gateway Routing Protocol
- EOR
- End Of RIB
- EVPN
- Ethernet Virtual Private Network
- FIB
- Forwarding Information Base
- GR
- Graceful Restart
- HA
- High Availability
- iBGP
- Internal BGP
- IGP
- Interior Gateway Protocol
- IGRP
- Interior Gateway Routing Protocol
- IRR
- Internet Routing Registry
- IXP
- Internet Exchange Point
- L2VPN
- Layer 2 Virtual Private Network
- LIR
- Local Internet Registry
- LSA
- Link State Advertisement
- LSDB
- Link State Database
- LSR
- Link State Request
- LSU
- Link State Update
- Packet that may contain multiple LSAs
- MBGP
- Multicast BGP
- MP-BGP
- Multi-Protocol BGP
- NLRI
- Network Layer Reachability Information
- NSF
- Non Stop Forwarding
- NSR
- Non Stop Routing
- OSPF
- Open Shortest Path First
- PA
- Path Attribute
- PE
- Provider Edge
- PMTUD
- Path MTU Discovery
- RIB
- Routing Information Base
- RIR
- Regional Internet Registry
- RR
- Route Reflector
- RS
- Route-Server
- RTP
- Reliable Transport Protocol
- SAFI
- Subsequent Address Family Identifier
- SIA
- Stuck In Active
- SNM
SubNet Mask
- Tier 1 transit provider
- An IP transit provider that can reach any network on the Internet without purchasing transit services.
- SSO
- Stateful Switchover
- uRPF
- Unicast Reverse Path Forwarding
- VXLAN
- Virtual eXtensible LAN
- EoR
- End of RIB
- VTEP
- Virtual Tunnel Endpoint
- VLSM
- Variable Length Subnet Mask
Path attributes (PA)
| # | Scope | PA | Default | Mnemonic | BGP field | Preferred | Comment |
| 1 | Local to router | (Local) Weight | "Off" | We | | Higher | Influences outbound routing decisions |
| 2 | Internal to AS | Local preference | "Off" | Love | LOCAL_PREF (5) | Higher | Influences outbound routing decisions |
| 3 | Internal to AS | Originate / Accumulated IGP Metric (AIGP) | "Off" | Oranges | AIGP (26) | Lowest | |
| 4 | External to AS | AS path length | "On" | AS | AS_PATH (2) | Lowest | Influences outbound routing decisions |
| 5 | External to AS | Origin type | IGP | Oranges | ORIGIN (1) | Lowest | 0 = IGP (i) (maybe a network command) |
| 6 | External to AS | Multi-Exit Discriminator (MED) | "On" | Mean | MULTI_EXIT_DISC (4) | Lowest | Used to influence inbound path selection, if multiple (primary and backup) connections to a single remote AS (e.g. a single ISP) need to be weighted against each other. |
| 7 | Local to router | eBGP over iBGP paths | "On" | Pure | | | Directly connected, over indirectly |
| 8 | Local to router | IGP metric to BGP next hop | "On", imported from IGP | | | Lowest | Continue, even if bestpath is already selected. Prefer the route with the lowest interior cost to the next hop, according to the main routing table. If two neighbors advertised the same route, but one neighbor is reachable via a low-bitrate link and the other by a high-bitrate link, and the interior routing protocol calculates lowest cost based on highest bitrate, the route through the high-bitrate link would be preferred and other routes dropped. |
| 9 | Local to router | Path that was received first | "On" | | | Oldest | Used to ignore changes on the steps 10+ |
| 10 | Local to router | Router ID | "On" | Refreshment | | Lowest | |
| 11 | Local to router | Cluster list length | "On" | | | Lowest | |
| 12 | Local to router | Neighbor address | "On" | | | Lowest | |
PAs by ID
WIP
| # | Scope | PA | Default | Mnemonic | BGP field | Preferred | Comment |
| | Internal to AS | | | | ORIGIN (1) | | |
| | External to AS | AS path length | "On" | AS | AS_PATH (2) | Lowest | |
| | | | | | NEXT_HOP (3) | | |
| | | | | | LOCAL_PREF (5) | | |
| | | | | | ATOMIC_AGGREGATE (6) | | |
| | | | | | AGGREGATOR (7) | | |
| | Internal to AS | | | | ORIGINATOR_ID (9) | | Carries the router-id of the route originator. |
| | Internal to AS | | | | CLUSTER_LIST (10) | | Carries a sequence of cluster-ids of RRs that the route has passed through. |
Best path selection
Cost or bandwidth are not considered for path selection.
During path selection the PAs are compared one after another. If the values differ, a routing decision can be made; if they tie, the next attribute is compared.
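The step-by-step tie-breaking can be sketched as follows. This is a simplified illustration, not the complete vendor algorithm: only a handful of attributes are modelled, the `Path` class and `best_path` function are hypothetical names, and the default LOCAL_PREF of 100 is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100  # assumed default
    as_path_len: int = 0
    med: int = 0
    router_id: int = 0


# (attribute, True if the higher value is preferred), in comparison order.
STEPS = [
    ("weight", True),
    ("local_pref", True),
    ("as_path_len", False),
    ("med", False),
    ("router_id", False),
]


def best_path(paths):
    """Narrow down the candidates attribute by attribute until one is left."""
    candidates = list(paths)
    for attr, higher_wins in STEPS:
        values = [getattr(p, attr) for p in candidates]
        best = max(values) if higher_wins else min(values)
        candidates = [p for p in candidates if getattr(p, attr) == best]
        if len(candidates) == 1:
            break  # decision made, later attributes are not looked at
    return candidates[0]


a = Path("via-ISP1", local_pref=200, as_path_len=5)
b = Path("via-ISP2", local_pref=100, as_path_len=2)
print(best_path([a, b]).name)  # via-ISP1
```

Note that the longer AS path of `via-ISP1` does not matter here, because the decision is already made at the LOCAL_PREF step.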
Neighbors
- There is no dynamic BGP neighbor discovery
- Neighbors must be configured explicitly
- eBGP neighbors have different AS numbers
iBGP neighbors are simply configured with the local AS number as remote-as AS-NUMBER
Show neighbors
Neighbor formation states
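State Idle: Doing nothing
State Connect: Trying to establish the TCP session
State Active: TCP session could not be established yet; listening and retrying
State OpenSent: TCP session established and OPEN message sent; waiting for the peer's OPEN message
State OpenConfirm: OPEN message received; waiting for a KEEPALIVE (or NOTIFICATION) message
State Established: Neighborship is established; UPDATE messages can be exchanged
State Shutdown: Administratively down/not up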
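The way a session moves through these states can be sketched as a small table-driven state machine. This is a heavily simplified illustration: timers, collision handling and the administrative shutdown state are left out, and the event names are hypothetical labels rather than the event names of the standard.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",  # OPEN is sent on success
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
    ("Established", "update_received"): "Established",
}


def next_state(state, event):
    """Return the next state; unexpected events reset the session to Idle."""
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state="Idle"):
    """Feed a sequence of events into the state machine."""
    for event in events:
        state = next_state(state, event)
    return state


print(run(["start", "tcp_established", "open_received", "keepalive_received"]))
# Established
```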
Peer group
- Simplify configuration by grouping of neighbors
- Various configuration (like policies, route-maps, filters, …) can be applied to the peer group
Example: apply #Prefix filtering to peer group
- Removing a peer group
Messages
Open: BGP version number, Local AS Number, Hold Time, BGP router ID, optional parameters
Keepalive: Keeps holdtime timer from expiring
- Defaults:
- Keepalive: 60s
- Holdtime: 180s (3 * default Keepalive)
Update: Can contain withdrawn routes, path attributes and NLRI
Notification: Contains an error code, error sub-code, and information about the error
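As an illustration of the fields listed for the Open message, the following sketch packs and unpacks a minimal OPEN message without optional parameters. The function names `build_open` and `parse_open` are hypothetical, and the sketch assumes a 2-byte AS number in the fixed field.

```python
import ipaddress
import struct

OPEN = 1
HEADER_LEN = 19  # 16-byte marker + 2-byte length + 1-byte type
OPEN_FORMAT = "!BHH4sB"  # version, my AS, hold time, BGP identifier, opt. param. length


def build_open(my_as: int, hold_time: int, router_id: str) -> bytes:
    """Build an OPEN message (BGP version 4, no optional parameters)."""
    body = struct.pack(
        OPEN_FORMAT, 4, my_as, hold_time, ipaddress.IPv4Address(router_id).packed, 0
    )
    header = b"\xff" * 16 + struct.pack("!HB", HEADER_LEN + len(body), OPEN)
    return header + body


def parse_open(message: bytes):
    """Return (version, my_as, hold_time, router_id) of an OPEN message."""
    version, my_as, hold_time, rid, _opt_len = struct.unpack_from(
        OPEN_FORMAT, message, HEADER_LEN
    )
    return version, my_as, hold_time, str(ipaddress.IPv4Address(rid))


msg = build_open(64496, 180, "192.0.2.1")
print(len(msg))         # 29
print(parse_open(msg))  # (4, 64496, 180, '192.0.2.1')
```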
Timers
IETF RFC4271 - A Border Gateway Protocol 4 (BGP-4) #10. BGP Timers
networkstraining.com - The Most Important Border Gateway Protocol (BGP) Timers Explained
Defaults
- Keepalive: 60s
- Holdtime: 180s (3 * keepalive)
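Both peers announce their hold time in the OPEN message and the session uses the smaller of the two values. A minimal sketch of this negotiation, assuming the keepalive interval is simply derived as one third of the negotiated hold time (the function name `negotiate_timers` is hypothetical):

```python
def negotiate_timers(local_hold: int, peer_hold: int):
    """Return (hold time, keepalive interval) in seconds for a session."""
    hold = min(local_hold, peer_hold)
    # A hold time of 0 disables keepalives; otherwise it must be at least 3 s.
    if hold != 0 and hold < 3:
        raise ValueError("hold time must be 0 or at least 3 seconds")
    return hold, hold // 3


print(negotiate_timers(180, 180))  # (180, 60)
print(negotiate_timers(180, 90))   # (90, 30)
```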
Other timers
- update-delay: 0s (disabled)
- BGP stays in read-only mode until update-delay has expired
Network advertisement
- You should not advertise the networks of the peers of the other AS, unless you want to become a transit AS/area and you want to route the traffic through your devices
- Only advertise your AS-internal networks you want to propagate throughout the internet
- Your router will not advertise the network defined by the network statement unless there is a corresponding route in the routing table.
- Add a null route
- A null route does not replace the directly connected route, even if the prefixes match perfectly including the prefix-length.
- A connected route is also not removed if you remove the matching null route.
- On Cisco
- In some instances a directly connected interface may not appear in the local routing table. This may just be a glitch, as IMHO it should appear. Shutting down and re-enabling the interface may help.
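The rule that a network statement is only advertised when an exactly matching route exists can be sketched like this. It is a simplified model, with the routing table reduced to a plain list of prefixes; the function name `advertisable` is hypothetical.

```python
import ipaddress


def advertisable(network_statements, routing_table):
    """Return the network statements that have an exact match in the routing table."""
    rib = {ipaddress.ip_network(prefix) for prefix in routing_table}
    return [n for n in network_statements if ipaddress.ip_network(n) in rib]


statements = ["203.0.113.0/24", "198.51.100.0/24"]
rib = ["203.0.113.0/25", "198.51.100.0/24"]

# 203.0.113.0/24 is not advertised: only a more specific /25 is in the table.
print(advertisable(statements, rib))  # ['198.51.100.0/24']

# Adding a null route for the exact prefix makes it advertisable.
rib.append("203.0.113.0/24")
print(advertisable(statements, rib))  # ['203.0.113.0/24', '198.51.100.0/24']
```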
Synchronization (with IGP)
- To avoid unintentional black holes
- Only advertises a route learned from an iBGP peer to an eBGP peer when there is an exact match of that route learned from an IGP in the routing table
- Disabled by default since Cisco IOS 12.2(8)T
- Not really necessary
- There may be no IGP on the eBGP router
Hard restart
Cisco: In EXEC mode
clear ip bgp *
Graceful restart (GR)
FRR Docs - BGP #Send Hard Reset CEASE Notification for Administrative Reset
FRR Docs - BGP #Graceful Restart
Graceful restart is a mechanism for BGP that would help minimize the negative effects on routing caused by BGP restart. An End-of-RIB marker is specified and can be used to convey routing convergence information. A new BGP capability, termed "Graceful Restart Capability", is defined that would allow a BGP speaker to express its ability to preserve forwarding state during BGP restart.
- Also available for OSPF, EIGRP, IS-IS, …
- GR towards RR is especially useful
- GR is preferred to Non-Stop-Routing
- Timers
- Restart timer:
- Default: 120 s
- Speaker must come back in this time or is considered dead.
- Selection_Deferral_Timer:
- Default: ? s
- Stale-path timer:
- Default: ? s
- For each routing protocol it has to be checked that the peer is capable and aware of GR
- Usage of EoR marker is beneficial for convergence time, since computation can start on reception
Process:
- Capability to perform graceful restart (Capability: 64) is announced to neighbors in OPEN message.
- May be enabled on a per neighbor basis
- Peers are notified that a GR takes place using an OPEN message with restart bit set.
- On a peer
- the routes in the routing table are marked as stale
- traffic is still forwarded to the restarting speaker
- the restart timer is started
- On the restarting speaker the Data-plane continues to forward traffic, while the control-plane restarts.
- When a new BGP session is established
- On the peer
- the restart timer is stopped (heard back)
- the stale-path timer is started
- On the restarting router
- initial updates are exchanged with neighbors
- performs Path-Selection
- once End-of-RIB (EoR) has been received from all peers with GR capability set, or
- the Selection_Deferral_Timer has expired
- send updates and EoR to neighbors
- On the peer
- the stale-path timer is stopped
- the stale prefixes are flushed
- new prefixes are injected
- Convergence
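The peer-side handling of stale routes described in the process above can be sketched as follows. This is only a toy model under simplifying assumptions: the restart and stale-path timers are not modelled, routes are reduced to prefixes, and the class name `HelperPeer` is hypothetical.

```python
class HelperPeer:
    """Routes learned from a restarting neighbor, as seen by its peer."""

    def __init__(self, prefixes):
        # prefix -> stale flag
        self.routes = {prefix: False for prefix in prefixes}

    def session_lost(self):
        # Keep forwarding, but mark everything learned from the neighbor as stale.
        self.routes = dict.fromkeys(self.routes, True)

    def update_received(self, prefix):
        # A re-announced (or new) prefix is fresh again.
        self.routes[prefix] = False

    def end_of_rib(self):
        # Flush whatever was not refreshed before the End-of-RIB marker.
        self.routes = {p: stale for p, stale in self.routes.items() if not stale}


peer = HelperPeer(["203.0.113.0/24", "198.51.100.0/24"])
peer.session_lost()
peer.update_received("203.0.113.0/24")  # refreshed
peer.update_received("192.0.2.0/24")    # newly injected
peer.end_of_rib()
print(sorted(peer.routes))  # ['192.0.2.0/24', '203.0.113.0/24']
```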
Graceful shutdown procedure
Youtube.com - NLNOG day 2017 - Job Snijders (NTT) - BGP Graceful Shutdown
- Simple procedure to reduce negative impact of shutting down BGP sessions (e.g. during maintenance)
- Can be combined with
- Can be part of the operational procedure as outlined in
IETF BCP214 - Mitigating the Negative Impact of Maintenance through BGP Session Culling
- Graceful shutdown is a "Make Before Break" mechanism
- Does not help against unplanned outages
- Not to be confused with "BGP Graceful Restart"
- Only useful for multi-homed AS
Scenario topology
Normal shutdown
Router A (RA) is shut down
- BGP session to ASBR2 is torn down
- ASBR2 withdraws the routes by sending an UPDATE message to the route reflector (RR)
- ASBR2 has no more routes to prefixes announced by RA and blackholes legitimate traffic
- RR selects a new best path and updates ASBR2
- ASBR2 starts forwarding traffic using the new alternative path
Graceful shutdown
- Before maintenance Router A (RA)
Attaches the GRACEFUL_SHUTDOWN well-known community 65535:0 (as per IANA) to the prefixes
- Waits for a reasonable time (some minutes)
- ASBR2 receives updates from RA
Finds the community GRACEFUL_SHUTDOWN in the UPDATE
- Lowers the LOCAL_PREF (AS-wide) to 0
- Updates RR
RR sends new best path to ASBR2 and other peers
Community GRACEFUL_SHUTDOWN propagates through AS
- Other intermediate AS may select a new path prior to shutdown
- ASBR2 selects new best path received from RR
- Alternative paths are already in place
- RA is drained
- Minimal packet loss
- RA can now be shut down without impact
- BGP session to ASBR2 is torn down without impact
- …
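The effect of the GRACEFUL_SHUTDOWN community on path selection can be sketched as follows. The sketch assumes routes are plain dictionaries and that a default LOCAL_PREF of 100 is in use; the names `apply_inbound_policy` and the route fields are hypothetical.

```python
GRACEFUL_SHUTDOWN = (65535, 0)  # well-known community 65535:0


def apply_inbound_policy(route: dict) -> dict:
    """Lower LOCAL_PREF to 0 for routes tagged with GRACEFUL_SHUTDOWN."""
    if GRACEFUL_SHUTDOWN in route["communities"]:
        return {**route, "local_pref": 0}
    return route


received = [
    {"via": "RA", "local_pref": 100, "communities": [GRACEFUL_SHUTDOWN]},
    {"via": "RB", "local_pref": 100, "communities": []},
]
processed = [apply_inbound_policy(route) for route in received]

# Higher LOCAL_PREF wins, so traffic moves away from RA before its shutdown.
best = max(processed, key=lambda route: route["local_pref"])
print(best["via"])  # RB
```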
Graceful shutdown configuration
Allow peers to graceful shutdown
Configure this on each and every BGP session
- Example Cisco IOS XR
conf term
route-policy AS-ASN_NEIGH-ebgp-inbound
  if community matches-any (65535:0) then
    set local-preference 0
  endif
end-policy
!
router bgp ASN_OWN
 neighbor IP_NEIGH remote-as ASN_NEIGH
  address-family ipv6 unicast
   send-community-ebgp
   route-policy AS-ASN_NEIGH-ebgp-inbound in
end
- Example ARISTA/Brocade/IOS/Quagga/FRR
Graceful shutdown your systems
Cisco IOS XR (automatically shuts down BGP session)
Route aggregation
- Reduces the size and therefore increases the efficiency of the routing table.
- Algorithm
- Compare the network prefixes bit by bit (e.g. with a binary XOR) to determine the common leading bits.
- The number of common leading bits is the prefix length of the summarized route; equivalently, the differing trailing bits are counted and subtracted from the maximum length of the SNM (32/128).
- The common leading bits, padded with trailing bits of value 0, form the summarized network address; network address and prefix length are concatenated to the summarized network prefix.
- Summarized address
- Be careful: summarization also covers holes (networks inside the summarized range that are not actually present), which may cause issues.
- Should work, because advertisements of networks falling into that range are more specific.
- The lowest cost among the summarized networks is used as the cost of the summarized route
- Blueprint
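The summarization of a set of prefixes into one covering prefix can be sketched with Python's `ipaddress` module. This is a minimal illustration based on the longest common leading bits, assuming all prefixes belong to the same address family; the function name `summarize` is hypothetical.

```python
import ipaddress


def summarize(prefixes):
    """Return the smallest single prefix that covers all given prefixes."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    bits = nets[0].max_prefixlen  # 32 for IPv4, 128 for IPv6
    first = min(int(n.network_address) for n in nets)
    last = max(int(n.broadcast_address) for n in nets)
    # Bits that differ between the lowest and the highest address.
    differing = (first ^ last).bit_length()
    prefix_len = bits - differing
    mask = ((1 << bits) - 1) ^ ((1 << differing) - 1)
    return type(nets[0])((first & mask, prefix_len))


print(summarize(["192.168.0.0/24", "192.168.1.0/24",
                 "192.168.2.0/24", "192.168.3.0/24"]))  # 192.168.0.0/22

# Holes are summarized as well: 192.168.1.0/24 and 192.168.2.0/24 are
# covered by the result although they were not part of the input.
print(summarize(["192.168.0.0/24", "192.168.3.0/24"]))  # 192.168.0.0/22
```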
eBGP multihop
- Form a neighborship with a router that is not directly connected
iBGP
iBGP Assumptions
- Full mesh: every iBGP router in the AS has a neighborship with every other iBGP router in the same AS
- The number of neighborships grows quadratically with the number of routers participating in iBGP in the same AS.
Number of adjacencies in a full mesh of n routers: n * (n - 1) / 2
- Solutions
- BGP Confederation using Sub Autonomous Systems
- Sub-ASs from the private AS number range
- Sub-Autonomous Systems require a full mesh …
- BGP Route reflectors
- Mirrors received advertisements to other routers
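To get a feeling for the growth, the number of iBGP sessions can be computed for a full mesh and compared with a route reflector design. The second function is only a sketch under the assumption that every client peers with every route reflector and that the reflectors are fully meshed among themselves; both function names are hypothetical.

```python
def full_mesh_sessions(routers: int) -> int:
    """Number of iBGP sessions in a full mesh: n * (n - 1) / 2."""
    return routers * (routers - 1) // 2


def route_reflector_sessions(routers: int, reflectors: int = 1) -> int:
    """Sessions if each client peers with each RR and the RRs are fully meshed."""
    clients = routers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


for n in (5, 10, 50):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n, reflectors=2))
# 5 10 7
# 10 45 17
# 50 1225 97
```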
An iBGP neighbor is simply configured with the local AS number as remote-as AS-NUMBER.
iBGP next hop self
- An iBGP router has learned a route from an eBGP-speaking router of its own AS and the route is shown in show ip bgp, but the next hop is set to the external peer of the propagating eBGP router, to which the iBGP router has no route
- The route is not in the routing table (show ip route)
- The eBGP router does not adjust the attribute "next hop"
- On the eBGP router set the attribute next hop for a specific neighbor
iBGP route reflector (RR)
- To overcome the iBGP requirement of a full mesh an iBGP router can be configured as a route reflector.
- Therefore the iBGP neighbors of this [designated] router that should receive mirrored route advertisements are configured as route-reflector-client
- A RR breaks the split horizon rule.
- A RR advertises routes learned from iBGP peers to other iBGP peers.
- Route reflectors do not modify any BGP path attributes, when they reflect a route, except the ORIGINATOR_ID and the CLUSTER_LIST PAs.
RR clusters
An RR and its clients form a cluster. The cluster ID is then attached to every route advertised by the RR to its client or nonclient peers. A cluster ID is a cumulative, non-transitive BGP attribute, and every RR must prepend the local cluster ID to the cluster list to avoid routing loops.
iBGP router types
- Route reflector server
- send reflections
- Route reflector clients
- receive reflections
- non-clients
- Part of the full mesh
iBGP route reflection rules
RR servers propagate routes inside the AS based on the following rules:
- Routes are always reflected to eBGP peers.
- Routes are never reflected to the originator of the route.
- If a route is received from a non-client peer, reflect to client peers.
- If a route is received from a client peer, reflect to client and non-client peers.
iBGP loop prevention
- Split-horizon
- A route learned from an iBGP peer is not advertised to other iBGP peers within the AS.
- An RR disables this rule.
- Filtering on ORIGINATOR_ID (PA ORIGINATOR_ID)
- type code: 9
- Carries the router-id of the route originator
- ORIGINATOR_ID is set by the first RR to the router-id of the router that originated the route into the AS. The ORIGINATOR_ID is not transitive and stays unmodified while passing other RRs.
- If a router receives an update that contains its own router-id in the ORIGINATOR_ID field it discards the update. The route is ignored.
- Only used by RRs
- Filtering on CLUSTER_LIST (PA CLUSTER_LIST)
- type code: 10
- Carries a sequence of cluster-ids of RRs that the route has passed through
- When a RR reflects a route it prepends its local cluster-id to the CLUSTER_LIST PA
- If an RR receives a route with a CLUSTER_LIST PA that includes its own cluster-id, it discards the update.
- Only used by RRs
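The reflection rules and the two loop prevention checks can be combined in one small sketch. It is a simplified model: peers are identified by their router-id, eBGP peers are not modelled, and the ORIGINATOR_ID is set to the sending peer if it is missing, which assumes that this peer originated the route into the AS. The names `Route` and `RouteReflector` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Route:
    prefix: str
    originator_id: Optional[str] = None
    cluster_list: List[str] = field(default_factory=list)


class RouteReflector:
    def __init__(self, router_id, cluster_id, clients, non_clients):
        self.router_id = router_id
        self.cluster_id = cluster_id
        self.clients = set(clients)
        self.non_clients = set(non_clients)

    def reflect(self, route, from_peer):
        """Return {peer: reflected route} for an update received from an iBGP peer."""
        # Loop prevention: drop routes that already passed this cluster
        # or that were originated by this router.
        if self.cluster_id in route.cluster_list:
            return {}
        if route.originator_id == self.router_id:
            return {}
        reflected = Route(
            route.prefix,
            route.originator_id or from_peer,          # set by the first RR only
            [self.cluster_id] + route.cluster_list,    # prepend local cluster-id
        )
        if from_peer in self.clients:
            targets = self.clients | self.non_clients  # client -> everybody
        else:
            targets = set(self.clients)                # non-client -> clients only
        # Never send the route back to the sender or to its originator.
        targets -= {from_peer, reflected.originator_id}
        return {peer: reflected for peer in targets}


rr = RouteReflector("10.0.0.1", "10.0.0.1",
                    clients={"10.0.0.2", "10.0.0.3"}, non_clients={"10.0.0.4"})
out = rr.reflect(Route("203.0.113.0/24"), from_peer="10.0.0.2")
print(sorted(out))  # ['10.0.0.3', '10.0.0.4']
```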
iBGP Confederation
- Makes sense in bigger topologies.
- to be evaluated
Route-map
Outbound path selection (LOCAL_PREF)
- Create a route-map that sets LOCAL_PREF
- Apply route-map to ISP-routes to influence outbound path selection (AS-wide).
Inbound path selection (AS-path prepending)
- Create a route-map that prepends additional instances of your own AS-number
- Apply route-map to outbound routes to ISP1 to influence inbound path selection.
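Why prepending steers inbound traffic can be shown with a short sketch: a remote AS that compares AS path lengths will prefer the shorter, non-prepended path. The AS numbers are taken from the documentation range and are purely illustrative, as is the function name `prepend`.

```python
def prepend(as_path, own_as, count):
    """Return the AS path with `count` additional instances of the own AS number."""
    return [own_as] * count + list(as_path)


OWN_AS, ISP1, ISP2 = 64496, 64497, 64498

# Paths as seen by a remote AS behind both ISPs.
via_isp1 = [ISP1] + prepend([OWN_AS], OWN_AS, 3)  # prepended towards ISP1
via_isp2 = [ISP2] + prepend([OWN_AS], OWN_AS, 0)  # announced unchanged to ISP2

preferred = min([via_isp1, via_isp2], key=len)    # shortest AS path wins
print(len(via_isp1), len(via_isp2))  # 5 2
print(preferred)                     # [64498, 64496]
```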
Enable IPv6 unicast routing (Cisco)
Multiprotocol BGP (IPv4/IPv6)
- Using address-families for IPv4/IPv6
- Configure IPv6 addresses to the interfaces
- Create route-map
- Setup BGP router
conf term
router bgp AS-NUMBER
 neighbor IP-ADDRESS_ISP2 remote-as AS-ISP2
 ! ENABLE BOTH ADDRESS FAMILIES
 address-family ipv4
  network IPV4-PREFIX mask SNM
 exit
 address-family ipv6
  ! IPv6 networks are given as prefix/length, not with a mask
  network IPV6-PREFIX/PREFIX-LENGTH
  ! ACTIVATE IPV6 PEER
  neighbor IP-ADDRESS_ISP2 activate
  neighbor IP-ADDRESS_ISP2 route-map IPV6-NEXT-HOP out
 exit
- Test with
Equal cost multipath (ECMP)
notes.networklessons.com - BGP Equal Cost Multipath AS_Path Attribute
First make sure to have equal costs on the routes.
- There are multiple ways to achieve ECMP
- Inject multiple paths in routing table
- Configure loopback with address, set static host routes to neighbor loopbacks, enable ebgp-multihop and run BGP-session from loopback to loopback
conf term
! One static host route to the neighbor loopback per physical link
ip route REMOTE-IP-ADDRESS1 255.255.255.255 REMOTE-IFACE-ADDRESS1
ip route REMOTE-IP-ADDRESS1 255.255.255.255 REMOTE-IFACE-ADDRESS2
router bgp AS-NUMBER
 neighbor REMOTE-IP-ADDRESS1 remote-as PEER-AS-NUMBER
 neighbor REMOTE-IP-ADDRESS1 update-source loopback0
 neighbor REMOTE-IP-ADDRESS1 ebgp-multihop 2
- Use LACP/802.3ad, … and configure a BGP session over this joint link.
Security
Filter tcp/179
- ACL filters input on tcp/179 for devices not eligible to become neighbors.
- Protection from DoS by resource exhaustion (bandwidth, CPU, …)
- Spoofed TCP RST can bring down BGP sessions
- On control-plane or interface level
- Please also see:
- Rate-limit tcp/179 to reserve platform resources
Filter inbound own source-addresses
- Block spoofed packets (packets with a source IP address belonging to your own IP address space) at all edges of your network
- protects the TCP session used by Internal BGP (IBGP) from attackers outside the Autonomous System
Prefix filtering
- in- or outbound
- match IPs, ASNs, or any other attribute of a prefix (e.g. communities)
- Filters
- Drop IPv4/6 special purpose prefixes
- prefixes of non-global scope
- Drop IPv4/6 unallocated prefixes
- RIR-Allocated Prefix Filters
Internet Routing Registry ToolSet
github.com irrtoolset/irrtoolset
- IRRToolSet is not actively maintained
- Instead use BGPQ4
github.com bgp/bgpq4 - bgp filtering automation tool
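Dropping special purpose prefixes can be sketched as a simple containment check. The list below is only a short, illustrative excerpt and not a complete bogon list; in practice such filters are generated from maintained sources. The function name `accept` is hypothetical.

```python
import ipaddress

# Illustrative excerpt of prefixes of non-global scope (not complete).
SPECIAL_PURPOSE = [ipaddress.ip_network(p) for p in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "fc00::/7", "fe80::/10",
)]


def accept(prefix: str) -> bool:
    """Return False if the prefix lies inside a special purpose prefix."""
    net = ipaddress.ip_network(prefix)
    return not any(
        # subnet_of() only works within one address family, so check it first.
        net.version == bogon.version and net.subnet_of(bogon)
        for bogon in SPECIAL_PURPOSE
    )


for candidate in ("10.1.0.0/16", "8.8.8.0/24", "fe80::/64", "2001:4860::/32"):
    print(candidate, accept(candidate))
# 10.1.0.0/16 False
# 8.8.8.0/24 True
# fe80::/64 False
# 2001:4860::/32 True
```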
Show prefix lists
show ip prefix-list
Create a prefix list
Session Protection
- Please also see:
Policies
- BGP is typically restricted by policies.
- Common policies are
- Bogon filtering
- Limit the routes learned to a maximum AS-path length
- Filter routes with origin "255" (devel)
Authentication
- Peer may be authenticated (using MD5)
IETF RFC2385 - Protection of BGP Sessions via the TCP MD5 Signature Option
IETF - The TCP Authentication Option (TCP-AO)
- Also possible using IPsec
Error tolerance
- Do not tear down BGP session, just withdraw the route with the potentially faulty attribute.
Some keywords