5
0
mirror of git://git.proxmox.com/git/pve-docs.git synced 2025-01-06 13:17:48 +03:00
pve-docs/pve-network.adoc
Thomas Lamprecht 62fd0400e9 sysadmin network: document disabling mac-learning on a bridge
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-19 16:25:40 +01:00

602 lines
20 KiB
Plaintext

[[sysadmin_network_configuration]]
Network Configuration
---------------------
ifdef::wiki[]
:pve-toplevel:
endif::wiki[]
{pve} is using the Linux network stack. This provides a lot of flexibility on
how to set up the network on the {pve} nodes. The configuration can be done
either via the GUI, or by manually editing the file `/etc/network/interfaces`,
which contains the whole network configuration. The `interfaces(5)` manual
page contains the complete format description. All {pve} tools try hard to keep
direct user modifications, but using the GUI is still preferable, because it
protects you from errors.
A 'vmbr' interface is needed to connect guests to the underlying physical
network. They are a Linux bridge which can be thought of as a virtual switch
to which the guests and physical interfaces are connected to. This section
provides some examples on how the network can be set up to accomodate different
use cases like redundancy with a xref:sysadmin_network_bond['bond'],
xref:sysadmin_network_vlan['vlans'] or
xref:sysadmin_network_routed['routed'] and
xref:sysadmin_network_masquerading['NAT'] setups.
The xref:chapter_pvesdn[Software Defined Network] is an option for more complex
virtual networks in {pve} clusters.
WARNING: It's discouraged to use the traditional Debian tools `ifup` and `ifdown`
if unsure, as they have some pitfalls like interupting all guest traffic on
`ifdown vmbrX` but not reconnecting those guest again when doing `ifup` on the
same bridge later.
Apply Network Changes
~~~~~~~~~~~~~~~~~~~~~
{pve} does not write changes directly to `/etc/network/interfaces`. Instead, we
write into a temporary file called `/etc/network/interfaces.new`, this way you
can do many related changes at once. This also allows to ensure your changes
are correct before applying, as a wrong network configuration may render a node
inaccessible.
Live-Reload Network with ifupdown2
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
With the recommended 'ifupdown2' package (default for new installations since
{pve} 7.0), it is possible to apply network configuration changes without a
reboot. If you change the network configuration via the GUI, you can click the
'Apply Configuration' button. This will move changes from the staging
`interfaces.new` file to `/etc/network/interfaces` and apply them live.
If you made manual changes directly to the `/etc/network/interfaces` file, you
can apply them by running `ifreload -a`
NOTE: If you installed {pve} on top of Debian, or upgraded to {pve} 7.0 from an
older {pve} installation, make sure 'ifupdown2' is installed: `apt install
ifupdown2`
Reboot Node to Apply
^^^^^^^^^^^^^^^^^^^^
Another way to apply a new network configuration is to reboot the node.
In that case the systemd service `pvenetcommit` will activate the staging
`interfaces.new` file before the `networking` service will apply that
configuration.
Naming Conventions
~~~~~~~~~~~~~~~~~~
We currently use the following naming conventions for device names:
* Ethernet devices: en*, systemd network interface names. This naming scheme is
used for new {pve} installations since version 5.0.
* Ethernet devices: eth[N], where 0 ≤ N (`eth0`, `eth1`, ...) This naming
scheme is used for {pve} hosts which were installed before the 5.0
release. When upgrading to 5.0, the names are kept as-is.
* Bridge names: vmbr[N], where 0 ≤ N ≤ 4094 (`vmbr0` - `vmbr4094`)
* Bonds: bond[N], where 0 ≤ N (`bond0`, `bond1`, ...)
* VLANs: Simply add the VLAN number to the device name,
separated by a period (`eno1.50`, `bond1.30`)
This makes it easier to debug networks problems, because the device
name implies the device type.
Systemd Network Interface Names
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Systemd uses the two character prefix 'en' for Ethernet network
devices. The next characters depends on the device driver and the fact
which schema matches first.
* o<index>[n<phys_port_name>|d<dev_port>] — devices on board
* s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — device by hotplug id
* [P<domain>]p<bus>s<slot>[f<function>][n<phys_port_name>|d<dev_port>] — devices by bus id
* x<MAC> — device by MAC address
The most common patterns are:
* eno1 — is the first on board NIC
* enp3s0f1 — is the NIC on pcibus 3 slot 0 and use the NIC function 1.
For more information see https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/[Predictable Network Interface Names].
Choosing a network configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Depending on your current network organization and your resources you can
choose either a bridged, routed, or masquerading networking setup.
{pve} server in a private LAN, using an external gateway to reach the internet
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The *Bridged* model makes the most sense in this case, and this is also
the default mode on new {pve} installations.
Each of your Guest system will have a virtual interface attached to the
{pve} bridge. This is similar in effect to having the Guest network card
directly connected to a new switch on your LAN, the {pve} host playing the role
of the switch.
{pve} server at hosting provider, with public IP ranges for Guests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
For this setup, you can use either a *Bridged* or *Routed* model, depending on
what your provider allows.
{pve} server at hosting provider, with a single public IP address
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In that case the only way to get outgoing network accesses for your guest
systems is to use *Masquerading*. For incoming network access to your guests,
you will need to configure *Port Forwarding*.
For further flexibility, you can configure
VLANs (IEEE 802.1q) and network bonding, also known as "link
aggregation". That way it is possible to build complex and flexible
virtual networks.
Default Configuration using a Bridge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[thumbnail="default-network-setup-bridge.svg"]
Bridges are like physical network switches implemented in software.
All virtual guests can share a single bridge, or you can create multiple
bridges to separate network domains. Each host can have up to 4094 bridges.
The installation program creates a single bridge named `vmbr0`, which
is connected to the first Ethernet card. The corresponding
configuration in `/etc/network/interfaces` might look like this:
----
auto lo
iface lo inet loopback
iface eno1 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.168.10.2/24
gateway 192.168.10.1
bridge-ports eno1
bridge-stp off
bridge-fd 0
----
Virtual machines behave as if they were directly connected to the
physical network. The network, in turn, sees each virtual machine as
having its own MAC, even though there is only one network cable
connecting all of these VMs to the network.
[[sysadmin_network_routed]]
Routed Configuration
~~~~~~~~~~~~~~~~~~~~
Most hosting providers do not support the above setup. For security
reasons, they disable networking as soon as they detect multiple MAC
addresses on a single interface.
TIP: Some providers allow you to register additional MACs through their
management interface. This avoids the problem, but can be clumsy to
configure because you need to register a MAC for each of your VMs.
You can avoid the problem by ``routing'' all traffic via a single
interface. This makes sure that all network packets use the same MAC
address.
[thumbnail="default-network-setup-routed.svg"]
A common scenario is that you have a public IP (assume `198.51.100.5`
for this example), and an additional IP block for your VMs
(`203.0.113.16/28`). We recommend the following setup for such
situations:
----
auto lo
iface lo inet loopback
auto eno0
iface eno0 inet static
address 198.51.100.5/29
gateway 198.51.100.1
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up echo 1 > /proc/sys/net/ipv4/conf/eno0/proxy_arp
auto vmbr0
iface vmbr0 inet static
address 203.0.113.17/28
bridge-ports none
bridge-stp off
bridge-fd 0
----
[[sysadmin_network_masquerading]]
Masquerading (NAT) with `iptables`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Masquerading allows guests having only a private IP address to access the
network by using the host IP address for outgoing traffic. Each outgoing
packet is rewritten by `iptables` to appear as originating from the host,
and responses are rewritten accordingly to be routed to the original sender.
----
auto lo
iface lo inet loopback
auto eno1
#real IP address
iface eno1 inet static
address 198.51.100.5/24
gateway 198.51.100.1
auto vmbr0
#private sub network
iface vmbr0 inet static
address 10.10.10.1/24
bridge-ports none
bridge-stp off
bridge-fd 0
post-up echo 1 > /proc/sys/net/ipv4/ip_forward
post-up iptables -t nat -A POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
post-down iptables -t nat -D POSTROUTING -s '10.10.10.0/24' -o eno1 -j MASQUERADE
----
NOTE: In some masquerade setups with firewall enabled, conntrack zones might be
needed for outgoing connections. Otherwise the firewall could block outgoing
connections since they will prefer the `POSTROUTING` of the VM bridge (and not
`MASQUERADE`).
Adding these lines in the `/etc/network/interfaces` can fix this problem:
----
post-up iptables -t raw -I PREROUTING -i fwbr+ -j CT --zone 1
post-down iptables -t raw -D PREROUTING -i fwbr+ -j CT --zone 1
----
For more information about this, refer to the following links:
https://commons.wikimedia.org/wiki/File:Netfilter-packet-flow.svg[Netfilter Packet Flow]
https://lwn.net/Articles/370152/[Patch on netdev-list introducing conntrack zones]
https://web.archive.org/web/20220610151210/https://blog.lobraun.de/2019/05/19/prox/[Blog post with a good explanation by using TRACE in the raw table]
[[sysadmin_network_bond]]
Linux Bond
~~~~~~~~~~
Bonding (also called NIC teaming or Link Aggregation) is a technique
for binding multiple NIC's to a single network device. It is possible
to achieve different goals, like make the network fault-tolerant,
increase the performance or both together.
High-speed hardware like Fibre Channel and the associated switching
hardware can be quite expensive. By doing link aggregation, two NICs
can appear as one logical interface, resulting in double speed. This
is a native Linux kernel feature that is supported by most
switches. If your nodes have multiple Ethernet ports, you can
distribute your points of failure by running network cables to
different switches and the bonded connection will failover to one
cable or the other in case of network trouble.
Aggregated links can improve live-migration delays and improve the
speed of replication of data between Proxmox VE Cluster nodes.
There are 7 modes for bonding:
* *Round-robin (balance-rr):* Transmit network packets in sequential
order from the first available network interface (NIC) slave through
the last. This mode provides load balancing and fault tolerance.
* *Active-backup (active-backup):* Only one NIC slave in the bond is
active. A different slave becomes active if, and only if, the active
slave fails. The single logical bonded interface's MAC address is
externally visible on only one NIC (port) to avoid distortion in the
network switch. This mode provides fault tolerance.
* *XOR (balance-xor):* Transmit network packets based on [(source MAC
address XOR'd with destination MAC address) modulo NIC slave
count]. This selects the same NIC slave for each destination MAC
address. This mode provides load balancing and fault tolerance.
* *Broadcast (broadcast):* Transmit network packets on all slave
network interfaces. This mode provides fault tolerance.
* *IEEE 802.3ad Dynamic link aggregation (802.3ad)(LACP):* Creates
aggregation groups that share the same speed and duplex
settings. Utilizes all slave network interfaces in the active
aggregator group according to the 802.3ad specification.
* *Adaptive transmit load balancing (balance-tlb):* Linux bonding
driver mode that does not require any special network-switch
support. The outgoing network packet traffic is distributed according
to the current load (computed relative to the speed) on each network
interface slave. Incoming traffic is received by one currently
designated slave network interface. If this receiving slave fails,
another slave takes over the MAC address of the failed receiving
slave.
* *Adaptive load balancing (balance-alb):* Includes balance-tlb plus receive
load balancing (rlb) for IPV4 traffic, and does not require any
special network switch support. The receive load balancing is achieved
by ARP negotiation. The bonding driver intercepts the ARP Replies sent
by the local system on their way out and overwrites the source
hardware address with the unique hardware address of one of the NIC
slaves in the single logical bonded interface such that different
network-peers use different MAC addresses for their network packet
traffic.
If your switch support the LACP (IEEE 802.3ad) protocol then we recommend using
the corresponding bonding mode (802.3ad). Otherwise you should generally use the
active-backup mode. +
// http://lists.linux-ha.org/pipermail/linux-ha/2013-January/046295.html
If you intend to run your cluster network on the bonding interfaces, then you
have to use active-passive mode on the bonding interfaces, other modes are
unsupported.
The following bond configuration can be used as distributed/shared
storage network. The benefit would be that you get more speed and the
network will be fault-tolerant.
.Example: Use bond with fixed IP address
----
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
iface eno3 inet manual
auto bond0
iface bond0 inet static
bond-slaves eno1 eno2
address 192.168.1.2/24
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports eno3
bridge-stp off
bridge-fd 0
----
[thumbnail="default-network-setup-bond.svg"]
Another possibility it to use the bond directly as bridge port.
This can be used to make the guest network fault-tolerant.
.Example: Use a bond as bridge port
----
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports bond0
bridge-stp off
bridge-fd 0
----
[[sysadmin_network_vlan]]
VLAN 802.1Q
~~~~~~~~~~~
A virtual LAN (VLAN) is a broadcast domain that is partitioned and
isolated in the network at layer two. So it is possible to have
multiple networks (4096) in a physical network, each independent of
the other ones.
Each VLAN network is identified by a number often called 'tag'.
Network packages are then 'tagged' to identify which virtual network
they belong to.
VLAN for Guest Networks
^^^^^^^^^^^^^^^^^^^^^^^
{pve} supports this setup out of the box. You can specify the VLAN tag
when you create a VM. The VLAN tag is part of the guest network
configuration. The networking layer supports different modes to
implement VLANs, depending on the bridge configuration:
* *VLAN awareness on the Linux bridge:*
In this case, each guest's virtual network card is assigned to a VLAN tag,
which is transparently supported by the Linux bridge.
Trunk mode is also possible, but that makes configuration
in the guest necessary.
* *"traditional" VLAN on the Linux bridge:*
In contrast to the VLAN awareness method, this method is not transparent
and creates a VLAN device with associated bridge for each VLAN.
That is, creating a guest on VLAN 5 for example, would create two
interfaces eno1.5 and vmbr0v5, which would remain until a reboot occurs.
* *Open vSwitch VLAN:*
This mode uses the OVS VLAN feature.
* *Guest configured VLAN:*
VLANs are assigned inside the guest. In this case, the setup is
completely done inside the guest and can not be influenced from the
outside. The benefit is that you can use more than one VLAN on a
single virtual NIC.
VLAN on the Host
^^^^^^^^^^^^^^^^
To allow host communication with an isolated network. It is possible
to apply VLAN tags to any network device (NIC, Bond, Bridge). In
general, you should configure the VLAN on the interface with the least
abstraction layers between itself and the physical NIC.
For example, in a default configuration where you want to place
the host management address on a separate VLAN.
.Example: Use VLAN 5 for the {pve} management IP with traditional Linux bridge
----
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno1.5 inet manual
auto vmbr0v5
iface vmbr0v5 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports eno1.5
bridge-stp off
bridge-fd 0
auto vmbr0
iface vmbr0 inet manual
bridge-ports eno1
bridge-stp off
bridge-fd 0
----
.Example: Use VLAN 5 for the {pve} management IP with VLAN aware Linux bridge
----
auto lo
iface lo inet loopback
iface eno1 inet manual
auto vmbr0.5
iface vmbr0.5 inet static
address 10.10.10.2/24
gateway 10.10.10.1
auto vmbr0
iface vmbr0 inet manual
bridge-ports eno1
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
----
The next example is the same setup but a bond is used to
make this network fail-safe.
.Example: Use VLAN 5 with bond0 for the {pve} management IP with traditional Linux bridge
----
auto lo
iface lo inet loopback
iface eno1 inet manual
iface eno2 inet manual
auto bond0
iface bond0 inet manual
bond-slaves eno1 eno2
bond-miimon 100
bond-mode 802.3ad
bond-xmit-hash-policy layer2+3
iface bond0.5 inet manual
auto vmbr0v5
iface vmbr0v5 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports bond0.5
bridge-stp off
bridge-fd 0
auto vmbr0
iface vmbr0 inet manual
bridge-ports bond0
bridge-stp off
bridge-fd 0
----
Disabling IPv6 on the Node
~~~~~~~~~~~~~~~~~~~~~~~~~~
{pve} works correctly in all environments, irrespective of whether IPv6 is
deployed or not. We recommend leaving all settings at the provided defaults.
Should you still need to disable support for IPv6 on your node, do so by
creating an appropriate `sysctl.conf (5)` snippet file and setting the proper
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt[sysctls],
for example adding `/etc/sysctl.d/disable-ipv6.conf` with content:
----
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
----
This method is preferred to disabling the loading of the IPv6 module on the
https://www.kernel.org/doc/Documentation/networking/ipv6.rst[kernel commandline].
Disabling MAC Learning on a Bridge
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
By default, MAC learning is enabled on a bridge to ensure a smooth experience
with virtual guests and their networks.
But in some environments this can be undesired. Since {pve} 7.3 you can disable
MAC learning on the bridge by setting the `bridge-disable-mac-learning 1`
configuration on a bridge in `/etc/network/interfaces', for example:
----
# ...
auto vmbr0
iface vmbr0 inet static
address 10.10.10.2/24
gateway 10.10.10.1
bridge-ports ens18
bridge-stp off
bridge-fd 0
bridge-disable-mac-learning 1
----
Once enabled, {pve} will manually add the configured MAC address from VMs and
Containers to the bridges forwarding database to ensure that guest can still
use the network - but only when they are using their actual MAC address.
////
TODO: explain IPv6 support?
TODO: explain OVS
////