2005-04-17 02:20:36 +04:00
#
# Network configuration
#
2008-07-30 14:14:01 +04:00
menuconfig NET
2005-04-17 02:20:36 +04:00
bool "Networking support"
2009-03-04 09:53:30 +03:00
select NLATTR
2013-06-04 20:46:26 +04:00
select GENERIC_NET_UTILS
2014-10-24 05:41:08 +04:00
select BPF
2005-04-17 02:20:36 +04:00
---help---
Unless you really know what you are doing, you should say Y here.
The reason is that some programs need kernel networking support even
when running on a stand-alone machine that isn't connected to any
2005-07-12 08:03:49 +04:00
other computer.
If you are upgrading from an older kernel, you
2005-04-17 02:20:36 +04:00
should consider updating your networking tools too because changes
in the kernel and the tools often go hand in hand. The tools are
contained in the package net-tools, the location and version number
of which are given in <file:Documentation/Changes>.
For a general introduction to Linux networking, it is highly
recommended to read the NET-HOWTO, available from
<http://www.tldp.org/docs.html#howto>.
2005-07-12 08:13:56 +04:00
if NET
2005-04-17 02:20:36 +04:00
net/compat/wext: send different messages to compat tasks
Wireless extensions have the unfortunate problem that events
are multicast netlink messages, and are not independent of
pointer size. Thus, currently 32-bit tasks on 64-bit platforms
cannot properly receive events and fail with all kinds of
strange problems, for instance wpa_supplicant never notices
disassociations, due to the way the 64-bit event looks (to a
32-bit process), the fact that the address is all zeroes is
lost, it thinks instead it is 00:00:00:00:01:00.
The same problem existed with the ioctls, until David Miller
fixed those some time ago in an heroic effort.
A different problem caused by this is that we cannot send the
ASSOCREQIE/ASSOCRESPIE events because sending them causes a
32-bit wpa_supplicant on a 64-bit system to overwrite its
internal information, which is worse than it not getting the
information at all -- so we currently resort to sending a
custom string event that it then parses. This, however, has a
severe size limitation we are frequently hitting with modern
access points; this limitation would can be lifted after this
patch by sending the correct binary, not custom, event.
A similar problem apparently happens for some other netlink
users on x86_64 with 32-bit tasks due to the alignment for
64-bit quantities.
In order to fix these problems, I have implemented a way to
send compat messages to tasks. When sending an event, we send
the non-compat event data together with a compat event data in
skb_shinfo(main_skb)->frag_list. Then, when the event is read
from the socket, the netlink code makes sure to pass out only
the skb that is compatible with the task. This approach was
suggested by David Miller, my original approach required
always sending two skbs but that had various small problems.
To determine whether compat is needed or not, I have used the
MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
recvfrom to include it, even if those calls do not have a cmsg
parameter.
I have not solved one small part of the problem, and I don't
think it is necessary to: if a 32-bit application uses read()
rather than any form of recvmsg() it will still get the wrong
(64-bit) event. However, neither do applications actually do
this, nor would it be a regression.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-01 15:26:02 +04:00
config WANT_COMPAT_NETLINK_MESSAGES
bool
help
This option can be selected by other options that need compat
netlink messages.
config COMPAT_NETLINK_MESSAGES
def_bool y
depends on COMPAT
2010-07-27 00:13:49 +04:00
depends on WEXT_CORE || WANT_COMPAT_NETLINK_MESSAGES
net/compat/wext: send different messages to compat tasks
Wireless extensions have the unfortunate problem that events
are multicast netlink messages, and are not independent of
pointer size. Thus, currently 32-bit tasks on 64-bit platforms
cannot properly receive events and fail with all kinds of
strange problems, for instance wpa_supplicant never notices
disassociations, due to the way the 64-bit event looks (to a
32-bit process), the fact that the address is all zeroes is
lost, it thinks instead it is 00:00:00:00:01:00.
The same problem existed with the ioctls, until David Miller
fixed those some time ago in an heroic effort.
A different problem caused by this is that we cannot send the
ASSOCREQIE/ASSOCRESPIE events because sending them causes a
32-bit wpa_supplicant on a 64-bit system to overwrite its
internal information, which is worse than it not getting the
information at all -- so we currently resort to sending a
custom string event that it then parses. This, however, has a
severe size limitation we are frequently hitting with modern
access points; this limitation would can be lifted after this
patch by sending the correct binary, not custom, event.
A similar problem apparently happens for some other netlink
users on x86_64 with 32-bit tasks due to the alignment for
64-bit quantities.
In order to fix these problems, I have implemented a way to
send compat messages to tasks. When sending an event, we send
the non-compat event data together with a compat event data in
skb_shinfo(main_skb)->frag_list. Then, when the event is read
from the socket, the netlink code makes sure to pass out only
the skb that is compatible with the task. This approach was
suggested by David Miller, my original approach required
always sending two skbs but that had various small problems.
To determine whether compat is needed or not, I have used the
MSG_CMSG_COMPAT flag, and adjusted the call path for recv and
recvfrom to include it, even if those calls do not have a cmsg
parameter.
I have not solved one small part of the problem, and I don't
think it is necessary to: if a 32-bit application uses read()
rather than any form of recvmsg() it will still get the wrong
(64-bit) event. However, neither do applications actually do
this, nor would it be a regression.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-07-01 15:26:02 +04:00
help
This option makes it possible to send different netlink messages
to tasks depending on whether the task is a compat task or not. To
achieve this, you need to set skb_shinfo(skb)->frag_list to the
compat skb before sending the skb, the netlink code will sort out
which message to actually pass to the task.
Newly written code should NEVER need this option but do
compat-independent messages instead!
2015-05-13 19:19:37 +03:00
config NET_INGRESS
bool
net, sched: add clsact qdisc
This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).
Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.
Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).
The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.
Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.
I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.
Example, adding qdisc:
# tc qdisc add dev foo clsact
# tc qdisc show dev foo
qdisc mq 0: root
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1
Adding filters (deleting, etc works analogous by specifying ingress/egress):
# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action
A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.
Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.
[1] http://patchwork.ozlabs.org/patch/512949/
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-01-08 00:29:47 +03:00
config NET_EGRESS
bool
2005-07-12 08:13:56 +04:00
menu "Networking options"
2005-04-17 02:20:36 +04:00
2005-07-12 08:13:56 +04:00
source "net/packet/Kconfig"
source "net/unix/Kconfig"
source "net/xfrm/Kconfig"
2007-02-09 00:37:42 +03:00
source "net/iucv/Kconfig"
2005-04-17 02:20:36 +04:00
config INET
bool "TCP/IP networking"
2012-09-04 22:20:14 +04:00
select CRYPTO
select CRYPTO_AES
2005-04-17 02:20:36 +04:00
---help---
These are the protocols used on the Internet and on most local
Ethernets. It is highly recommended to say Y here (this will enlarge
2008-02-12 11:35:16 +03:00
your kernel by about 400 KB), since some programs (e.g. the X window
2005-04-17 02:20:36 +04:00
system) use TCP/IP even if your machine is not connected to any
other computer. You will get the so-called loopback device which
allows you to ping yourself (great fun, that!).
For an excellent introduction to Linux networking, please read the
Linux Networking HOWTO, available from
<http://www.tldp.org/docs.html#howto>.
If you say Y here and also to "/proc file system support" and
"Sysctl support" below, you can change various aspects of the
behavior of the TCP/IP code by writing to the (virtual) files in
/proc/sys/net/ipv4/*; the options are explained in the file
<file:Documentation/networking/ip-sysctl.txt>.
Short answer: say Y.
2005-07-12 08:13:56 +04:00
if INET
2005-04-17 02:20:36 +04:00
source "net/ipv4/Kconfig"
source "net/ipv6/Kconfig"
2006-11-06 03:44:06 +03:00
source "net/netlabel/Kconfig"
2005-04-17 02:20:36 +04:00
2005-07-12 08:13:56 +04:00
endif # if INET
2006-06-09 11:29:17 +04:00
config NETWORK_SECMARK
bool "Security Marking"
help
This enables security marking of network packets, similar
to nfmark, but designated for security purposes.
If you are unsure how to answer this question, answer N.
2014-04-01 18:20:23 +04:00
config NET_PTP_CLASSIFY
def_bool n
2010-07-17 12:49:36 +04:00
config NETWORK_PHY_TIMESTAMPING
bool "Timestamping in PHY devices"
2014-04-01 18:20:23 +04:00
select NET_PTP_CLASSIFY
2010-07-17 12:49:36 +04:00
help
This allows timestamping of network packets by PHYs with
hardware timestamping capabilities. This option adds some
overhead in the transmit and receive paths.
If you are unsure how to answer this question, answer N.
2005-04-17 02:20:36 +04:00
menuconfig NETFILTER
2006-11-29 04:35:43 +03:00
bool "Network packet filtering framework (Netfilter)"
2005-04-17 02:20:36 +04:00
---help---
Netfilter is a framework for filtering and mangling network packets
that pass through your Linux box.
The most common use of packet filtering is to run your Linux box as
a firewall protecting a local network from the Internet. The type of
firewall provided by this kernel support is called a "packet
filter", which means that it can reject individual network packets
based on type, source, destination etc. The other kind of firewall,
a "proxy-based" one, is more secure but more intrusive and more
bothersome to set up; it inspects the network traffic much more
closely, modifies it and has knowledge about the higher level
protocols, which a packet filter lacks. Moreover, proxy-based
firewalls often require changes to the programs running on the local
clients. Proxy-based firewalls don't need support by the kernel, but
they are often combined with a packet filter, which only works if
you say Y here.
You should also say Y here if you intend to use your Linux box as
the gateway to the Internet for a local network of machines without
globally valid IP addresses. This is called "masquerading": if one
of the computers on your local network wants to send something to
the outside, your box can "masquerade" as that computer, i.e. it
forwards the traffic to the intended outside destination, but
modifies the packets to make it look like they came from the
firewall box itself. It works both ways: if the outside host
replies, the Linux box will silently forward the traffic to the
correct local computer. This way, the computers on your local net
are completely invisible to the outside world, even though they can
reach the outside and can receive replies. It is even possible to
run globally visible servers from within a masqueraded local network
using a mechanism called portforwarding. Masquerading is also often
called NAT (Network Address Translation).
Another use of Netfilter is in transparent proxying: if a machine on
the local network tries to connect to an outside host, your Linux
box can transparently forward the traffic to a local server,
typically a caching proxy server.
Yet another use of Netfilter is building a bridging firewall. Using
a bridge with Network packet filtering enabled makes iptables "see"
the bridged traffic. For filtering on the lower network and Ethernet
protocols over the bridge, use ebtables (under bridge netfilter
configuration).
Various modules exist for netfilter which replace the previous
masquerading (ipmasqadm), packet filtering (ipchains), transparent
proxying, and portforwarding mechanisms. Please see
<file:Documentation/Changes> under "iptables" for the location of
these packages.
if NETFILTER
config NETFILTER_DEBUG
bool "Network packet filtering debugging"
depends on NETFILTER
help
You can say Y here if you want to get additional messages useful in
debugging the netfilter code.
2007-12-18 09:47:05 +03:00
config NETFILTER_ADVANCED
bool "Advanced netfilter configuration"
depends on NETFILTER
default y
help
If you say Y here you can select between all the netfilter modules.
2009-01-26 13:12:25 +03:00
If you say N the more unusual ones will not be shown and the
2007-12-18 09:47:05 +03:00
basic ones needed by most people will default to 'M'.
If unsure, say Y.
2005-04-17 02:20:36 +04:00
config BRIDGE_NETFILTER
2014-09-18 13:29:03 +04:00
tristate "Bridged IP/ARP packets filtering"
2014-09-30 12:59:18 +04:00
depends on BRIDGE
2014-09-18 13:29:03 +04:00
depends on NETFILTER && INET
2007-12-18 09:47:05 +03:00
depends on NETFILTER_ADVANCED
2014-09-18 13:29:03 +04:00
default m
2005-04-17 02:20:36 +04:00
---help---
Enabling this option will let arptables resp. iptables see bridged
ARP resp. IP traffic. If you want a bridging firewall, you probably
want this option enabled.
Enabling or disabling this option doesn't enable or disable
ebtables.
If unsure, say N.
2005-09-17 11:41:21 +04:00
source "net/netfilter/Kconfig"
2005-04-17 02:20:36 +04:00
source "net/ipv4/netfilter/Kconfig"
source "net/ipv6/netfilter/Kconfig"
source "net/decnet/netfilter/Kconfig"
source "net/bridge/netfilter/Kconfig"
endif
2005-08-10 07:14:34 +04:00
source "net/dccp/Kconfig"
2005-04-17 02:20:36 +04:00
source "net/sctp/Kconfig"
2009-02-24 18:30:39 +03:00
source "net/rds/Kconfig"
2006-01-16 18:39:13 +03:00
source "net/tipc/Kconfig"
2005-07-12 08:13:56 +04:00
source "net/atm/Kconfig"
2010-04-02 10:18:33 +04:00
source "net/l2tp/Kconfig"
2008-07-06 08:25:39 +04:00
source "net/802/Kconfig"
2005-07-12 08:13:56 +04:00
source "net/bridge/Kconfig"
net: Distributed Switch Architecture protocol support
Distributed Switch Architecture is a protocol for managing hardware
switch chips. It consists of a set of MII management registers and
commands to configure the switch, and an ethernet header format to
signal which of the ports of the switch a packet was received from
or is intended to be sent to.
The switches that this driver supports are typically embedded in
access points and routers, and a typical setup with a DSA switch
looks something like this:
+-----------+ +-----------+
| | RGMII | |
| +-------+ +------ 1000baseT MDI ("WAN")
| | | 6-port +------ 1000baseT MDI ("LAN1")
| CPU | | ethernet +------ 1000baseT MDI ("LAN2")
| |MIImgmt| switch +------ 1000baseT MDI ("LAN3")
| +-------+ w/5 PHYs +------ 1000baseT MDI ("LAN4")
| | | |
+-----------+ +-----------+
The switch driver presents each port on the switch as a separate
network interface to Linux, polls the switch to maintain software
link state of those ports, forwards MII management interface
accesses to those network interfaces (e.g. as done by ethtool) to
the switch, and exposes the switch's hardware statistics counters
via the appropriate Linux kernel interfaces.
This initial patch supports the MII management interface register
layout of the Marvell 88E6123, 88E6161 and 88E6165 switch chips, and
supports the "Ethertype DSA" packet tagging format.
(There is no officially registered ethertype for the Ethertype DSA
packet format, so we just grab a random one. The ethertype to use
is programmed into the switch, and the switch driver uses the value
of ETH_P_EDSA for this, so this define can be changed at any time in
the future if the one we chose is allocated to another protocol or
if Ethertype DSA gets its own officially registered ethertype, and
everything will continue to work.)
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>
Tested-by: Nicolas Pitre <nico@marvell.com>
Tested-by: Byron Bradley <byron.bbradley@gmail.com>
Tested-by: Tim Ellis <tim.ellis@mac.com>
Tested-by: Peter van Valderen <linux@ddcrew.com>
Tested-by: Dirk Teurlings <dirk@upexia.nl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-07 17:44:02 +04:00
source "net/dsa/Kconfig"
2005-07-12 08:13:56 +04:00
source "net/8021q/Kconfig"
2005-04-17 02:20:36 +04:00
source "net/decnet/Kconfig"
source "net/llc/Kconfig"
source "net/ipx/Kconfig"
source "drivers/net/appletalk/Kconfig"
2005-07-12 08:13:56 +04:00
source "net/x25/Kconfig"
source "net/lapb/Kconfig"
2009-01-23 06:00:25 +03:00
source "net/phonet/Kconfig"
2014-07-11 12:24:18 +04:00
source "net/6lowpan/Kconfig"
2009-06-08 16:18:48 +04:00
source "net/ieee802154/Kconfig"
2012-05-16 00:50:20 +04:00
source "net/mac802154/Kconfig"
2005-04-17 02:20:36 +04:00
source "net/sched/Kconfig"
2008-11-21 07:52:10 +03:00
source "net/dcb/Kconfig"
2010-08-04 18:16:33 +04:00
source "net/dns_resolver/Kconfig"
2010-12-13 14:19:28 +03:00
source "net/batman-adv/Kconfig"
2011-10-26 06:26:31 +04:00
source "net/openvswitch/Kconfig"
VSOCK: Introduce VM Sockets
VM Sockets allows communication between virtual machines and the hypervisor.
User level applications both in a virtual machine and on the host can use the
VM Sockets API, which facilitates fast and efficient communication between
guest virtual machines and their host. A socket address family, designed to be
compatible with UDP and TCP at the interface level, is provided.
Today, VM Sockets is used by various VMware Tools components inside the guest
for zero-config, network-less access to VMware host services. In addition to
this, VMware's users are using VM Sockets for various applications, where
network access of the virtual machine is restricted or non-existent. Examples
of this are VMs communicating with device proxies for proprietary hardware
running as host applications and automated testing of applications running
within virtual machines.
The VMware VM Sockets are similar to other socket types, like Berkeley UNIX
socket interface. The VM Sockets module supports both connection-oriented
stream sockets like TCP, and connectionless datagram sockets like UDP. The VM
Sockets protocol family is defined as "AF_VSOCK" and the socket operations
split for SOCK_DGRAM and SOCK_STREAM.
For additional information about the use of VM Sockets, please refer to the
VM Sockets Programming Guide available at:
https://www.vmware.com/support/developer/vmci-sdk/
Signed-off-by: George Zhang <georgezhang@vmware.com>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
Signed-off-by: Andy king <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-02-06 18:23:56 +04:00
source "net/vmw_vsock/Kconfig"
2013-03-21 20:33:48 +04:00
source "net/netlink/Kconfig"
2013-05-24 01:02:52 +04:00
source "net/mpls/Kconfig"
2013-10-31 00:10:47 +04:00
source "net/hsr/Kconfig"
2014-11-28 16:34:17 +03:00
source "net/switchdev/Kconfig"
2015-09-30 06:07:11 +03:00
source "net/l3mdev/Kconfig"
2016-05-06 17:09:08 +03:00
source "net/qrtr/Kconfig"
net/ncsi: Resource management
NCSI spec (DSP0222) defines several objects: package, channel, mode,
filter, version and statistics etc. This introduces the data structs
to represent those objects and implement functions to manage them.
Also, this introduces CONFIG_NET_NCSI for the newly implemented NCSI
stack.
* The user (e.g. netdev driver) dereference NCSI device by
"struct ncsi_dev", which is embedded to "struct ncsi_dev_priv".
The later one is used by NCSI stack internally.
* Every NCSI device can have multiple packages simultaneously, up
to 8 packages. It's represented by "struct ncsi_package" and
identified by 3-bits ID.
* Every NCSI package can have multiple channels, up to 32. It's
represented by "struct ncsi_channel" and identified by 5-bits ID.
* Every NCSI channel has version, statistics, various modes and
filters. They are represented by "struct ncsi_channel_version",
"struct ncsi_channel_stats", "struct ncsi_channel_mode" and
"struct ncsi_channel_filter" separately.
* Apart from AEN (Asynchronous Event Notification), the NCSI stack
works in terms of command and response. This introduces "struct
ncsi_req" to represent a complete NCSI transaction made of NCSI
request and response.
link: https://www.dmtf.org/sites/default/files/standards/documents/DSP0222_1.1.0.pdf
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-19 04:54:16 +03:00
source "net/ncsi/Kconfig"
2005-04-17 02:20:36 +04:00
2010-03-24 22:13:54 +03:00
config RPS
2014-12-20 23:41:11 +03:00
bool
2013-11-22 02:32:01 +04:00
depends on SMP && SYSFS
2010-03-24 22:13:54 +03:00
default y
2011-01-19 14:03:53 +03:00
config RFS_ACCEL
2014-12-20 23:41:11 +03:00
bool
2013-08-30 11:39:53 +04:00
depends on RPS
2011-01-19 14:03:53 +03:00
select CPU_RMAP
default y
2010-11-26 11:36:09 +03:00
config XPS
2014-12-20 23:41:11 +03:00
bool
2013-11-22 02:32:01 +04:00
depends on SMP
2010-11-26 11:36:09 +03:00
default y
2016-03-14 11:39:04 +03:00
config HWBM
bool
2015-12-08 01:38:52 +03:00
config SOCK_CGROUP_DATA
bool
default n
2013-12-29 20:27:11 +04:00
config CGROUP_NET_PRIO
2014-02-08 19:36:58 +04:00
bool "Network priority cgroup"
2011-11-22 09:10:51 +04:00
depends on CGROUPS
2015-12-08 01:38:52 +03:00
select SOCK_CGROUP_DATA
2011-11-22 09:10:51 +04:00
---help---
Cgroup subsystem for use in assigning processes to network priorities on
2013-12-29 20:27:11 +04:00
a per-interface basis.
2011-11-22 09:10:51 +04:00
2013-12-29 21:27:10 +04:00
config CGROUP_NET_CLASSID
2014-12-20 23:41:11 +03:00
bool "Network classid cgroup"
2013-12-29 21:27:10 +04:00
depends on CGROUPS
2015-12-08 01:38:52 +03:00
select SOCK_CGROUP_DATA
2013-12-29 21:27:10 +04:00
---help---
Cgroup subsystem for use as general purpose socket classid marker that is
being used in cls_cgroup and for netfilter matching.
2013-08-01 07:10:25 +04:00
config NET_RX_BUSY_POLL
2014-12-20 23:41:11 +03:00
bool
2013-06-14 17:33:46 +04:00
default y
2013-06-10 12:39:50 +04:00
2011-11-28 20:33:09 +04:00
config BQL
2014-12-20 23:41:11 +03:00
bool
2011-11-28 20:33:09 +04:00
depends on SYSFS
select DQL
default y
2011-04-20 13:27:32 +04:00
config BPF_JIT
bool "enable BPF Just In Time compiler"
2016-05-13 20:08:28 +03:00
depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
2011-04-29 21:20:53 +04:00
depends on MODULES
2011-04-20 13:27:32 +04:00
---help---
Berkeley Packet Filter filtering capabilities are normally handled
by an interpreter. This option allows kernel to generate a native
code when filter is loaded in memory. This should speedup
bpf: add generic constant blinding for use in jits
This work adds a generic facility for use from eBPF JIT compilers
that allows for further hardening of JIT generated images through
blinding constants. In response to the original work on BPF JIT
spraying published by Keegan McAllister [1], most BPF JITs were
changed to make images read-only and start at a randomized offset
in the page, where the rest was filled with trap instructions. We
have this nowadays in x86, arm, arm64 and s390 JIT compilers.
Additionally, later work also made eBPF interpreter images read
only for kernels supporting DEBUG_SET_MODULE_RONX, that is, x86,
arm, arm64 and s390 archs as well currently. This is done by
default for mentioned JITs when JITing is enabled. Furthermore,
we had a generic and configurable constant blinding facility on our
todo for quite some time now to further make spraying harder, and
first implementation since around netconf 2016.
We found that for systems where untrusted users can load cBPF/eBPF
code where JIT is enabled, start offset randomization helps a bit
to make jumps into crafted payload harder, but in case where larger
programs that cross page boundary are injected, we again have some
part of the program opcodes at a page start offset. With improved
guessing and more reliable payload injection, chances can increase
to jump into such payload. Elena Reshetova recently wrote a test
case for it [2, 3]. Moreover, eBPF comes with 64 bit constants, which
can leave some more room for payloads. Note that for all this,
additional bugs in the kernel are still required to make the jump
(and of course to guess right, to not jump into a trap) and naturally
the JIT must be enabled, which is disabled by default.
For helping mitigation, the general idea is to provide an option
bpf_jit_harden that admins can tweak along with bpf_jit_enable, so
that for cases where JIT should be enabled for performance reasons,
the generated image can be further hardened with blinding constants
for unpriviledged users (bpf_jit_harden == 1), with trading off
performance for these, but not for privileged ones. We also added
the option of blinding for all users (bpf_jit_harden == 2), which
is quite helpful for testing f.e. with test_bpf.ko. There are no
further e.g. hardening levels of bpf_jit_harden switch intended,
rationale is to have it dead simple to use as on/off. Since this
functionality would need to be duplicated over and over for JIT
compilers to use, which are already complex enough, we provide a
generic eBPF byte-code level based blinding implementation, which is
then just transparently JITed. JIT compilers need to make only a few
changes to integrate this facility and can be migrated one by one.
This option is for eBPF JITs and will be used in x86, arm64, s390
without too much effort, and soon ppc64 JITs, thus that native eBPF
can be blinded as well as cBPF to eBPF migrations, so that both can
be covered with a single implementation. The rule for JITs is that
bpf_jit_blind_constants() must be called from bpf_int_jit_compile(),
and in case blinding is disabled, we follow normally with JITing the
passed program. In case blinding is enabled and we fail during the
process of blinding itself, we must return with the interpreter.
Similarly, in case the JITing process after the blinding failed, we
return normally to the interpreter with the non-blinded code. Meaning,
interpreter doesn't change in any way and operates on eBPF code as
usual. For doing this pre-JIT blinding step, we need to make use of
a helper/auxiliary register, here BPF_REG_AX. This is strictly internal
to the JIT and not in any way part of the eBPF architecture. Just like
in the same way as JITs internally make use of some helper registers
when emitting code, only that here the helper register is one
abstraction level higher in eBPF bytecode, but nevertheless in JIT
phase. That helper register is needed since f.e. manually written
program can issue loads to all registers of eBPF architecture.
The core concept with the additional register is: blind out all 32
and 64 bit constants by converting BPF_K based instructions into a
small sequence from K_VAL into ((RND ^ K_VAL) ^ RND). Therefore, this
is transformed into: BPF_REG_AX := (RND ^ K_VAL), BPF_REG_AX ^= RND,
and REG <OP> BPF_REG_AX, so actual operation on the target register
is translated from BPF_K into BPF_X one that is operating on
BPF_REG_AX's content. During rewriting phase when blinding, RND is
newly generated via prandom_u32() for each processed instruction.
64 bit loads are split into two 32 bit loads to make translation and
patching not too complex. Only basic thing required by JITs is to
call the helper bpf_jit_blind_constants()/bpf_jit_prog_release_other()
pair, and to map BPF_REG_AX into an unused register.
Small bpf_jit_disasm extract from [2] when applied to x86 JIT:
echo 0 > /proc/sys/net/core/bpf_jit_harden
ffffffffa034f5e9 + <x>:
[...]
39: mov $0xa8909090,%eax
3e: mov $0xa8909090,%eax
43: mov $0xa8ff3148,%eax
48: mov $0xa89081b4,%eax
4d: mov $0xa8900bb0,%eax
52: mov $0xa810e0c1,%eax
57: mov $0xa8908eb4,%eax
5c: mov $0xa89020b0,%eax
[...]
echo 1 > /proc/sys/net/core/bpf_jit_harden
ffffffffa034f1e5 + <x>:
[...]
39: mov $0xe1192563,%r10d
3f: xor $0x4989b5f3,%r10d
46: mov %r10d,%eax
49: mov $0xb8296d93,%r10d
4f: xor $0x10b9fd03,%r10d
56: mov %r10d,%eax
59: mov $0x8c381146,%r10d
5f: xor $0x24c7200e,%r10d
66: mov %r10d,%eax
69: mov $0xeb2a830e,%r10d
6f: xor $0x43ba02ba,%r10d
76: mov %r10d,%eax
79: mov $0xd9730af,%r10d
7f: xor $0xa5073b1f,%r10d
86: mov %r10d,%eax
89: mov $0x9a45662b,%r10d
8f: xor $0x325586ea,%r10d
96: mov %r10d,%eax
[...]
As can be seen, original constants that carry payload are hidden
when enabled, actual operations are transformed from constant-based
to register-based ones, making jumps into constants ineffective.
Above extract/example uses single BPF load instruction over and
over, but of course all instructions with constants are blinded.
Performance wise, JIT with blinding performs a bit slower than just
JIT and faster than interpreter case. This is expected, since we
still get all the performance benefits from JITing and in normal
use-cases not every single instruction needs to be blinded. Summing
up all 296 test cases averaged over multiple runs from test_bpf.ko
suite, interpreter was 55% slower than JIT only and JIT with blinding
was 8% slower than JIT only. Since there are also some extremes in
the test suite, I expect for ordinary workloads that the performance
for the JIT with blinding case is even closer to JIT only case,
f.e. nmap test case from suite has averaged timings in ns 29 (JIT),
35 (+ blinding), and 151 (interpreter).
BPF test suite, seccomp test suite, eBPF sample code and various
bigger networking eBPF programs have been tested with this and were
running fine. For testing purposes, I also adapted interpreter and
redirected blinded eBPF image to interpreter and also here all tests
pass.
[1] http://mainisusuallyafunction.blogspot.com/2012/11/attacking-hardened-linux-systems-with.html
[2] https://github.com/01org/jit-spray-poc-for-ksp/
[3] http://www.openwall.com/lists/kernel-hardening/2016/05/03/5
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-13 20:08:32 +03:00
packet sniffing (libpcap/tcpdump).
Note, admin should enable this feature changing:
/proc/sys/net/core/bpf_jit_enable
/proc/sys/net/core/bpf_jit_harden (optional)
2011-04-20 13:27:32 +04:00
2013-05-20 08:02:32 +04:00
config NET_FLOW_LIMIT
2014-12-20 23:41:11 +03:00
bool
2013-05-20 08:02:32 +04:00
depends on RPS
default y
---help---
The network stack has to drop packets when a receive processing CPU's
backlog reaches netdev_max_backlog. If a few out of many active flows
generate the vast majority of load, drop their traffic earlier to
maintain capacity for the other flows. This feature provides servers
with many clients some protection against DoS by a single (spoofed)
flow that greatly exceeds average workload.
2005-04-17 02:20:36 +04:00
menu "Network testing"
config NET_PKTGEN
tristate "Packet Generator (USE WITH CAUTION)"
2013-07-29 15:44:15 +04:00
depends on INET && PROC_FS
2005-04-17 02:20:36 +04:00
---help---
This module will inject preconfigured packets, at a configurable
rate, out of a given interface. It is used for network interface
stress testing and performance analysis. If you don't understand
what was just said, you don't need it: say N.
Documentation on how to use the packet generator can be found
at <file:Documentation/networking/pktgen.txt>.
To compile this code as a module, choose M here: the
module will be called pktgen.
2006-06-06 04:30:32 +04:00
config NET_TCPPROBE
tristate "TCP connection probing"
2012-10-02 22:19:40 +04:00
depends on INET && PROC_FS && KPROBES
2006-06-06 04:30:32 +04:00
---help---
This module allows for capturing the changes to TCP connection
2006-06-09 10:42:09 +04:00
state in response to incoming packets. It is used for debugging
2006-06-06 04:30:32 +04:00
TCP congestion avoidance modules. If you don't understand
what was just said, you don't need it: say N.
2006-09-26 10:47:14 +04:00
Documentation on how to use TCP connection probing can be found
2010-11-15 22:55:34 +03:00
at:
http://www.linuxfoundation.org/collaborate/workgroups/networking/tcpprobe
2006-06-06 04:30:32 +04:00
To compile this code as a module, choose M here: the
module will be called tcp_probe.
2009-03-11 12:53:16 +03:00
config NET_DROP_MONITOR
2012-05-17 14:04:00 +04:00
tristate "Network packet drop alerting service"
2012-10-02 22:19:40 +04:00
depends on INET && TRACEPOINTS
2009-03-11 12:53:16 +03:00
---help---
This feature provides an alerting service to userspace in the
event that packets are discarded in the network stack. Alerts
are broadcast via netlink socket to any listening user space
process. If you don't need network drop alerts, or if you are ok
just checking the various proc files and other utilities for
drop statistics, say N here.
2005-04-17 02:20:36 +04:00
endmenu
endmenu
source "net/ax25/Kconfig"
2007-11-17 02:52:17 +03:00
source "net/can/Kconfig"
2005-04-17 02:20:36 +04:00
source "net/irda/Kconfig"
source "net/bluetooth/Kconfig"
2007-04-27 02:48:28 +04:00
source "net/rxrpc/Kconfig"
2016-03-08 01:11:06 +03:00
source "net/kcm/Kconfig"
2006-01-21 02:46:55 +03:00
2006-08-04 14:38:38 +04:00
config FIB_RULES
bool
2008-07-24 20:20:09 +04:00
menuconfig WIRELESS
bool "Wireless"
2007-05-10 17:46:01 +04:00
depends on !S390
2008-07-24 20:20:09 +04:00
default y
if WIRELESS
2007-04-23 23:19:12 +04:00
source "net/wireless/Kconfig"
2007-05-05 22:45:53 +04:00
source "net/mac80211/Kconfig"
2007-04-23 23:19:12 +04:00
2008-07-24 20:20:09 +04:00
endif # WIRELESS
2007-04-23 23:19:12 +04:00
2008-12-24 03:18:24 +03:00
source "net/wimax/Kconfig"
2007-05-07 11:34:20 +04:00
source "net/rfkill/Kconfig"
2007-07-11 02:57:28 +04:00
source "net/9p/Kconfig"
2010-03-30 17:56:28 +04:00
source "net/caif/Kconfig"
2010-04-07 02:14:15 +04:00
source "net/ceph/Kconfig"
2011-07-02 02:31:33 +04:00
source "net/nfc/Kconfig"
2010-03-30 17:56:28 +04:00
2015-07-21 11:43:46 +03:00
config LWTUNNEL
bool "Network light weight tunnels"
---help---
This feature provides an infrastructure to support light weight
tunnels like mpls. There is no netdevice associated with a light
weight tunnel endpoint. Tunnel encapsulation parameters are stored
with light weight tunnel state associated with fib routes.
2007-05-07 11:34:20 +04:00
2016-02-12 17:43:53 +03:00
config DST_CACHE
2016-03-22 01:37:22 +03:00
bool
2016-02-12 17:43:53 +03:00
default n
2016-02-26 19:32:23 +03:00
config NET_DEVLINK
tristate "Network physical/parent device Netlink interface"
help
Network physical/parent device Netlink interface provides
infrastructure to support access to physical chip-wide config and
monitoring.
2016-03-02 12:40:54 +03:00
config MAY_USE_DEVLINK
tristate
default m if NET_DEVLINK=m
default y if NET_DEVLINK=y || NET_DEVLINK=n
help
Drivers using the devlink infrastructure should have a dependency
on MAY_USE_DEVLINK to ensure they do not cause link errors when
devlink is a loadable module and the driver using it is built-in.
2005-07-12 08:13:56 +04:00
endif # if NET
2012-05-21 22:45:37 +04:00
2016-05-13 20:08:28 +03:00
# Used by archs to tell that they support BPF JIT compiler plus which flavour.
# Only one of the two can be selected for a specific arch since eBPF JIT supersedes
# the cBPF JIT.
# Classic BPF JIT (cBPF)
config HAVE_CBPF_JIT
bool
# Extended BPF JIT (eBPF)
config HAVE_EBPF_JIT
2012-05-21 22:45:37 +04:00
bool