/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system.  INET is implemented using the BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		The Internet Protocol (IP) module.
 *
 * Authors:	Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *		Donald Becker, <becker@super.org>
 *		Alan Cox, <alan@lxorguk.ukuu.org.uk>
 *		Richard Underwood
 *		Stefan Becker, <stefanb@yello.ping.de>
 *		Jorge Cwik, <jorge@laser.satlink.net>
 *		Arnt Gulbrandsen, <agulbra@nvg.unit.no>
 *
 *
 * Fixes:
 *		Alan Cox	:	Commented a couple of minor bits of surplus code
 *		Alan Cox	:	Undefining IP_FORWARD doesn't include the code
 *					(just stops a compiler warning).
 *		Alan Cox	:	Frames with >=MAX_ROUTE record routes, strict routes or loose routes
 *					are junked rather than corrupting things.
 *		Alan Cox	:	Frames to bad broadcast subnets are dumped
 *					We used to process them non broadcast and
 *					boy could that cause havoc.
 *		Alan Cox	:	ip_forward sets the free flag on the
 *					new frame it queues. Still crap because
 *					it copies the frame but at least it
 *					doesn't eat memory too.
 *		Alan Cox	:	Generic queue code and memory fixes.
 *		Fred Van Kempen	:	IP fragment support (borrowed from NET2E)
 *		Gerhard Koerting:	Forward fragmented frames correctly.
 *		Gerhard Koerting:	Fixes to my fix of the above 8-).
 *		Gerhard Koerting:	IP interface addressing fix.
 *		Linus Torvalds	:	More robustness checks
 *		Alan Cox	:	Even more checks: Still not as robust as it ought to be
 *		Alan Cox	:	Save IP header pointer for later
 *		Alan Cox	:	ip option setting
 *		Alan Cox	:	Use ip_tos/ip_ttl settings
 *		Alan Cox	:	Fragmentation bogosity removed
 *					(Thanks to Mark.Bush@prg.ox.ac.uk)
 *		Dmitry Gorodchanin :	Send of a raw packet crash fix.
 *		Alan Cox	:	Silly ip bug when an overlength
 *					fragment turns up. Now frees the
 *					queue.
 *		Linus Torvalds/	:	Memory leakage on fragmentation
 *		Alan Cox	:	handling.
 *		Gerhard Koerting:	Forwarding uses IP priority hints
 *		Teemu Rantanen	:	Fragment problems.
 *		Alan Cox	:	General cleanup, comments and reformat
 *		Alan Cox	:	SNMP statistics
 *		Alan Cox	:	BSD address rule semantics. Also see
 *					UDP as there is a nasty checksum issue
 *					if you do things the wrong way.
 *		Alan Cox	:	Always defrag, moved IP_FORWARD to the config.in file
 *		Alan Cox	:	IP options adjust sk->priority.
 *		Pedro Roque	:	Fix mtu/length error in ip_forward.
 *		Alan Cox	:	Avoid ip_chk_addr when possible.
 *		Richard Underwood :	IP multicasting.
 *		Alan Cox	:	Cleaned up multicast handlers.
 *		Alan Cox	:	RAW sockets demultiplex in the BSD style.
 *		Gunther Mayer	:	Fix the SNMP reporting typo
 *		Alan Cox	:	Always in group 224.0.0.1
 *		Pauline Middelink :	Fast ip_checksum update when forwarding
 *					Masquerading support.
 *		Alan Cox	:	Multicast loopback error for 224.0.0.1
 *		Alan Cox	:	IP_MULTICAST_LOOP option.
 *		Alan Cox	:	Use notifiers.
 *		Bjorn Ekwall	:	Removed ip_csum (from slhc.c too)
 *		Bjorn Ekwall	:	Moved ip_fast_csum to ip.h (inline!)
 *		Stefan Becker	:	Send out ICMP HOST REDIRECT
 *		Arnt Gulbrandsen :	ip_build_xmit
 *		Alan Cox	:	Per socket routing cache
 *		Alan Cox	:	Fixed routing cache, added header cache.
 *		Alan Cox	:	Loopback didn't work right in original ip_build_xmit - fixed it.
 *		Alan Cox	:	Only send ICMP_REDIRECT if src/dest are the same net.
 *		Alan Cox	:	Incoming IP option handling.
 *		Alan Cox	:	Set saddr on raw output frames as per BSD.
 *		Alan Cox	:	Stopped broadcast source route explosions.
 *		Alan Cox	:	Can disable source routing
 *		Takeshi Sone	:	Masquerading didn't work.
 *		Dave Bonn, Alan Cox :	Faster IP forwarding whenever possible.
 *		Alan Cox	:	Memory leaks, tramples, misc debugging.
 *		Alan Cox	:	Fixed multicast (by popular demand 8))
 *		Alan Cox	:	Fixed forwarding (by even more popular demand 8))
 *		Alan Cox	:	Fixed SNMP statistics [I think]
 *		Gerhard Koerting:	IP fragmentation forwarding fix
 *		Alan Cox	:	Device lock against page fault.
 *		Alan Cox	:	IP_HDRINCL facility.
 *		Werner Almesberger :	Zero fragment bug
 *		Alan Cox	:	RAW IP frame length bug
 *		Alan Cox	:	Outgoing firewall on build_xmit
 *		A.N.Kuznetsov	:	IP_OPTIONS support throughout the kernel
 *		Alan Cox	:	Multicast routing hooks
 *		Jos Vos		:	Do accounting *before* call_in_firewall
 *		Willy Konynenberg :	Transparent proxying support
 *
 *
 *
 * To Fix:
 *		IP fragmentation wants rewriting cleanly. The RFC815 algorithm is much more efficient
 *		and could be made very efficient with the addition of some virtual memory hacks to permit
 *		the allocation of a buffer that can then be 'grown' by twiddling page tables.
 *		Output fragmentation wants updating along with the buffer management to use a single
 *		interleaved copy algorithm so that fragmenting has a one copy overhead. Actual packet
 *		output should probably do its own fragmentation at the UDP/RAW layer. TCP shouldn't cause
 *		fragmentation anyway.
 *
 *		This program is free software; you can redistribute it and/or
 *		modify it under the terms of the GNU General Public License
 *		as published by the Free Software Foundation; either version
 *		2 of the License, or (at your option) any later version.
 */

#include <asm/system.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sockios.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

#include <net/snmp.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/route.h>
#include <linux/skbuff.h>
#include <net/sock.h>
#include <net/arp.h>
#include <net/icmp.h>
#include <net/raw.h>
#include <net/checksum.h>
#include <linux/netfilter_ipv4.h>
#include <net/xfrm.h>
#include <linux/mroute.h>
#include <linux/netlink.h>

/*
 *	Process Router Attention IP option
 */
int ip_call_ra_chain(struct sk_buff *skb)
{
	struct ip_ra_chain *ra;
	u8 protocol = ip_hdr(skb)->protocol;
	struct sock *last = NULL;
	struct net_device *dev = skb->dev;

	read_lock(&ip_ra_lock);
	for (ra = ip_ra_chain; ra; ra = ra->next) {
		struct sock *sk = ra->sk;

		/* If socket is bound to an interface, only report
		 * the packet if it came from that interface.
		 */
		if (sk && inet_sk(sk)->num == protocol &&
		    (!sk->sk_bound_dev_if ||
		     sk->sk_bound_dev_if == dev->ifindex) &&
		    sock_net(sk) == dev_net(dev)) {
			if (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) {
				if (ip_defrag(skb, IP_DEFRAG_CALL_RA_CHAIN)) {
					read_unlock(&ip_ra_lock);
					return 1;
				}
			}
			if (last) {
				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
				if (skb2)
					raw_rcv(last, skb2);
			}
			last = sk;
		}
	}

	if (last) {
		raw_rcv(last, skb);
		read_unlock(&ip_ra_lock);
		return 1;
	}
	read_unlock(&ip_ra_lock);
	return 0;
}

static int ip_local_deliver_finish(struct sk_buff *skb)
{
	struct net *net = dev_net(skb->dev);

	__skb_pull(skb, ip_hdrlen(skb));

	/* Point into the IP datagram, just past the header. */
	skb_reset_transport_header(skb);

	rcu_read_lock();
	{
		int protocol = ip_hdr(skb)->protocol;
		int hash, raw;
		const struct net_protocol *ipprot;

	resubmit:
		raw = raw_local_deliver(skb, protocol);

		hash = protocol & (MAX_INET_PROTOS - 1);
		ipprot = rcu_dereference(inet_protos[hash]);
		if (ipprot != NULL) {
			int ret;

			if (!net_eq(net, &init_net) && !ipprot->netns_ok) {
				if (net_ratelimit())
					printk("%s: proto %d isn't netns-ready\n",
						__func__, protocol);
				kfree_skb(skb);
				goto out;
			}

			if (!ipprot->no_policy) {
				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
					kfree_skb(skb);
					goto out;
				}
				nf_reset(skb);
			}
			ret = ipprot->handler(skb);
			/* A negative return asks for the packet to be
			 * resubmitted as protocol -ret.
			 */
			if (ret < 0) {
				protocol = -ret;
				goto resubmit;
			}
			IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
		} else {
			if (!raw) {
				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
					IP_INC_STATS_BH(net, IPSTATS_MIB_INUNKNOWNPROTOS);
					icmp_send(skb, ICMP_DEST_UNREACH,
						  ICMP_PROT_UNREACH, 0);
				}
			} else
				IP_INC_STATS_BH(net, IPSTATS_MIB_INDELIVERS);
			kfree_skb(skb);
		}
	}
 out:
	rcu_read_unlock();

	return 0;
}
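
/*
 * Illustrative sketch only, not part of the kernel sources: a userland
 * model of the "resubmit" dispatch in ip_local_deliver_finish() above.
 * A handler is looked up by masking the protocol number into a
 * power-of-two table, and a negative return value asks for the payload
 * to be handed to protocol -ret instead. Every name prefixed with
 * sketch_ / SKETCH_ is hypothetical, and the table size of 256 is an
 * assumption. The block is compiled out of kernel builds.
 */
#ifndef __KERNEL__
#define SKETCH_MAX_PROTOS 256	/* assumed power of two, so the mask works */

typedef int (*sketch_handler_t)(int *payload);

static sketch_handler_t sketch_protos[SKETCH_MAX_PROTOS];

/* Returns the protocol that finally consumed the payload, or -1 if no
 * handler is registered for the (possibly resubmitted) protocol.
 */
static int sketch_deliver(int protocol, int *payload)
{
	for (;;) {
		sketch_handler_t h =
			sketch_protos[protocol & (SKETCH_MAX_PROTOS - 1)];
		int ret;

		if (!h)
			return -1;
		ret = h(payload);
		if (ret < 0) {
			/* Handler stripped a layer: retry as protocol -ret. */
			protocol = -ret;
			continue;
		}
		return protocol;
	}
}

/* Hypothetical encapsulating handler: resubmits the payload as protocol 17. */
static int sketch_encap_handler(int *payload)
{
	(*payload)++;
	return -17;
}

/* Hypothetical final handler: consumes the payload. */
static int sketch_final_handler(int *payload)
{
	(*payload) += 10;
	return 0;
}
#endif /* !__KERNEL__ */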

/*
 *	Deliver IP Packets to the higher protocol layers.
 */
int ip_local_deliver(struct sk_buff *skb)
{
	/*
	 *	Reassemble IP fragments.
	 */

	if (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) {
		if (ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER))
			return 0;
	}

	return NF_HOOK(PF_INET, NF_INET_LOCAL_IN, skb, skb->dev, NULL,
		       ip_local_deliver_finish);
}
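
/*
 * Illustrative sketch only, not part of the kernel sources: the fragment
 * test used in ip_local_deliver() above, written as a standalone userland
 * helper. A datagram counts as a fragment when the More Fragments flag is
 * set or its fragment offset is non-zero; Don't Fragment alone does not
 * qualify. The sketch_ / SKETCH_ names are hypothetical; the mask values
 * are meant to mirror IP_MF (0x2000) and IP_OFFSET (0x1FFF). The block is
 * compiled out of kernel builds.
 */
#ifndef __KERNEL__
#include <stdint.h>
#include <arpa/inet.h>

#define SKETCH_IP_MF		0x2000	/* More Fragments flag */
#define SKETCH_IP_OFFSET	0x1FFF	/* fragment offset bits */

/* frag_off_be is the frag_off header field in network byte order. */
static int sketch_is_fragment(uint16_t frag_off_be)
{
	return (frag_off_be & htons(SKETCH_IP_MF | SKETCH_IP_OFFSET)) != 0;
}
#endif /* !__KERNEL__ */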

static inline int ip_rcv_options(struct sk_buff *skb)
{
	struct ip_options *opt;
	struct iphdr *iph;
	struct net_device *dev = skb->dev;

	/* This looks like overkill, because not all
	   IP options require packet mangling.
	   But it is the easiest approach for now, especially
	   since the combination of IP options and a running
	   sniffer is an extremely rare condition.
					      --ANK (980813)
	*/
	if (skb_cow(skb, skb_headroom(skb))) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	iph = ip_hdr(skb);
	opt = &(IPCB(skb)->opt);
	opt->optlen = iph->ihl*4 - sizeof(struct iphdr);

	if (ip_options_compile(dev_net(dev), opt, skb)) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
		goto drop;
	}

	if (unlikely(opt->srr)) {
		struct in_device *in_dev = in_dev_get(dev);
		if (in_dev) {
			if (!IN_DEV_SOURCE_ROUTE(in_dev)) {
				if (IN_DEV_LOG_MARTIANS(in_dev) &&
				    net_ratelimit())
					printk(KERN_INFO "source route option %pI4 -> %pI4\n",
					       &iph->saddr, &iph->daddr);
				in_dev_put(in_dev);
				goto drop;
			}

			in_dev_put(in_dev);
		}

		if (ip_options_rcv_srr(skb))
			goto drop;
	}

	return 0;
drop:
	return -1;
}

static int ip_rcv_finish(struct sk_buff *skb)
{
	const struct iphdr *iph = ip_hdr(skb);
	struct rtable *rt;

	/*
	 *	Initialise the virtual path cache for the packet. It describes
	 *	how the packet travels inside Linux networking.
	 */
	if (skb_dst(skb) == NULL) {
		int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
					 skb->dev);
		if (unlikely(err)) {
			if (err == -EHOSTUNREACH)
				IP_INC_STATS_BH(dev_net(skb->dev),
						IPSTATS_MIB_INADDRERRORS);
			else if (err == -ENETUNREACH)
				IP_INC_STATS_BH(dev_net(skb->dev),
						IPSTATS_MIB_INNOROUTES);
			goto drop;
		}
	}

#ifdef CONFIG_NET_CLS_ROUTE
	if (unlikely(skb_dst(skb)->tclassid)) {
		struct ip_rt_acct *st = per_cpu_ptr(ip_rt_acct, smp_processor_id());
		u32 idx = skb_dst(skb)->tclassid;
		st[idx&0xFF].o_packets++;
		st[idx&0xFF].o_bytes += skb->len;
		st[(idx>>16)&0xFF].i_packets++;
		st[(idx>>16)&0xFF].i_bytes += skb->len;
	}
#endif

	if (iph->ihl > 5 && ip_rcv_options(skb))
		goto drop;

	rt = skb_rtable(skb);
	if (rt->rt_type == RTN_MULTICAST) {
		IP_UPD_PO_STATS_BH(dev_net(rt->u.dst.dev), IPSTATS_MIB_INMCAST,
				skb->len);
	} else if (rt->rt_type == RTN_BROADCAST)
		IP_UPD_PO_STATS_BH(dev_net(rt->u.dst.dev), IPSTATS_MIB_INBCAST,
				skb->len);

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return NET_RX_DROP;
}

/*
 *	Main IP Receive routine.
 */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
	struct iphdr *iph;
	u32 len;

	/* When the interface is in promisc. mode, drop all the crap
	 * that it receives, do not try to analyse it.
	 */
	if (skb->pkt_type == PACKET_OTHERHOST)
		goto drop;

	IP_UPD_PO_STATS_BH(dev_net(dev), IPSTATS_MIB_IN, skb->len);

	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto out;
	}

	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
		goto inhdr_error;

	iph = ip_hdr(skb);

	/*
	 *	RFC1122: 3.2.1.2 MUST silently discard any IP frame that fails the checksum.
	 *
	 *	Is the datagram acceptable?
	 *
	 *	1.	Length at least the size of an ip header
	 *	2.	Version of 4
	 *	3.	Checksums correctly. [Speed optimisation for later, skip loopback checksums]
	 *	4.	Doesn't have a bogus length
	 */

	if (iph->ihl < 5 || iph->version != 4)
		goto inhdr_error;

	if (!pskb_may_pull(skb, iph->ihl*4))
		goto inhdr_error;

	/* pskb_may_pull() may have reallocated the header, so reload it. */
	iph = ip_hdr(skb);

	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
		goto inhdr_error;

	len = ntohs(iph->tot_len);
	if (skb->len < len) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
		goto drop;
	} else if (len < (iph->ihl*4))
		goto inhdr_error;

	/* Our transport medium may have padded the buffer out. Now we know it
	 * is IP we can trim to the true length of the frame.
	 * Note this now means skb->len holds ntohs(iph->tot_len).
	 */
	if (pskb_trim_rcsum(skb, len)) {
		IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	/* Remove any debris in the skb control block (IPCB) */
	memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));

	/* Must drop socket now because of tproxy. */
	skb_orphan(skb);

	return NF_HOOK(PF_INET, NF_INET_PRE_ROUTING, skb, dev, NULL,
		       ip_rcv_finish);

inhdr_error:
	IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
drop:
	kfree_skb(skb);
out:
	return NET_RX_DROP;
}
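
/*
 * Illustrative sketch only, not part of the kernel sources: the four
 * acceptance checks listed in ip_rcv() above (minimum length, version 4,
 * header checksum, sane total length), applied to a raw byte buffer in
 * userland. It is a simplified model under stated assumptions: it uses a
 * plain one's-complement sum rather than ip_fast_csum(), and it folds the
 * "truncated" and "header error" outcomes into a single failure result,
 * whereas ip_rcv() counts them separately. All sketch_ names are
 * hypothetical. The block is compiled out of kernel builds.
 */
#ifndef __KERNEL__
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over hdr_len bytes (hdr_len must be even).
 * A header whose stored checksum is correct yields 0.
 */
static uint16_t sketch_ip_csum(const uint8_t *hdr, size_t hdr_len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < hdr_len; i += 2)
		sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Returns 1 if the datagram passes all four checks, 0 otherwise. */
static int sketch_ip_header_ok(const uint8_t *pkt, size_t pkt_len)
{
	unsigned int version, ihl;
	size_t hdr_len, tot_len;

	if (pkt_len < 20)			/* 1. room for a basic header */
		return 0;
	version = pkt[0] >> 4;
	ihl = pkt[0] & 0x0F;
	if (ihl < 5 || version != 4)		/* 2. version of 4, sane ihl */
		return 0;
	hdr_len = (size_t)ihl * 4;
	if (pkt_len < hdr_len)
		return 0;
	if (sketch_ip_csum(pkt, hdr_len) != 0)	/* 3. checksums correctly */
		return 0;
	tot_len = ((size_t)pkt[2] << 8) | pkt[3];
	if (pkt_len < tot_len || tot_len < hdr_len)	/* 4. no bogus length */
		return 0;
	return 1;
}
#endif /* !__KERNEL__ */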