/*
 * INET		An implementation of the TCP/IP protocol suite for the LINUX
 *		operating system.  INET is implemented using the BSD Socket
 *		interface as the means of communication with the user level.
 *
 *		The Internet Protocol (IP) module.
 *
 * Version:	$Id: ip_input.c,v 1.55 2002/01/12 07:39:45 davem Exp $
 *
 * Authors:	Ross Biro
 *		Fred N. van Kempen, <waltje@uWalt.NL.Mugnet.ORG>
 *		Donald Becker, <becker@super.org>
 *		Alan Cox, <Alan.Cox@linux.org>
 *		Richard Underwood
 *		Stefan Becker, <stefanb@yello.ping.de>
 *		Jorge Cwik, <jorge@laser.satlink.net>
 *		Arnt Gulbrandsen, <agulbra@nvg.unit.no>
 *
 *
 * Fixes:
 *		Alan Cox	:	Commented a couple of minor bits of surplus code
 *		Alan Cox	:	Undefining IP_FORWARD doesn't include the code
 *					(just stops a compiler warning).
 *		Alan Cox	:	Frames with >=MAX_ROUTE record routes, strict routes or loose routes
 *					are junked rather than corrupting things.
 *		Alan Cox	:	Frames to bad broadcast subnets are dumped
 *					We used to process them non broadcast and
 *					boy could that cause havoc.
 *		Alan Cox	:	ip_forward sets the free flag on the
 *					new frame it queues. Still crap because
 *					it copies the frame but at least it
 *					doesn't eat memory too.
 *		Alan Cox	:	Generic queue code and memory fixes.
 *		Fred Van Kempen :	IP fragment support (borrowed from NET2E)
 *		Gerhard Koerting:	Forward fragmented frames correctly.
 *		Gerhard Koerting:	Fixes to my fix of the above 8-).
 *		Gerhard Koerting:	IP interface addressing fix.
 *		Linus Torvalds	:	More robustness checks
 *		Alan Cox	:	Even more checks: Still not as robust as it ought to be
 *		Alan Cox	:	Save IP header pointer for later
 *		Alan Cox	:	ip option setting
 *		Alan Cox	:	Use ip_tos/ip_ttl settings
 *		Alan Cox	:	Fragmentation bogosity removed
 *					(Thanks to Mark.Bush@prg.ox.ac.uk)
 *		Dmitry Gorodchanin :	Send of a raw packet crash fix.
 *		Alan Cox	:	Silly ip bug when an overlength
 *					fragment turns up. Now frees the
 *					queue.
 *		Linus Torvalds/ :	Memory leakage on fragmentation
 *		Alan Cox	:	handling.
 *		Gerhard Koerting:	Forwarding uses IP priority hints
 *		Teemu Rantanen	:	Fragment problems.
 *		Alan Cox	:	General cleanup, comments and reformat
 *		Alan Cox	:	SNMP statistics
 *		Alan Cox	:	BSD address rule semantics. Also see
 *					UDP as there is a nasty checksum issue
 *					if you do things the wrong way.
 *		Alan Cox	:	Always defrag, moved IP_FORWARD to the config.in file
 *		Alan Cox	:	IP options adjust sk->priority.
 *		Pedro Roque	:	Fix mtu/length error in ip_forward.
 *		Alan Cox	:	Avoid ip_chk_addr when possible.
 *		Richard Underwood :	IP multicasting.
 *		Alan Cox	:	Cleaned up multicast handlers.
 *		Alan Cox	:	RAW sockets demultiplex in the BSD style.
 *		Gunther Mayer	:	Fix the SNMP reporting typo
 *		Alan Cox	:	Always in group 224.0.0.1
 *		Pauline Middelink :	Fast ip_checksum update when forwarding
 *					Masquerading support.
 *		Alan Cox	:	Multicast loopback error for 224.0.0.1
 *		Alan Cox	:	IP_MULTICAST_LOOP option.
 *		Alan Cox	:	Use notifiers.
 *		Bjorn Ekwall	:	Removed ip_csum (from slhc.c too)
 *		Bjorn Ekwall	:	Moved ip_fast_csum to ip.h (inline!)
 *		Stefan Becker	:	Send out ICMP HOST REDIRECT
 *		Arnt Gulbrandsen :	ip_build_xmit
 *		Alan Cox	:	Per socket routing cache
 *		Alan Cox	:	Fixed routing cache, added header cache.
 *		Alan Cox	:	Loopback didn't work right in original ip_build_xmit - fixed it.
 *		Alan Cox	:	Only send ICMP_REDIRECT if src/dest are the same net.
 *		Alan Cox	:	Incoming IP option handling.
 *		Alan Cox	:	Set saddr on raw output frames as per BSD.
 *		Alan Cox	:	Stopped broadcast source route explosions.
 *		Alan Cox	:	Can disable source routing
 *		Takeshi Sone	:	Masquerading didn't work.
 *	Dave Bonn, Alan Cox	:	Faster IP forwarding whenever possible.
 *		Alan Cox	:	Memory leaks, tramples, misc debugging.
 *		Alan Cox	:	Fixed multicast (by popular demand 8))
 *		Alan Cox	:	Fixed forwarding (by even more popular demand 8))
 *		Alan Cox	:	Fixed SNMP statistics [I think]
 *		Gerhard Koerting :	IP fragmentation forwarding fix
 *		Alan Cox	:	Device lock against page fault.
 *		Alan Cox	:	IP_HDRINCL facility.
 *		Werner Almesberger :	Zero fragment bug
 *		Alan Cox	:	RAW IP frame length bug
 *		Alan Cox	:	Outgoing firewall on build_xmit
 *		A.N.Kuznetsov	:	IP_OPTIONS support throughout the kernel
 *		Alan Cox	:	Multicast routing hooks
 *		Jos Vos		:	Do accounting *before* call_in_firewall
 *		Willy Konynenberg :	Transparent proxying support
 *
 *
 *
 * To Fix:
 *		IP fragmentation wants rewriting cleanly. The RFC815 algorithm is much more efficient
 *		and could be made very efficient with the addition of some virtual memory hacks to permit
 *		the allocation of a buffer that can then be 'grown' by twiddling page tables.
 *		Output fragmentation wants updating along with the buffer management to use a single
 *		interleaved copy algorithm so that fragmenting has a one copy overhead. Actual packet
 *		output should probably do its own fragmentation at the UDP/RAW layer. TCP shouldn't cause
 *		fragmentation anyway.
 *
 *		This program is free software; you can redistribute it and/or
 *		modify it under the terms of the GNU General Public License
 *		as published by the Free Software Foundation; either version
 *		2 of the License, or (at your option) any later version.
 */

#include <asm/system.h>
#include <linux/module.h>
#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/errno.h>
#include <linux/config.h>

#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sockios.h>
#include <linux/in.h>
#include <linux/inet.h>
#include <linux/inetdevice.h>
#include <linux/netdevice.h>
#include <linux/etherdevice.h>

#include <net/snmp.h>
#include <net/ip.h>
#include <net/protocol.h>
#include <net/route.h>
#include <linux/skbuff.h>
#include <net/sock.h>
#include <net/arp.h>
#include <net/icmp.h>
#include <net/raw.h>
#include <net/checksum.h>
#include <linux/netfilter_ipv4.h>
#include <net/xfrm.h>
#include <linux/mroute.h>
#include <linux/netlink.h>

/*
 *	SNMP management statistics
 */

DEFINE_SNMP_STAT(struct ipstats_mib, ip_statistics) __read_mostly;

/*
 *	Process Router Alert IP option
 */
int ip_call_ra_chain(struct sk_buff *skb)
{
	struct ip_ra_chain *ra;
	u8 protocol = skb->nh.iph->protocol;
	struct sock *last = NULL;

	read_lock(&ip_ra_lock);
	for (ra = ip_ra_chain; ra; ra = ra->next) {
		struct sock *sk = ra->sk;

		/* If socket is bound to an interface, only report
		 * the packet if it came from that interface.
		 */
		if (sk && inet_sk(sk)->num == protocol &&
		    (!sk->sk_bound_dev_if ||
		     sk->sk_bound_dev_if == skb->dev->ifindex)) {
			if (skb->nh.iph->frag_off & htons(IP_MF | IP_OFFSET)) {
				skb = ip_defrag(skb, IP_DEFRAG_CALL_RA_CHAIN);
				if (skb == NULL) {
					read_unlock(&ip_ra_lock);
					return 1;
				}
			}
			if (last) {
				struct sk_buff *skb2 = skb_clone(skb, GFP_ATOMIC);
				if (skb2)
					raw_rcv(last, skb2);
			}
			last = sk;
		}
	}

	if (last) {
		raw_rcv(last, skb);
		read_unlock(&ip_ra_lock);
		return 1;
	}
	read_unlock(&ip_ra_lock);
	return 0;
}

static inline int ip_local_deliver_finish(struct sk_buff *skb)
{
	int ihl = skb->nh.iph->ihl * 4;

	__skb_pull(skb, ihl);

	/* Point into the IP datagram, just past the header. */
	skb->h.raw = skb->data;

	rcu_read_lock();
	{
		/* Note: See raw.c and net/raw.h, RAWV4_HTABLE_SIZE==MAX_INET_PROTOS */
		int protocol = skb->nh.iph->protocol;
		int hash;
		struct sock *raw_sk;
		struct net_protocol *ipprot;

	resubmit:
		hash = protocol & (MAX_INET_PROTOS - 1);
		raw_sk = sk_head(&raw_v4_htable[hash]);

		/* If there may be a raw socket we must check; if not,
		 * we could not care less.
		 */
		if (raw_sk && !raw_v4_input(skb, skb->nh.iph, hash))
			raw_sk = NULL;

		if ((ipprot = rcu_dereference(inet_protos[hash])) != NULL) {
			int ret;

			if (!ipprot->no_policy) {
				if (!xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
					kfree_skb(skb);
					goto out;
				}
				nf_reset(skb);
			}
			ret = ipprot->handler(skb);
			if (ret < 0) {
				protocol = -ret;
				goto resubmit;
			}
			IP_INC_STATS_BH(IPSTATS_MIB_INDELIVERS);
		} else {
			if (!raw_sk) {
				if (xfrm4_policy_check(NULL, XFRM_POLICY_IN, skb)) {
					IP_INC_STATS_BH(IPSTATS_MIB_INUNKNOWNPROTOS);
					icmp_send(skb, ICMP_DEST_UNREACH,
						  ICMP_PROT_UNREACH, 0);
				}
			} else
				IP_INC_STATS_BH(IPSTATS_MIB_INDELIVERS);
			kfree_skb(skb);
		}
	}
 out:
	rcu_read_unlock();

	return 0;
}

/*
 *	Deliver IP Packets to the higher protocol layers.
 */
int ip_local_deliver(struct sk_buff *skb)
{
	/*
	 *	Reassemble IP fragments.
	 */

	if (skb->nh.iph->frag_off & htons(IP_MF | IP_OFFSET)) {
		skb = ip_defrag(skb, IP_DEFRAG_LOCAL_DELIVER);
		if (!skb)
			return 0;
	}

	return NF_HOOK(PF_INET, NF_IP_LOCAL_IN, skb, skb->dev, NULL,
		       ip_local_deliver_finish);
}

static inline int ip_rcv_options(struct sk_buff *skb)
{
	struct ip_options *opt;
	struct iphdr *iph;
	struct net_device *dev = skb->dev;

	/* This looks like overkill, because not all
	   IP options require packet mangling.
	   But it is the easiest approach for now, especially
	   since the combination of IP options and a running
	   sniffer is an extremely rare condition.
	                                      --ANK (980813)
	*/
	if (skb_cow(skb, skb_headroom(skb))) {
		IP_INC_STATS_BH(IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	iph = skb->nh.iph;

	if (ip_options_compile(NULL, skb)) {
		IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
		goto drop;
	}

	opt = &(IPCB(skb)->opt);
	if (unlikely(opt->srr)) {
		struct in_device *in_dev = in_dev_get(dev);
		if (in_dev) {
			if (!IN_DEV_SOURCE_ROUTE(in_dev)) {
				if (IN_DEV_LOG_MARTIANS(in_dev) &&
				    net_ratelimit())
					printk(KERN_INFO "source route option "
					       "%u.%u.%u.%u -> %u.%u.%u.%u\n",
					       NIPQUAD(iph->saddr),
					       NIPQUAD(iph->daddr));
				in_dev_put(in_dev);
				goto drop;
			}

			in_dev_put(in_dev);
		}

		if (ip_options_rcv_srr(skb))
			goto drop;
	}

	return 0;
drop:
	return -1;
}

static inline int ip_rcv_finish(struct sk_buff *skb)
{
	struct iphdr *iph = skb->nh.iph;

	/*
	 *	Initialise the virtual path cache for the packet. It describes
	 *	how the packet travels inside Linux networking.
	 */
	if (likely(skb->dst == NULL)) {
		int err = ip_route_input(skb, iph->daddr, iph->saddr, iph->tos,
					 skb->dev);
		if (unlikely(err)) {
			if (err == -EHOSTUNREACH)
				IP_INC_STATS_BH(IPSTATS_MIB_INADDRERRORS);
			goto drop;
		}
	}

#ifdef CONFIG_NET_CLS_ROUTE
	if (unlikely(skb->dst->tclassid)) {
		struct ip_rt_acct *st = ip_rt_acct + 256 * smp_processor_id();
		u32 idx = skb->dst->tclassid;
		st[idx & 0xFF].o_packets++;
		st[idx & 0xFF].o_bytes += skb->len;
		st[(idx >> 16) & 0xFF].i_packets++;
		st[(idx >> 16) & 0xFF].i_bytes += skb->len;
	}
#endif

	if (iph->ihl > 5 && ip_rcv_options(skb))
		goto drop;

	return dst_input(skb);

drop:
	kfree_skb(skb);
	return NET_RX_DROP;
}

/*
 *	Main IP Receive routine.
 */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
	struct iphdr *iph;
	u32 len;

	/* When the interface is in promisc. mode, drop all the crap
	 * that it receives, do not try to analyse it.
	 */
	if (skb->pkt_type == PACKET_OTHERHOST)
		goto drop;

	IP_INC_STATS_BH(IPSTATS_MIB_INRECEIVES);

	if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) {
		IP_INC_STATS_BH(IPSTATS_MIB_INDISCARDS);
		goto out;
	}

	if (!pskb_may_pull(skb, sizeof(struct iphdr)))
		goto inhdr_error;

	iph = skb->nh.iph;

	/*
	 *	RFC1122: 3.1.2.2 MUST silently discard any IP frame that fails the checksum.
	 *
	 *	Is the datagram acceptable?
	 *
	 *	1.	Length at least the size of an ip header
	 *	2.	Version of 4
	 *	3.	Checksums correctly. [Speed optimisation for later, skip loopback checksums]
	 *	4.	Doesn't have a bogus length
	 */

	if (iph->ihl < 5 || iph->version != 4)
		goto inhdr_error;

	if (!pskb_may_pull(skb, iph->ihl * 4))
		goto inhdr_error;

	iph = skb->nh.iph;

	if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
		goto inhdr_error;

	len = ntohs(iph->tot_len);
	if (skb->len < len || len < (iph->ihl * 4))
		goto inhdr_error;

	/* Our transport medium may have padded the buffer out. Now we know it
	 * is IP we can trim to the true length of the frame.
	 * Note this now means skb->len holds ntohs(iph->tot_len).
	 */
	if (pskb_trim_rcsum(skb, len)) {
		IP_INC_STATS_BH(IPSTATS_MIB_INDISCARDS);
		goto drop;
	}

	return NF_HOOK(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
		       ip_rcv_finish);

inhdr_error:
	IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
drop:
	kfree_skb(skb);
out:
	return NET_RX_DROP;
}

EXPORT_SYMBOL(ip_statistics);