2006-06-28 15:26:45 +04:00
/*
2005-04-17 02:20:36 +04:00
Copyright ( C ) 2002 Richard Henderson
Copyright ( C ) 2001 Rusty Russell , 2002 Rusty Russell IBM .
This program is free software ; you can redistribute it and / or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation ; either version 2 of the License , or
( at your option ) any later version .
This program is distributed in the hope that it will be useful ,
but WITHOUT ANY WARRANTY ; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE . See the
GNU General Public License for more details .
You should have received a copy of the GNU General Public License
along with this program ; if not , write to the Free Software
Foundation , Inc . , 59 Temple Place , Suite 330 , Boston , MA 02111 - 1307 USA
*/
# include <linux/module.h>
# include <linux/moduleloader.h>
2009-04-10 22:53:50 +04:00
# include <linux/ftrace_event.h>
2005-04-17 02:20:36 +04:00
# include <linux/init.h>
2007-05-08 11:28:38 +04:00
# include <linux/kallsyms.h>
2008-10-06 13:19:27 +04:00
# include <linux/fs.h>
2007-10-17 10:26:40 +04:00
# include <linux/sysfs.h>
2005-09-13 12:25:16 +04:00
# include <linux/kernel.h>
2005-04-17 02:20:36 +04:00
# include <linux/slab.h>
# include <linux/vmalloc.h>
# include <linux/elf.h>
2008-10-06 13:19:27 +04:00
# include <linux/proc_fs.h>
2005-04-17 02:20:36 +04:00
# include <linux/seq_file.h>
# include <linux/syscalls.h>
# include <linux/fcntl.h>
# include <linux/rcupdate.h>
2006-01-11 23:17:46 +03:00
# include <linux/capability.h>
2005-04-17 02:20:36 +04:00
# include <linux/cpu.h>
# include <linux/moduleparam.h>
# include <linux/errno.h>
# include <linux/err.h>
# include <linux/vermagic.h>
# include <linux/notifier.h>
2006-10-18 09:47:25 +04:00
# include <linux/sched.h>
2005-04-17 02:20:36 +04:00
# include <linux/stop_machine.h>
# include <linux/device.h>
[PATCH] modules: add version and srcversion to sysfs
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, determines whether your system has a "good" version of a
driver (i.e. whether the one provided by DKMS has a newer version than
that provided by the installed kernel package), and automatically
compiles and installs a newer version if DKMS has it but your kernel
doesn't yet have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 09:05:15 +04:00
# include <linux/string.h>
2006-03-23 14:00:24 +03:00
# include <linux/mutex.h>
2008-08-30 12:09:00 +04:00
# include <linux/rculist.h>
2005-04-17 02:20:36 +04:00
# include <asm/uaccess.h>
# include <asm/cacheflush.h>
2006-06-09 23:53:55 +04:00
# include <linux/license.h>
2008-02-08 15:18:42 +03:00
# include <asm/sections.h>
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
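The on/off behaviour described in the quoted documentation can be modelled in userspace. The following is only a minimal sketch under stated assumptions: `tracepoint_sketch`, `tp_attach`, `tp_detach`, `tp_hit` and `sum_probe` are hypothetical names, not the kernel's tracepoint API, a single probe is supported, and the RCU synchronization mentioned in this commit message is left out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe signature; a real tracepoint declares its own prototype. */
typedef void (*probe_fn)(int value, void *data);

/* Hypothetical stand-in for a tracepoint: a NULL probe means "off". */
struct tracepoint_sketch {
	probe_fn probe;
	void *data;
};

/* Turn the tracepoint "on" by connecting a probe (no locking in this sketch). */
static void tp_attach(struct tracepoint_sketch *tp, probe_fn fn, void *data)
{
	assert(fn != NULL);
	tp->data = data;
	tp->probe = fn;
}

/* Turn the tracepoint "off" again. */
static void tp_detach(struct tracepoint_sketch *tp)
{
	tp->probe = NULL;
	tp->data = NULL;
}

/*
 * Instrumentation site: when "off", the only cost is the memory read,
 * the test and the conditional branch. When "on", the probe runs in the
 * caller's context and then returns here.
 */
static inline void tp_hit(const struct tracepoint_sketch *tp, int value)
{
	if (tp->probe)
		tp->probe(value, tp->data);
}

/* Example probe: accumulates every traced value into a long. */
static void sum_probe(int value, void *data)
{
	*(long *)data += value;
}
```

With no probe attached, `tp_hit()` does nothing beyond the branch; after `tp_attach(&tp, sum_probe, &total)`, each hit adds its value to `total`.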
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This makes sure no type mismatch happens due to
connection of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint: same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch. Removing
the memory read makes the i-cache impact smaller (replacing the memory
read with a load immediate saves 3-4 bytes per site on x86_32,
depending on mov prefixes, or 7-8 bytes on x86_64) and also saves the
d-cache hit.
About the performance impact of tracepoints (which is comparable to
markers): even without the immediate values optimization, tests done by
Hideo Aoki on ia64 show no regression. His test case used hackbench
on a kernel where scheduler instrumentation (about 5 events in the
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below are the results. As you can see, no major performance regression
was found in any case. Even as the number of processes increases, the
differences between the marker-enabled kernel and the marker-disabled
kernel don't increase. Moreover, as the number of CPUs increases, the
differences don't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without      | with         | diff   | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
----------------------------------------------------------
       50 |        4.811 |        4.872 | +0.061 | +1.27 |
      100 |        9.854 |       10.309 | +0.454 | +4.61 |
      150 |       15.602 |       15.040 | -0.562 |  -3.6 |
      200 |       20.489 |       20.380 | -0.109 | -0.53 |
      250 |       25.798 |       25.652 | -0.146 | -0.56 |
      300 |       31.260 |       30.797 | -0.463 | -1.48 |
      350 |       36.121 |       35.770 | -0.351 | -0.97 |
      400 |       42.288 |       42.102 | -0.186 | -0.44 |
      450 |       47.778 |       47.253 | -0.526 |  -1.1 |
      500 |       51.953 |       52.278 | +0.325 | +0.63 |
      550 |       58.401 |       57.700 | -0.701 |  -1.2 |
      600 |       63.334 |       63.222 | -0.112 | -0.18 |
      650 |       68.816 |       68.511 | -0.306 | -0.44 |
      700 |       74.667 |       74.088 | -0.579 | -0.78 |
      750 |       78.612 |       79.582 | +0.970 | +1.23 |
      800 |       85.431 |       85.263 | -0.168 |  -0.2 |
----------------------------------------------------------
* 4 CPUs
Number of | without      | with         | diff   | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
----------------------------------------------------------
       50 |        2.586 |        2.584 | -0.003 |  -0.1 |
      100 |        5.254 |        5.283 | +0.030 | +0.56 |
      150 |        8.012 |        8.074 | +0.061 | +0.76 |
      200 |       11.172 |       11.000 | -0.172 | -1.54 |
      250 |       13.917 |       14.036 | +0.119 | +0.86 |
      300 |       16.905 |       16.543 | -0.362 | -2.14 |
      350 |       19.901 |       20.036 | +0.135 | +0.68 |
      400 |       22.908 |       23.094 | +0.186 | +0.81 |
      450 |       26.273 |       26.101 | -0.172 | -0.66 |
      500 |       29.554 |       29.092 | -0.461 | -1.56 |
      550 |       32.377 |       32.274 | -0.103 | -0.32 |
      600 |       35.855 |       35.322 | -0.533 | -1.49 |
      650 |       39.192 |       38.388 | -0.804 | -2.05 |
      700 |       41.744 |       41.719 | -0.025 | -0.06 |
      750 |       45.016 |       44.496 | -0.520 | -1.16 |
      800 |       48.212 |       47.603 | -0.609 | -1.26 |
----------------------------------------------------------
* 8 CPUs
Number of | without      | with         | diff   | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]  | [%]   |
----------------------------------------------------------
       50 |        2.094 |        2.072 | -0.022 | -1.07 |
      100 |        4.162 |        4.273 | +0.111 | +2.66 |
      150 |        6.485 |        6.540 | +0.055 | +0.84 |
      200 |        8.556 |        8.478 | -0.078 | -0.91 |
      250 |       10.458 |       10.258 | -0.200 | -1.91 |
      300 |       12.425 |       12.750 | +0.325 | +2.62 |
      350 |       14.807 |       14.839 | +0.032 | +0.22 |
      400 |       16.801 |       16.959 | +0.158 | +0.94 |
      450 |       19.478 |       19.009 | -0.470 | -2.41 |
      500 |       21.296 |       21.504 | +0.208 | +0.98 |
      550 |       23.842 |       23.979 | +0.137 | +0.57 |
      600 |       26.309 |       26.111 | -0.198 | -0.75 |
      650 |       28.705 |       28.446 | -0.259 |  -0.9 |
      700 |       31.233 |       31.394 | +0.161 | +0.52 |
      750 |       34.064 |       33.720 | -0.344 | -1.01 |
      800 |       36.320 |       36.114 | -0.206 | -0.57 |
----------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 20:16:16 +04:00
# include <linux/tracepoint.h>
2008-08-14 23:45:09 +04:00
# include <linux/ftrace.h>
2009-01-07 19:45:46 +03:00
# include <linux/async.h>
2009-02-20 10:29:08 +03:00
# include <linux/percpu.h>
2009-06-11 16:23:20 +04:00
# include <linux/kmemleak.h>
2005-04-17 02:20:36 +04:00
2009-08-17 12:56:28 +04:00
# define CREATE_TRACE_POINTS
# include <trace/events/module.h>
EXPORT_TRACEPOINT_SYMBOL ( module_get ) ;
2005-04-17 02:20:36 +04:00
#if 0
# define DEBUGP printk
# else
# define DEBUGP(fmt , a...)
# endif
# ifndef ARCH_SHF_SMALL
# define ARCH_SHF_SMALL 0
# endif
/* If this is set, the section belongs in the init part of the module */
# define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG-1))
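As a rough illustration of what the comment above describes, the top bit of an unsigned long can act as an "init" tag on an otherwise ordinary offset. This is a minimal userspace sketch, not the loader's code: `mark_init`, `is_init_offset` and `plain_offset` are hypothetical helpers, `BITS_PER_LONG` is redefined locally because the kernel header is not available, and where the tagged value is actually stored is not shown in this part of the file.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG (assumption). */
#define BITS_PER_LONG ((int)(sizeof(unsigned long) * CHAR_BIT))

/* Same definition as in the file: the most significant bit of a long. */
#define INIT_OFFSET_MASK (1UL << (BITS_PER_LONG - 1))

/* Hypothetical helper: tag an offset as belonging to the init part. */
static unsigned long mark_init(unsigned long offset)
{
	/* The offset itself must not already use the tag bit. */
	assert(!(offset & INIT_OFFSET_MASK));
	return offset | INIT_OFFSET_MASK;
}

/* Hypothetical helper: does this value carry the init tag? */
static bool is_init_offset(unsigned long value)
{
	return (value & INIT_OFFSET_MASK) != 0;
}

/* Hypothetical helper: strip the tag to recover the plain offset. */
static unsigned long plain_offset(unsigned long value)
{
	return value & ~INIT_OFFSET_MASK;
}
```

The design trades one bit of offset range for not needing a separate flag field per section.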
2007-07-16 10:41:46 +04:00
/* List of modules, protected by module_mutex or preempt_disable
2008-08-30 12:09:00 +04:00
* ( delete uses stop_machine / add uses RCU list operations ) . */
2008-12-06 03:03:59 +03:00
DEFINE_MUTEX ( module_mutex ) ;
EXPORT_SYMBOL_GPL ( module_mutex ) ;
2005-04-17 02:20:36 +04:00
static LIST_HEAD ( modules ) ;
2009-04-14 11:27:18 +04:00
/* Block module loading/unloading? */
int modules_disabled = 0 ;
2008-01-30 01:13:18 +03:00
/* Waiting for a module to finish initializing? */
static DECLARE_WAIT_QUEUE_HEAD ( module_wq ) ;
[PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent than calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c sleep_notifier_list
drivers/macintosh/via-pmu68k.c sleep_notifier_list
drivers/macintosh/windfarm_core.c wf_client_list
drivers/usb/core/notify.c usb_notifier_list
drivers/video/fbmem.c fb_notifier_list
kernel/cpu.c cpu_chain
kernel/module.c module_notify_list
kernel/profile.c munmap_notifier
kernel/profile.c task_exit_notifier
kernel/sys.c reboot_notifier_list
net/core/dev.c netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 13:16:30 +04:00
static BLOCKING_NOTIFIER_HEAD ( module_notify_list ) ;
2005-04-17 02:20:36 +04:00
2009-03-31 23:05:31 +04:00
/* Bounds of module allocation, for speeding __module_address */
2008-07-23 04:24:28 +04:00
static unsigned long module_addr_min = - 1UL , module_addr_max = 0 ;
2005-04-17 02:20:36 +04:00
int register_module_notifier ( struct notifier_block * nb )
{
2006-03-27 13:16:30 +04:00
return blocking_notifier_chain_register ( & module_notify_list , nb ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( register_module_notifier ) ;
int unregister_module_notifier ( struct notifier_block * nb )
{
2006-03-27 13:16:30 +04:00
return blocking_notifier_chain_unregister ( & module_notify_list , nb ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( unregister_module_notifier ) ;
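The two wrappers above only forward to the blocking notifier chain API. To make the register / call / unregister life cycle concrete, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: `notifier_sketch`, `chain_sketch`, `chain_register`, `chain_unregister`, `chain_call` and `count_events` are hypothetical names, the -1 error value is arbitrary, and the locking that the kernel's blocking chains use is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct notifier_block. */
struct notifier_sketch {
	int (*call)(struct notifier_sketch *self, unsigned long event, void *data);
	struct notifier_sketch *next;
};

/* Hypothetical chain head; the kernel version also carries a lock. */
struct chain_sketch {
	struct notifier_sketch *head;
};

/* Link a callback block at the front of the chain. */
static int chain_register(struct chain_sketch *c, struct notifier_sketch *n)
{
	assert(n->call != NULL);
	n->next = c->head;
	c->head = n;
	return 0;
}

/* Unlink a block; returns -1 (arbitrary here) if it was not on the chain. */
static int chain_unregister(struct chain_sketch *c, struct notifier_sketch *n)
{
	struct notifier_sketch **p;

	for (p = &c->head; *p; p = &(*p)->next) {
		if (*p == n) {
			*p = n->next;
			return 0;
		}
	}
	return -1;
}

/* Invoke every registered callback; returns how many were called. */
static int chain_call(struct chain_sketch *c, unsigned long event, void *data)
{
	struct notifier_sketch *n;
	int calls = 0;

	for (n = c->head; n; n = n->next) {
		n->call(n, event, data);
		calls++;
	}
	return calls;
}

/* Example callback: counts how many events it has seen. */
static int count_events(struct notifier_sketch *self, unsigned long event, void *data)
{
	(void)self;
	(void)event;
	++*(int *)data;
	return 0;
}
```

As the commit message notes for non-raw chains, a callback in this sketch must likewise not register or unregister entries on the chain that is calling it.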
2007-11-08 19:37:38 +03:00
/* We require a truly strong try_module_get(): returns 0 on success,
-EBUSY if the module is still initializing, or -ENOENT if
try_module_get() fails. */
2005-04-17 02:20:36 +04:00
static inline int strong_try_module_get ( struct module * mod )
{
if ( mod && mod->state == MODULE_STATE_COMING )
2008-01-30 01:13:18 +03:00
return - EBUSY ;
if ( try_module_get ( mod ) )
2005-04-17 02:20:36 +04:00
return 0 ;
2008-01-30 01:13:18 +03:00
else
return - ENOENT ;
2005-04-17 02:20:36 +04:00
}
2006-10-11 12:21:48 +04:00
static inline void add_taint_module ( struct module * mod , unsigned flag )
{
add_taint ( flag ) ;
2008-10-16 09:01:41 +04:00
mod->taints |= ( 1U << flag ) ;
2006-10-11 12:21:48 +04:00
}
2007-05-09 09:26:28 +04:00
/*
* A thread that wants to hold a reference to a module only while it
* is running can call this to safely exit . nfsd and lockd use this .
2005-04-17 02:20:36 +04:00
*/
void __module_put_and_exit ( struct module * mod , long code )
{
module_put ( mod ) ;
do_exit ( code ) ;
}
EXPORT_SYMBOL ( __module_put_and_exit ) ;
2007-10-18 14:06:07 +04:00
2005-04-17 02:20:36 +04:00
/* Find a module section: 0 means not found. */
static unsigned int find_sec ( Elf_Ehdr * hdr ,
Elf_Shdr * sechdrs ,
const char * secstrings ,
const char * name )
{
unsigned int i ;
for ( i = 1 ; i < hdr->e_shnum ; i++ )
/* Alloc bit cleared means "ignore it." */
if ( ( sechdrs [ i ] . sh_flags & SHF_ALLOC )
&& strcmp ( secstrings + sechdrs [ i ] . sh_name , name ) == 0 )
return i ;
return 0 ;
}
2008-10-22 19:00:13 +04:00
/* Find a module section, or NULL. */
static void * section_addr ( Elf_Ehdr * hdr , Elf_Shdr * shdrs ,
const char * secstrings , const char * name )
{
/* Section 0 has sh_addr 0. */
return ( void * ) shdrs [ find_sec ( hdr , shdrs , secstrings , name ) ] . sh_addr ;
}
/* Find a module section, or NULL. Fill in number of "objects" in section. */
static void * section_objs ( Elf_Ehdr * hdr ,
Elf_Shdr * sechdrs ,
const char * secstrings ,
const char * name ,
size_t object_size ,
unsigned int * num )
{
unsigned int sec = find_sec ( hdr , sechdrs , secstrings , name ) ;
/* Section 0 has sh_addr 0 and sh_size 0. */
* num = sechdrs [ sec ] . sh_size / object_size ;
return ( void * ) sechdrs [ sec ] . sh_addr ;
}
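The lookup helpers above rely on section 0 being a harmless "not found" result: its address and size are zero, so callers get NULL and a count of 0 without a separate error path. The following userspace sketch mirrors that logic on a hand-built section table. It is illustrative only: `shdr_sketch`, `find_sec_sketch` and `section_objs_sketch` are hypothetical names, the struct holds just the fields used here, and the section count is passed directly instead of being read from an ELF header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHF_ALLOC_SKETCH 0x2	/* same value as ELF SHF_ALLOC */

/* Hypothetical, trimmed-down section header with only the fields used here. */
struct shdr_sketch {
	uint32_t sh_name;	/* offset into the section-name string table */
	uint64_t sh_flags;
	uintptr_t sh_addr;
	size_t sh_size;
};

/* Return the index of the named, allocated section; 0 means not found. */
static unsigned int find_sec_sketch(const struct shdr_sketch *sechdrs,
				    unsigned int shnum,
				    const char *secstrings,
				    const char *name)
{
	unsigned int i;

	for (i = 1; i < shnum; i++)
		/* Alloc bit cleared means the section is ignored. */
		if ((sechdrs[i].sh_flags & SHF_ALLOC_SKETCH)
		    && strcmp(secstrings + sechdrs[i].sh_name, name) == 0)
			return i;
	return 0;
}

/* Return the section's address (or NULL) and its object count via *num. */
static void *section_objs_sketch(const struct shdr_sketch *sechdrs,
				 unsigned int shnum,
				 const char *secstrings,
				 const char *name,
				 size_t object_size,
				 unsigned int *num)
{
	unsigned int sec = find_sec_sketch(sechdrs, shnum, secstrings, name);

	assert(object_size != 0);
	/* Entry 0 has address 0 and size 0, so a miss yields NULL and 0. */
	*num = (unsigned int)(sechdrs[sec].sh_size / object_size);
	return (void *)sechdrs[sec].sh_addr;
}
```

A section whose alloc flag is clear is treated exactly like a missing one, which matches the comment in `find_sec`.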
2005-04-17 02:20:36 +04:00
/* Provided by the linker */
extern const struct kernel_symbol __start___ksymtab [ ] ;
extern const struct kernel_symbol __stop___ksymtab [ ] ;
extern const struct kernel_symbol __start___ksymtab_gpl [ ] ;
extern const struct kernel_symbol __stop___ksymtab_gpl [ ] ;
2006-03-21 00:17:13 +03:00
extern const struct kernel_symbol __start___ksymtab_gpl_future [ ] ;
extern const struct kernel_symbol __stop___ksymtab_gpl_future [ ] ;
2006-06-28 15:26:45 +04:00
extern const struct kernel_symbol __start___ksymtab_gpl_future [ ] ;
extern const struct kernel_symbol __stop___ksymtab_gpl_future [ ] ;
2005-04-17 02:20:36 +04:00
extern const unsigned long __start___kcrctab [ ] ;
extern const unsigned long __start___kcrctab_gpl [ ] ;
2006-03-21 00:17:13 +03:00
extern const unsigned long __start___kcrctab_gpl_future [ ] ;
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
extern const struct kernel_symbol __start___ksymtab_unused [ ] ;
extern const struct kernel_symbol __stop___ksymtab_unused [ ] ;
extern const struct kernel_symbol __start___ksymtab_unused_gpl [ ] ;
extern const struct kernel_symbol __stop___ksymtab_unused_gpl [ ] ;
2006-06-28 15:26:45 +04:00
extern const unsigned long __start___kcrctab_unused [ ] ;
extern const unsigned long __start___kcrctab_unused_gpl [ ] ;
2008-07-23 04:24:26 +04:00
# endif
2005-04-17 02:20:36 +04:00
# ifndef CONFIG_MODVERSIONS
# define symversion(base, idx) NULL
# else
2006-03-28 13:56:20 +04:00
# define symversion(base, idx) ((base != NULL) ? ((base) + (idx)) : NULL)
2005-04-17 02:20:36 +04:00
# endif
2008-07-23 04:24:25 +04:00
static bool each_symbol_in_section ( const struct symsearch * arr ,
unsigned int arrsize ,
struct module * owner ,
bool ( * fn ) ( const struct symsearch * syms ,
struct module * owner ,
unsigned int symnum , void * data ) ,
void * data )
2008-05-02 06:14:59 +04:00
{
2008-07-23 04:24:25 +04:00
unsigned int i , j ;
2008-05-02 06:14:59 +04:00
2008-07-23 04:24:25 +04:00
for ( j = 0 ; j < arrsize ; j + + ) {
for ( i = 0 ; i < arr [ j ] . stop - arr [ j ] . start ; i + + )
if ( fn ( & arr [ j ] , owner , i , data ) )
return true ;
2006-06-28 15:26:45 +04:00
}
2008-07-23 04:24:25 +04:00
return false ;
2008-05-02 06:14:59 +04:00
}
2008-07-23 04:24:25 +04:00
/* Returns true as soon as fn returns true, otherwise false. */
2008-12-06 03:03:59 +03:00
bool each_symbol ( bool ( * fn ) ( const struct symsearch * arr , struct module * owner ,
unsigned int symnum , void * data ) , void * data )
2008-05-02 06:14:59 +04:00
{
struct module * mod ;
const struct symsearch arr [ ] = {
{ __start___ksymtab , __stop___ksymtab , __start___kcrctab ,
2008-07-23 04:24:25 +04:00
NOT_GPL_ONLY , false } ,
2008-05-02 06:14:59 +04:00
{ __start___ksymtab_gpl , __stop___ksymtab_gpl ,
2008-07-23 04:24:25 +04:00
__start___kcrctab_gpl ,
GPL_ONLY , false } ,
2008-05-02 06:14:59 +04:00
{ __start___ksymtab_gpl_future , __stop___ksymtab_gpl_future ,
2008-07-23 04:24:25 +04:00
__start___kcrctab_gpl_future ,
WILL_BE_GPL_ONLY , false } ,
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
2008-05-02 06:14:59 +04:00
{ __start___ksymtab_unused , __stop___ksymtab_unused ,
2008-07-23 04:24:25 +04:00
__start___kcrctab_unused ,
NOT_GPL_ONLY , true } ,
2008-05-02 06:14:59 +04:00
{ __start___ksymtab_unused_gpl , __stop___ksymtab_unused_gpl ,
2008-07-23 04:24:25 +04:00
__start___kcrctab_unused_gpl ,
GPL_ONLY , true } ,
2008-07-23 04:24:26 +04:00
# endif
2008-05-02 06:14:59 +04:00
} ;
2006-06-28 15:26:45 +04:00
2008-07-23 04:24:25 +04:00
if ( each_symbol_in_section ( arr , ARRAY_SIZE ( arr ) , NULL , fn , data ) )
return true ;
2006-06-28 15:26:45 +04:00
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list ) {
2008-05-02 06:14:59 +04:00
struct symsearch arr [ ] = {
{ mod - > syms , mod - > syms + mod - > num_syms , mod - > crcs ,
2008-07-23 04:24:25 +04:00
NOT_GPL_ONLY , false } ,
2008-05-02 06:14:59 +04:00
{ mod - > gpl_syms , mod - > gpl_syms + mod - > num_gpl_syms ,
2008-07-23 04:24:25 +04:00
mod - > gpl_crcs ,
GPL_ONLY , false } ,
2008-05-02 06:14:59 +04:00
{ mod - > gpl_future_syms ,
mod - > gpl_future_syms + mod - > num_gpl_future_syms ,
2008-07-23 04:24:25 +04:00
mod - > gpl_future_crcs ,
WILL_BE_GPL_ONLY , false } ,
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
2008-05-02 06:14:59 +04:00
{ mod - > unused_syms ,
mod - > unused_syms + mod - > num_unused_syms ,
2008-07-23 04:24:25 +04:00
mod - > unused_crcs ,
NOT_GPL_ONLY , true } ,
2008-05-02 06:14:59 +04:00
{ mod - > unused_gpl_syms ,
mod - > unused_gpl_syms + mod - > num_unused_gpl_syms ,
2008-07-23 04:24:25 +04:00
mod - > unused_gpl_crcs ,
GPL_ONLY , true } ,
2008-07-23 04:24:26 +04:00
# endif
2008-05-02 06:14:59 +04:00
} ;
2008-07-23 04:24:25 +04:00
if ( each_symbol_in_section ( arr , ARRAY_SIZE ( arr ) , mod , fn , data ) )
return true ;
}
return false ;
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( each_symbol ) ;
2008-07-23 04:24:25 +04:00
struct find_symbol_arg {
/* Input */
const char * name ;
bool gplok ;
bool warn ;
/* Output */
struct module * owner ;
const unsigned long * crc ;
2008-12-06 03:03:56 +03:00
const struct kernel_symbol * sym ;
2008-07-23 04:24:25 +04:00
} ;
static bool find_symbol_in_section ( const struct symsearch * syms ,
struct module * owner ,
unsigned int symnum , void * data )
{
struct find_symbol_arg * fsa = data ;
if ( strcmp ( syms - > start [ symnum ] . name , fsa - > name ) ! = 0 )
return false ;
if ( ! fsa - > gplok ) {
if ( syms - > licence = = GPL_ONLY )
return false ;
if ( syms - > licence = = WILL_BE_GPL_ONLY & & fsa - > warn ) {
printk ( KERN_WARNING " Symbol %s is being used "
" by a non-GPL module, which will not "
" be allowed in the future \n " , fsa - > name ) ;
printk ( KERN_WARNING " Please see the file "
" Documentation/feature-removal-schedule.txt "
" in the kernel source tree for more details. \n " ) ;
2006-03-21 00:17:13 +03:00
}
2005-04-17 02:20:36 +04:00
}
2008-05-02 06:14:59 +04:00
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
2008-07-23 04:24:25 +04:00
if ( syms - > unused & & fsa - > warn ) {
printk ( KERN_WARNING " Symbol %s is marked as UNUSED, "
" however this module is using it. \n " , fsa - > name ) ;
printk ( KERN_WARNING
" This symbol will go away in the future. \n " ) ;
printk ( KERN_WARNING
" Please evaluate if this is the right api to use and if "
" it really is, submit a report to the linux kernel "
" mailing list together with submitting your code for "
" inclusion. \n " ) ;
}
2008-07-23 04:24:26 +04:00
# endif
2008-07-23 04:24:25 +04:00
fsa - > owner = owner ;
fsa - > crc = symversion ( syms - > crcs , symnum ) ;
2008-12-06 03:03:56 +03:00
fsa - > sym = & syms - > start [ symnum ] ;
2008-07-23 04:24:25 +04:00
return true ;
}
2008-12-06 03:03:56 +03:00
/* Find a symbol and return it, along with the (optional) crc and
 * (optional) module which owns it. */
2008-12-06 03:03:59 +03:00
const struct kernel_symbol * find_symbol ( const char * name ,
struct module * * owner ,
const unsigned long * * crc ,
bool gplok ,
bool warn )
2008-07-23 04:24:25 +04:00
{
struct find_symbol_arg fsa ;
fsa . name = name ;
fsa . gplok = gplok ;
fsa . warn = warn ;
if ( each_symbol ( find_symbol_in_section , & fsa ) ) {
if ( owner )
* owner = fsa . owner ;
if ( crc )
* crc = fsa . crc ;
2008-12-06 03:03:56 +03:00
return fsa . sym ;
2008-07-23 04:24:25 +04:00
}
2005-04-17 02:20:36 +04:00
DEBUGP ( " Failed to find symbol %s \n " , name ) ;
2008-12-06 03:03:56 +03:00
return NULL ;
2005-04-17 02:20:36 +04:00
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( find_symbol ) ;
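The callback-driven search that each_symbol() and find_symbol() implement can be sketched in userspace as follows. This is an illustration under stated assumptions, not the kernel's code: struct mock_sym, struct mock_table, mock_each_symbol() and mock_find() are hypothetical names, and the tables are plain arrays rather than linker-provided sections.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct mock_sym {
	const char *name;
	unsigned long value;
};

/* One [start, stop) range of symbols plus a licence-like tag. */
struct mock_table {
	const struct mock_sym *start, *stop;
	bool gpl_only;
};

typedef bool (*mock_fn)(const struct mock_table *tab,
			unsigned int symnum, void *data);

/* Visit every symbol of every table; stop as soon as fn returns true. */
static bool mock_each_symbol(const struct mock_table *arr, size_t arrsize,
			     mock_fn fn, void *data)
{
	size_t j;
	unsigned int i;

	for (j = 0; j < arrsize; j++)
		for (i = 0; i < (unsigned int)(arr[j].stop - arr[j].start); i++)
			if (fn(&arr[j], i, data))
				return true;
	return false;
}

struct mock_find_arg {
	const char *name;		/* input */
	bool gplok;			/* input */
	const struct mock_sym *sym;	/* output */
};

/* Callback: accept a name match unless the licence tag forbids it. */
static bool mock_match(const struct mock_table *tab,
		       unsigned int symnum, void *data)
{
	struct mock_find_arg *arg = data;

	if (strcmp(tab->start[symnum].name, arg->name) != 0)
		return false;
	if (!arg->gplok && tab->gpl_only)
		return false;	/* keep searching the remaining tables */
	arg->sym = &tab->start[symnum];
	return true;
}

static const struct mock_sym *mock_find(const struct mock_table *arr,
					size_t arrsize, const char *name,
					bool gplok)
{
	struct mock_find_arg arg = { name, gplok, NULL };

	return mock_each_symbol(arr, arrsize, mock_match, &arg) ? arg.sym : NULL;
}
```

Passing inputs and outputs through one opaque argument struct keeps the iterator generic: other callers can reuse it with a different callback and a different struct.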
2005-04-17 02:20:36 +04:00
/* Search for module by name: must hold module_mutex. */
2008-12-06 03:03:59 +03:00
struct module * find_module ( const char * name )
2005-04-17 02:20:36 +04:00
{
struct module * mod ;
list_for_each_entry ( mod , & modules , list ) {
if ( strcmp ( mod - > name , name ) = = 0 )
return mod ;
}
return NULL ;
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( find_module ) ;
2005-04-17 02:20:36 +04:00
# ifdef CONFIG_SMP
2009-02-20 10:29:08 +03:00
# ifdef CONFIG_HAVE_DYNAMIC_PER_CPU_AREA
static void * percpu_modalloc ( unsigned long size , unsigned long align ,
const char * name )
{
void * ptr ;
if ( align > PAGE_SIZE ) {
printk ( KERN_WARNING " %s: per-cpu alignment %li > %li \n " ,
name , align , PAGE_SIZE ) ;
align = PAGE_SIZE ;
}
2009-03-06 08:33:59 +03:00
ptr = __alloc_reserved_percpu ( size , align ) ;
2009-02-20 10:29:08 +03:00
if ( ! ptr )
printk ( KERN_WARNING
" Could not allocate %lu bytes percpu data \n " , size ) ;
return ptr ;
}
static void percpu_modfree ( void * freeme )
{
free_percpu ( freeme ) ;
}
# else /* ... !CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
2005-04-17 02:20:36 +04:00
/* Number of blocks used and allocated. */
static unsigned int pcpu_num_used , pcpu_num_allocated ;
/* Size of each block. -ve means used. */
static int * pcpu_size ;
static int split_block ( unsigned int i , unsigned short size )
{
/* Reallocation required? */
if ( pcpu_num_used + 1 > pcpu_num_allocated ) {
2007-05-08 11:24:58 +04:00
int * new ;
new = krealloc ( pcpu_size , sizeof ( new [ 0 ] ) * pcpu_num_allocated * 2 ,
GFP_KERNEL ) ;
2005-04-17 02:20:36 +04:00
if ( ! new )
return 0 ;
pcpu_num_allocated * = 2 ;
pcpu_size = new ;
}
/* Insert a new subblock */
memmove ( & pcpu_size [ i + 1 ] , & pcpu_size [ i ] ,
sizeof ( pcpu_size [ 0 ] ) * ( pcpu_num_used - i ) ) ;
pcpu_num_used + + ;
pcpu_size [ i + 1 ] - = size ;
pcpu_size [ i ] = size ;
return 1 ;
}
static inline unsigned int block_size ( int val )
{
if ( val < 0 )
return - val ;
return val ;
}
2005-08-02 08:11:47 +04:00
static void * percpu_modalloc ( unsigned long size , unsigned long align ,
const char * name )
2005-04-17 02:20:36 +04:00
{
unsigned long extra ;
unsigned int i ;
void * ptr ;
2009-06-11 16:23:20 +04:00
int cpu ;
2005-04-17 02:20:36 +04:00
2007-05-02 21:27:12 +04:00
if ( align > PAGE_SIZE ) {
printk ( KERN_WARNING " %s: per-cpu alignment %li > %li \n " ,
name , align , PAGE_SIZE ) ;
align = PAGE_SIZE ;
2005-08-02 08:11:47 +04:00
}
2005-04-17 02:20:36 +04:00
ptr = __per_cpu_start ;
for ( i = 0 ; i < pcpu_num_used ; ptr + = block_size ( pcpu_size [ i ] ) , i + + ) {
/* Extra for alignment requirement. */
extra = ALIGN ( ( unsigned long ) ptr , align ) - ( unsigned long ) ptr ;
BUG_ON ( i = = 0 & & extra ! = 0 ) ;
if ( pcpu_size [ i ] < 0 | | pcpu_size [ i ] < extra + size )
continue ;
/* Transfer extra to previous block. */
if ( pcpu_size [ i - 1 ] < 0 )
pcpu_size [ i - 1 ] - = extra ;
else
pcpu_size [ i - 1 ] + = extra ;
pcpu_size [ i ] - = extra ;
ptr + = extra ;
/* Split block if warranted */
if ( pcpu_size [ i ] - size > sizeof ( unsigned long ) )
if ( ! split_block ( i , size ) )
return NULL ;
2009-06-11 16:23:20 +04:00
/* add the per-cpu scanning areas */
for_each_possible_cpu ( cpu )
kmemleak_alloc ( ptr + per_cpu_offset ( cpu ) , size , 0 ,
GFP_KERNEL ) ;
2005-04-17 02:20:36 +04:00
/* Mark allocated */
pcpu_size [ i ] = - pcpu_size [ i ] ;
return ptr ;
}
printk ( KERN_WARNING " Could not allocate %lu bytes percpu data \n " ,
size ) ;
return NULL ;
}
static void percpu_modfree ( void * freeme )
{
unsigned int i ;
void * ptr = __per_cpu_start + block_size ( pcpu_size [ 0 ] ) ;
2009-06-11 16:23:20 +04:00
int cpu ;
2005-04-17 02:20:36 +04:00
/* First entry is core kernel percpu data. */
for ( i = 1 ; i < pcpu_num_used ; ptr + = block_size ( pcpu_size [ i ] ) , i + + ) {
if ( ptr = = freeme ) {
pcpu_size [ i ] = - pcpu_size [ i ] ;
goto free ;
}
}
BUG ( ) ;
free :
2009-06-11 16:23:20 +04:00
/* remove the per-cpu scanning areas */
for_each_possible_cpu ( cpu )
kmemleak_free ( freeme + per_cpu_offset ( cpu ) ) ;
2005-04-17 02:20:36 +04:00
/* Merge with previous? */
if ( pcpu_size [ i - 1 ] > = 0 ) {
pcpu_size [ i - 1 ] + = pcpu_size [ i ] ;
pcpu_num_used - - ;
memmove ( & pcpu_size [ i ] , & pcpu_size [ i + 1 ] ,
( pcpu_num_used - i ) * sizeof ( pcpu_size [ 0 ] ) ) ;
i - - ;
}
/* Merge with next? */
if ( i + 1 < pcpu_num_used & & pcpu_size [ i + 1 ] > = 0 ) {
pcpu_size [ i ] + = pcpu_size [ i + 1 ] ;
pcpu_num_used - - ;
memmove ( & pcpu_size [ i + 1 ] , & pcpu_size [ i + 2 ] ,
( pcpu_num_used - ( i + 1 ) ) * sizeof ( pcpu_size [ 0 ] ) ) ;
}
}
static int percpu_modinit ( void )
{
pcpu_num_used = 2 ;
pcpu_num_allocated = 2 ;
pcpu_size = kmalloc ( sizeof ( pcpu_size [ 0 ] ) * pcpu_num_allocated ,
GFP_KERNEL ) ;
/* Static in-kernel percpu data (used). */
2007-05-02 21:27:11 +04:00
pcpu_size [ 0 ] = - ( __per_cpu_end - __per_cpu_start ) ;
2005-04-17 02:20:36 +04:00
/* Free room. */
pcpu_size [ 1 ] = PERCPU_ENOUGH_ROOM + pcpu_size [ 0 ] ;
if ( pcpu_size [ 1 ] < 0 ) {
printk ( KERN_ERR " No per-cpu room for modules. \n " ) ;
pcpu_num_used = 1 ;
}
return 0 ;
2007-10-18 14:06:07 +04:00
}
2005-04-17 02:20:36 +04:00
__initcall ( percpu_modinit ) ;
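The bookkeeping behind the allocator above (an array of block sizes in address order, where a negative size marks a used block) can be sketched in userspace as follows. This is a simplified illustration, not the kernel's implementation: it ignores alignment, uses a fixed-capacity array instead of krealloc(), and returns byte offsets rather than pointers. All mock_* names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_BLOCKS 16

/* Block sizes in address order; a negative size marks a used block. */
struct mock_pool {
	int size[MOCK_MAX_BLOCKS];
	unsigned int num_used;
};

static unsigned int mock_block_size(int val)
{
	return val < 0 ? -val : val;
}

/* Start with one used block (the "static" area) followed by free room. */
static void mock_pool_init(struct mock_pool *p, int static_size, int total)
{
	p->size[0] = -static_size;
	p->size[1] = total - static_size;
	p->num_used = 2;
}

/* First-fit allocation; returns the byte offset, or -1 on failure. */
static long mock_alloc(struct mock_pool *p, int size)
{
	unsigned int i;
	long off = 0;

	if (size <= 0)
		return -1;
	for (i = 0; i < p->num_used; off += mock_block_size(p->size[i]), i++) {
		if (p->size[i] < size)	/* used (negative) or too small */
			continue;
		if (p->size[i] > size) {
			/* Split: the tail becomes a new free block. */
			if (p->num_used == MOCK_MAX_BLOCKS)
				return -1;
			memmove(&p->size[i + 1], &p->size[i],
				sizeof(p->size[0]) * (p->num_used - i));
			p->num_used++;
			p->size[i + 1] -= size;
			p->size[i] = size;
		}
		p->size[i] = -p->size[i];	/* mark allocated */
		return off;
	}
	return -1;
}

/* Free the used block at offset off and merge it with free neighbours. */
static int mock_free(struct mock_pool *p, long off)
{
	unsigned int i;
	long cur = mock_block_size(p->size[0]);

	/* Block 0 is the static area and is never freed. */
	for (i = 1; i < p->num_used; cur += mock_block_size(p->size[i]), i++)
		if (cur == off && p->size[i] < 0)
			break;
	if (i == p->num_used)
		return -1;
	p->size[i] = -p->size[i];

	/* Merge with previous? */
	if (p->size[i - 1] >= 0) {
		p->size[i - 1] += p->size[i];
		p->num_used--;
		memmove(&p->size[i], &p->size[i + 1],
			(p->num_used - i) * sizeof(p->size[0]));
		i--;
	}
	/* Merge with next? */
	if (i + 1 < p->num_used && p->size[i + 1] >= 0) {
		p->size[i] += p->size[i + 1];
		p->num_used--;
		memmove(&p->size[i + 1], &p->size[i + 2],
			(p->num_used - (i + 1)) * sizeof(p->size[0]));
	}
	return 0;
}
```

Encoding the used/free state in the sign keeps the metadata to one int per block, at the cost of a linear walk on every allocation and free.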
2009-02-20 10:29:07 +03:00
2009-02-20 10:29:08 +03:00
# endif /* CONFIG_HAVE_DYNAMIC_PER_CPU_AREA */
2009-02-20 10:29:07 +03:00
static unsigned int find_pcpusec ( Elf_Ehdr * hdr ,
Elf_Shdr * sechdrs ,
const char * secstrings )
{
return find_sec ( hdr , sechdrs , secstrings , " .data.percpu " ) ;
}
static void percpu_modcopy ( void * pcpudest , const void * from , unsigned long size )
{
int cpu ;
for_each_possible_cpu ( cpu )
memcpy ( pcpudest + per_cpu_offset ( cpu ) , from , size ) ;
}
2005-04-17 02:20:36 +04:00
# else /* ... !CONFIG_SMP */
2009-02-20 10:29:07 +03:00
2005-08-02 08:11:47 +04:00
static inline void * percpu_modalloc ( unsigned long size , unsigned long align ,
const char * name )
2005-04-17 02:20:36 +04:00
{
return NULL ;
}
static inline void percpu_modfree ( void * pcpuptr )
{
BUG ( ) ;
}
static inline unsigned int find_pcpusec ( Elf_Ehdr * hdr ,
Elf_Shdr * sechdrs ,
const char * secstrings )
{
return 0 ;
}
static inline void percpu_modcopy ( void * pcpudst , const void * src ,
unsigned long size )
{
/* pcpusec should be 0, and size of that section should be 0. */
BUG_ON ( size ! = 0 ) ;
}
2009-02-20 10:29:07 +03:00
2005-04-17 02:20:36 +04:00
# endif /* CONFIG_SMP */
[PATCH] modules: add version and srcversion to sysfs
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, determines if your system has a "good" version of a
driver (i.e. if the one provided by DKMS has a newer version than that
provided by the kernel package installed), and automatically compiles
and installs a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 09:05:15 +04:00
# define MODINFO_ATTR(field) \
static void setup_modinfo_ # # field ( struct module * mod , const char * s ) \
{ \
mod - > field = kstrdup ( s , GFP_KERNEL ) ; \
} \
static ssize_t show_modinfo_ # # field ( struct module_attribute * mattr , \
struct module * mod , char * buffer ) \
{ \
return sprintf ( buffer , " %s \n " , mod - > field ) ; \
} \
static int modinfo_ # # field # # _exists ( struct module * mod ) \
{ \
return mod - > field ! = NULL ; \
} \
static void free_modinfo_ # # field ( struct module * mod ) \
{ \
2007-10-18 14:06:07 +04:00
kfree ( mod - > field ) ; \
mod - > field = NULL ; \
2005-06-24 09:05:15 +04:00
} \
static struct module_attribute modinfo_ # # field = { \
2007-06-13 22:45:17 +04:00
. attr = { . name = __stringify ( field ) , . mode = 0444 } , \
2005-06-24 09:05:15 +04:00
. show = show_modinfo_ # # field , \
. setup = setup_modinfo_ # # field , \
. test = modinfo_ # # field # # _exists , \
. free = free_modinfo_ # # field , \
} ;
MODINFO_ATTR ( version ) ;
MODINFO_ATTR ( srcversion ) ;
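The version and srcversion attributes defined above appear as files under /sys/module/${modulename}. A userspace reader might look like the sketch below. The helper names module_attr_path() and read_module_attr() are hypothetical, and the root directory is a parameter only so that the code can be exercised against a directory other than /sys/module.

```c
#include <stdio.h>
#include <string.h>

/* Build "<root>/<module>/<attr>"; returns 0, or -1 if it does not fit. */
static int module_attr_path(char *buf, size_t len, const char *root,
			    const char *module, const char *attr)
{
	int n = snprintf(buf, len, "%s/%s/%s", root, module, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read one attribute into out, without the trailing newline.
 * Returns 0 on success, -1 if the file is missing or unreadable.
 */
static int read_module_attr(const char *root, const char *module,
			    const char *attr, char *out, size_t len)
{
	char path[256];
	FILE *f;

	if (len == 0 ||
	    module_attr_path(path, sizeof(path), root, module, attr) < 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	out[strcspn(out, "\n")] = '\0';
	return 0;
}
```

A call such as read_module_attr("/sys/module", "e1000", "version", buf, sizeof(buf)) fails when the module is not loaded or carries no version field, since the file exists only if the modinfo field is present.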
2008-01-25 23:08:33 +03:00
static char last_unloaded_module [ MODULE_NAME_LEN + 1 ] ;
2006-02-17 00:50:23 +03:00
# ifdef CONFIG_MODULE_UNLOAD
2005-04-17 02:20:36 +04:00
/* Init the unload section of the module. */
static void module_unload_init ( struct module * mod )
{
2009-02-03 06:01:36 +03:00
int cpu ;
2005-04-17 02:20:36 +04:00
INIT_LIST_HEAD ( & mod - > modules_which_use_me ) ;
2009-02-03 06:01:36 +03:00
for_each_possible_cpu ( cpu )
local_set ( __module_ref_addr ( mod , cpu ) , 0 ) ;
2005-04-17 02:20:36 +04:00
/* Hold reference count during initialization. */
2009-02-03 06:01:36 +03:00
local_set ( __module_ref_addr ( mod , raw_smp_processor_id ( ) ) , 1 ) ;
2005-04-17 02:20:36 +04:00
/* Backwards compatibility macros put refcount during init. */
mod - > waiter = current ;
}
/* modules using other modules */
struct module_use
{
struct list_head list ;
struct module * module_which_uses ;
} ;
/* Does a already use b? */
static int already_uses ( struct module * a , struct module * b )
{
struct module_use * use ;
list_for_each_entry ( use , & b - > modules_which_use_me , list ) {
if ( use - > module_which_uses = = a ) {
DEBUGP ( " %s uses %s! \n " , a - > name , b - > name ) ;
return 1 ;
}
}
DEBUGP ( " %s does not use %s! \n " , a - > name , b - > name ) ;
return 0 ;
}
/* Module a uses b */
2008-12-06 03:03:59 +03:00
int use_module ( struct module * a , struct module * b )
2005-04-17 02:20:36 +04:00
{
struct module_use * use ;
2008-01-30 01:13:18 +03:00
int no_warn , err ;
2007-01-18 15:26:15 +03:00
2005-04-17 02:20:36 +04:00
if ( b = = NULL | | already_uses ( a , b ) ) return 1 ;
2008-01-30 01:13:18 +03:00
/* If we're interrupted or time out, we fail. */
if ( wait_event_interruptible_timeout (
module_wq , ( err = strong_try_module_get ( b ) ) ! = - EBUSY ,
30 * HZ ) < = 0 ) {
printk ( " %s: gave up waiting for init of module %s. \n " ,
a - > name , b - > name ) ;
return 0 ;
}
/* If strong_try_module_get() returned a different error, we fail. */
if ( err )
2005-04-17 02:20:36 +04:00
return 0 ;
DEBUGP ( " Allocating new usage for %s. \n " , a - > name ) ;
use = kmalloc ( sizeof ( * use ) , GFP_ATOMIC ) ;
if ( ! use ) {
printk ( " %s: out of memory loading \n " , a - > name ) ;
module_put ( b ) ;
return 0 ;
}
use - > module_which_uses = a ;
list_add ( & use - > list , & b - > modules_which_use_me ) ;
2007-01-18 15:26:15 +03:00
no_warn = sysfs_create_link ( b - > holders_dir , & a - > mkobj . kobj , a - > name ) ;
2005-04-17 02:20:36 +04:00
return 1 ;
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( use_module ) ;
2005-04-17 02:20:36 +04:00
/* Clear the unload stuff of the module. */
static void module_unload_free ( struct module * mod )
{
struct module * i ;
list_for_each_entry ( i , & modules , list ) {
struct module_use * use ;
list_for_each_entry ( use , & i - > modules_which_use_me , list ) {
if ( use - > module_which_uses = = mod ) {
DEBUGP ( " %s unusing %s \n " , mod - > name , i - > name ) ;
module_put ( i ) ;
list_del ( & use - > list ) ;
kfree ( use ) ;
2007-01-18 15:26:15 +03:00
sysfs_remove_link ( i - > holders_dir , mod - > name ) ;
2005-04-17 02:20:36 +04:00
/* There can be at most one match. */
break ;
}
}
}
}
# ifdef CONFIG_MODULE_FORCE_UNLOAD
2006-01-08 12:04:29 +03:00
static inline int try_force_unload ( unsigned int flags )
2005-04-17 02:20:36 +04:00
{
int ret = ( flags & O_TRUNC ) ;
if ( ret )
2006-01-08 12:04:29 +03:00
add_taint ( TAINT_FORCED_RMMOD ) ;
2005-04-17 02:20:36 +04:00
return ret ;
}
# else
2006-01-08 12:04:29 +03:00
static inline int try_force_unload ( unsigned int flags )
2005-04-17 02:20:36 +04:00
{
return 0 ;
}
# endif /* CONFIG_MODULE_FORCE_UNLOAD */
struct stopref
{
struct module * mod ;
int flags ;
int * forced ;
} ;
/* Whole machine is stopped with interrupts off when this runs. */
static int __try_stop_module ( void * _sref )
{
struct stopref * sref = _sref ;
2008-07-23 04:24:25 +04:00
/* If it's not unused, quit unless we're forcing. */
if ( module_refcount ( sref - > mod ) ! = 0 ) {
2006-01-08 12:04:29 +03:00
if ( ! ( * sref - > forced = try_force_unload ( sref - > flags ) ) )
2005-04-17 02:20:36 +04:00
return - EWOULDBLOCK ;
}
/* Mark it as dying. */
sref - > mod - > state = MODULE_STATE_GOING ;
return 0 ;
}
static int try_stop_module ( struct module * mod , int flags , int * forced )
{
2008-07-23 04:24:25 +04:00
if ( flags & O_NONBLOCK ) {
struct stopref sref = { mod , flags , forced } ;
2005-04-17 02:20:36 +04:00
2008-07-28 21:16:30 +04:00
return stop_machine ( __try_stop_module , & sref , NULL ) ;
2008-07-23 04:24:25 +04:00
} else {
/* We don't need to stop the machine for this. */
mod - > state = MODULE_STATE_GOING ;
synchronize_sched ( ) ;
return 0 ;
}
2005-04-17 02:20:36 +04:00
}
unsigned int module_refcount ( struct module * mod )
{
2009-02-03 06:01:36 +03:00
unsigned int total = 0 ;
int cpu ;
2005-04-17 02:20:36 +04:00
2009-02-03 06:01:36 +03:00
for_each_possible_cpu ( cpu )
total + = local_read ( __module_ref_addr ( mod , cpu ) ) ;
2005-04-17 02:20:36 +04:00
return total ;
}
EXPORT_SYMBOL ( module_refcount ) ;
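The per-CPU counting scheme that module_refcount() sums over can be sketched as below. Each CPU adjusts only its own counter, so a single counter may go negative; only the total is meaningful. This is a single-threaded illustration with hypothetical names (struct mock_ref, mock_get(), mock_put()); the kernel uses local_t counters and pins the CPU with get_cpu(), which the sketch does not attempt to model.

```c
#define MOCK_NR_CPUS 4

/* One counter per CPU; only the sum is meaningful. */
struct mock_ref {
	long count[MOCK_NR_CPUS];
};

static void mock_ref_init(struct mock_ref *r, unsigned int cpu)
{
	unsigned int i;

	for (i = 0; i < MOCK_NR_CPUS; i++)
		r->count[i] = 0;
	/* Hold one reference during init, as module_unload_init() does. */
	r->count[cpu] = 1;
}

/* Take a reference on behalf of the given CPU. */
static void mock_get(struct mock_ref *r, unsigned int cpu)
{
	r->count[cpu]++;
}

/* Drop a reference; it need not be the CPU that took it. */
static void mock_put(struct mock_ref *r, unsigned int cpu)
{
	r->count[cpu]--;
}

/* Sum all counters; unsigned wraparound cancels negative entries. */
static unsigned int mock_refcount(const struct mock_ref *r)
{
	unsigned int total = 0;
	unsigned int i;

	for (i = 0; i < MOCK_NR_CPUS; i++)
		total += (unsigned int)r->count[i];
	return total;
}
```

The trade-off is cheap get and put operations, which touch only CPU-local data, against a more expensive read that must visit every possible CPU.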
/* This exists whether we can unload or not */
static void free_module ( struct module * mod ) ;
static void wait_for_zero_refcount ( struct module * mod )
{
2008-02-26 18:47:18 +03:00
/* Since we might sleep for some time, release the mutex first */
2006-03-23 14:00:46 +03:00
mutex_unlock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
for ( ; ; ) {
DEBUGP ( " Looking at refcount... \n " ) ;
set_current_state ( TASK_UNINTERRUPTIBLE ) ;
if ( module_refcount ( mod ) = = 0 )
break ;
schedule ( ) ;
}
current - > state = TASK_RUNNING ;
2006-03-23 14:00:46 +03:00
mutex_lock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
}
2009-01-14 16:14:10 +03:00
SYSCALL_DEFINE2 ( delete_module , const char __user * , name_user ,
unsigned int , flags )
2005-04-17 02:20:36 +04:00
{
struct module * mod ;
2007-02-24 01:54:57 +03:00
char name [ MODULE_NAME_LEN ] ;
2005-04-17 02:20:36 +04:00
int ret , forced = 0 ;
2009-04-03 02:49:29 +04:00
if ( ! capable ( CAP_SYS_MODULE ) | | modules_disabled )
2007-02-24 01:54:57 +03:00
return - EPERM ;
if ( strncpy_from_user ( name , name_user , MODULE_NAME_LEN - 1 ) < 0 )
return - EFAULT ;
name [ MODULE_NAME_LEN - 1 ] = ' \0 ' ;
2008-12-22 14:36:31 +03:00
/* Create stop_machine threads since free_module relies on
 * a non-failing stop_machine call. */
ret = stop_machine_create ( ) ;
if ( ret )
return ret ;
if ( mutex_lock_interruptible ( & module_mutex ) ! = 0 ) {
ret = - EINTR ;
goto out_stop ;
}
2005-04-17 02:20:36 +04:00
mod = find_module ( name ) ;
if ( ! mod ) {
ret = - ENOENT ;
goto out ;
}
if ( ! list_empty ( & mod - > modules_which_use_me ) ) {
/* Other modules depend on us: get rid of them first. */
ret = - EWOULDBLOCK ;
goto out ;
}
/* Doing init or already dying? */
if ( mod - > state ! = MODULE_STATE_LIVE ) {
/* FIXME: if (force), slam module count and wake up
   waiter --RR */
DEBUGP ( " %s already dying \n " , mod - > name ) ;
ret = - EBUSY ;
goto out ;
}
/* If it has an init func, it must have an exit func to unload */
2007-10-17 10:26:27 +04:00
if ( mod - > init & & ! mod - > exit ) {
2006-01-08 12:04:29 +03:00
forced = try_force_unload ( flags ) ;
2005-04-17 02:20:36 +04:00
if ( ! forced ) {
/* This module can't be removed */
ret = - EBUSY ;
goto out ;
}
}
/* Set this up before setting mod->state */
mod - > waiter = current ;
/* Stop the machine so refcounts can't move and disable module. */
ret = try_stop_module ( mod , flags , & forced ) ;
if ( ret ! = 0 )
goto out ;
/* Never wait if forced. */
if ( ! forced & & module_refcount ( mod ) ! = 0 )
wait_for_zero_refcount ( mod ) ;
2008-04-21 16:34:31 +04:00
mutex_unlock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
/* Final destruction now that no one is using it. */
2008-04-21 16:34:31 +04:00
if ( mod - > exit ! = NULL )
2005-04-17 02:20:36 +04:00
mod - > exit ( ) ;
2008-04-21 16:34:31 +04:00
blocking_notifier_call_chain ( & module_notify_list ,
MODULE_STATE_GOING , mod ) ;
2009-01-07 19:45:46 +03:00
async_synchronize_full ( ) ;
2008-04-21 16:34:31 +04:00
mutex_lock ( & module_mutex ) ;
2008-01-25 23:08:33 +03:00
/* Store the name of the last unloaded module for diagnostic purposes */
2008-01-30 01:13:20 +03:00
strlcpy ( last_unloaded_module , mod - > name , sizeof ( last_unloaded_module ) ) ;
2009-02-05 19:51:38 +03:00
ddebug_remove_module ( mod - > name ) ;
2005-04-17 02:20:36 +04:00
free_module ( mod ) ;
out :
2006-03-23 14:00:46 +03:00
mutex_unlock ( & module_mutex ) ;
2008-12-22 14:36:31 +03:00
out_stop :
stop_machine_destroy ( ) ;
2005-04-17 02:20:36 +04:00
return ret ;
}
2008-12-08 09:26:29 +03:00
static inline void print_unload_info ( struct seq_file * m , struct module * mod )
2005-04-17 02:20:36 +04:00
{
struct module_use * use ;
int printed_something = 0 ;
seq_printf ( m , " %u " , module_refcount ( mod ) ) ;
/* Always include a trailing , so userspace can differentiate
   between this and the old multi-field proc format. */
list_for_each_entry ( use , & mod - > modules_which_use_me , list ) {
printed_something = 1 ;
seq_printf ( m , " %s, " , use - > module_which_uses - > name ) ;
}
if ( mod - > init ! = NULL & & mod - > exit = = NULL ) {
printed_something = 1 ;
seq_printf ( m , " [permanent], " ) ;
}
if ( ! printed_something )
seq_printf ( m , " - " ) ;
}
void __symbol_put ( const char * symbol )
{
struct module * owner ;
2007-07-16 10:41:46 +04:00
preempt_disable ( ) ;
2008-12-06 03:03:56 +03:00
if ( ! find_symbol ( symbol , & owner , NULL , true , false ) )
2005-04-17 02:20:36 +04:00
BUG ( ) ;
module_put ( owner ) ;
2007-07-16 10:41:46 +04:00
preempt_enable ( ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL ( __symbol_put ) ;
2009-08-26 16:32:54 +04:00
/* Note this assumes addr is a function, which it currently always is. */
2005-04-17 02:20:36 +04:00
void symbol_put_addr ( void * addr )
{
2006-05-15 20:44:06 +04:00
struct module * modaddr ;
2009-08-26 16:32:54 +04:00
unsigned long a = ( unsigned long ) dereference_function_descriptor ( addr ) ;
2005-04-17 02:20:36 +04:00
2009-08-26 16:32:54 +04:00
if ( core_kernel_text ( a ) )
2006-05-15 20:44:06 +04:00
return ;
2005-04-17 02:20:36 +04:00
2009-03-31 23:05:31 +04:00
/* module_text_address is safe here: we're supposed to have reference
 * to module from symbol_get, so it can't go away. */
2009-08-26 16:32:54 +04:00
modaddr = __module_text_address ( a ) ;
2009-03-31 23:05:31 +04:00
BUG_ON ( ! modaddr ) ;
2006-05-15 20:44:06 +04:00
module_put ( modaddr ) ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL_GPL ( symbol_put_addr ) ;
static ssize_t show_refcnt ( struct module_attribute * mattr ,
struct module * mod , char * buffer )
{
2007-08-06 23:47:45 +04:00
return sprintf ( buffer , " %u \n " , module_refcount ( mod ) ) ;
2005-04-17 02:20:36 +04:00
}
static struct module_attribute refcnt = {
2007-06-13 22:45:17 +04:00
. attr = { . name = " refcnt " , . mode = 0444 } ,
2005-04-17 02:20:36 +04:00
. show = show_refcnt ,
} ;
2006-10-18 09:47:25 +04:00
void module_put ( struct module * module )
{
if ( module ) {
unsigned int cpu = get_cpu ( ) ;
2009-02-03 06:01:36 +03:00
local_dec ( __module_ref_addr ( module , cpu ) ) ;
2009-08-17 12:56:28 +04:00
trace_module_put ( module , _RET_IP_ ,
local_read ( __module_ref_addr ( module , cpu ) ) ) ;
2006-10-18 09:47:25 +04:00
/* Maybe they're waiting for us to drop reference? */
if ( unlikely ( ! module_is_live ( module ) ) )
wake_up_process ( module - > waiter ) ;
put_cpu ( ) ;
}
}
EXPORT_SYMBOL ( module_put ) ;
2005-04-17 02:20:36 +04:00
# else /* !CONFIG_MODULE_UNLOAD */
2008-12-08 09:26:29 +03:00
static inline void print_unload_info ( struct seq_file * m , struct module * mod )
2005-04-17 02:20:36 +04:00
{
/* We don't know the usage count, or what modules are using. */
seq_printf ( m , " - - " ) ;
}
static inline void module_unload_free ( struct module * mod )
{
}
2008-12-06 03:03:59 +03:00
int use_module ( struct module * a , struct module * b )
2005-04-17 02:20:36 +04:00
{
2008-01-30 01:13:18 +03:00
return strong_try_module_get ( b ) = = 0 ;
2005-04-17 02:20:36 +04:00
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( use_module ) ;
2005-04-17 02:20:36 +04:00
static inline void module_unload_init ( struct module * mod )
{
}
# endif /* CONFIG_MODULE_UNLOAD */
2006-11-24 14:15:25 +03:00
static ssize_t show_initstate ( struct module_attribute * mattr ,
struct module * mod , char * buffer )
{
const char * state = " unknown " ;
switch ( mod - > state ) {
case MODULE_STATE_LIVE :
state = " live " ;
break ;
case MODULE_STATE_COMING :
state = " coming " ;
break ;
case MODULE_STATE_GOING :
state = " going " ;
break ;
}
return sprintf ( buffer , " %s \n " , state ) ;
}
static struct module_attribute initstate = {
2007-06-13 22:45:17 +04:00
. attr = { . name = " initstate " , . mode = 0444 } ,
2006-11-24 14:15:25 +03:00
. show = show_initstate ,
} ;
2006-02-17 00:50:23 +03:00
static struct module_attribute * modinfo_attrs [ ] = {
& modinfo_version ,
& modinfo_srcversion ,
2006-11-24 14:15:25 +03:00
& initstate ,
2006-02-17 00:50:23 +03:00
# ifdef CONFIG_MODULE_UNLOAD
& refcnt ,
# endif
NULL ,
} ;
2005-04-17 02:20:36 +04:00
static const char vermagic [ ] = VERMAGIC_STRING ;
2009-03-31 23:05:33 +04:00
static int try_to_force_load ( struct module * mod , const char * reason )
2008-05-05 04:04:16 +04:00
{
# ifdef CONFIG_MODULE_FORCE_LOAD
2008-10-16 09:01:41 +04:00
if ( ! test_taint ( TAINT_FORCED_MODULE ) )
2009-03-31 23:05:33 +04:00
printk ( KERN_WARNING " %s: %s: kernel tainted. \n " ,
mod - > name , reason ) ;
2008-05-05 04:04:16 +04:00
add_taint_module ( mod , TAINT_FORCED_MODULE ) ;
return 0 ;
# else
return - ENOEXEC ;
# endif
}
2005-04-17 02:20:36 +04:00
# ifdef CONFIG_MODVERSIONS
static int check_version ( Elf_Shdr * sechdrs ,
unsigned int versindex ,
const char * symname ,
struct module * mod ,
const unsigned long * crc )
{
unsigned int i , num_versions ;
struct modversion_info * versions ;
/* Exporting module didn't supply crcs? OK, we're already tainted. */
if ( ! crc )
return 1 ;
2008-05-09 10:24:21 +04:00
/* No versions at all? modprobe --force does this. */
if ( versindex = = 0 )
return try_to_force_load ( mod , symname ) = = 0 ;
2005-04-17 02:20:36 +04:00
versions = ( void * ) sechdrs [ versindex ] . sh_addr ;
num_versions = sechdrs [ versindex ] . sh_size
/ sizeof ( struct modversion_info ) ;
for ( i = 0 ; i < num_versions ; i + + ) {
if ( strcmp ( versions [ i ] . name , symname ) ! = 0 )
continue ;
if ( versions [ i ] . crc = = * crc )
return 1 ;
DEBUGP ( " Found checksum %lX vs module %lX \n " ,
* crc , versions [ i ] . crc ) ;
2008-05-05 04:04:16 +04:00
goto bad_version ;
2005-04-17 02:20:36 +04:00
}
2008-05-05 04:04:16 +04:00
2008-05-09 10:24:21 +04:00
printk ( KERN_WARNING " %s: no symbol version for %s \n " ,
mod - > name , symname ) ;
return 0 ;
2008-05-05 04:04:16 +04:00
bad_version :
printk ( KERN_WARNING " %s: disagrees about version of symbol %s \n " ,
mod - > name , symname ) ;
return 0 ;
2005-04-17 02:20:36 +04:00
}
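The lookup in check_version above can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel's implementation: `struct version_entry` and `crc_matches` are hypothetical stand-ins for `struct modversion_info` and the loop body, and the forced-load and logging paths are left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace stand-in for struct modversion_info. */
struct version_entry {
	unsigned long crc;
	const char *name;
};

/*
 * Returns 1 when the table records the expected CRC for symname,
 * 0 when the CRC differs or the symbol has no entry at all.
 */
static int crc_matches(const struct version_entry *versions,
		       size_t num_versions,
		       const char *symname,
		       unsigned long expected_crc)
{
	size_t i;

	for (i = 0; i < num_versions; i++) {
		/* Skip entries that describe other symbols. */
		if (strcmp(versions[i].name, symname) != 0)
			continue;
		/* First entry with a matching name decides the result. */
		return versions[i].crc == expected_crc;
	}
	/* No entry for this symbol: treated as a failure here. */
	return 0;
}
```

As in the function above, a name match with a different CRC and a missing entry both fail; the kernel code only differs in which message it logs.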
static inline int check_modstruct_version ( Elf_Shdr * sechdrs ,
unsigned int versindex ,
struct module * mod )
{
const unsigned long * crc ;
2009-07-23 18:12:08 +04:00
if ( ! find_symbol ( MODULE_SYMBOL_PREFIX " module_layout " , NULL ,
& crc , true , false ) )
2005-04-17 02:20:36 +04:00
BUG ( ) ;
2009-03-31 23:05:34 +04:00
return check_version ( sechdrs , versindex , " module_layout " , mod , crc ) ;
2005-04-17 02:20:36 +04:00
}
2008-05-09 10:25:28 +04:00
/* First part is kernel version, which we ignore if module has crcs. */
static inline int same_magic ( const char * amagic , const char * bmagic ,
bool has_crcs )
2005-04-17 02:20:36 +04:00
{
2008-05-09 10:25:28 +04:00
if ( has_crcs ) {
amagic + = strcspn ( amagic , " " ) ;
bmagic + = strcspn ( bmagic , " " ) ;
}
2005-04-17 02:20:36 +04:00
return strcmp ( amagic , bmagic ) = = 0 ;
}
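To see what skipping the first field does, same_magic can be copied into a userspace file almost unchanged. This is a sketch under the assumption that the separator passed to strcspn is a single space; the vermagic strings used to exercise it are invented for illustration.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Userspace copy of the comparison: when the module carries CRCs,
 * advance both strings to the first space so the leading kernel
 * version field is not compared.
 */
static int same_magic(const char *amagic, const char *bmagic, bool has_crcs)
{
	if (has_crcs) {
		amagic += strcspn(amagic, " ");
		bmagic += strcspn(bmagic, " ");
	}
	return strcmp(amagic, bmagic) == 0;
}
```

With has_crcs set, "2.6.31 SMP mod_unload" and "2.6.32 SMP mod_unload" compare equal because only " SMP mod_unload" is compared; with has_crcs clear, the differing version field makes them unequal.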
# else
static inline int check_version ( Elf_Shdr * sechdrs ,
unsigned int versindex ,
const char * symname ,
struct module * mod ,
const unsigned long * crc )
{
return 1 ;
}
static inline int check_modstruct_version ( Elf_Shdr * sechdrs ,
unsigned int versindex ,
struct module * mod )
{
return 1 ;
}
2008-05-09 10:25:28 +04:00
static inline int same_magic ( const char * amagic , const char * bmagic ,
bool has_crcs )
2005-04-17 02:20:36 +04:00
{
return strcmp ( amagic , bmagic ) = = 0 ;
}
# endif /* CONFIG_MODVERSIONS */
/* Resolve a symbol for this module. I.e. if we find one, record usage.
Must be holding module_mutex . */
2008-12-06 03:03:56 +03:00
static const struct kernel_symbol * resolve_symbol ( Elf_Shdr * sechdrs ,
unsigned int versindex ,
const char * name ,
struct module * mod )
2005-04-17 02:20:36 +04:00
{
struct module * owner ;
2008-12-06 03:03:56 +03:00
const struct kernel_symbol * sym ;
2005-04-17 02:20:36 +04:00
const unsigned long * crc ;
2008-12-06 03:03:56 +03:00
sym = find_symbol ( name , & owner , & crc ,
2008-10-16 09:01:41 +04:00
! ( mod - > taints & ( 1 < < TAINT_PROPRIETARY_MODULE ) ) , true ) ;
2008-12-06 03:03:56 +03:00
/* use_module can fail due to OOM,
or module initialization or unloading */
if ( sym ) {
2005-04-17 02:20:36 +04:00
if ( ! check_version ( sechdrs , versindex , name , mod , crc ) | |
! use_module ( mod , owner ) )
2008-12-06 03:03:56 +03:00
sym = NULL ;
2005-04-17 02:20:36 +04:00
}
2008-12-06 03:03:56 +03:00
return sym ;
2005-04-17 02:20:36 +04:00
}
/*
 * /sys/module/foo/sections stuff
 * J. Corbet <corbet@lwn.net>
*/
2008-02-21 02:33:20 +03:00
# if defined(CONFIG_KALLSYMS) && defined(CONFIG_SYSFS)
2008-03-13 12:03:44 +03:00
struct module_sect_attr
{
struct module_attribute mattr ;
char * name ;
unsigned long address ;
} ;
struct module_sect_attrs
{
struct attribute_group grp ;
unsigned int nsections ;
struct module_sect_attr attrs [ 0 ] ;
} ;
2005-04-17 02:20:36 +04:00
static ssize_t module_sect_show ( struct module_attribute * mattr ,
struct module * mod , char * buf )
{
struct module_sect_attr * sattr =
container_of ( mattr , struct module_sect_attr , mattr ) ;
return sprintf ( buf , " 0x%lx \n " , sattr - > address ) ;
}
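module_sect_show recovers the enclosing module_sect_attr from a pointer to its embedded mattr member. A minimal userspace sketch of that pattern follows; the macro is an approximation of the kernel's container_of built on offsetof, and `struct attr`, `struct sect_attr` and `sect_address` are hypothetical names. The embedded member is deliberately not placed first so that the pointer adjustment is visible.

```c
#include <stddef.h>

/* Userspace approximation of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct module_attribute. */
struct attr {
	const char *name;
};

/* Hypothetical stand-in for struct module_sect_attr. */
struct sect_attr {
	unsigned long address;
	struct attr mattr;	/* embedded member handed to callbacks */
};

/* Given only the embedded member, step back to the enclosing struct. */
static unsigned long sect_address(struct attr *mattr)
{
	struct sect_attr *sattr =
		container_of(mattr, struct sect_attr, mattr);

	return sattr->address;
}
```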
2006-09-29 13:01:31 +04:00
static void free_sect_attrs ( struct module_sect_attrs * sect_attrs )
{
2008-03-13 12:03:44 +03:00
unsigned int section ;
2006-09-29 13:01:31 +04:00
for ( section = 0 ; section < sect_attrs - > nsections ; section + + )
kfree ( sect_attrs - > attrs [ section ] . name ) ;
kfree ( sect_attrs ) ;
}
2005-04-17 02:20:36 +04:00
static void add_sect_attrs ( struct module * mod , unsigned int nsect ,
char * secstrings , Elf_Shdr * sechdrs )
{
unsigned int nloaded = 0 , i , size [ 2 ] ;
struct module_sect_attrs * sect_attrs ;
struct module_sect_attr * sattr ;
struct attribute * * gattr ;
2007-10-18 14:06:07 +04:00
2005-04-17 02:20:36 +04:00
/* Count loaded sections and allocate structures */
for ( i = 0 ; i < nsect ; i + + )
if ( sechdrs [ i ] . sh_flags & SHF_ALLOC )
nloaded + + ;
size [ 0 ] = ALIGN ( sizeof ( * sect_attrs )
+ nloaded * sizeof ( sect_attrs - > attrs [ 0 ] ) ,
sizeof ( sect_attrs - > grp . attrs [ 0 ] ) ) ;
size [ 1 ] = ( nloaded + 1 ) * sizeof ( sect_attrs - > grp . attrs [ 0 ] ) ;
2006-09-29 13:01:31 +04:00
sect_attrs = kzalloc ( size [ 0 ] + size [ 1 ] , GFP_KERNEL ) ;
if ( sect_attrs = = NULL )
2005-04-17 02:20:36 +04:00
return ;
/* Setup section attributes. */
sect_attrs - > grp . name = " sections " ;
sect_attrs - > grp . attrs = ( void * ) sect_attrs + size [ 0 ] ;
2006-09-29 13:01:31 +04:00
sect_attrs - > nsections = 0 ;
2005-04-17 02:20:36 +04:00
sattr = & sect_attrs - > attrs [ 0 ] ;
gattr = & sect_attrs - > grp . attrs [ 0 ] ;
for ( i = 0 ; i < nsect ; i + + ) {
if ( ! ( sechdrs [ i ] . sh_flags & SHF_ALLOC ) )
continue ;
sattr - > address = sechdrs [ i ] . sh_addr ;
2006-09-29 13:01:31 +04:00
sattr - > name = kstrdup ( secstrings + sechdrs [ i ] . sh_name ,
GFP_KERNEL ) ;
if ( sattr - > name = = NULL )
goto out ;
sect_attrs - > nsections + + ;
2005-04-17 02:20:36 +04:00
sattr - > mattr . show = module_sect_show ;
sattr - > mattr . store = NULL ;
sattr - > mattr . attr . name = sattr - > name ;
sattr - > mattr . attr . mode = S_IRUGO ;
* ( gattr + + ) = & ( sattr + + ) - > mattr . attr ;
}
* gattr = NULL ;
if ( sysfs_create_group ( & mod - > mkobj . kobj , & sect_attrs - > grp ) )
goto out ;
mod - > sect_attrs = sect_attrs ;
return ;
out :
2006-09-29 13:01:31 +04:00
free_sect_attrs ( sect_attrs ) ;
2005-04-17 02:20:36 +04:00
}
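add_sect_attrs places the header, the array of per-section attributes and the NULL-terminated pointer array in one allocation, with size[0] rounded up so the pointer array is suitably aligned. The following userspace sketch shows the same layout under simplified assumptions: `struct item`, `struct item_group`, `ALIGN_UP` and `item_group_alloc` are hypothetical names, and calloc stands in for kzalloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Round x up to the next multiple of a. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

struct item {
	const char *name;
	unsigned long address;
};

struct item_group {
	struct item **index;	/* NULL-terminated, lives in the same block */
	unsigned int nitems;
	struct item items[];	/* flexible array member */
};

static struct item_group *item_group_alloc(unsigned int n)
{
	/* Header plus n items, padded so the pointer array is aligned. */
	size_t head = ALIGN_UP(sizeof(struct item_group) +
			       n * sizeof(struct item),
			       sizeof(struct item *));
	/* n pointers plus one slot for the terminating NULL. */
	size_t tail = (n + 1) * sizeof(struct item *);
	struct item_group *g = calloc(1, head + tail);
	unsigned int i;

	if (!g)
		return NULL;

	g->index = (struct item **)((char *)g + head);
	g->nitems = n;
	for (i = 0; i < n; i++)
		g->index[i] = &g->items[i];
	/* g->index[n] is already NULL because calloc() zeroed the block. */
	return g;
}
```

A single free() releases everything, which is why the kernel's error path only has to free the duplicated names and then the one block.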
static void remove_sect_attrs ( struct module * mod )
{
if ( mod - > sect_attrs ) {
sysfs_remove_group ( & mod - > mkobj . kobj ,
& mod - > sect_attrs - > grp ) ;
/* We are positive that no one is using any sect attrs
* at this point . Deallocate immediately . */
2006-09-29 13:01:31 +04:00
free_sect_attrs ( mod - > sect_attrs ) ;
2005-04-17 02:20:36 +04:00
mod - > sect_attrs = NULL ;
}
}
2007-10-17 10:26:40 +04:00
/*
 * /sys/module/foo/notes/.section.name gives contents of SHT_NOTE sections.
*/
struct module_notes_attrs {
struct kobject * dir ;
unsigned int notes ;
struct bin_attribute attrs [ 0 ] ;
} ;
static ssize_t module_notes_read ( struct kobject * kobj ,
struct bin_attribute * bin_attr ,
char * buf , loff_t pos , size_t count )
{
/*
* The caller checked the pos and count against our size .
*/
memcpy ( buf , bin_attr - > private + pos , count ) ;
return count ;
}
static void free_notes_attrs ( struct module_notes_attrs * notes_attrs ,
unsigned int i )
{
if ( notes_attrs - > dir ) {
while ( i - - > 0 )
sysfs_remove_bin_file ( notes_attrs - > dir ,
& notes_attrs - > attrs [ i ] ) ;
2008-09-23 23:51:11 +04:00
kobject_put ( notes_attrs - > dir ) ;
2007-10-17 10:26:40 +04:00
}
kfree ( notes_attrs ) ;
}
static void add_notes_attrs ( struct module * mod , unsigned int nsect ,
char * secstrings , Elf_Shdr * sechdrs )
{
unsigned int notes , loaded , i ;
struct module_notes_attrs * notes_attrs ;
struct bin_attribute * nattr ;
2009-08-28 12:44:56 +04:00
/* failed to create section attributes, so can't create notes */
if ( ! mod - > sect_attrs )
return ;
2007-10-17 10:26:40 +04:00
/* Count notes sections and allocate structures. */
notes = 0 ;
for ( i = 0 ; i < nsect ; i + + )
if ( ( sechdrs [ i ] . sh_flags & SHF_ALLOC ) & &
( sechdrs [ i ] . sh_type = = SHT_NOTE ) )
+ + notes ;
if ( notes = = 0 )
return ;
notes_attrs = kzalloc ( sizeof ( * notes_attrs )
+ notes * sizeof ( notes_attrs - > attrs [ 0 ] ) ,
GFP_KERNEL ) ;
if ( notes_attrs = = NULL )
return ;
notes_attrs - > notes = notes ;
nattr = & notes_attrs - > attrs [ 0 ] ;
for ( loaded = i = 0 ; i < nsect ; + + i ) {
if ( ! ( sechdrs [ i ] . sh_flags & SHF_ALLOC ) )
continue ;
if ( sechdrs [ i ] . sh_type = = SHT_NOTE ) {
nattr - > attr . name = mod - > sect_attrs - > attrs [ loaded ] . name ;
nattr - > attr . mode = S_IRUGO ;
nattr - > size = sechdrs [ i ] . sh_size ;
nattr - > private = ( void * ) sechdrs [ i ] . sh_addr ;
nattr - > read = module_notes_read ;
+ + nattr ;
}
+ + loaded ;
}
2007-11-06 09:24:43 +03:00
notes_attrs - > dir = kobject_create_and_add ( " notes " , & mod - > mkobj . kobj ) ;
2007-10-17 10:26:40 +04:00
if ( ! notes_attrs - > dir )
goto out ;
for ( i = 0 ; i < notes ; + + i )
if ( sysfs_create_bin_file ( notes_attrs - > dir ,
& notes_attrs - > attrs [ i ] ) )
goto out ;
mod - > notes_attrs = notes_attrs ;
return ;
out :
free_notes_attrs ( notes_attrs , i ) ;
}
static void remove_notes_attrs ( struct module * mod )
{
if ( mod - > notes_attrs )
free_notes_attrs ( mod - > notes_attrs , mod - > notes_attrs - > notes ) ;
}
2005-04-17 02:20:36 +04:00
# else
2006-09-29 13:01:31 +04:00
2005-04-17 02:20:36 +04:00
static inline void add_sect_attrs ( struct module * mod , unsigned int nsect ,
char * sectstrings , Elf_Shdr * sechdrs )
{
}
static inline void remove_sect_attrs ( struct module * mod )
{
}
2007-10-17 10:26:40 +04:00
static inline void add_notes_attrs ( struct module * mod , unsigned int nsect ,
char * sectstrings , Elf_Shdr * sechdrs )
{
}
static inline void remove_notes_attrs ( struct module * mod )
{
}
2008-02-21 02:33:20 +03:00
# endif
2005-04-17 02:20:36 +04:00
2007-02-14 02:19:06 +03:00
# ifdef CONFIG_SYSFS
int module_add_modinfo_attrs ( struct module * mod )
[PATCH] modules: add version and srcversion to sysfs
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, which determines if your system has a "good" version of a
driver (i.e. if the one provided by DKMS has a newer version than that
provided by the kernel package installed), and to automatically compile
and install a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 09:05:15 +04:00
{
struct module_attribute * attr ;
2006-02-17 00:50:23 +03:00
struct module_attribute * temp_attr ;
2005-06-24 09:05:15 +04:00
int error = 0 ;
int i ;
2006-02-17 00:50:23 +03:00
mod - > modinfo_attrs = kzalloc ( ( sizeof ( struct module_attribute ) *
( ARRAY_SIZE ( modinfo_attrs ) + 1 ) ) ,
GFP_KERNEL ) ;
if ( ! mod - > modinfo_attrs )
return - ENOMEM ;
temp_attr = mod - > modinfo_attrs ;
2005-06-24 09:05:15 +04:00
for ( i = 0 ; ( attr = modinfo_attrs [ i ] ) & & ! error ; i + + ) {
if ( ! attr - > test | |
2006-02-17 00:50:23 +03:00
attr - > test ( mod ) ) {
memcpy ( temp_attr , attr , sizeof ( * temp_attr ) ) ;
error = sysfs_create_file ( & mod - > mkobj . kobj , & temp_attr - > attr ) ;
+ + temp_attr ;
}
2005-06-24 09:05:15 +04:00
}
return error ;
}
2007-02-14 02:19:06 +03:00
void module_remove_modinfo_attrs ( struct module * mod )
2005-06-24 09:05:15 +04:00
{
struct module_attribute * attr ;
int i ;
2006-02-17 00:50:23 +03:00
for ( i = 0 ; ( attr = & mod - > modinfo_attrs [ i ] ) ; i + + ) {
/* pick a field to test for end of list */
if ( ! attr - > attr . name )
break ;
2005-06-24 09:05:15 +04:00
sysfs_remove_file ( & mod - > mkobj . kobj , & attr - > attr ) ;
2006-02-17 00:50:23 +03:00
if ( attr - > free )
attr - > free ( mod ) ;
2005-06-24 09:05:15 +04:00
}
2006-02-17 00:50:23 +03:00
kfree ( mod - > modinfo_attrs ) ;
2005-06-24 09:05:15 +04:00
}
2005-04-17 02:20:36 +04:00
2007-02-14 02:19:06 +03:00
int mod_sysfs_init ( struct module * mod )
2005-04-17 02:20:36 +04:00
{
int err ;
2008-01-28 02:38:40 +03:00
struct kobject * kobj ;
2005-04-17 02:20:36 +04:00
2007-04-14 00:15:19 +04:00
if ( ! module_sysfs_initialized ) {
printk ( KERN_ERR " %s: module sysfs not initialized \n " ,
2006-09-26 03:25:36 +04:00
mod - > name ) ;
err = - EINVAL ;
goto out ;
}
2008-01-28 02:38:40 +03:00
kobj = kset_find_obj ( module_kset , mod - > name ) ;
if ( kobj ) {
printk ( KERN_ERR " %s: module is already loaded \n " , mod - > name ) ;
kobject_put ( kobj ) ;
err = - EINVAL ;
goto out ;
}
2005-04-17 02:20:36 +04:00
mod - > mkobj . mod = mod ;
2006-11-24 14:15:25 +03:00
2007-12-18 09:05:35 +03:00
memset ( & mod - > mkobj . kobj , 0 , sizeof ( mod - > mkobj . kobj ) ) ;
mod - > mkobj . kobj . kset = module_kset ;
err = kobject_init_and_add ( & mod - > mkobj . kobj , & module_ktype , NULL ,
" %s " , mod - > name ) ;
if ( err )
kobject_put ( & mod - > mkobj . kobj ) ;
2007-01-18 15:26:15 +03:00
2007-11-30 01:46:11 +03:00
/* delay uevent until full sysfs population */
2007-01-18 15:26:15 +03:00
out :
return err ;
}
2007-02-14 02:19:06 +03:00
int mod_sysfs_setup ( struct module * mod ,
2007-01-18 15:26:15 +03:00
struct kernel_param * kparam ,
unsigned int num_params )
{
int err ;
2007-11-06 09:24:43 +03:00
mod - > holders_dir = kobject_create_and_add ( " holders " , & mod - > mkobj . kobj ) ;
2007-04-26 11:12:09 +04:00
if ( ! mod - > holders_dir ) {
err = - ENOMEM ;
2007-01-18 15:26:15 +03:00
goto out_unreg ;
2007-04-26 11:12:09 +04:00
}
2007-01-18 15:26:15 +03:00
2005-04-17 02:20:36 +04:00
err = module_param_sysfs_setup ( mod , kparam , num_params ) ;
if ( err )
2007-01-18 15:26:15 +03:00
goto out_unreg_holders ;
2005-04-17 02:20:36 +04:00
2005-06-24 09:05:15 +04:00
err = module_add_modinfo_attrs ( mod ) ;
if ( err )
2006-11-24 14:15:25 +03:00
goto out_unreg_param ;
2005-06-24 09:05:15 +04:00
2006-11-24 14:15:25 +03:00
kobject_uevent ( & mod - > mkobj . kobj , KOBJ_ADD ) ;
2005-04-17 02:20:36 +04:00
return 0 ;
2006-11-24 14:15:25 +03:00
out_unreg_param :
module_param_sysfs_remove ( mod ) ;
2007-01-18 15:26:15 +03:00
out_unreg_holders :
2007-12-20 19:13:05 +03:00
kobject_put ( mod - > holders_dir ) ;
2007-01-18 15:26:15 +03:00
out_unreg :
2006-11-24 14:15:25 +03:00
kobject_put ( & mod - > mkobj . kobj ) ;
2005-04-17 02:20:36 +04:00
return err ;
}
2008-05-20 13:59:48 +04:00
static void mod_sysfs_fini ( struct module * mod )
{
kobject_put ( & mod - > mkobj . kobj ) ;
}
# else /* CONFIG_SYSFS */
static void mod_sysfs_fini ( struct module * mod )
{
}
# endif /* CONFIG_SYSFS */
2005-04-17 02:20:36 +04:00
static void mod_kobject_remove ( struct module * mod )
{
2005-06-24 09:05:15 +04:00
module_remove_modinfo_attrs ( mod ) ;
2005-04-17 02:20:36 +04:00
module_param_sysfs_remove ( mod ) ;
2007-12-20 19:13:05 +03:00
kobject_put ( mod - > mkobj . drivers_dir ) ;
kobject_put ( mod - > holders_dir ) ;
2008-05-20 13:59:48 +04:00
mod_sysfs_fini ( mod ) ;
2005-04-17 02:20:36 +04:00
}
/*
 * unlink the module while the whole machine is stopped with interrupts off
* - this defends against kallsyms not taking locks
*/
static int __unlink_module ( void * _mod )
{
struct module * mod = _mod ;
list_del ( & mod - > list ) ;
return 0 ;
}
2007-05-09 09:26:28 +04:00
/* Free a module, remove from lists, etc (must hold module_mutex). */
2005-04-17 02:20:36 +04:00
static void free_module ( struct module * mod )
{
2009-08-17 12:56:28 +04:00
trace_module_free ( mod ) ;
2005-04-17 02:20:36 +04:00
/* Delete from various lists */
2008-07-28 21:16:30 +04:00
stop_machine ( __unlink_module , mod , NULL ) ;
2007-10-17 10:26:40 +04:00
remove_notes_attrs ( mod ) ;
2005-04-17 02:20:36 +04:00
remove_sect_attrs ( mod ) ;
mod_kobject_remove ( mod ) ;
/* Arch-specific cleanup. */
module_arch_cleanup ( mod ) ;
/* Module unload stuff */
module_unload_free ( mod ) ;
2009-03-31 23:05:29 +04:00
/* Free any allocated parameters. */
destroy_params ( mod - > kp , mod - > num_kp ) ;
2005-04-17 02:20:36 +04:00
/* This may be NULL, but that's OK */
module_free ( mod , mod - > module_init ) ;
kfree ( mod - > args ) ;
if ( mod - > percpu )
percpu_modfree ( mod - > percpu ) ;
2009-02-03 06:01:36 +03:00
# if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
if ( mod - > refptr )
percpu_modfree ( mod - > refptr ) ;
# endif
[PATCH] lockdep: core
Do 'make oldconfig' and accept all the defaults for new config options -
reboot into the kernel and if everything goes well it should boot up fine and
you should have /proc/lockdep and /proc/lockdep_stats files.
Typically if the lock validator finds some problem it will print out
voluminous debug output that begins with "BUG: ..." and which syslog output
can be used by kernel developers to figure out the precise locking scenario.
What does the lock validator do? It "observes" and maps all locking rules as
they occur dynamically (as triggered by the kernel's natural use of spinlocks,
rwlocks, mutexes and rwsems). Whenever the lock validator subsystem detects a
new locking scenario, it validates this new rule against the existing set of
rules. If this new rule is consistent with the existing set of rules then the
new rule is added transparently and the kernel continues as normal. If the
new rule could create a deadlock scenario then this condition is printed out.
When determining validity of locking, all possible "deadlock scenarios" are
considered: assuming arbitrary number of CPUs, arbitrary irq context and task
context constellations, running arbitrary combinations of all the existing
locking scenarios. In a typical system this means millions of separate
scenarios. This is why we call it a "locking correctness" validator - for all
rules that are observed the lock validator proves it with mathematical
certainty that a deadlock could not occur (assuming that the lock validator
implementation itself is correct and its internal data structures are not
corrupted by some other kernel subsystem). [see more details and conditionals
of this statement in include/linux/lockdep.h and
Documentation/lockdep-design.txt]
Furthermore, this "all possible scenarios" property of the validator also
enables the finding of complex, highly unlikely multi-CPU multi-context races
via single single-context rules, increasing the likelihood of finding bugs
drastically. In practical terms: the lock validator already found a bug in
the upstream kernel that could only occur on systems with 3 or more CPUs, and
which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
That bug was found and reported on a single-CPU system (!). So in essence a
race will be found "piecemeal", triggering all the necessary components
for the race, without having to reproduce the race scenario itself! In its
short existence the lock validator found and reported many bugs before they
actually caused a real deadlock.
To further increase the efficiency of the validator, the mapping is not per
"lock instance", but per "lock-class". For example, all struct inode objects
in the kernel have inode->inotify_mutex. If there are 10,000 inodes cached,
then there are 10,000 lock objects. But ->inotify_mutex is a single "lock
type", and all locking activities that occur against ->inotify_mutex are
"unified" into this single lock-class. The advantage of the lock-class
approach is that all historical ->inotify_mutex uses are mapped into a single
(and as narrow as possible) set of locking rules - regardless of how many
different tasks or inode structures it took to build this set of rules. The
set of rules persist during the lifetime of the kernel.
To see the rough magnitude of checking that the lock validator does, here's a
portion of /proc/lockdep_stats, fresh after bootup:
lock-classes: 694 [max: 2048]
direct dependencies: 1598 [max: 8192]
indirect dependencies: 17896
all direct dependencies: 16206
dependency chains: 1910 [max: 8192]
in-hardirq chains: 17
in-softirq chains: 105
in-process chains: 1065
stack-trace entries: 38761 [max: 131072]
combined max dependencies: 2033928
hardirq-safe locks: 24
hardirq-unsafe locks: 176
softirq-safe locks: 53
softirq-unsafe locks: 137
irq-safe locks: 59
irq-unsafe locks: 176
The lock validator has observed 1598 actual single-thread locking patterns,
and has validated all possible 2033928 distinct locking scenarios.
More details about the design of the lock validator can be found in
Documentation/lockdep-design.txt, which can also be found at:
http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
[bunk@stusta.de: cleanups]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-03 11:24:50 +04:00
/* Free lock-classes: */
lockdep_free_key_range ( mod - > module_core , mod - > core_size ) ;
2005-04-17 02:20:36 +04:00
/* Finally, free the core (containing the module structure) */
module_free ( mod , mod - > module_core ) ;
}
void * __symbol_get ( const char * symbol )
{
struct module * owner ;
2008-12-06 03:03:56 +03:00
const struct kernel_symbol * sym ;
2005-04-17 02:20:36 +04:00
2007-07-16 10:41:46 +04:00
preempt_disable ( ) ;
2008-12-06 03:03:56 +03:00
sym = find_symbol ( symbol , & owner , NULL , true , true ) ;
if ( sym & & strong_try_module_get ( owner ) )
sym = NULL ;
2007-07-16 10:41:46 +04:00
preempt_enable ( ) ;
2005-04-17 02:20:36 +04:00
2008-12-06 03:03:56 +03:00
return sym ? ( void * ) sym - > value : NULL ;
2005-04-17 02:20:36 +04:00
}
EXPORT_SYMBOL_GPL ( __symbol_get ) ;
2006-01-08 12:04:25 +03:00
/*
 * Ensure that an exported symbol [global namespace] does not already exist
2007-05-09 09:26:28 +04:00
 * in the kernel or in some other module's exported symbol table.
2006-01-08 12:04:25 +03:00
*/
static int verify_export_symbols ( struct module * mod )
{
2008-05-02 06:15:00 +04:00
unsigned int i ;
2006-01-08 12:04:25 +03:00
struct module * owner ;
2008-05-02 06:15:00 +04:00
const struct kernel_symbol * s ;
struct {
const struct kernel_symbol * sym ;
unsigned int num ;
} arr [ ] = {
{ mod - > syms , mod - > num_syms } ,
{ mod - > gpl_syms , mod - > num_gpl_syms } ,
{ mod - > gpl_future_syms , mod - > num_gpl_future_syms } ,
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
2008-05-02 06:15:00 +04:00
{ mod - > unused_syms , mod - > num_unused_syms } ,
{ mod - > unused_gpl_syms , mod - > num_unused_gpl_syms } ,
2008-07-23 04:24:26 +04:00
# endif
2008-05-02 06:15:00 +04:00
} ;
2006-01-08 12:04:25 +03:00
2008-05-02 06:15:00 +04:00
for ( i = 0 ; i < ARRAY_SIZE ( arr ) ; i + + ) {
for ( s = arr [ i ] . sym ; s < arr [ i ] . sym + arr [ i ] . num ; s + + ) {
2008-12-06 03:03:56 +03:00
if ( find_symbol ( s - > name , & owner , NULL , true , false ) ) {
2008-05-02 06:15:00 +04:00
printk ( KERN_ERR
" %s: exports duplicate symbol %s "
" (owned by %s) \n " ,
mod - > name , s - > name , module_name ( owner ) ) ;
return - ENOEXEC ;
}
2006-01-08 12:04:25 +03:00
}
2008-05-02 06:15:00 +04:00
}
return 0 ;
2006-01-08 12:04:25 +03:00
}
2007-11-08 19:37:38 +03:00
/* Change all symbols so that st_value encodes the pointer directly. */
2005-04-17 02:20:36 +04:00
static int simplify_symbols ( Elf_Shdr * sechdrs ,
unsigned int symindex ,
const char * strtab ,
unsigned int versindex ,
unsigned int pcpuindex ,
struct module * mod )
{
Elf_Sym * sym = ( void * ) sechdrs [ symindex ] . sh_addr ;
unsigned long secbase ;
unsigned int i , n = sechdrs [ symindex ] . sh_size / sizeof ( Elf_Sym ) ;
int ret = 0 ;
2008-12-06 03:03:56 +03:00
const struct kernel_symbol * ksym ;
2005-04-17 02:20:36 +04:00
for ( i = 1 ; i < n ; i + + ) {
switch ( sym [ i ] . st_shndx ) {
case SHN_COMMON :
/* We compiled with -fno-common.  These are not
   supposed to happen.  */
DEBUGP ( " Common symbol: %s \n " , strtab + sym [ i ] . st_name ) ;
printk ( " %s: please compile with -fno-common \n " ,
mod - > name ) ;
ret = - ENOEXEC ;
break ;
case SHN_ABS :
/* Don't need to do anything */
DEBUGP ( " Absolute symbol: 0x%08lx \n " ,
( long ) sym [ i ] . st_value ) ;
break ;
case SHN_UNDEF :
2008-12-06 03:03:56 +03:00
ksym = resolve_symbol ( sechdrs , versindex ,
strtab + sym [ i ] . st_name , mod ) ;
2005-04-17 02:20:36 +04:00
/* Ok if resolved. */
2008-12-06 03:03:56 +03:00
if ( ksym ) {
sym [ i ] . st_value = ksym - > value ;
2005-04-17 02:20:36 +04:00
break ;
2008-12-06 03:03:56 +03:00
}
2005-04-17 02:20:36 +04:00
/* Ok if weak. */
if ( ELF_ST_BIND ( sym [ i ] . st_info ) = = STB_WEAK )
break ;
printk ( KERN_WARNING " %s: Unknown symbol %s \n " ,
mod - > name , strtab + sym [ i ] . st_name ) ;
ret = - ENOENT ;
break ;
default :
/* Divert to percpu allocation if a percpu var. */
if ( sym [ i ] . st_shndx = = pcpuindex )
secbase = ( unsigned long ) mod - > percpu ;
else
secbase = sechdrs [ sym [ i ] . st_shndx ] . sh_addr ;
sym [ i ] . st_value + = secbase ;
break ;
}
}
return ret ;
}
2008-12-31 14:31:18 +03:00
/* Additional bytes needed by arch in front of individual sections */
unsigned int __weak arch_mod_section_prepend ( struct module * mod ,
unsigned int section )
{
/* default implementation just returns zero */
return 0 ;
}
2005-04-17 02:20:36 +04:00
/* Update size with this section: return offset. */
2008-12-31 14:31:18 +03:00
static long get_offset ( struct module * mod , unsigned int * size ,
Elf_Shdr * sechdr , unsigned int section )
2005-04-17 02:20:36 +04:00
{
long ret ;
2008-12-31 14:31:18 +03:00
* size + = arch_mod_section_prepend ( mod , section ) ;
2005-04-17 02:20:36 +04:00
ret = ALIGN ( * size , sechdr - > sh_addralign ? : 1 ) ;
* size = ret + sechdr - > sh_size ;
return ret ;
}
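/*
 * Illustrative userspace sketch, not part of the kernel source: the
 * align-then-advance arithmetic that get_offset() performs, without the
 * arch_mod_section_prepend() step. align_up() and place_section() are
 * hypothetical names introduced for this example. The alignment is assumed
 * to be zero or a power of two, with zero treated as 1 as in the code above.
 */

```c
#include <assert.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Reserve room for a section of sec_size bytes at the next suitably
 * aligned offset after *size. Returns that offset and advances *size
 * to the first byte past the section.
 */
static unsigned long place_section(unsigned long *size,
				   unsigned long sec_size,
				   unsigned long align)
{
	unsigned long off = align_up(*size, align ? align : 1);

	*size = off + sec_size;
	return off;
}
```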
/* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
   might -- code, read-only data, read-write data, small data.  Tally
   sizes, and place the offsets into sh_entsize fields: high bit means it
   belongs in init. */
static void layout_sections ( struct module * mod ,
const Elf_Ehdr * hdr ,
Elf_Shdr * sechdrs ,
const char * secstrings )
{
static unsigned long const masks [ ] [ 2 ] = {
/* NOTE: all executable code must be the first section
 * in this array; otherwise modify the text_size
 * finder in the two loops below */
{ SHF_EXECINSTR | SHF_ALLOC , ARCH_SHF_SMALL } ,
{ SHF_ALLOC , SHF_WRITE | ARCH_SHF_SMALL } ,
{ SHF_WRITE | SHF_ALLOC , ARCH_SHF_SMALL } ,
{ ARCH_SHF_SMALL | SHF_ALLOC , 0 }
} ;
unsigned int m , i ;
for ( i = 0 ; i < hdr - > e_shnum ; i + + )
sechdrs [ i ] . sh_entsize = ~ 0UL ;
DEBUGP ( " Core section allocation order: \n " ) ;
for ( m = 0 ; m < ARRAY_SIZE ( masks ) ; + + m ) {
for ( i = 0 ; i < hdr - > e_shnum ; + + i ) {
Elf_Shdr * s = & sechdrs [ i ] ;
if ( ( s - > sh_flags & masks [ m ] [ 0 ] ) ! = masks [ m ] [ 0 ]
| | ( s - > sh_flags & masks [ m ] [ 1 ] )
| | s - > sh_entsize ! = ~ 0UL
2009-03-31 23:05:36 +04:00
| | strstarts ( secstrings + s - > sh_name , " .init " ) )
2005-04-17 02:20:36 +04:00
continue ;
2008-12-31 14:31:18 +03:00
s - > sh_entsize = get_offset ( mod , & mod - > core_size , s , i ) ;
2005-04-17 02:20:36 +04:00
DEBUGP ( " \t %s \n " , secstrings + s - > sh_name ) ;
}
if ( m = = 0 )
mod - > core_text_size = mod - > core_size ;
}
DEBUGP ( " Init section allocation order: \n " ) ;
for ( m = 0 ; m < ARRAY_SIZE ( masks ) ; + + m ) {
for ( i = 0 ; i < hdr - > e_shnum ; + + i ) {
Elf_Shdr * s = & sechdrs [ i ] ;
if ( ( s - > sh_flags & masks [ m ] [ 0 ] ) ! = masks [ m ] [ 0 ]
| | ( s - > sh_flags & masks [ m ] [ 1 ] )
| | s - > sh_entsize ! = ~ 0UL
2009-03-31 23:05:36 +04:00
| | ! strstarts ( secstrings + s - > sh_name , " .init " ) )
2005-04-17 02:20:36 +04:00
continue ;
2008-12-31 14:31:18 +03:00
s - > sh_entsize = ( get_offset ( mod , & mod - > init_size , s , i )
2005-04-17 02:20:36 +04:00
| INIT_OFFSET_MASK ) ;
DEBUGP ( " \t %s \n " , secstrings + s - > sh_name ) ;
}
if ( m = = 0 )
mod - > init_text_size = mod - > init_size ;
}
}
static void set_license ( struct module * mod , const char * license )
{
if ( ! license )
license = " unspecified " ;
2006-10-11 12:21:48 +04:00
if ( ! license_is_gpl_compatible ( license ) ) {
2008-10-16 09:01:41 +04:00
if ( ! test_taint ( TAINT_PROPRIETARY_MODULE ) )
2006-10-28 21:38:38 +04:00
printk ( KERN_WARNING " %s: module license '%s' taints "
2006-10-11 12:21:48 +04:00
" kernel. \n " , mod - > name , license ) ;
add_taint_module ( mod , TAINT_PROPRIETARY_MODULE ) ;
2005-04-17 02:20:36 +04:00
}
}
/* Parse tag=value strings from .modinfo section */
static char *next_string(char *string, unsigned long *secsize)
{
	/* Skip non-zero chars */
	while (string[0]) {
		string++;
		if ((*secsize)-- <= 1)
			return NULL;
	}
	/* Skip any zero padding. */
	while (!string[0]) {
		string++;
		if ((*secsize)-- <= 1)
			return NULL;
	}
	return string;
}
static char *get_modinfo(Elf_Shdr *sechdrs,
			 unsigned int info,
			 const char *tag)
{
	char *p;
	unsigned int taglen = strlen(tag);
	unsigned long size = sechdrs[info].sh_size;
	for (p = (char *)sechdrs[info].sh_addr; p; p = next_string(p, &size)) {
		if (strncmp(p, tag, taglen) == 0 && p[taglen] == '=')
			return p + taglen + 1;
	}
	return NULL;
}
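/*
 * Illustrative userspace sketch, not part of the kernel source: looking up
 * a tag in a buffer of NUL-separated "tag=value" strings, the layout that
 * get_modinfo() walks. find_tag() is a hypothetical name. Unlike the kernel
 * code, which counts down a remaining size, this version assumes the caller
 * passes the buffer length and bounds every read with an end pointer.
 * The check for '=' after the tag keeps "vers" from matching "version=...".
 */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the value stored under tag, or NULL if the tag is absent. */
static const char *find_tag(const char *buf, size_t len, const char *tag)
{
	size_t taglen = strlen(tag);
	const char *p = buf;
	const char *end = buf + len;

	while (p < end) {
		/* Locate the terminator; an unterminated tail ends the search. */
		const char *nul = memchr(p, '\0', (size_t)(end - p));

		if (!nul)
			return NULL;
		if (strncmp(p, tag, taglen) == 0 && p[taglen] == '=')
			return p + taglen + 1;
		/* Skip the terminator and any zero padding after it. */
		p = nul;
		while (p < end && *p == '\0')
			p++;
	}
	return NULL;
}
```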
2005-06-24 09:05:15 +04:00
static void setup_modinfo ( struct module * mod , Elf_Shdr * sechdrs ,
unsigned int infoindex )
{
struct module_attribute * attr ;
int i ;
for ( i = 0 ; ( attr = modinfo_attrs [ i ] ) ; i + + ) {
if ( attr - > setup )
attr - > setup ( mod ,
get_modinfo ( sechdrs ,
infoindex ,
attr - > attr . name ) ) ;
}
}
2005-04-17 02:20:36 +04:00
# ifdef CONFIG_KALLSYMS
2008-07-24 18:41:48 +04:00
/* lookup symbol in given range of kernel_symbols */
static const struct kernel_symbol * lookup_symbol ( const char * name ,
const struct kernel_symbol * start ,
const struct kernel_symbol * stop )
{
const struct kernel_symbol * ks = start ;
for ( ; ks < stop ; ks + + )
if ( strcmp ( ks - > name , name ) = = 0 )
return ks ;
return NULL ;
}
2009-01-05 17:40:10 +03:00
static int is_exported ( const char * name , unsigned long value ,
const struct module * mod )
2005-04-17 02:20:36 +04:00
{
2009-01-05 17:40:10 +03:00
const struct kernel_symbol * ks ;
if ( ! mod )
ks = lookup_symbol ( name , __start___ksymtab , __stop___ksymtab ) ;
2006-02-08 23:16:45 +03:00
else
2009-01-05 17:40:10 +03:00
ks = lookup_symbol ( name , mod - > syms , mod - > syms + mod - > num_syms ) ;
return ks ! = NULL & & ks - > value = = value ;
2005-04-17 02:20:36 +04:00
}
/* As per nm */
static char elf_type(const Elf_Sym *sym,
		     Elf_Shdr *sechdrs,
		     const char *secstrings,
		     struct module *mod)
{
	if (ELF_ST_BIND(sym->st_info) == STB_WEAK) {
		if (ELF_ST_TYPE(sym->st_info) == STT_OBJECT)
			return 'v';
		else
			return 'w';
	}
	if (sym->st_shndx == SHN_UNDEF)
		return 'U';
	if (sym->st_shndx == SHN_ABS)
		return 'a';
	if (sym->st_shndx >= SHN_LORESERVE)
		return '?';
	if (sechdrs[sym->st_shndx].sh_flags & SHF_EXECINSTR)
		return 't';
	if (sechdrs[sym->st_shndx].sh_flags & SHF_ALLOC
	    && sechdrs[sym->st_shndx].sh_type != SHT_NOBITS) {
		if (!(sechdrs[sym->st_shndx].sh_flags & SHF_WRITE))
			return 'r';
		else if (sechdrs[sym->st_shndx].sh_flags & ARCH_SHF_SMALL)
			return 'g';
		else
			return 'd';
	}
	if (sechdrs[sym->st_shndx].sh_type == SHT_NOBITS) {
		if (sechdrs[sym->st_shndx].sh_flags & ARCH_SHF_SMALL)
			return 's';
		else
			return 'b';
	}
2009-03-31 23:05:36 +04:00
	if (strstarts(secstrings + sechdrs[sym->st_shndx].sh_name, ".debug"))
2005-04-17 02:20:36 +04:00
		return 'n';
	return '?';
}
static void add_kallsyms ( struct module * mod ,
Elf_Shdr * sechdrs ,
unsigned int symindex ,
unsigned int strindex ,
const char * secstrings )
{
unsigned int i ;
mod - > symtab = ( void * ) sechdrs [ symindex ] . sh_addr ;
mod - > num_symtab = sechdrs [ symindex ] . sh_size / sizeof ( Elf_Sym ) ;
mod - > strtab = ( void * ) sechdrs [ strindex ] . sh_addr ;
/* Set types up while we still have access to sections. */
for ( i = 0 ; i < mod - > num_symtab ; i + + )
mod - > symtab [ i ] . st_info
= elf_type ( & mod - > symtab [ i ] , sechdrs , secstrings , mod ) ;
}
# else
static inline void add_kallsyms ( struct module * mod ,
Elf_Shdr * sechdrs ,
unsigned int symindex ,
unsigned int strindex ,
const char * secstrings )
{
}
# endif /* CONFIG_KALLSYMS */
2009-02-05 19:51:38 +03:00
static void dynamic_debug_setup ( struct _ddebug * debug , unsigned int num )
driver core: basic infrastructure for per-module dynamic debug messages
Base infrastructure to enable per-module debug messages.
I've introduced CONFIG_DYNAMIC_PRINTK_DEBUG, which when enabled centralizes
control of debugging statements on a per-module basis in one debugfs file,
currently <debugfs>/dynamic_printk/modules. When CONFIG_DYNAMIC_PRINTK_DEBUG
is not set, debugging statements can still be enabled as before, often by
defining 'DEBUG' for the proper compilation unit. Thus, this patch set has no
effect when CONFIG_DYNAMIC_PRINTK_DEBUG is not set.
The infrastructure currently ties into all pr_debug() and dev_dbg() calls. That
is, if CONFIG_DYNAMIC_PRINTK_DEBUG is set, all pr_debug() and dev_dbg() calls
can be dynamically enabled/disabled on a per-module basis.
Future plans include extending this functionality to subsystems, that define
their own debug levels and flags.
Usage:
Dynamic debugging is controlled by the debugfs file,
<debugfs>/dynamic_printk/modules. This file contains a list of the modules that
can be enabled. The format of the file is as follows:
<module_name> <enabled=0/1>
.
.
.
<module_name> : Name of the module in which the debug call resides
<enabled=0/1> : whether the messages are enabled or not
For example:
snd_hda_intel enabled=0
fixup enabled=1
driver enabled=0
Enable a module:
$echo "set enabled=1 <module_name>" > dynamic_printk/modules
Disable a module:
$echo "set enabled=0 <module_name>" > dynamic_printk/modules
Enable all modules:
$echo "set enabled=1 all" > dynamic_printk/modules
Disable all modules:
$echo "set enabled=0 all" > dynamic_printk/modules
Finally, passing "dynamic_printk" at the command line enables
debugging for all modules. This mode can be turned off via the above
disable command.
[gkh: minor cleanups and tweaks to make the build work quietly]
Signed-off-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-08-13 00:46:19 +04:00
{
2009-02-05 19:51:38 +03:00
# ifdef CONFIG_DYNAMIC_DEBUG
if ( ddebug_add_module ( debug , num , debug - > modname ) )
printk ( KERN_ERR " dynamic debug error adding module: %s \n " ,
debug - > modname ) ;
# endif
2008-10-22 19:00:13 +04:00
}
2008-08-13 00:46:19 +04:00
2008-07-23 04:24:28 +04:00
static void *module_alloc_update_bounds(unsigned long size)
{
	void *ret = module_alloc(size);

	if (ret) {
		/* Update module bounds. */
		if ((unsigned long)ret < module_addr_min)
			module_addr_min = (unsigned long)ret;
		if ((unsigned long)ret + size > module_addr_max)
			module_addr_max = (unsigned long)ret + size;
	}
	return ret;
}
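The bounds tracking above can be sketched in userspace as follows. The point of keeping a running minimum and maximum is that an address lookup can reject most non-module addresses with two comparisons before walking any list. The names addr_min, addr_max, update_bounds() and may_be_module_addr() are hypothetical, and the initial values are an assumption chosen so that the empty range matches nothing.

```c
#include <assert.h>
#include <limits.h>

/* Assumed initial state: an empty range that no address can fall into. */
static unsigned long addr_min = ULONG_MAX;
static unsigned long addr_max = 0;

/* Widen the tracked range to cover [base, base + size]. */
static void update_bounds(unsigned long base, unsigned long size)
{
	assert(size > 0);
	if (base < addr_min)
		addr_min = base;
	if (base + size > addr_max)
		addr_max = base + size;
}

/*
 * Cheap pre-filter: returns 0 when addr is certainly outside every
 * allocation seen so far, and 1 when a full lookup is still needed.
 * Gaps between allocations still return 1, so this is not a membership
 * test on its own.
 */
static int may_be_module_addr(unsigned long addr)
{
	return addr >= addr_min && addr <= addr_max;
}
```

Note that the range only ever grows: freeing an allocation does not shrink it, which keeps the update path trivial at the cost of some false positives in the pre-filter.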
2009-06-11 16:23:20 +04:00
#ifdef CONFIG_DEBUG_KMEMLEAK
static void kmemleak_load_module(struct module *mod, Elf_Ehdr *hdr,
				 Elf_Shdr *sechdrs, char *secstrings)
{
	unsigned int i;

	/* only scan the sections containing data */
	kmemleak_scan_area(mod->module_core, (unsigned long)mod -
			   (unsigned long)mod->module_core,
			   sizeof(struct module), GFP_KERNEL);

	for (i = 1; i < hdr->e_shnum; i++) {
		if (!(sechdrs[i].sh_flags & SHF_ALLOC))
			continue;
		if (strncmp(secstrings + sechdrs[i].sh_name, ".data", 5) != 0
		    && strncmp(secstrings + sechdrs[i].sh_name, ".bss", 4) != 0)
			continue;

		kmemleak_scan_area(mod->module_core, sechdrs[i].sh_addr -
				   (unsigned long)mod->module_core,
				   sechdrs[i].sh_size, GFP_KERNEL);
	}
}
#else
static inline void kmemleak_load_module(struct module *mod, Elf_Ehdr *hdr,
					Elf_Shdr *sechdrs, char *secstrings)
{
}
#endif
2005-04-17 02:20:36 +04:00
/* Allocate and load the module: note that size of section 0 is always
   zero, and we rely on this for optional sections. */
2008-08-25 22:10:26 +04:00
static noinline struct module * load_module ( void __user * umod ,
2005-04-17 02:20:36 +04:00
unsigned long len ,
const char __user * uargs )
{
Elf_Ehdr * hdr ;
Elf_Shdr * sechdrs ;
char * secstrings , * args , * modmagic , * strtab = NULL ;
2008-09-25 01:46:44 +04:00
char * staging ;
2006-06-28 15:26:46 +04:00
unsigned int i ;
unsigned int symindex = 0 ;
unsigned int strindex = 0 ;
2008-10-22 19:00:13 +04:00
unsigned int modindex , versindex , infoindex , pcpuindex ;
2005-04-17 02:20:36 +04:00
struct module * mod ;
long err = 0 ;
void * percpu = NULL , * ptr = NULL ; /* Stops spurious gcc warning */
2005-09-07 02:17:11 +04:00
mm_segment_t old_fs ;
2005-04-17 02:20:36 +04:00
DEBUGP ( " load_module: umod=%p, len=%lu, uargs=%p \n " ,
umod , len , uargs ) ;
if ( len < sizeof ( * hdr ) )
return ERR_PTR ( - ENOEXEC ) ;
/* Suck in entire file: we'll want most of it. */
/* vmalloc barfs on "unusual" numbers. Check here */
if ( len > 64 * 1024 * 1024 | | ( hdr = vmalloc ( len ) ) = = NULL )
return ERR_PTR ( - ENOMEM ) ;
2008-12-22 14:36:31 +03:00
2005-04-17 02:20:36 +04:00
if ( copy_from_user ( hdr , umod , len ) ! = 0 ) {
err = - EFAULT ;
goto free_hdr ;
}
/* Sanity checks against insmoding binaries or wrong arch,
weird elf version */
2008-05-15 03:27:29 +04:00
if ( memcmp ( hdr - > e_ident , ELFMAG , SELFMAG ) ! = 0
2005-04-17 02:20:36 +04:00
| | hdr - > e_type ! = ET_REL
| | ! elf_check_arch ( hdr )
| | hdr - > e_shentsize ! = sizeof ( * sechdrs ) ) {
err = - ENOEXEC ;
goto free_hdr ;
}
if ( len < hdr - > e_shoff + hdr - > e_shnum * sizeof ( Elf_Shdr ) )
goto truncated ;
/* Convenience variables */
sechdrs = ( void * ) hdr + hdr - > e_shoff ;
secstrings = ( void * ) hdr + sechdrs [ hdr - > e_shstrndx ] . sh_offset ;
sechdrs [ 0 ] . sh_addr = 0 ;
for ( i = 1 ; i < hdr - > e_shnum ; i + + ) {
if ( sechdrs [ i ] . sh_type ! = SHT_NOBITS
& & len < sechdrs [ i ] . sh_offset + sechdrs [ i ] . sh_size )
goto truncated ;
/* Mark all sections sh_addr with their address in the
   temporary image. */
sechdrs [ i ] . sh_addr = ( size_t ) hdr + sechdrs [ i ] . sh_offset ;
/* Internal symbols and strings. */
if ( sechdrs [ i ] . sh_type = = SHT_SYMTAB ) {
symindex = i ;
strindex = sechdrs [ i ] . sh_link ;
strtab = ( char * ) hdr + sechdrs [ strindex ] . sh_offset ;
}
# ifndef CONFIG_MODULE_UNLOAD
/* Don't load .exit sections */
2009-03-31 23:05:36 +04:00
if ( strstarts ( secstrings + sechdrs [ i ] . sh_name , " .exit " ) )
2005-04-17 02:20:36 +04:00
sechdrs [ i ] . sh_flags & = ~ ( unsigned long ) SHF_ALLOC ;
# endif
}
modindex = find_sec ( hdr , sechdrs , secstrings ,
" .gnu.linkonce.this_module " ) ;
if ( ! modindex ) {
printk ( KERN_WARNING " No module found in object \n " ) ;
err = - ENOEXEC ;
goto free_hdr ;
}
2008-10-22 19:00:13 +04:00
/* This is temporary: point mod into copy of data. */
2005-04-17 02:20:36 +04:00
mod = ( void * ) sechdrs [ modindex ] . sh_addr ;
if ( symindex = = 0 ) {
printk ( KERN_WARNING " %s: module has no symbols (stripped?) \n " ,
mod - > name ) ;
err = - ENOEXEC ;
goto free_hdr ;
}
versindex = find_sec ( hdr , sechdrs , secstrings , " __versions " ) ;
infoindex = find_sec ( hdr , sechdrs , secstrings , " .modinfo " ) ;
pcpuindex = find_pcpusec ( hdr , sechdrs , secstrings ) ;
2008-03-13 12:02:17 +03:00
/* Don't keep modinfo and version sections. */
2005-04-17 02:20:36 +04:00
sechdrs [ infoindex ] . sh_flags & = ~ ( unsigned long ) SHF_ALLOC ;
2008-03-13 12:02:17 +03:00
sechdrs [ versindex ] . sh_flags & = ~ ( unsigned long ) SHF_ALLOC ;
2005-04-17 02:20:36 +04:00
# ifdef CONFIG_KALLSYMS
/* Keep symbol and string tables for decoding later. */
sechdrs [ symindex ] . sh_flags | = SHF_ALLOC ;
sechdrs [ strindex ] . sh_flags | = SHF_ALLOC ;
# endif
/* Check module struct version now, before we try to use module. */
if ( ! check_modstruct_version ( sechdrs , versindex , mod ) ) {
err = - ENOEXEC ;
goto free_hdr ;
}
modmagic = get_modinfo ( sechdrs , infoindex , " vermagic " ) ;
/* This is allowed: modprobe --force will invalidate it. */
if ( ! modmagic ) {
2009-03-31 23:05:33 +04:00
err = try_to_force_load ( mod , " bad vermagic " ) ;
2008-05-05 04:04:16 +04:00
if ( err )
goto free_hdr ;
2008-05-09 10:25:28 +04:00
} else if ( ! same_magic ( modmagic , vermagic , versindex ) ) {
2005-04-17 02:20:36 +04:00
printk ( KERN_ERR " %s: version magic '%s' should be '%s' \n " ,
mod - > name , modmagic , vermagic ) ;
err = - ENOEXEC ;
goto free_hdr ;
}
2008-09-25 01:46:44 +04:00
staging = get_modinfo ( sechdrs , infoindex , " staging " ) ;
if ( staging ) {
add_taint_module ( mod , TAINT_CRAP ) ;
printk ( KERN_WARNING " %s: module is from the staging directory, "
" the quality is unknown, you have been warned. \n " ,
mod - > name ) ;
}
2005-04-17 02:20:36 +04:00
/* Now copy in args */
2006-03-24 14:18:43 +03:00
args = strndup_user ( uargs , ~ 0UL > > 1 ) ;
if ( IS_ERR ( args ) ) {
err = PTR_ERR ( args ) ;
2005-04-17 02:20:36 +04:00
goto free_hdr ;
}
2006-02-07 23:58:45 +03:00
2005-04-17 02:20:36 +04:00
if ( find_module ( mod - > name ) ) {
err = - EEXIST ;
goto free_mod ;
}
mod - > state = MODULE_STATE_COMING ;
/* Allow arches to frob section contents and sizes. */
err = module_frob_arch_sections ( hdr , sechdrs , secstrings , mod ) ;
if ( err < 0 )
goto free_mod ;
if ( pcpuindex ) {
/* We have a special allocation for this section. */
percpu = percpu_modalloc ( sechdrs [ pcpuindex ] . sh_size ,
2005-08-02 08:11:47 +04:00
sechdrs [ pcpuindex ] . sh_addralign ,
mod - > name ) ;
2005-04-17 02:20:36 +04:00
if ( ! percpu ) {
err = - ENOMEM ;
2009-03-17 01:13:36 +03:00
goto free_mod ;
2005-04-17 02:20:36 +04:00
}
sechdrs [ pcpuindex ] . sh_flags & = ~ ( unsigned long ) SHF_ALLOC ;
mod - > percpu = percpu ;
}
/* Determine total sizes, and put offsets in sh_entsize. For now
   this is done generically; there doesn't appear to be any
   special cases for the architectures. */
layout_sections ( mod , hdr , sechdrs , secstrings ) ;
/* Do the allocs. */
2008-07-23 04:24:28 +04:00
ptr = module_alloc_update_bounds ( mod - > core_size ) ;
2009-06-11 16:23:20 +04:00
/*
 * The pointer to this block is stored in the module structure
 * which is inside the block. Just mark it as not being a
 * leak.
 */
kmemleak_not_leak ( ptr ) ;
2005-04-17 02:20:36 +04:00
if ( ! ptr ) {
err = - ENOMEM ;
goto free_percpu ;
}
memset ( ptr , 0 , mod - > core_size ) ;
mod - > module_core = ptr ;
2008-07-23 04:24:28 +04:00
ptr = module_alloc_update_bounds ( mod - > init_size ) ;
2009-06-11 16:23:20 +04:00
/*
 * The pointer to this block is stored in the module structure
 * which is inside the block. This block doesn't need to be
 * scanned as it contains data and code that will be freed
 * after the module is initialized.
 */
kmemleak_ignore ( ptr ) ;
2005-04-17 02:20:36 +04:00
if ( ! ptr & & mod - > init_size ) {
err = - ENOMEM ;
goto free_core ;
}
memset ( ptr , 0 , mod - > init_size ) ;
mod - > module_init = ptr ;
/* Transfer each section which specifies SHF_ALLOC */
DEBUGP ( " final section addresses: \n " ) ;
for ( i = 0 ; i < hdr - > e_shnum ; i + + ) {
void * dest ;
if ( ! ( sechdrs [ i ] . sh_flags & SHF_ALLOC ) )
continue ;
if ( sechdrs [ i ] . sh_entsize & INIT_OFFSET_MASK )
dest = mod - > module_init
+ ( sechdrs [ i ] . sh_entsize & ~ INIT_OFFSET_MASK ) ;
else
dest = mod - > module_core + sechdrs [ i ] . sh_entsize ;
if ( sechdrs [ i ] . sh_type ! = SHT_NOBITS )
memcpy ( dest , ( void * ) sechdrs [ i ] . sh_addr ,
sechdrs [ i ] . sh_size ) ;
/* Update sh_addr to point to copy in image. */
sechdrs [ i ] . sh_addr = ( unsigned long ) dest ;
DEBUGP ( " \t 0x%lx %s \n " , sechdrs [ i ] . sh_addr , secstrings + sechdrs [ i ] . sh_name ) ;
}
/* Module has been moved. */
mod = ( void * ) sechdrs [ modindex ] . sh_addr ;
2009-06-11 16:23:20 +04:00
kmemleak_load_module ( mod , hdr , sechdrs , secstrings ) ;
2005-04-17 02:20:36 +04:00
2009-03-17 01:13:36 +03:00
# if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
mod - > refptr = percpu_modalloc ( sizeof ( local_t ) , __alignof__ ( local_t ) ,
mod - > name ) ;
if ( ! mod - > refptr ) {
err = - ENOMEM ;
goto free_init ;
}
# endif
2005-04-17 02:20:36 +04:00
/* Now we've moved module, initialize linked lists, etc. */
module_unload_init ( mod ) ;
2007-11-30 01:46:11 +03:00
/* add kobject, so we can reference it. */
2007-10-17 10:30:27 +04:00
err = mod_sysfs_init ( mod ) ;
if ( err )
2007-11-30 01:46:11 +03:00
goto free_unload ;
2007-01-18 15:26:15 +03:00
2005-04-17 02:20:36 +04:00
/* Set up license info based on the info section */
set_license ( mod , get_modinfo ( sechdrs , infoindex , " license " ) ) ;
2008-02-29 01:11:02 +03:00
/*
 * ndiswrapper is under GPL by itself, but loads proprietary modules.
 * Don't use add_taint_module(), as it would prevent ndiswrapper from
 * using GPL-only symbols it needs.
 */
2006-10-11 12:21:48 +04:00
if ( strcmp ( mod - > name , " ndiswrapper " ) = = 0 )
2008-02-29 01:11:02 +03:00
add_taint ( TAINT_PROPRIETARY_MODULE ) ;
/* driverloader was caught wrongly pretending to be under GPL */
2006-10-11 12:21:48 +04:00
if ( strcmp ( mod - > name , " driverloader " ) = = 0 )
add_taint_module ( mod , TAINT_PROPRIETARY_MODULE ) ;
2006-01-08 12:03:41 +03:00
[PATCH] modules: add version and srcversion to sysfs
This patch adds version and srcversion files to
/sys/module/${modulename} containing the version and srcversion fields
of the module's modinfo section (if present).
/sys/module/e1000
|-- srcversion
`-- version
This patch differs slightly from the version posted in January, as it
now uses the new kstrdup() call in -mm.
Why put this in sysfs?
a) Tools like DKMS, which deal with changing out individual kernel
modules without replacing the whole kernel, can behave smarter if they
can tell the version of a given module. The autoinstaller feature, for
example, determines whether your system has a "good" version of a
driver (i.e. whether the one provided by DKMS has a newer version than
that provided by the installed kernel package) and automatically compiles
and installs a newer version if DKMS has it but your kernel doesn't yet
have that version.
b) Because sysadmins manually, or with tools like DKMS, can switch out
modules on the file system, you can't count on 'modinfo foo.ko', which
looks at /lib/modules/${kernelver}/... actually matching what is loaded
into the kernel already. Hence asking sysfs for this.
c) as the unbind-driver-from-device work takes shape, it will be
possible to rebind a driver that's built-in (no .ko to modinfo for the
version) to a newly loaded module. sysfs will have the
currently-built-in version info, for comparison.
d) tech support scripts can then easily grab the version info for what's
running presently - a question I get often.
There has been renewed interest in this patch on linux-scsi by driver
authors.
As the idea originated from GregKH, I leave his Signed-off-by: intact,
though the implementation is nearly completely new. Compiled and run on
x86 and x86_64.
From: Matthew Dobson <colpatch@us.ibm.com>
build fix
From: Thierry Vignaud <tvignaud@mandriva.com>
build fix
From: Matthew Dobson <colpatch@us.ibm.com>
warning fix
Signed-off-by: Greg Kroah-Hartman <greg@kroah.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-06-24 09:05:15 +04:00
/* Set up MODINFO_ATTR fields */
setup_modinfo ( mod , sechdrs , infoindex ) ;
2005-04-17 02:20:36 +04:00
/* Fix up syms, so that st_value is a pointer to location. */
err = simplify_symbols ( sechdrs , symindex , strtab , versindex , pcpuindex ,
mod ) ;
if ( err < 0 )
goto cleanup ;
2008-10-22 19:00:13 +04:00
/* Now we've got everything in the final locations, we can
 * find optional sections. */
2009-03-31 23:05:29 +04:00
mod - > kp = section_objs ( hdr , sechdrs , secstrings , " __param " ,
sizeof ( * mod - > kp ) , & mod - > num_kp ) ;
2008-10-22 19:00:13 +04:00
mod - > syms = section_objs ( hdr , sechdrs , secstrings , " __ksymtab " ,
sizeof ( * mod - > syms ) , & mod - > num_syms ) ;
mod - > crcs = section_addr ( hdr , sechdrs , secstrings , " __kcrctab " ) ;
mod - > gpl_syms = section_objs ( hdr , sechdrs , secstrings , " __ksymtab_gpl " ,
sizeof ( * mod - > gpl_syms ) ,
& mod - > num_gpl_syms ) ;
mod - > gpl_crcs = section_addr ( hdr , sechdrs , secstrings , " __kcrctab_gpl " ) ;
mod - > gpl_future_syms = section_objs ( hdr , sechdrs , secstrings ,
" __ksymtab_gpl_future " ,
sizeof ( * mod - > gpl_future_syms ) ,
& mod - > num_gpl_future_syms ) ;
mod - > gpl_future_crcs = section_addr ( hdr , sechdrs , secstrings ,
" __kcrctab_gpl_future " ) ;
2005-04-17 02:20:36 +04:00
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
2008-10-22 19:00:13 +04:00
mod - > unused_syms = section_objs ( hdr , sechdrs , secstrings ,
" __ksymtab_unused " ,
sizeof ( * mod - > unused_syms ) ,
& mod - > num_unused_syms ) ;
mod - > unused_crcs = section_addr ( hdr , sechdrs , secstrings ,
" __kcrctab_unused " ) ;
mod - > unused_gpl_syms = section_objs ( hdr , sechdrs , secstrings ,
" __ksymtab_unused_gpl " ,
sizeof ( * mod - > unused_gpl_syms ) ,
& mod - > num_unused_gpl_syms ) ;
mod - > unused_gpl_crcs = section_addr ( hdr , sechdrs , secstrings ,
" __kcrctab_unused_gpl " ) ;
# endif
2009-06-18 03:28:03 +04:00
# ifdef CONFIG_CONSTRUCTORS
mod - > ctors = section_objs ( hdr , sechdrs , secstrings , " .ctors " ,
sizeof ( * mod - > ctors ) , & mod - > num_ctors ) ;
# endif
2008-10-22 19:00:13 +04:00
# ifdef CONFIG_MARKERS
mod - > markers = section_objs ( hdr , sechdrs , secstrings , " __markers " ,
sizeof ( * mod - > markers ) , & mod - > num_markers ) ;
# endif
# ifdef CONFIG_TRACEPOINTS
mod - > tracepoints = section_objs ( hdr , sechdrs , secstrings ,
" __tracepoints " ,
sizeof ( * mod - > tracepoints ) ,
& mod - > num_tracepoints ) ;
2008-07-23 04:24:26 +04:00
# endif
2009-04-10 22:53:50 +04:00
# ifdef CONFIG_EVENT_TRACING
mod - > trace_events = section_objs ( hdr , sechdrs , secstrings ,
" _ftrace_events " ,
sizeof ( * mod - > trace_events ) ,
& mod - > num_trace_events ) ;
# endif
2009-04-15 21:24:06 +04:00
# ifdef CONFIG_FTRACE_MCOUNT_RECORD
/* sechdrs[0].sh_size is always zero */
mod - > ftrace_callsites = section_objs ( hdr , sechdrs , secstrings ,
" __mcount_loc " ,
sizeof ( * mod - > ftrace_callsites ) ,
& mod - > num_ftrace_callsites ) ;
# endif
2005-04-17 02:20:36 +04:00
# ifdef CONFIG_MODVERSIONS
2008-10-22 19:00:13 +04:00
if ( ( mod - > num_syms & & ! mod - > crcs )
| | ( mod - > num_gpl_syms & & ! mod - > gpl_crcs )
| | ( mod - > num_gpl_future_syms & & ! mod - > gpl_future_crcs )
2008-07-23 04:24:26 +04:00
# ifdef CONFIG_UNUSED_SYMBOLS
2008-10-22 19:00:13 +04:00
| | ( mod - > num_unused_syms & & ! mod - > unused_crcs )
| | ( mod - > num_unused_gpl_syms & & ! mod - > unused_gpl_crcs )
2008-07-23 04:24:26 +04:00
# endif
) {
2009-03-31 23:05:33 +04:00
err = try_to_force_load ( mod ,
" no versions for exported symbols " ) ;
2008-05-05 04:04:16 +04:00
if ( err )
goto cleanup ;
2005-04-17 02:20:36 +04:00
}
# endif
/* Now do relocations. */
for ( i = 1 ; i < hdr - > e_shnum ; i + + ) {
const char * strtab = ( char * ) sechdrs [ strindex ] . sh_addr ;
unsigned int info = sechdrs [ i ] . sh_info ;
/* Not a valid relocation section? */
if ( info > = hdr - > e_shnum )
continue ;
/* Don't bother with non-allocated sections */
if ( ! ( sechdrs [ info ] . sh_flags & SHF_ALLOC ) )
continue ;
if ( sechdrs [ i ] . sh_type = = SHT_REL )
err = apply_relocate ( sechdrs , strtab , symindex , i , mod ) ;
else if ( sechdrs [ i ] . sh_type = = SHT_RELA )
err = apply_relocate_add ( sechdrs , strtab , symindex , i ,
mod ) ;
if ( err < 0 )
goto cleanup ;
}
2006-01-08 12:04:25 +03:00
/* Find duplicate symbols */
err = verify_export_symbols ( mod ) ;
if ( err < 0 )
goto cleanup ;
2005-04-17 02:20:36 +04:00
/* Set up and sort exception table */
2008-10-22 19:00:13 +04:00
mod - > extable = section_objs ( hdr , sechdrs , secstrings , " __ex_table " ,
sizeof ( * mod - > extable ) , & mod - > num_exentries ) ;
sort_extable ( mod - > extable , mod - > extable + mod - > num_exentries ) ;
2005-04-17 02:20:36 +04:00
/* Finally, copy percpu area over. */
percpu_modcopy ( mod - > percpu , ( void * ) sechdrs [ pcpuindex ] . sh_addr ,
sechdrs [ pcpuindex ] . sh_size ) ;
add_kallsyms ( mod , sechdrs , symindex , strindex , secstrings ) ;
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, whose
prototypes are described in a tracepoint declaration placed in a header
file."
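The "on"/"off" behaviour quoted above can be sketched in plain userspace C. This is only an illustration under stated assumptions: struct tp_sketch, tp_register(), tp_unregister(), tp_call() and count_probe() are hypothetical names, the sketch holds a single probe rather than an array, and it omits the RCU synchronization that the real implementation relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe signature: one integer argument. */
typedef void (*probe_fn)(int value);

/* Hypothetical stand-in for a tracepoint: a state flag plus one probe. */
struct tp_sketch {
	int enabled;     /* "on" when a probe is connected, "off" otherwise */
	probe_fn probe;  /* the connected probe, or NULL */
};

/* Connect a probe; the tracepoint is "on" afterwards. */
static void tp_register(struct tp_sketch *tp, probe_fn fn)
{
	assert(tp != NULL && fn != NULL);
	tp->probe = fn;
	tp->enabled = 1;
}

/* Disconnect the probe; the tracepoint is "off" afterwards. */
static void tp_unregister(struct tp_sketch *tp)
{
	assert(tp != NULL);
	tp->enabled = 0;
	tp->probe = NULL;
}

/*
 * Instrumentation site: when "off", the only cost is the memory read,
 * the test and the conditional branch. When "on", the probe runs in
 * the caller's context and then returns here.
 */
static void tp_call(struct tp_sketch *tp, int value)
{
	if (tp->enabled)
		tp->probe(value);
}

/* Example probe that accumulates the values it is called with. */
static int tp_hits;

static void count_probe(int value)
{
	tp_hits += value;
}
```

With a zero-initialized struct tp_sketch, tp_call() does nothing until tp_register(&tp, count_probe) has been called, and it goes back to doing nothing after tp_unregister(&tp).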
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure no type mismatch happens due to
connection of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in the
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was compiled with CONFIG_MARKERS enabled.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below are the results. As you can see, no major performance regression
was found in any case. Even if the number of processes increases, the
differences between the marker-enabled kernel and the marker-disabled kernel
don't increase. Moreover, if the number of CPUs increases, the differences
don't increase either.
Curiously, the marker-enabled kernel is better than the marker-disabled kernel
in more than half of the cases, although I guess it comes from the difference
in memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 20:16:16 +04:00
if ( ! mod - > taints ) {
2009-02-05 19:51:38 +03:00
struct _ddebug * debug ;
2008-10-22 19:00:13 +04:00
unsigned int num_debug ;
debug = section_objs ( hdr , sechdrs , secstrings , " __verbose " ,
sizeof ( * debug ) , & num_debug ) ;
2009-02-05 19:51:38 +03:00
if ( debug )
dynamic_debug_setup ( debug , num_debug ) ;
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired from the Linux Kernel
Markers. Allows complete typing verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, which
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This will make sure not type mismatch happens due to
connexion of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint : same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch, which
removes the memory read which will make the i-cache impact smaller
(changing the memory read for a load immediate removes 3-4 bytes per
site on x86_32 (depending on mov prefixes), or 7-8 bytes on x86_64, it
also saves the d-cache hit).
About the performance impact of tracepoints (which is comparable to
markers), even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case was using hackbench
on a kernel where scheduler instrumentation (about 5 events in code
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was enabled CONFIG_MARKERS.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below is the results. As you can see, major performance regression
wasn't found in any case. Even if number of processes increases,
differences between marker-enabled kernel and marker- disabled kernel
doesn't increase. Moreover, if number of CPUs increases, the differences
doesn't increase either.
Curiously, marker-enabled kernel is better than marker-disabled kernel
in more than half cases, although I guess it comes from the difference
of memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without      | with         | diff    | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]   | [%]   |
--------------------------------------------------------------
       50 |        2.586 |        2.584 |  -0.003 |  -0.1 |
      100 |        5.254 |        5.283 |  +0.030 | +0.56 |
      150 |        8.012 |        8.074 |  +0.061 | +0.76 |
      200 |       11.172 |       11.000 |  -0.172 | -1.54 |
      250 |       13.917 |       14.036 |  +0.119 | +0.86 |
      300 |       16.905 |       16.543 |  -0.362 | -2.14 |
      350 |       19.901 |       20.036 |  +0.135 | +0.68 |
      400 |       22.908 |       23.094 |  +0.186 | +0.81 |
      450 |       26.273 |       26.101 |  -0.172 | -0.66 |
      500 |       29.554 |       29.092 |  -0.461 | -1.56 |
      550 |       32.377 |       32.274 |  -0.103 | -0.32 |
      600 |       35.855 |       35.322 |  -0.533 | -1.49 |
      650 |       39.192 |       38.388 |  -0.804 | -2.05 |
      700 |       41.744 |       41.719 |  -0.025 | -0.06 |
      750 |       45.016 |       44.496 |  -0.520 | -1.16 |
      800 |       48.212 |       47.603 |  -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without      | with         | diff    | diff  |
processes | Marker [Sec] | Marker [Sec] | [Sec]   | [%]   |
--------------------------------------------------------------
       50 |        2.094 |        2.072 |  -0.022 | -1.07 |
      100 |        4.162 |        4.273 |  +0.111 | +2.66 |
      150 |        6.485 |        6.540 |  +0.055 | +0.84 |
      200 |        8.556 |        8.478 |  -0.078 | -0.91 |
      250 |       10.458 |       10.258 |  -0.200 | -1.91 |
      300 |       12.425 |       12.750 |  +0.325 | +2.62 |
      350 |       14.807 |       14.839 |  +0.032 | +0.22 |
      400 |       16.801 |       16.959 |  +0.158 | +0.94 |
      450 |       19.478 |       19.009 |  -0.470 | -2.41 |
      500 |       21.296 |       21.504 |  +0.208 | +0.98 |
      550 |       23.842 |       23.979 |  +0.137 | +0.57 |
      600 |       26.309 |       26.111 |  -0.198 | -0.75 |
      650 |       28.705 |       28.446 |  -0.259 |  -0.9 |
      700 |       31.233 |       31.394 |  +0.161 | +0.52 |
      750 |       34.064 |       33.720 |  -0.344 | -1.01 |
      800 |       36.320 |       36.114 |  -0.206 | -0.57 |
--------------------------------------------------------------
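The two diff columns in the tables above can be recomputed from the timing columns. The sketch below assumes the percentage is taken relative to the marker-disabled time, which matches the printed rows; the helper names are hypothetical and not part of the benchmark. Some rows differ in the last digit (for example 10.309 - 9.854), presumably because the original diffs were computed from unrounded timings.

```c
#include <assert.h>

/* Absolute difference in seconds: marker-enabled minus marker-disabled. */
double marker_diff_sec(double without_sec, double with_sec)
{
	return with_sec - without_sec;
}

/*
 * Relative difference in percent, taken against the marker-disabled
 * time. That baseline is an assumption inferred from the table rows.
 */
double marker_diff_pct(double without_sec, double with_sec)
{
	assert(without_sec > 0.0);
	return (with_sec - without_sec) / without_sec * 100.0;
}
```

For the first 2-CPU row, marker_diff_pct(4.811, 4.872) gives roughly +1.27, as printed.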
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-18 20:16:16 +04:00
}
2008-08-14 23:45:09 +04:00
2005-04-17 02:20:36 +04:00
err = module_finalize ( hdr , sechdrs , mod ) ;
if ( err < 0 )
goto cleanup ;
2005-09-07 02:17:11 +04:00
/* flush the icache in correct context */
old_fs = get_fs ( ) ;
set_fs ( KERNEL_DS ) ;
/*
* Flush the instruction cache , since we ' ve played with text .
* Do it before processing of module parameters , so the module
* can provide parameter accessor functions of its own .
*/
if ( mod - > module_init )
flush_icache_range ( ( unsigned long ) mod - > module_init ,
( unsigned long ) mod - > module_init
+ mod - > init_size ) ;
flush_icache_range ( ( unsigned long ) mod - > module_core ,
( unsigned long ) mod - > module_core + mod - > core_size ) ;
set_fs ( old_fs ) ;
2005-04-17 02:20:36 +04:00
mod - > args = args ;
2008-10-22 19:00:13 +04:00
if ( section_addr ( hdr , sechdrs , secstrings , " __obsparm " ) )
2006-03-25 14:07:05 +03:00
printk ( KERN_WARNING " %s: Ignoring obsolete parameters \n " ,
mod - > name ) ;
2008-01-30 01:13:21 +03:00
/* Now sew it into the lists so we can get lockdep and oops
2008-08-30 12:09:00 +04:00
* info during argument parsing . No one should access us , since
* strong_try_module_get ( ) will fail .
* lockdep / oops can run asynchronously , so use the RCU list insertion
* function to insert in a way safe for concurrent readers .
* The mutex protects against concurrent writers .
*/
list_add_rcu ( & mod - > list , & modules ) ;
2008-01-30 01:13:21 +03:00
2009-03-31 23:05:29 +04:00
err = parse_args ( mod - > name , mod - > args , mod - > kp , mod - > num_kp , NULL ) ;
2005-04-17 02:20:36 +04:00
if ( err < 0 )
2008-01-30 01:13:21 +03:00
goto unlink ;
2005-04-17 02:20:36 +04:00
2009-03-31 23:05:29 +04:00
err = mod_sysfs_setup ( mod , mod - > kp , mod - > num_kp ) ;
2005-04-17 02:20:36 +04:00
if ( err < 0 )
2008-01-30 01:13:21 +03:00
goto unlink ;
2005-04-17 02:20:36 +04:00
add_sect_attrs ( mod , hdr - > e_shnum , secstrings , sechdrs ) ;
2007-10-17 10:26:40 +04:00
add_notes_attrs ( mod , hdr - > e_shnum , secstrings , sechdrs ) ;
2005-04-17 02:20:36 +04:00
/* Get rid of temporary copy */
vfree ( hdr ) ;
2009-08-17 12:56:28 +04:00
trace_module_load ( mod ) ;
2005-04-17 02:20:36 +04:00
/* Done! */
return mod ;
2008-01-30 01:13:21 +03:00
unlink :
2009-03-31 23:05:35 +04:00
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu ( & mod - > list ) ;
synchronize_sched ( ) ;
2005-04-17 02:20:36 +04:00
module_arch_cleanup ( mod ) ;
cleanup :
2007-11-30 01:46:11 +03:00
kobject_del ( & mod - > mkobj . kobj ) ;
kobject_put ( & mod - > mkobj . kobj ) ;
free_unload :
2005-04-17 02:20:36 +04:00
module_unload_free ( mod ) ;
2009-03-17 01:13:36 +03:00
# if defined(CONFIG_MODULE_UNLOAD) && defined(CONFIG_SMP)
2009-03-24 19:07:19 +03:00
free_init :
2009-03-17 01:13:36 +03:00
percpu_modfree ( mod - > refptr ) ;
# endif
2005-04-17 02:20:36 +04:00
module_free ( mod , mod - > module_init ) ;
free_core :
module_free ( mod , mod - > module_core ) ;
2009-03-17 01:13:36 +03:00
/* mod will be freed with core. Don't access it beyond this line! */
2005-04-17 02:20:36 +04:00
free_percpu :
if ( percpu )
percpu_modfree ( percpu ) ;
free_mod :
kfree ( args ) ;
free_hdr :
vfree ( hdr ) ;
2006-01-06 11:19:54 +03:00
return ERR_PTR ( err ) ;
2005-04-17 02:20:36 +04:00
truncated :
printk ( KERN_ERR " Module len %lu truncated \n " , len ) ;
err = - ENOEXEC ;
goto free_hdr ;
}
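The label ladder at the end of load_module() above releases resources in the reverse order of acquisition, and each failure jumps to the label that undoes only what has been set up so far. A minimal user-space sketch of that pattern follows. The struct, the function name, and the fail_at test knob are all hypothetical; malloc()/free() stand in for the kernel allocators.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_obj {
	char *hdr;
	char *args;
};

/*
 * fail_at is a hypothetical knob for exercising the error paths:
 * 0 = succeed, 1 = fail after hdr is allocated, 2 = fail after args.
 */
int demo_load(struct demo_obj *obj, int fail_at)
{
	int err;

	assert(obj != NULL);
	obj->hdr = NULL;
	obj->args = NULL;

	obj->hdr = malloc(64);
	if (obj->hdr == NULL)
		return -ENOMEM;
	if (fail_at == 1) {
		err = -ENOEXEC;
		goto free_hdr;
	}

	obj->args = malloc(16);
	if (obj->args == NULL) {
		err = -ENOMEM;
		goto free_hdr;
	}
	if (fail_at == 2) {
		err = -EINVAL;
		goto free_args;
	}
	return 0;

	/* Labels run top to bottom, undoing the newest resource first. */
free_args:
	free(obj->args);
	obj->args = NULL;
free_hdr:
	free(obj->hdr);
	obj->hdr = NULL;
	return err;
}
```

Adding a new resource then means adding one allocation and one label, without touching the earlier error paths.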
2009-06-18 03:28:03 +04:00
/* Call module constructors. */
static void do_mod_ctors ( struct module * mod )
{
# ifdef CONFIG_CONSTRUCTORS
unsigned long i ;
for ( i = 0 ; i < mod - > num_ctors ; i + + )
mod - > ctors [ i ] ( ) ;
# endif
}
2005-04-17 02:20:36 +04:00
/* This is where the real work happens */
2009-01-14 16:14:10 +03:00
SYSCALL_DEFINE3 ( init_module , void __user * , umod ,
unsigned long , len , const char __user * , uargs )
2005-04-17 02:20:36 +04:00
{
struct module * mod ;
int ret = 0 ;
/* Must have permission */
2009-04-03 02:49:29 +04:00
if ( ! capable ( CAP_SYS_MODULE ) | | modules_disabled )
2005-04-17 02:20:36 +04:00
return - EPERM ;
/* Only one module load at a time, please */
2006-03-23 14:00:46 +03:00
if ( mutex_lock_interruptible ( & module_mutex ) ! = 0 )
2005-04-17 02:20:36 +04:00
return - EINTR ;
/* Do all the hard work */
mod = load_module ( umod , len , uargs ) ;
if ( IS_ERR ( mod ) ) {
2006-03-23 14:00:46 +03:00
mutex_unlock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
return PTR_ERR ( mod ) ;
}
/* Drop lock so they can recurse */
2006-03-23 14:00:46 +03:00
mutex_unlock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
[PATCH] Notifier chain update: API changes
The kernel's implementation of notifier chains is unsafe. There is no
protection against entries being added to or removed from a chain while the
chain is in use. The issues were discussed in this thread:
http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2
We noticed that notifier chains in the kernel fall into two basic usage
classes:
"Blocking" chains are always called from a process context
and the callout routines are allowed to sleep;
"Atomic" chains can be called from an atomic context and
the callout routines are not allowed to sleep.
We decided to codify this distinction and make it part of the API. Therefore
this set of patches introduces three new, parallel APIs: one for blocking
notifiers, one for atomic notifiers, and one for "raw" notifiers (which is
really just the old API under a new name). New kinds of data structures are
used for the heads of the chains, and new routines are defined for
registration, unregistration, and calling a chain. The three APIs are
explained in include/linux/notifier.h and their implementation is in
kernel/sys.c.
With atomic and blocking chains, the implementation guarantees that the chain
links will not be corrupted and that chain callers will not get messed up by
entries being added or removed. For raw chains the implementation provides no
guarantees at all; users of this API must provide their own protections. (The
idea was that situations may come up where the assumptions of the atomic and
blocking APIs are not appropriate, so it should be possible for users to
handle these things in their own way.)
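To make the register/unregister/call vocabulary above concrete, here is a simplified, single-threaded model of a notifier chain. It is only a sketch. All demo_ names are hypothetical, and it does no locking at all, so it is closest in spirit to the "raw" flavour; the atomic and blocking APIs add RCU or rwsem protection around the same list walk.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NOTIFY_OK   0
#define DEMO_NOTIFY_STOP 1

struct demo_notifier {
	int (*call)(struct demo_notifier *self, unsigned long event, void *data);
	struct demo_notifier *next;
	int priority;
};

/* Insert in descending priority order. The caller must serialize. */
void demo_chain_register(struct demo_notifier **head, struct demo_notifier *n)
{
	assert(head != NULL && n != NULL);
	while (*head != NULL && (*head)->priority >= n->priority)
		head = &(*head)->next;
	n->next = *head;
	*head = n;
}

/* Returns 0 if the entry was found and removed, -1 otherwise. */
int demo_chain_unregister(struct demo_notifier **head, struct demo_notifier *n)
{
	assert(head != NULL);
	while (*head != NULL) {
		if (*head == n) {
			*head = n->next;
			return 0;
		}
		head = &(*head)->next;
	}
	return -1;
}

/* Invoke every callout in order until one asks to stop. */
int demo_chain_call(struct demo_notifier *head, unsigned long event, void *data)
{
	int ret = DEMO_NOTIFY_OK;

	while (head != NULL) {
		struct demo_notifier *next = head->next;

		ret = head->call(head, event, data);
		if (ret == DEMO_NOTIFY_STOP)
			break;
		head = next;
	}
	return ret;
}

/* Example callout: counts events into the int that data points at. */
int demo_count_event(struct demo_notifier *self, unsigned long event, void *data)
{
	(void)self;
	(void)event;
	*(int *)data += 1;
	return DEMO_NOTIFY_OK;
}
```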
There are some limitations, which should not be too hard to live with. For
atomic/blocking chains, registration and unregistration must always be done in
a process context since the chain is protected by a mutex/rwsem. Also, a
callout routine for a non-raw chain must not try to register or unregister
entries on its own chain. (This did happen in a couple of places and the code
had to be changed to avoid it.)
Since atomic chains may be called from within an NMI handler, they cannot use
spinlocks for synchronization. Instead we use RCU. The overhead falls almost
entirely in the unregister routine, which is okay since unregistration is much
less frequent than calling a chain.
Here is the list of chains that we adjusted and their classifications. None
of them use the raw API, so for the moment it is only a placeholder.
ATOMIC CHAINS
-------------
arch/i386/kernel/traps.c: i386die_chain
arch/ia64/kernel/traps.c: ia64die_chain
arch/powerpc/kernel/traps.c: powerpc_die_chain
arch/sparc64/kernel/traps.c: sparc64die_chain
arch/x86_64/kernel/traps.c: die_chain
drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list
kernel/panic.c: panic_notifier_list
kernel/profile.c: task_free_notifier
net/bluetooth/hci_core.c: hci_notifier
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain
net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain
net/ipv6/addrconf.c: inet6addr_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_chain
net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain
net/netlink/af_netlink.c: netlink_chain
BLOCKING CHAINS
---------------
arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain
arch/s390/kernel/process.c: idle_chain
arch/x86_64/kernel/process.c: idle_notifier
drivers/base/memory.c: memory_chain
drivers/cpufreq/cpufreq.c: cpufreq_policy_notifier_list
drivers/cpufreq/cpufreq.c: cpufreq_transition_notifier_list
drivers/macintosh/adb.c: adb_client_list
drivers/macintosh/via-pmu.c: sleep_notifier_list
drivers/macintosh/via-pmu68k.c: sleep_notifier_list
drivers/macintosh/windfarm_core.c: wf_client_list
drivers/usb/core/notify.c: usb_notifier_list
drivers/video/fbmem.c: fb_notifier_list
kernel/cpu.c: cpu_chain
kernel/module.c: module_notify_list
kernel/profile.c: munmap_notifier
kernel/profile.c: task_exit_notifier
kernel/sys.c: reboot_notifier_list
net/core/dev.c: netdev_chain
net/decnet/dn_dev.c: dnaddr_chain
net/ipv4/devinet.c: inetaddr_chain
It's possible that some of these classifications are wrong. If they are,
please let us know or submit a patch to fix them. Note that any chain that
gets called very frequently should be atomic, because the rwsem read-locking
used for blocking chains is very likely to incur cache misses on SMP systems.
(However, if the chain's callout routines may sleep then the chain cannot be
atomic.)
The patch set was written by Alan Stern and Chandra Seetharaman, incorporating
material written by Keith Owens and suggestions from Paul McKenney and Andrew
Morton.
[jes@sgi.com: restructure the notifier chain initialization macros]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 13:16:30 +04:00
blocking_notifier_call_chain ( & module_notify_list ,
MODULE_STATE_COMING , mod ) ;
2005-04-17 02:20:36 +04:00
2009-06-18 03:28:03 +04:00
do_mod_ctors ( mod ) ;
2005-04-17 02:20:36 +04:00
/* Start the module */
if ( mod - > init ! = NULL )
2008-07-30 23:49:02 +04:00
ret = do_one_initcall ( mod - > init ) ;
2005-04-17 02:20:36 +04:00
if ( ret < 0 ) {
/* Init routine failed: abort. Try to protect us from
buggy refcounters . */
mod - > state = MODULE_STATE_GOING ;
2005-05-01 19:59:04 +04:00
synchronize_sched ( ) ;
2007-10-17 10:26:27 +04:00
module_put ( mod ) ;
2008-04-21 16:34:31 +04:00
blocking_notifier_call_chain ( & module_notify_list ,
MODULE_STATE_GOING , mod ) ;
2007-10-17 10:26:27 +04:00
mutex_lock ( & module_mutex ) ;
free_module ( mod ) ;
mutex_unlock ( & module_mutex ) ;
2008-01-30 01:13:18 +03:00
wake_up ( & module_wq ) ;
2005-04-17 02:20:36 +04:00
return ret ;
}
2008-03-10 21:43:53 +03:00
if ( ret > 0 ) {
2009-07-07 00:05:40 +04:00
printk ( KERN_WARNING
" %s: '%s'->init suspiciously returned %d, it should follow 0/-E convention \n "
" %s: loading module anyway... \n " ,
2008-03-10 21:43:53 +03:00
__func__ , mod - > name , ret ,
__func__ ) ;
dump_stack ( ) ;
}
2005-04-17 02:20:36 +04:00
2008-03-10 21:43:52 +03:00
/* Now it's a first class citizen! Wake up anyone waiting for it. */
2005-04-17 02:20:36 +04:00
mod - > state = MODULE_STATE_LIVE ;
2008-03-10 21:43:52 +03:00
wake_up ( & module_wq ) ;
2009-01-07 01:41:54 +03:00
blocking_notifier_call_chain ( & module_notify_list ,
MODULE_STATE_LIVE , mod ) ;
2008-03-10 21:43:52 +03:00
async: Fix module loading async-work regression
Several drivers use asynchronous work to do device discovery, and we
synchronize with them in the compiled-in case before we actually try to
mount root filesystems etc.
However, when compiled as modules, that synchronization is missing - the
module loading completes, but the driver hasn't actually finished
probing for devices, and that means that any user mode that expects to
use the devices after the 'insmod' is now potentially broken.
We already saw one case of a similar issue in the ACPI battery code,
where the kernel itself expected the module to be all done, and unmapped
the init memory - but the async device discovery was still running.
That got hacked around by just removing the "__init" (see commit
5d38258ec026921a7b266f4047ebeaa75db358e5 "ACPI battery: fix async boot
oops"), but the real fix is to just make the module loading wait for all
async work to be completed.
It will slow down module loading, but since common devices should be
built in anyway, and since the bug is really annoying and hard to handle
from user space (and caused several S3 resume regressions), the simple
fix to wait is the right one.
This fixes at least
http://bugzilla.kernel.org/show_bug.cgi?id=13063
but probably a few other bugzilla entries too (12936, for example), and
is confirmed to fix Rafael's storage driver breakage after resume bug
report (no bugzilla entry).
We should also be able to now revert that ACPI battery fix.
Reported-and-tested-by: Rafael J. Wysocki <rjw@suse.com>
Tested-by: Heinz Diehl <htd@fancy-poultry.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-10 23:17:41 +04:00
/* We need to finish all async code before the module init sequence is done */
async_synchronize_full ( ) ;
2008-03-10 21:43:52 +03:00
mutex_lock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
/* Drop initial reference. */
module_put ( mod ) ;
2009-06-13 07:47:03 +04:00
trim_init_extable ( mod ) ;
2005-04-17 02:20:36 +04:00
module_free ( mod , mod - > module_init ) ;
mod - > module_init = NULL ;
mod - > init_size = 0 ;
mod - > init_text_size = 0 ;
2006-03-23 14:00:46 +03:00
mutex_unlock ( & module_mutex ) ;
2005-04-17 02:20:36 +04:00
return 0 ;
}
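The init_module path above distinguishes three outcomes of the module's init routine: a negative value aborts the load and is returned to the caller, a positive value triggers a warning but the module is loaded anyway, and zero is the normal case. A small sketch of that classification, with a hypothetical enum and function name:

```c
#include <assert.h>

enum demo_init_verdict {
	DEMO_INIT_OK,         /* ret == 0: module goes live */
	DEMO_INIT_SUSPICIOUS, /* ret > 0: warn, but load anyway */
	DEMO_INIT_FAILED      /* ret < 0: tear the module down, return ret */
};

/* Classify an init routine's return value under the 0/-E convention. */
enum demo_init_verdict demo_classify_init(int ret)
{
	if (ret < 0)
		return DEMO_INIT_FAILED;
	if (ret > 0)
		return DEMO_INIT_SUSPICIOUS;
	return DEMO_INIT_OK;
}
```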
static inline int within ( unsigned long addr , void * start , unsigned long size )
{
return ( ( void * ) addr > = start & & ( void * ) addr < start + size ) ;
}
# ifdef CONFIG_KALLSYMS
/*
* This ignores the intensely annoying " mapping symbols " found
* in ARM ELF files : $ a , $ t and $ d .
*/
static inline int is_arm_mapping_symbol ( const char * str )
{
2007-10-18 14:06:07 +04:00
return str [ 0 ] = = ' $ ' & & strchr ( " atd " , str [ 1 ] )
2005-04-17 02:20:36 +04:00
& & ( str [ 2 ] = = ' \0 ' | | str [ 2 ] = = ' . ' ) ;
}
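For experimenting with the mapping-symbol test above outside the kernel, a stand-alone sketch follows (the demo_ name is hypothetical). One deliberate difference: strchr() also matches the terminating NUL of its first argument, so this version rejects the bare string "$" explicitly before looking at str[2].

```c
#include <assert.h>
#include <string.h>

/* Nonzero for ARM mapping symbols: $a, $t, $d, optionally followed by ".suffix". */
int demo_is_arm_mapping_symbol(const char *str)
{
	assert(str != NULL);
	if (str[0] != '$' || str[1] == '\0')
		return 0;
	if (strchr("atd", str[1]) == NULL)
		return 0;
	return str[2] == '\0' || str[2] == '.';
}
```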
static const char * get_ksymbol ( struct module * mod ,
unsigned long addr ,
unsigned long * size ,
unsigned long * offset )
{
unsigned int i , best = 0 ;
unsigned long nextval ;
/* At worst, next value is at end of module */
2009-01-07 01:41:49 +03:00
if ( within_module_init ( addr , mod ) )
2005-04-17 02:20:36 +04:00
nextval = ( unsigned long ) mod - > module_init + mod - > init_text_size ;
2007-10-18 14:06:07 +04:00
else
2005-04-17 02:20:36 +04:00
nextval = ( unsigned long ) mod - > module_core + mod - > core_text_size ;
/* Scan for closest preceding symbol, and next symbol. (ELF
2007-10-18 14:06:07 +04:00
starts real symbols at 1 ) . */
2005-04-17 02:20:36 +04:00
for ( i = 1 ; i < mod - > num_symtab ; i + + ) {
if ( mod - > symtab [ i ] . st_shndx = = SHN_UNDEF )
continue ;
/* We ignore unnamed symbols: they're uninformative
* and inserted at a whim . */
if ( mod - > symtab [ i ] . st_value < = addr
& & mod - > symtab [ i ] . st_value > mod - > symtab [ best ] . st_value
& & * ( mod - > strtab + mod - > symtab [ i ] . st_name ) ! = ' \0 '
& & ! is_arm_mapping_symbol ( mod - > strtab + mod - > symtab [ i ] . st_name ) )
best = i ;
if ( mod - > symtab [ i ] . st_value > addr
& & mod - > symtab [ i ] . st_value < nextval
& & * ( mod - > strtab + mod - > symtab [ i ] . st_name ) ! = ' \0 '
& & ! is_arm_mapping_symbol ( mod - > strtab + mod - > symtab [ i ] . st_name ) )
nextval = mod - > symtab [ i ] . st_value ;
}
if ( ! best )
return NULL ;
2007-05-08 11:28:41 +04:00
if ( size )
* size = nextval - mod - > symtab [ best ] . st_value ;
if ( offset )
* offset = addr - mod - > symtab [ best ] . st_value ;
2005-04-17 02:20:36 +04:00
return mod - > strtab + mod - > symtab [ best ] . st_name ;
}
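The scan in get_ksymbol() above is a single linear pass that tracks two things at once: the highest-valued named symbol at or below addr, and the lowest symbol value above addr, which bounds the reported size. A simplified sketch over a plain array is shown here. The struct and function names are hypothetical, the table is assumed to reserve entry 0 as ELF does, and the SHN_UNDEF and ARM mapping-symbol filters are left out.

```c
#include <assert.h>
#include <stddef.h>

struct demo_sym {
	unsigned long value;
	const char *name;
};

/* region_end plays the role of "end of module text" in the original. */
const char *demo_get_symbol(const struct demo_sym *tab, unsigned int n,
			    unsigned long addr, unsigned long region_end,
			    unsigned long *size, unsigned long *offset)
{
	unsigned int i, best = 0;
	unsigned long nextval = region_end;

	assert(tab != NULL && n > 0);
	for (i = 1; i < n; i++) {
		/* Unnamed symbols are uninformative: skip them. */
		if (tab[i].name[0] == '\0')
			continue;
		if (tab[i].value <= addr && tab[i].value > tab[best].value)
			best = i;
		if (tab[i].value > addr && tab[i].value < nextval)
			nextval = tab[i].value;
	}
	if (best == 0)
		return NULL;
	if (size)
		*size = nextval - tab[best].value;
	if (offset)
		*offset = addr - tab[best].value;
	return tab[best].name;
}
```

Because the comparison is against tab[best].value with best starting at 0, a symbol whose value equals that of entry 0 can never be selected; the original has the same property.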
2008-01-30 01:13:22 +03:00
/* For kallsyms to ask for address resolution. NULL means not found. Careful
* not to lock to avoid deadlock on oopses , simply disable preemption . */
2008-02-08 15:18:43 +03:00
const char * module_address_lookup ( unsigned long addr ,
2008-01-30 01:13:22 +03:00
unsigned long * size ,
unsigned long * offset ,
char * * modname ,
char * namebuf )
2005-04-17 02:20:36 +04:00
{
struct module * mod ;
2008-01-14 11:55:03 +03:00
const char * ret = NULL ;
2005-04-17 02:20:36 +04:00
2008-01-14 11:55:03 +03:00
preempt_disable ( ) ;
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list ) {
2009-01-07 01:41:49 +03:00
if ( within_module_init ( addr , mod ) | |
within_module_core ( addr , mod ) ) {
2006-10-03 12:13:48 +04:00
if ( modname )
* modname = mod - > name ;
2008-01-14 11:55:03 +03:00
ret = get_ksymbol ( mod , addr , size , offset ) ;
break ;
2005-04-17 02:20:36 +04:00
}
}
2008-01-30 01:13:22 +03:00
/* Make a copy in here where it's safe */
if ( ret ) {
strncpy ( namebuf , ret , KSYM_NAME_LEN - 1 ) ;
ret = namebuf ;
}
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2008-02-08 15:18:43 +03:00
return ret ;
2005-04-17 02:20:36 +04:00
}
2007-05-08 11:28:43 +04:00
int lookup_module_symbol_name ( unsigned long addr , char * symname )
{
struct module * mod ;
2008-01-14 11:55:03 +03:00
preempt_disable ( ) ;
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list ) {
2009-01-07 01:41:49 +03:00
if ( within_module_init ( addr , mod ) | |
within_module_core ( addr , mod ) ) {
2007-05-08 11:28:43 +04:00
const char * sym ;
sym = get_ksymbol ( mod , addr , NULL , NULL ) ;
if ( ! sym )
goto out ;
2007-07-17 15:03:51 +04:00
strlcpy ( symname , sym , KSYM_NAME_LEN ) ;
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2007-05-08 11:28:43 +04:00
return 0 ;
}
}
out :
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2007-05-08 11:28:43 +04:00
return - ERANGE ;
}
2007-05-08 11:28:47 +04:00
int lookup_module_symbol_attrs ( unsigned long addr , unsigned long * size ,
unsigned long * offset , char * modname , char * name )
{
struct module * mod ;
2008-01-14 11:55:03 +03:00
preempt_disable ( ) ;
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list ) {
2009-01-07 01:41:49 +03:00
if ( within_module_init ( addr , mod ) | |
within_module_core ( addr , mod ) ) {
2007-05-08 11:28:47 +04:00
const char * sym ;
sym = get_ksymbol ( mod , addr , size , offset ) ;
if ( ! sym )
goto out ;
if ( modname )
2007-07-17 15:03:51 +04:00
strlcpy ( modname , mod - > name , MODULE_NAME_LEN ) ;
2007-05-08 11:28:47 +04:00
if ( name )
2007-07-17 15:03:51 +04:00
strlcpy ( name , sym , KSYM_NAME_LEN ) ;
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2007-05-08 11:28:47 +04:00
return 0 ;
}
}
out :
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2007-05-08 11:28:47 +04:00
return - ERANGE ;
}
2007-05-08 11:28:39 +04:00
int module_get_kallsym ( unsigned int symnum , unsigned long * value , char * type ,
char * name , char * module_name , int * exported )
2005-04-17 02:20:36 +04:00
{
struct module * mod ;
2008-01-14 11:55:03 +03:00
preempt_disable ( ) ;
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list ) {
2005-04-17 02:20:36 +04:00
if ( symnum < mod - > num_symtab ) {
* value = mod - > symtab [ symnum ] . st_value ;
* type = mod - > symtab [ symnum ] . st_info ;
2006-07-14 11:24:04 +04:00
strlcpy ( name , mod - > strtab + mod - > symtab [ symnum ] . st_name ,
2007-07-17 15:03:51 +04:00
KSYM_NAME_LEN ) ;
strlcpy ( module_name , mod - > name , MODULE_NAME_LEN ) ;
2009-01-05 17:40:10 +03:00
* exported = is_exported ( name , * value , mod ) ;
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2007-05-08 11:28:39 +04:00
return 0 ;
2005-04-17 02:20:36 +04:00
}
symnum - = mod - > num_symtab ;
}
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2007-05-08 11:28:39 +04:00
return - ERANGE ;
2005-04-17 02:20:36 +04:00
}
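module_get_kallsym() above treats symnum as an index into the concatenation of every module's symbol table: it subtracts each module's num_symtab until the remainder falls inside one table. The same walk, reduced to an array of per-module counts (all names here are hypothetical):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Map a flat symbol number onto (module index, index within that module).
 * counts[m] is the number of symbols in module m.
 * Returns 0 on success, -ERANGE if symnum is past the last symbol.
 */
int demo_locate_symbol(const unsigned int *counts, unsigned int nmods,
		       unsigned int symnum,
		       unsigned int *mod_idx, unsigned int *local_idx)
{
	unsigned int m;

	assert(counts != NULL && mod_idx != NULL && local_idx != NULL);
	for (m = 0; m < nmods; m++) {
		if (symnum < counts[m]) {
			*mod_idx = m;
			*local_idx = symnum;
			return 0;
		}
		symnum -= counts[m];
	}
	return -ERANGE;
}
```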
static unsigned long mod_find_symname ( struct module * mod , const char * name )
{
unsigned int i ;
for ( i = 0 ; i < mod - > num_symtab ; i + + )
2006-02-03 14:03:53 +03:00
if ( strcmp ( name , mod - > strtab + mod - > symtab [ i ] . st_name ) = = 0 & &
mod - > symtab [ i ] . st_info ! = ' U ' )
2005-04-17 02:20:36 +04:00
return mod - > symtab [ i ] . st_value ;
return 0 ;
}
/* Look for this name: can be of form module:name. */
unsigned long module_kallsyms_lookup_name ( const char * name )
{
struct module * mod ;
char * colon ;
unsigned long ret = 0 ;
/* Don't lock: we're in enough trouble already. */
2008-01-14 11:55:03 +03:00
preempt_disable ( ) ;
2005-04-17 02:20:36 +04:00
if ( ( colon = strchr ( name , ' : ' ) ) ! = NULL ) {
* colon = ' \0 ' ;
if ( ( mod = find_module ( name ) ) ! = NULL )
ret = mod_find_symname ( mod , colon + 1 ) ;
* colon = ' : ' ;
} else {
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list )
2005-04-17 02:20:36 +04:00
if ( ( ret = mod_find_symname ( mod , name ) ) ! = 0 )
break ;
}
2008-01-14 11:55:03 +03:00
preempt_enable ( ) ;
2005-04-17 02:20:36 +04:00
return ret ;
}
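The lookup above accepts either a bare symbol name or the form module:name, and splits the latter by temporarily overwriting the colon in the caller's string. As an illustration only, the sketch below does the same split by copying the module part into a caller-supplied buffer, which leaves the input untouched. The function name and return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "module:name" without modifying the input.
 * Returns 1 if a module prefix was found, 0 if there was none,
 * and -1 if the prefix does not fit in mod[modlen].
 */
int demo_split_name(const char *name, char *mod, size_t modlen,
		    const char **sym)
{
	const char *colon;
	size_t len;

	assert(name != NULL && mod != NULL && modlen > 0 && sym != NULL);
	colon = strchr(name, ':');
	if (colon == NULL) {
		mod[0] = '\0';
		*sym = name;
		return 0;
	}
	len = (size_t)(colon - name);
	if (len >= modlen)
		return -1;
	memcpy(mod, name, len);
	mod[len] = '\0';
	*sym = colon + 1;
	return 1;
}
```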
2008-12-06 03:03:58 +03:00
int module_kallsyms_on_each_symbol ( int ( * fn ) ( void * , const char * ,
struct module * , unsigned long ) ,
void * data )
{
struct module * mod ;
unsigned int i ;
int ret ;
list_for_each_entry ( mod , & modules , list ) {
for ( i = 0 ; i < mod - > num_symtab ; i + + ) {
ret = fn ( data , mod - > strtab + mod - > symtab [ i ] . st_name ,
mod , mod - > symtab [ i ] . st_value ) ;
if ( ret ! = 0 )
return ret ;
}
}
return 0 ;
}
2005-04-17 02:20:36 +04:00
# endif /* CONFIG_KALLSYMS */
2008-01-25 23:08:33 +03:00
static char * module_flags ( struct module * mod , char * buf )
2006-10-11 12:21:48 +04:00
{
int bx = 0 ;
2008-01-25 23:08:33 +03:00
if ( mod - > taints | |
mod - > state = = MODULE_STATE_GOING | |
mod - > state = = MODULE_STATE_COMING ) {
2006-10-11 12:21:48 +04:00
buf [ bx + + ] = ' ( ' ;
2008-10-16 09:01:41 +04:00
if ( mod - > taints & ( 1 < < TAINT_PROPRIETARY_MODULE ) )
2006-10-11 12:21:48 +04:00
buf [ bx + + ] = ' P ' ;
2008-10-16 09:01:41 +04:00
if ( mod - > taints & ( 1 < < TAINT_FORCED_MODULE ) )
2006-10-11 12:21:48 +04:00
buf [ bx + + ] = ' F ' ;
2008-10-17 20:50:12 +04:00
if ( mod - > taints & ( 1 < < TAINT_CRAP ) )
2008-09-25 01:46:44 +04:00
buf [ bx + + ] = ' C ' ;
2006-10-11 12:21:48 +04:00
/*
* TAINT_FORCED_RMMOD : could be added .
* TAINT_UNSAFE_SMP , TAINT_MACHINE_CHECK , TAINT_BAD_PAGE don ' t
* apply to modules .
*/
2008-01-25 23:08:33 +03:00
/* Show a - for module-is-being-unloaded */
if ( mod - > state = = MODULE_STATE_GOING )
buf [ bx + + ] = ' - ' ;
/* Show a + for module-is-being-loaded */
if ( mod - > state = = MODULE_STATE_COMING )
buf [ bx + + ] = ' + ' ;
2006-10-11 12:21:48 +04:00
buf [ bx + + ] = ' ) ' ;
}
buf [ bx ] = ' \0 ' ;
return buf ;
}
2008-10-06 13:19:27 +04:00
# ifdef CONFIG_PROC_FS
/* Called by the /proc file system to return a list of modules. */
static void * m_start ( struct seq_file * m , loff_t * pos )
{
mutex_lock ( & module_mutex ) ;
return seq_list_start ( & modules , * pos ) ;
}
static void * m_next ( struct seq_file * m , void * p , loff_t * pos )
{
return seq_list_next ( p , & modules , pos ) ;
}
static void m_stop ( struct seq_file * m , void * p )
{
mutex_unlock ( & module_mutex ) ;
}
2005-04-17 02:20:36 +04:00
static int m_show ( struct seq_file * m , void * p )
{
struct module * mod = list_entry ( p , struct module , list ) ;
2006-10-11 12:21:48 +04:00
char buf [ 8 ] ;
2008-07-23 04:24:27 +04:00
seq_printf ( m , " %s %u " ,
2005-04-17 02:20:36 +04:00
mod - > name , mod - > init_size + mod - > core_size ) ;
print_unload_info ( m , mod ) ;
/* Informative for users. */
seq_printf ( m , " %s " ,
mod - > state = = MODULE_STATE_GOING ? " Unloading " :
mod - > state = = MODULE_STATE_COMING ? " Loading " :
" Live " ) ;
/* Used by oprofile and other similar tools. */
seq_printf ( m , " 0x%p " , mod - > module_core ) ;
2006-10-11 12:21:48 +04:00
/* Taints info */
if ( mod - > taints )
2008-01-25 23:08:33 +03:00
seq_printf ( m , " %s " , module_flags ( mod , buf ) ) ;
2006-10-11 12:21:48 +04:00
2005-04-17 02:20:36 +04:00
seq_printf ( m , " \n " ) ;
return 0 ;
}
/* Format: modulename size refcount deps state address [taints]
Where refcount is a number or - , and deps is a comma - separated list
of depends or - .
*/
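m_show() above writes one line per module: name, size, the unload info (refcount and dependents), the state word, and the core address. A user-space reader could split such a line with sscanf(), as sketched below. The struct, the buffer sizes, and the function names are assumptions made for this illustration, and an optional trailing taint string is simply ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct demo_modline {
	char name[64];
	unsigned long size;
	char refcount[16];	/* a number or "-" */
	char deps[256];		/* comma-separated list or "-" */
	char state[16];		/* "Live", "Loading" or "Unloading" */
	unsigned long addr;
};

/* Returns 0 on success, -1 if the six leading fields are not all present. */
int demo_parse_modline(const char *line, struct demo_modline *out)
{
	int n;

	assert(line != NULL && out != NULL);
	n = sscanf(line, "%63s %lu %15s %255s %15s %lx",
		   out->name, &out->size, out->refcount, out->deps,
		   out->state, &out->addr);
	return n == 6 ? 0 : -1;
}

/* Convenience check on the state column. */
int demo_modline_is_live(const struct demo_modline *m)
{
	assert(m != NULL);
	return strcmp(m->state, "Live") == 0;
}
```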
2008-10-06 13:19:27 +04:00
static const struct seq_operations modules_op = {
2005-04-17 02:20:36 +04:00
. start = m_start ,
. next = m_next ,
. stop = m_stop ,
. show = m_show
} ;
2008-10-06 13:19:27 +04:00
static int modules_open ( struct inode * inode , struct file * file )
{
return seq_open ( file , & modules_op ) ;
}
static const struct file_operations proc_modules_operations = {
. open = modules_open ,
. read = seq_read ,
. llseek = seq_lseek ,
. release = seq_release ,
} ;
static int __init proc_modules_init ( void )
{
proc_create ( " modules " , 0 , NULL , & proc_modules_operations ) ;
return 0 ;
}
module_init ( proc_modules_init ) ;
# endif
2005-04-17 02:20:36 +04:00
/* Given an address, look for it in the module exception tables. */
const struct exception_table_entry * search_module_extables ( unsigned long addr )
{
const struct exception_table_entry * e = NULL ;
struct module * mod ;
2007-07-16 10:41:46 +04:00
preempt_disable ( ) ;
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list ) {
2005-04-17 02:20:36 +04:00
if ( mod - > num_exentries = = 0 )
continue ;
2007-10-18 14:06:07 +04:00
2005-04-17 02:20:36 +04:00
e = search_extable ( mod - > extable ,
mod - > extable + mod - > num_exentries - 1 ,
addr ) ;
if ( e )
break ;
}
2007-07-16 10:41:46 +04:00
preempt_enable ( ) ;
2005-04-17 02:20:36 +04:00
/* Now, if we found one, we are running inside it now, hence
2007-10-18 14:06:07 +04:00
we cannot unload the module , hence no refcnt needed . */
2005-04-17 02:20:36 +04:00
return e ;
}
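search_module_extables() above hands each module's table to search_extable(), which looks up the faulting instruction address. Assuming the table is sorted by instruction address, that lookup can be done as a binary search. The sketch below uses a hypothetical entry layout and takes an element count rather than the inclusive first/last pointers passed in the call above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: faulting instruction address and its fixup address. */
struct demo_extable_entry {
	unsigned long insn;
	unsigned long fixup;
};

/* Binary search over a table sorted by insn; NULL if addr has no entry. */
const struct demo_extable_entry *
demo_search_extable(const struct demo_extable_entry *table, size_t n,
		    unsigned long addr)
{
	size_t lo = 0, hi = n;	/* search the half-open range [lo, hi) */

	assert(table != NULL || n == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].insn < addr)
			lo = mid + 1;
		else if (table[mid].insn > addr)
			hi = mid;
		else
			return &table[mid];
	}
	return NULL;
}
```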
2006-07-03 11:24:24 +04:00
/*
2009-03-31 23:05:31 +04:00
* is_module_address - is this address inside a module ?
* @ addr : the address to check .
*
* See is_module_text_address ( ) if you simply want to see if the address
* is code ( not data ) .
2006-07-03 11:24:24 +04:00
*/
2009-03-31 23:05:31 +04:00
bool is_module_address ( unsigned long addr )
2006-07-03 11:24:24 +04:00
{
2009-03-31 23:05:31 +04:00
bool ret ;
2006-07-03 11:24:24 +04:00
2007-07-16 10:41:46 +04:00
preempt_disable ( ) ;
2009-03-31 23:05:31 +04:00
ret = __module_address ( addr ) ! = NULL ;
2007-07-16 10:41:46 +04:00
preempt_enable ( ) ;
2006-07-03 11:24:24 +04:00
2009-03-31 23:05:31 +04:00
return ret ;
2006-07-03 11:24:24 +04:00
}
2009-03-31 23:05:31 +04:00
/*
* __module_address - get the module which contains an address .
* @ addr : the address .
*
* Must be called with preempt disabled or module mutex held so that
* module doesn ' t get freed during this .
*/
2009-04-05 22:04:19 +04:00
struct module * __module_address ( unsigned long addr )
2005-04-17 02:20:36 +04:00
{
struct module * mod ;
2008-07-23 04:24:28 +04:00
if ( addr < module_addr_min | | addr > module_addr_max )
return NULL ;
2008-08-30 12:09:00 +04:00
list_for_each_entry_rcu ( mod , & modules , list )
2009-03-31 23:05:31 +04:00
if ( within_module_core ( addr , mod )
| | within_module_init ( addr , mod ) )
2005-04-17 02:20:36 +04:00
return mod ;
return NULL ;
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( __module_address ) ;
2005-04-17 02:20:36 +04:00
2009-03-31 23:05:31 +04:00
/*
* is_module_text_address - is this address inside module code ?
* @ addr : the address to check .
*
* See is_module_address ( ) if you simply want to see if the address is
* anywhere in a module . See kernel_text_address ( ) for testing if an
* address corresponds to kernel or module code .
*/
bool is_module_text_address ( unsigned long addr )
{
bool ret ;
preempt_disable ( ) ;
ret = __module_text_address ( addr ) ! = NULL ;
preempt_enable ( ) ;
return ret ;
}
/*
* __module_text_address - get the module whose code contains an address .
* @ addr : the address .
*
* Must be called with preempt disabled or module mutex held so that
* module doesn ' t get freed during this .
*/
struct module * __module_text_address ( unsigned long addr )
{
struct module * mod = __module_address ( addr ) ;
if ( mod ) {
/* Make sure it's within the text section. */
if ( ! within ( addr , mod - > module_init , mod - > init_text_size )
& & ! within ( addr , mod - > module_core , mod - > core_text_size ) )
mod = NULL ;
}
return mod ;
}
2008-12-06 03:03:59 +03:00
EXPORT_SYMBOL_GPL ( __module_text_address ) ;
2009-03-31 23:05:31 +04:00
2005-04-17 02:20:36 +04:00
/* Don't grab lock, we're oopsing. */
void print_modules ( void )
{
struct module * mod ;
2006-10-02 13:17:02 +04:00
char buf [ 8 ] ;
2005-04-17 02:20:36 +04:00
2009-06-16 22:07:14 +04:00
printk ( KERN_DEFAULT " Modules linked in: " ) ;
2008-08-30 12:09:00 +04:00
/* Most callers should already have preempt disabled, but make sure */
preempt_disable ( ) ;
list_for_each_entry_rcu ( mod , & modules , list )
2008-01-25 23:08:33 +03:00
printk ( " %s%s " , mod - > name , module_flags ( mod , buf ) ) ;
2008-08-30 12:09:00 +04:00
preempt_enable ( ) ;
2008-01-25 23:08:33 +03:00
if ( last_unloaded_module [ 0 ] )
printk ( " [last unloaded: %s] " , last_unloaded_module ) ;
2005-04-17 02:20:36 +04:00
printk ( " \n " ) ;
}
# ifdef CONFIG_MODVERSIONS
2009-03-31 23:05:34 +04:00
/* Generate the signature for all relevant module structures here.
* If these change , we don ' t want to try to parse the module . */
void module_layout ( struct module * mod ,
struct modversion_info * ver ,
struct kernel_param * kp ,
struct kernel_symbol * ks ,
struct marker * marker ,
struct tracepoint * tp )
{
}
EXPORT_SYMBOL ( module_layout ) ;
2005-04-17 02:20:36 +04:00
# endif
2007-10-19 10:41:06 +04:00
# ifdef CONFIG_MARKERS
Linux Kernel Markers: support multiple probes
RCU style multiple probes support for the Linux Kernel Markers. Common case
(one probe) is still fast and does not require dynamic allocation or a
supplementary pointer dereference on the fast path.
- Move preempt disable from the marker site to the callback.
Since we now have an internal callback, move the preempt disable/enable to the
callback instead of the marker site.
Since the callback change is done asynchronously (passing from a handler that
supports arguments to a handler that does not set up the arguments if no
arguments are passed), we can safely update it even if it is outside the
preempt disable section.
- Move probe arm to probe connection. Now, a connected probe is automatically
armed.
Remove MARK_MAX_FORMAT_LEN, unused.
This patch modifies the Linux Kernel Markers API: it removes the probe
"arm/disarm" and changes the probe function prototype: it now expects a
va_list * instead of a "...".
If we want to have more than one probe connected to a marker at a given
time (LTTng, blktrace, or SystemTap), then we need this patch. Without it,
connecting a second probe handler to a marker will fail.
It allows us, for instance, to do interesting combinations:
Do standard tracing with LTTng and, eventually, compute statistics
with SystemTap, or have a special trigger on an event that would call
a SystemTap script which would stop flight recorder tracing.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Mike Mason <mmlnx@us.ibm.com>
Cc: Dipankar Sarma <dipankar@in.ibm.com>
Cc: David Smith <dsmith@redhat.com>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 02:03:37 +03:00
void module_update_markers(void)
2007-10-19 10:41:06 +04:00
{
	struct module *mod;

	mutex_lock(&module_mutex);
	list_for_each_entry(mod, &modules, list)
		if (!mod->taints)
			marker_update_probe_range(mod->markers,
2008-02-14 02:03:37 +03:00
				mod->markers + mod->num_markers);
2007-10-19 10:41:06 +04:00
	mutex_unlock(&module_mutex);
}
# endif
tracing: Kernel Tracepoints
Implementation of kernel tracepoints. Inspired by the Linux Kernel
Markers. Allows complete type verification by declaring both tracing
statement inline functions and probe registration/unregistration static
inline functions within the same macro "DEFINE_TRACE". No format string
is required. See the tracepoint Documentation and Samples patches for
usage examples.
Taken from the documentation patch :
"A tracepoint placed in code provides a hook to call a function (probe)
that you can provide at runtime. A tracepoint can be "on" (a probe is
connected to it) or "off" (no probe is attached). When a tracepoint is
"off" it has no effect, except for adding a tiny time penalty (checking
a condition for a branch) and space penalty (adding a few bytes for the
function call at the end of the instrumented function and adds a data
structure in a separate section). When a tracepoint is "on", the
function you provide is called each time the tracepoint is executed, in
the execution context of the caller. When the function provided ends its
execution, it returns to the caller (continuing from the tracepoint
site).
You can put tracepoints at important locations in the code. They are
lightweight hooks that can pass an arbitrary number of parameters, whose
prototypes are described in a tracepoint declaration placed in a header
file."
Addition and removal of tracepoints is synchronized by RCU using the
scheduler (and preempt_disable) as guarantees to find a quiescent state
(this is really RCU "classic"). The update side uses rcu_barrier_sched()
with call_rcu_sched() and the read/execute side uses
"preempt_disable()/preempt_enable()".
We make sure the previous array containing probes, which has been
scheduled for deletion by the rcu callback, is indeed freed before we
proceed to the next update. It therefore limits the rate of modification
of a single tracepoint to one update per RCU period. The objective here
is to permit fast batch add/removal of probes on _different_
tracepoints.
Changelog :
- Use #name ":" #proto as string to identify the tracepoint in the
tracepoint table. This makes sure no type mismatch happens due to
connection of a probe with the wrong type to a tracepoint declared with
the same name in a different header.
- Add tracepoint_entry_free_old.
- Change __TO_TRACE to get rid of the 'i' iterator.
Masami Hiramatsu <mhiramat@redhat.com> :
Tested on x86-64.
Performance impact of a tracepoint: same as markers, except that it
adds about 70 bytes of instructions in an unlikely branch of each
instrumented function (the for loop, the stack setup and the function
call). It currently adds a memory read, a test and a conditional branch
at the instrumentation site (in the hot path). Immediate values will
eventually change this into a load immediate, test and branch. Removing
the memory read makes the i-cache impact smaller: replacing it with a
load immediate saves 3-4 bytes per site on x86_32 (depending on mov
prefixes) or 7-8 bytes on x86_64, and also saves the d-cache hit.
About the performance impact of tracepoints (which is comparable to
markers): even without immediate values optimizations, tests done by
Hideo Aoki on ia64 show no regression. His test case used hackbench
on a kernel where scheduler instrumentation (about 5 events in the
scheduler code) was added.
Quoting Hideo Aoki about Markers :
I evaluated overhead of kernel marker using linux-2.6-sched-fixes git
tree, which includes several markers for LTTng, using an ia64 server.
While the immediate trace mark feature isn't implemented on ia64, there
is no major performance regression. So, I think that we don't have any
issues to propose merging marker point patches into Linus's tree from
the viewpoint of performance impact.
I prepared two kernels to evaluate. The first one was compiled without
CONFIG_MARKERS. The second one was compiled with CONFIG_MARKERS enabled.
I downloaded the original hackbench from the following URL:
http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c
I ran hackbench 5 times in each condition and calculated the average and
difference between the kernels.
The parameter of hackbench: every 50 from 50 to 800
The number of CPUs of the server: 2, 4, and 8
Below are the results. As you can see, no major performance regression
was found in any case. Even as the number of processes increases, the
difference between the marker-enabled kernel and the marker-disabled kernel
doesn't increase. Moreover, as the number of CPUs increases, the difference
doesn't increase either.
Curiously, the marker-enabled kernel is better than the marker-disabled kernel
in more than half of the cases, although I guess it comes from the difference
in memory access pattern.
* 2 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 4.811 | 4.872 | +0.061 | +1.27 |
100 | 9.854 | 10.309 | +0.454 | +4.61 |
150 | 15.602 | 15.040 | -0.562 | -3.6 |
200 | 20.489 | 20.380 | -0.109 | -0.53 |
250 | 25.798 | 25.652 | -0.146 | -0.56 |
300 | 31.260 | 30.797 | -0.463 | -1.48 |
350 | 36.121 | 35.770 | -0.351 | -0.97 |
400 | 42.288 | 42.102 | -0.186 | -0.44 |
450 | 47.778 | 47.253 | -0.526 | -1.1 |
500 | 51.953 | 52.278 | +0.325 | +0.63 |
550 | 58.401 | 57.700 | -0.701 | -1.2 |
600 | 63.334 | 63.222 | -0.112 | -0.18 |
650 | 68.816 | 68.511 | -0.306 | -0.44 |
700 | 74.667 | 74.088 | -0.579 | -0.78 |
750 | 78.612 | 79.582 | +0.970 | +1.23 |
800 | 85.431 | 85.263 | -0.168 | -0.2 |
--------------------------------------------------------------
* 4 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.586 | 2.584 | -0.003 | -0.1 |
100 | 5.254 | 5.283 | +0.030 | +0.56 |
150 | 8.012 | 8.074 | +0.061 | +0.76 |
200 | 11.172 | 11.000 | -0.172 | -1.54 |
250 | 13.917 | 14.036 | +0.119 | +0.86 |
300 | 16.905 | 16.543 | -0.362 | -2.14 |
350 | 19.901 | 20.036 | +0.135 | +0.68 |
400 | 22.908 | 23.094 | +0.186 | +0.81 |
450 | 26.273 | 26.101 | -0.172 | -0.66 |
500 | 29.554 | 29.092 | -0.461 | -1.56 |
550 | 32.377 | 32.274 | -0.103 | -0.32 |
600 | 35.855 | 35.322 | -0.533 | -1.49 |
650 | 39.192 | 38.388 | -0.804 | -2.05 |
700 | 41.744 | 41.719 | -0.025 | -0.06 |
750 | 45.016 | 44.496 | -0.520 | -1.16 |
800 | 48.212 | 47.603 | -0.609 | -1.26 |
--------------------------------------------------------------
* 8 CPUs
Number of | without | with | diff | diff |
processes | Marker [Sec] | Marker [Sec] | [Sec] | [%] |
--------------------------------------------------------------
50 | 2.094 | 2.072 | -0.022 | -1.07 |
100 | 4.162 | 4.273 | +0.111 | +2.66 |
150 | 6.485 | 6.540 | +0.055 | +0.84 |
200 | 8.556 | 8.478 | -0.078 | -0.91 |
250 | 10.458 | 10.258 | -0.200 | -1.91 |
300 | 12.425 | 12.750 | +0.325 | +2.62 |
350 | 14.807 | 14.839 | +0.032 | +0.22 |
400 | 16.801 | 16.959 | +0.158 | +0.94 |
450 | 19.478 | 19.009 | -0.470 | -2.41 |
500 | 21.296 | 21.504 | +0.208 | +0.98 |
550 | 23.842 | 23.979 | +0.137 | +0.57 |
600 | 26.309 | 26.111 | -0.198 | -0.75 |
650 | 28.705 | 28.446 | -0.259 | -0.9 |
700 | 31.233 | 31.394 | +0.161 | +0.52 |
750 | 34.064 | 33.720 | -0.344 | -1.01 |
800 | 36.320 | 36.114 | -0.206 | -0.57 |
--------------------------------------------------------------
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: 'Peter Zijlstra' <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
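The on/off behaviour described in the message above can be illustrated with a
minimal userspace sketch. It is an analogy only, not the kernel implementation:
every name in it (demo_tracepoint, demo_trace, demo_register, demo_unregister,
count_probe, demo_run) is hypothetical, a single probe slot stands in for the
probe array, and no RCU or locking is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe signature: private data plus one traced value. */
typedef void (*probe_fn)(void *data, int value);

/* Hypothetical stand-in for a tracepoint; NOT the kernel's struct tracepoint. */
struct demo_tracepoint {
	const char *name;
	probe_fn probe;		/* NULL means the tracepoint is "off" */
	void *data;
};

/* Instrumentation site: one load, one test, one (usually untaken) branch. */
static void demo_trace(struct demo_tracepoint *tp, int value)
{
	assert(tp != NULL);
	if (tp->probe)
		tp->probe(tp->data, value);
}

/* Connect a probe; a connected probe is immediately active. */
static int demo_register(struct demo_tracepoint *tp, probe_fn fn, void *data)
{
	if (tp->probe)
		return -1;	/* this sketch supports a single probe only */
	tp->data = data;
	tp->probe = fn;
	return 0;
}

/* Disconnect the probe; the tracepoint is "off" again. */
static void demo_unregister(struct demo_tracepoint *tp)
{
	tp->probe = NULL;
	tp->data = NULL;
}

/* Example probe: accumulate the traced values. */
static void count_probe(void *data, int value)
{
	*(long *)data += value;
}

/* Off, on, off again: only the two middle calls reach the probe. */
static long demo_run(void)
{
	struct demo_tracepoint tp = { "demo_event", NULL, NULL };
	long total = 0;

	demo_trace(&tp, 5);		/* off: nothing happens */
	if (demo_register(&tp, count_probe, &total) != 0)
		return -1;
	demo_trace(&tp, 3);		/* on: probe runs */
	demo_trace(&tp, 4);		/* on: probe runs */
	demo_unregister(&tp);
	demo_trace(&tp, 100);		/* off again */
	return total;
}
```

demo_run() walks the tracepoint through its three states; only the values
traced while the probe is connected are accumulated.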
2008-07-18 20:16:16 +04:00
# ifdef CONFIG_TRACEPOINTS
void module_update_tracepoints(void)
{
	struct module *mod;

	mutex_lock(&module_mutex);
	list_for_each_entry(mod, &modules, list)
		if (!mod->taints)
			tracepoint_update_probe_range(mod->tracepoints,
				mod->tracepoints + mod->num_tracepoints);
	mutex_unlock(&module_mutex);
}
/*
 * Returns 0 if current not found.
 * Returns 1 if current found.
 */
int module_get_iter_tracepoints(struct tracepoint_iter *iter)
{
	struct module *iter_mod;
	int found = 0;

	mutex_lock(&module_mutex);
	list_for_each_entry(iter_mod, &modules, list) {
		if (!iter_mod->taints) {
			/*
			 * Sorted module list
			 */
			if (iter_mod < iter->module)
				continue;
			else if (iter_mod > iter->module)
				iter->tracepoint = NULL;
			found = tracepoint_get_iter_range(&iter->tracepoint,
				iter_mod->tracepoints,
				iter_mod->tracepoints
					+ iter_mod->num_tracepoints);
			if (found) {
				iter->module = iter_mod;
				break;
			}
		}
	}
	mutex_unlock(&module_mutex);
	return found;
}
# endif
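The resumable walk performed by module_get_iter_tracepoints() can be sketched in
userspace as follows. This is an illustration under stated assumptions, not the
kernel code: all demo_* names are hypothetical, integer ids stand in for the
sorted module addresses, plain ints stand in for tracepoints, locking and the
taint check are omitted, and demo_get_iter_range() is only a guess at what the
range helper does (start at the beginning when the cursor is NULL, succeed while
the cursor lies inside the range), since that helper is not shown in this file.

```c
#include <assert.h>
#include <stddef.h>

struct demo_module {
	int id;			/* stands in for the module address; array sorted by id */
	const int *tps;		/* this module's "tracepoints" */
	size_t num_tps;
};

struct demo_iter {
	int module_id;		/* 0 = no module visited yet (real ids are > 0) */
	const int *tp;		/* cursor; NULL = start of the next module */
};

/* Assumed helper behaviour: returns 1 if *tp is (or becomes) valid in range. */
static int demo_get_iter_range(const int **tp, const int *begin, const int *end)
{
	if (!*tp && begin != end) {
		*tp = begin;
		return 1;
	}
	if (*tp && *tp >= begin && *tp < end)
		return 1;
	return 0;
}

/* Returns 1 and positions iter on the current or next element, else 0. */
static int demo_get_iter(const struct demo_module *mods, size_t n,
			 struct demo_iter *iter)
{
	int found = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (mods[i].id < iter->module_id)
			continue;		/* already fully walked */
		else if (mods[i].id > iter->module_id)
			iter->tp = NULL;	/* new module: restart the cursor */
		found = demo_get_iter_range(&iter->tp, mods[i].tps,
					    mods[i].tps + mods[i].num_tps);
		if (found) {
			iter->module_id = mods[i].id;
			break;
		}
	}
	return found;
}

/* Walk every element of every module, advancing the cursor by hand. */
static int demo_sum_all(const struct demo_module *mods, size_t n)
{
	struct demo_iter iter = { 0, NULL };
	int sum = 0;

	while (demo_get_iter(mods, n, &iter)) {
		assert(iter.tp != NULL);
		sum += *iter.tp;
		iter.tp++;	/* may step past the end; the next call recovers */
	}
	return sum;
}
```

The caller advances the cursor itself; when it steps past the end of one
module's range, the next call skips ahead to the first non-empty module with a
larger id, which is why the walk depends on the list being sorted.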