49550b6055
499936 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Michal Hocko
|
49550b6055 |
oom: add helpers for setting and clearing TIF_MEMDIE
This patchset addresses a race which was described in the changelog for
|
||
Johannes Weiner
|
1dfab5abcd |
mm: memcontrol: fold move_anon() and move_file()
Turn the move type enum into flags and give the flags field a shorter name. Once that is done, move_anon() and move_file() are simple enough to just fold them into the callsites. [akpm@linux-foundation.org: tweak MOVE_MASK definition, per Michal] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
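The enum-to-flags change described in this commit can be pictured with a small userspace sketch. The names MOVE_ANON, MOVE_FILE and MOVE_MASK are taken from the commit text; the numeric values and both helper functions are assumptions made for illustration and are not the kernel code.

```c
#include <stdbool.h>

/* Assumed values: each move type occupies its own bit. */
#define MOVE_ANON 0x1U
#define MOVE_FILE 0x2U
#define MOVE_MASK (MOVE_ANON | MOVE_FILE)

/* Hypothetical helper: one bit test replaces two separate predicates. */
static bool can_move(unsigned int flags, bool page_is_anon)
{
    return (flags & (page_is_anon ? MOVE_ANON : MOVE_FILE)) != 0;
}

/* Hypothetical helper: reject any bit outside the known set. */
static bool flags_valid(unsigned int flags)
{
    return (flags & ~MOVE_MASK) == 0;
}
```

With flags, testing for a move type is a single mask operation at the call site, which is why dedicated wrapper functions become unnecessary.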
Johannes Weiner
|
241994ed86 |
mm: memcontrol: default hierarchy interface for memory
Introduce the basic control files to account, partition, and limit memory using cgroups in default hierarchy mode. This interface versioning allows us to address fundamental design issues in the existing memory cgroup interface, further explained below. The old interface will be maintained indefinitely, but a clearer model and improved workload performance should encourage existing users to switch over to the new one eventually. The control files are thus: - memory.current shows the current consumption of the cgroup and its descendants, in bytes. - memory.low configures the lower end of the cgroup's expected memory consumption range. The kernel considers memory below that boundary to be a reserve - the minimum that the workload needs in order to make forward progress - and generally avoids reclaiming it, unless there is an imminent risk of entering an OOM situation. - memory.high configures the upper end of the cgroup's expected memory consumption range. A cgroup whose consumption grows beyond this threshold is forced into direct reclaim, to work off the excess and to throttle new allocations heavily, but is generally allowed to continue and the OOM killer is not invoked. - memory.max configures the hard maximum amount of memory that the cgroup is allowed to consume before the OOM killer is invoked. - memory.events shows event counters that indicate how often the cgroup was reclaimed while below memory.low, how often it was forced to reclaim excess beyond memory.high, how often it hit memory.max, and how often it entered OOM due to memory.max. This allows users to identify configuration problems when observing a degradation in workload performance. An overcommitted system will have an increased rate of low boundary breaches, whereas increased rates of high limit breaches, maximum hits, or even OOM situations will indicate internally overcommitted cgroups. 
For existing users of memory cgroups, the following deviations from the current interface are worth pointing out and explaining: - The original lower boundary, the soft limit, is defined as a limit that is per default unset. As a result, the set of cgroups that global reclaim prefers is opt-in, rather than opt-out. The costs for optimizing these mostly negative lookups are so high that the implementation, despite its enormous size, does not even provide the basic desirable behavior. First off, the soft limit has no hierarchical meaning. All configured groups are organized in a global rbtree and treated like equal peers, regardless where they are located in the hierarchy. This makes subtree delegation impossible. Second, the soft limit reclaim pass is so aggressive that it not just introduces high allocation latencies into the system, but also impacts system performance due to overreclaim, to the point where the feature becomes self-defeating. The memory.low boundary on the other hand is a top-down allocated reserve. A cgroup enjoys reclaim protection when it and all its ancestors are below their low boundaries, which makes delegation of subtrees possible. Secondly, new cgroups have no reserve per default and in the common case most cgroups are eligible for the preferred reclaim pass. This allows the new low boundary to be efficiently implemented with just a minor addition to the generic reclaim code, without the need for out-of-band data structures and reclaim passes. Because the generic reclaim code considers all cgroups except for the ones running low in the preferred first reclaim pass, overreclaim of individual groups is eliminated as well, resulting in much better overall workload performance. - The original high boundary, the hard limit, is defined as a strict limit that can not budge, even if the OOM killer has to be called. But this generally goes against the goal of making the most out of the available memory. 
The memory consumption of workloads varies during runtime, and that requires users to overcommit. But doing that with a strict upper limit requires either a fairly accurate prediction of the working set size or adding slack to the limit. Since working set size estimation is hard and error prone, and getting it wrong results in OOM kills, most users tend to err on the side of a looser limit and end up wasting precious resources. The memory.high boundary on the other hand can be set much more conservatively. When hit, it throttles allocations by forcing them into direct reclaim to work off the excess, but it never invokes the OOM killer. As a result, a high boundary that is chosen too aggressively will not terminate the processes, but instead it will lead to gradual performance degradation. The user can monitor this and make corrections until the minimal memory footprint that still gives acceptable performance is found. In extreme cases, with many concurrent allocations and a complete breakdown of reclaim progress within the group, the high boundary can be exceeded. But even then it's mostly better to satisfy the allocation from the slack available in other groups or the rest of the system than killing the group. Otherwise, memory.max is there to limit this type of spillover and ultimately contain buggy or even malicious applications. - The original control file names are unwieldy and inconsistent in many different ways. For example, the upper boundary hit count is exported in the memory.failcnt file, but an OOM event count has to be manually counted by listening to memory.oom_control events, and lower boundary / soft limit events have to be counted by first setting a threshold for that value and then counting those events. Also, usage and limit files encode their units in the filename. That makes the filenames very long, even though this is not information that a user needs to be reminded of every time they type out those names. 
To address these naming issues, as well as to signal clearly that the new interface carries a new configuration model, the naming conventions in it necessarily differ from the old interface. - The original limit files indicate the state of an unset limit with a very high number, and a configured limit can be unset by echoing -1 into those files. But that very high number is implementation and architecture dependent and not very descriptive. And while -1 can be understood as an underflow into the highest possible value, -2 or -10M etc. do not work, so it's not consistent. memory.low, memory.high, and memory.max will use the string "infinity" to indicate and set the highest possible value. [akpm@linux-foundation.org: use seq_puts() for basic strings] Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
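The hierarchical rule for memory.low stated in this commit (a cgroup is protected only when it and all its ancestors are below their low boundaries) can be sketched as follows. The struct, its field names, and the decision to treat the root as never protected are assumptions for illustration; the kernel's own data structures differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a memory cgroup. */
struct group {
    struct group *parent;   /* NULL for the root */
    unsigned long usage;    /* analogous to memory.current */
    unsigned long low;      /* analogous to memory.low, 0 when unset */
};

/*
 * Walk from the group up to (but not including) the root. Protection
 * holds only if every group on the way is below its own low boundary.
 */
static bool group_is_protected(const struct group *g)
{
    if (g == NULL || g->parent == NULL)
        return false;   /* assumption: the root itself is never protected */
    for (; g->parent != NULL; g = g->parent) {
        if (g->usage >= g->low)
            return false;
    }
    return true;
}
```

Because an unset boundary is 0 in this sketch, a freshly created group is never protected, which matches the statement that new cgroups have no reserve by default.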
Johannes Weiner
|
650c5e5654 |
mm: page_counter: pull "-1" handling out of page_counter_memparse()
The unified hierarchy interface for memory cgroups will no longer use "-1" to mean maximum possible resource value. In preparation for this, make the string an argument and let the caller supply it. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Cc: Greg Thelen <gthelen@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
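A rough userspace analogue of letting the caller supply the "maximum" string is shown below. The function name parse_limit() and its exact behavior are assumptions; in particular, this sketch accepts only plain decimal digits and does not handle the size suffixes that the kernel parser understands.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: `max` is the caller-chosen spelling of "no limit",
 * for example "-1" for an old interface or "infinity" for a new one.
 */
static int parse_limit(const char *buf, const char *max, unsigned long *out)
{
    char *end;
    unsigned long val;

    if (strcmp(buf, max) == 0) {
        *out = ULONG_MAX;
        return 0;
    }
    if (*buf < '0' || *buf > '9')
        return -EINVAL;     /* reject signs, blanks and empty input */
    errno = 0;
    val = strtoul(buf, &end, 10);
    if (errno != 0 || *end != '\0')
        return -EINVAL;
    *out = val;
    return 0;
}
```

Since the special string is matched before any numeric parsing, "-1" is only accepted when the caller explicitly asks for it, and other negative inputs such as "-2" are rejected uniformly.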
Juergen Gross
|
8d29e18a45 |
mm: use correct format specifiers when printing address ranges
Especially on 32 bit kernels memory node ranges are printed with 32 bit wide addresses only. Use u64 types and %llx specifiers to print full width of addresses. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
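The idea of widening to a 64-bit type before printing can be sketched in userspace C as below. The helper name format_range() and the "[mem ...]" output layout are assumptions for illustration, not the kernel's print format.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: print both ends of a range at full 64-bit width,
 * independent of how wide `unsigned long` is on the build target.
 */
static int format_range(char *buf, size_t len, uint64_t start, uint64_t end)
{
    return snprintf(buf, len, "[mem 0x%016llx-0x%016llx]",
                    (unsigned long long)start, (unsigned long long)end);
}
```

The explicit cast to unsigned long long keeps the argument type in step with the %llx specifier, so an address above 4 GiB is not truncated on a 32-bit build.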
Greg Thelen
|
0ca44b148e |
memcg: add BUILD_BUG_ON() for string tables
Use BUILD_BUG_ON() to compile assert that memcg string tables are in sync with corresponding enums. There aren't currently any issues with these tables. This is just defensive. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
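A compile-time check of this kind can be sketched in standard C11 with _Static_assert, used here as a stand-in for the kernel's BUILD_BUG_ON(). The enum, the table, and the accessor are hypothetical examples, not the memcg tables themselves.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical enum standing in for a memcg statistics index. */
enum stat_index {
    STAT_CACHE,
    STAT_RSS,
    STAT_SWAP,
    STAT_NSTATS,    /* must stay last */
};

/* Hypothetical name table that has to match the enum one to one. */
static const char *const stat_names[] = {
    "cache",
    "rss",
    "swap",
};

/* Compilation fails if an enum entry is added without a matching name. */
_Static_assert(ARRAY_SIZE(stat_names) == STAT_NSTATS,
               "stat_names[] out of sync with enum stat_index");

static const char *stat_name(enum stat_index idx)
{
    return idx < STAT_NSTATS ? stat_names[idx] : NULL;
}
```

The check costs nothing at run time; it only turns a silent out-of-bounds read into a build failure.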
Vladimir Davydov
|
90cbc25088 |
vmscan: force scan offline memory cgroups
Since commit
|
||
Kirill A. Shutemov
|
81422f29c5 |
mm: more checks on free_pages_prepare() for tail pages
Although it was not called, destroy_compound_page() did some potentially useful checks. Let's re-introduce them in free_pages_prepare(), where they can be actually triggered when CONFIG_DEBUG_VM=y. compound_order() assert is already in free_pages_prepare(). We have few checks for tail pages left. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kirill A. Shutemov
|
6e9f0d582d |
mm/page_alloc.c: drop dead destroy_compound_page()
The only caller is __free_one_page(). By that time, page->flags should already have been cleared:
- for 0-order pages through the PCP list:
    free_hot_cold_page()
        free_pages_prepare()
            free_pages_check()
                page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
        <put the page to PCP list>
    free_pcppages_bulk()
        page = <withdraw pages from PCP list>
        __free_one_page(page)
- for non-0-order pages:
    __free_pages_ok()
        free_pages_prepare()
            free_pages_check()
                page->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
        free_one_page()
            __free_one_page()
So there's no way PageCompound() will return true in __free_one_page(). Let's remove the dead destroy_compound_page() and put an assert for page->flags there instead.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
05891fb065 |
mm: microoptimize zonelist operations
next_zones_zonelist() returns a zoneref pointer, as well as a zone pointer via extra parameter. Since the latter can be trivially obtained by dereferencing the former, the overhead of the extra parameter is unjustified. This patch thus removes the zone parameter from next_zones_zonelist(). Both callers happen to be in the same header file, so it's simple to add the zoneref dereference inline. We save some bytes of code size.
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-105 (-105)
function                         old     new   delta
nr_free_zone_pages               129     115     -14
__alloc_pages_nodemask          2300    2285     -15
get_page_from_freelist          2652    2576     -76
add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10)
function                         old     new   delta
try_to_compact_pages             569     579     +10
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
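The interface change in this commit (return only the reference and let the caller dereference it) can be sketched with simplified types. The struct layouts and the name next_ref() are assumptions; nodemask filtering, which the real iterator also performs, is left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
struct zone {
    int id;
};

struct zoneref {
    struct zone *zone;  /* NULL terminates the list */
    int zone_idx;
};

/*
 * Return the first entry at or after `z` whose index does not exceed
 * `highest_idx`. No out-parameter: the caller reads ->zone itself.
 */
static struct zoneref *next_ref(struct zoneref *z, int highest_idx)
{
    while (z->zone != NULL && z->zone_idx > highest_idx)
        z++;
    return z;
}
```

A caller that needs the zone writes `next_ref(list, idx)->zone`, so the second result no longer has to be passed back through a pointer argument.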
Vlastimil Babka
|
1a6d53a105 |
mm: reduce try_to_compact_pages parameters
Expand the usage of the struct alloc_context introduced in the previous patch also for calling try_to_compact_pages(), to reduce the number of its parameters. Since the function is in a different compilation unit, we need to move the alloc_context definition into the shared mm/internal.h header. With this change we get simpler code and small savings of code size and stack usage:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-27 (-27)
function                         old     new   delta
__alloc_pages_direct_compact     283     256     -27
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-13 (-13)
function                         old     new   delta
try_to_compact_pages             582     569     -13
Stack usage of __alloc_pages_direct_compact goes from 24 to none (per scripts/checkstack.pl).
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vlastimil Babka
|
a9263751e1 |
mm, page_alloc: reduce number of alloc_pages* functions' parameters
Introduce struct alloc_context to accumulate the numerous parameters passed between the alloc_pages* family of functions and get_page_from_freelist(). This excludes gfp_flags and alloc_info, which mutate too much along the way, and allocation order, which is conceptually different. The result is shorter function signatures, as well as overall code size and stack usage reductions.
bloat-o-meter:
add/remove: 0/0 grow/shrink: 1/2 up/down: 127/-310 (-183)
function                         old     new   delta
get_page_from_freelist          2525    2652    +127
__alloc_pages_direct_compact     329     283     -46
__alloc_pages_nodemask          2564    2300    -264
checkstack.pl:
function                         old     new
__alloc_pages_nodemask           248     200
get_page_from_freelist           168     184
__alloc_pages_direct_compact      40      24
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
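The parameter-bundling pattern behind struct alloc_context can be sketched in a few lines. Everything below is hypothetical (the stub zone type, the reserve size, the flag, and pick_zone()); it only illustrates passing the mostly constant inputs through one pointer while the frequently changing ones stay as separate arguments.

```c
#include <stddef.h>

#define ALLOC_USE_RESERVE 0x1U  /* hypothetical flag */
#define RESERVE_PAGES     4UL   /* hypothetical per-zone reserve */

/* Hypothetical stand-in for a zone. */
struct zone_stub {
    unsigned long free_pages;
};

/* Mostly constant inputs, filled in once by the entry point. */
struct alloc_ctx {
    struct zone_stub *zones;
    size_t nr_zones;
    int migratetype;
};

/*
 * order and flags remain separate arguments because they differ between
 * attempts; everything else travels through a single pointer.
 */
static struct zone_stub *pick_zone(unsigned int order, unsigned int flags,
                                   const struct alloc_ctx *ac)
{
    unsigned long need = 1UL << order;

    if (!(flags & ALLOC_USE_RESERVE))
        need += RESERVE_PAGES;
    for (size_t i = 0; i < ac->nr_zones; i++) {
        if (ac->zones[i].free_pages >= need)
            return &ac->zones[i];
    }
    return NULL;
}
```

Adding another mostly constant input later means adding a struct field, not touching every function signature along the call chain.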
Vlastimil Babka
|
753791910e |
mm: set page->pfmemalloc in prep_new_page()
The possibility of replacing the numerous parameters of alloc_pages*
functions with a single structure has been discussed when Minchan proposed
to expand the x86 kernel stack [1]. This series implements the change,
along with few more cleanups/microoptimizations.
The series is based on next-20150108 and I used gcc 4.8.3 20140627 on
openSUSE 13.2 for compiling. Config includes NUMA and COMPACTION.
The core change is the introduction of a new struct alloc_context, which looks
like this:
struct alloc_context {
        struct zonelist *zonelist;
        nodemask_t *nodemask;
        struct zone *preferred_zone;
        int classzone_idx;
        int migratetype;
        enum zone_type high_zoneidx;
};
The contents are mostly constant, except that __alloc_pages_slowpath()
changes preferred_zone, classzone_idx and potentially zonelist. But
that's not a problem in case control returns to retry_cpuset: in
__alloc_pages_nodemask(), those will be reset to initial values again
(although it's a bit subtle). On the other hand, gfp_flags and alloc_info
mutate so much that it doesn't make sense to put them into alloc_context.
Still, the result is one parameter instead of up to 7. This is all in
Patch 2.
Patch 3 is a step to expand alloc_context usage out of page_alloc.c
itself. The function try_to_compact_pages() can also benefit greatly from
the parameter reduction, but it means the struct definition has to be
moved to a shared header.
Patch 1 should IMHO be included even if the rest is deemed not useful
enough. It improves maintainability and also has some code/stack
reduction. Patch 4 is OTOH a tiny optimization.
Overall bloat-o-meter results:
add/remove: 0/0 grow/shrink: 0/4 up/down: 0/-460 (-460)
function                         old     new   delta
nr_free_zone_pages               129     115     -14
__alloc_pages_direct_compact     329     256     -73
get_page_from_freelist          2670    2576     -94
__alloc_pages_nodemask          2564    2285    -279
try_to_compact_pages             582     579      -3
Overall stack sizes per ./scripts/checkstack.pl:
                                 old     new   delta
get_page_from_freelist           184     184       0
__alloc_pages_nodemask           248     200     -48
__alloc_pages_direct_c            40       -     -40
try_to_compact_pages              72      72       0
                                                 -88
[1] http://marc.info/?l=linux-mm&m=140142462528257&w=2
This patch (of 4):
prep_new_page() sets almost everything in the struct page of the page
being allocated, except page->pfmemalloc. This is not obvious and has at
least once led to a bug where page->pfmemalloc was forgotten to be set
correctly, see commit
|
||
Kirill A. Shutemov
|
4ecf886045 |
sparc32: fix broken set_pte()
32-bit sparc uses the swap instruction to implement set_pte(). It is called using GCC inline assembler, but it misses the "memory" clobber to indicate that the pte value will be updated in memory. As a result, GCC doesn't know that it cannot postpone a pte pointer dereference which occurs before set_pte() to post-set_pte() time. This leads to real-world bugs -- [1]. In this situation we have code:
    ptent = ptep_modify_prot_start(mm, addr, pte);
    ptent = pte_modify(ptent, newprot);
    ...
    ptep_modify_prot_commit(mm, addr, pte, ptent);
ptep_modify_prot_start() in the sparc case is just a 'pte' dereference plus pte_clear(). pte_clear() calls the broken set_pte(). GCC thinks it's valid to dereference 'pte' again on pte_modify() and gets the cleared pte. ptep_modify_prot_commit() puts 'ptent' with pfn==0 back into the page table, which eventually leads to the crash.
[1] http://lkml.kernel.org/r/54C06B19.8060305@roeck-us.net
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Cc: Paul Moore <pmoore@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Naoya Horiguchi
|
9fbc1f635f |
mm/hugetlb: add migration entry check in __unmap_hugepage_range
If __unmap_hugepage_range() tries to unmap the address range over which
hugepage migration is on the way, we get the wrong page because pte_page()
doesn't work for migration entries. This patch simply clears the pte for
migration entries as we do for hwpoison entries.
Fixes:
|
||
Naoya Horiguchi
|
a8bda28d87 |
mm/hugetlb: add migration/hwpoisoned entry check in hugetlb_change_protection
There is a race condition between hugepage migration and
change_protection(), where hugetlb_change_protection() doesn't care about
migration entries and wrongly overwrites them. That causes unexpected
results like kernel crash. HWPoison entries also can cause the same
problem.
This patch adds is_hugetlb_entry_(migration|hwpoisoned) check in this
function to do proper actions.
Fixes:
|
||
Naoya Horiguchi
|
0f792cf949 |
mm/hugetlb: fix getting refcount 0 page in hugetlb_fault()
When running the test which causes the race as shown in the previous patch,
we can hit the BUG "get_page() on refcount 0 page" in hugetlb_fault().
This race happens when pte turns into migration entry just after the first
check of is_hugetlb_entry_migration() in hugetlb_fault() passed with false.
To fix this, we need to check pte_present() again after huge_ptep_get().
This patch also reorders taking ptl and doing pte_page(), because
pte_page() should be done in ptl. Due to this reordering, we need use
trylock_page() in page != pagecache_page case to respect locking order.
Fixes:
|
||
Naoya Horiguchi
|
e66f17ff71 |
mm/hugetlb: take page table lock in follow_huge_pmd()
We have a race condition between move_pages() and freeing hugepages, where
move_pages() calls follow_page(FOLL_GET) for hugepages internally and
tries to get its refcount without preventing concurrent freeing. This
race crashes the kernel, so this patch fixes it by moving FOLL_GET code
for hugepages into follow_huge_pmd() with taking the page table lock.
This patch intentionally removes page==NULL check after pte_page.
This is justified because pte_page() never returns NULL for any
architectures or configurations.
This patch changes the behavior of follow_huge_pmd() for tail pages and
then tail pages can be pinned/returned. So the caller must be changed to
properly handle the returned tail pages.
We could have a choice to add the similar locking to
follow_huge_(addr|pud) for consistency, but it's not necessary because
currently these functions don't support FOLL_GET flag, so let's leave it
for future development.
Here is the reproducer:
$ cat movepages.c
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <numa.h>
#include <numaif.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
#define PS 0x1000
int main(int argc, char *argv[]) {
        int i;
        int nr_hp;
        int nr_p;
        int ret;
        void **addrs;
        int *status;
        int *nodes;
        pid_t pid;
        if (argc < 3)
                errx(1, "usage: %s <nr_hugepages> <pid>", argv[0]);
        nr_hp = strtol(argv[1], NULL, 0);
        nr_p = nr_hp * HPS / PS;
        pid = strtol(argv[2], NULL, 0);
        /* one array element per base page of the hugepage range */
        addrs = malloc(sizeof(*addrs) * nr_p);
        status = malloc(sizeof(*status) * nr_p);
        nodes = malloc(sizeof(*nodes) * nr_p);
        if (!addrs || !status || !nodes)
                err(1, "malloc");
        while (1) {
                /* move every page to node 1 ... */
                for (i = 0; i < nr_p; i++) {
                        addrs[i] = (char *)ADDR_INPUT + (unsigned long)i * PS;
                        nodes[i] = 1;
                        status[i] = 0;
                }
                ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                      MPOL_MF_MOVE_ALL);
                if (ret == -1)
                        err(1, "move_pages");
                /* ... and back to node 0 */
                for (i = 0; i < nr_p; i++) {
                        addrs[i] = (char *)ADDR_INPUT + (unsigned long)i * PS;
                        nodes[i] = 0;
                        status[i] = 0;
                }
                ret = numa_move_pages(pid, nr_p, addrs, nodes, status,
                                      MPOL_MF_MOVE_ALL);
                if (ret == -1)
                        err(1, "move_pages");
        }
        return 0;
}
$ cat hugepage.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <string.h>
#define ADDR_INPUT 0x700000000000UL
#define HPS 0x200000
int main(int argc, char *argv[]) {
        int nr_hp;
        char *p;
        if (argc < 2) {
                fprintf(stderr, "usage: %s <nr_hugepages>\n", argv[0]);
                return 1;
        }
        nr_hp = strtol(argv[1], NULL, 0);
        /* map, touch and unmap the hugepages in a tight loop */
        while (1) {
                p = mmap((void *)ADDR_INPUT, nr_hp * HPS, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
                if (p != (void *)ADDR_INPUT) {
                        perror("mmap");
                        break;
                }
                memset(p, 0, nr_hp * HPS);
                munmap(p, nr_hp * HPS);
        }
        return 0;
}
$ sysctl vm.nr_hugepages=40
$ ./hugepage 10 &
$ ./movepages 10 $(pgrep -f hugepage)
Fixes:
|
||
Naoya Horiguchi
|
cbef8478be |
mm/hugetlb: pmd_huge() returns true for non-present hugepage
Migrating hugepages and hwpoisoned hugepages are considered as non-present
hugepages, and they are referenced via migration entries and hwpoison
entries in their page table slots.
This behavior causes race condition because pmd_huge() doesn't tell
non-huge pages from migrating/hwpoisoned hugepages. follow_page_mask() is
one example where the kernel would call follow_page_pte() for such
hugepage while this function is supposed to handle only normal pages.
To avoid this, this patch makes pmd_huge() return true when pmd_none() is
false *and* pmd_present() is false. We don't have to worry about mixing up
non-present pmd entry with normal pmd (pointing to leaf level pte entry)
because pmd_present() is true in normal pmd.
The same race condition could happen in (x86-specific) gup_pmd_range(),
where this patch simply adds pmd_present() check instead of pmd_huge().
This is because gup_pmd_range() is fast path. If we have non-present
hugepage in this function, we will go into gup_huge_pmd(), then return 0
at flag mask check, and finally fall back to the slow path.
Fixes:
|
||
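The predicate described in this commit can be sketched as a pure function over an entry value, assuming the intended condition is "the entry is not empty and is not a normal present entry". The bit values and function names below are assumptions for illustration and do not reproduce any architecture's real page table layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define ENTRY_PRESENT 0x001ULL
#define ENTRY_PSE     0x080ULL  /* "page size" bit: maps a huge page */

static bool entry_none(uint64_t e)
{
    return e == 0;
}

/*
 * Treated as huge when the entry is either present with the PSE bit set,
 * or non-empty but not present (a migration or hwpoison entry).
 * A present entry without PSE points to a lower-level table: not huge.
 */
static bool entry_huge(uint64_t e)
{
    return !entry_none(e) &&
           (e & (ENTRY_PRESENT | ENTRY_PSE)) != ENTRY_PRESENT;
}
```

The single mask comparison covers both huge cases at once: it is false only for the "present, no PSE" combination, which is the normal entry pointing to a leaf-level table.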
Naoya Horiguchi
|
61f77eda9b |
mm/hugetlb: reduce arch dependent code around follow_huge_*
Currently we have many duplicates in definitions around follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this patch tries to remove them. The basic idea is to put the default implementation for these functions in mm/hugetlb.c as weak symbols (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement arch-specific code only when the arch needs it. For follow_huge_addr(), only powerpc and ia64 have their own implementation, and in all other architectures this function just returns ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as default. As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to always return 0 in your architecture (like in ia64 or sparc,) it's never called (the callsite is optimized away) no matter how implemented it is. So in such architectures, we don't need arch-specific implementation. In some architectures (like mips, s390 and tile,) their current arch-specific follow_huge_(pmd|pud)() are effectively identical with the common code, so this patch lets these architectures use the common code. One exception is metag, where pmd_huge() could return non-zero but it expects follow_huge_pmd() to always return NULL. This means that we need arch-specific implementation which returns NULL. This behavior looks strange to me (because non-zero pmd_huge() implies that the architecture supports PMD-based hugepage, so follow_huge_pmd() can/should return some relevant value,) but that's beyond this cleanup patch, so let's keep it. Justification of non-trivial changes: - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE is true when follow_huge_pmd() can be called (note that pmd_huge() has the same check and always returns 0 for !MACHINE_HAS_HPAGE.) - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common code. 
This patch forces these archs to use PMD_MASK, but it's OK because they are identical in both archs. In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20. In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but PTE_ORDER is always 0, so these are identical. Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Acked-by: Hugh Dickins <hughd@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Luiz Capitulino <lcapitulino@redhat.com> Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Steve Capper <steve.capper@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
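The "generic default as a weak symbol" approach can be sketched with the GNU weak attribute, which GCC and Clang support. The function name lookup_huge_addr() is hypothetical, and the plain negative errno return is a userspace simplification of the ERR_PTR(-EINVAL) convention mentioned in the commit.

```c
#include <errno.h>

/* Prototype shared by generic and arch-specific code (hypothetical name). */
int lookup_huge_addr(unsigned long addr);

/*
 * Generic default. Because it is weak, another object file may provide
 * a regular (strong) definition of the same symbol, and the linker will
 * pick that one instead without any #ifdef in the generic code.
 */
__attribute__((weak)) int lookup_huge_addr(unsigned long addr)
{
    (void)addr;
    return -EINVAL;     /* default: not supported */
}
```

An architecture that needs special handling would simply define lookup_huge_addr() in its own source file; all others get the default for free.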
Vlastimil Babka
|
cfc5115579 |
mm, vmscan: wake up all pfmemalloc-throttled processes at once
Kswapd in balance_pgdat() currently uses wake_up() on processes waiting in throttle_direct_reclaim(), which only wakes up a single process. This might leave processes waiting for longer than necessary, until the check is reached in the next loop iteration. Processes might also be left waiting if zone was fully balanced in single iteration. Note that the comment in balance_pgdat() also says "Wake them", so waking up a single process does not seem intentional. Thus, replace wake_up() with wake_up_all(). Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@suse.de> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@parallels.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Baoquan He
|
44628d9755 |
mm: fix typo of MIGRATE_RESERVE in comment
Found it when I wanted to jump to the definition of MIGRATE_RESERVE with ctags. Signed-off-by: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Xishi Qiu
|
23f086f962 |
kmemcheck: move hook into __alloc_pages_nodemask() for the page allocator
Now kmemcheck_pagealloc_alloc() is only called by __alloc_pages_slowpath().
    __alloc_pages_nodemask()
        __alloc_pages_slowpath()
            kmemcheck_pagealloc_alloc()
And the page will not be tracked by kmemcheck in the following path.
    __alloc_pages_nodemask()
        get_page_from_freelist()
So move kmemcheck_pagealloc_alloc() into __alloc_pages_nodemask(), like this:
    __alloc_pages_nodemask()
        ...
        get_page_from_freelist()
        if (!page)
            __alloc_pages_slowpath()
        kmemcheck_pagealloc_alloc()
        ...
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrew Morton
|
91fbdc0f89 |
mm/page_alloc.c:__alloc_pages_nodemask(): don't alter arg gfp_mask
__alloc_pages_nodemask() strips __GFP_IO when retrying the page allocation. But it does this by altering the function-wide variable gfp_mask. This will cause subsequent allocation attempts to inadvertently use the modified gfp_mask. Also, pass the correct mask (the mask we actually used) into trace_mm_page_alloc(). Cc: Ming Lei <ming.lei@canonical.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
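The bug class described in this commit (modifying the caller's mask so that later attempts inherit the change) can be illustrated with a small simulation. All names, flag values, and the two-round retry structure are assumptions for illustration; the function returns the mask that was actually used, in the spirit of passing the correct mask to the trace point, and returns 0 when every attempt fails.

```c
#include <stdbool.h>

#define FLAG_IO   0x1U  /* hypothetical flag values */
#define FLAG_WAIT 0x2U

/* Records the mask passed to each simulated allocation attempt. */
struct trace {
    unsigned int masks[4];
    int n;
};

/* Hypothetical attempt: fails until more than `fail_until` calls were made. */
static bool attempt(unsigned int mask, struct trace *t, int fail_until)
{
    t->masks[t->n] = mask;
    return ++t->n > fail_until;
}

static unsigned int alloc_with_retry(unsigned int gfp_mask, struct trace *t,
                                     int fail_until)
{
    for (int round = 0; round < 2; round++) {
        /* Work on a copy; gfp_mask itself is never modified. */
        unsigned int alloc_mask = gfp_mask;

        if (attempt(alloc_mask, t, fail_until))
            return alloc_mask;
        alloc_mask = gfp_mask & ~FLAG_IO;   /* strip IO for the retry only */
        if (attempt(alloc_mask, t, fail_until))
            return alloc_mask;
    }
    return 0;
}
```

Because each round starts again from the untouched gfp_mask, the first attempt of the second round sees the IO flag again, which would not be the case if the stripping had been applied to gfp_mask directly.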
Johannes Weiner
|
6de226191d |
mm: memcontrol: track move_lock state internally
The complexity of memcg page stat synchronization is currently leaking into the callsites, forcing them to keep track of the move_lock state and the IRQ flags. Simplify the API by tracking it in the memcg. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Vladimir Davydov <vdavydov@parallels.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Vladimir Davydov
|
93aa7d9524 |
swap: remove unused mem_cgroup_uncharge_swapcache declaration
The body of this function was removed by commit
|
||
Michal Hocko
|
83363b917a |
oom: make sure that TIF_MEMDIE is set under task_lock
OOM killer tries to exclude tasks which do not have mm_struct associated because killing such a task wouldn't help much. The OOM victim gets TIF_MEMDIE set to disable OOM killer while the current victim releases the memory and then enables the OOM killer again by dropping the flag. oom_kill_process is currently prone to a race condition when the OOM victim is already exiting and TIF_MEMDIE is set after the task releases its address space. This might theoretically lead to OOM livelock if the OOM victim blocks on an allocation later during exiting because it wouldn't kill any other process and the exiting one won't be able to exit. The situation is highly unlikely because the OOM victim is expected to release some memory which should help to sort out OOM situation. Fix this by checking task->mm and setting TIF_MEMDIE flag under task_lock which will serialize the OOM killer with exit_mm which sets task->mm to NULL. Setting the flag for current is not necessary because check and set is not racy. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Tetsuo Handa
|
d7a94e7e11 |
oom: don't count on mm-less current process
out_of_memory() doesn't trigger the OOM killer if the current task is already exiting or it has fatal signals pending, and gives the task access to memory reserves instead. However, doing so is wrong if out_of_memory() is called by an allocation (e.g. from exit_task_work()) after the current task has already released its memory and cleared TIF_MEMDIE at exit_mm(). If we again set TIF_MEMDIE to post-exit_mm() current task, the OOM killer will be blocked by the task sitting in the final schedule() waiting for its parent to reap it. It will trigger an OOM livelock if its parent is unable to reap it due to doing an allocation and waiting for the OOM killer to kill it. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wang, Yalin
|
56873f43ab |
mm:add KPF_ZERO_PAGE flag for /proc/kpageflags
Add KPF_ZERO_PAGE flag for zero_page, so that userspace processes can detect zero_page in /proc/kpageflags, and then do memory analysis more accurately. Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wang, Yalin
|
1d148e218a |
mm: add VM_BUG_ON_PAGE() to page_mapcount()
Add VM_BUG_ON_PAGE() for slab pages. _mapcount is in a union with the slab struct in struct page, so we must avoid accessing _mapcount if this page is a slab page. Also remove the unneeded bracket. Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Kirill A. Shutemov
|
e4b294c2d8 |
mm: add fields for compound destructor and order into struct page
Currently, we use lru.next/lru.prev plus cast to access or set destructor and order of compound page. Let's replace it with explicit fields in struct page. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Jerome Marchand <jmarchan@redhat.com> Acked-by: Christoph Lameter <cl@linux.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
aa7ed01f93 |
MMC core:
- Support for MMC power sequences. - SDIO function devicetree subnode parsing. - Refactor the hardware reset routines and enable it for SD cards. - Various code quality improvements, especially for slot-gpio. MMC host: - dw_mmc: Various fixes and cleanups. - dw_mmc: Convert to mmc_send_tuning(). - moxart: Fix probe logic. - sdhci: Various fixes and cleanups - sdhci: Asynchronous request handling support. - sdhci-pxav3: Various fixes and cleanups. - sdhci-tegra: Fixes for T114, T124 and T132. - rtsx: Various fixes and cleanups. - rtsx: Support for SDIO. - sdhi/tmio: Refactor and cleanup of header files. - omap_hsmmc: Use slot-gpio and common MMC DT parser. - Make all hosts to deal with errors from mmc_of_parse(). - sunxi: Various fixes and cleanups. - sdhci: Support for Fujitsu SDHCI controller f_sdh30. Merge tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc Pull MMC updates from Ulf Hansson: "MMC core: - Support for MMC power sequences. - SDIO function devicetree subnode parsing. - Refactor the hardware reset routines and enable it for SD cards. - Various code quality improvements, especially for slot-gpio. 
MMC host: - dw_mmc: Various fixes and cleanups. - dw_mmc: Convert to mmc_send_tuning(). - moxart: Fix probe logic. - sdhci: Various fixes and cleanups - sdhci: Asynchronous request handling support. - sdhci-pxav3: Various fixes and cleanups. - sdhci-tegra: Fixes for T114, T124 and T132. - rtsx: Various fixes and cleanups. - rtsx: Support for SDIO. - sdhi/tmio: Refactor and cleanup of header files. - omap_hsmmc: Use slot-gpio and common MMC DT parser. - Make all hosts to deal with errors from mmc_of_parse(). - sunxi: Various fixes and cleanups. - sdhci: Support for Fujitsu SDHCI controller f_sdh30" * tag 'mmc-v3.20-1' of git://git.linaro.org/people/ulf.hansson/mmc: (117 commits) mmc: sdhci-s3c: solve problem with sleeping in atomic context mmc: pwrseq: add driver for emmc hardware reset mmc: moxart: fix probe logic mmc: core: Invoke mmc_pwrseq_post_power_on() prior MMC_POWER_ON state mmc: pwrseq_simple: Add optional reference clock support mmc: pwrseq: Document optional clock for the simple power sequence mmc: pwrseq_simple: Extend to support more pins mmc: pwrseq: Document that simple sequence support more than one GPIO mmc: Add hardware dependencies for sdhci-pxav3 and sdhci-pxav2 mmc: sdhci-pxav3: Modify clock settings for the SDR50 and DDR50 modes mmc: sdhci-pxav3: Extend binding with SDIO3 conf reg for the Armada 38x mmc: sdhci-pxav3: Fix Armada 38x controller's caps according to erratum ERR-7878951 mmc: sdhci-pxav3: Fix SDR50 and DDR50 capabilities for the Armada 38x flavor mmc: sdhci: switch voltage before sdhci_set_ios in runtime resume mmc: tegra: Write xfer_mode, CMD regs in together mmc: Resolve BKOPS compatability issue mmc: sdhci-pxav3: fix setting of pdata->clk_delay_cycles mmc: dw_mmc: rockchip: remove incorrect __exit_p() mmc: dw_mmc: exynos: remove incorrect __exit_p() mmc: Fix menuconfig alignment of MMC_SDHCI_* options ... |
||
Linus Torvalds
|
7796c11c72 |
xilinx usb2 gadget: get rid of incredibly annoying compile warning
This one was driving me mad, with several lines of warnings during the allmodconfig build for a single bogus pointer cast. The warning was so verbose due to the indirect macro expansion explanation, and the whole thing was just for a debug printout. The bogus pointer-to-integer cast was pointless anyway, so just remove it, and use '%p' to show the pointer. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
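The fix described in the commit above (drop the pointer-to-integer cast and print the pointer with '%p') can be illustrated in user space. This is a minimal sketch, not the driver's code: `format_buf_debug` and its parameters are hypothetical names chosen for the example, and it uses standard C `snprintf` rather than the kernel's printing helpers.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not taken from the xilinx gadget driver).
 * It formats a debug line for a buffer. The pointer is handed to
 * snprintf() as a void * and printed with %p, so no cast to an
 * integer type (and no width mismatch warning) is involved.
 */
int format_buf_debug(char *out, size_t out_len, const void *buf,
                     unsigned int len)
{
    /* %p expects a void *; the cast only drops the const qualifier. */
    return snprintf(out, out_len, "buf=%p len=%u", (void *)buf, len);
}
```

The textual form that `%p` produces is implementation-defined, so callers should treat it as opaque debug output.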
Linus Torvalds
|
540a7c5061 |
SCSI misc on 20150209
This is the usual grab bag of driver updates (hpsa, storvsc, mpt2sas, megaraid_sas, ses) plus an assortment of minor updates. There's also an update to ufs which adds new phy drivers and finally a new logging infrastructure for SCSI. Signed-off-by: James Bottomley <JBottomley@Parallels.com> Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull first round of SCSI updates from James Bottomley: "This is the usual grab bag of driver updates (hpsa, storvsc, mpt2sas, megaraid_sas, ses) plus an assortment of minor updates. 
There's also an update to ufs which adds new phy drivers and finally a new logging infrastructure for SCSI" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (114 commits) scsi_logging: return void for dev_printk() functions scsi: print single-character strings with seq_putc scsi: merge consecutive seq_puts calls scsi: replace seq_printf with seq_puts aha152x: replace seq_printf with seq_puts advansys: replace seq_printf with seq_puts scsi: remove SPRINTF macro sg: remove an unused variable hpsa: Use local workqueues instead of system workqueues hpsa: add in P840ar controller model name hpsa: add in gen9 controller model names hpsa: detect and report failures changing controller transport modes hpsa: shorten the wait for the CISS doorbell mode change ack hpsa: refactor duplicated scan completion code into a new routine hpsa: move SG descriptor set-up out of hpsa_scatter_gather() hpsa: do not use function pointers in fast path command submission hpsa: print CDBs instead of kernel virtual addresses for uncommon errors hpsa: do not use a void pointer for scsi_cmd field of struct CommandList hpsa: return failed from device reset/abort handlers hpsa: check for ctlr lockup after command allocation in main io path ... |
||
Linus Torvalds
|
718749d562 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input updates from Dmitry Torokhov: "The first round of updates for the input subsystem. A few new drivers (power button handler for AXP20x PMIC, tps65218 power button driver, sun4i keys driver, regulator haptic driver, NI Ettus Research USRP E3x0 button, Allwinner A10/A20 PS/2 controller). Updates to Synaptics and ALPS touchpad drivers (with more to come later), brand new Focaltech PS/2 support, update to Cypress driver to handle Gen5 (in addition to Gen3) devices, and a number of other fixups to various drivers as well as input core" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (54 commits) Input: elan_i2c - fix wrong %p extension Input: evdev - do not queue SYN_DROPPED if queue is empty Input: gscps2 - fix MODULE_DEVICE_TABLE invocation Input: synaptics - use dmax in input_mt_assign_slots Input: pxa27x_keypad - remove unnecessary ARM includes Input: ti_am335x_tsc - replace delta filtering with median filtering ARM: dts: AM335x: Make charge delay a DT parameter for TSC Input: ti_am335x_tsc - read charge delay from DT Input: ti_am335x_tsc - remove udelay in interrupt handler Input: ti_am335x_tsc - interchange touchscreen and ADC steps Input: MT - add support for balanced slot assignment Input: drv2667 - remove wrong and unneeded drv2667-haptics modalias Input: drv260x - remove wrong and unneeded drv260x-haptics modalias Input: cap11xx - remove wrong and unneeded cap11xx modalias Input: sun4i-ts - add support for touchpanel controller on A31 Input: serio - add support for Alwinner A10/A20 PS/2 controller Input: gtco - use sign_extend32() for sign extension Input: elan_i2c - verify firmware signature applying it Input: elantech - remove stale comment from Kconfig Input: cyapa - off by one in cyapa_update_fw_store() ... |
||
Linus Torvalds
|
e0c8453769 |
fbdev changes for v3.20
* omapdss: add DRA7xxx SoC support * fbdev: support DMT (Display Monitor Timing) calculation Merge tag 'fbdev-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux Pull fbdev changes from Tomi Valkeinen: - omapdss: add DRA7xxx SoC support - fbdev: support DMT (Display Monitor Timing) calculation * tag 'fbdev-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/tomba/linux: (40 commits) omapfb: Return error code when applying overlay settings fails OMAPDSS: DPI: DRA7xx support OMAPDSS: HDMI: Add DRA7xx support OMAPDSS: DISPC: program dispc polarities to control module OMAPDSS: DISPC: Add DRA7xx support OMAPDSS: Add Video PLLs for DRA7xx OMAPDSS: Add functions for external control of PLL OMAPDSS: DSS: Add DRA7xx base support Doc/DT: Add DT binding doc for DRA7xx DSS OMAPDSS: add define for DRA7xx HW version OMAPDSS: encoder-tpd12s015: Fix race issue with LS_OE OMAPDSS: OMAP5: fix digit output's allowed mgrs OMAPDSS: constify port arrays OMAPDSS: PLL: add dss_pll_wait_reset_done() OMAPDSS: Add enum dss_pll_id video: fbdev: fix sys_copyarea video/mmpfb: allow modular build fb: via: turn gpiolib and i2c selects into dependencies fbdev: ssd1307fb: return 
proper error code if write command fails fbdev: fix CVT vertical front and back porch values ... |
||
Linus Torvalds
|
a323ae93a7 |
sound updates for 3.20-rc1
In this batch, you can find lots of cleanups through the whole subsystem, as our good New Year's resolution. Lots of LOCs and commits are about LINE6 driver that was promoted finally from staging tree, and as usual, there've been widely spread ASoC changes. Here are some highlights: ALSA core changes - Embedding struct device into ALSA core structures - sequencer core cleanups / fixes - PCM msbits constraints cleanups / fixes - New SNDRV_PCM_TRIGGER_DRAIN command - PCM kerneldoc fixes, header cleanups - PCM code cleanups using more standard codes - Control notification ID fixes Driver cleanups - Cleanups of PCI PM callbacks - Timer helper usages cleanups - Simplification (e.g. argument reduction) of many driver codes HD-audio - Hotkey and LED support on HP laptops with Realtek codecs - Dock station support on HP laptops - Toshiba Satellite S50D fixup - Enhanced wallclock timestamp handling for HD-audio - Componentization to simplify the linkage between i915 and hd-audio drivers for Intel HDMI/DP USB-audio - Akai MPC Element support - Enhanced timestamp handling ASoC - Lots of refactoring in ASoC core, moving drivers to more data driven initialization and rationalizing a lot of DAPM usage - Much improved handling of CDCLK clocks on Samsung I2S controllers - Lots of driver specific cleanups and feature improvements - CODEC support for TI PCM514x and TLV320AIC3104 devices - Board support for Tegra systems with Realtek RT5677 - New driver for Maxim max98357a - More enhancements / fixes for Intel SST driver Others - Promotion of LINE6 driver from staging along with lots of rewrites and cleanups - DT support for old non-ASoC atmel driver - oxygen cleanups, XIO2001 init, Studio Evolution SE6x support - Emu8000 DRAM size detection fix on ISA(!!) 
AWE64 boards - A few more ak411x fixes for ice1724 boards Merge tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "In this batch, you can find lots of cleanups through the whole subsystem, as our good New Year's resolution. Lots of LOCs and commits are about LINE6 driver that was promoted finally from staging tree, and as usual, there've been widely spread ASoC changes. Here are some highlights: ALSA core changes - Embedding struct device into ALSA core structures - sequencer core cleanups / fixes - PCM msbits constraints cleanups / fixes - New SNDRV_PCM_TRIGGER_DRAIN command - PCM kerneldoc fixes, header cleanups - PCM code cleanups using more standard codes - Control notification ID fixes Driver cleanups - Cleanups of PCI PM callbacks - Timer helper usages cleanups - Simplification (e.g. 
argument reduction) of many driver codes HD-audio - Hotkey and LED support on HP laptops with Realtek codecs - Dock station support on HP laptops - Toshiba Satellite S50D fixup - Enhanced wallclock timestamp handling for HD-audio - Componentization to simplify the linkage between i915 and hd-audio drivers for Intel HDMI/DP USB-audio - Akai MPC Element support - Enhanced timestamp handling ASoC - Lots of refactoring in ASoC core, moving drivers to more data driven initialization and rationalizing a lot of DAPM usage - Much improved handling of CDCLK clocks on Samsung I2S controllers - Lots of driver specific cleanups and feature improvements - CODEC support for TI PCM514x and TLV320AIC3104 devices - Board support for Tegra systems with Realtek RT5677 - New driver for Maxim max98357a - More enhancements / fixes for Intel SST driver Others - Promotion of LINE6 driver from staging along with lots of rewrites and cleanups - DT support for old non-ASoC atmel driver - oxygen cleanups, XIO2001 init, Studio Evolution SE6x support - Emu8000 DRAM size detection fix on ISA(!!) 
AWE64 boards - A few more ak411x fixes for ice1724 boards" * tag 'sound-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (542 commits) ALSA: line6: toneport: Use explicit type for firmware version ALSA: line6: Use explicit type for serial number ALSA: line6: Return EIO if read/write not successful ALSA: line6: Return error if device not responding ALSA: line6: Add delay before reading status ASoC: Intel: Clean data after SST fw fetch ALSA: hda - Add docking station support for another HP machine ALSA: control: fix failure to return new numerical ID in 'replace' event data ALSA: usb: update trigger timestamp on first non-zero URB submitted ALSA: hda: read trigger_timestamp immediately after starting DMA ALSA: pcm: allow for trigger_tstamp snapshot in .trigger ALSA: pcm: don't override timestamp unconditionally ALSA: off by one bug in snd_riptide_joystick_probe() ASoC: rt5670: Set use_single_rw flag for regmap ASoC: rt286: Add rt288 codec support ASoC: max98357a: Fix build in !CONFIG_OF case ASoC: Intel: fix platform_no_drv_owner.cocci warnings ARM: dts: Switch Odroid X2/U2 to simple-audio-card ARM: dts: Exynos4 and Odroid X2/U3 sound device nodes update ALSA: control: fix failure to return numerical ID in 'add' event ... |
||
Linus Torvalds
|
3e63430a5c |
media updates for v3.20-rc1
Merge tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media updates from Mauro Carvalho Chehab: - Some documentation updates and a few new pixel formats - Stop btcx-risc abuse by cx88 and move it to bt8xx driver - New platform driver: am437x - New webcam driver: toptek - New remote controller hardware protocols added to img-ir driver - Removal of a few very old drivers that rely on old kABIs and are for very hard to find hardware: parallel port webcam drivers (bw-qcam, c-cam, pms and w9966), tlg2300, Video In/Out for SGI (vino) - Removal of the USB Telegent driver (tlg2300). The company that developed this driver has long gone and the hardware is hard to find. As it relies on a legacy set of kABI symbols and nobody seems to care about it, remove it. 
- several improvements at rtl2832 driver - conversion on cx28521 and au0828 to use videobuf2 (VB2) - several improvements, fixups and board additions * tag 'media/v3.20-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (321 commits) [media] dvb_net: Convert local hex dump to print_hex_dump_debug [media] dvb_net: Use standard debugging facilities [media] dvb_net: Use vsprintf %pM extension to print Ethernet addresses [media] staging: lirc_serial: adjust boolean assignments [media] stb0899: use sign_extend32() for sign extension [media] si2168: add support for 1.7MHz bandwidth [media] si2168: return error if set_frontend is called with invalid parameters [media] lirc_dev: avoid potential null-dereference [media] mn88472: simplify bandwidth registers setting code [media] dvb: tc90522: re-add symbol-rate report [media] lmedm04: add read snr, signal strength and ber call backs [media] lmedm04: Create frontend call back for read status [media] lmedm04: create frontend callbacks for signal/snr/ber/ucblocks [media] lmedm04: Fix usb_submit_urb BOGUS urb xfer, pipe 1 != type 3 in interrupt urb [media] lmedm04: Increase Interupt due time to 200 msec [media] cx88-dvb: whitespace cleanup [media] rtl28xxu: properly initialize pdata [media] rtl2832: declare functions as static [media] rtl2830: declare functions as static [media] rtl2832_sdr: add kernel-doc comments for platform_data ... |
||
Linus Torvalds
|
6fc26fc578 |
HSI changes for the v3.20 series
* fix uninitialized device pointer in nokia-modem Merge tag 'hsi-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI fix from Sebastian Reichel: "Fix uninitialized device pointer in nokia-modem" * tag 'hsi-for-3.20' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: hsi: nokia-modem: fix uninitialized device pointer |
||
Linus Torvalds
|
13c071907b |
power supply and reset changes for the v3.20 series
* new drivers - charger driver for Maxim 77693 - battery gauge driver for LTC 2941/2943 - battery gauge driver for RT5033 - reset driver for R-Mobile platforms * convert drivers to restart handler framework - arm-versatile - at91 - st-poweroff * remove deprecated sun6i reboot driver * use alarmtimer instead of rtc in charger-manager * misc. fixes Merge tag 'for-v3.20' of git://git.infradead.org/battery-2.6 Pull power supply and reset changes from Sebastian Reichel: "New drivers: - charger driver for Maxim 77693 - battery gauge driver for LTC 2941/2943 - battery gauge driver for RT5033 - reset driver for R-Mobile platforms Convert drivers to restart handler framework: - arm-versatile - at91 - st-poweroff Misc: - remove deprecated sun6i reboot driver - use alarmtimer instead of rtc in charger-manager - misc fixes" * tag 'for-v3.20' of git://git.infradead.org/battery-2.6: (48 commits) power_supply: 88pm860x: Fix leaked power supply on probe fail power/reset: restart-poweroff: Remove arm dependencies power/reset: st-poweroff: Fix misleading Kconfig description power/reset: st-poweroff: Register with kernel restart handler power/reset: Remove sun6i reboot driver 
power/reset: at91: Register with kernel restart handler power/reset: arm-versatile: Register with kernel restart handler power: test_power: Use enum as index for array of supplies Add devicetree binding documentation for the LTC2941/LTC2943 driver Add LTC2941/LTC2943 Battery Gauge Driver power/reset: brcmstb: Add support for old 65nm chips power/reset: brcmstb: Use the DT "compatible" string to indicate bit positions power/reset: brcmstb: Make the driver buildable on MIPS power: charger-manager: Use alarmtimer for battery monitoring in suspend. power/reset: at91-poweroff: Fix error handling and other compiler warnings bq27x00_battery: Call power_supply_changed only when capacity changed bq27x00_battery: fix register offset for bq27425 power: max14577: Remove SYSFS dependency from Kconfig power: bq24190_charger: suppress build warning power: reset: Add reset driver for R-Mobile platforms ... |
||
Chris Rorvick
|
0e806151e8 |
ALSA: line6: toneport: Use explicit type for firmware version
The firmware version is a single byte so have the variable type agree. Since the address to this member is passed to the read function, using an int is not even portable. Signed-off-by: Chris Rorvick <chris@rorvick.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
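The portability concern in the commit above (a one-byte value read through the address of an int) can be sketched as follows. This is an illustration under stated assumptions, not the line6 driver code: `fake_read_data` is a hypothetical stand-in for the driver's read function, and the byte value 0x2a is made up.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device read: copies len bytes into data.
 * Only ever called with len == 1 in this sketch. */
static void fake_read_data(void *data, size_t len)
{
    static const uint8_t device_bytes[] = { 0x2a }; /* made-up version byte */
    memcpy(data, device_bytes, len);
}

/* Destination type matches the one-byte value, so the result is the
 * same on every platform. */
uint8_t read_version_u8(void)
{
    uint8_t version = 0;
    fake_read_data(&version, 1);
    return version;
}

/* Destination is an int: the single byte lands in the lowest-addressed
 * byte of the int. That is the least significant byte only on
 * little-endian machines, so the value read back depends on byte order. */
int read_version_int(void)
{
    int version = 0;
    fake_read_data(&version, 1);
    return version;
}
```

On a little-endian machine both functions return the same number; on a big-endian machine the int variant yields the byte shifted into the most significant position, which is why making the variable type agree with the value is the safer choice.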
Chris Rorvick
|
12b00157fd |
ALSA: line6: Use explicit type for serial number
The serial number (aka ESN) is a 32-bit value. Signed-off-by: Chris Rorvick <chris@rorvick.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
Chris Rorvick
|
e474e7fd40 |
ALSA: line6: Return EIO if read/write not successful
Signed-off-by: Chris Rorvick <chris@rorvick.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
Chris Rorvick
|
f3dfd1be08 |
ALSA: line6: Return error if device not responding
Put an upper bound on how long we will wait for the device to respond to a read/write request (i.e., 100 milliseconds) and return an error if this is reached. Signed-off-by: Chris Rorvick <chris@rorvick.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
Chris Rorvick
|
e64e94df99 |
ALSA: line6: Add delay before reading status
The device indicates the result of a read/write operation by making the status available on a subsequent request from the driver. This is not ready immediately, though, so the driver is currently slamming the device with hundreds of pointless requests before getting the expected response. Add a two millisecond delay before each attempt. This is approximately the behavior observed with version 4.2.7.1 of the Windows driver. Signed-off-by: Chris Rorvick <chris@rorvick.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
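The two line6 commits above describe a polling pattern: wait about two milliseconds before each status read, and give up after roughly 100 milliseconds. A minimal user-space sketch of that pattern is shown here. The device model (`struct fake_device` and its helpers) and the -1 return value are assumptions made for illustration; the real driver talks to USB hardware and uses kernel delay primitives.

```c
#include <stdbool.h>

#define POLL_DELAY_MS   2    /* delay before each status read */
#define POLL_TIMEOUT_MS 100  /* upper bound on the total wait */

/* Hypothetical device model: the status becomes available once
 * ready_after_ms of simulated time has passed. */
struct fake_device {
    unsigned int elapsed_ms;     /* simulated time spent waiting */
    unsigned int ready_after_ms; /* when the status becomes ready */
    unsigned int status_reads;   /* number of status requests issued */
};

/* Simulated delay: advances the fake clock instead of sleeping. */
static void fake_delay_ms(struct fake_device *dev, unsigned int ms)
{
    dev->elapsed_ms += ms;
}

/* Simulated status request: counts the request and reports readiness. */
static bool fake_status_ready(struct fake_device *dev)
{
    dev->status_reads++;
    return dev->elapsed_ms >= dev->ready_after_ms;
}

/* Returns 0 once the status is ready, or -1 if the device has not
 * responded within POLL_TIMEOUT_MS. */
int wait_for_status(struct fake_device *dev)
{
    unsigned int waited;

    for (waited = 0; waited < POLL_TIMEOUT_MS; waited += POLL_DELAY_MS) {
        fake_delay_ms(dev, POLL_DELAY_MS); /* delay first, then ask */
        if (fake_status_ready(dev))
            return 0;
    }
    return -1; /* timed out */
}
```

Delaying before each request keeps the number of status reads small (at most 50 in this sketch) instead of issuing hundreds of back-to-back requests, and the timeout guarantees that the loop terminates even if the device never answers.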
Libin Yang
|
1b006996b6 |
ASoC: Intel: Clean data after SST fw fetch
The BDW audio firmware DSP manages the DMA and the DMA cannot be stopped exactly at the end of the playback stream. This means stale samples may be played at PCM stop unless the driver copies silence to the subsequent periods. Signed-off-by: Libin Yang <libin.yang@intel.com> Reviewed-by: Liam Girdwood <liam.r.girdwood@linux.intel.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> |
||
Linus Torvalds
|
c5ce28df0e |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: 1) More iov_iter conversion work from Al Viro. [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was wrong, and this pull actually adds an extra commit on top of the branch I'm pulling to fix that up, so that the pre-merge state is ok. - Linus ] 2) Various optimizations to the ipv4 forwarding information base trie lookup implementation. From Alexander Duyck. 3) Remove sock_iocb altogether, from Christoph Hellwig. 4) Allow congestion control algorithm selection via routing metrics. From Daniel Borkmann. 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet. 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet. 7) Add xmit_more support to r8169, e1000, and e1000e drivers. From Florian Westphal. 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross. 9) Add BPF packet actions to packet scheduler, from Jiri Pirko. 10) Add support for unique flow IDs to openvswitch, from Joe Stringer. 11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman Kwok. 12) More sanely handle out-of-window dupacks, which can result in serious ACK storms. From Neal Cardwell. 13) Various rhashtable bug fixes and enhancements, from Herbert Xu, Patrick McHardy, and Thomas Graf. 14) Support xmit_more in be2net, from Sathya Perla. 15) Group Policy extensions for vxlan, from Thomas Graf. 16) Remove Checksum Offload support for vxlan, from Tom Herbert. 17) Like ipv4, support lockless transmit over ipv6 UDP sockets. From Vlad Yasevich. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits) crypto: fix af_alg_make_sg() conversion to iov_iter ipv4: Namespecify TCP PMTU mechanism i40e: Fix for stats init function call in Rx setup tcp: don't include Fast Open option in SYN-ACK on pure SYN-data openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set ipv6: Make __ipv6_select_ident static ipv6: Fix fragment id assignment on LE arches. 
bridge: Fix inability to add non-vlan fdb entry net: Mellanox: Delete unnecessary checks before the function call "vunmap" cxgb4: Add support in cxgb4 to get expansion rom version via ethtool ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version net: dsa: Remove redundant phy_attach() IB/mlx4: Reset flow support for IB kernel ULPs IB/mlx4: Always use the correct port for mirrored multicast attachments net/bonding: Fix potential bad memory access during bonding events tipc: remove tipc_snprintf tipc: nl compat add noop and remove legacy nl framework tipc: convert legacy nl stats show to nl compat tipc: convert legacy nl net id get to nl compat tipc: convert legacy nl net id set to nl compat ... |
||
Linus Torvalds
|
9399f0c514 |
crypto: fix af_alg_make_sg() conversion to iov_iter
Commit
|
||
Linus Torvalds
|
29afc4e9a4 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree changes from Jiri Kosina: "Patches from trivial.git that keep the world turning around. Mostly documentation and comment fixes, and two corner-case code fixes from Alan Cox" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: kexec, Kconfig: spell "architecture" properly mm: fix cleancache debugfs directory path blackfin: mach-common: ints-priority: remove unused function doubletalk: probe failure causes OOPS ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller msdos_fs.h: fix 'fields' in comment scsi: aic7xxx: fix comment ARM: l2c: fix comment ibmraid: fix writeable attribute with no store method dynamic_debug: fix comment doc: usbmon: fix spelling s/unpriviledged/unprivileged/ x86: init_mem_mapping(): use capital BIOS in comment |
||
Linus Torvalds
|
1d9c5d79e6 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull live patching infrastructure from Jiri Kosina:

"Let me provide a bit of history first, before describing what is in this pile.

Originally, there was kSplice as a standalone project that implemented stop_machine()-based patching for the linux kernel. This project was later acquired, and the current owner is providing live patching as a proprietary service, without any intention to have their implementation merged.

Then, due to rising user/customer demand, both Red Hat and SUSE started working on their own implementation (not knowing about each other), and announced first versions roughly at the same time [1] [2].

The principal difference between the two solutions is how they are making sure that the patching is performed in a consistent way when it comes to different execution threads with respect to the semantic nature of the change that is being introduced.

In a nutshell, kPatch is issuing stop_machine(), then looking at stacks of all existing processes, and if it decides that the system is in a state that can be patched safely, it proceeds to insert code redirection machinery to the patched functions.

On the other hand, kGraft provides a per-thread consistency during one single pass of a process through the kernel and performs a lazy continuous migration of threads from "unpatched" universe to the "patched" one at safe checkpoints.

If interested in a more detailed discussion about the consistency models and their possible combinations, please see the thread that evolved around [3].

It pretty quickly became obvious to the interested parties that it's absolutely impractical in this case to have several isolated solutions for one task to co-exist in the kernel. During a dedicated Live Kernel Patching track at LPC in Dusseldorf, all the interested parties sat together and came up with a joint approach that would work for both distro vendors. Steven Rostedt took notes [4] from this meeting.

And the foundation for that approach is what's present in this pull request.

It provides a basic infrastructure for function "live patching" (i.e. code redirection), including API for kernel modules containing the actual patches, and API/ABI for userspace to be able to operate on the patches (look up what patches are applied, enable/disable them, etc).

It's relatively simple and minimalistic, as it's making use of existing kernel infrastructure (namely ftrace) as much as possible. It's also self-contained, in the sense that it doesn't hook itself in any other kernel subsystem (it doesn't even touch any other code). It's now implemented for x86 only as a reference architecture, but support for powerpc, s390 and arm is already in the works (adding arch-specific support basically boils down to teaching ftrace about regs-saving).

Once this common infrastructure gets merged, both Red Hat and SUSE have agreed to immediately start porting their current solutions on top of this, abandoning their out-of-tree code. The plan basically is that each patch will be marked by flag(s) that would indicate which consistency model it is willing to use (again, the details have been sketched out already in the thread at [3]).

Before this happens, the current codebase can be used to patch a large group of security/stability problems the patches for which are not too complex (in the sense that they don't introduce non-trivial change of function's return value semantics, they don't change layout of data structures, etc) -- this corresponds to LEAVE_FUNCTION && SWITCH_FUNCTION semantics described at [3].

This tree has been in linux-next since December.

[1] https://lkml.org/lkml/2014/4/30/477
[2] https://lkml.org/lkml/2014/7/14/857
[3] https://lkml.org/lkml/2014/11/7/354
[4] http://linuxplumbersconf.org/2014/wp-content/uploads/2014/10/LPC2014_LivePatching.txt

[ The core code is introduced by the three commits authored by Seth Jennings, which got a lot of changes incorporated during numerous respins and reviews of the initial implementation. All the followup commits have materialized only after public tree has been created, so they were not folded into initial three commits so that the public tree doesn't get rebased ]"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: add missing newline to error message
livepatch: rename config to CONFIG_LIVEPATCH
livepatch: fix uninitialized return value
livepatch: support for repatching a function
livepatch: enforce patch stacking semantics
livepatch: change ARCH_HAVE_LIVE_PATCHING to HAVE_LIVE_PATCHING
livepatch: fix deferred module patching order
livepatch: handle ancient compilers with more grace
livepatch: kconfig: use bool instead of boolean
livepatch: samples: fix usage example comments
livepatch: MAINTAINERS: add git tree location
livepatch: use FTRACE_OPS_FL_IPMODIFY
livepatch: move x86 specific ftrace handler code to arch/x86
livepatch: samples: add sample live patching module
livepatch: kernel: add support for live patching
livepatch: kernel: add TAINT_LIVEPATCH |
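The idea of function "live patching" as code redirection, together with repatching and patch stacking mentioned in the shortlog above, can be illustrated with a small userspace model. This is only a sketch and not the kernel's livepatch API: the class `PatchableFunction`, its `enable`/`disable` methods, and the sample functions are hypothetical names invented for the illustration, and the rule that only the topmost patch may be disabled is an assumed reading of "patch stacking semantics".

```python
from typing import Callable, List


class PatchableFunction:
    """Hypothetical model of a function whose calls can be redirected."""

    def __init__(self, original: Callable[[int], int]) -> None:
        self._original = original
        # Stack of enabled replacements; the last entry is the active one.
        self._stack: List[Callable[[int], int]] = []

    def __call__(self, arg: int) -> int:
        # Redirect the call to the most recently enabled replacement,
        # falling back to the original when nothing is patched.
        target = self._stack[-1] if self._stack else self._original
        return target(arg)

    def enable(self, replacement: Callable[[int], int]) -> None:
        # Repatching an already patched function pushes a new replacement.
        self._stack.append(replacement)

    def disable(self, replacement: Callable[[int], int]) -> None:
        # Assumed stacking rule: only the topmost patch can be removed.
        if not self._stack or self._stack[-1] is not replacement:
            raise RuntimeError("only the topmost patch can be disabled")
        self._stack.pop()


def original_impl(x: int) -> int:
    return x + 1


def fix_v1(x: int) -> int:
    return x + 2


def fix_v2(x: int) -> int:
    return x + 3
```

In this model, enabling `fix_v1` and then `fix_v2` routes every call to `fix_v2`; disabling `fix_v2` falls back to `fix_v1`, and disabling `fix_v1` while `fix_v2` is still enabled is rejected. The model says nothing about the consistency question (when it is safe to switch a running thread over), which is the part the commit message defers to the thread at [3].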