100267 Commits

Author SHA1 Message Date
Max Filippov
d1b6ba82a5 xtensa: fix a6 and a7 handling in fast_syscall_xtensa
Remove restoring a6 on some return paths and instead modify and restore
it in a single place, using symbolic name.
Correctly restore a7 from PT_AREG7 in case of illegal a6 value.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:31 +04:00
Max Filippov
a83b02e9bd xtensa: allow single-stepping through unaligned load/store
Update icount when icountlevel is non-zero but not greater than EXCM level
when load/store instruction is successfully emulated. This allows
single-stepping over such instruction in userspace debugger.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:30 +04:00
Max Filippov
21570465a3 xtensa: move invalid unaligned instruction handler closer to its users
With this change a threaded jump from .Linvalid_instruction_load to
.Linvalid_instruction can be removed and more code may be added to
common load/store exit path.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:29 +04:00
Max Filippov
e9500dd852 xtensa: make fast_unaligned store restartable
fast_unaligned may encounter DTLB miss or SEGFAULT during the store
emulation. Don't update epc1 and lcount until after the store emulation
is complete, so that the faulting store instruction could be replayed.
Remove duplicate code handling zero overhead loops and calculate new
epc1 and lcount in one place.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:28 +04:00
Max Filippov
c3ef1f4d37 xtensa: add double exception fixup handler for fast_unaligned
fast_unaligned_fixup restores user registers and runs normal exception
handler in the current stack frame. Unaligned load/store is retried
after that.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:27 +04:00
Max Filippov
a450dc69dc xtensa: fix kernel/user jump out of fast_unaligned
Use correct register (a0, just read from the PS) to check user mode bit.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:26 +04:00
Max Filippov
b82837c772 xtensa: configure kc705 for highmem
Enable all memory available on KC705 (1G - 128M) by default. Update memory
node in DTS and also limit usable memory in bootargs in case memmap is
passed from the bootloader.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:25 +04:00
Max Filippov
270eec76de xtensa: support highmem in aliasing cache flushing code
Use __flush_invalidate_dcache_page_alias with alias set to color of the
page physical address instead of __flush_invalidate_dcache_page: this
works for high memory pages and mapping/unmapping to the TLBTEMP area is
virtually free.

Allow building configurations with aliasing cache and highmem enabled.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:23 +04:00
Max Filippov
8504b503df xtensa: support aliasing cache in kmap
Define ARCH_PKMAP_COLORING and provide corresponding macro definitions
on cores with aliasing data cache.

Instead of single last_pkmap_nr maintain an array last_pkmap_nr_arr of
pkmap counters for each page color. Make sure that kmap maps physical
page at virtual address with color matching its physical address.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:22 +04:00
Max Filippov
32544d9c10 xtensa: support aliasing cache in k[un]map_atomic
Map high memory pages at virtual addresses with color that match color
of their physical address. Existing cache alias management mechanisms
may be used with such pages.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:21 +04:00
Max Filippov
a91902db29 xtensa: implement clear_user_highpage and copy_user_highpage
Existing clear_user_page and copy_user_page cannot be used with highmem
because they calculate physical page address from its virtual address
and do it incorrectly in case of high memory page mapped with
kmap_atomic. Also kmap is not needed, as most likely userspace mapping
color would be different from the kmapped color.

Provide clear_user_highpage and copy_user_highpage functions that
determine if temporary mapping is needed for the pages. Move most of the
logic of the former clear_user_page and copy_user_page to
xtensa/mm/cache.c only leaving temporary mapping setup, invalidation and
clearing/copying in the xtensa/mm/misc.S. Rename these functions to
clear_page_alias and copy_page_alias.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:20 +04:00
Max Filippov
7128039fe2 xtensa: fix TLBTEMP_BASE_2 region handling in fast_second_level_miss
Current definition of TLBTEMP_BASE_2 is always 32K above the
TLBTEMP_BASE_1, whereas fast_second_level_miss handler for the TLBTEMP
region analyzes virtual address bit (PAGE_SHIFT + DCACHE_ALIAS_ORDER)
to determine TLBTEMP region where the fault happened. The size of the
TLBTEMP region is also checked incorrectly: not 64K, but twice data
cache way size (whicht may as well be less than the instruction cache
way size).

Fix TLBTEMP_BASE_2 to be TLBTEMP_BASE_1 + data cache way size.
Provide TLBTEMP_SIZE that is a greater of doubled data cache way size or
the instruction cache way size, and use it to determine if the second
level TLB miss occured in the TLBTEMP region.

Practical occurence of page faults in the TLBTEMP area is extremely
rare, this code can be tested by deletion of all w[di]tlb instructions
in the tlbtemp_mapping region.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:19 +04:00
Max Filippov
dec7305d9f xtensa: allow fixmap and kmap span more than one page table
To support aliasing cache both kmap region sizes are multiplied by the
number of data cache colors. After that expansion page tables that cover
kmap regions may become larger than one page. Correctly allocate and
initialize page tables in this case.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:18 +04:00
Max Filippov
22def76811 xtensa: make fixmap region addressing grow with index
It's much easier to reason about alignment and coloring of regions
located in the fixmap when fixmap index is just a PFN within the fixmap
region. Change fixmap addressing so that index 0 corresponds to
FIXADDR_START instead of the FIXADDR_TOP.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:17 +04:00
Max Filippov
5224712374 xtensa: fix access to THREAD_RA/THREAD_SP/THREAD_DS
With SMP and a lot of debug options enabled task_struct::thread gets out
of reach of s32i/l32i instructions with base pointing at task_struct,
breaking build with the following messages:

  arch/xtensa/kernel/entry.S: Assembler messages:
  arch/xtensa/kernel/entry.S:1002: Error: operand 3 of 'l32i.n' has invalid value '1048'
  arch/xtensa/kernel/entry.S:1831: Error: operand 3 of 's32i.n' has invalid value '1040'
  arch/xtensa/kernel/entry.S:1832: Error: operand 3 of 's32i.n' has invalid value '1044'

Change base to point to task_struct::thread in such cases.
Don't use a10 in _switch_to to save/restore prev pointer as a2 is not
clobbered.

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:16 +04:00
Miklos Szeredi
89f77c6f5b xtensa: add renameat2 syscall
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:15 +04:00
Alan Douglas
1ca49463c4 xtensa: fix address checks in dma_{alloc,free}_coherent
Virtual address is translated to the XCHAL_KSEG_CACHED region in the
dma_free_coherent, but is checked to be in the 0...XCHAL_KSEG_SIZE
range.

Change check for end of the range from 'addr >= X' to 'addr > X - 1' to
handle the case of X == 0.

Replace 'if (C) BUG();' construct with 'BUG_ON(C);'.

Cc: stable@vger.kernel.org
Signed-off-by: Alan Douglas <adouglas@cadence.com>
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:14 +04:00
Max Filippov
f61bf8e7d1 xtensa: replace IOCTL code definitions with constants
This fixes userspace code that builds on other architectures but fails
on xtensa due to references to structures that other architectures don't
refer to. E.g. this fixes the following issue with python-2.7.8:

  python-2.7.8/Modules/termios.c:861:25: error: invalid application
     of 'sizeof' to incomplete type 'struct serial_multiport_struct'
     {"TIOCSERGETMULTI", TIOCSERGETMULTI},
  python-2.7.8/Modules/termios.c:870:25: error: invalid application
     of 'sizeof' to incomplete type 'struct serial_multiport_struct'
     {"TIOCSERSETMULTI", TIOCSERSETMULTI},
  python-2.7.8/Modules/termios.c:900:24: error: invalid application
     of 'sizeof' to incomplete type 'struct tty_struct'
     {"TIOCTTYGSTRUCT", TIOCTTYGSTRUCT},

Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:13 +04:00
Max Filippov
ad4a96b418 xtensa: remove orphan MATH_EMULATION symbol
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:12 +04:00
Max Filippov
4964527da8 xtensa: select HAVE_IDE only on platforms that may have it
HAVE_IDE is not a property of architecture but of a platform, and neither
ISS or XTFPGA support it.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:11 +04:00
Max Filippov
920f8a3965 xtensa: sort 'select' statements in Kconfig
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:10 +04:00
Max Filippov
8a9de05954 xtensa: make MMU-related configuration options depend on MMU
MMUv3 and HIGHMEM support are available only on configurations with MMU,
don't show them otherwise.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:09 +04:00
Max Filippov
420ae95184 xtensa: simplify addition of new core variants
Instead of adding new Kconfig options and Makefile rules for each new
core variant provide XTENSA_VARIANT_CUSTOM variant and record variant
name in the XTENSA_VARIANT_NAME variable. Adding new core variant now
means providing directory structure under arch/xtensa/variant and
specifying correct name in kernel configuration.

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
2014-08-14 11:59:08 +04:00
David S. Miller
10cf15e1d1 sparc: Hook up memfd_create system call.
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 22:00:09 -07:00
David S. Miller
f1d25d37d3 sparc64: Properly claim resources as each PCI bus is probed.
Perform a pci_claim_resource() on all valid resources discovered
during the OF device tree scan.

Based almost entirely upon the PCI OF bus probing code which does
the same thing there.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 21:17:49 -07:00
David S. Miller
4afba24e5f sparc64: Skip bogus PCI bridge ranges.
It seems that when a PCI Express bridge is not in use and has no devices
behind it, the ranges property is bogus.  Specifically the size property
is of the form [0xffffffff:...], and if you add this size to the resource
start address the 64-bit calculation will overflow.

Just check specifically for this size value signature and skip them.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 21:17:49 -07:00
David S. Miller
93a6423bd8 sparc64: Expand PCI bridge probing debug logging.
Dump the various aspects of the PCI bridge probed at boot time, most
importantly the bridge number ranges, and the ranges property.

This helps diagnose PCI resource issues and other problems by giving
ofpci_debug=1 on the boot command line.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-08-13 21:17:49 -07:00
Linus Torvalds
f0094b28f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Several networking final fixes and tidies for the merge window:

   1) Changes during the merge window unintentionally took away the
      ability to build bluetooth modular, fix from Geert Uytterhoeven.

   2) Several phy_node reference count bug fixes from Uwe Kleine-König.

   3) Fix ucc_geth build failures, also from Uwe Kleine-König.

   4) Fix klog false positivies when netlink messages go to network
      taps, by properly resetting the network header.  Fix from Daniel
      Borkmann.

   5) Sizing estimate of VF netlink messages is too small, from Jiri
      Benc.

   6) New APM X-Gene SoC ethernet driver, from Iyappan Subramanian.

   7) VLAN untagging is erroneously dependent upon whether the VLAN
      module is loaded or not, but there are generic dependencies that
      matter wrt what can be expected as the SKB enters the stack.
      Make the basic untagging generic code, and do it unconditionally.
      From Vlad Yasevich.

   8) xen-netfront only has so many slots in it's transmit queue so
      linearize packets that have too many frags.  From Zoltan Kiss.

   9) Fix suspend/resume PHY handling in bcmgenet driver, from Florian
      Fainelli"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (55 commits)
  net: bcmgenet: correctly resume adapter from Wake-on-LAN
  net: bcmgenet: update UMAC_CMD only when link is detected
  net: bcmgenet: correctly suspend and resume PHY device
  net: bcmgenet: request and enable main clock earlier
  net: ethernet: myricom: myri10ge: myri10ge.c: Cleaning up missing null-terminate after strncpy call
  xen-netfront: Fix handling packets on compound pages with skb_linearize
  net: fec: Support phys probed from devicetree and fixed-link
  smsc: replace WARN_ON() with WARN_ON_SMP()
  xen-netback: Don't deschedule NAPI when carrier off
  net: ethernet: qlogic: qlcnic: Remove duplicate object file from Makefile
  wan: wanxl: Remove typedefs from struct names
  m68k/atari: EtherNEC - ethernet support (ne)
  net: ethernet: ti: cpmac.c: Cleaning up missing null-terminate after strncpy call
  hdlc: Remove typedefs from struct names
  airo_cs: Remove typedef local_info_t
  atmel: Remove typedef atmel_priv_ioctl
  com20020_cs: Remove typedef com20020_dev_t
  ethernet: amd: Remove typedef local_info_t
  net: Always untag vlan-tagged traffic on input.
  drivers: net: Add APM X-Gene SoC ethernet driver support.
  ...
2014-08-13 18:27:40 -06:00
Linus Torvalds
13b102bf48 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull Sparc fixes from David Miller:
 "Sparc bug fixes, one of which was preventing successful SMP boots with
  mainline"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix pcr_ops initialization and usage bugs.
  sparc64: Do not disable interrupts in nmi_cpu_busy()
  sparc: Hook up seccomp and getrandom system calls.
  sparc: fix decimal printf format specifiers prefixed with 0x
2014-08-13 18:26:50 -06:00
Linus Torvalds
81c02a21b2 Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/apic updates from Thomas Gleixner:
 "This is a major overhaul to the x86 apic subsystem consisting of the
  following parts:

   - Remove obsolete APIC driver abstractions (David Rientjes)

   - Use the irqdomain facilities to dynamically allocate IRQs for
     IOAPICs.  This is a prerequisite to enable IOAPIC hotplug support,
     and it also frees up wasted vectors (Jiang Liu)

   - Misc fixlets.

  Despite the hickup in Ingos previous pull request - caused by the
  missing fixup for the suspend/resume issue reported by Borislav - I
  strongly recommend that this update finds its way into 3.17.  Some
  history for you:

  This is preparatory work for physical IOAPIC hotplug.  The first
  attempt to support this was done by Yinghai and I shot it down because
  it just added another layer of obscurity and complexity to the already
  existing mess without tackling the underlying shortcomings of the
  current implementation.

  After quite some on- and offlist discussions, I requested that the
  design of this functionality must use generic infrastructure, i.e.
  irq domains, which provide all the mechanisms to dynamically map linux
  interrupt numbers to physical interrupts.

  Jiang picked up the idea and did a great job of consolidating the
  existing interfaces to manage the x86 (IOAPIC) interrupt system by
  utilizing irq domains.

  The testing in tip, Linux-next and inside of Intel on various machines
  did not unearth any oddities until Borislav exposed it to one of his
  oddball machines.  The issue was resolved quickly, but unfortunately
  the fix fell through the cracks and did not hit the tip tree before
  Ingo sent the pull request.  Not entirely Ingos fault, I also assumed
  that the fix was already merged when Ingo asked me whether he could
  send it.

  Nevertheless this work has a proper design, has undergone several
  rounds of review and the final fallout after applying it to tip and
  integrating it into Linux-next has been more than moderate.  It's the
  ground work not only for IOAPIC hotplug, it will also allow us to move
  the lowlevel vector allocation into the irqdomain hierarchy, which
  will benefit other architectures as well.  Patches are posted already,
  but they are on hold for two weeks, see below.

  I really appreciate the competence and responsiveness Jiang has shown
  in course of this endavour.  So I'm sure that any fallout of this will
  be addressed in a timely manner.

  FYI, I'm vanishing for 2 weeks into my annual kids summer camp kitchen
  duty^Wvacation, while you folks are drooling at KS/LinuxCon :) But HPA
  will have a look at the hopefully zero fallout until I'm back"

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits)
  x86, irq, PCI: Keep IRQ assignment for PCI devices during suspend/hibernation
  x86/apic/vsmp: Make is_vsmp_box() static
  x86, apic: Remove enable_apic_mode callback
  x86, apic: Remove setup_portio_remap callback
  x86, apic: Remove multi_timer_check callback
  x86, apic: Replace noop_check_apicid_used
  x86, apic: Remove check_apicid_present callback
  x86, apic: Remove mps_oem_check callback
  x86, apic: Remove smp_callin_clear_local_apic callback
  x86, apic: Replace trampoline physical addresses with defaults
  x86, apic: Remove x86_32_numa_cpu_node callback
  x86: intel-mid: Use the new io_apic interfaces
  x86, vsmp: Remove is_vsmp_box() from apic_is_clustered_box()
  x86, irq: Clean up irqdomain transition code
  x86, irq, devicetree: Release IOAPIC pin when PCI device is disabled
  x86, irq, SFI: Release IOAPIC pin when PCI device is disabled
  x86, irq, mpparse: Release IOAPIC pin when PCI device is disabled
  x86, irq, ACPI: Release IOAPIC pin when PCI device is disabled
  x86, irq: Introduce helper functions to release IOAPIC pin
  x86, irq: Simplify the way to handle ISA IRQ
  ...
2014-08-13 18:23:32 -06:00
Linus Torvalds
d27c0d9018 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/efix fixes from Peter Anvin:
 "Two EFI-related Kconfig changes, which happen to touch immediately
  adjacent lines in Kconfig and thus collapse to a single patch"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/efi: Enforce CONFIG_RELOCATABLE for EFI boot stub
  x86/efi: Fix 3DNow optimization build failure in EFI stub
2014-08-13 18:21:35 -06:00
Linus Torvalds
7453f33b2e Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/xsave changes from Peter Anvin:
 "This is a patchset to support the XSAVES instruction required to
  support context switch of supervisor-only features in upcoming
  silicon.

  This patchset missed the 3.16 merge window, which is why it is based
  on 3.15-rc7"

* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, xsave: Add forgotten inline annotation
  x86/xsaves: Clean up code in xstate offsets computation in xsave area
  x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi)
  Define kernel API to get address of each state in xsave area
  x86/xsaves: Enable xsaves/xrstors
  x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf
  x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time
  x86/xsaves: Add xsaves and xrstors support for booting time
  x86/xsaves: Clear reserved bits in xsave header
  x86/xsaves: Use xsave/xrstor for saving and restoring user space context
  x86/xsaves: Use xsaves/xrstors for context switch
  x86/xsaves: Use xsaves/xrstors to save and restore xsave area
  x86/xsaves: Define a macro for handling xsave/xrstor instruction fault
  x86/xsaves: Define macros for xsave instructions
  x86/xsaves: Change compacted format xsave area header
  x86/alternative: Add alternative_input_2 to support alternative with two features and input
  x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
2014-08-13 18:20:04 -06:00
Linus Torvalds
fd1cf90580 Metag architecture changes for v3.17
Just a couple of minor static analysis fixes, removal of a NULL check
 that should never happen, and fix an error check where an unsigned value
 was being checked to see if it was negative.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJT6JHIAAoJEGwLaZPeOHZ6dfoP/0NrVlO+ldKALXEaPXvLyiWN
 MAhWt8xMHoTsgGlnbnr4Gi6mDDuSnH9CvCoelE/Htn4UOnNbH1bZ/XKJ9gyfnoIv
 NM0xlInWd9dN1PObgcLONBG5KF9vSb8yMhzCZ+GJYZY7mWmjp7641csLL+ksC5uD
 +qkl6+/RsEAXNHC9Hnzib0zpUf/q+u1ZWa5LmvoUTtCuOfdCEeAN+oCuM7+/d9vd
 iyrzXTzdLOn8OA6biNWHH9OVzBEvmAvkJbhccBN1VOUi85OWnmCMrBkB11T9hUSp
 +O2lNfAJYZfqf51OMh8OmdBhcN+GxJf8YIgJqlfzbEzKDRCqhUvvYIDI0tXmTKIC
 CSkifBSWfyi8/+Nrijs+8Z/q7EJJ3VHX2AFwQlZM1eH4cZR4MM0AmSAGsu+vwGsA
 QwkHRcKKqGJUXGusw5+ezPYy7Esn23KkWIBjucZm2gRc1zb+KqB8+TnypJMIpl5U
 Kf8yWahptEH9fTFjqu557E29Oe0lb9fKRGPovdLfcQ9jMNHaZHoix19+kMO+dol2
 LgKhvt1OEC/NCZf/FjFwOvkdRWmkbOLyn2NJinmCGpAmT88LUI/vcpGHpjWgdxhd
 svDd/hFxIUnv10E9wPf27UMrS5zVKJ1loPfQ8N6m89/q3Mx1Q+zErgeq6TWfWBAm
 UJG8U2dv4JeIWUHmcSQ+
 =3kq/
 -----END PGP SIGNATURE-----

Merge tag 'metag-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag

Pull metag architecture updates from James Hogan:
 "Just a couple of minor static analysis fixes, removal of a NULL check
  that should never happen, and fix an error check where an unsigned
  value was being checked to see if it was negative"

* tag 'metag-for-v3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag: cachepart: Fix failure check
  metag: hugetlbpage: Remove null pointer checks that could never happen
2014-08-13 18:18:09 -06:00
Aneesh Kumar K.V
9e813308a5 powerpc/thp: Add tracepoints to track hugepage invalidate
Add tracepoint to track hugepage invalidate. This help us
in debugging difficult to track bugs.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:42 +10:00
Aneesh Kumar K.V
85c1fafd72 powerpc/mm: Use read barrier when creating real_pte
On ppc64 we support 4K hash pte with 64K page size. That requires
us to track the hash pte slot information on a per 4k basis. We do that
by storing the slot details in the second half of pte page. The pte bit
_PAGE_COMBO is used to indicate whether the second half need to be
looked while building real_pte. We need to use read memory barrier while
doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
check. On the store side we already do a lwsync in __hash_page_4K

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:41 +10:00
Aneesh Kumar K.V
7e467245bf powerpc/thp: Use ACCESS_ONCE when loading pmdp
We would get wrong results in compiler recomputed old_pmd. Avoid
that by using ACCESS_ONCE

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:41 +10:00
Aneesh Kumar K.V
969b7b208f powerpc/thp: Invalidate with vpn in loop
As per ISA, for 4k base page size we compare 14..65 bits of VA specified
with the entry_VA in tlb. That implies we need to make sure we do a
tlbie with all the possible 4k va we used to access the 16MB hugepage.
With 64k base page size we compare 14..57 bits of VA. Hence we cannot
ignore the lower 24 bits of va while tlbie .We also cannot tlb
invalidate a 16MB entry with just one tlbie instruction because
we don't track which va was used to instantiate the tlb entry.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:40 +10:00
Aneesh Kumar K.V
fc04795575 powerpc/thp: Handle combo pages in invalidate
If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault for
these pages.

We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.

Use _PAGE_COMBO to determine the page size with which we should
invalidate the hash table entries on unmap.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:39 +10:00
Aneesh Kumar K.V
629149fae4 powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte
If we changed base page size of the segment, either via sub_page_protect
or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash
table entries. We do a lazy hash page table flush for all mapped pages
in the demoted segment. This happens when we handle hash page fault
for these pages.

We use _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a
pte is backed by 4K hash pte. If we find _PAGE_COMBO not set on the pte,
that implies that we could possibly have older 64K hash pte entries in
the hash page table and we need to invalidate those entries.

Handle this correctly for 16M pages

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:39 +10:00
Aneesh Kumar K.V
fa1f8ae80f powerpc/thp: Don't recompute vsid and ssize in loop on invalidate
The segment identifier and segment size will remain the same in
the loop, So we can compute it outside. We also change the
hugepage_invalidate interface so that we can use it the later patch

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:38 +10:00
Aneesh Kumar K.V
b0aa44a3df powerpc/thp: Add write barrier after updating the valid bit
With hugepages, we store the hpte valid information in the pte page
whose address is stored in the second half of the PMD. Use a
write barrier to make sure clearing pmd busy bit and updating
hpte valid info are ordered properly.

CC: <stable@vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 18:20:37 +10:00
Nishanth Aravamudan
2fabf084b6 powerpc: reorder per-cpu NUMA information's initialization
There is an issue currently where NUMA information is used on powerpc
(and possibly ia64) before it has been read from the device-tree, which
leads to large slab consumption with CONFIG_SLUB and memoryless nodes.

NUMA powerpc non-boot CPU's cpu_to_node/cpu_to_mem is only accurate
after start_secondary(), similar to ia64, which is invoked via
smp_init().

Commit 6ee0578b4daae ("workqueue: mark init_workqueues() as
early_initcall()") made init_workqueues() be invoked via
do_pre_smp_initcalls(), which is obviously before the secondary
processors are online.

Additionally, the following commits changed init_workqueues() to use
cpu_to_node to determine the node to use for kthread_create_on_node:

bce903809ab3f ("workqueue: add wq_numa_tbl_len and
wq_numa_possible_cpumask[]")
f3f90ad469342 ("workqueue: determine NUMA node of workers accourding to
the allowed cpumask")

Therefore, when init_workqueues() runs, it sees all CPUs as being on
Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to
a high number of slab deactivations
(http://www.spinics.net/lists/linux-mm/msg67489.html).

Fix this by initializing the powerpc-specific CPU<->node/local memory
node mapping as early as possible, which on powerpc is
do_init_bootmem(). Currently that function initializes the mapping for
the boot CPU, but we extend it to setup the mapping for all possible
CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the
per-cpu values for all possible CPUs. That ensures that before the
early_initcalls run (and really as early as possible), the per-cpu NUMA
mapping is accurate.

While testing memoryless nodes on PowerKVM guests with a fix to the
workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a
guest topology of:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 1 size: 16336 MB
node 1 free: 15329 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

the slab consumption decreases from

Slab:             932416 kB
SUnreclaim:       902336 kB

to

Slab:             395264 kB
SUnreclaim:       359424 kB

And we a corresponding increase in the slab efficiency from

slab                                   mem     objs    slabs
                                      used   active   active
------------------------------------------------------------
kmalloc-16384                       337 MB   11.28%  100.00%
task_struct                         288 MB    9.93%  100.00%

to

slab                                   mem     objs    slabs
                                      used   active   active
------------------------------------------------------------
kmalloc-16384                        37 MB  100.00%  100.00%
task_struct                          31 MB  100.00%  100.00%

Powerpc didn't support memoryless nodes until recently (64bb80d87f01
"powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261194d
"powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also
helped improve memory consumption with these kind of environments.

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:14:05 +10:00
Himangi Saraogi
d658972284 powerpc/perf/hv-24x7: Use kmem_cache_free
Free memory allocated using kmem_cache_zalloc using kmem_cache_free
rather than kfree.

The Coccinelle semantic patch that makes this change is as follows:

// <smpl>
@@
expression x,E,c;
@@

 x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
 ... when != x = E
     when != &x
?-kfree(x)
+kmem_cache_free(c,x)
// </smpl>

Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:14:04 +10:00
Thomas Falcon
587870e865 powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info
A buffer returned by H_VTERM_PARTNER_INFO contains device information
in big endian format, causing problems for little endian architectures.
This patch ensures that they are in cpu endian.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:14:04 +10:00
Anton Blanchard
a71d64b4dc powerpc: Hard disable interrupts in xmon
xmon only soft disables interrupts. This seems like a bad idea - we
certainly don't want decrementer and PMU exceptions going off when
we are debugging something inside xmon.

This issue was uncovered when the hard lockup detector went off
inside xmon. To ensure we wont get a spurious hard lockup warning,
I also call touch_nmi_watchdog() when exiting xmon.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:48 +10:00
Nishanth Aravamudan
56758e3c3c powerpc: remove duplicate definition of TEXASR_FS
It appears that commits 7f06f21d40a6 ("powerpc/tm: Add checking to
treclaim/trechkpt") and e4e38121507a ("KVM: PPC: Book3S HV: Add
transactional memory support") both added definitions of TEXASR_FS.
Remove one of them. At the same time, fix the alignment of the remaining
definition (should be tab-separated like the rest of the #defines).

Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:47 +10:00
Gavin Shan
5efbabe09d powerpc/pseries: Avoid deadlock on removing ddw
Function remove_ddw() could be called in of_reconfig_notifier and
we potentially remove the dynamic DMA window property, which invokes
of_reconfig_notifier again. Eventually, it leads to the deadlock as
following backtrace shows.

The patch fixes the above issue by deferring releasing the dynamic
DMA window property while releasing the device node.

=============================================
[ INFO: possible recursive locking detected ]
3.16.0+ #428 Tainted: G        W
---------------------------------------------
drmgr/2273 is trying to acquire lock:
 ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \
 .__blocking_notifier_call_chain+0x40/0x78

but task is already holding lock:
 ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \
 .__blocking_notifier_call_chain+0x40/0x78

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock((of_reconfig_chain).rwsem);
  lock((of_reconfig_chain).rwsem);
 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by drmgr/2273:
 #0:  (sb_writers#4){.+.+.+}, at: [<c0000000001cbe70>] \
      .vfs_write+0xb0/0x1f8
 #1:  ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] \
      .__blocking_notifier_call_chain+0x40/0x78

stack backtrace:
CPU: 17 PID: 2273 Comm: drmgr Tainted: G        W     3.16.0+ #428
Call Trace:
[c0000000137e7000] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c0000000137e70b0] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c0000000137e7130] [c0000000000b8afc] .__lock_acquire+0x128c/0x1c68
[c0000000137e7280] [c0000000000b9a4c] .lock_acquire+0xe8/0x104
[c0000000137e7350] [c00000000083588c] .down_read+0x4c/0x90
[c0000000137e73e0] [c000000000091890] .__blocking_notifier_call_chain+0x40/0x78
[c0000000137e7490] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7520] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e75b0] [c000000000682a9c] .of_property_notify+0x4c/0x54
[c0000000137e7650] [c000000000682bf0] .of_remove_property+0x30/0xd4
[c0000000137e76f0] [c000000000052a44] .remove_ddw+0x144/0x168
[c0000000137e7790] [c000000000053204] .iommu_reconfig_notifier+0x30/0xe0
[c0000000137e7820] [c00000000009137c] .notifier_call_chain+0x6c/0xb4
[c0000000137e78c0] [c0000000000918ac] .__blocking_notifier_call_chain+0x5c/0x78
[c0000000137e7970] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7a00] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e7a90] [c000000000682e14] .of_detach_node+0x44/0x1fc
[c0000000137e7b40] [c0000000000518e4] .ofdt_write+0x3ac/0x688
[c0000000137e7c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c0000000137e7cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c0000000137e7d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c0000000137e7e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:47 +10:00
Gavin Shan
f1b3929c23 powerpc/pseries: Failure on removing device node
While running command "drmgr -c phb -r -s 'PHB 528'", following
backtrace jumped out because the target device node isn't marked
with OF_DETACHED by of_detach_node(), which caused by error
returned from memory hotplug related reconfig notifier when
disabling CONFIG_MEMORY_HOTREMOVE. The patch fixes it.

ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W     3.16.0+ #427
Call Trace:
[c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
[c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
[c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
[c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
[c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
[c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
[c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
[c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:46 +10:00
Benjamin Herrenschmidt
ce8f150a17 powerpc/boot: Use correct zlib types for comparison
Avoids this warning:

arch/powerpc/boot/gunzip_util.c:118:9: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:45 +10:00
Vasant Hegde
b09c2ec408 powerpc/powernv: Interface to register/unregister opal dump region
PowerNV platform is capable of capturing host memory region when system
crashes (because of host/firmware). We have new OPAL API to register/
unregister memory region to be captured when system crashes.

This patch adds support for new API. Also during boot time we register
kernel log buffer and unregister before doing kexec.

Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2014-08-13 15:13:45 +10:00