linux

iv/linux

Author	SHA1	Message	Date
Linus Torvalds	93874681aa	The common clock framework changes for 3.8 are comprised of lots of fixes for existing platforms as well as new ports for some ARM platforms. In addition there are new clk drivers for audio devices and MFDs. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQxtdeAAoJEDqPOy9afJhJwzUP/2/oaBAGXakQf+TTOsRo2IMh ejwgOxFsBcspR0OrJ73TAPDqbgY3xZ+BeVdvbIiYikcZLqT9dZsoN7oa9udcu6aL 1OxBT6F/CFnxUR4EVkpUdQ+vVIR8svxsAAv71zvaVGCeie0D7MDL2JgK8TvgRxHF DKxFYJ935CJC64JHJBYhW/1b4T/Tt94z/nYMijcQxkjmpEimTm/qLHpbK6OCQFUU fmvs3VmSA4p7hBmgXu3zp6NkOF3JJa7NWb+3kJh1UmqM7xh/CijxZP2YHhLkIdU1 g2qhYVKIIxmAFa7xJjXY05VrjMKvAkXGNJGVwCFQHnP17By4Pni3BDsQ+61u30Nj B/bIRrzAC17EOh6c6pAZIbNLTHHaQGe0XQMDuHGsjgmVpn2CTRmIduEVJPiq9wAk lNkwqh6Dftq72Xepy1RieqFuDOO8kHSsOPqS2e9A9yDuh5bzLsKlhKWKUahhxrML TnRBd7NfwctoEsKy42HtrXA2+iQsQDmHXNlec3ARNgWS3Hhre7qb1d0Q00y28OTA RWyPoxOn1O+wQsV2cu3I1LKVo9CmNU55evHG5zSDPIA3GsrMcPZmP/4KM9Vbs3Ye 5BIMtptUrOeZQ2PRxcTHnCbWvch5bQyvDkDiK/xR7XsiQIheE/0Ak8wGgVZ7TW4d 0zLm7UmmkmFu4xTwf2Nk =GoXf -----END PGP SIGNATURE----- Merge tag 'clk-for-linus' of git://git.linaro.org/people/mturquette/linux Pull clock framework changes from Mike Turquette: "The common clock framework changes for 3.8 are comprised of lots of fixes for existing platforms as well as new ports for some ARM platforms. In addition there are new clk drivers for audio devices and MFDs." Fix up trivial conflict in <linux/clk-provider.h> (removal of 'inline' clashing with return type fixes) * tag 'clk-for-linus' of git://git.linaro.org/people/mturquette/linux: (51 commits) MAINTAINERS: bad email address for Mike Turquette clk: introduce optional disable_unused callback clk: ux500: fix bit error clk: clock multiplexers may register out of order clk: ux500: Initial support for abx500 clock driver CLK: SPEAr: Remove unused dummy apb_pclk CLK: SPEAr: Correct index scanning done for clock synths CLK: SPEAr: Update clock rate table CLK: SPEAr: Add missing clocks CLK: SPEAr: Set CLK_SET_RATE_PARENT for few clocks CLK: SPEAr13xx: fix parent names of multiple clocks CLK: SPEAr13xx: Fix mux clock names CLK: SPEAr: Fix dev_id & con_id for multiple clocks clk: move IM-PD1 clocks to drivers/clk clk: make ICST driver handle the VCO registers clk: add GPLv2 headers to the Versatile clock files clk: mxs: Use a better name for the USB PHY clock clk: spear: Add stub functions for spear3[0\|1\|2]0_clk_init() CLK: clk-twl6040: fix return value check in twl6040_clk_probe() clk: ux500: Register nomadik keypad clock lookups for u8500 ...	2012-12-11 11:25:08 -08:00
Linus Torvalds	505cbedab9	This is the pinctrl big pull request for v3.8. As can be seen from the diffstat the major changes are: - A big conversion of the AT91 pinctrl driver and the associated ACKed platform changes under arch/arm/max-at91 and its device trees. This has been coordinated with the AT91 maintainers to go in through the pinctrl tree. - A larger chunk of changes to the SPEAr drivers and the addition of the "plgpio" driver for the SPEAr as well. - The removal of the remnants of the Nomadik driver from the arch/arm tree and fusion of that into the Nomadik driver and platform data header files. - Some local movement in the Marvell MVEBU drivers, these now have their own subdirectory. - The addition of a chunk of code to gpiolib under drivers/gpio to register gpio-to-pin range mappings from the GPIO side of things. This has been requested by Grant Likely and is now implemented, it is particularly useful for device tree work. Then we have incremental updates all over the place, many of these are cleanups and fixes from Axel Lin who has done a great job of removing minor mistakes and compilation annoyances. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) iQIcBAABAgAGBQJQupLkAAoJEEEQszewGV1z8ykP/3yLi5hb3QstajrL3jvrHcqN 7sc4uW1/9pCa6802nBw7qOfIGxgTAriGtAIePdtGhIkij6TLyyWfvlmUz3iJkeZ5 4nYy69yOdNMeBGvhXBkBD4K4lL3NGZ9eR+S1rgWY0J3Y+a5upibJeaXxmYBayjBH bN/OiK77zaKv91zKSZ4YW9WCzrjn2E0w1mDRcWdffcyrNplY8qm/G2iXBT+UoCLa UoR1zxG9nqF+nQ8mL+dVtVjlHsUcj0NEp34HQrUQ8ACOaEIiSI/zDe7afOC38Iy7 EUTV4IwKeKJyTnAN/QSzbTXF41CR/Qbihubo6sUrbAmyJXLnybVotd4Inh4ca7II c2TPV89tSnJWwDSizHwbY3sXIVw8ojmjYMr1ib0Z9GBGyoij1va5WqCJ4iIzTzuc imvDSz8ctuuxo6iOQs3smUaHXGz1V+3zvQ5v+Ioc1h9mN2LVKNa6NjmFNZmeFHLa 44zIes51DUXizaRobOffjoTIlUkAdwYQUpRtq0hvQtgYTyUIeXzfzCNzDoT6bhK3 VhLn4c4apETER6KtYCPu8PtxM/yyopwUj95WvnPK2fu/m+1B26jUVawomWfRtCQF kuovLCTTemn04jWWl3r0JovE/tVcgBrpxTYi6Z4RPY7PuD4sQ477DeM2x3DWZPQQ MHveLGA87735XKZkqQRR =rUOP -----END PGP SIGNATURE----- Merge tag 'pinctrl-for-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pinctrl changes from Linus Walleij: "These are the first and major pinctrl changes for the v3.8 merge cycle. Some of this is used as merge base for other trees so I better be early on the trigger. As can be seen from the diffstat the major changes are: - A big conversion of the AT91 pinctrl driver and the associated ACKed platform changes under arch/arm/max-at91 and its device trees. This has been coordinated with the AT91 maintainers to go in through the pinctrl tree. - A larger chunk of changes to the SPEAr drivers and the addition of the "plgpio" driver for the SPEAr as well. - The removal of the remnants of the Nomadik driver from the arch/arm tree and fusion of that into the Nomadik driver and platform data header files. - Some local movement in the Marvell MVEBU drivers, these now have their own subdirectory. - The addition of a chunk of code to gpiolib under drivers/gpio to register gpio-to-pin range mappings from the GPIO side of things. This has been requested by Grant Likely and is now implemented, it is particularly useful for device tree work. Then we have incremental updates all over the place, many of these are cleanups and fixes from Axel Lin who has done a great job of removing minor mistakes and compilation annoyances." * tag 'pinctrl-for-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: (114 commits) ARM: mmp: select PINCTRL for ARCH_MMP pinctrl: Drop selecting PINCONF for MMP2, PXA168 and PXA910 pinctrl: pinctrl-single: Fix error check condition pinctrl: SPEAr: Update error check for unsigned variables gpiolib: Fix use after free in gpiochip_add_pin_range gpiolib: rename pin range arguments pinctrl: single: support gpio request and free pinctrl: generic: add input schmitt disable parameter pinctrl/u300/coh901: stop spawning pinctrl from GPIO pinctrl/u300/coh901: let the gpio_chip register the range pinctrl: add function to retrieve range from pin gpiolib: return any error code from range creation pinctrl: make range registration defer properly gpiolib: rename find_pinctrl_* gpiolib: let gpiochip_add_pin_range() specify offset ARM: at91: pm9g45: add mmc support ARM: at91: Animeo IP: add mmc support ARM: at91: dt: add mmc pinctrl for Atmel reference boards ARM: at91: dt: at91sam9: add mmc pinctrl support ARM: at91/dts: add nodes for atmel hsmci controllers for atmel boards ...	2012-12-11 11:21:33 -08:00
Linus Torvalds	a8936db7c2	New driver: DA9055 Added/improved support for new chips in existing drivers: Z650/670, N550/570, ADS7830, AMD 16h family -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iQIcBAABAgAGBQJQxslVAAoJEMsfJm/On5mBCCIP/2zEvtwW2nEjlqlBIuIaKsW5 +8nGOZvyd9Td0ApuC/b4UgzG4BYkOuiHFyUJGwJBcEMetNJqr650TRfS9RrLsi1w vOrCsIgjioWpzCEwD8EkgHLXFNk9lXYK9jlkpl434WuPvSUKwN4QE46XtHoSfAye NUJBj9hwDv87P68lcg7MhEyZxEPzcInPTtXMGFOiPprAKHQHeDJReSnqx8oKbmBb 5ZD+zy+Byx1UyX1lqc77xjaInvGGW1OfUkljJb7iFdlXTqyNe0g2Chn07PMF5dJJ 7stNCQ/z1l+Mo4mmgvGTgHcjooeZL+HpD3XQG0rERCV0X8prKHV2ctGPEQUoCt6c OpV08OgsaLii+2x7+otL6svyD22FoFFxS5UP1EkYZPyGekf/C5I9Wh9f77xJ+hes PNki3TrnNUzVAZKYOH9JgW2MJ9oYjvKkskZoSytnEoPAs1fS4rJ0a5bQf2sryXdT xsDw57fdjqK7A9hPzc/lIae6TYK0iMnQ4wO9nWD7ftQlH6AOT5O4vYAbwoCEks/Q 6Bv3XCaBkt1GbWyCC+aXUWjvSdU0A3N/H7jaocwR9vcgoJmfwinJi10hVcgE1Kfu JVKQMvDFe4jV1j8hiQhGPSjxe+9suoaVBrWFB51W4DyDfJZQPOFd6+3om9/vGDb3 Y3opAqGVV7IsenVtPif8 =HQlU -----END PGP SIGNATURE----- Merge tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: "New driver: DA9055 Added/improved support for new chips in existing drivers: Z650/670, N550/570, ADS7830, AMD 16h family" * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (da9055) Fix chan_mux[DA9055_ADC_ADCIN3] setting hwmon: DA9055 HWMON driver hwmon: (coretemp) List TjMax for Z650/670 and N550/570 hwmon: (coretemp) Drop N4xx, N5xx, D4xx, D5xx CPUs from tjmax table hwmon: (coretemp) Use model table instead of if/else to identify CPU models hwmon: da9052: Use da9052_reg_update for rmw operations hwmon: (coretemp) Drop dependency on PCI for TjMax detection on Atom CPUs hwmon: (ina2xx) use module_i2c_driver to simplify the code hwmon: (ads7828) add support for ADS7830 hwmon: (ads7828) driver cleanup x86,AMD: Power driver support for AMD's family 16h processors	2012-12-11 11:20:34 -08:00
Linus Torvalds	11b84c5857	MMC highlights for 3.8: Core: - Expose access to the eMMC RPMB ("Replay Protected Memory Block") area by extending the existing mmc_block ioctl. - Add SDIO powered-suspend DT properties to the core MMC DT binding. - Add no-1-8-v DT flag for boards where the SD controller reports that it supports 1.8V but the board itself has no way to switch to 1.8V. - More work on switching to 1.8V UHS support using a vqmmc regulator. - Fix up a case where the slot-gpio helper may fail to reset the host controller properly if a card was removed during a transfer. - Fix several cases where a broken device could cause an infinite loop while we wait for a register to update. Drivers: - at91-mci: Remove obsolete driver, atmel-mci handles these devices now. - sdhci-dove: Allow using GPIOs for card-detect notifications. - sdhci-esdhc: Fix for recovering from ADMA errors on broken silicon. - sdhci-s3c: Add pinctrl support. - wmt-sdmmc: New driver for WonderMedia SD/MMC controllers. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.12 (GNU/Linux) iQIcBAABAgAGBQJQxpwoAAoJEHNBYZ7TNxYMir0P/in5xv6vVq+06hMb9JTA2+M0 +eSgNAoSQt3Zn3dTxGtcaMbrnVvegMD2sRmnHncSKoi6iUgWh7RhuT7o8vp2zgJe KbuOF/OF5+RSnLwgCMcOCjauhBtHO7s+NPNbUH4DMN2/xQ7fpiY+QIsxrKWtImyO E4v62zRm/6aa1IfGDgYTAlX8Y1QfEQhH55OYrF9UzU1m7TKsSGxYTXwx+LeLQZQ0 j6HO+FVlmu5jbZ4V69z3qNIdiklRGQBE7D+7cJqW6btv7x4oLmyygEoNA8M8ncxX g5BD5oEt9oxW5OW8hYcEJ/flDErgekZekH8n+fZoNVDVq8gMJxHOV73mnN+wxRnd lRGLhniVOzl4cU+ObcYTBzdUV+2yR3sRHSK8nwtb+HFATetM70gBlpoWni1+sGr5 qvFhnpzNrj0xCwoQpLerwrvCwPpyko9uxZQwO51HqhugFwWlJuTqQsnCgj+Q4MDE CRUq4R6TCH7oMGeGin0qyD7jyigDnh7g4pPNQmuIYJSLsDwvinisz98MQ0XC8szc Z1H9YXKch2woLHLPFWzNyAIlJWHXakjk5vb9uSQmkUIto/y7q+LiGVCc5uvny3rU pJmda+U/8R8YmSWS7pUi6OmPRfcnPuAIllEwo3lBypVpdP/fcZ6aribrbGEo6DLb XLFIHnl3O9/sjfUcMl+L =oSX3 -----END PGP SIGNATURE----- Merge tag 'mmc-updates-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc Pull MMC updates from Chris Ball: "MMC highlights for 3.8: Core: - Expose access to the eMMC RPMB ("Replay Protected Memory Block") area by extending the existing mmc_block ioctl. - Add SDIO powered-suspend DT properties to the core MMC DT binding. - Add no-1-8-v DT flag for boards where the SD controller reports that it supports 1.8V but the board itself has no way to switch to 1.8V. - More work on switching to 1.8V UHS support using a vqmmc regulator. - Fix up a case where the slot-gpio helper may fail to reset the host controller properly if a card was removed during a transfer. - Fix several cases where a broken device could cause an infinite loop while we wait for a register to update. Drivers: - at91-mci: Remove obsolete driver, atmel-mci handles these devices now. - sdhci-dove: Allow using GPIOs for card-detect notifications. - sdhci-esdhc: Fix for recovering from ADMA errors on broken silicon. - sdhci-s3c: Add pinctrl support. - wmt-sdmmc: New driver for WonderMedia SD/MMC controllers." * tag 'mmc-updates-for-3.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (65 commits) mmc: sdhci: implement the .card_event() method mmc: extend the slot-gpio card-detection to use host's .card_event() method mmc: add a card-event host operation mmc: sdhci-s3c: Fix compilation warning mmc: sdhci-pci: Enable SDHCI_CAN_DO_HISPD for Ricoh SDHCI controller mmc: sdhci-dove: allow GPIOs to be used for card detection on Dove mmc: sdhci-dove: use two-stage initialization for sdhci-pltfm mmc: sdhci-dove: use devm_clk_get() mmc: eSDHC: Recover from ADMA errors mmc: dw_mmc: remove duplicated buswidth code mmc: dw_mmc: relocate where dw_mci_setup_bus() is called from mmc: Limit MMC speed to 52MHz if not HS200 mmc: dw_mmc: use devres functions in dw_mmc mmc: sh_mmcif: remove unneeded clock connection ID mmc: sh_mobile_sdhi: remove unneeded clock connection ID mmc: sh_mobile_sdhi: fix clock frequency printing mmc: Remove redundant null check before kfree in bus.c mmc: Remove redundant null check before kfree in sdio_bus.c mmc: sdhci-imx-esdhc: use more devm_* functions mmc: dt: add no-1-8-v device tree flag ...	2012-12-11 11:19:09 -08:00
Eric Dumazet	f8e8f97c11	net: fix a race in gro_cell_poll() Dmitry Kravkov reported packet drops for GRE packets since GRO support was added. There is a race in gro_cell_poll() because we call napi_complete() without any synchronization with a concurrent gro_cells_receive() Once bug was triggered, we queued packets but did not schedule NAPI poll. We can fix this issue using the spinlock protected the napi_skbs queue, as we have to hold it to perform skb dequeue anyway. As we open-code skb_dequeue(), we no longer need to mask IRQS, as both producer and consumer run under BH context. Bug added in commit c9e6bc644e (net: add gro_cells infrastructure) Reported-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2012-12-11 12:49:53 -05:00
Guennadi Liakhovetski	93c667ca25	of: node argument to of_parse_phandle_with_args should be const The "struct device_node " argument of of_parse_phandle_with_args() can be const. Making this change makes it explicit that the function will not modify a node. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> [grant.likely: Resolved conflict with previous patch modifying of_parse_phandle()] Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2012-12-11 17:30:16 +00:00
Guennadi Liakhovetski	1daa8a0b20	of/i2c: add dummy inline functions for when CONFIG_OF_I2C(_MODULE) isn't defined If CONFIG_OF_I2C and CONFIG_OF_I2C_MODULE are undefined no declaration of of_find_i2c_device_by_node and of_find_i2c_adapter_by_node will be available. Add dummy inline functions to avoid compilation breakage. Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>	2012-12-11 17:30:16 +00:00
Ingo Molnar	4fc3f1d66b	mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable rmap_walk_anon() and try_to_unmap_anon() appears to be too careful about locking the anon vma: while it needs protection against anon vma list modifications, it does not need exclusive access to the list itself. Transforming this exclusive lock to a read-locked rwsem removes a global lock from the hot path of page-migration intense threaded workloads which can cause pathological performance like this: 96.43% process 0 [kernel.kallsyms] [k] perf_trace_sched_switch \| --- perf_trace_sched_switch __schedule schedule schedule_preempt_disabled __mutex_lock_common.isra.6 __mutex_lock_slowpath mutex_lock \| \|--50.61%-- rmap_walk \| move_to_new_page \| migrate_pages \| migrate_misplaced_page \| __do_numa_page.isra.69 \| handle_pte_fault \| handle_mm_fault \| __do_page_fault \| do_page_fault \| page_fault \| __memset_sse2 \| \| \| --100.00%-- worker_thread \| \| \| --100.00%-- start_thread \| --49.39%-- page_lock_anon_vma try_to_unmap_anon try_to_unmap migrate_pages migrate_misplaced_page __do_numa_page.isra.69 handle_pte_fault handle_mm_fault __do_page_fault do_page_fault page_fault __memset_sse2 \| --100.00%-- worker_thread start_thread With this change applied the profile is now nicely flat and there's no anon-vma related scheduling/blocking. Rename anon_vma_[un]lock() => anon_vma_[un]lock_write(), to make it clearer that it's an exclusive write-lock in that case - suggested by Rik van Riel. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:43:00 +00:00
Ingo Molnar	5a505085f0	mm/rmap: Convert the struct anon_vma::mutex to an rwsem Convert the struct anon_vma::mutex to an rwsem, which will help in solving a page-migration scalability problem. (Addressed in a separate patch.) The conversion is simple and straightforward: in every case where we mutex_lock()ed we'll now down_write(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Turner <pjt@google.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:43:00 +00:00
Mel Gorman	220018d388	mm: numa: Add THP migration for the NUMA working set scanning fault case build fix Commit "Add THP migration for the NUMA working set scanning fault case" breaks the build because HPAGE_PMD_SHIFT and HPAGE_PMD_MASK defined to explode without CONFIG_TRANSPARENT_HUGEPAGE: mm/migrate.c: In function 'migrate_misplaced_transhuge_page_put': mm/migrate.c:1549: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed mm/migrate.c:1564: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed mm/migrate.c:1566: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed mm/migrate.c:1573: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed mm/migrate.c:1606: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed mm/migrate.c:1648: error: call to '__build_bug_failed' declared with attribute error: BUILD_BUG failed CONFIG_NUMA_BALANCING allows compilation without enabling transparent hugepages, so define the dummy function for such a configuration and only define migrate_misplaced_transhuge_page_put() when transparent hugepages are enabled. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:58 +00:00
Mel Gorman	b32967ff10	mm: numa: Add THP migration for the NUMA working set scanning fault case. Note: This is very heavily based on a patch from Peter Zijlstra with fixes from Ingo Molnar, Hugh Dickins and Johannes Weiner. That patch put a lot of migration logic into mm/huge_memory.c where it does not belong. This version puts tries to share some of the migration logic with migrate_misplaced_page. However, it should be noted that now migrate.c is doing more with the pagetable manipulation than is preferred. The end result is barely recognisable so as before, the signed-offs had to be removed but will be re-added if the original authors are ok with it. Add THP migration for the NUMA working set scanning fault case. It uses the page lock to serialize. No migration pte dance is necessary because the pte is already unmapped when we decide to migrate. [dhillf@gmail.com: Fix memory leak on isolation failure] [dhillf@gmail.com: Fix transfer of last_nid information] Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:57 +00:00
Mel Gorman	5bca230353	mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node Due to the fact that migrations are driven by the CPU a task is running on there is no point tracking NUMA faults until one task runs on a new node. This patch tracks the first node used by an address space. Until it changes, PTE scanning is disabled and no NUMA hinting faults are trapped. This should help workloads that are short-lived, do not care about NUMA placement or have bound themselves to a single node. This takes advantage of the logic in "mm: sched: numa: Implement slow start for working set sampling" to delay when the checks are made. This will take advantage of processes that set their CPU and node bindings early in their lifetime. It will also potentially allow any initial load balancing to take place. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:56 +00:00
Mel Gorman	1a687c2e9a	mm: sched: numa: Control enabling and disabling of NUMA balancing This patch adds Kconfig options and kernel parameters to allow the enabling and disabling of automatic NUMA balancing. The existance of such a switch was and is very important when debugging problems related to transparent hugepages and we should have the same for automatic NUMA placement. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:55 +00:00
Mel Gorman	b8593bfda1	mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate The PTE scanning rate and fault rates are two of the biggest sources of system CPU overhead with automatic NUMA placement. Ideally a proper policy would detect if a workload was properly placed, schedule and adjust the PTE scanning rate accordingly. We do not track the necessary information to do that but we at least know if we migrated or not. This patch scans slower if a page was not migrated as the result of a NUMA hinting fault up to sysctl_numa_balancing_scan_period_max which is now higher than the previous default. Once every minute it will reset the scanner in case of phase changes. This is hilariously crude and the numbers are arbitrary. Workloads will converge quite slowly in comparison to what a proper policy should be able to do. On the plus side, we will chew up less CPU for workloads that have no need for automatic balancing. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:55 +00:00
Mel Gorman	57e0a03091	mm: numa: Introduce last_nid to the page frame This patch introduces a last_nid field to the page struct. This is used to build a two-stage filter in the next patch that is aimed at mitigating a problem whereby pages migrate to the wrong node when referenced by a process that was running off its home node. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:52 +00:00
Mel Gorman	e14808b49f	mm: numa: Rate limit setting of pte_numa if node is saturated If there are a large number of NUMA hinting faults and all of them are resulting in migrations it may indicate that memory is just bouncing uselessly around. NUMA balancing cost is likely exceeding any benefit from locality. Rate limit the PTE updates if the node is migration rate-limited. As noted in the comments, this distorts the NUMA faulting statistics. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:51 +00:00
Andrea Arcangeli	8177a420ed	mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting This defines the per-node data used by Migrate On Fault in order to rate limit the migration. The rate limiting is applied independently to each destination node. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:50 +00:00
Mel Gorman	5606e3877a	mm: numa: Migrate on reference policy This is the simplest possible policy that still does something of note. When a pte_numa is faulted, it is moved immediately. Any replacement policy must at least do better than this and in all likelihood this policy regresses normal workloads. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:48 +00:00
Mel Gorman	03c5a6e163	mm: numa: Add pte updates, hinting and migration stats It is tricky to quantify the basic cost of automatic NUMA placement in a meaningful manner. This patch adds some vmstats that can be used as part of a basic costing model. u = basic unit = sizeof(void ) Ca = cost of struct page access = sizeof(struct page) / u Cpte = Cost PTE access = Ca Cupdate = Cost PTE update = (2 Cpte) + (2 * Wlock) where Cpte is incurred twice for a read and a write and Wlock is a constant representing the cost of taking or releasing a lock Cnumahint = Cost of a minor page fault = some high constant e.g. 1000 Cpagerw = Cost to read or write a full page = Ca + PAGE_SIZE/u Ci = Cost of page isolation = Ca + Wi where Wi is a constant that should reflect the approximate cost of the locking operation Cpagecopy = Cpagerw + (Cpagerw * Wnuma) + Ci + (Ci * Wnuma) where Wnuma is the approximate NUMA factor. 1 is local. 1.2 would imply that remote accesses are 20% more expensive Balancing cost = Cpte * numa_pte_updates + Cnumahint * numa_hint_faults + Ci * numa_pages_migrated + Cpagecopy * numa_pages_migrated Note that numa_pages_migrated is used as a measure of how many pages were isolated even though it would miss pages that failed to migrate. A vmstat counter could have been added for it but the isolation cost is pretty marginal in comparison to the overall cost so it seemed overkill. The ideal way to measure automatic placement benefit would be to count the number of remote accesses versus local accesses and do something like benefit = (remote_accesses_before - remove_access_after) * Wnuma but the information is not readily available. As a workload converges, the expection would be that the number of remote numa hints would reduce to 0. convergence = numa_hint_faults_local / numa_hint_faults where this is measured for the last N number of numa hints recorded. When the workload is fully converged the value is 1. This can measure if the placement policy is converging and how fast it is doing it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:48 +00:00
Peter Zijlstra	4b96a29ba8	mm: sched: numa: Implement slow start for working set sampling Add a 1 second delay before starting to scan the working set of a task and starting to balance it amongst nodes. [ note that before the constant per task WSS sampling rate patch the initial scan would happen much later still, in effect that patch caused this regression. ] The theory is that short-run tasks benefit very little from NUMA placement: they come and go, and they better stick to the node they were started on. As tasks mature and rebalance to other CPUs and nodes, so does their NUMA placement have to change and so does it start to matter more and more. In practice this change fixes an observable kbuild regression: # [ a perf stat --null --repeat 10 test of ten bzImage builds to /dev/shm ] !NUMA: 45.291088843 seconds time elapsed ( +- 0.40% ) 45.154231752 seconds time elapsed ( +- 0.36% ) +NUMA, no slow start: 46.172308123 seconds time elapsed ( +- 0.30% ) 46.343168745 seconds time elapsed ( +- 0.25% ) +NUMA, 1 sec slow start: 45.224189155 seconds time elapsed ( +- 0.25% ) 45.160866532 seconds time elapsed ( +- 0.17% ) and it also fixes an observable perf bench (hackbench) regression: # perf stat --null --repeat 10 perf bench sched messaging -NUMA: -NUMA: 0.246225691 seconds time elapsed ( +- 1.31% ) +NUMA no slow start: 0.252620063 seconds time elapsed ( +- 1.13% ) +NUMA 1sec delay: 0.248076230 seconds time elapsed ( +- 1.35% ) The implementation is simple and straightforward, most of the patch deals with adding the /proc/sys/kernel/numa_balancing_scan_delay_ms tunable knob. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> [ Wrote the changelog, ran measurements, tuned the default. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:47 +00:00
Peter Zijlstra	6e5fb223e8	mm: sched: numa: Implement constant, per task Working Set Sampling (WSS) rate Previously, to probe the working set of a task, we'd use a very simple and crude method: mark all of its address space PROT_NONE. That method has various (obvious) disadvantages: - it samples the working set at dissimilar rates, giving some tasks a sampling quality advantage over others. - creates performance problems for tasks with very large working sets - over-samples processes with large address spaces but which only very rarely execute Improve that method by keeping a rotating offset into the address space that marks the current position of the scan, and advance it by a constant rate (in a CPU cycles execution proportional manner). If the offset reaches the last mapped address of the mm then it then it starts over at the first address. The per-task nature of the working set sampling functionality in this tree allows such constant rate, per task, execution-weight proportional sampling of the working set, with an adaptive sampling interval/frequency that goes from once per 100ms up to just once per 8 seconds. The current sampling volume is 256 MB per interval. As tasks mature and converge their working set, so does the sampling rate slow down to just a trickle, 256 MB per 8 seconds of CPU time executed. This, beyond being adaptive, also rate-limits rarely executing systems and does not over-sample on overloaded systems. [ In AutoNUMA speak, this patch deals with the effective sampling rate of the 'hinting page fault'. AutoNUMA's scanning is currently rate-limited, but it is also fundamentally single-threaded, executing in the knuma_scand kernel thread, so the limit in AutoNUMA is global and does not scale up with the number of CPUs, nor does it scan tasks in an execution proportional manner. So the idea of rate-limiting the scanning was first implemented in the AutoNUMA tree via a global rate limit. This patch goes beyond that by implementing an execution rate proportional working set sampling rate that is not implemented via a single global scanning daemon. ] [ Dan Carpenter pointed out a possible NULL pointer dereference in the first version of this patch. ] Based-on-idea-by: Andrea Arcangeli <aarcange@redhat.com> Bug-Found-By: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> [ Wrote changelog and fixed bug. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:46 +00:00
Peter Zijlstra	cbee9f88ec	mm: numa: Add fault driven placement and migration NOTE: This patch is based on "sched, numa, mm: Add fault driven placement and migration policy" but as it throws away all the policy to just leave a basic foundation I had to drop the signed-offs-by. This patch creates a bare-bones method for setting PTEs pte_numa in the context of the scheduler that when faulted later will be faulted onto the node the CPU is running on. In itself this does nothing useful but any placement policy will fundamentally depend on receiving hints on placement from fault context and doing something intelligent about it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:45 +00:00
Mel Gorman	a720094ded	mm: mempolicy: Hide MPOL_NOOP and MPOL_MF_LAZY from userspace for now The use of MPOL_NOOP and MPOL_MF_LAZY to allow an application to explicitly request lazy migration is a good idea but the actual API has not been well reviewed and once released we have to support it. For now this patch prevents an application using the services. This will need to be revisited. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:44 +00:00
Mel Gorman	4b10e7d562	mm: mempolicy: Implement change_prot_numa() in terms of change_protection() This patch converts change_prot_numa() to use change_protection(). As pte_numa and friends check the PTE bits directly it is necessary for change_protection() to use pmd_mknuma(). Hence the required modifications to change_protection() are a little clumsy but the end result is that most of the numa page table helpers are just one or two instructions. Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:44 +00:00
Lee Schermerhorn	b24f53a0be	mm: mempolicy: Add MPOL_MF_LAZY NOTE: Once again there is a lot of patch stealing and the end result is sufficiently different that I had to drop the signed-offs. Will re-add if the original authors are ok with that. This patch adds another mbind() flag to request "lazy migration". The flag, MPOL_MF_LAZY, modifies MPOL_MF_MOVE* such that the selected pages are marked PROT_NONE. The pages will be migrated in the fault path on "first touch", if the policy dictates at that time. "Lazy Migration" will allow testing of migrate-on-fault via mbind(). Also allows applications to specify that only subsequently touched pages be migrated to obey new policy, instead of all pages in range. This can be useful for multi-threaded applications working on a large shared data area that is initialized by an initial thread resulting in all pages on one [or a few, if overflowed] nodes. After PROT_NONE, the pages in regions assigned to the worker threads will be automatically migrated local to the threads on 1st touch. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:43 +00:00
Mel Gorman	4daae3b4b9	mm: mempolicy: Use _PAGE_NUMA to migrate pages Note: Based on "mm/mpol: Use special PROT_NONE to migrate pages" but sufficiently different that the signed-off-bys were dropped Combine our previous _PAGE_NUMA, mpol_misplaced and migrate_misplaced_page() pieces into an effective migrate on fault scheme. Note that (on x86) we rely on PROT_NONE pages being !present and avoid the TLB flush from try_to_unmap(TTU_MIGRATION). This greatly improves the page-migration performance. Based-on-work-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:42 +00:00
Peter Zijlstra	7039e1dbec	mm: migrate: Introduce migrate_misplaced_page() Note: This was originally based on Peter's patch "mm/migrate: Introduce migrate_misplaced_page()" but borrows extremely heavily from Andrea's "autonuma: memory follows CPU algorithm and task/mm_autonuma stats collection". The end result is barely recognisable so signed-offs had to be dropped. If original authors are ok with it, I'll re-add the signed-off-bys. Add migrate_misplaced_page() which deals with migrating pages from faults. Based-on-work-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Based-on-work-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Based-on-work-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:41 +00:00
Lee Schermerhorn	771fb4d806	mm: mempolicy: Check for misplaced page This patch provides a new function to test whether a page resides on a node that is appropriate for the mempolicy for the vma and address where the page is supposed to be mapped. This involves looking up the node where the page belongs. So, the function returns that node so that it may be used to allocated the page without consulting the policy again. A subsequent patch will call this function from the fault path. Because of this, I don't want to go ahead and allocate the page, e.g., via alloc_page_vma() only to have to free it if it has the correct policy. So, I just mimic the alloc_page_vma() node computation logic--sort of. Note: we could use this function to implement a MPOL_MF_STRICT behavior when migrating pages to match mbind() mempolicy--e.g., to ensure that pages in an interleaved range are reinterleaved rather than left where they are when they reside on any page in the interleave nodemask. Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> [ Added MPOL_F_LAZY to trigger migrate-on-fault; simplified code now that we don't have to bother with special crap for interleaved ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:41 +00:00
Lee Schermerhorn	d3a710337b	mm: mempolicy: Add MPOL_NOOP This patch augments the MPOL_MF_LAZY feature by adding a "NOOP" policy to mbind(). When the NOOP policy is used with the 'MOVE and 'LAZY flags, mbind() will map the pages PROT_NONE so that they will be migrated on the next touch. This allows an application to prepare for a new phase of operation where different regions of shared storage will be assigned to worker threads, w/o changing policy. Note that we could just use "default" policy in this case. However, this also allows an application to request that pages be migrated, only if necessary, to follow any arbitrary policy that might currently apply to a range of pages, without knowing the policy, or without specifying multiple mbind()s for ranges with different policies. [ Bug in early version of mpol_parse_str() reported by Fengguang Wu. ] Bug-Reported-by: Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:40 +00:00
Peter Zijlstra	479e2802d0	mm: mempolicy: Make MPOL_LOCAL a real policy Make MPOL_LOCAL a real and exposed policy such that applications that relied on the previous default behaviour can explicitly request it. Requested-by: Christoph Lameter <cl@linux.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:39 +00:00
Mel Gorman	d10e63f294	mm: numa: Create basic numa page hinting infrastructure Note: This patch started as "mm/mpol: Create special PROT_NONE infrastructure" and preserves the basic idea but steals very heavily from "autonuma: numa hinting page faults entry points" for the actual fault handlers without the migration parts. The end result is barely recognisable as either patch so all Signed-off and Reviewed-bys are dropped. If Peter, Ingo and Andrea are ok with this version, I will re-add the signed-offs-by to reflect the history. In order to facilitate a lazy -- fault driven -- migration of pages, create a special transient PAGE_NUMA variant, we can then use the 'spurious' protection faults to drive our migrations from. The meaning of PAGE_NUMA depends on the architecture but on x86 it is effectively PROT_NONE. Actual PROT_NONE mappings will not generate these NUMA faults for the reason that the page fault code checks the permission on the VMA (and will throw a segmentation fault on actual PROT_NONE mappings), before it ever calls handle_mm_fault. [dhillf@gmail.com: Fix typo] Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:39 +00:00
Andrea Arcangeli	0b9d705297	mm: numa: Support NUMA hinting page faults from gup/gup_fast Introduce FOLL_NUMA to tell follow_page to check pte/pmd_numa. get_user_pages must use FOLL_NUMA, and it's safe to do so because it always invokes handle_mm_fault and retries the follow_page later. KVM secondary MMU page faults will trigger the NUMA hinting page faults through gup_fast -> get_user_pages -> follow_page -> handle_mm_fault. Other follow_page callers like KSM should not use FOLL_NUMA, or they would fail to get the pages if they use follow_page instead of get_user_pages. [ This patch was picked up from the AutoNUMA tree. ] Originally-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> [ ported to this tree. ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:42:37 +00:00
Andrea Arcangeli	be3a728427	mm: numa: pte_numa() and pmd_numa() Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>	2012-12-11 14:42:36 +00:00
Mel Gorman	397487db69	mm: compaction: Add scanned and isolated counters for compaction Compaction already has tracepoints to count scanned and isolated pages but it requires that ftrace be enabled and if that information has to be written to disk then it can be disruptive. This patch adds vmstat counters for compaction called compact_migrate_scanned, compact_free_scanned and compact_isolated. With these counters, it is possible to define a basic cost model for compaction. This approximates of how much work compaction is doing and can be compared that with an oprofile showing TLB misses and see if the cost of compaction is being offset by THP for example. Minimally a compaction patch can be evaluated in terms of whether it increases or decreases cost. The basic cost model looks like this Fundamental unit u: a word sizeof(void ) Ca = cost of struct page access = sizeof(struct page) / u Cmc = Cost migrate page copy = (Ca + PAGE_SIZE/u) 2 Cmf = Cost migrate failure = Ca * 2 Ci = Cost page isolation = (Ca + Wi) where Wi is a constant that should reflect the approximate cost of the locking operation. Csm = Cost migrate scanning = Ca Csf = Cost free scanning = Ca Overall cost = (Csm * compact_migrate_scanned) + (Csf * compact_free_scanned) + (Ci * compact_isolated) + (Cmc * pgmigrate_success) + (Cmf * pgmigrate_failed) Where the values are read from /proc/vmstat. This is very basic and ignores certain costs such as the allocation cost to do a migrate page copy but any improvement to the model would still use the same vmstat counters. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:28:35 +00:00
Mel Gorman	7b2a2d4a18	mm: migrate: Add a tracepoint for migrate_pages The pgmigrate_success and pgmigrate_fail vmstat counters tells the user about migration activity but not the type or the reason. This patch adds a tracepoint to identify the type of page migration and why the page is being migrated. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:28:35 +00:00
Mel Gorman	5647bc293a	mm: compaction: Move migration fail/success stats to migrate.c The compact_pages_moved and compact_pagemigrate_failed events are convenient for determining if compaction is active and to what degree migration is succeeding but it's at the wrong level. Other users of migration may also want to know if migration is working properly and this will be particularly true for any automated NUMA migration. This patch moves the counters down to migration with the new events called pgmigrate_success and pgmigrate_fail. The compact_blocks_moved counter is removed because while it was useful for debugging initially, it's worthless now as no meaningful conclusions can be drawn from its value. Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>	2012-12-11 14:28:35 +00:00
Peter Zijlstra	7da4d641c5	mm: Count the number of pages affected in change_protection() This will be used for three kinds of purposes: - to optimize mprotect() - to speed up working set scanning for working set areas that have not been touched - to more accurately scan per real working set No change in functionality from this patch. Suggested-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-11 14:28:34 +00:00
Rik van Riel	2c3cf556b2	x86/mm: Introduce pte_accessible() We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that the pte is associated with a page. However, for TLB flushing purposes, we would like to know whether the pte points to an actually accessible page. This allows us to skip remote TLB flushes for pages that are not actually accessible. Fill in this method for x86 and provide a safe (but slower) method on other architectures. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org [ Added Linus's review fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>	2012-12-11 14:28:34 +00:00
Mauro Carvalho Chehab	77c53d0b56	Merge branch 'for_3.8-rc1' into v4l_for_linus * for_3.8-rc1: (243 commits) [media] omap3isp: Replace cpu_is_omap3630() with ISP revision check [media] omap3isp: Prepare/unprepare clocks before/after enable/disable [media] omap3isp: preview: Add support for 8-bit formats at the sink pad [media] omap3isp: Replace printk with dev_* [media] omap3isp: Find source pad from external entity [media] omap3isp: Configure CSI-2 phy based on platform data [media] omap3isp: Add PHY routing configuration [media] omap3isp: Add CSI configuration registers from control block to ISP resources [media] omap3isp: Remove unneeded module memory address definitions [media] omap3isp: Use monotonic timestamps for statistics buffers [media] uvcvideo: Fix control value clamping for unsigned integer controls [media] uvcvideo: Mark first output terminal as default video node [media] uvcvideo: Add VIDIOC_[GS]_PRIORITY support [media] uvcvideo: Return -ENOTTY for unsupported ioctls [media] uvcvideo: Set device_caps in VIDIOC_QUERYCAP [media] uvcvideo: Don't fail when an unsupported format is requested [media] uvcvideo: Return -EACCES when trying to access a read/write-only control [media] uvcvideo: Set error_idx properly for extended controls API failures [media] rtl28xxu: add NOXON DAB/DAB+ USB dongle rev 2 [media] fc2580: write some registers conditionally ...	2012-12-11 11:28:37 -02:00
Vitaly Andrianov	4009793e15	drivers: cma: represent physical addresses as phys_addr_t This commit changes the CMA early initialization code to use phys_addr_t for representing physical addresses instead of unsigned long. Without this change, among other things, dma_declare_contiguous() simply discards any memory regions whose address is not representable as unsigned long. This is a problem on 32-bit PAE machines where unsigned long is 32-bit but physical address space is larger. Signed-off-by: Vitaly Andrianov <vitalya@ti.com> Signed-off-by: Cyril Chemparathy <cyril@ti.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>	2012-12-11 09:28:09 +01:00
Mike Turquette	7c045a55c9	clk: introduce optional disable_unused callback Some gate clocks have special needs which must be handled during the disable-unused clocks sequence. These needs might be driven by software due to the fact that we're disabling a clock outside of the normal clk_disable path and a clk's enable_count will not be accurate. On the other hand a specific hardware programming sequence might need to be followed for this corner case. This change is needed for the upcoming OMAP port to the common clock framework. Specifically, it is undesirable to treat the disable-unused path identically to the normal clk_disable path since other software layers are involved. In this case OMAP's clockdomain code throws WARNs and bails early due to the clock's enable_count being set to zero. A custom callback mitigates this problem nicely. Cc: Paul Walmsley <paul@pwsan.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Mike Turquette <mturquette@linaro.org>	2012-12-10 22:35:02 -08:00
Mark Brown	8e24a6e696	Merge remote-tracking branch 'regmap/topic/table' into regmap-next	2012-12-11 12:39:32 +09:00
Mark Brown	db760fbecd	Merge remote-tracking branch 'regmap/topic/lock' into regmap-next	2012-12-11 12:39:30 +09:00
Mark Brown	4d348e6e0a	Merge remote-tracking branch 'regmap/topic/domain' into regmap-next	2012-12-11 12:39:29 +09:00
Mark Brown	bcf86687d6	Merge remote-tracking branch 'regmap/topic/debugfs' into regmap-next	2012-12-11 12:39:20 +09:00
Russell King	0b99cb7310	Merge branches 'cache-l2x0', 'fixes', 'hdrs', 'misc', 'mmci', 'vic' and 'warnings' into for-next	2012-12-11 00:20:18 +00:00
Bjorn Helgaas	1cb73f8c47	Merge branch 'pci/mjg-pci-roms-from-efi' into next * pci/mjg-pci-roms-from-efi: PCI: Use phys_addr_t for physical ROM address	2012-12-10 16:20:12 -07:00
Gabor Juhos	ab5c4f71d8	ath9k: allow to load EEPROM content via firmware API The calibration data for devices w/o a separate EEPROM chip can be specified via the 'eeprom_data' field of 'ath9k_platform_data'. The 'eeprom_data' is usually filled from board specific setup functions. It is easy if the EEPROM data is mapped to the memory, but it can be complicated if it is stored elsewhere. The patch adds support for loading of the EEPROM data via the firmware API to avoid this limitation. Signed-off-by: Gabor Juhos <juhosg@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2012-12-10 15:49:57 -05:00
Linus Torvalds	caf491916b	Revert "revert "Revert "mm: remove __GFP_NO_KSWAPD""" and associated damage This reverts commits a50915394f1fc02c2861d3b7ce7014788aa5066e and d7c3b937bdf45f0b844400b7bf6fd3ed50bac604. This is a revert of a revert of a revert. In addition, it reverts the even older i915 change to stop using the __GFP_NO_KSWAPD flag due to the original commits in linux-next. It turns out that the original patch really was bogus, and that the original revert was the correct thing to do after all. We thought we had fixed the problem, and then reverted the revert, but the problem really is fundamental: waking up kswapd simply isn't the right thing to do, and direct reclaim sometimes simply _is_ the right thing to do. When certain allocations fail, we simply should try some direct reclaim, and if that fails, fail the allocation. That's the right thing to do for THP allocations, which can easily fail, and the GPU allocations want to do that too. So starting kswapd is sometimes simply wrong, and removing the flag that said "don't start kswapd" was a mistake. Let's hope we never revisit this mistake again - and certainly not this many times ;) Acked-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2012-12-10 11:03:05 -08:00
Bjorn Helgaas	dbd3fc3345	PCI: Use phys_addr_t for physical ROM address Use phys_addr_t rather than "void *" for physical memory address. This removes casts and fixes a "cast from pointer to integer of different size" warning on ppc44x_defconfig. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>	2012-12-10 11:24:42 -07:00

... 2 3 4 5 6 ...

55271 Commits