5621 Commits

Author SHA1 Message Date
Masahiro Yamada
89d790ab31 scripts/spelling.txt: add "algined" pattern and fix typo instances
Fix typos and add the following to the scripts/spelling.txt:

  algined||aligned

While we are here, fix the "appplication" in the touched line in
drivers/block/loop.c.  Also, fix the "may not naturally ..." to
"may not be naturally ..." in the touched line in mm/page_alloc.

Link: http://lkml.kernel.org/r/1481573103-11329-9-git-send-email-yamada.masahiro@socionext.com
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-27 18:43:46 -08:00
Christoph Hellwig
ad71473d9c virtio_blk: use virtio IRQ affinity
Use automatic IRQ affinity assignment in the virtio layer if available,
and build the blk-mq queues based on it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:05 +02:00
Christoph Hellwig
fb5e31d970 virtio: allow drivers to request IRQ affinity when creating VQs
Add a struct irq_affinity pointer to the find_vqs methods, which if set
is used to tell the PCI layer to create the MSI-X vectors for our I/O
virtqueues with the proper affinity from the start.  Compared to after
the fact affinity hints this gives us an instantly working setup and
allows to allocate the irq descritors node-local and avoid interconnect
traffic.  Last but not least this will allow blk-mq queues are created
based on the interrupt affinity for storage drivers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2017-02-27 20:54:04 +02:00
Linus Torvalds
7b46588f36 Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:

 - almost all of the rest of MM

 - misc bits

 - KASAN updates

 - procfs

 - lib/ updates

 - checkpatch updates

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (124 commits)
  checkpatch: remove false unbalanced braces warning
  checkpatch: notice unbalanced else braces in a patch
  checkpatch: add another old address for the FSF
  checkpatch: update $logFunctions
  checkpatch: warn on logging continuations
  checkpatch: warn on embedded function names
  lib/lz4: remove back-compat wrappers
  fs/pstore: fs/squashfs: change usage of LZ4 to work with new LZ4 version
  crypto: change LZ4 modules to work with new LZ4 module version
  lib/decompress_unlz4: change module to work with new LZ4 module version
  lib: update LZ4 compressor module
  lib/test_sort.c: make it explicitly non-modular
  lib: add CONFIG_TEST_SORT to enable self-test of sort()
  rbtree: use designated initializers
  linux/kernel.h: fix DIV_ROUND_CLOSEST to support negative divisors
  lib/find_bit.c: micro-optimise find_next_*_bit
  lib: add module support to atomic64 tests
  lib: add module support to glob tests
  lib: add module support to crc32 tests
  kernel/ksysfs.c: add __ro_after_init to bin_attribute structure
  ...
2017-02-25 10:29:09 -08:00
zhouxianrong
8e19d540d1 zram: extend zero pages to same element pages
The idea is that without doing more calculations we extend zero pages to
same element pages for zram.  zero page is special case of same element
page with zero element.

1. the test is done under android 7.0
2. startup too many applications circularly
3. sample the zero pages, same pages (none-zero element)
   and total pages in function page_zero_filled

the result is listed as below:

ZERO	SAME	TOTAL
36214	17842	598196

		ZERO/TOTAL	 SAME/TOTAL	  (ZERO+SAME)/TOTAL ZERO/SAME
AVERAGE	0.060631909	 0.024990816  0.085622726		2.663825038
STDEV	0.00674612	 0.005887625  0.009707034		2.115881328
MAX		0.069698422	 0.030046087  0.094975336		7.56043956
MIN		0.03959586	 0.007332205  0.056055193		1.928985507

from the above data, the benefit is about 2.5% and up to 3% of total
swapout pages.

The defect of the patch is that when we recovery a page from non-zero
element the operations are low efficient for partial read.

This patch extends zero_page to same_page so if there is any user to
have monitored zero_pages, he will be surprised if the number is
increased but it's not harmful, I believe.

[minchan@kernel.org: do not free same element pages in zram_meta_free]
  Link: http://lkml.kernel.org/r/20170207065741.GA2567@bbox
Link: http://lkml.kernel.org/r/1483692145-75357-1-git-send-email-zhouxianrong@huawei.com
Link: http://lkml.kernel.org/r/1486307804-27903-1-git-send-email-minchan@kernel.org
Signed-off-by: zhouxianrong <zhouxianrong@huawei.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:56 -08:00
Minchan Kim
a09759acaa zram: remove waitqueue for IO done
zram_reset_device() waits for ongoing writepage pages to be completed by
zram->refcount logic.  However, it's pointless because before the reset,
we prevent further opening of zram by zram->claim and flush all of
pending IO by fsync_bdev so there should be no pending IO at the
zram_reset_device().

So let's remove that code which is even broken due to the lack of
wake_up elsewhere.

Link: http://lkml.kernel.org/r/1485145031-11661-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-24 17:46:54 -08:00
Linus Torvalds
f8e6859ea9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc updates from David Miller:

 1) Support multiple huge page sizes, from Nitin Gupta.

 2) Improve boot time on large memory configurations, from Pavel
    Tatashin.

 3) Make BRK handling more consistent and documented, from Vijay Kumar.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc64: Fix build error in flush_tsb_user_page
  sparc64: memblock resizes are not handled properly
  sparc64: use latency groups to improve add_node_ranges speed
  sparc64: Add 64K page size support
  sparc64: Multi-page size support
  Documentation/sparc: Steps for sending break on sunhv console
  sparc64: Send break twice from console to return to boot prom
  sparc64: Migrate hvcons irq to panicked cpu
  sparc64: Set cpu state to offline when stopped
  sunvdc: Add support for setting physical sector size
  sparc64: fix for user probes in high memory
  sparc: topology_64.h: Fix condition for including cpudata.h
  sparc32: mm: srmmu: add __ro_after_init to sparc32_cachetlb_ops structures
2017-02-24 15:01:17 -08:00
Linus Torvalds
1802979ab1 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block updates and fixes from Jens Axboe:

 - NVMe updates and fixes that missed the first pull request. This
   includes bug fixes, and support for autonomous power management.

 - Fix from Christoph for missing clear of the request payload, causing
   a problem with (at least) the storvsc driver.

 - Further fixes for the queue/bdi life time issues from Jan.

 - The Kconfig mq scheduler update from me.

 - Fixing a use-after-free in dm-rq, spotted by Bart, introduced in this
   merge window.

 - Three fixes for nbd from Josef.

 - Bug fix from Omar, fixing a bug in sas transport code that oopses
   when bsg ioctls were used. From Omar.

 - Improvements to the queue restart and tag wait from from Omar.

 - Set of fixes for the sed/opal code from Scott.

 - Three trivial patches to cciss from Tobin

* 'for-linus' of git://git.kernel.dk/linux-block: (41 commits)
  dm-rq: don't dereference request payload after ending request
  blk-mq-sched: separate mark hctx and queue restart operations
  blk-mq: use sbq wait queues instead of restart for driver tags
  block/sed-opal: Propagate original error message to userland.
  nvme/pci: re-check security protocol support after reset
  block/sed-opal: Introduce free_opal_dev to free the structure and clean up state
  nvme: detect NVMe controller in recent MacBooks
  nvme-rdma: add support for host_traddr
  nvmet-rdma: Fix error handling
  nvmet-rdma: use nvme cm status helper
  nvme-rdma: move nvme cm status helper to .h file
  nvme-fc: don't bother to validate ioccsz and iorcsz
  nvme/pci: No special case for queue busy on IO
  nvme/core: Fix race kicking freed request_queue
  nvme/pci: Disable on removal when disconnected
  nvme: Enable autonomous power state transitions
  nvme: Add a quirk mechanism that uses identify_ctrl
  nvme: make nvmf_register_transport require a create_ctrl callback
  nvme: Use CNS as 8-bit field and avoid endianness conversion
  nvme: add semicolon in nvme_command setting
  ...
2017-02-24 14:13:34 -08:00
Ilya Dryomov
54ea0046b6 libceph, rbd, ceph: WRITE | ONDISK -> WRITE
CEPH_OSD_FLAG_ONDISK is set in account_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2017-02-24 19:04:57 +01:00
Don Brace
ebe73647fb scsi: cciss: correct check map error.
Remove device driver failed to check map error messages

Reported-by: Johnny Bieren <jbieren@redhat.com>
Tested-by: Johnny Bieren <jbieren@redhat.com>
Reviewed-by: Scott Teel <scott.teel@microsemi.com>
Signed-off-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2017-02-23 17:10:32 -05:00
Liam R. Howlett
f41e54616c sunvdc: Add support for setting physical sector size
Physical sector size is supported in v1.2 of the vDisk protocol and
should be set if available.  If protocol version 1.2 is used and the
physical disk size is unavailable, then the disk is considered busy.

Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-02-23 08:24:08 -08:00
Sergey Senozhatsky
c87d1655c2 zram: remove obsolete sysfs attrs
We had a deprecated_attr_warn() warning for 2 years and now the time has
come and we finally can do the cleanup.

The plan was as follows:

: per-stat sysfs attributes are considered to be deprecated.
: The basic strategy is:
: -- the existing RW nodes will be downgraded to WO nodes (in linux 4.11)
: -- deprecated RO sysfs nodes will eventually be removed (in linux 4.11)
:
: The list of deprecated attributes can be found here:
: Documentation/ABI/obsolete/sysfs-block-zram
:
: Basically, every attribute that has its own read accessible sysfs
: node (e.g. num_reads) *AND* is accessible via one of the stat files
: (zram<id>/stat or zram<id>/io_stat or zram<id>/mm_stat) is considered
: to be deprecated.

The patch also removes `obsolete/sysfs-block-zram', clean ups
`testing/sysfs-block-zram' and tweaks zram.txt files.

Link: http://lkml.kernel.org/r/20170118035838.11090-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-02-22 16:41:30 -08:00
Tobin C. Harding
48efbfbf63 cciss: Remove kmalloc cast
Coccinelle emits a warning about casting the return value of
kmalloc(). Coccinelle suggests removing the cast as do
kerneljanitors.

Remove cast from kmalloc() call.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-22 11:54:49 -07:00
Tobin C. Harding
19a5e10c3b cciss: Fix checkpatch OPEN_BRACE
Checkpatch emits ERROR:OPEN_BRACE: that open brace { should be on the
previous line.

Move open brace to new line. Also add space after if/switch statement
since we introduce more checkpatch errors if not fixed at the same
time.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-22 11:54:49 -07:00
Tobin C. Harding
f088963845 cciss: Fix checkpatch TRAILING_WHITESPACE
Checkpatch emits 85 trailing whitespace warnings.

Remove trailing whitespace.

Signed-off-by: Tobin C. Harding <me@tobin.cc>
Acked-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-22 11:54:49 -07:00
Linus Torvalds
252b95c0ed xen: features and fixes for 4.11-rc0
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJYrElpAAoJELDendYovxMvNFQH/RJU7lwDSf7rF7ZzFGvdcsfi
 T4DDuowkYoJm2+GypoRVzZZ0lxJlxr0mNKPvgGDvuTogMY7pvjAf6B7/xCvTFsNU
 UoO2I7ljgXxCXFRiXH50nAjS7PC2PFW3Qx+8XPIWeZmnUPeJi4Q43fiSloUt+a6l
 JgS/autOCflGasR5MihCZXkvdVF81K6GuEd3hCh9GKZ/8RiwNPaY50vHnMv/hfqq
 SJNRKOTSRXioYlTohLnjuPWHDMayRJEO48IXl3c7aNxDTkHjn78yaoPhJ7m+M0Bq
 s2GyaQA4tCwADhP+tilmI5H1vFpt6w9x7O0dgWiSm7TB91lwfZOen2WhTPPp6L8=
 =gktV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.11-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen updates from Juergen Gross:
 "Xen features and fixes:

   - a series from Boris Ostrovsky adding support for booting Linux as
     Xen PVH guest

   - a series from Juergen Gross streamlining the xenbus driver

   - a series from Paul Durrant adding support for the new device model
     hypercall

   - several small corrections"

* tag 'for-linus-4.11-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/privcmd: add IOCTL_PRIVCMD_RESTRICT
  xen/privcmd: Add IOCTL_PRIVCMD_DM_OP
  xen/privcmd: return -ENOTTY for unimplemented IOCTLs
  xen: optimize xenbus driver for multiple concurrent xenstore accesses
  xen: modify xenstore watch event interface
  xen: clean up xenbus internal headers
  xenbus: Neaten xenbus_va_dev_error
  xen/pvh: Use Xen's emergency_restart op for PVH guests
  xen/pvh: Enable CPU hotplug
  xen/pvh: PVH guests always have PV devices
  xen/pvh: Initialize grant table for PVH guests
  xen/pvh: Make sure we don't use ACPI_IRQ_MODEL_PIC for SCI
  xen/pvh: Bootstrap PVH guest
  xen/pvh: Import PVH-related Xen public interfaces
  xen/x86: Remove PVH support
  x86/boot/32: Convert the 32-bit pgtable setup code from assembly to C
  xen/manage: correct return value check on xenbus_scanf()
  x86/xen: Fix APIC id mismatch warning on Intel
  xen/netback: set default upper limit of tx/rx queues to 8
  xen/netfront: set default upper limit of tx/rx queues to 8
2017-02-21 13:53:41 -08:00
Josef Bacik
6330a2d0b4 nbd: cleanup workqueue on error properly
If we fail to register the blockdev we need to make sure to destroy the
recv workqueue.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-21 12:51:54 -07:00
Josef Bacik
e544541b07 nbd: set the logical and physical blocksize properly
We noticed when trying to do O_DIRECT to an export on the server side
that we were getting requests smaller than the 4k sectorsize of the
device.  This is because the client isn't setting the logical and
physical blocksizes properly for the underlying device.  Fix this up by
setting the queue blocksizes and then calling bd_set_size.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-21 12:51:54 -07:00
Josef Bacik
9442b73920 nbd: cleanup ioctl handling
Break the ioctl handling out into helper functions, some of these things
are getting pretty big and unwieldy.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-21 12:51:54 -07:00
Linus Torvalds
cdc194705d SCSI misc on 20170220
This update includes the usual round of major driver updates (ncr5380,
 ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
 megaraid_sas, ).  There's also an assortment of minor fixes and the
 major update of switching a bunch of drivers to pci_alloc_irq_vectors
 from Christoph.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.vnet.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJYq5adAAoJEAVr7HOZEZN4bjUP/Atk7CSZVnC75pcYmncbEGCx
 ysOlEHK4uW2HhiAYk3PlYMk+pKrMHet2zsbbM9PHJfopdOHZ7Sq1+UZZVeqE1Zun
 8pe0NhON+fZx7XAnevdEvnSSULQZ+AGfjZO72iUwkJiN3ozYaFtCITOyn49l4GpR
 ra9emskBh7CQOFW2voGn1AKeDijPYGx3+TO4AUrWjVMiByR06gb1bmImx+ljiUrs
 jzRJPfrt90ORcTdpMateyN2EXxudcASMhX03SJ6fRI84hPAhMCROMbTv8RnzOTE4
 DPbnvbYUowlHt43iUhJHSwGdkRRaRBnkzQENBp1fNrNzZgF6vB7+kShxbonrYB2p
 gC4ewaJr0BNj+HsUnvTpe3WseiPOcfsnBsKilPLKBlm2dCKEXqFox/dj/T1uexxg
 HoyFrl3u8fyEqVHrzRS4M9t/njWh0NFmXxb0wBdj+lkVFTRErGSKQ8SfOqshuSGs
 P8NN88jy8vC7uqgzKBJ+UH3ehzn3qfBxasFHIC/e2awY9FqKjHGTxKMmSVpjXVxy
 wCvE2FQ3k/qEj2XSM6f7/NGytlSOlju5q1rFtHPW2M+TFSh0LJWCnmVjR/Zle9em
 pBWmtIgCv8W5b41zL2H94nLWAZbfdrrNU/XnX88l47LKnmorte/PGhpxu36NEsMS
 VCgreQmFMdMRY+WzDWl1
 =cBQx
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This update includes the usual round of major driver updates (ncr5380,
  ufs, lpfc, be2iscsi, hisi_sas, storvsc, cxlflash, aacraid,
  megaraid_sas, ...).

  There's also an assortment of minor fixes and the major update of
  switching a bunch of drivers to pci_alloc_irq_vectors from Christoph"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (188 commits)
  scsi: megaraid_sas: handle dma_addr_t right on 32-bit
  scsi: megaraid_sas: array overflow in megasas_dump_frame()
  scsi: snic: switch to pci_irq_alloc_vectors
  scsi: megaraid_sas: driver version upgrade
  scsi: megaraid_sas: Change RAID_1_10_RMW_CMDS to RAID_1_PEER_CMDS and set value to 2
  scsi: megaraid_sas: Indentation and smatch warning fixes
  scsi: megaraid_sas: Cleanup VD_EXT_DEBUG and SPAN_DEBUG related debug prints
  scsi: megaraid_sas: Increase internal command pool
  scsi: megaraid_sas: Use synchronize_irq to wait for IRQs to complete
  scsi: megaraid_sas: Bail out the driver load if ld_list_query fails
  scsi: megaraid_sas: Change build_mpt_mfi_pass_thru to return void
  scsi: megaraid_sas: During OCR, if get_ctrl_info fails do not continue with OCR
  scsi: megaraid_sas: Do not set fp_possible if TM capable for non-RW syspdIO, change fp_possible to bool
  scsi: megaraid_sas: Remove unused pd_index from megasas_build_ld_nonrw_fusion
  scsi: megaraid_sas: megasas_return_cmd does not memset IO frame to zero
  scsi: megaraid_sas: max_fw_cmds are decremented twice, remove duplicate
  scsi: megaraid_sas: update can_queue only if the new value is less
  scsi: megaraid_sas: Change max_cmd from u32 to u16 in all functions
  scsi: megaraid_sas: set pd_after_lb from MR_BuildRaidContext and initialize pDevHandle to MR_DEVHANDLE_INVALID
  scsi: megaraid_sas: latest controller OCR capability from FW before sending shutdown DCMD
  ...
2017-02-21 11:51:42 -08:00
Linus Torvalds
772c8f6f3b for-4.11/linus-merge-signed
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJYqeb8AAoJEPfTWPspceCmB3UP/3UtcPrzEm8w2cxB9MaWhZN3
 J+jiwlO4vaqhm2HVzQtoJqfaqRlud/iDx5cIXE2S7FnIM54ZKs3CANbKu8X+b1zm
 eJije3zMI8A8qyftigbz6a/Y2kWE4ZqFEc9WU5CWawfTl3ImCVUi8+F5X0wOLU/h
 r50zAQOEyURH4G5usNl9q0olF6FonJ82AcYm1iJ0QP2wYWZRJauC0rRn8IT93tyK
 bZPHnGKdkd7km8yi3zr2GNWOfuZZuA0HWAaF4qfrHPZQ883gITFAUIlFb1f+2TNl
 DkQzRrBB2wPWPnlbfb9KejMkvL94hflzsLb5rHt835DyVXFRyjxsgyAI8A+LPGSz
 vqZ3rsbWj6H4F9z2CkZ+T+AP/ZSWDNjwc0RXPm9HYdR5CDeTxIUVvnFQ44YNsmTv
 Xd5BKrUJ2oKegAxQG6zcuFx23p8JzhT70l+mNrMdtyeKnDD9FRdDvhKG9AHeTipn
 o/DnGivhS3UMQoQ7D68KOO+kuhLDeo7my5XGsnjzMO/iHqg++7IP2HyYYs/Ba4qZ
 cYaCtSDQW71Zt0vsqa6dvPuXBveu4h8Qh8R7uAGjSGS9IAFFb4Cab2tiUdISE6PE
 YnMWzY+G6pT8imlLVOL5/QFuo2Q4pUsaL0AHpXMCN9TZnQtbqXa8eqwnKnQ0m2KN
 7ut0IYYEPaYUX5xFn1K6
 =z7AL
 -----END PGP SIGNATURE-----

Merge tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:

 - blk-mq scheduling framework from me and Omar, with a port of the
   deadline scheduler for this framework. A port of BFQ from Paolo is in
   the works, and should be ready for 4.12.

 - Various fixups and improvements to the above scheduling framework
   from Omar, Paolo, Bart, me, others.

 - Cleanup of the exported sysfs blk-mq data into debugfs, from Omar.
   This allows us to export more information that helps debug hangs or
   performance issues, without cluttering or abusing the sysfs API.

 - Fixes for the sbitmap code, the scalable bitmap code that was
   migrated from blk-mq, from Omar.

 - Removal of the BLOCK_PC support in struct request, and refactoring of
   carrying SCSI payloads in the block layer. This cleans up the code
   nicely, and enables us to kill the SCSI specific parts of struct
   request, shrinking it down nicely. From Christoph mainly, with help
   from Hannes.

 - Support for ranged discard requests and discard merging, also from
   Christoph.

 - Support for OPAL in the block layer, and for NVMe as well. Mainly
   from Scott Bauer, with fixes/updates from various others folks.

 - Error code fixup for gdrom from Christophe.

 - cciss pci irq allocation cleanup from Christoph.

 - Making the cdrom device operations read only, from Kees Cook.

 - Fixes for duplicate bdi registrations and bdi/queue life time
   problems from Jan and Dan.

 - Set of fixes and updates for lightnvm, from Matias and Javier.

 - A few fixes for nbd from Josef, using idr to name devices and a
   workqueue deadlock fix on receive. Also marks Josef as the current
   maintainer of nbd.

 - Fix from Josef, overwriting queue settings when the number of
   hardware queues is updated for a blk-mq device.

 - NVMe fix from Keith, ensuring that we don't repeatedly mark and IO
   aborted, if we didn't end up aborting it.

 - SG gap merging fix from Ming Lei for block.

 - Loop fix also from Ming, fixing a race and crash between setting loop
   status and IO.

 - Two block race fixes from Tahsin, fixing request list iteration and
   fixing a race between device registration and udev device add
   notifiations.

 - Double free fix from cgroup writeback, from Tejun.

 - Another double free fix in blkcg, from Hou Tao.

 - Partition overflow fix for EFI from Alden Tondettar.

* tag 'for-4.11/linus-merge-signed' of git://git.kernel.dk/linux-block: (156 commits)
  nvme: Check for Security send/recv support before issuing commands.
  block/sed-opal: allocate struct opal_dev dynamically
  block/sed-opal: tone down not supported warnings
  block: don't defer flushes on blk-mq + scheduling
  blk-mq-sched: ask scheduler for work, if we failed dispatching leftovers
  blk-mq: don't special case flush inserts for blk-mq-sched
  blk-mq-sched: don't add flushes to the head of requeue queue
  blk-mq: have blk_mq_dispatch_rq_list() return if we queued IO or not
  block: do not allow updates through sysfs until registration completes
  lightnvm: set default lun range when no luns are specified
  lightnvm: fix off-by-one error on target initialization
  Maintainers: Modify SED list from nvme to block
  Move stack parameters for sed_ioctl to prevent oversized stack with CONFIG_KASAN
  uapi: sed-opal fix IOW for activate lsp to use correct struct
  cdrom: Make device operations read-only
  elevator: fix loading wrong elevator type for blk-mq devices
  cciss: switch to pci_irq_alloc_vectors
  block/loop: fix race between I/O and set_status
  blk-mq-sched: don't hold queue_lock when calling exit_icq
  block: set make_request_fn manually in blk_mq_update_nr_hw_queues
  ...
2017-02-21 10:57:33 -08:00
Linus Torvalds
42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Miklos Szeredi
bb7462b6fd vfs: use helpers for calling f_op->{read,write}_iter()
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-02-20 16:51:23 +01:00
Bhumika Goyal
b9942bc9d3 rbd: constify device_type structure
Declare device_type structure as const as it is only stored in the
type field of a device structure. This field is of type const, so add
const to the declaration of device_type structure.

File size before:
   text	   data	    bss	    dec	    hex	filename
  61546	  11610	    208	  73364	  11e94	drivers/block/rbd.o

File size after:
   text	   data	    bss	    dec	    hex	filename
  61610	  11578	    208	  73396	  11eb4	drivers/block/rbd.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-02-20 13:06:02 +01:00
Ilya Dryomov
6c696d8560 rbd: kill obj_request->object_name and rbd_segment_name_cache
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:15 +01:00
Ilya Dryomov
a90bb0c1d4 rbd: store and use obj_request->object_no
object_no can be trivially formatted into an object name.  We already
store object names in OSD requests with special care to avoid dynamic
allocations for short names.  Storing a name in obj_request, obtained
as below (!), is a waste and will be removed in the next commit.

    name = kmem_cache_alloc(rbd_segment_name_cache, ...);
    snprintf(name, ...);
    obj_request->object_name = kstrdup(name);
    kmem_cache_free(rbd_segment_name_cache, name);
    ...
    ceph_oid_aprintf(..., "%s", obj_request->object_name);

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:15 +01:00
Ilya Dryomov
223768d02e rbd: RBD_V{1,2}_DATA_FORMAT macros
... and also fix up the comment -- format 1 data objects have always
been 12 hex digits long.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:15 +01:00
Ilya Dryomov
bc81207ea9 rbd: factor out __rbd_osd_req_create()
Factor OSD request allocation and initialization code out into
__rbd_osd_req_create().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:15 +01:00
Ilya Dryomov
67e2b65274 rbd: set offset and length outside of rbd_obj_request_create()
The allocation doesn't depend on offset and length.  Both offset and
length can be changed after obj_request is allocated, too.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:14 +01:00
Ilya Dryomov
7e97332ea9 rbd: support for data-pool feature
Add support for RBD_FEATURE_DATA_POOL feature.  rbd_dev->layout.pool_id
now stores the data pool id.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:14 +01:00
Ilya Dryomov
263423f8ad rbd: introduce rbd_init_layout()
Rather than initializing layout fields with some made up values in
__rbd_dev_create(), move the initialization into rbd_init_layout() and
call it after the header is actually populated.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:14 +01:00
Ilya Dryomov
5bc3fb1775 rbd: use rbd_obj_bytes() more
Returning u64 doesn't make sense: max header->obj_order is 25 and
ceph_file_layout::object_size is u32.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:14 +01:00
Ilya Dryomov
03acf08335 rbd: remove now unused rbd_obj_request_wait() and helpers
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:14 +01:00
Ilya Dryomov
ecd4a68a26 rbd: switch rbd_obj_method_sync() to ceph_osdc_call()
As explained in the previous commit, rbd_obj_request machinery (and
rbd_osd_req_create() in particular) shouldn't be used for working with
metadata objects.

Switch to the recently added ceph_osdc_call().  It assumes single pages
for outbound and inbound buffers, but that's OK - none of the callers
need more than that.  These pages need to be allocated (messenger is in
dire need of proper iterator interface!), but we are swapping for
pages[] and pagelist allocations in the existing code.

Kill class_name argument - all rbd methods are under "rbd".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:13 +01:00
Ilya Dryomov
fe5478e0f6 rbd: do away with obj_request in rbd_obj_read_sync()
rbd_obj_request machinery is completely unnecessary here; all that's
being done is fetching a metadata object - no striping, cloning, etc.
More importantly, rbd_osd_req_create() grabs pool id from layout and
that is becoming a data pool id.

Kill offset argument - all metadata objects are small and read in full.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:12 +01:00
Ilya Dryomov
431a02cd82 rbd: initialize rbd_dev->header_oloc early
No reason to delay it until image_id is known.  This will be required
by some rbd_obj_method_sync() callers, after rbd_obj_method_sync() is
changed to take oloc.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:12 +01:00
Ilya Dryomov
24dca799fd rbd: kill rbd_image_header::{crypt_type,comp_type}
Image format 1 is deprecated and format 2 doesn't have these.  Also,
__rbd_dev_create() takes care of zeroing (or otherwise initializing)
format 2 specific fields.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:12 +01:00
Ilya Dryomov
848d796c8c rbd: use kstrndup() in rbd_header_from_disk()
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jason Dillaman <dillaman@redhat.com>
2017-02-20 12:16:11 +01:00
Jens Axboe
818551e2b2 Merge branch 'for-4.11/next' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 14:08:19 -07:00
Jens Axboe
6010720da8 Merge branch 'for-4.11/block' into for-4.11/linus-merge
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-17 14:06:45 -07:00
Kees Cook
853fe1bf75 cdrom: Make device operations read-only
Since function tables are a common target for attackers, it's best to keep
them in read-only memory. As such, this makes the CDROM device ops tables
const. This drops additionally n_minors, since it isn't used meaningfully,
and sets the only user of cdrom_dummy_generic_packet explicitly so the
variables can all be const.

Inspired by similar changes in grsecurity/PaX.

Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-14 08:29:56 -07:00
Matthew Wilcox
d3e709e63e idr: Return the deleted entry from idr_remove
It is a relatively common idiom (8 instances) to first look up an IDR
entry, and then remove it from the tree if it is found, possibly doing
further operations upon the entry afterwards.  If we change idr_remove()
to return the removed object, all of these users can save themselves a
walk of the IDR tree.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
2017-02-13 21:44:03 -05:00
Christoph Hellwig
c5c9b26ee5 cciss: switch to pci_irq_alloc_vectors
Simple cleanup to use the new APIs.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Don Brace <don.brace@microsemi.com>
Tested-by: Don Brace <don.brace@microsemi.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-13 15:43:31 -07:00
Ming Lei
ecdd09597a block/loop: fix race between I/O and set_status
Inside set_status, transfer need to setup again, so
we have to drain IO before the transition, otherwise
oops may be triggered like the following:

	divide error: 0000 [#1] SMP KASAN
	CPU: 0 PID: 2935 Comm: loop7 Not tainted 4.10.0-rc7+ #213
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
	01/01/2011
	task: ffff88006ba1e840 task.stack: ffff880067338000
	RIP: 0010:transfer_xor+0x1d1/0x440 drivers/block/loop.c:110
	RSP: 0018:ffff88006733f108 EFLAGS: 00010246
	RAX: 0000000000000000 RBX: ffff8800688d7000 RCX: 0000000000000059
	RDX: 0000000000000000 RSI: 1ffff1000d743f43 RDI: ffff880068891c08
	RBP: ffff88006733f160 R08: ffff8800688d7001 R09: 0000000000000000
	R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800688d7000
	R13: ffff880067b7d000 R14: dffffc0000000000 R15: 0000000000000000
	FS:  0000000000000000(0000) GS:ffff88006d000000(0000)
	knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00000000006c17e0 CR3: 0000000066e3b000 CR4: 00000000001406f0
	Call Trace:
	 lo_do_transfer drivers/block/loop.c:251 [inline]
	 lo_read_transfer drivers/block/loop.c:392 [inline]
	 do_req_filebacked drivers/block/loop.c:541 [inline]
	 loop_handle_cmd drivers/block/loop.c:1677 [inline]
	 loop_queue_work+0xda0/0x49b0 drivers/block/loop.c:1689
	 kthread_worker_fn+0x4c3/0xa30 kernel/kthread.c:630
	 kthread+0x326/0x3f0 kernel/kthread.c:227
	 ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
	Code: 03 83 e2 07 41 29 df 42 0f b6 04 30 4d 8d 44 24 01 38 d0 7f 08
	84 c0 0f 85 62 02 00 00 44 89 f8 41 0f b6 48 ff 25 ff 01 00 00 99 <f7>
	7d c8 48 63 d2 48 03 55 d0 48 89 d0 48 89 d7 48 c1 e8 03 83
	RIP: transfer_xor+0x1d1/0x440 drivers/block/loop.c:110 RSP:
	ffff88006733f108
	---[ end trace 0166f7bd3b0c0933 ]---

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-13 09:37:21 -07:00
Juergen Gross
5584ea250a xen: modify xenstore watch event interface
Today a Xenstore watch event is delivered via a callback function
declared as:

void (*callback)(struct xenbus_watch *,
                 const char **vec, unsigned int len);

As all watch events only ever come with two parameters (path and token)
changing the prototype to:

void (*callback)(struct xenbus_watch *,
                 const char *path, const char *token);

is the natural thing to do.

Apply this change and adapt all users.

Cc: konrad.wilk@oracle.com
Cc: roger.pau@citrix.com
Cc: wei.liu2@citrix.com
Cc: paul.durrant@citrix.com
Cc: netdev@vger.kernel.org

Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Reviewed-by: Wei Liu <wei.liu2@citrix.com>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
2017-02-09 11:26:49 -05:00
Jens Axboe
e17354961b zram_drv: update for backing dev info changes
A previous commit made the bdi embedded in the request queue
a pointer, but neglected to fixup zram. Fix it up.

Fixes: dc3b17cc8bf ("block: Use pointer to backing_dev_info from request_queue")
Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 16:53:07 -07:00
Jan Kara
dc3b17cc8b block: Use pointer to backing_dev_info from request_queue
We will want to have struct backing_dev_info allocated separately from
struct request_queue. As the first step add pointer to backing_dev_info
to request_queue and convert all users touching it. No functional
changes in this patch.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-02 08:20:48 -07:00
Josef Bacik
b0d9111a2d nbd: use an idr to keep track of nbd devices
To prepare for dynamically adding new nbd devices to the system switch
from using an array for the nbd devices and instead use an idr.  This
copies what loop does for keeping track of its devices.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-01 16:28:08 -07:00
Josef Bacik
124d6db07c nbd: use our own workqueue for recv threads
Since we are in the memory reclaim path we need our recv work to be on a
workqueue that has WQ_MEM_RECLAIM set so we can avoid deadlocks.  Also
set WQ_HIGHPRI since we are in the completion path for IO.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-02-01 16:28:06 -07:00
Christoph Hellwig
aebf526b53 block: fold cmd_type into the REQ_OP_ space
Instead of keeping two levels of indirection for requests types, fold it
all into the operations.  The little caveat here is that previously
cmd_type only applied to struct request, while the request and bio op
fields were set to plain REQ_OP_READ/WRITE even for passthrough
operations.

Instead this patch adds new REQ_OP_* for SCSI passthrough and driver
private requests, althought it has to add two for each so that we
can communicate the data in/out nature of the request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-01-31 14:00:44 -07:00