Commit Graph

372 Commits

Author SHA1 Message Date
Ard Biesheuvel
a1a262b66e dm crypt: switch to ESSIV crypto API template
Replace the explicit ESSIV handling in the dm-crypt driver with calls
into the crypto API, which now possesses the capability to perform
this processing within the crypto subsystem.

Note that we reorder the AEAD cipher_api string parsing with the TFM
instantiation: this is needed because cipher_api is mangled by the
ESSIV handling, and throws off the parsing of "authenc(" otherwise.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-09-03 16:45:54 -04:00
Ard Biesheuvel
39d13a1ac4 dm crypt: reuse eboiv skcipher for IV generation
Instead of instantiating a separate cipher to perform the encryption
needed to produce the IV, reuse the skcipher used for the block data
and invoke it one additional time for each block to encrypt a zero
vector and use the output as the IV.

For CBC mode, this is equivalent to using the bare block cipher, but
without the risk of ending up with a non-time invariant implementation
of AES when the skcipher itself is time variant (e.g., arm64 without
Crypto Extensions has a NEON based time invariant implementation of
cbc(aes) but no time invariant implementation of the core cipher other
than aes-ti, which is not enabled by default).

This approach is a compromise between dm-crypt API flexibility and
reducing dependence on parts of the crypto API that should not usually
be exposed to other subsystems, such as the bare cipher API.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-08-23 10:13:13 -04:00
Milan Broz
b9411d73bd dm crypt: implement eboiv - encrypted byte-offset initialization vector
This IV is used in some BitLocker devices with CBC encryption mode.

IV is encrypted little-endian byte-offset (with the same key and cipher
as the volume).

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-09 14:14:23 -04:00
Milan Broz
6028a7a5a3 dm crypt: remove obsolete comment about plumb IV
The URL is no longer valid and the comment is obsolete anyway
(the plumb IV was never used).

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-09 14:14:22 -04:00
Milan Broz
4a52ffc7ca dm crypt: wipe private IV struct after key invalid flag is set
If a private IV wipe function fails, the code does not set the key
invalid flag.  To fix this, move code to after the flag is set to
prevent the device from resuming in an inconsistent state.

Also, this allows using of a randomized key in private wipe function
(to be used in a following commit).

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-09 14:14:21 -04:00
Zhengyuan Liu
9c81c99b24 dm crypt: use struct_size() when allocating encryption context
Use struct_size() to avoid open-coded equivalent that is prone to a type
mistake.

Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-07-09 14:08:40 -04:00
Linus Torvalds
311f71281f - Improve DM snapshot target's scalability by using finer grained
locking.  Requires some list_bl interface improvements.
 
 - Add ability for DM integrity to use a bitmap mode, that tracks regions
   where data and metadata are out of sync, instead of using a journal.
 
 - Improve DM thin provisioning target to not write metadata changes to
   disk if the thin-pool and associated thin devices are merely
   activated but not used.  This avoids metadata corruption due to
   concurrent activation of thin devices across different OS instances
   (e.g. split brain scenarios, which ultimately would be avoided if
   proper device filters were used -- but not having proper filtering has
   proven a very common configuration mistake)
 
 - Fix missing call to path selector type->end_io in DM multipath.  This
   fixes reported performance problems due to inaccurate path selector IO
   accounting causing an imbalance of IO (e.g. avoiding issuing IO to
   particular path due to it seemingly being heavily used).
 
 - Fix bug in DM cache metadata's loading of its discard bitset that
   could lead to all cache blocks being discarded if the very first cache
   block was discarded (thankfully in practice the first cache block is
   generally in use; be it FS superblock, partition table, disk label,
   etc).
 
 - Add testing-only DM dust target which simulates a device that has
   failing sectors and/or read failures.
 
 - Fix a DM init error path reference count hang that caused boot hangs
   if user supplied malformed input on kernel commandline.
 
 - Fix a couple issues with DM crypt target's logging being overly
   verbose or lacking context.
 
 - Various other small fixes to DM init, DM multipath, DM zoned, and DM
   crypt.
 -----BEGIN PGP SIGNATURE-----
 
 iQFHBAABCAAxFiEEJfWUX4UqZ4x1O2wixSPxCi2dA1oFAlzdcCgTHHNuaXR6ZXJA
 cmVkaGF0LmNvbQAKCRDFI/EKLZ0DWsxZB/9idHl8LmwwL1JzBfi/XX7bWxwqDQLo
 j1b3ycQ14AKVau4VCkmgDuRIfMDuU6PIAVvsMeVbF3aCE0fZ7zbEV1qHefbtJuCL
 MMm//KbrhIT8oMKYUWtlOj7XI9MT6ErFzfActBZ6UF6r21m1N3bohhVGN7kvCnJm
 wgmSlnz/m2GLKK8gQx+OisnAh0nlje3PIdIYPu7uWN6t0FF2XRz3UwWTuyw7lYhC
 Rx2J+sOIL02CtadhHKLMCG8OutRXWP01cBSohUVJIMGihWfbe6aqvhG5afbqb4bG
 UQrXl477ry5zyQ4fAU2JKZ+8qFvc1FoLLknKrZQu+uYPRokUPw/AwiL7
 =mOH3
 -----END PGP SIGNATURE-----

Merge tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Improve DM snapshot target's scalability by using finer grained
   locking. Requires some list_bl interface improvements.

 - Add ability for DM integrity to use a bitmap mode, that tracks
   regions where data and metadata are out of sync, instead of using a
   journal.

 - Improve DM thin provisioning target to not write metadata changes to
   disk if the thin-pool and associated thin devices are merely
   activated but not used. This avoids metadata corruption due to
   concurrent activation of thin devices across different OS instances
   (e.g. split brain scenarios, which ultimately would be avoided if
   proper device filters were used -- but not having proper filtering
   has proven a very common configuration mistake)

 - Fix missing call to path selector type->end_io in DM multipath. This
   fixes reported performance problems due to inaccurate path selector
   IO accounting causing an imbalance of IO (e.g. avoiding issuing IO to
   particular path due to it seemingly being heavily used).

 - Fix bug in DM cache metadata's loading of its discard bitset that
   could lead to all cache blocks being discarded if the very first
   cache block was discarded (thankfully in practice the first cache
   block is generally in use; be it FS superblock, partition table, disk
   label, etc).

 - Add testing-only DM dust target which simulates a device that has
   failing sectors and/or read failures.

 - Fix a DM init error path reference count hang that caused boot hangs
   if user supplied malformed input on kernel commandline.

 - Fix a couple issues with DM crypt target's logging being overly
   verbose or lacking context.

 - Various other small fixes to DM init, DM multipath, DM zoned, and DM
   crypt.

* tag 'for-5.2/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (42 commits)
  dm: fix a couple brace coding style issues
  dm crypt: print device name in integrity error message
  dm crypt: move detailed message into debug level
  dm ioctl: fix hang in early create error condition
  dm integrity: whitespace, coding style and dead code cleanup
  dm integrity: implement synchronous mode for reboot handling
  dm integrity: handle machine reboot in bitmap mode
  dm integrity: add a bitmap mode
  dm integrity: introduce a function add_new_range_and_wait()
  dm integrity: allow large ranges to be described
  dm ingerity: pass size to dm_integrity_alloc_page_list()
  dm integrity: introduce rw_journal_sectors()
  dm integrity: update documentation
  dm integrity: don't report unused options
  dm integrity: don't check null pointer before kvfree and vfree
  dm integrity: correctly calculate the size of metadata area
  dm dust: Make dm_dust_init and dm_dust_exit static
  dm dust: remove redundant unsigned comparison to less than zero
  dm mpath: always free attached_handler_name in parse_path()
  dm init: fix max devices/targets checks
  ...
2019-05-16 15:55:48 -07:00
Milan Broz
f710126cfc dm crypt: print device name in integrity error message
This message should better identify the DM device with the integrity
failure.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-05-16 10:09:20 -04:00
Milan Broz
7a1cd7238f dm crypt: move detailed message into debug level
The information about tag size should not be printed without debug info
set. Also print device major:minor in the error message to identify the
device instance.

Also use rate limiting and debug level for info about used crypto API
implementaton.  This is important because during online reencryption
the existing message saturates syslog (because we are moving hotzone
across the whole device).

Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-05-16 10:09:20 -04:00
Linus Torvalds
67a2422239 for-5.2/block-20190507
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlzR0AAQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpo0MD/47D1kBK9rGzkAwIz1Jkh1Qy/ITVaDJzmHJ
 UP5uncQsgKFLKMR1LbRcrWtmk2MwFDNULGbteHFeCYE1ypCrTgpWSp5+SJluKd1Q
 hma9krLSAXO9QiSaZ4jafshXFIZxz6IjakOW8c9LrT80Ze47yh7AxiLwDafcp/Jj
 x6NW790qB7ENDtfarDkZk14NCS8HGLRHO5B21LB+hT0Kfbh0XZaLzJdj7Mck1wPA
 VT8hL9mPuA++AjF7Ra4kUjwSakgmajTa3nS2fpkwTYdztQfas7x5Jiv7FWxrrelb
 qbabkNkWKepcHAPEiZR7o53TyfCucGeSK/jG+dsJ9KhNp26kl1ci3frl5T6PfVMP
 SPPDjsKIHs+dqFrU9y5rSGhLJqewTs96hHthnLGxyF67+5sRb5+YIy+dcqgiyc/b
 TUVyjCD6r0cO2q4v9VhwnhOyeBUA9Rwbu8nl7JV5Q45uG7qI4BC39l1jfubMNDPO
 GLNGUUzb6ER7z6lYINjRSF2Jhejsx8SR9P7jhpb1Q7k/VvDDxO1T4FpwvqWFz9+s
 Gn+s6//+cA6LL+42eZkQjvwF2CUNE7TaVT8zdb+s5HP1RQkZToqUnsQCGeRTrFni
 RqWXfW9o9+awYRp431417oMdX/LvLGq9+ZtifRk9DqDcowXevTaf0W2RpplWSuiX
 RcCuPeLAVg==
 =Ot0g
 -----END PGP SIGNATURE-----

Merge tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:
 "Nothing major in this series, just fixes and improvements all over the
  map. This contains:

   - Series of fixes for sed-opal (David, Jonas)

   - Fixes and performance tweaks for BFQ (via Paolo)

   - Set of fixes for bcache (via Coly)

   - Set of fixes for md (via Song)

   - Enabling multi-page for passthrough requests (Ming)

   - Queue release fix series (Ming)

   - Device notification improvements (Martin)

   - Propagate underlying device rotational status in loop (Holger)

   - Removal of mtip32xx trim support, which has been disabled for years
     (Christoph)

   - Improvement and cleanup of nvme command handling (Christoph)

   - Add block SPDX tags (Christoph)

   - Cleanup/hardening of bio/bvec iteration (Christoph)

   - A few NVMe pull requests (Christoph)

   - Removal of CONFIG_LBDAF (Christoph)

   - Various little fixes here and there"

* tag 'for-5.2/block-20190507' of git://git.kernel.dk/linux-block: (164 commits)
  block: fix mismerge in bvec_advance
  block: don't drain in-progress dispatch in blk_cleanup_queue()
  blk-mq: move cancel of hctx->run_work into blk_mq_hw_sysfs_release
  blk-mq: always free hctx after request queue is freed
  blk-mq: split blk_mq_alloc_and_init_hctx into two parts
  blk-mq: free hw queue's resource in hctx's release handler
  blk-mq: move cancel of requeue_work into blk_mq_release
  blk-mq: grab .q_usage_counter when queuing request from plug code path
  block: fix function name in comment
  nvmet: protect discovery change log event list iteration
  nvme: mark nvme_core_init and nvme_core_exit static
  nvme: move command size checks to the core
  nvme-fabrics: check more command sizes
  nvme-pci: check more command sizes
  nvme-pci: remove an unneeded variable initialization
  nvme-pci: unquiesce admin queue on shutdown
  nvme-pci: shutdown on timeout during deletion
  nvme-pci: fix psdt field for single segment sgls
  nvme-multipath: don't print ANA group state by default
  nvme-multipath: split bios with the ns_head bio_set before submitting
  ...
2019-05-07 18:14:36 -07:00
Christoph Hellwig
2b070cfe58 block: remove the i argument to bio_for_each_segment_all
We only have two callers that need the integer loop iterator, and they
can easily maintain it themselves.

Suggested-by: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: David Sterba <dsterba@suse.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: Coly Li <colyli@suse.de>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-04-30 09:26:13 -06:00
Eric Biggers
877b5691f2 crypto: shash - remove shash_desc::flags
The flags field in 'struct shash_desc' never actually does anything.
The only ostensibly supported flag is CRYPTO_TFM_REQ_MAY_SLEEP.
However, no shash algorithm ever sleeps, making this flag a no-op.

With this being the case, inevitably some users who can't sleep wrongly
pass MAY_SLEEP.  These would all need to be fixed if any shash algorithm
actually started sleeping.  For example, the shash_ahash_*() functions,
which wrap a shash algorithm with the ahash API, pass through MAY_SLEEP
from the ahash API to the shash API.  However, the shash functions are
called under kmap_atomic(), so actually they're assumed to never sleep.

Even if it turns out that some users do need preemption points while
hashing large buffers, we could easily provide a helper function
crypto_shash_update_large() which divides the data into smaller chunks
and calls crypto_shash_update() and cond_resched() for each chunk.  It's
not necessary to have a flag in 'struct shash_desc', nor is it necessary
to make individual shash algorithms aware of this at all.

Therefore, remove shash_desc::flags, and document that the
crypto_shash_*() functions can be called from any context.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2019-04-25 15:38:12 +08:00
Christoph Hellwig
c13b5487d9 dm crypt: fix endianness annotations around org_sector_of_dmreq
The sector used here is a little endian value, so use the right
type for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-04-18 16:16:01 -04:00
Linus Torvalds
80201fe175 for-5.1/block-20190302
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlx63XIQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpp2vEACfrrQsap7R+Av28mmXpmXi2FPa3g5Tev1t
 yYjK2qHvhlMZjPTYw3hCmbYdDDczlF7PEgSE2x2DjdcsYapb8Fy1lZ2X16c7ztBR
 HD/t9b5AVSQsczZzKgv3RqsNtTnjzS5V0A8XH8FAP2QRgiwDMwSN6G0FP0JBLbE/
 ZgxQrH1Iy1F33Wz4hI3Z7dEghKPZrH1IlegkZCEu47q9SlWS76qUetSy2GEtchOl
 3Lgu54mQZyVdI5/QZf9DyMDLF6dIz3tYU2qhuo01AHjGRCC72v86p8sIiXcUr94Q
 8pbegJhJ/g8KBol9Qhv3+pWG/QUAZwi/ZwasTkK+MJ4klRXfOrznxPubW1z6t9Vn
 QRo39Po5SqqP0QWAscDxCFjESIQlWlKa+LZurJL7DJDCUGrSgzTpnVwFqKwc5zTP
 HJa5MT2tEeL2TfUYRYCfh0ZV0elINdHA1y1klDBh38drh4EWr2gW8xdseGYXqRjh
 fLgEpoF7VQ8kTvxKN+E4jZXkcZmoLmefp0ZyAbblS6IawpPVC7kXM9Fdn2OU8f2c
 fjVjvSiqxfeN6dnpfeLDRbbN9894HwgP/LPropJOQ7KmjCorQq5zMDkAvoh3tElq
 qwluRqdBJpWT/F05KweY+XVW8OawIycmUWqt6JrVNoIDAK31auHQv47kR0VA4OvE
 DRVVhYpocw==
 =VBaU
 -----END PGP SIGNATURE-----

Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "Not a huge amount of changes in this round, the biggest one is that we
  finally have Mings multi-page bvec support merged. Apart from that,
  this pull request contains:

   - Small series that avoids quiescing the queue for sysfs changes that
     match what we currently have (Aleksei)

   - Series of bcache fixes (via Coly)

   - Series of lightnvm fixes (via Mathias)

   - NVMe pull request from Christoph. Nothing major, just SPDX/license
     cleanups, RR mp policy (Hannes), and little fixes (Bart,
     Chaitanya).

   - BFQ series (Paolo)

   - Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
     for the fast path (Jianchao)

   - fops->iopoll() added for async IO polling, this is a feature that
     the upcoming io_uring interface will use (Christoph, me)

   - Partition scan loop fixes (Dongli)

   - mtip32xx conversion from managed resource API (Christoph)

   - cdrom registration race fix (Guenter)

   - MD pull from Song, two minor fixes.

   - Various documentation fixes (Marcos)

   - Multi-page bvec feature. This brings a lot of nice improvements
     with it, like more efficient splitting, larger IOs can be supported
     without growing the bvec table size, and so on. (Ming)

   - Various little fixes to core and drivers"

* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
  block: fix updating bio's front segment size
  block: Replace function name in string with __func__
  nbd: propagate genlmsg_reply return code
  floppy: remove set but not used variable 'q'
  null_blk: fix checking for REQ_FUA
  block: fix NULL pointer dereference in register_disk
  fs: fix guard_bio_eod to check for real EOD errors
  blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
  block: optimize bvec iteration in bvec_iter_advance
  block: introduce mp_bvec_for_each_page() for iterating over page
  block: optimize blk_bio_segment_split for single-page bvec
  block: optimize __blk_segment_map_sg() for single-page bvec
  block: introduce bvec_nth_page()
  iomap: wire up the iopoll method
  block: add bio_set_polled() helper
  block: wire up block device iopoll method
  fs: add an iopoll method to struct file_operations
  loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
  loop: do not print warn message if partition scan is successful
  block: bounce: make sure that bvec table is updated
  ...
2019-03-08 14:12:17 -08:00
Ming Lei
6dc4f100c1 block: allow bio_for_each_segment_all() to iterate over multi-page bvec
This patch introduces one extra iterator variable to bio_for_each_segment_all(),
then we can allow bio_for_each_segment_all() to iterate over multi-page bvec.

Given it is just one mechannical & simple change on all bio_for_each_segment_all()
users, this patch does tree-wide change in one single patch, so that we can
avoid to use a temporary helper for this conversion.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-02-15 08:40:11 -07:00
Mikulas Patocka
ff0c129d3b dm crypt: don't overallocate the integrity tag space
bio_sectors() returns the value in the units of 512-byte sectors (no
matter what the real sector size of the device).  dm-crypt multiplies
bio_sectors() by on_disk_tag_size to calculate the space allocated for
integrity tags.  If dm-crypt is running with sector size larger than
512b, it allocates more data than is needed.

Device Mapper trims the extra space when passing the bio to
dm-integrity, so this bug didn't result in any visible misbehavior.
But it must be fixed to avoid wasteful memory allocation for the block
integrity payload.

Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-02-11 12:06:48 -05:00
Milan Broz
1856b9f7bc dm crypt: fix parsing of extended IV arguments
The dm-crypt cipher specification in a mapping table is defined as:
  cipher[:keycount]-chainmode-ivmode[:ivopts]
or (new crypt API format):
  capi:cipher_api_spec-ivmode[:ivopts]

For ESSIV, the parameter includes hash specification, for example:
aes-cbc-essiv:sha256

The implementation expected that additional IV option to never include
another dash '-' character.

But, with SHA3, there are names like sha3-256; so the mapping table
parser fails:

dmsetup create test --table "0 8 crypt aes-cbc-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"
  or (new crypt API format)
dmsetup create test --table "0 8 crypt capi:cbc(aes)-essiv:sha3-256 9c1185a5c5e9fc54612808977ee8f5b9e 0 /dev/sdb 0"

  device-mapper: crypt: Ignoring unexpected additional cipher options
  device-mapper: table: 253:0: crypt: Error creating IV
  device-mapper: ioctl: error adding target to table

Fix the dm-crypt constructor to ignore additional dash in IV options and
also remove a bogus warning (that is ignored anyway).

Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2019-01-10 15:17:33 -05:00
Linus Torvalds
f346b0becb Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:

 - large KASAN update to use arm's "software tag-based mode"

 - a few misc things

 - sh updates

 - ocfs2 updates

 - just about all of MM

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (167 commits)
  kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
  memcg, oom: notify on oom killer invocation from the charge path
  mm, swap: fix swapoff with KSM pages
  include/linux/gfp.h: fix typo
  mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
  hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
  hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
  memory_hotplug: add missing newlines to debugging output
  mm: remove __hugepage_set_anon_rmap()
  include/linux/vmstat.h: remove unused page state adjustment macro
  mm/page_alloc.c: allow error injection
  mm: migrate: drop unused argument of migrate_page_move_mapping()
  blkdev: avoid migration stalls for blkdev pages
  mm: migrate: provide buffer_migrate_page_norefs()
  mm: migrate: move migrate_page_lock_buffers()
  mm: migrate: lock buffers before migrate_page_move_mapping()
  mm: migration: factor out code to compute expected number of page references
  mm, page_alloc: enable pcpu_drain with zone capability
  kmemleak: add config to select auto scan
  mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
  ...
2018-12-28 16:55:46 -08:00
Linus Torvalds
4ed7bdc1eb - Eliminate a couple indirect calls from bio-based DM core.
- Fix DM to allow reads that exceed readahead limits by setting io_pages
   in the backing_dev_info.
 
 - A couple code cleanups in request-based DM.
 
 - Fix various DM targets to check for device sector overflow if
   CONFIG_LBDAF is not set.
 
 - Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t
   isn't large enough on 32bit when CONFIG_LBDAF is not set.
 
 - Performance fixes to DM's kcopyd and the snapshot target focused on
   limiting memory use and workqueue stalls.
 
 - Fix typos in the integrity and writecache targets.
 
 - Log which algorithm is used for dm-crypt's encryption and
   dm-integrity's hashing.
 
 - Fix false -EBUSY errors in DM raid target's handling of check/repair
   messages.
 
 - Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt
   the Nth byte in a bio's payload.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJcHu6MAAoJEMUj8QotnQNar9gIAKyJ5hH0ik3mlLhAyPcMkRU+
 MPvjTAC8V5V33VY8YqKD/TcDHY8yaGTPKSK0SAnB8Jv2tLz8mp0H18syJbMvNcFc
 MS/xWYLAZiXf5xJ4eJpo8ayDvU6DAkKtSbG7VHJ8gnx/WV8bW1b90OCIq8GN+jHo
 DPp6RDzy3LvonoZ9OPognQKUDXK1crdPDiv1Wzw9Zjd4JlruIaGgJOc1RLB7YdVK
 u5c7UD7cT+MCROEzJrctAZa5oRVhuemd83HPvwE2akRFvCSNtc7aIVtNfJ79nqL7
 19oUDfsuzf2zkiJLzO0oU4OrJ+lUGDJf/sPXmU1qlPyKdUqYKFxT+KHvBnhAQXI=
 =pCs8
 -----END PGP SIGNATURE-----

Merge tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Eliminate a couple indirect calls from bio-based DM core.

 - Fix DM to allow reads that exceed readahead limits by setting
   io_pages in the backing_dev_info.

 - A couple code cleanups in request-based DM.

 - Fix various DM targets to check for device sector overflow if
   CONFIG_LBDAF is not set.

 - Use u64 instead of sector_t to store iv_offset in DM crypt; sector_t
   isn't large enough on 32bit when CONFIG_LBDAF is not set.

 - Performance fixes to DM's kcopyd and the snapshot target focused on
   limiting memory use and workqueue stalls.

 - Fix typos in the integrity and writecache targets.

 - Log which algorithm is used for dm-crypt's encryption and
   dm-integrity's hashing.

 - Fix false -EBUSY errors in DM raid target's handling of check/repair
   messages.

 - Fix DM flakey target's corrupt_bio_byte feature to reliably corrupt
   the Nth byte in a bio's payload.

* tag 'for-4.21/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: do not allow readahead to limit IO size
  dm raid: fix false -EBUSY when handling check/repair message
  dm rq: cleanup leftover code from recently removed q->mq_ops branching
  dm verity: log the hash algorithm implementation
  dm crypt: log the encryption algorithm implementation
  dm integrity: fix spelling mistake in workqueue name
  dm flakey: Properly corrupt multi-page bios.
  dm: Check for device sector overflow if CONFIG_LBDAF is not set
  dm crypt: use u64 instead of sector_t to store iv_offset
  dm kcopyd: Fix bug causing workqueue stalls
  dm snapshot: Fix excessive memory usage and workqueue stalls
  dm bufio: update comment in dm-bufio.c
  dm writecache: fix typo in error msg for creating writecache_flush_thread
  dm: remove indirect calls from __send_changing_extent_only()
  dm mpath: only flush workqueue when needed
  dm rq: remove unused arguments from rq_completed()
  dm: avoid indirect call in __dm_make_request
2018-12-28 15:02:26 -08:00
Arun KS
ca79b0c211 mm: convert totalram_pages and totalhigh_pages variables to atomic
totalram_pages and totalhigh_pages are made static inline function.

Main motivation was that managed_page_count_lock handling was complicating
things.  It was discussed in length here,
https://lore.kernel.org/patchwork/patch/995739/#1181785 So it seemes
better to remove the lock and convert variables to atomic, with preventing
poteintial store-to-read tearing as a bonus.

[akpm@linux-foundation.org: coding style fixes]
Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
Signed-off-by: Arun KS <arunks@codeaurora.org>
Suggested-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-12-28 12:11:47 -08:00
Eric Biggers
af331ebae7 dm crypt: log the encryption algorithm implementation
Log the encryption algorithm's driver name when a dm-crypt target is
created.  This will help people determine whether the expected
implementation is being used.  In some cases we've seen people do
benchmarks and reject using encryption for performance reasons, when in
fact they used a much slower implementation than was possible on the
hardware.  It can make an enormous difference; e.g., AES-XTS on ARM can
be over 10x faster with the crypto extensions than without.  It can also
be useful to know if an implementation using an external crypto
accelerator is being used instead of a software implementation.

Example message:

[   29.307629] device-mapper: crypt: xts(aes) using implementation "xts-aes-ce"

We've already found the similar message in fs/crypto/keyinfo.c to be
very useful.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-18 09:02:27 -05:00
Milan Broz
ef87bfc24f dm: Check for device sector overflow if CONFIG_LBDAF is not set
Reference to a device in device-mapper table contains offset in sectors.

If the sector_t is 32bit integer (CONFIG_LBDAF is not set), then
several device-mapper targets can overflow this offset and validity
check is then performed on a wrong offset and a wrong table is activated.

See for example (on 32bit without CONFIG_LBDAF) this overflow:

  # dmsetup create test --table "0 2048 linear /dev/sdg 4294967297"
  # dmsetup table test
  0 2048 linear 8:96 1

This patch adds explicit check for overflow if the offset is sector_t type.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-18 09:02:26 -05:00
AliOS system security
8d683dcd65 dm crypt: use u64 instead of sector_t to store iv_offset
The iv_offset in the mapping table of crypt target is a 64bit number when
IV algorithm is plain64, plain64be, essiv or benbi. It will be assigned to
iv_offset of struct crypt_config, cc_sector of struct convert_context and
iv_sector of struct dm_crypt_request. These structures members are defined
as a sector_t. But sector_t is 32bit when CONFIG_LBDAF is not set in 32bit
kernel. In this situation sector_t is not big enough to store the 64bit
iv_offset.

Here is a reproducer.
Prepare test image and device (loop is automatically allocated by cryptsetup):

  # dd if=/dev/zero of=tst.img bs=1M count=1
  # echo "tst"|cryptsetup open --type plain -c aes-xts-plain64 \
  --skip 500000000000000000 tst.img test

On 32bit system (use IV offset value that overflows to 64bit; CONFIG_LBDAF if off)
and device checksum is wrong:

  # dmsetup table test --showkeys
  0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 3551657984 7:0 0

  # sha256sum /dev/mapper/test
  533e25c09176632b3794f35303488c4a8f3f965dffffa6ec2df347c168cb6c19 /dev/mapper/test

On 64bit system (and on 32bit system with the patch), table and checksum is now correct:

  # dmsetup table test --showkeys
  0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 500000000000000000 7:0 0

  # sha256sum /dev/mapper/test
  5d16160f9d5f8c33d8051e65fdb4f003cc31cd652b5abb08f03aa6fce0df75fc /dev/mapper/test

Signed-off-by: AliOS system security <alios_sys_security@linux.alibaba.com>
Tested-and-Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-12-18 09:02:26 -05:00
Eric Biggers
1ad0f1603a crypto: drop mask=CRYPTO_ALG_ASYNC from 'cipher' tfm allocations
'cipher' algorithms (single block ciphers) are always synchronous, so
passing CRYPTO_ALG_ASYNC in the mask to crypto_alloc_cipher() has no
effect.  Many users therefore already don't pass it, but some still do.
This inconsistency can cause confusion, especially since the way the
'mask' argument works is somewhat counterintuitive.

Thus, just remove the unneeded CRYPTO_ALG_ASYNC flags.

This patch shouldn't change any actual behavior.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-11-20 14:26:55 +08:00
Michał Mirosław
ed0302e830 dm crypt: make workqueue names device-specific
Make cpu-usage debugging easier by naming workqueues per device.

Example ps output:

root       413  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd_io/253:0]
root       414  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd/253:0]
root       415  0.0  0.0      0     0 ?        S    paź02   1:10  [dmcrypt_write/253:0]
root       465  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd_io/253:2]
root       466  0.0  0.0      0     0 ?        I<   paź02   0:00  [kcryptd/253:2]
root       467  0.0  0.0      0     0 ?        S    paź02   2:06  [dmcrypt_write/253:2]
root     15359  0.2  0.0      0     0 ?        I<   19:43   0:25  [kworker/u17:8-kcryptd/253:0]
root     16563  0.2  0.0      0     0 ?        I<   20:10   0:18  [kworker/u17:0-kcryptd/253:2]
root     23205  0.1  0.0      0     0 ?        I<   21:21   0:04  [kworker/u17:4-kcryptd/253:0]
root     13383  0.1  0.0      0     0 ?        I<   21:32   0:02  [kworker/u17:2-kcryptd/253:2]
root      2610  0.1  0.0      0     0 ?        I<   21:42   0:01  [kworker/u17:12-kcryptd/253:2]
root     20124  0.1  0.0      0     0 ?        I<   21:56   0:01  [kworker/u17:1-kcryptd/253:2]

Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-10-18 12:17:17 -04:00
Mikulas Patocka
432061b3da dm: disable CRYPTO_TFM_REQ_MAY_SLEEP to fix a GFP_KERNEL recursion deadlock
There's a XFS on dm-crypt deadlock, recursing back to itself due to the
crypto subsystems use of GFP_KERNEL, reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=200835

* dm-crypt calls crypt_convert in xts mode
* init_crypt from xts.c calls kmalloc(GFP_KERNEL)
* kmalloc(GFP_KERNEL) recurses into the XFS filesystem, the filesystem
	tries to submit some bios and wait for them, causing a deadlock

Fix this by updating both the DM crypt and integrity targets to no
longer use the CRYPTO_TFM_REQ_MAY_SLEEP flag, which will change the
crypto allocations from GFP_KERNEL to GFP_ATOMIC, therefore they can't
recurse into a filesystem.  A GFP_ATOMIC allocation can fail, but
init_crypt() in xts.c handles the allocation failure gracefully - it
will fall back to preallocated buffer if the allocation fails.

The crypto API maintainer says that the crypto API only needs to
allocate memory when dealing with unaligned buffers and therefore
turning CRYPTO_TFM_REQ_MAY_SLEEP off is safe (see this discussion:
https://www.redhat.com/archives/dm-devel/2018-August/msg00195.html )

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-09-06 13:31:09 -04:00
Mikulas Patocka
bc9e9cf040 dm crypt: don't decrease device limits
dm-crypt should only increase device limits, it should not decrease them.

This fixes a bug where the user could creates a crypt device with 1024
sector size on the top of scsi device that had 4096 logical block size.
The limit 4096 would be lost and the user could incorrectly send
1024-I/Os to the crypt device.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-08-13 15:28:41 -04:00
Kees Cook
c07c88f54f dm crypt: convert essiv from ahash to shash
In preparing to remove all stack VLA usage from the kernel[1], remove
the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller
SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct
shash.  The stack allocation will be made a fixed size in a later patch
to the crypto subsystem.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-07-27 15:24:28 -04:00
Mikulas Patocka
c7329eff72 dm crypt: use wake_up_process() instead of a wait queue
This is a small simplification of dm-crypt - use wake_up_process()
instead of a wait queue in a case where only one process may be
waiting.  dm-writecache uses a similar pattern.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-07-27 15:24:28 -04:00
Linus Torvalds
b08fc5277a - Error path bug fix for overflow tests (Dan)
- Additional struct_size() conversions (Matthew, Kees)
 - Explicitly reported overflow fixes (Silvio, Kees)
 - Add missing kvcalloc() function (Kees)
 - Treewide conversions of allocators to use either 2-factor argument
   variant when available, or array_size() and array3_size() as needed (Kees)
 -----BEGIN PGP SIGNATURE-----
 Comment: Kees Cook <kees@outflux.net>
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAlsgVtMWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJhsJEACLYe2EbwLFJz7emOT1KUGK5R1b
 oVxJog0893WyMqgk9XBlA2lvTBRBYzR3tzsadfYo87L3VOBzazUv0YZaweJb65sF
 bAvxW3nY06brhKKwTRed1PrMa1iG9R63WISnNAuZAq7+79mN6YgW4G6YSAEF9lW7
 oPJoPw93YxcI8JcG+dA8BC9w7pJFKooZH4gvLUSUNl5XKr8Ru5YnWcV8F+8M4vZI
 EJtXFmdlmxAledUPxTSCIojO8m/tNOjYTreBJt9K1DXKY6UcgAdhk75TRLEsp38P
 fPvMigYQpBDnYz2pi9ourTgvZLkffK1OBZ46PPt8BgUZVf70D6CBg10vK47KO6N2
 zreloxkMTrz5XohyjfNjYFRkyyuwV2sSVrRJqF4dpyJ4NJQRjvyywxIP4Myifwlb
 ONipCM1EjvQjaEUbdcqKgvlooMdhcyxfshqJWjHzXB6BL22uPzq5jHXXugz8/ol8
 tOSM2FuJ2sBLQso+szhisxtMd11PihzIZK9BfxEG3du+/hlI+2XgN7hnmlXuA2k3
 BUW6BSDhab41HNd6pp50bDJnL0uKPWyFC6hqSNZw+GOIb46jfFcQqnCB3VZGCwj3
 LH53Be1XlUrttc/NrtkvVhm4bdxtfsp4F7nsPFNDuHvYNkalAVoC3An0BzOibtkh
 AtfvEeaPHaOyD8/h2Q==
 =zUUp
 -----END PGP SIGNATURE-----

Merge tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull more overflow updates from Kees Cook:
 "The rest of the overflow changes for v4.18-rc1.

  This includes the explicit overflow fixes from Silvio, further
  struct_size() conversions from Matthew, and a bug fix from Dan.

  But the bulk of it is the treewide conversions to use either the
  2-factor argument allocators (e.g. kmalloc(a * b, ...) into
  kmalloc_array(a, b, ...) or the array_size() macros (e.g. vmalloc(a *
  b) into vmalloc(array_size(a, b)).

  Coccinelle was fighting me on several fronts, so I've done a bunch of
  manual whitespace updates in the patches as well.

  Summary:

   - Error path bug fix for overflow tests (Dan)

   - Additional struct_size() conversions (Matthew, Kees)

   - Explicitly reported overflow fixes (Silvio, Kees)

   - Add missing kvcalloc() function (Kees)

   - Treewide conversions of allocators to use either 2-factor argument
     variant when available, or array_size() and array3_size() as needed
     (Kees)"

* tag 'overflow-v4.18-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (26 commits)
  treewide: Use array_size in f2fs_kvzalloc()
  treewide: Use array_size() in f2fs_kzalloc()
  treewide: Use array_size() in f2fs_kmalloc()
  treewide: Use array_size() in sock_kmalloc()
  treewide: Use array_size() in kvzalloc_node()
  treewide: Use array_size() in vzalloc_node()
  treewide: Use array_size() in vzalloc()
  treewide: Use array_size() in vmalloc()
  treewide: devm_kzalloc() -> devm_kcalloc()
  treewide: devm_kmalloc() -> devm_kmalloc_array()
  treewide: kvzalloc() -> kvcalloc()
  treewide: kvmalloc() -> kvmalloc_array()
  treewide: kzalloc_node() -> kcalloc_node()
  treewide: kzalloc() -> kcalloc()
  treewide: kmalloc() -> kmalloc_array()
  mm: Introduce kvcalloc()
  video: uvesafb: Fix integer overflow in allocation
  UBIFS: Fix potential integer overflow in allocation
  leds: Use struct_size() in allocation
  Convert intel uncore to struct_size
  ...
2018-06-12 18:28:00 -07:00
Kees Cook
6396bb2215 treewide: kzalloc() -> kcalloc()
The kzalloc() function has a 2-factor argument form, kcalloc(). This
patch replaces cases of:

        kzalloc(a * b, gfp)

with:
        kcalloc(a * b, gfp)

as well as handling cases of:

        kzalloc(a * b * c, gfp)

with:

        kzalloc(array3_size(a, b, c), gfp)

as it's slightly less ugly than:

        kzalloc_array(array_size(a, b), c, gfp)

This does, however, attempt to ignore constant size factors like:

        kzalloc(4 * 1024, gfp)

though any constants defined via macros get caught up in the conversion.

Any factors with a sizeof() of "unsigned char", "char", and "u8" were
dropped, since they're redundant.

The Coccinelle script used for this was:

// Fix redundant parens around sizeof().
@@
type TYPE;
expression THING, E;
@@

(
  kzalloc(
-	(sizeof(TYPE)) * E
+	sizeof(TYPE) * E
  , ...)
|
  kzalloc(
-	(sizeof(THING)) * E
+	sizeof(THING) * E
  , ...)
)

// Drop single-byte sizes and redundant parens.
@@
expression COUNT;
typedef u8;
typedef __u8;
@@

(
  kzalloc(
-	sizeof(u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * (COUNT)
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(__u8) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(char) * COUNT
+	COUNT
  , ...)
|
  kzalloc(
-	sizeof(unsigned char) * COUNT
+	COUNT
  , ...)
)

// 2-factor product with sizeof(type/expression) and identifier or constant.
@@
type TYPE;
expression THING;
identifier COUNT_ID;
constant COUNT_CONST;
@@

(
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_ID)
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_ID
+	COUNT_ID, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (COUNT_CONST)
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * COUNT_CONST
+	COUNT_CONST, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_ID)
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_ID
+	COUNT_ID, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (COUNT_CONST)
+	COUNT_CONST, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * COUNT_CONST
+	COUNT_CONST, sizeof(THING)
  , ...)
)

// 2-factor product, only identifiers.
@@
identifier SIZE, COUNT;
@@

- kzalloc
+ kcalloc
  (
-	SIZE * COUNT
+	COUNT, SIZE
  , ...)

// 3-factor product with 1 sizeof(type) or sizeof(expression), with
// redundant parens removed.
@@
expression THING;
identifier STRIDE, COUNT;
type TYPE;
@@

(
  kzalloc(
-	sizeof(TYPE) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(TYPE) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(TYPE))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * (COUNT) * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * (STRIDE)
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
|
  kzalloc(
-	sizeof(THING) * COUNT * STRIDE
+	array3_size(COUNT, STRIDE, sizeof(THING))
  , ...)
)

// 3-factor product with 2 sizeof(variable), with redundant parens removed.
@@
expression THING1, THING2;
identifier COUNT;
type TYPE1, TYPE2;
@@

(
  kzalloc(
-	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(THING1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * COUNT
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
|
  kzalloc(
-	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
+	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
  , ...)
)

// 3-factor product, only identifiers, with redundant parens removed.
@@
identifier STRIDE, SIZE, COUNT;
@@

(
  kzalloc(
-	(COUNT) * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * STRIDE * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	(COUNT) * (STRIDE) * (SIZE)
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
|
  kzalloc(
-	COUNT * STRIDE * SIZE
+	array3_size(COUNT, STRIDE, SIZE)
  , ...)
)

// Any remaining multi-factor products, first at least 3-factor products,
// when they're not all constants...
@@
expression E1, E2, E3;
constant C1, C2, C3;
@@

(
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(
-	(E1) * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * E3
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	(E1) * (E2) * (E3)
+	array3_size(E1, E2, E3)
  , ...)
|
  kzalloc(
-	E1 * E2 * E3
+	array3_size(E1, E2, E3)
  , ...)
)

// And then all remaining 2 factors products when they're not all constants,
// keeping sizeof() as the second factor argument.
@@
expression THING, E1, E2;
type TYPE;
constant C1, C2, C3;
@@

(
  kzalloc(sizeof(THING) * C2, ...)
|
  kzalloc(sizeof(TYPE) * C2, ...)
|
  kzalloc(C1 * C2 * C3, ...)
|
  kzalloc(C1 * C2, ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * (E2)
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(TYPE) * E2
+	E2, sizeof(TYPE)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * (E2)
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	sizeof(THING) * E2
+	E2, sizeof(THING)
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * E2
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	(E1) * (E2)
+	E1, E2
  , ...)
|
- kzalloc
+ kcalloc
  (
-	E1 * E2
+	E1, E2
  , ...)
)

Signed-off-by: Kees Cook <keescook@chromium.org>
2018-06-12 16:19:22 -07:00
Mike Snitzer
72d711c876 dm: adjust structure members to improve alignment
Eliminate most holes in DM data structures that were modified by
commit 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()").
Also prevent structure members from unnecessarily spanning cache
lines.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-06-08 11:53:14 -04:00
Kent Overstreet
d00a11df69 dm-crypt: fix warning in shutdown path
The counter for the number of allocated pages includes pages in the
mempool's reserve, so checking that the number of allocated pages is 0
needs to happen after we exit the mempool.

Fixes: 6f1c819c21 ("dm: convert to bioset_init()/mempool_init()")
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Reported-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Mike Snitzer <snitzer@redhat.com>

Fixed to always just use percpu_counter_sum()

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-06-02 20:35:00 -06:00
Kent Overstreet
6f1c819c21 dm: convert to bioset_init()/mempool_init()
Convert dm to embedded bio sets.

Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30 15:33:32 -06:00
Mikulas Patocka
5059353df8 dm crypt: limit the number of allocated pages
dm-crypt consumes an excessive amount memory when the user attempts to
zero a dm-crypt device with "blkdiscard -z". The command "blkdiscard -z"
calls the BLKZEROOUT ioctl, it goes to the function __blkdev_issue_zeroout,
__blkdev_issue_zeroout sends a large amount of write bios that contain
the zero page as their payload.

For each incoming page, dm-crypt allocates another page that holds the
encrypted data, so when processing "blkdiscard -z", dm-crypt tries to
allocate the amount of memory that is equal to the size of the device.
This can trigger OOM killer or cause system crash.

Fix this by limiting the amount of memory that dm-crypt allocates to 2%
of total system memory. This limit is system-wide and is divided by the
number of active dm-crypt devices and each device receives an equal
share.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:11 -04:00
Mike Snitzer
1eb5fa849f dm: allow targets to return output from messages they are sent
Could be useful for a target to return stats or other information.
If a target does DMEMIT() anything to @result from its .message method
then it must return 1 to the caller.

Signed-off-By: Mike Snitzer <snitzer@redhat.com>
2018-04-03 15:04:10 -04:00
Linus Torvalds
0be600a5ad - DM core fixes to ensure that bio submission follows a depth-first tree
walk; this is critical to allow forward progress without the need to
   use the bioset's BIOSET_NEED_RESCUER.
 
 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.
 
 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well leveraging per-bio-data more).
 
 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool that
   stack directly on NVMe data device.
 
 - DM multipath improvements to factor out legacy SCSI-only
   (e.g. scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.
 
 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.
 
 - Numerous DM raid target fixes and improvements.
 
 - Add a new DM "unstriped" target that enables Intel to workaround
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).
 
 - Various Documentation fixes and improvements.
 
 - Misc. cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJacgwPAAoJEMUj8QotnQNaEw0H/0XRTcg8/lRuGl46kdeI3PgR
 ZxUy4XgUrCLiACWO5yCU/nKipB32+3xTlTDTBcjmaBfX8HolH147Pasb1KdHqLVC
 dOWLMpjlFztb5fnuOMitJA05qQAbgRlZ52QdVk/FDo9yWicgWjQZduh8aYX53pHw
 6XOYWzSFAXQcaduPdz6TLiPw479xBwIpXxQbrO09f4qt3Ub4bqknEhzFXc+6M7zl
 ejmW/bG2Qg6WmsfAuaAhFTV0LpTPSEzvaq9TfR7yqFU3DvDIAi7Yh8eQinIUDo4u
 txpOGoESRAMPAMKH0/UJdr/u7jTsfgJox4QEavWfnViPvkouah5KdjVOL1veZ5U=
 =R3dN
 -----END PGP SIGNATURE-----

Merge tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - DM core fixes to ensure that bio submission follows a depth-first
   tree walk; this is critical to allow forward progress without the
   need to use the bioset's BIOSET_NEED_RESCUER.

 - Remove DM core's BIOSET_NEED_RESCUER based dm_offload infrastructure.

 - DM core cleanups and improvements to make bio-based DM more efficient
   (e.g. reduced memory footprint as well leveraging per-bio-data more).

 - Introduce new bio-based mode (DM_TYPE_NVME_BIO_BASED) that leverages
   the more direct IO submission path in the block layer; this mode is
   used by DM multipath and also optimizes targets like DM thin-pool
   that stack directly on NVMe data device.

 - DM multipath improvements to factor out legacy SCSI-only (e.g.
   scsi_dh) code paths to allow for more optimized support for NVMe
   multipath.

 - A fix for DM multipath path selectors (service-time and queue-length)
   to select paths in a more balanced way; largely academic but doesn't
   hurt.

 - Numerous DM raid target fixes and improvements.

 - Add a new DM "unstriped" target that enables Intel to workaround
   firmware limitations in some NVMe drives that are striped internally
   (this target also works when stacked above the DM "striped" target).

 - Various Documentation fixes and improvements.

 - Misc cleanups and fixes across various DM infrastructure and targets
   (e.g. bufio, flakey, log-writes, snapshot).

* tag 'for-4.16/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (69 commits)
  dm cache: Documentation: update default migration_throttling value
  dm mpath selector: more evenly distribute ties
  dm unstripe: fix target length versus number of stripes size check
  dm thin: fix trailing semicolon in __remap_and_issue_shared_cell
  dm table: fix NVMe bio-based dm_table_determine_type() validation
  dm: various cleanups to md->queue initialization code
  dm mpath: delay the retry of a request if the target responded as busy
  dm mpath: return DM_MAPIO_DELAY_REQUEUE if QUEUE_IO or PG_INIT_REQUIRED
  dm mpath: return DM_MAPIO_REQUEUE on blk-mq rq allocation failure
  dm log writes: fix max length used for kstrndup
  dm: backfill missing calls to mutex_destroy()
  dm snapshot: use mutex instead of rw_semaphore
  dm flakey: check for null arg_name in parse_features()
  dm thin: extend thinpool status format string with omitted fields
  dm thin: fixes in thin-provisioning.txt
  dm thin: document representation of <highest mapped sector> when there is none
  dm thin: fix documentation relative to low water mark threshold
  dm cache: be consistent in specifying sectors and SI units in cache.txt
  dm cache: delete obsoleted paragraph in cache.txt
  dm cache: fix grammar in cache-policies.txt
  ...
2018-01-31 11:05:47 -08:00
Linus Torvalds
0a4b6e2f80 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 11:51:49 -08:00
Mike Snitzer
d5ffebdd79 dm: backfill missing calls to mutex_destroy()
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:16:15 -05:00
Wei Yongjun
3cc2e57c4b dm crypt: fix error return code in crypt_ctr()
Fix to return error code -ENOMEM from the mempool_create_kmalloc_pool()
error handling case instead of 0, as done elsewhere in this function.

Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:10:55 -05:00
Ondrej Kozina
dc94902bde dm crypt: wipe kernel key copy after IV initialization
Loading key via kernel keyring service erases the internal
key copy immediately after we pass it in crypto layer. This is
wrong because IV is initialized later and we use wrong key
for the initialization (instead of real key there's just zeroed
block).

The bug may cause data corruption if key is loaded via kernel keyring
service first and later same crypt device is reactivated using exactly
same key in hexbyte representation, or vice versa. The bug (and fix)
affects only ciphers using following IVs: essiv, lmk and tcw.

Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Cc: stable@vger.kernel.org # 4.10+
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:10:48 -05:00
Milan Broz
27c7003697 dm crypt: fix crash by adding missing check for auth key size
If dm-crypt uses authenticated mode with separate MAC, there are two
concatenated part of the key structure - key(s) for encryption and
authentication key.

Add a missing check for authenticated key length.  If this key length is
smaller than actually provided key, dm-crypt now properly fails instead
of crashing.

Fixes: ef43aa3806 ("dm crypt: add cryptographic data integrity protection (authenticated encryption)")
Cc: stable@vger.kernel.org # 4.12+
Reported-by: Salah Coronya <salahx@yahoo.com>
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2018-01-17 09:08:41 -05:00
Ming Lei
92681eca61 dm-crypt: don't clear bvec->bv_page in crypt_free_buffer_pages()
The bio is always freed after running crypt_free_buffer_pages(), so it
isn't necessary to clear bv->bv_page.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc:dm-devel@redhat.com
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:18:00 -07:00
NeilBrown
80cd175783 dm crypt: remove BIOSET_NEED_RESCUER flag
The BIOSET_NEED_RESCUER flag is only needed when a make_request_fn might
do two allocations from the one bioset, and the second one could block
until the first bio completes.

dm-crypt does allocate from this bioset inside the dm make_request_fn,
but does so using GFP_NOWAIT so that the allocation will not block.

So BIOSET_NEED_RESCUER is not needed.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-12-13 12:15:55 -05:00
Mikulas Patocka
0440d5c0ca dm crypt: allow unaligned bv_offset
When slub_debug is enabled kmalloc returns unaligned memory. XFS uses
this unaligned memory for its buffers (if an unaligned buffer crosses a
page, XFS frees it and allocates a full page instead - see the function
xfs_buf_allocate_memory).

dm-crypt checks if bv_offset is aligned on page size and these checks
fail with slub_debug and XFS.

Fix this bug by removing the bv_offset checks. Switch to checking if
bv_len is aligned instead of bv_offset (this check should be sufficient
to prevent overruns if a bio with too small bv_len is received).

Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # v4.12+
Reported-by: Bruno Prémont <bonbons@sysophe.eu>
Tested-by: Bruno Prémont <bonbons@sysophe.eu>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Reviewed-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-11-10 15:44:58 -05:00
Milan Broz
783874b050 dm crypt: reject sector_size feature if device length is not aligned to it
If a crypt mapping uses optional sector_size feature, additional
restrictions to mapped device segment size must be applied in
constructor, otherwise the device activation will fail later.

Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size")
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-10-04 10:49:58 -04:00
Jeffy Chen
bd86e32059 dm crypt: fix memory leak in crypt_ctr_cipher_old()
Fix memory leak of cipher_api.

Fixes: 33d2f09fcb (dm crypt: introduce new format of cipher with "capi:" prefix)
Cc: stable@vger.kernel.org # 4.12+
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-09-28 12:19:02 -04:00
Linus Torvalds
dff4d1f6fe - Some request-based DM core and DM multipath fixes and cleanups
- Constify a few variables in DM core and DM integrity
 
 - Add bufio optimization and checksum failure accounting to DM integrity
 
 - Fix DM integrity to avoid checking integrity of failed reads
 
 - Fix DM integrity to use init_completion
 
 - A couple DM log-writes target fixes
 
 - Simplify DAX flushing by eliminating the unnecessary flush abstraction
   that was stood up for DM's use.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZuo8UAAoJEMUj8QotnQNa5BEIANO4mHh1nrzEbH72a4RCLgxV
 H1Pk1zZx/W1bhOOmcRRhxCSM85dPgsCegc5EmpwLZEMavQrP9UZblHcYOUsyIx7W
 S/lWa+soOq/5N2OveROc4WdoWVs50UFmc1+BcClc4YrEe+15XC3R0VMkjX2b/hUL
 o2eYhPjpMlgaorMtRRU6MAooo2fBRQ9m05aPeVgd35fxibrE7PZm+EYW09wa0STi
 9ufuDXJf8+TtFP/38BD41LbUEskuHUZTSDeAJ+3DBaTtfEZcZYxsst4P9JangsHx
 jqqqI9aYzFD2a27fl9WLhCvm40YFiKp5nwzED0RZjzWxVa/jTShX7a49BdzTTfw=
 =rkSB
 -----END PGP SIGNATURE-----

Merge tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Some request-based DM core and DM multipath fixes and cleanups

 - Constify a few variables in DM core and DM integrity

 - Add bufio optimization and checksum failure accounting to DM
   integrity

 - Fix DM integrity to avoid checking integrity of failed reads

 - Fix DM integrity to use init_completion

 - A couple DM log-writes target fixes

 - Simplify DAX flushing by eliminating the unnecessary flush
   abstraction that was stood up for DM's use.

* tag 'for-4.14/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dax: remove the pmem_dax_ops->flush abstraction
  dm integrity: use init_completion instead of COMPLETION_INITIALIZER_ONSTACK
  dm integrity: make blk_integrity_profile structure const
  dm integrity: do not check integrity for failed read operations
  dm log writes: fix >512b sectorsize support
  dm log writes: don't use all the cpu while waiting to log blocks
  dm ioctl: constify ioctl lookup table
  dm: constify argument arrays
  dm integrity: count and display checksum failures
  dm integrity: optimize writing dm-bufio buffers that are partially changed
  dm rq: do not update rq partially in each ending bio
  dm rq: make dm-sq requeuing behavior consistent with dm-mq behavior
  dm mpath: complain about unsupported __multipath_map_bio() return values
  dm mpath: avoid that building with W=1 causes gcc 7 to complain about fall-through
2017-09-14 13:43:16 -07:00
Linus Torvalds
a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Eric Biggers
5916a22b83 dm: constify argument arrays
The arrays of 'struct dm_arg' are never modified by the device-mapper
core, so constify them so that they are placed in .rodata.

(Exception: the args array in dm-raid cannot be constified because it is
allocated on the stack and modified.)

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-08-28 11:47:18 -04:00
Christoph Hellwig
74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Christoph Hellwig
62d20aa6a0 dm-crypt: don't mess with BIP_BLOCK_INTEGRITY
This flag is never set right after calling bio_integrity_alloc,
so don't clear it and confuse the reader.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-09 10:00:31 -06:00
Ard Biesheuvel
45fe93dff2 crypto: algapi - make crypto_xor() take separate dst and src arguments
There are quite a number of occurrences in the kernel of the pattern

  if (dst != src)
          memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
  crypto_xor(dst, final, walk.total % AES_BLOCK_SIZE);

or

  crypto_xor(keystream, src, nbytes);
  memcpy(dst, keystream, nbytes);

where crypto_xor() is preceded or followed by a memcpy() invocation
that is only there because crypto_xor() uses its output parameter as
one of the inputs. To avoid having to add new instances of this pattern
in the arm64 code, which will be refactored to implement non-SIMD
fallbacks, add an alternative implementation called crypto_xor_cpy(),
taking separate input and output arguments. This removes the need for
the separate memcpy().

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2017-08-04 09:27:15 +08:00
Milan Broz
7e3fd855ad dm crypt: add big-endian variant of plain64 IV
The big-endian IV (plain64be) is needed to map images from extracted
disks that are used in some external (on-chip FDE) disk encryption
drives, e.g.: data recovery from external USB/SATA drives that support
"internal" encryption.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-06-19 11:03:50 -04:00
NeilBrown
47e0fb461f blk: make the bioset rescue_workqueue optional.
This patch converts bioset_create() to not create a workqueue by
default, so alloctions will never trigger punt_bios_to_rescuer().  It
also introduces a new flag BIOSET_NEED_RESCUER which tells
bioset_create() to preserve the old behavior.

All callers of bioset_create() that are inside block device drivers,
are given the BIOSET_NEED_RESCUER flag.

biosets used by filesystems or other top-level users do not
need rescuing as the bio can never be queued behind other
bios.  This includes fs_bio_set, blkdev_dio_pool,
btrfs_bioset, xfs_ioend_bioset, and one allocated by
target_core_iblock.c.

biosets used by md/raid do not need rescuing as
their usage was recently audited and revised to never
risk deadlock.

It is hoped that most, if not all, of the remaining biosets
can end up being the non-rescued version.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Credit-to: Ming Lei <ming.lei@redhat.com> (minor fixes)
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
NeilBrown
011067b056 blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().

To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().

Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
Christoph Hellwig
4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Christoph Hellwig
846785e6a5 dm: don't return errnos from ->map
Instead use the special DM_MAPIO_KILL return value to return -EIO just
like we do for the request based path.  Note that dm-log-writes returned
-ENOMEM in a few places, which now becomes -EIO instead.  No consumer
treats -ENOMEM special so this shouldn't be an issue (and it should
use a mempool to start with to make guaranteed progress).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Linus Torvalds
d35a878ae1 - A major update for DM cache that reduces the latency for deciding
whether blocks should migrate to/from the cache.  The bio-prison-v2
   interface supports this improvement by enabling direct dispatch of
   work to workqueues rather than having to delay the actual work
   dispatch to the DM cache core.  So the dm-cache policies are much more
   nimble by being able to drive IO as they see fit.  One immediate
   benefit from the improved latency is a cache that should be much more
   adaptive to changing workloads.
 
 - Add a new DM integrity target that emulates a block device that has
   additional per-sector tags that can be used for storing integrity
   information.
 
 - Add a new authenticated encryption feature to the DM crypt target that
   builds on the capabilities provided by the DM integrity target.
 
 - Add MD interface for switching the raid4/5/6 journal mode and update
   the DM raid target to use it to enable aid4/5/6 journal write-back
   support.
 
 - Switch the DM verity target over to using the asynchronous hash crypto
   API (this helps work better with architectures that have access to
   off-CPU algorithm providers, which should reduce CPU utilization).
 
 - Various request-based DM and DM multipath fixes and improvements from
   Bart and Christoph.
 
 - A DM thinp target fix for a bio structure leak that occurs for each
   discard IFF discard passdown is enabled.
 
 - A fix for a possible deadlock in DM bufio and a fix to re-check the
   new buffer allocation watermark in the face of competing admin changes
   to the 'max_cache_size_bytes' tunable.
 
 - A couple DM core cleanups.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJZB6vtAAoJEMUj8QotnQNaoicIALuZTLElgAzxzA28cfk1+1Ea
 Gd09CfJ3M6cvk/YGUU7WwiSYIwu16yOJALG4sLcYnEmUCzvKfFPcl/RpeSJHPpYM
 0aVXa6NIJw7K2r3C17toiK2DRMHYw6QU843WeWI93vBW13lDJklNJL9fM7GBEOLH
 NMSNw2mAq9ajtLlnJhM3ZfhloA7/u/jektvlBO1AA3RQ5Kx1cXVXFPqN7FdRfcqp
 4RuEMe9faAadlXLsj3bia5IBmF/W0Qza6JilP+NLKLWB4fm7LZDjN/k+TsHWMa9e
 cGR73TgUGLMBJX+sDJy8R3oeBG9JZkFVkD7I30eCjzyhSOs/54XNYQ23EkqHJU0=
 =9Ryi
 -----END PGP SIGNATURE-----

Merge tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - A major update for DM cache that reduces the latency for deciding
   whether blocks should migrate to/from the cache. The bio-prison-v2
   interface supports this improvement by enabling direct dispatch of
   work to workqueues rather than having to delay the actual work
   dispatch to the DM cache core. So the dm-cache policies are much more
   nimble by being able to drive IO as they see fit. One immediate
   benefit from the improved latency is a cache that should be much more
   adaptive to changing workloads.

 - Add a new DM integrity target that emulates a block device that has
   additional per-sector tags that can be used for storing integrity
   information.

 - Add a new authenticated encryption feature to the DM crypt target
   that builds on the capabilities provided by the DM integrity target.

 - Add MD interface for switching the raid4/5/6 journal mode and update
   the DM raid target to use it to enable aid4/5/6 journal write-back
   support.

 - Switch the DM verity target over to using the asynchronous hash
   crypto API (this helps work better with architectures that have
   access to off-CPU algorithm providers, which should reduce CPU
   utilization).

 - Various request-based DM and DM multipath fixes and improvements from
   Bart and Christoph.

 - A DM thinp target fix for a bio structure leak that occurs for each
   discard IFF discard passdown is enabled.

 - A fix for a possible deadlock in DM bufio and a fix to re-check the
   new buffer allocation watermark in the face of competing admin
   changes to the 'max_cache_size_bytes' tunable.

 - A couple DM core cleanups.

* tag 'for-4.12/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (50 commits)
  dm bufio: check new buffer allocation watermark every 30 seconds
  dm bufio: avoid a possible ABBA deadlock
  dm mpath: make it easier to detect unintended I/O request flushes
  dm mpath: cleanup QUEUE_IF_NO_PATH bit manipulation by introducing assign_bit()
  dm mpath: micro-optimize the hot path relative to MPATHF_QUEUE_IF_NO_PATH
  dm: introduce enum dm_queue_mode to cleanup related code
  dm mpath: verify __pg_init_all_paths locking assumptions at runtime
  dm: verify suspend_locking assumptions at runtime
  dm block manager: remove an unused argument from dm_block_manager_create()
  dm rq: check blk_mq_register_dev() return value in dm_mq_init_request_queue()
  dm mpath: delay requeuing while path initialization is in progress
  dm mpath: avoid that path removal can trigger an infinite loop
  dm mpath: split and rename activate_path() to prepare for its expanded use
  dm ioctl: prevent stack leak in dm ioctl call
  dm integrity: use previously calculated log2 of sectors_per_block
  dm integrity: use hex2bin instead of open-coded variant
  dm crypt: replace custom implementation of hex2bin()
  dm crypt: remove obsolete references to per-CPU state
  dm verity: switch to using asynchronous hash crypto API
  dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
  ...
2017-05-03 10:31:20 -07:00
Andy Shevchenko
e944e03e33 dm crypt: replace custom implementation of hex2bin()
There is no need to have a duplication of the generic library, i.e. hex2bin().
Replace the open coded variant.

Signed-off-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-27 12:08:31 -04:00
Eric Biggers
86f917adea dm crypt: remove obsolete references to per-CPU state
dm-crypt used to use separate crypto transforms for each CPU, but this
is no longer the case.  To avoid confusion, fix up obsolete comments and
rename setup_essiv_cpu().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-25 16:12:03 -04:00
Tim Murray
a1b89132dc dm crypt: use WQ_HIGHPRI for the IO and crypt workqueues
Running dm-crypt with workqueues at the standard priority results in IO
competing for CPU time with standard user apps, which can lead to
pipeline bubbles and seriously degraded performance.  Move to using
WQ_HIGHPRI workqueues to protect against that.

Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24 15:32:07 -04:00
Ondrej Kozina
c82feeec9a dm crypt: rewrite (wipe) key in crypto layer using random data
The message "key wipe" used to wipe real key stored in crypto layer by
rewriting it with zeroes.  Since commit 28856a9 ("crypto: xts -
consolidate sanity check for keys") this no longer works in FIPS mode
for XTS.

While running in FIPS mode the crypto key part has to differ from the
tweak key.

Fixes: 28856a9 ("crypto: xts - consolidate sanity check for keys")
Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24 15:16:03 -04:00
Mikulas Patocka
583fe7474c dm crypt: fix large block integrity support
Previously, dm-crypt could use blocks composed of multiple 512b sectors
but it created integrity profile for each 512b sector (it padded it with
zeroes).  Fix dm-crypt so that the integrity profile is sent for each
block not each sector.

The user must use the same block size in the DM crypt and integrity
targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-04-24 12:04:34 -04:00
Christoph Hellwig
48920ff2a5 block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Mikulas Patocka
ff3af92b44 dm crypt: use shifts instead of sector_div
sector_div is very slow, so we introduce a variable sector_shift and
use shift instead of sector_div.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-24 15:54:24 -04:00
Milan Broz
8f0009a225 dm crypt: optionally support larger encryption sector size
Add  optional "sector_size"  parameter that specifies encryption sector
size (atomic unit of block device encryption).

Parameter can be in range 512 - 4096 bytes and must be power of two.
For compatibility reasons, the maximal IO must fit into the page limit,
so the limit is set to the minimal page size possible (4096 bytes).

NOTE: this device cannot yet be handled by cryptsetup if this parameter
is set.

IV for the sector is calculated from the 512 bytes sector offset unless
the iv_large_sectors option is used.

Test script using dmsetup:

  DEV="/dev/sdb"
  DEV_SIZE=$(blockdev --getsz $DEV)
  KEY="9c1185a5c5e9fc54612808977ee8f548b2258d31ddadef707ba62c166051b9e3cd0294c27515f2bccee924e8823ca6e124b8fc3167ed478bca702babe4e130ac"
  BLOCK_SIZE=4096

  # dmsetup create test_crypt --table "0 $DEV_SIZE crypt aes-xts-plain64 $KEY 0 $DEV 0 1 sector_size:$BLOCK_SIZE"
  # dmsetup table --showkeys test_crypt

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-24 15:54:21 -04:00
Milan Broz
33d2f09fcb dm crypt: introduce new format of cipher with "capi:" prefix
For the new authenticated encryption we have to support generic composed
modes (combination of encryption algorithm and authenticator) because
this is how the kernel crypto API accesses such algorithms.

To simplify the interface, we accept an algorithm directly in crypto API
format.  The new format is recognised by the "capi:" prefix.  The
dmcrypt internal IV specification is the same as for the old format.

The crypto API cipher specifications format is:
     capi:cipher_api_spec-ivmode[:ivopts]
Examples:
     capi:cbc(aes)-essiv:sha256 (equivalent to old aes-cbc-essiv:sha256)
     capi:xts(aes)-plain64      (equivalent to old aes-xts-plain64)
Examples of authenticated modes:
     capi:gcm(aes)-random
     capi:authenc(hmac(sha256),xts(aes))-random
     capi:rfc7539(chacha20,poly1305)-random

Authenticated modes can only be configured using the new cipher format.
Note that this format allows user to specify arbitrary combinations that
can be insecure. (Policy decision is done in cryptsetup userspace.)

Authenticated encryption algorithms can be of two types, either native
modes (like GCM) that performs both encryption and authentication
internally, or composed modes where user can compose AEAD with separate
specification of encryption algorithm and authenticator.

For composed mode with HMAC (length-preserving encryption mode like an
XTS and HMAC as an authenticator) we have to calculate HMAC digest size
(the separate authentication key is the same size as the HMAC digest).
Introduce crypt_ctr_auth_cipher() to parse the crypto API string to get
HMAC algorithm and retrieve digest size from it.

Also, for HMAC composed mode we need to parse the crypto API string to
get the cipher mode nested in the specification.  For native AEAD mode
(like GCM), we can use crypto_tfm_alg_name() API to get the cipher
specification.

Because the HMAC composed mode is not processed the same as the native
AEAD mode, the CRYPT_MODE_INTEGRITY_HMAC flag is no longer needed and
"hmac" specification for the table integrity argument is removed.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-24 15:54:20 -04:00
Milan Broz
e889f97a3e dm crypt: factor IV constructor out to separate function
No functional change.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-24 15:54:19 -04:00
Milan Broz
ef43aa3806 dm crypt: add cryptographic data integrity protection (authenticated encryption)
Allow the use of per-sector metadata, provided by the dm-integrity
module, for integrity protection and persistently stored per-sector
Initialization Vector (IV).  The underlying device must support the
"DM-DIF-EXT-TAG" dm-integrity profile.

The per-bio integrity metadata is allocated by dm-crypt for every bio.

Example of low-level mapping table for various types of use:
 DEV=/dev/sdb
 SIZE=417792

 # Additional HMAC with CBC-ESSIV, key is concatenated encryption key + HMAC key
 SIZE_INT=389952
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 32 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-cbc-essiv:sha256 \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
 0 /dev/mapper/x 0 1 integrity:32:hmac(sha256)"

 # AEAD (Authenticated Encryption with Additional Data) - GCM with random IVs
 # GCM in kernel uses 96bits IV and we store 128bits auth tag (so 28 bytes metadata space)
 SIZE_INT=393024
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 28 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-gcm-random \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 0 /dev/mapper/x 0 1 integrity:28:aead"

 # Random IV only for XTS mode (no integrity protection but provides atomic random sector change)
 SIZE_INT=401272
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 16 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 0 /dev/mapper/x 0 1 integrity:16:none"

 # Random IV with XTS + HMAC integrity protection
 SIZE_INT=377656
 dmsetup create x --table "0 $SIZE_INT integrity $DEV 0 48 J 0"
 dmsetup create y --table "0 $SIZE_INT crypt aes-xts-random \
 11ff33c6fb942655efb3e30cf4c0fd95f5ef483afca72166c530ae26151dd83b \
 00112233445566778899aabbccddeeff00112233445566778899aabbccddeeff \
 0 /dev/mapper/x 0 1 integrity:48:hmac(sha256)"

Both AEAD and HMAC protection authenticates not only data but also
sector metadata.

HMAC protection is implemented through autenc wrapper (so it is
processed the same way as an authenticated mode).

In HMAC mode there are two keys (concatenated in dm-crypt mapping
table).  First is the encryption key and the second is the key for
authentication (HMAC).  (It is userspace decision if these keys are
independent or somehow derived.)

The sector request for AEAD/HMAC authenticated encryption looks like this:
 |----- AAD -------|------ DATA -------|-- AUTH TAG --|
 | (authenticated) | (auth+encryption) |              |
 | sector_LE |  IV |  sector in/out    |  tag in/out  |

For writes, the integrity fields are calculated during AEAD encryption
of every sector and stored in bio integrity fields and sent to
underlying dm-integrity target for storage.

For reads, the integrity metadata is verified during AEAD decryption of
every sector (they are filled in by dm-integrity, but the integrity
fields are pre-allocated in dm-crypt).

There is also an experimental support in cryptsetup utility for more
friendly configuration (part of LUKS2 format).

Because the integrity fields are not valid on initial creation, the
device must be "formatted".  This can be done by direct-io writes to the
device (e.g. dd in direct-io mode).  For now, there is available trivial
tool to do this, see: https://github.com/mbroz/dm_int_tools

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
Signed-off-by: Vashek Matyas <matyas@fi.muni.cz>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-03-24 15:49:41 -04:00
David Howells
0837e49ab3 KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()
rcu_dereference_key() and user_key_payload() are currently being used in
two different, incompatible ways:

 (1) As a wrapper to rcu_dereference() - when only the RCU read lock used
     to protect the key.

 (2) As a wrapper to rcu_dereference_protected() - when the key semaphor is
     used to protect the key and the may be being modified.

Fix this by splitting both of the key wrappers to produce:

 (1) RCU accessors for keys when caller has the key semaphore locked:

	dereference_key_locked()
	user_key_payload_locked()

 (2) RCU accessors for keys when caller holds the RCU read lock:

	dereference_key_rcu()
	user_key_payload_rcu()

This should fix following warning in the NFS idmapper

  ===============================
  [ INFO: suspicious RCU usage. ]
  4.10.0 #1 Tainted: G        W
  -------------------------------
  ./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
  other info that might help us debug this:
  rcu_scheduler_active = 2, debug_locks = 0
  1 lock held by mount.nfs/5987:
    #0:  (rcu_read_lock){......}, at: [<d000000002527abc>] nfs_idmap_get_key+0x15c/0x420 [nfsv4]
  stack backtrace:
  CPU: 1 PID: 5987 Comm: mount.nfs Tainted: G        W       4.10.0 #1
  Call Trace:
    dump_stack+0xe8/0x154 (unreliable)
    lockdep_rcu_suspicious+0x140/0x190
    nfs_idmap_get_key+0x380/0x420 [nfsv4]
    nfs_map_name_to_uid+0x2a0/0x3b0 [nfsv4]
    decode_getfattr_attrs+0xfac/0x16b0 [nfsv4]
    decode_getfattr_generic.constprop.106+0xbc/0x150 [nfsv4]
    nfs4_xdr_dec_lookup_root+0xac/0xb0 [nfsv4]
    rpcauth_unwrap_resp+0xe8/0x140 [sunrpc]
    call_decode+0x29c/0x910 [sunrpc]
    __rpc_execute+0x140/0x8f0 [sunrpc]
    rpc_run_task+0x170/0x200 [sunrpc]
    nfs4_call_sync_sequence+0x68/0xa0 [nfsv4]
    _nfs4_lookup_root.isra.44+0xd0/0xf0 [nfsv4]
    nfs4_lookup_root+0xe0/0x350 [nfsv4]
    nfs4_lookup_root_sec+0x70/0xa0 [nfsv4]
    nfs4_find_root_sec+0xc4/0x100 [nfsv4]
    nfs4_proc_get_rootfh+0x5c/0xf0 [nfsv4]
    nfs4_get_rootfh+0x6c/0x190 [nfsv4]
    nfs4_server_common_setup+0xc4/0x260 [nfsv4]
    nfs4_create_server+0x278/0x3c0 [nfsv4]
    nfs4_remote_mount+0x50/0xb0 [nfsv4]
    mount_fs+0x74/0x210
    vfs_kern_mount+0x78/0x220
    nfs_do_root_mount+0xb0/0x140 [nfsv4]
    nfs4_try_mount+0x60/0x100 [nfsv4]
    nfs_fs_mount+0x5ec/0xda0 [nfs]
    mount_fs+0x74/0x210
    vfs_kern_mount+0x78/0x220
    do_mount+0x254/0xf70
    SyS_mount+0x94/0x100
    system_call+0x38/0xe0

Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
2017-03-02 10:09:00 +11:00
Linus Torvalds
42e1b14b6e Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - Implement wraparound-safe refcount_t and kref_t types based on
     generic atomic primitives (Peter Zijlstra)

   - Improve and fix the ww_mutex code (Nicolai Hähnle)

   - Add self-tests to the ww_mutex code (Chris Wilson)

   - Optimize percpu-rwsems with the 'rcuwait' mechanism (Davidlohr
     Bueso)

   - Micro-optimize the current-task logic all around the core kernel
     (Davidlohr Bueso)

   - Tidy up after recent optimizations: remove stale code and APIs,
     clean up the code (Waiman Long)

   - ... plus misc fixes, updates and cleanups"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
  fork: Fix task_struct alignment
  locking/spinlock/debug: Remove spinlock lockup detection code
  lockdep: Fix incorrect condition to print bug msgs for MAX_LOCKDEP_CHAIN_HLOCKS
  lkdtm: Convert to refcount_t testing
  kref: Implement 'struct kref' using refcount_t
  refcount_t: Introduce a special purpose refcount type
  sched/wake_q: Clarify queue reinit comment
  sched/wait, rcuwait: Fix typo in comment
  locking/mutex: Fix lockdep_assert_held() fail
  locking/rtmutex: Flip unlikely() branch to likely() in __rt_mutex_slowlock()
  locking/rwsem: Reinit wake_q after use
  locking/rwsem: Remove unnecessary atomic_long_t casts
  jump_labels: Move header guard #endif down where it belongs
  locking/atomic, kref: Implement kref_put_lock()
  locking/ww_mutex: Turn off __must_check for now
  locking/atomic, kref: Avoid more abuse
  locking/atomic, kref: Use kref_get_unless_zero() more
  locking/atomic, kref: Kill kref_sub()
  locking/atomic, kref: Add kref_read()
  locking/atomic, kref: Add KREF_INIT()
  ...
2017-02-20 13:23:30 -08:00
Ondrej Kozina
f5b0cba8f2 dm crypt: replace RCU read-side section with rwsem
The lockdep splat below hints at a bug in RCU usage in dm-crypt that
was introduced with commit c538f6ec9f ("dm crypt: add ability to use
keys from the kernel key retention service").  The kernel keyring
function user_key_payload() is in fact a wrapper for
rcu_dereference_protected() which must not be called with only
rcu_read_lock() section mark.

Unfortunately the kernel keyring subsystem doesn't currently provide
an interface that allows the use of an RCU read-side section.  So for
now we must drop RCU in favour of rwsem until a proper function is
made available in the kernel keyring subsystem.

===============================
[ INFO: suspicious RCU usage. ]
4.10.0-rc5 #2 Not tainted
-------------------------------
./include/keys/user-type.h:53 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
2 locks held by cryptsetup/6464:
 #0:  (&md->type_lock){+.+.+.}, at: [<ffffffffa02472a2>] dm_lock_md_type+0x12/0x20 [dm_mod]
 #1:  (rcu_read_lock){......}, at: [<ffffffffa02822f8>] crypt_set_key+0x1d8/0x4b0 [dm_crypt]
stack backtrace:
CPU: 1 PID: 6464 Comm: cryptsetup Not tainted 4.10.0-rc5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.1-1.fc24 04/01/2014
Call Trace:
 dump_stack+0x67/0x92
 lockdep_rcu_suspicious+0xc5/0x100
 crypt_set_key+0x351/0x4b0 [dm_crypt]
 ? crypt_set_key+0x1d8/0x4b0 [dm_crypt]
 crypt_ctr+0x341/0xa53 [dm_crypt]
 dm_table_add_target+0x147/0x330 [dm_mod]
 table_load+0x111/0x350 [dm_mod]
 ? retrieve_status+0x1c0/0x1c0 [dm_mod]
 ctl_ioctl+0x1f5/0x510 [dm_mod]
 dm_ctl_ioctl+0xe/0x20 [dm_mod]
 do_vfs_ioctl+0x8e/0x690
 ? ____fput+0x9/0x10
 ? task_work_run+0x7e/0xa0
 ? trace_hardirqs_on_caller+0x122/0x1b0
 SyS_ioctl+0x3c/0x70
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7f392c9a4ec7
RSP: 002b:00007ffef6383378 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00007ffef63830a0 RCX: 00007f392c9a4ec7
RDX: 000000000124fcc0 RSI: 00000000c138fd09 RDI: 0000000000000005
RBP: 00007ffef6383090 R08: 00000000ffffffff R09: 00000000012482b0
R10: 2a28205d34383336 R11: 0000000000000246 R12: 00007f392d803a08
R13: 00007ffef63831e0 R14: 0000000000000000 R15: 00007f392d803a0b

Fixes: c538f6ec9f ("dm crypt: add ability to use keys from the kernel key retention service")
Reported-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2017-02-03 10:26:14 -05:00
Davidlohr Bueso
642fa448ae sched/core: Remove set_task_state()
This is a nasty interface and setting the state of a foreign task must
not be done. As of the following commit:

  be628be095 ("bcache: Make gc wakeup sane, remove set_task_state()")

... everyone in the kernel calls set_task_state() with current, allowing
the helper to be removed.

However, as the comment indicates, it is still around for those archs
where computing current is more expensive than using a pointer, at least
in theory. An important arch that is affected is arm64, however this has
been addressed now [1] and performance is up to par making no difference
with either calls.

Of all the callers, if any, it's the locking bits that would care most
about this -- ie: we end up passing a tsk pointer to a lot of the lock
slowpath, and setting ->state on that. The following numbers are based
on two tests: a custom ad-hoc microbenchmark that just measures
latencies (for ~65 million calls) between get_task_state() vs
get_current_state().

Secondly for a higher overview, an unlink microbenchmark was used,
which pounds on a single file with open, close,unlink combos with
increasing thread counts (up to 4x ncpus). While the workload is quite
unrealistic, it does contend a lot on the inode mutex or now rwsem.

[1] https://lkml.kernel.org/r/1483468021-8237-1-git-send-email-mark.rutland@arm.com

== 1. x86-64 ==

Avg runtime set_task_state():    601 msecs
Avg runtime set_current_state(): 552 msecs

                                            vanilla                 dirty
Hmean    unlink1-processes-2      36089.26 (  0.00%)    38977.33 (  8.00%)
Hmean    unlink1-processes-5      28555.01 (  0.00%)    29832.55 (  4.28%)
Hmean    unlink1-processes-8      37323.75 (  0.00%)    44974.57 ( 20.50%)
Hmean    unlink1-processes-12     43571.88 (  0.00%)    44283.01 (  1.63%)
Hmean    unlink1-processes-21     34431.52 (  0.00%)    38284.45 ( 11.19%)
Hmean    unlink1-processes-30     34813.26 (  0.00%)    37975.17 (  9.08%)
Hmean    unlink1-processes-48     37048.90 (  0.00%)    39862.78 (  7.59%)
Hmean    unlink1-processes-79     35630.01 (  0.00%)    36855.30 (  3.44%)
Hmean    unlink1-processes-110    36115.85 (  0.00%)    39843.91 ( 10.32%)
Hmean    unlink1-processes-141    32546.96 (  0.00%)    35418.52 (  8.82%)
Hmean    unlink1-processes-172    34674.79 (  0.00%)    36899.21 (  6.42%)
Hmean    unlink1-processes-203    37303.11 (  0.00%)    36393.04 ( -2.44%)
Hmean    unlink1-processes-224    35712.13 (  0.00%)    36685.96 (  2.73%)

== 2. ppc64le ==

Avg runtime set_task_state():  938 msecs
Avg runtime set_current_state: 940 msecs

                                            vanilla                 dirty
Hmean    unlink1-processes-2      19269.19 (  0.00%)    30704.50 ( 59.35%)
Hmean    unlink1-processes-5      20106.15 (  0.00%)    21804.15 (  8.45%)
Hmean    unlink1-processes-8      17496.97 (  0.00%)    17243.28 ( -1.45%)
Hmean    unlink1-processes-12     14224.15 (  0.00%)    17240.21 ( 21.20%)
Hmean    unlink1-processes-21     14155.66 (  0.00%)    15681.23 ( 10.78%)
Hmean    unlink1-processes-30     14450.70 (  0.00%)    15995.83 ( 10.69%)
Hmean    unlink1-processes-48     16945.57 (  0.00%)    16370.42 ( -3.39%)
Hmean    unlink1-processes-79     15788.39 (  0.00%)    14639.27 ( -7.28%)
Hmean    unlink1-processes-110    14268.48 (  0.00%)    14377.40 (  0.76%)
Hmean    unlink1-processes-141    14023.65 (  0.00%)    16271.69 ( 16.03%)
Hmean    unlink1-processes-172    13417.62 (  0.00%)    16067.55 ( 19.75%)
Hmean    unlink1-processes-203    15293.08 (  0.00%)    15440.40 (  0.96%)
Hmean    unlink1-processes-234    13719.32 (  0.00%)    16190.74 ( 18.01%)
Hmean    unlink1-processes-265    16400.97 (  0.00%)    16115.22 ( -1.74%)
Hmean    unlink1-processes-296    14388.60 (  0.00%)    16216.13 ( 12.70%)
Hmean    unlink1-processes-320    15771.85 (  0.00%)    15905.96 (  0.85%)

x86-64 (known to be fast for get_current()/this_cpu_read_stable() caching)
and ppc64 (with paca) show similar improvements in the unlink microbenches.
The small delta for ppc64 (2ms), does not represent the gains on the unlink
runs. In the case of x86, there was a decent amount of variation in the
latency runs, but always within a 20 to 50ms increase), ppc was more constant.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave@stgolabs.net
Cc: mark.rutland@arm.com
Link: http://lkml.kernel.org/r/1483479794-14013-5-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-01-14 11:14:16 +01:00
Linus Torvalds
775a2e29c3 . various fixes and improvements to request-based DM and DM multipath
. some locking improvements in DM bufio
 
 . add Kconfig option to disable the DM block manager's extra locking
   which mainly serves as a developer tool
 
 . a few bug fixes to DM's persistent-data
 
 . a couple changes to prepare for multipage biovec support in the block
   layer
 
 . various improvements and cleanups in the DM core, DM cache, DM raid
   and DM crypt
 
 . add ability to have DM crypt use keys from the kernel key retention
   service
 
 . add a new "error_writes" feature to the DM flakey target, reads are
   left unchanged in this mode
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJYUW8zAAoJEMUj8QotnQNaAWEIAMRQ4aCXq5T7F9Hf4K/l6FwO
 FoBr2TPS3Lf0vm/A5Tr819I47hk7q0oroa61ARbpS90iuGt/Au/Sk35cn1BwT0YW
 llMvMGbh+w9ZBUJGkyexdXbyfm5ywPHuthMr4CK/UNASyjDl2QMAeBuUZ6FLSPn1
 RUL/RYv0mG/7EXOPz0PURPb5rpjO15cAU0NjfNS0862UVR8x8dNS6iImOmScsioe
 Flw90qPl3kMBxBHik8xSPJfhtW+lD7xSaOlWzHKtalnUZHRG2BNUtlAMKdiaynx2
 yl9MhSsi8wlgd4h9WmlmaOr0VqkU5UYY9D9TDuuJwXnHUXGenVSJ/aGOohr+bm4=
 =kOoK
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - various fixes and improvements to request-based DM and DM multipath

 - some locking improvements in DM bufio

 - add Kconfig option to disable the DM block manager's extra locking
   which mainly serves as a developer tool

 - a few bug fixes to DM's persistent-data

 - a couple changes to prepare for multipage biovec support in the block
   layer

 - various improvements and cleanups in the DM core, DM cache, DM raid
   and DM crypt

 - add ability to have DM crypt use keys from the kernel key retention
   service

 - add a new "error_writes" feature to the DM flakey target, reads are
   left unchanged in this mode

* tag 'dm-4.10-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (40 commits)
  dm flakey: introduce "error_writes" feature
  dm cache policy smq: use hash_32() instead of hash_32_generic()
  dm crypt: reject key strings containing whitespace chars
  dm space map: always set ev if sm_ll_mutate() succeeds
  dm space map metadata: skip useless memcpy in metadata_ll_init_index()
  dm space map metadata: fix 'struct sm_metadata' leak on failed create
  Documentation: dm raid: define data_offset status field
  dm raid: fix discard support regression
  dm raid: don't allow "write behind" with raid4/5/6
  dm mpath: use hw_handler_params if attached hw_handler is same as requested
  dm crypt: add ability to use keys from the kernel key retention service
  dm array: remove a dead assignment in populate_ablock_with_values()
  dm ioctl: use offsetof() instead of open-coding it
  dm rq: simplify use_blk_mq initialization
  dm: use blk_set_queue_dying() in __dm_destroy()
  dm bufio: drop the lock when doing GFP_NOIO allocation
  dm bufio: don't take the lock in dm_bufio_shrink_count
  dm bufio: avoid sleeping while holding the dm_bufio lock
  dm table: simplify dm_table_determine_type()
  dm table: an 'all_blk_mq' table must be loaded for a blk-mq DM device
  ...
2016-12-14 11:01:00 -08:00
Ondrej Kozina
027c431ccf dm crypt: reject key strings containing whitespace chars
Unfortunately key_string may theoretically contain whitespace even after
it's processed by dm_split_args().  The reason for this is DM core
supports escaping of almost all chars including any whitespace.

If userspace passes a key to the kernel in format ":32:logon:my_prefix:my\ key"
dm-crypt will look up key "my_prefix:my key" in kernel keyring service.
So far everything's fine.

Unfortunately if userspace later calls DM_TABLE_STATUS ioctl, it will not
receive back expected ":32:logon:my_prefix:my\ key" but the unescaped version
instead.  Also userpace (most notably cryptsetup) is not ready to parse
single target argument containing (even escaped) whitespace chars and any
whitespace is simply taken as delimiter of another argument.

This effect is mitigated by the fact libdevmapper curently performs
double escaping of '\' char.  Any user input in format "x\ x" is
transformed into "x\\ x" before being passed to the kernel.  Nonetheless
dm-crypt may be used without libdevmapper.  Therefore the near-term
solution to this is to reject any key string containing whitespace.

Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08 14:13:16 -05:00
Ondrej Kozina
c538f6ec9f dm crypt: add ability to use keys from the kernel key retention service
The kernel key service is a generic way to store keys for the use of
other subsystems. Currently there is no way to use kernel keys in dm-crypt.
This patch aims to fix that. Instead of key userspace may pass a key
description with preceding ':'. So message that constructs encryption
mapping now looks like this:

  <cipher> [<key>|:<key_string>] <iv_offset> <dev_path> <start> [<#opt_params> <opt_params>]

where <key_string> is in format: <key_size>:<key_type>:<key_description>

Currently we only support two elementary key types: 'user' and 'logon'.
Keys may be loaded in dm-crypt either via <key_string> or using
classical method and pass the key in hex representation directly.

dm-crypt device initialised with a key passed in hex representation may be
replaced with key passed in key_string format and vice versa.

(Based on original work by Andrey Ryabinin)

Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-12-08 14:13:09 -05:00
Julia Lawall
1b1b58f54f dm crypt: constify crypt_iv_operations structures
The crypt_iv_operations are never modified, so declare them
as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-21 09:52:06 -05:00
Mikulas Patocka
671ea6b457 dm crypt: rename crypt_setkey_allcpus to crypt_setkey
In the past, dm-crypt used per-cpu crypto context. This has been removed
in the kernel 3.15 and the crypto context is shared between all cpus. This
patch renames the function crypt_setkey_allcpus to crypt_setkey, because
there is really no activity that is done for all cpus.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-21 09:52:00 -05:00
Ondrej Kozina
265e9098ba dm crypt: mark key as invalid until properly loaded
In crypt_set_key(), if a failure occurs while replacing the old key
(e.g. tfm->setkey() fails) the key must not have DM_CRYPT_KEY_VALID flag
set.  Otherwise, the crypto layer would have an invalid key that still
has DM_CRYPT_KEY_VALID flag set.

Cc: stable@vger.kernel.org
Signed-off-by: Ondrej Kozina <okozina@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-21 09:51:59 -05:00
Ming Lei
0dae7fe597 dm crypt: use bio_add_page()
Use bio_add_page(), the standard interface for adding a page to a bio,
rather than open-coding the same.

It should be noted that the 'clone' bio that is allocated using
bio_alloc_bioset(), in crypt_alloc_buffer(), does _not_ set the
bio's BIO_CLONED flag.  As such, bio_add_page()'s early return for true
bio clones (those with BIO_CLONED set) isn't applicable.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-11-21 09:51:58 -05:00
Christoph Hellwig
ef295ecf09 block: better op and flags encoding
Now that we don't need the common flags to overflow outside the range
of a 32-bit type we can encode them the same way for both the bio and
request fields.  This in addition allows us to place the operation
first (and make some room for more ops while we're at it) and to
stop having to shift around the operation values.

In addition this allows passing around only one value in the block layer
instead of two (and eventuall also in the file systems, but we can do
that later) and thus clean up a lot of code.

Last but not least this allows decreasing the size of the cmd_flags
field in struct request to 32-bits.  Various functions passing this
value could also be updated, but I'd like to avoid the churn for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-10-28 08:48:16 -06:00
Linus Torvalds
48915c2cbc . various fixes and cleanups for request-based DM core
. add support for delaying the requeue of requests; used by DM multipath
   when all paths have failed and 'queue_if_no_path' is enabled
 
 . DM cache improvements to speedup the loading metadata and the writing
   of the hint array
 
 . fix potential for a dm-crypt crash on device teardown
 
 . remove dm_bufio_cond_resched() and just using cond_resched()
 
 . change DM multipath to return a reservation conflict error
   immediately; rather than failing the path and retrying (potentially
   indefinitely)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJX7n9KAAoJEMUj8QotnQNab74IANm+rW2uYdpLNCxWUmcaih0d
 BK8dLS/Mz35S0TRSekvynuBcPx18VP2Zueulc+aHTWcT4sj79l6KnVYT9g6c98rL
 zzcv10QTteqhiiWwFmPHsZgv5dW8Y5wiRdt+SqcQ5sAHMFci6C05gzp9caNu7VTs
 fbcLUdyYm40y3j84Lx/+ABXgnBhq+40OTtdnYSkEmLtdscPLzwpHgPmMctkrEl7e
 7mqGC1KbDDzartqOZOeGP2P2qOCNN21qA+8ctMw9Xyze33uwvj7Vx6cro6e28wMm
 ZClY9XNGlfuW9dCNtFR9o6NXS6NIK30UJbKqyZPPsK+70JrOgzh6GzQnwSXdyNs=
 =7SkG
 -----END PGP SIGNATURE-----

Merge tag 'dm-4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - various fixes and cleanups for request-based DM core

 - add support for delaying the requeue of requests; used by DM
   multipath when all paths have failed and 'queue_if_no_path' is
   enabled

 - DM cache improvements to speedup the loading metadata and the writing
   of the hint array

 - fix potential for a dm-crypt crash on device teardown

 - remove dm_bufio_cond_resched() and just using cond_resched()

 - change DM multipath to return a reservation conflict error
   immediately; rather than failing the path and retrying (potentially
   indefinitely)

* tag 'dm-4.9-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (24 commits)
  dm mpath: always return reservation conflict without failing over
  dm bufio: remove dm_bufio_cond_resched()
  dm crypt: fix crash on exit
  dm cache metadata: switch to using the new cursor api for loading metadata
  dm array: introduce cursor api
  dm btree: introduce cursor api
  dm cache policy smq: distribute entries to random levels when switching to smq
  dm cache: speed up writing of the hint array
  dm array: add dm_array_new()
  dm mpath: delay the requeue of blk-mq requests while all paths down
  dm mpath: use dm_mq_kick_requeue_list()
  dm rq: introduce dm_mq_kick_requeue_list()
  dm rq: reduce arguments passed to map_request() and dm_requeue_original_request()
  dm rq: add DM_MAPIO_DELAY_REQUEUE to delay requeue of blk-mq requests
  dm: convert wait loops to use autoremove_wake_function()
  dm: use signal_pending_state() in dm_wait_for_completion()
  dm: rename task state function arguments
  dm: add two lockdep_assert_held() statements
  dm rq: simplify dm_old_stop_queue()
  dm mpath: check if path's request_queue is dying in activate_path()
  ...
2016-10-09 17:16:18 -07:00
Linus Torvalds
513a4befae Merge branch 'for-4.9/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the main pull request for block layer changes in 4.9.

  As mentioned at the last merge window, I've changed things up and now
  do just one branch for core block layer changes, and driver changes.
  This avoids dependencies between the two branches. Outside of this
  main pull request, there are two topical branches coming as well.

  This pull request contains:

   - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

   - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
     Followup dependency fix from Geert.

   - General fixes from Bart, Baoyou, Guoqing, and Linus W.

   - CFQ async write starvation fix from Glauber.

   - Add supprot for delayed kick of the requeue list, from Mike.

   - Pull out the scalable bitmap code from blk-mq-tag.c and make it
     generally available under the name of sbitmap. Only blk-mq-tag uses
     it for now, but the blk-mq scheduling bits will use it as well.
     From Omar.

   - bdev thaw error progagation from Pierre.

   - Improve the blk polling statistics, and allow the user to clear
     them. From Stephen.

   - Set of minor cleanups from Christoph in block/blk-mq.

   - Set of cleanups and optimizations from me for block/blk-mq.

   - Various nvme/nvmet/nvmeof fixes from the various folks"

* 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
  fs/block_dev.c: return the right error in thaw_bdev()
  nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
  nvme/scsi: Remove power management support
  nvmet: Make dsm number of ranges zero based
  nvmet: Use direct IO for writes
  admin-cmd: Added smart-log command support.
  nvme-fabrics: Add host_traddr options field to host infrastructure
  nvme-fabrics: revise host transport option descriptions
  nvme-fabrics: rework nvmf_get_address() for variable options
  nbd: use BLK_MQ_F_BLOCKING
  blkcg: Annotate blkg_hint correctly
  cfq: fix starvation of asynchronous writes
  blk-mq: add flag for drivers wanting blocking ->queue_rq()
  blk-mq: remove non-blocking pass in blk_mq_map_request
  blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
  block: export bio_free_pages to other modules
  lightnvm: propagate device_add() error code
  lightnvm: expose device geometry through sysfs
  lightnvm: control life of nvm_dev in driver
  blk-mq: register device instead of disk
  ...
2016-10-07 14:42:05 -07:00
Rabin Vincent
f659b10087 dm crypt: fix crash on exit
As the documentation for kthread_stop() says, "if threadfn() may call
do_exit() itself, the caller must ensure task_struct can't go away".
dm-crypt does not ensure this and therefore crashes when crypt_dtr()
calls kthread_stop().  The crash is trivially reproducible by adding a
delay before the call to kthread_stop() and just opening and closing a
dm-crypt device.

 general protection fault: 0000 [#1] PREEMPT SMP
 CPU: 0 PID: 533 Comm: cryptsetup Not tainted 4.8.0-rc7+ #7
 task: ffff88003bd0df40 task.stack: ffff8800375b4000
 RIP: 0010: kthread_stop+0x52/0x300
 Call Trace:
  crypt_dtr+0x77/0x120
  dm_table_destroy+0x6f/0x120
  __dm_destroy+0x130/0x250
  dm_destroy+0x13/0x20
  dev_remove+0xe6/0x120
  ? dev_suspend+0x250/0x250
  ctl_ioctl+0x1fc/0x530
  ? __lock_acquire+0x24f/0x1b10
  dm_ctl_ioctl+0x13/0x20
  do_vfs_ioctl+0x91/0x6a0
  ? ____fput+0xe/0x10
  ? entry_SYSCALL_64_fastpath+0x5/0xbd
  ? trace_hardirqs_on_caller+0x151/0x1e0
  SyS_ioctl+0x41/0x70
  entry_SYSCALL_64_fastpath+0x1f/0xbd

This problem was introduced by bcbd94ff48 ("dm crypt: fix a possible
hang due to race condition on exit").

Looking at the description of that patch (excerpted below), it seems
like the problem it addresses can be solved by just using
set_current_state instead of __set_current_state, since we obviously
need the memory barrier.

| dm crypt: fix a possible hang due to race condition on exit
|
| A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
| __add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
| It is possible that the processor reorders memory accesses so that
| kthread_should_stop() is executed before __set_current_state().  If
| such reordering happens, there is a possible race on thread
| termination: [...]

So this patch just reverts the aforementioned patch and changes the
__set_current_state(TASK_INTERRUPTIBLE) to set_current_state(...).  This
fixes the crash and should also fix the potential hang.

Fixes: bcbd94ff48 ("dm crypt: fix a possible hang due to race condition on exit")
Cc: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Rabin Vincent <rabinv@axis.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-09-22 11:15:06 -04:00
Bart Van Assche
4382e33ad3 block, dm-crypt, btrfs: Introduce bio_flags()
Introduce the bio_flags() macro. Ensure that the second argument of
bio_set_op_attrs() only contains flags and no operation. This patch
does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Chris Mason <clm@fb.com> (maintainer:BTRFS FILE SYSTEM)
Cc: Josef Bacik <jbacik@fb.com> (maintainer:BTRFS FILE SYSTEM)
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@hgst.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-09-14 08:48:27 -06:00
Eric Biggers
5d0be84ec0 dm crypt: fix free of bad values after tfm allocation failure
If crypt_alloc_tfms() had to allocate multiple tfms and it failed before
the last allocation, then it would call crypt_free_tfms() and could free
pointers from uninitialized memory -- due to the crypt_free_tfms() check
for non-zero cc->tfms[i].  Fix by allocating zeroed memory.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2016-08-30 19:45:19 -04:00
Mikulas Patocka
4e870e948f dm crypt: fix error with too large bios
When dm-crypt processes writes, it allocates a new bio in
crypt_alloc_buffer().  The bio is allocated from a bio set and it can
have at most BIO_MAX_PAGES vector entries, however the incoming bio can be
larger (e.g. if it was allocated by bcache).  If the incoming bio is
larger, bio_alloc_bioset() fails and an error is returned.

To avoid the error, we test for a too large bio in the function
crypt_map() and use dm_accept_partial_bio() to split the bio.
dm_accept_partial_bio() trims the current bio to the desired size and
asks DM core to send another bio with the rest of the data.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.16+
2016-08-30 19:44:11 -04:00
Mikulas Patocka
0a83df6c8c dm crypt: increase mempool reserve to better support swapping
Increase mempool size from 16 to 64 entries.  This increase improves
swap on dm-crypt performance.

When swapping to dm-crypt, all available memory is temporarily exhausted
and dm-crypt can only use the mempool reserve.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-08-15 09:23:14 -04:00
Jens Axboe
1eff9d322a block: rename bio bi_rw to bi_opf
Since commit 63a4cc2486, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokeness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.

Signed-off-by: Jens Axboe <axboe@fb.com>
2016-08-07 14:41:02 -06:00
Bart Van Assche
350b539328 dm crypt: Fix sparse complaints
Avoid that sparse complains about assigning a __le64 value to a u64
variable.  Remove the (u64) casts since these are superfluous.  This
patch does not change the behavior of the source code.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-07-01 10:53:21 -04:00
Mike Christie
28a8f0d317 block, drivers, fs: rename REQ_FLUSH to REQ_PREFLUSH
To avoid confusion between REQ_OP_FLUSH, which is handled by
request_fn drivers, and upper layers requesting the block layer
perform a flush sequence along with possibly a WRITE, this patch
renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Mike Christie
e6047149db dm: use bio op accessors
Separate the op from the rq_flag_bits and have dm
set/get the bio using bio_set_op_attrs/bio_op.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2016-06-07 13:41:38 -06:00
Linus Torvalds
70477371dc Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "Here is the crypto update for 4.6:

  API:
   - Convert remaining crypto_hash users to shash or ahash, also convert
     blkcipher/ablkcipher users to skcipher.
   - Remove crypto_hash interface.
   - Remove crypto_pcomp interface.
   - Add crypto engine for async cipher drivers.
   - Add akcipher documentation.
   - Add skcipher documentation.

  Algorithms:
   - Rename crypto/crc32 to avoid name clash with lib/crc32.
   - Fix bug in keywrap where we zero the wrong pointer.

  Drivers:
   - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
   - Add PIC32 hwrng driver.
   - Support BCM6368 in bcm63xx hwrng driver.
   - Pack structs for 32-bit compat users in qat.
   - Use crypto engine in omap-aes.
   - Add support for sama5d2x SoCs in atmel-sha.
   - Make atmel-sha available again.
   - Make sahara hashing available again.
   - Make ccp hashing available again.
   - Make sha1-mb available again.
   - Add support for multiple devices in ccp.
   - Improve DMA performance in caam.
   - Add hashing support to rockchip"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
  crypto: qat - remove redundant arbiter configuration
  crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
  crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
  crypto: qat - Change the definition of icp_qat_uof_regtype
  hwrng: exynos - use __maybe_unused to hide pm functions
  crypto: ccp - Add abstraction for device-specific calls
  crypto: ccp - CCP versioning support
  crypto: ccp - Support for multiple CCPs
  crypto: ccp - Remove check for x86 family and model
  crypto: ccp - memset request context to zero during import
  lib/mpi: use "static inline" instead of "extern inline"
  lib/mpi: avoid assembler warning
  hwrng: bcm63xx - fix non device tree compatibility
  crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
  crypto: qat - The AE id should be less than the maximal AE number
  lib/mpi: Endianness fix
  crypto: rockchip - add hash support for crypto engine in rk3288
  crypto: xts - fix compile errors
  crypto: doc - add skcipher API documentation
  crypto: doc - update AEAD AD handling
  ...
2016-03-17 11:22:54 -07:00
Mike Snitzer
30187e1d48 dm: rename target's per_bio_data_size to per_io_data_size
Request-based DM will also make use of per_bio_data_size.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2016-02-22 22:34:37 -05:00
Herbert Xu
bbdb23b5d6 dm crypt: Use skcipher and ahash
This patch replaces uses of ablkcipher with skcipher, and the long
obsolete hash interface with ahash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2016-01-27 20:35:48 +08:00
Mikulas Patocka
bcbd94ff48 dm crypt: fix a possible hang due to race condition on exit
A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
__add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
It is possible that the processor reorders memory accesses so that
kthread_should_stop() is executed before __set_current_state().  If such
reordering happens, there is a possible race on thread termination:

CPU 0:
calls kthread_should_stop()
	it tests KTHREAD_SHOULD_STOP bit, returns false
CPU 1:
calls kthread_stop(cc->write_thread)
	sets the KTHREAD_SHOULD_STOP bit
	calls wake_up_process on the kernel thread, that sets the thread
	state to TASK_RUNNING
CPU 0:
sets __set_current_state(TASK_INTERRUPTIBLE)
spin_unlock_irq(&cc->write_thread_wait.lock)
schedule() - and the process is stuck and never terminates, because the
	state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already
	terminated

Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to
signal that the kernel thread should exit.  The flag is set and tested
while holding cc->write_thread_wait.lock, so there is no possibility of
racy access to the flag.

Also, remove the unnecessary set_task_state(current, TASK_RUNNING)
following the schedule() call.  When the process was woken up, its state
was already set to TASK_RUNNING.  Other kernel code also doesn't set the
state to TASK_RUNNING following schedule() (for example,
do_wait_for_common in completion.c doesn't do it).

Fixes: dc2676210c ("dm crypt: offload writes to thread")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-11-19 13:38:30 -05:00
Mel Gorman
d0164adc89 mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
__GFP_WAIT has been used to identify atomic context in callers that hold
spinlocks or are in interrupts.  They are expected to be high priority and
have access one of two watermarks lower than "min" which can be referred
to as the "atomic reserve".  __GFP_HIGH users get access to the first
lower watermark and can be called the "high priority reserve".

Over time, callers had a requirement to not block when fallback options
were available.  Some have abused __GFP_WAIT leading to a situation where
an optimisitic allocation with a fallback option can access atomic
reserves.

This patch uses __GFP_ATOMIC to identify callers that are truely atomic,
cannot sleep and have no alternative.  High priority users continue to use
__GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM to identify
callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
redefined as a caller that is willing to enter direct reclaim and wake
kswapd for background reclaim.

This patch then converts a number of sites

o __GFP_ATOMIC is used by callers that are high priority and have memory
  pools for those requests. GFP_ATOMIC uses this flag.

o Callers that have a limited mempool to guarantee forward progress clear
  __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
  into this category where kswapd will still be woken but atomic reserves
  are not used as there is a one-entry mempool to guarantee progress.

o Callers that are checking if they are non-blocking should use the
  helper gfpflags_allow_blocking() where possible. This is because
  checking for __GFP_WAIT as was done historically now can trigger false
  positives. Some exceptions like dm-crypt.c exist where the code intent
  is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
  flag manipulations.

o Callers that built their own GFP flags instead of starting with GFP_KERNEL
  and friends now also need to specify __GFP_KSWAPD_RECLAIM.

The first key hazard to watch out for is callers that removed __GFP_WAIT
and was depending on access to atomic reserves for inconspicuous reasons.
In some cases it may be appropriate for them to use __GFP_HIGH.

The second key hazard is callers that assembled their own combination of
GFP flags instead of starting with something like GFP_KERNEL.  They may
now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
if it's missed in most cases as other activity will wake kswapd.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Julia Lawall
6f65985e26 dm: drop NULL test before kmem_cache_destroy() and mempool_destroy()
Remove DM's unneeded NULL tests before calling these destroy functions,
now that they check for NULL, thanks to these v4.3 commits:
3942d2991 ("mm/slab_common: allow NULL cache pointer in kmem_cache_destroy()")
4e3ca3e03 ("mm/mempool: allow NULL `pool' pointer in mempool_destroy()")

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-10-31 19:06:00 -04:00
Mike Snitzer
586b286b11 dm crypt: constrain crypt device's max_segment_size to PAGE_SIZE
Setting the dm-crypt device's max_segment_size to PAGE_SIZE is an
unfortunate constraint that is required to avoid the potential for
exceeding dm-crypt's underlying device's max_segments limits -- due to
crypt_alloc_buffer() possibly allocating pages for the encryption bio
that are not as physically contiguous as the original bio.

It is interesting to note that this problem was already fixed back in
2007 via commit 91e106259 ("dm crypt: use bio_add_page").  But Linux 4.0
commit cf2f1abfb ("dm crypt: don't allocate pages for a partial
request") regressed dm-crypt back to _not_ using bio_add_page().  But
given dm-crypt's cpu parallelization changes all depend on commit
cf2f1abfb's abandoning of the more complex io fragments processing that
dm-crypt previously had we cannot easily go back to using
bio_add_page().

So all said the cleanest way to resolve this issue is to fix dm-crypt to
properly constrain the original bios entering dm-crypt so the encryption
bios that dm-crypt generates from the original bios are always
compatible with the underlying device's max_segments queue limits.

It should be noted that technically Linux 4.3 does _not_ need this fix
because of the block core's new late bio-splitting capability.  But, it
is reasoned, there is little to be gained by having the block core split
the encrypted bio that is composed of PAGE_SIZE segments.  That said, in
the future we may revert this change.

Fixes: cf2f1abfb ("dm crypt: don't allocate pages for a partial request")
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=104421
Suggested-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.0+
2015-09-14 12:04:24 -04:00
Linus Torvalds
1e1a4e8f43 Merge tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper update from Mike Snitzer:

 - a couple small cleanups in dm-cache, dm-verity, persistent-data's
   dm-btree, and DM core.

 - a 4.1-stable fix for dm-cache that fixes the leaking of deferred bio
   prison cells

 - a 4.2-stable fix that adds feature reporting for the dm-stats
   features added in 4.2

 - improve DM-snapshot to not invalidate the on-disk snapshot if
   snapshot device write overflow occurs; but a write overflow triggered
   through the origin device will still invalidate the snapshot.

 - optimize DM-thinp's async discard submission a bit now that late bio
   splitting has been included in block core.

 - switch DM-cache's SMQ policy lock from using a mutex to a spinlock;
   improves performance on very low latency devices (eg. NVMe SSD).

 - document DM RAID 4/5/6's discard support

[ I did not pull the slab changes, which weren't appropriate for this
  tree, and weren't obviously the right thing to do anyway.  At the very
  least they need some discussion and explanation before getting merged.

  Because not pulling the actual tagged commit but doing a partial pull
  instead, this merge commit thus also obviously is missing the git
  signature from the original tag ]

* tag 'dm-4.3-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm cache: fix use after freeing migrations
  dm cache: small cleanups related to deferred prison cell cleanup
  dm cache: fix leaking of deferred bio prison cells
  dm raid: document RAID 4/5/6 discard support
  dm stats: report precise_timestamps and histogram in @stats_list output
  dm thin: optimize async discard submission
  dm snapshot: don't invalidate on-disk image on snapshot write overflow
  dm: remove unlikely() before IS_ERR()
  dm: do not override error code returned from dm_get_device()
  dm: test return value for DM_MAPIO_SUBMITTED
  dm verity: remove unused mempool
  dm cache: move wake_waker() from free_migrations() to where it is needed
  dm btree remove: remove unused function get_nr_entries()
  dm btree: remove unused "dm_block_t root" parameter in btree_split_sibling()
  dm cache policy smq: change the mutex to a spinlock
2015-09-02 16:35:26 -07:00
Kent Overstreet
8ae126660f block: kill merge_bvec_fn() completely
As generic_make_request() is now able to handle arbitrarily sized bios,
it's no longer necessary for each individual block driver to define its
own ->merge_bvec_fn() callback. Remove every invocation completely.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: drbd-user@lists.linbit.com
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@kernel.org>
Cc: ceph-devel@vger.kernel.org
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Neil Brown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Acked-by: NeilBrown <neilb@suse.de> (for the 'md' bits)
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[dpark: also remove ->merge_bvec_fn() in dm-thin as well as
 dm-era-target, and resolve merge conflicts]
Signed-off-by: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-13 12:31:57 -06:00
Vivek Goyal
e80d1c805a dm: do not override error code returned from dm_get_device()
Some of the device mapper targets override the error code returned by
dm_get_device() and return either -EINVAL or -ENXIO.  There is nothing
gained by this override.  It is better to propagate the returned error
code unchanged to caller.

This work was motivated by hitting an issue where the underlying device
was busy but -EINVAL was being returned.  After this change we get
-EBUSY instead and it is easier to figure out the problem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-08-12 11:32:21 -04:00
Sasha Levin
9b81c84235 block: don't access bio->bi_error after bio_put()
Commit 4246a0b6 ("block: add a bi_error field to struct bio") has added a few
dereferences of 'bio' after a call to bio_put(). This causes use-after-frees
such as:

[521120.719695] BUG: KASan: use after free in dio_bio_complete+0x2b3/0x320 at addr ffff880f36b38714
[521120.720638] Read of size 4 by task mount.ocfs2/9644
[521120.721212] =============================================================================
[521120.722056] BUG kmalloc-256 (Not tainted): kasan: bad access detected
[521120.722968] -----------------------------------------------------------------------------
[521120.722968]
[521120.723915] Disabling lock debugging due to kernel taint
[521120.724539] INFO: Slab 0xffffea003cdace00 objects=32 used=25 fp=0xffff880f36b38600 flags=0x46fffff80004080
[521120.726037] INFO: Object 0xffff880f36b38700 @offset=1792 fp=0xffff880f36b38800
[521120.726037]
[521120.726974] Bytes b4 ffff880f36b386f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.727898] Object ffff880f36b38700: 00 88 b3 36 0f 88 ff ff 00 00 d8 de 0b 88 ff ff  ...6............
[521120.728822] Object ffff880f36b38710: 02 00 00 f0 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.729705] Object ffff880f36b38720: 01 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00  ................
[521120.730623] Object ffff880f36b38730: 00 00 00 00 00 00 00 00 01 00 00 00 00 02 00 00  ................
[521120.731621] Object ffff880f36b38740: 00 02 00 00 01 00 00 00 d0 f7 87 ad ff ff ff ff  ................
[521120.732776] Object ffff880f36b38750: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.733640] Object ffff880f36b38760: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.734508] Object ffff880f36b38770: 01 00 03 00 01 00 00 00 88 87 b3 36 0f 88 ff ff  ...........6....
[521120.735385] Object ffff880f36b38780: 00 73 22 ad 02 88 ff ff 40 13 e0 3c 00 ea ff ff  .s".....@..<....
[521120.736667] Object ffff880f36b38790: 00 02 00 00 00 04 00 00 00 00 00 00 00 00 00 00  ................
[521120.737596] Object ffff880f36b387a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.738524] Object ffff880f36b387b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.739388] Object ffff880f36b387c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.740277] Object ffff880f36b387d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.741187] Object ffff880f36b387e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.742233] Object ffff880f36b387f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[521120.743229] CPU: 41 PID: 9644 Comm: mount.ocfs2 Tainted: G    B           4.2.0-rc6-next-20150810-sasha-00039-gf909086 #2420
[521120.744274]  ffff880f36b38000 ffff880d89c8f638 ffffffffb6e9ba8a ffff880101c0e5c0
[521120.745025]  ffff880d89c8f668 ffffffffad76a313 ffff880101c0e5c0 ffffea003cdace00
[521120.745908]  ffff880f36b38700 ffff880f36b38798 ffff880d89c8f690 ffffffffad772854
[521120.747063] Call Trace:
[521120.747520] dump_stack (lib/dump_stack.c:52)
[521120.748053] print_trailer (mm/slub.c:653)
[521120.748582] object_err (mm/slub.c:660)
[521120.749079] kasan_report_error (include/linux/kasan.h:20 mm/kasan/report.c:152 mm/kasan/report.c:194)
[521120.750834] __asan_report_load4_noabort (mm/kasan/report.c:250)
[521120.753580] dio_bio_complete (fs/direct-io.c:478)
[521120.755752] do_blockdev_direct_IO (fs/direct-io.c:494 fs/direct-io.c:1291)
[521120.759765] __blockdev_direct_IO (fs/direct-io.c:1322)
[521120.761658] blkdev_direct_IO (fs/block_dev.c:162)
[521120.762993] generic_file_read_iter (mm/filemap.c:1738)
[521120.767405] blkdev_read_iter (fs/block_dev.c:1649)
[521120.768556] __vfs_read (fs/read_write.c:423 fs/read_write.c:434)
[521120.772126] vfs_read (fs/read_write.c:454)
[521120.773118] SyS_pread64 (fs/read_write.c:607 fs/read_write.c:594)
[521120.776062] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:186)
[521120.777375] Memory state around the buggy address:
[521120.778118]  ffff880f36b38600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.779211]  ffff880f36b38680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.780315] >ffff880f36b38700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.781465]                          ^
[521120.782083]  ffff880f36b38780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[521120.783717]  ffff880f36b38800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[521120.784818] ==================================================================

This patch fixes a few of those places that I caught while auditing the patch, but the
original patch should be audited further for more occurences of this issue since I'm
not too familiar with the code.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-11 11:34:32 -06:00
Christoph Hellwig
4246a0b63b block: add a bi_error field to struct bio
Currently we have two different ways to signal an I/O error on a BIO:

 (1) by clearing the BIO_UPTODATE flag
 (2) by returning a Linux errno value to the bi_end_io callback

The first one has the drawback of only communicating a single possible
error (-EIO), and the second one has the drawback of not beeing persistent
when bios are queued up, and are not passed along from child to parent
bio in the ever more popular chaining scenario.  Having both mechanisms
available has the additional drawback of utterly confusing driver authors
and introducing bugs where various I/O submitters only deal with one of
them, and the others have to add boilerplate code to deal with both kinds
of error returns.

So add a new bi_error field to store an errno value directly in struct
bio and remove the existing mechanisms to clean all this up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-07-29 08:55:15 -06:00
Milan Broz
54cea3f668 dm crypt: add comments to better describe crypto processing logic
A crypto driver can process requests synchronously or asynchronously
and can use an internal driver queue to backlog requests.
Add some comments to clarify internal logic and completion return codes.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-29 14:19:02 -04:00
Rabin Vincent
c0403ec0bb Revert "dm crypt: fix deadlock when async crypto algorithm returns -EBUSY"
This reverts Linux 4.1-rc1 commit 0618764cb2.

The problem which that commit attempts to fix actually lies in the
Freescale CAAM crypto driver not dm-crypt.

dm-crypt uses CRYPTO_TFM_REQ_MAY_BACKLOG.  This means the the crypto
driver should internally backlog requests which arrive when the queue is
full and process them later.  Until the crypto hw's queue becomes full,
the driver returns -EINPROGRESS.  When the crypto hw's queue if full,
the driver returns -EBUSY, and if CRYPTO_TFM_REQ_MAY_BACKLOG is set, is
expected to backlog the request and process it when the hardware has
queue space.  At the point when the driver takes the request from the
backlog and starts processing it, it calls the completion function with
a status of -EINPROGRESS.  The completion function is called (for a
second time, in the case of backlogged requests) with a status/err of 0
when a request is done.

Crypto drivers for hardware without hardware queueing use the helpers,
crypto_init_queue(), crypto_enqueue_request(), crypto_dequeue_request()
and crypto_get_backlog() helpers to implement this behaviour correctly,
while others implement this behaviour without these helpers (ccp, for
example).

dm-crypt (before the patch that needs reverting) uses this API
correctly.  It queues up as many requests as the hw queues will allow
(i.e. as long as it gets back -EINPROGRESS from the request function).
Then, when it sees at least one backlogged request (gets -EBUSY), it
waits till that backlogged request is handled (completion gets called
with -EINPROGRESS), and then continues.  The references to
af_alg_wait_for_completion() and af_alg_complete() in that commit's
commit message are irrelevant because those functions only handle one
request at a time, unlink dm-crypt.

The problem is that the Freescale CAAM driver, which that commit
describes as having being tested with, fails to implement the
backlogging behaviour correctly.  In cam_jr_enqueue(), if the hardware
queue is full, it simply returns -EBUSY without backlogging the request.
What the observed deadlock was is not described in the commit message
but it is obviously the wait_for_completion() in crypto_convert() where
dm-crypto would wait for the completion being called with -EINPROGRESS
in the case of backlogged requests.  This completion will never be
completed due to the bug in the CAAM driver.

Commit 0618764cb2 incorrectly made dm-crypt wait for every request,
even when the driver/hardware queues are not full, which means that
dm-crypt will never see -EBUSY.  This means that that commit will cause
a performance regression on all crypto drivers which implement the API
correctly.

Revert it.  Correct backlog handling should be implemented in the CAAM
driver instead.

Cc'ing stable purely because commit 0618764cb2 did.  If for some reason
a stable@ kernel did pick up commit 0618764cb2 it should get reverted.

Signed-off-by: Rabin Vincent <rabin.vincent@axis.com>
Reviewed-by: Horia Geanta <horia.geanta@freescale.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-05-05 12:16:43 -04:00
Wei Yongjun
44c144f9c8 dm crypt: fix missing error code return from crypt_ctr error path
Fix to return a negative error code from crypt_ctr()'s optional
parameter processing error path.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-16 22:00:50 -04:00
Ben Collins
0618764cb2 dm crypt: fix deadlock when async crypto algorithm returns -EBUSY
I suspect this doesn't show up for most anyone because software
algorithms typically don't have a sense of being too busy.  However,
when working with the Freescale CAAM driver it will return -EBUSY on
occasion under heavy -- which resulted in dm-crypt deadlock.

After checking the logic in some other drivers, the scheme for
crypt_convert() and it's callback, kcryptd_async_done(), were not
correctly laid out to properly handle -EBUSY or -EINPROGRESS.

Fix this by using the completion for both -EBUSY and -EINPROGRESS.  Now
crypt_convert()'s use of completion is comparable to
af_alg_wait_for_completion().  Similarly, kcryptd_async_done() follows
the pattern used in af_alg_complete().

Before this fix dm-crypt would lockup within 1-2 minutes running with
the CAAM driver.  Fix was regression tested against software algorithms
on PPC32 and x86_64, and things seem perfectly happy there as well.

Signed-off-by: Ben Collins <ben.c@servergy.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2015-04-15 12:10:26 -04:00
Mike Snitzer
5977907937 dm crypt: leverage immutable biovecs when decrypting on read
Commit 003b5c571 ("block: Convert drivers to immutable biovecs")
stopped short of changing dm-crypt to leverage the fact that the biovec
array of a bio will no longer be modified.

Switch to using bio_clone_fast() when cloning bios for decryption after
read.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-15 12:10:25 -04:00
Milan Broz
e44f23b32d dm crypt: update URLs to new cryptsetup project page
Cryptsetup home page moved to GitLab.
Also remove link to abandonded Truecrypt page.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-04-15 12:10:24 -04:00
Mikulas Patocka
b3c5fd3052 dm crypt: sort writes
Write requests are sorted in a red-black tree structure and are
submitted in the sorted order.

In theory the sorting should be performed by the underlying disk
scheduler, however, in practice the disk scheduler only accepts and
sorts a finite number of requests.  To allow the sorting of all
requests, dm-crypt needs to implement its own sorting.

The overhead associated with rbtree-based sorting is considered
negligible so it is not used conditionally.  Even on SSD sorting can be
beneficial since in-order request dispatch promotes lower latency IO
completion to the upper layers.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:11:15 -05:00
Mikulas Patocka
0f5d8e6ee7 dm crypt: add 'submit_from_crypt_cpus' option
Make it possible to disable offloading writes by setting the optional
'submit_from_crypt_cpus' table argument.

There are some situations where offloading write bios from the
encryption threads to a single thread degrades performance
significantly.

The default is to offload write bios to the same thread because it
benefits CFQ to have writes submitted using the same IO context.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:11:15 -05:00
Mikulas Patocka
dc2676210c dm crypt: offload writes to thread
Submitting write bios directly in the encryption thread caused serious
performance degradation.  On a multiprocessor machine, encryption requests
finish in a different order than they were submitted.  Consequently, write
requests would be submitted in a different order and it could cause severe
performance degradation.

Move the submission of write requests to a separate thread so that the
requests can be sorted before submitting.  But this commit improves
dm-crypt performance even without having dm-crypt perform request
sorting (in particular it enables IO schedulers like CFQ to sort more
effectively).

Note: it is required that a previous commit ("dm crypt: don't allocate
pages for a partial request") be applied before applying this patch.
Otherwise, this commit could introduce a crash.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:11:14 -05:00
Mikulas Patocka
94f5e0243c dm crypt: remove unused io_pool and _crypt_io_pool
The previous commit ("dm crypt: don't allocate pages for a partial
request") stopped using the io_pool slab mempool and backing
_crypt_io_pool kmem cache.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:11:13 -05:00
Mikulas Patocka
7145c241a1 dm crypt: avoid deadlock in mempools
Fix a theoretical deadlock introduced in the previous commit ("dm crypt:
don't allocate pages for a partial request").

The function crypt_alloc_buffer may be called concurrently.  If we allocate
from the mempool concurrently, there is a possibility of deadlock.  For
example, if we have mempool of 256 pages, two processes, each wanting
256, pages allocate from the mempool concurrently, it may deadlock in a
situation where both processes have allocated 128 pages and the mempool
is exhausted.

To avoid such a scenario we allocate the pages under a mutex.  In order
to not degrade performance with excessive locking, we try non-blocking
allocations without a mutex first and if that fails, we fallback to a
blocking allocations with a mutex.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:11:13 -05:00
Mikulas Patocka
cf2f1abfbd dm crypt: don't allocate pages for a partial request
Change crypt_alloc_buffer so that it only ever allocates pages for a
full request.  This is a prerequisite for the commit "dm crypt: offload
writes to thread".

This change simplifies the dm-crypt code at the expense of reduced
throughput in low memory conditions (where allocation for a partial
request is most useful).

Note: the next commit ("dm crypt: avoid deadlock in mempools") is needed
to fix a theoretical deadlock.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:11:12 -05:00
Mikulas Patocka
f3396c58fd dm crypt: use unbound workqueue for request processing
Use unbound workqueue by default so that work is automatically balanced
between available CPUs.  The original behavior of encrypting using the
same cpu that IO was submitted on can still be enabled by setting the
optional 'same_cpu_crypt' table argument.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2015-02-16 11:10:59 -05:00
Linus Torvalds
140dfc9299 - Significant DM thin-provisioning performance improvements to meet
performance requirements that were requested by the Gluster
   distributed filesystem.  Specifically, dm-thinp now takes care to
   aggregate IO that will be issued to the same thinp block before
   issuing IO to the underlying devices.  This really helps improve
   performance on HW RAID6 devices that have a writeback cache because it
   avoids RMW in the HW RAID controller.
 
 - Some stable fixes: fix leak in DM bufio if integrity profiles were
   enabled, use memzero_explicit in DM crypt to avoid any potential for
   information leak, and a DM cache fix to properly mark a cache block
   dirty if it was promoted to the cache via the overwrite optimization.
 
 - A few simple DM persistent data library fixes
 
 - DM cache multiqueue policy block promotion improvements.
 
 - DM cache discard improvements that take advantage of range
   (multiblock) discard support in the DM bio-prison.  This allows for
   much more efficient bulk discard processing (e.g. when mkfs.xfs
   discards the entire device).
 
 - Some small optimizations in DM core and RCU deference cleanups
 
 - DM core changes to suspend/resume code to introduce the new internal
   suspend/resume interface that the DM thin-pool target now uses to
   suspend/resume active thin devices when the thin-pool must
   suspend/resume.  This avoids forcing userspace to track all active
   thin volumes in a thin-pool when the thin-pool is suspended for the
   purposes of metadata or data space resize.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUhcvVAAoJEMUj8QotnQNaB78H+wSA6sDJGOhc6e1KlWoFh4Hx
 hTmwm/O8Fxrp9StO3NPlcv9l+l9FX9pGzN/lo3OsxgWMTs/vLTKZ5SAe3/YT3/b9
 6SFC7pC70+glakgMhhXWRvoeSEQC1OWb5BuvOF8irl2n+7B9NAn/zHd9pgpmyWHp
 nBXK2GJBMzVSiI47NMjo2n6007LgQq0xxSJ9luwdrpwjDqD1d406DrhzbHou5H2Y
 b8XJGQzUy0GZCX8ycwPVXo9svp2Bc+XajVcgOj5Qg7s2uV5car8NN7TxhSOKSXn2
 VpiSyEa2MLHAbFuWtGs8XO98z/m5JlGf1eIgRO4s7w59URgpzdxOHXLlAoyqIGw=
 =opXi
 -----END PGP SIGNATURE-----

Merge tag 'dm-3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

 - Significant DM thin-provisioning performance improvements to meet
   performance requirements that were requested by the Gluster
   distributed filesystem.

   Specifically, dm-thinp now takes care to aggregate IO that will be
   issued to the same thinp block before issuing IO to the underlying
   devices.  This really helps improve performance on HW RAID6 devices
   that have a writeback cache because it avoids RMW in the HW RAID
   controller.

 - Some stable fixes: fix leak in DM bufio if integrity profiles were
   enabled, use memzero_explicit in DM crypt to avoid any potential for
   information leak, and a DM cache fix to properly mark a cache block
   dirty if it was promoted to the cache via the overwrite optimization.

 - A few simple DM persistent data library fixes

 - DM cache multiqueue policy block promotion improvements.

 - DM cache discard improvements that take advantage of range
   (multiblock) discard support in the DM bio-prison.  This allows for
   much more efficient bulk discard processing (e.g.  when mkfs.xfs
   discards the entire device).

 - Some small optimizations in DM core and RCU deference cleanups

 - DM core changes to suspend/resume code to introduce the new internal
   suspend/resume interface that the DM thin-pool target now uses to
   suspend/resume active thin devices when the thin-pool must
   suspend/resume.

   This avoids forcing userspace to track all active thin volumes in a
   thin-pool when the thin-pool is suspended for the purposes of
   metadata or data space resize.

* tag 'dm-3.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (49 commits)
  dm crypt: use memzero_explicit for on-stack buffer
  dm space map metadata: fix sm_bootstrap_get_count()
  dm space map metadata: fix sm_bootstrap_get_nr_blocks()
  dm bufio: fix memleak when using a dm_buffer's inline bio
  dm cache: fix spurious cell_defer when dealing with partial block at end of device
  dm cache: dirty flag was mistakenly being cleared when promoting via overwrite
  dm cache: only use overwrite optimisation for promotion when in writeback mode
  dm cache: discard block size must be a multiple of cache block size
  dm cache: fix a harmless race when working out if a block is discarded
  dm cache: when reloading a discard bitset allow for a different discard block size
  dm cache: fix some issues with the new discard range support
  dm array: if resizing the array is a noop set the new root to the old one
  dm: use rcu_dereference_protected instead of rcu_dereference
  dm thin: fix pool_io_hints to avoid looking at max_hw_sectors
  dm thin: suspend/resume active thin devices when reloading thin-pool
  dm: enhance internal suspend and resume interface
  dm thin: do not allow thin device activation while pool is suspended
  dm: add presuspend_undo hook to target_type
  dm: return earlier from dm_blk_ioctl if target doesn't implement .ioctl
  dm thin: remove stale 'trim' message in block comment above pool_message
  ...
2014-12-08 21:10:03 -08:00
Milan Broz
1a71d6ffe1 dm crypt: use memzero_explicit for on-stack buffer
Use memzero_explicit to cleanup sensitive data allocated on stack
to prevent the compiler from optimizing and removing memset() calls.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-12-02 10:25:07 -05:00
Jan-Simon Möller
b610626523 crypto, dm: LLVMLinux: Remove VLAIS usage from dm-crypt
Replaced the use of a Variable Length Array In Struct (VLAIS) with a C99
compliant equivalent. This patch allocates the appropriate amount of memory
using a char array using the SHASH_DESC_ON_STACK macro.

The new code can be compiled with both gcc and clang.

Signed-off-by: Jan-Simon Möller <dl9pf@gmx.de>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Reviewed-by: Mark Charlebois <charlebm@gmail.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Cc: pageexec@freemail.hu
Cc: gmazyland@gmail.com
Cc: "David S. Miller" <davem@davemloft.net>
2014-10-14 10:51:23 +02:00
Mikulas Patocka
d49ec52ff6 dm crypt: fix access beyond the end of allocated space
The DM crypt target accesses memory beyond allocated space resulting in
a crash on 32 bit x86 systems.

This bug is very old (it dates back to 2.6.25 commit 3a7f6c990a "dm
crypt: use async crypto").  However, this bug was masked by the fact
that kmalloc rounds the size up to the next power of two.  This bug
wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio
data").  By switching to using per-bio data there was no longer any
padding beyond the end of a dm-crypt allocated memory block.

To minimize allocation overhead dm-crypt puts several structures into one
block allocated with kmalloc.  The block holds struct ablkcipher_request,
cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
struct dm_crypt_request and an initialization vector.

The variable dmreq_start is set to offset of struct dm_crypt_request
within this memory block.  dm-crypt allocates the block with this size:
cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.

When accessing the initialization vector, dm-crypt uses the function
iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
+ 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).

dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
structure.  However, when dm-crypt accesses the initialization vector, it
takes a pointer to the end of dm_crypt_request, aligns it, and then uses
it as the initialization vector.  If the end of dm_crypt_request is not
aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
alignment causes the initialization vector to point beyond the allocated
space.

Fix this bug by calculating the variable iv_size_padding and adding it
to the allocated size.

Also correct the alignment of dm_crypt_request.  struct dm_crypt_request
is specific to dm-crypt (it isn't used by the crypto subsystem at all),
so it is aligned on __alignof__(struct dm_crypt_request).

Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is
aligned as if the block was allocated with kmalloc.

Reported-by: Krzysztof Kolasa <kkolasa@winsoft.pl>
Tested-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-08-28 14:24:09 -04:00
Mikulas Patocka
298a9fa08a dm crypt: use per-bio data
Change dm-crypt so that it uses auxiliary data allocated with the bio.

Dm-crypt requires two allocations per request - struct dm_crypt_io and
struct ablkcipher_request (with other data appended to it).  It
previously only used mempool allocations.

Some requests may require more dm_crypt_ios and ablkcipher_requests,
however most requests need just one of each of these two structures to
complete.

This patch changes it so that the first dm_crypt_io and ablkcipher_request
are allocated with the bio (using target per_bio_data_size option).  If
the request needs additional values, they are allocated from the mempool.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-08-01 12:30:35 -04:00
Jana Saout
bf14299f1c dm crypt, dm zero: update author name following legal name change
Signed-off-by: Jana Saout <jana@saout.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2014-07-10 16:44:14 -04:00
Mikulas Patocka
610f2de355 dm crypt: fix cpu hotplug crash by removing per-cpu structure
The DM crypt target used per-cpu structures to hold pointers to a
ablkcipher_request structure.  The code assumed that the work item keeps
executing on a single CPU, so it didn't use synchronization when
accessing this structure.

If a CPU is disabled by writing 0 to /sys/devices/system/cpu/cpu*/online,
the work item could be moved to another CPU.  This causes dm-crypt
crashes, like the following, because the code starts using an incorrect
ablkcipher_request:

 smpboot: CPU 7 is now offline
 BUG: unable to handle kernel NULL pointer dereference at 0000000000000130
 IP: [<ffffffffa1862b3d>] crypt_convert+0x12d/0x3c0 [dm_crypt]
 ...
 Call Trace:
  [<ffffffffa1864415>] ? kcryptd_crypt+0x305/0x470 [dm_crypt]
  [<ffffffff81062060>] ? finish_task_switch+0x40/0xc0
  [<ffffffff81052a28>] ? process_one_work+0x168/0x470
  [<ffffffff8105366b>] ? worker_thread+0x10b/0x390
  [<ffffffff81053560>] ? manage_workers.isra.26+0x290/0x290
  [<ffffffff81058d9f>] ? kthread+0xaf/0xc0
  [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120
  [<ffffffff813464ac>] ? ret_from_fork+0x7c/0xb0
  [<ffffffff81058cf0>] ? kthread_create_on_node+0x120/0x120

Fix this bug by removing the per-cpu definition.  The structure
ablkcipher_request is accessed via a pointer from convert_context.
Consequently, if the work item is rescheduled to a different CPU, the
thread still uses the same ablkcipher_request.

This change may undermine performance improvements intended by commit
c0297721 ("dm crypt: scale to multiple cpus") on select hardware.  In
practice no performance difference was observed on recent hardware.  But
regardless, correctness is more important than performance.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
2014-05-14 16:11:35 -04:00
Kent Overstreet
003b5c5719 block: Convert drivers to immutable biovecs
Now that we've got a mechanism for immutable biovecs -
bi_iter.bi_bvec_done - we need to convert drivers to use primitives that
respect it instead of using the bvec array directly.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: NeilBrown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
2013-11-23 22:33:51 -08:00
Kent Overstreet
4f024f3797 block: Abstract out bvec iterator
Immutable biovecs are going to require an explicit iterator. To
implement immutable bvecs, a later patch is going to add a bi_bvec_done
member to this struct; for now, this patch effectively just renames
things.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Geoff Levand <geoff@infradead.org>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: Sage Weil <sage@inktank.com>
Cc: Alex Elder <elder@inktank.com>
Cc: ceph-devel@vger.kernel.org
Cc: Joshua Morris <josh.h.morris@us.ibm.com>
Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux390@de.ibm.com
Cc: Boaz Harrosh <bharrosh@panasas.com>
Cc: Benny Halevy <bhalevy@tonian.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: Joern Engel <joern@logfs.org>
Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Ben Myers <bpm@sgi.com>
Cc: xfs@oss.sgi.com
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Len Brown <len.brown@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchand@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Peng Tao <tao.peng@emc.com>
Cc: Andy Adamson <andros@netapp.com>
Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: Jie Liu <jeff.liu@oracle.com>
Cc: Sunil Mushran <sunil.mushran@gmail.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Pankaj Kumar <pankaj.km@samsung.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Mel Gorman <mgorman@suse.de>6
2013-11-23 22:33:47 -08:00
Wolfram Sang
16735d022f tree-wide: use reinit_completion instead of INIT_COMPLETION
Use this new function to make code more comprehensible, since we are
reinitialzing the completion, not initializing.

[akpm@linux-foundation.org: linux-next resyncs]
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Acked-by: Linus Walleij <linus.walleij@linaro.org> (personally at LCE13)
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-15 09:32:21 +09:00
Milan Broz
ed04d98169 dm crypt: add TCW IV mode for old CBC TCRYPT containers
dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers
in LRW or XTS block encryption mode.

TCRYPT containers prior to version 4.1 use CBC mode with some additional
tweaks, this patch adds support for these containers.

This new mode is implemented using special IV generator named TCW
(TrueCrypt IV with whitening).  TCW IV only supports containers that are
encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and
TripleDES).

While this mode is legacy and is known to be vulnerable to some
watermarking attacks (e.g. revealing of hidden disk existence) it can
still be useful to activate old containers without using 3rd party
software or for independent forensic analysis of such containers.

(Both the userspace and kernel code is an independent implementation
based on the format documentation and it completely avoids use of
original source code.)

The TCW IV generator uses two additional keys: Kw (whitening seed, size
is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is
always the IV size of the selected cipher).  These keys are concatenated
at the end of the main encryption key provided in mapping table.

While whitening is completely independent from IV, it is implemented
inside IV generator for simplification.

The whitening value is always 16 bytes long and is calculated per sector
from provided Kw as initial seed, xored with sector number and mixed
with CRC32 algorithm.  Resulting value is xored with ciphertext sector
content.

IV is calculated from the provided Kiv as initial IV seed and xored with
sector number.

Detailed calculation can be found in the Truecrypt documentation for
version < 4.1 and will also be described on dm-crypt site, see:
http://code.google.com/p/cryptsetup/wiki/DMCrypt

The experimental support for activation of these containers is already
present in git devel brach of cryptsetup.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:20 -05:00
Milan Broz
da31a0787a dm crypt: properly handle extra key string in initialization
Some encryption modes use extra keys (e.g. loopAES has IV seed) which
are not used in block cipher initialization but are part of key string
in table constructor.

This patch adds an additional field which describes the length of the
extra key(s) and substracts it before real key encryption setting.

The key_size always includes the size, in bytes, of the key provided
in mapping table.

The key_parts describes how many parts (usually keys) are contained in
the whole key buffer.  And key_extra_size contains size in bytes of
additional keys part (this number of bytes must be subtracted because it
is processed by the IV generator).

| K1 | K2 | .... | K64 |      Kiv       |
|----------- key_size ----------------- |
|                      |-key_extra_size-|
|     [64 keys]        |  [1 key]       | => key_parts = 65

Example where key string contains main key K, whitening key
Kw and IV seed Kiv:

|     K       |   Kiv   |       Kw      |
|--------------- key_size --------------|
|             |-----key_extra_size------|
|  [1 key]    | [1 key] |     [1 key]   | => key_parts = 3

Because key_extra_size is calculated during IV mode setting, key
initialization is moved after this step.

For now, this change has no effect to supported modes (thanks to ilog2
rounding) but it is required by the following patch.

Also, fix a sparse warning in crypt_iv_lmk_one().

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:20 -05:00
Tejun Heo
670368a8dd dm: stop using WQ_NON_REENTRANT
dbf2576e37 ("workqueue: make all workqueues non-reentrant") made
WQ_NON_REENTRANT no-op and the flag is going away.  Remove its usages.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
2013-08-23 09:02:13 -04:00
Kent Overstreet
cb34e057ad block: Convert some code to bio_for_each_segment_all()
More prep work for immutable bvecs:

A few places in the code were either open coding or using the wrong
version - fix.

After we introduce the bvec iter, it'll no longer be possible to modify
the biovec through bio_for_each_segment_all() - it doesn't increment a
pointer to the current bvec, you pass in a struct bio_vec (not a
pointer) which is updated with what the current biovec would be (taking
into account bi_bvec_done and bi_size).

So because of that it's more worthwhile to be consistent about
bio_for_each_segment()/bio_for_each_segment_all() usage.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: dm-devel@redhat.com
CC: Alexander Viro <viro@zeniv.linux.org.uk>
2013-03-23 14:26:30 -07:00
Alasdair G Kergon
55a62eef8d dm: rename request variables to bios
Use 'bio' in the name of variables and functions that deal with
bios rather than 'request' to avoid confusion with the normal
block layer use of 'request'.

No functional changes.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:47 +00:00
Mikulas Patocka
fd7c092e71 dm: fix truncated status strings
Avoid returning a truncated table or status string instead of setting
the DM_BUFFER_FULL_FLAG when the last target of a table fills the
buffer.

When processing a table or status request, the function retrieve_status
calls ti->type->status. If ti->type->status returns non-zero,
retrieve_status assumes that the buffer overflowed and sets
DM_BUFFER_FULL_FLAG.

However, targets don't return non-zero values from their status method
on overflow. Most targets returns always zero.

If a buffer overflow happens in a target that is not the last in the
table, it gets noticed during the next iteration of the loop in
retrieve_status; but if a buffer overflow happens in the last target, it
goes unnoticed and erroneously truncated data is returned.

In the current code, the targets behave in the following way:
* dm-crypt returns -ENOMEM if there is not enough space to store the
  key, but it returns 0 on all other overflows.
* dm-thin returns errors from the status method if a disk error happened.
  This is incorrect because retrieve_status doesn't check the error
  code, it assumes that all non-zero values mean buffer overflow.
* all the other targets always return 0.

This patch changes the ti->type->status function to return void (because
most targets don't use the return code). Overflow is detected in
retrieve_status: if the status method fills up the remaining space
completely, it is assumed that buffer overflow happened.

Cc: stable@vger.kernel.org
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-01 22:45:44 +00:00
Mikulas Patocka
7de3ee57da dm: remove map_info
This patch removes map_info from bio-based device mapper targets.
map_info is still used for request-based targets.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21 20:23:41 +00:00
Kent Overstreet
bf800ef181 block: Add bio_clone_bioset(), bio_clone_kmalloc()
Previously, there was bio_clone() but it only allocated from the fs bio
set; as a result various users were open coding it and using
__bio_clone().

This changes bio_clone() to become bio_clone_bioset(), and then we add
bio_clone() and bio_clone_kmalloc() as wrappers around it, making use of
the functionality the last patch adedd.

This will also help in a later patch changing how bio cloning works.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Boaz Harrosh <bharrosh@panasas.com>
CC: Jeff Garzik <jeff@garzik.org>
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-09 10:35:39 +02:00
Kent Overstreet
395c72a707 block: Generalized bio pool freeing
With the old code, when you allocate a bio from a bio pool you have to
implement your own destructor that knows how to find the bio pool the
bio was originally allocated from.

This adds a new field to struct bio (bi_pool) and changes
bio_alloc_bioset() to use it. This makes various bio destructors
unnecessary, so they're then deleted.

v6: Explain the temporary if statement in bio_put

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: NeilBrown <neilb@suse.de>
CC: Alasdair Kergon <agk@redhat.com>
CC: Nicholas Bellinger <nab@linux-iscsi.org>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-09-09 10:35:38 +02:00
Alasdair G Kergon
1f4e0ff079 dm thin: commit before gathering status
Commit outstanding metadata before returning the status for a dm thin
pool so that the numbers reported are as up-to-date as possible.

The commit is not performed if the device is suspended or if
the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
through a new 'status_flags' parameter in the target's dm_status_fn.

The userspace dmsetup tool will support the --noflush flag with the
'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
onwards.

Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:16 +01:00
Alasdair G Kergon
0ac55489d9 dm: use bool bitfields in struct dm_target
Use boolean bit fields for flags in struct dm_target.

Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:08 +01:00
Mikulas Patocka
c66029f4d4 dm crypt: rename struct convert_context sector field
Rename sector to cc_sector in dm-crypt's convert_context struct.

This is preparation for a future patch that merges dm_io and
convert_context which both have a "sector" field.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:05 +01:00
Alasdair G Kergon
49a8a9204b dm crypt: store crypt_config instead of dm_target struct
Store the crypt_config struct pointer directly in struct dm_crypt_io
instead of the dm_target struct pointer.

Target information is never used - only target->private is referenced,
thus we can change it to point directly to struct crypt_config.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:05 +01:00
Mikulas Patocka
fd2d231faf dm crypt: move cipher data out of per_cpu struct
Move static dm-crypt cipher data out of per-cpu structure.

Cipher information is static, so it does not have to be in a per-cpu
structure.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:05 +01:00
Mikulas Patocka
40b6229b69 dm crypt: rename pending field
There are two dm crypt structures that have a field called "pending".

This patch renames them to "cc_pending" and "io_pending" to reduce confusion
and ease searching the code.

Also remove unnecessary initialisation of r in crypt_convert_block().

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:08:04 +01:00
majianpeng
1a66a08ae8 dm: replace simple_strtoul
Replace obsolete simple_strtoul() with kstrtou8/kstrtouint.

Signed-off-by: majianpeng <majianpeng@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-07-27 15:07:59 +01:00
Mikulas Patocka
31998ef193 dm: reject trailing characters in sccanf input
Device mapper uses sscanf to convert arguments to numbers. The problem is that
the way we use it ignores additional unmatched characters in the scanned string.

For example, this `if (sscanf(string, "%d", &number) == 1)' will match a number,
but also it will match number with some garbage appended, like "123abc".

As a result, device mapper accepts garbage after some numbers. For example
the command `dmsetup create vg1-new --table "0 16384 linear 254:1bla 34816bla"'
will pass without an error.

This patch fixes all sscanf uses in device mapper. It appends "%c" with
a pointer to a dummy character variable to every sscanf statement.

The construct `if (sscanf(string, "%d%c", &number, &dummy) == 1)' succeeds
only if string is a null-terminated number (optionally preceded by some
whitespace characters). If there is some character appended after the number,
sscanf matches "%c", writes the character to the dummy variable and returns 2.
We check the return value for 1 and consequently reject numbers with some
garbage appended.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28 18:41:26 +01:00
Mikulas Patocka
72c6e7afc4 dm crypt: add missing error handling
Always set io->error to -EIO when an error is detected in dm-crypt.

There were cases where an error code would be set only if we finish
processing the last sector. If there were other encryption operations in
flight, the error would be ignored and bio would be returned with
success as if no error happened.

This bug is present in kcryptd_crypt_write_convert, kcryptd_crypt_read_convert
and kcryptd_async_done.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Reviewed-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28 18:41:22 +01:00
Mikulas Patocka
aeb2deae26 dm crypt: fix mempool deadlock
This patch fixes a possible deadlock in dm-crypt's mempool use.

Currently, dm-crypt reserves a mempool of MIN_BIO_PAGES reserved pages.
It allocates first MIN_BIO_PAGES with non-failing allocation (the allocation
cannot fail and waits until the mempool is refilled). Further pages are
allocated with different gfp flags that allow failing.

Because allocations may be done in parallel, this code can deadlock. Example:
There are two processes, each tries to allocate MIN_BIO_PAGES and the processes
run simultaneously.
It may end up in a situation where each process allocates (MIN_BIO_PAGES / 2)
pages. The mempool is exhausted. Each process waits for more pages to be freed
to the mempool, which never happens.

To avoid this deadlock scenario, this patch changes the code so that only
the first page is allocated with non-failing gfp mask. Allocation of further
pages may fail.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-03-28 18:41:22 +01:00
Cong Wang
c2e022cb65 dm: remove the second argument of k[un]map_atomic()
Acked-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:28 +08:00
Milan Broz
983c7db347 dm crypt: always disable discard_zeroes_data
If optional discard support in dm-crypt is enabled, discards requests
bypass the crypt queue and blocks of the underlying device are discarded.
For the read path, discarded blocks are handled the same as normal
ciphertext blocks, thus decrypted.

So if the underlying device announces discarded regions return zeroes,
dm-crypt must disable this flag because after decryption there is just
random noise instead of zeroes.

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-09-25 23:26:21 +01:00
Milan Broz
772ae5f54d dm crypt: optionally support discard requests
Add optional parameter field to dmcrypt table and support
"allow_discards" option.

Discard requests bypass crypt queue processing. Bio is simple remapped
to underlying device.

Note that discard will be never enabled by default because of security
consequences.  It is up to the administrator to enable it for encrypted
devices.

(Note that userspace cryptsetup does not understand new optional
parameters yet.  Support for this will come later.  Until then, you
should use 'dmsetup' to enable and disable this.)

Signed-off-by: Milan Broz <mbroz@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2011-08-02 12:32:08 +01:00