44134 Commits

Author SHA1 Message Date
Ryan Ding
46e6255659 ocfs2: do not change i_size in write_end for direct io
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.

Append direct io do not change i_size in get block phase.  It only move
to orphan when starting write.  After data is written to disk, it will
delete itself from orphan and update i_size.  So skip i_size change
section in write_begin for direct io.

And when there is no extents alloc, no meta data changes needed for
direct io (since write_begin start trans for 2 reason: alloc extents &
change i_size.  Now none of them needed).  So we can skip start trans
procedure.

Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Ryan Ding
65c4db8c82 ocfs2: test target page before change it
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.

Direct io data will not appear in buffer.  The w_target_page member will
not be filled by direct io.  So avoid to use it when it's NULL.  Unlinke
buffer io and mmap, direct io will call write_begin with more than 1
page a time.  So the target_index is not sufficient to describe the
actual data.  change it to a range start at target_index, end in
end_index.

Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Ryan Ding
b46637d59f ocfs2: use c_new to indicate newly allocated extents
To support direct io in ocfs2_write_begin_nolock & ocfs2_write_end_nolock.

There is a problem in ocfs2's direct io implement: if system crashed
after extents allocated, and before data return, we will get a extent
with dirty data on disk.  This problem violate the journal=order
semantics, which means meta changes take effect after data written to
disk.  To resolve this issue, direct write can use the UNWRITTEN flag to
describe a extent during direct data writeback.  The direct write
procedure should act in the following order:

phase 1: alloc extent with UNWRITTEN flag
phase 2: submit direct data to disk, add zero page to page cache
phase 3: clear UNWRITTEN flag when data has been written to disk

This patch is to change the 'c_unwritten' member of
ocfs2_write_cluster_desc to 'c_clear_unwritten'.  Means whether to clear
the unwritten flag.  It do not care if a extent is allocated or not.
And use 'c_new' to specify a newly allocated extent.  So the direct io
procedure can use c_clear_unwritten to control the UNWRITTEN bit on
extent.

Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Ryan Ding
c1ad1e3ca3 ocfs2: add ocfs2_write_type_t type to identify the caller of write
Patchset: fix ocfs2 direct io code patch to support sparse file and data
ordering semantics

The idea is to use buffer io(more precisely use the interface
ocfs2_write_begin_nolock & ocfs2_write_end_nolock) to do the zero work
beyond block size.  And clear UNWRITTEN flag until direct io data has
been written to disk, which can prevent data corruption when system
crashed during direct write.

And we will also archive a better performance: eg.  dd direct write new
file with block size 4KB: before this patchset:
  2.5 MB/s
after this patchset:
  66.4 MB/s

This patch (of 8):

To support direct io in ocfs2_write_begin_nolock &
ocfs2_write_end_nolock.

Remove unused args filp & flags.  Add new arg type.  The type is one of
buffer/direct/mmap.  Indicate 3 way to perform write.  buffer/mmap type
has implemented.  direct type will be implemented later.

Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Junxiao Bi
9e13f1f9de ocfs2: o2hb: fix double free bug
This is a regression issue and caused the following kernel panic when do
ocfs2 multiple test.

  BUG: unable to handle kernel paging request at 00000002000800c0
  IP: [<ffffffff81192978>] kmem_cache_alloc+0x78/0x160
  PGD 7bbe5067 PUD 0
  Oops: 0000 [#1] SMP
  Modules linked in: ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi xen_kbdfront xen_netfront xen_fbfront xen_blkfront
  CPU: 2 PID: 4044 Comm: mpirun Not tainted 4.5.0-rc5-next-20160225 #1
  Hardware name: Xen HVM domU, BIOS 4.3.1OVM 05/14/2014
  task: ffff88007a521a80 ti: ffff88007aed0000 task.ti: ffff88007aed0000
  RIP: 0010:[<ffffffff81192978>]  [<ffffffff81192978>] kmem_cache_alloc+0x78/0x160
  RSP: 0018:ffff88007aed3a48  EFLAGS: 00010282
  RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000001991
  RDX: 0000000000001990 RSI: 00000000024000c0 RDI: 000000000001b330
  RBP: ffff88007aed3a98 R08: ffff88007d29b330 R09: 00000002000800c0
  R10: 0000000c51376d87 R11: ffff8800792cac38 R12: ffff88007cc30f00
  R13: 00000000024000c0 R14: ffffffff811b053f R15: ffff88007aed3ce7
  FS:  0000000000000000(0000) GS:ffff88007d280000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00000002000800c0 CR3: 000000007aeb2000 CR4: 00000000000406e0
  Call Trace:
    __d_alloc+0x2f/0x1a0
    d_alloc+0x17/0x80
    lookup_dcache+0x8a/0xc0
    path_openat+0x3c3/0x1210
    do_filp_open+0x80/0xe0
    do_sys_open+0x110/0x200
    SyS_open+0x19/0x20
    do_syscall_64+0x72/0x230
    entry_SYSCALL64_slow_path+0x25/0x25
  Code: 05 e6 77 e7 7e 4d 8b 08 49 8b 40 10 4d 85 c9 0f 84 dd 00 00 00 48 85 c0 0f 84 d4 00 00 00 49 63 44 24 20 49 8b 3c 24 48 8d 4a 01 <49> 8b 1c 01 4c 89 c8 65 48 0f c7 0f 0f 94 c0 3c 01 75 b6 49 63
  RIP   kmem_cache_alloc+0x78/0x160
  CR2: 00000002000800c0
  ---[ end trace 823969e602e4aaac ]---

Fixes: a4a1dfa4bb8b("ocfs2/cluster: fix memory leak in o2hb_region_release")
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-25 16:37:42 -07:00
Geliang Tang
99ec269779 ceph: use kmem_cache_zalloc
Use kmem_cache_zalloc() instead of kmem_cache_alloc() with flag GFP_ZERO.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:56 +01:00
Yan, Zheng
200fd27c8f ceph: use lookup request to revalidate dentry
If dentry has no lease, ceph_d_revalidate() previously return 0.
This causes VFS to invalidate the dentry and create a new dentry
for later lookup. Invalidating a dentry also detach any underneath
mount points. So mount point inside cephfs can disapear mystically
(even the mount point is not modified by other hosts).

The fix is using lookup request to revalidate dentry without lease.
This can partly solve the mount points disapear issue (as long as
the mount point is not modified by other hosts)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:56 +01:00
Yan, Zheng
641235d8f8 ceph: kill ceph_get_dentry_parent_inode()
use vfs helper dget_parent() instead

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
315f240880 ceph: fix security xattr deadlock
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.

The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)

Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
29dccfa5af ceph: don't request vxattrs from MDS
It's uselese because MDS reply does not carry any vxattr.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:55 +01:00
Yan, Zheng
132ca7e1de ceph: fix mounting same fs multiple times
Now __ceph_open_session() only accepts closed client. An opened
client will tigger BUG_ON().

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:54 +01:00
Yan, Zheng
4531126753 ceph: remove unnecessary NULL check
If page->mapping is NULL, releasepage() callback does not get called.
Remove the unnecessary NULL check to make static code analysis tool
happy

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:54 +01:00
Yan, Zheng
a3d714c336 ceph: avoid updating directory inode's i_size accidentally
Directory inode's i_size is used by readdir cache.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:53 +01:00
Yan, Zheng
af5e5eb574 ceph: fix race during filling readdir cache
Readdir cache uses page cache to save dentry pointers. When adding
dentry pointers to middle of a page, we need to make sure the page
already exists. Otherwise the beginning part of the page will be
invalid pointers.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:53 +01:00
Ilya Dryomov
34b759b4a2 ceph: kill ceph_empty_snapc
ceph_empty_snapc->num_snaps == 0 at all times.  Passing such a snapc to
ceph_osdc_alloc_request() (possibly through ceph_osdc_new_request()) is
equivalent to passing NULL, as ceph_osdc_alloc_request() uses it only
for sizing the request message.

Further, in all four cases the subsequent ceph_osdc_build_request() is
passed NULL for snapc, meaning that 0 is encoded for seq and num_snaps
and making ceph_empty_snapc entirely useless.  The two cases where it
actually mattered were removed in commits 860560904962 ("ceph: avoid
sending unnessesary FLUSHSNAP message") and 23078637e054 ("ceph: fix
queuing inode to mdsdir's snaprealm").

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by:  Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Anton Protopopov
ce4355932a ceph: fix a wrong comparison
A negative value rc compared to the positive value ENOENT in the
finish_read() function.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Deepa Dinamani
8bbd47140c ceph: replace CURRENT_TIME by current_fs_time()
CURRENT_TIME macro is not appropriate for filesystems as it
doesn't use the right granularity for filesystem timestamps.
Use current_fs_time() instead.

Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:52 +01:00
Yan, Zheng
5b64640cf6 ceph: scattered page writeback
This patch makes ceph_writepages_start() try using single OSD request
to write all dirty pages within a strip unit. When a nonconsecutive
dirty page is found, ceph_writepages_start() tries starting a new write
operation to existing OSD request. If it succeeds, it uses the new
operation to writeback the dirty page.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:51 +01:00
Yan, Zheng
a587d71b0a ceph: remove useless BUG_ON
ceph_osdc_start_request() never return -EOLDSNAP

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng
133e91566c ceph: don't enable rbytes mount option by default
When rbytes mount option is enabled, directory size is recursive
size. Recursive size is not updated instantly. This can cause
directory size to change between successive stat(1)

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:41 +01:00
Yan, Zheng
d1eee0c0e1 ceph: encode ctime in cap message
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2016-03-25 18:51:40 +01:00
Ilya Dryomov
82dcabad75 libceph: revamp subs code, switch to SUBSCRIBE2 protocol
It is currently hard-coded in the mon_client that mdsmap and monmap
subs are continuous, while osdmap sub is always "onetime".  To better
handle full clusters/pools in the osd_client, we need to be able to
issue continuous osdmap subs.  Revamp subs code to allow us to specify
for each sub whether it should be continuous or not.

Although not strictly required for the above, switch to SUBSCRIBE2
protocol while at it, eliminating the ambiguity between a request for
"every map since X" and a request for "just the latest" when we don't
have a map yet (i.e. have epoch 0).  SUBSCRIBE2 feature bit is now
required - it's been supported since pre-argonaut (2010).

Move "got mdsmap" call to the end of ceph_mdsc_handle_map() - calling
in before we validate the epoch and successfully install the new map
can mess up mon_client sub state.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-03-25 18:51:38 +01:00
Linus Torvalds
1d02369dba Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Final round of fixes for this merge window - some of this has come up
  after the initial pull request, and some of it was put in a post-merge
  branch before the merge window.

  This contains:

   - Fix for a bad check for an error on dma mapping in the mtip32xx
     driver, from Alexey Khoroshilov.

   - A set of fixes for lightnvm, from Javier, Matias, and Wenwei.

   - An NVMe completion record corruption fix from Marta, ensuring that
     we read things in the right order.

   - Two writeback fixes from Tejun, marked for stable@ as well.

   - A blk-mq sw queue iterator fix from Thomas, fixing an oops for
     sparse CPU maps.  They hit this in the hot plug/unplug rework"

* 'for-linus' of git://git.kernel.dk/linux-block:
  nvme: avoid cqe corruption when update at the same time as read
  writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode
  writeback, cgroup: fix premature wb_put() in locked_inode_to_wb_and_lock_list()
  blk-mq: Use proper cpumask iterator
  mtip32xx: fix checks for dma mapping errors
  lightnvm: do not load L2P table if not supported
  lightnvm: do not reserve lun on l2p loading
  nvme: lightnvm: return ppa completion status
  lightnvm: add a bitmap of luns
  lightnvm: specify target's logical address area
  null_blk: add lightnvm null_blk device to the nullb_list
2016-03-24 20:00:44 -07:00
Linus Torvalds
8f40842e42 MTD updates for v4.6
NAND:
  * Add sunxi_nand randomizer support
  * begin refactoring NAND ecclayout structs
  * fix pxa3xx_nand dmaengine usage
  * brcmnand: fix support for v7.1 controller
  * add Qualcomm NAND controller driver
 
 SPI NOR:
  * add new ls1021a, ls2080a support to Freescale QuadSPI
  * add new flash ID entries
  * support bottom-block protection for Winbond flash
  * support Status Register Write Protect
  * remove broken QPI support for Micron SPI flash
 
 JFFS2:
  * improve post-mount CRC scan efficiency
 
 General:
  * refactor bcm63xxpart parser, to later extend for NAND
  * add writebuf size parameter to mtdram
 
 Other minor code quality improvements
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW9CzVAAoJEFySrpd9RFgtQFwQAJdH0wnsZTYfeqToIaD8yMM4
 rtakV/oIMSvMSWuqK+Mx0k6OjGwswgnGZ+tfQLRAYIhb33P8UD0F8Dv5D0x/+zRo
 EgiDlnss/lliXpbh2u4fsANSpFF/JUPXFqU6NanjqQ1rtvR60LUeKOFEz1NRciuV
 Ib6oDLFeXQFxwG0J+EBDo5MrT8aiPODtx4TS8VVo0o0y/WLkEujQPP5592TnCPha
 zX0n9azi26pARo7VLqWjVD8GigY5PadqJAWOZcQr0dGMQv5URtWcCCdThiNsCEzY
 SW9cYSr4CBdy1FIeoJ47yoBg8aFzhyeeuF1efb1U0MoYVL0rdIbznop3Kwilj48L
 Rnh4hvKkrTH16rO6RfKm1lIJaJQYKMErXyEceYMIjV91fEL3qhfbU9W6+Q5HT4hY
 oJmlH+4e/I1Jtf+vW4xFGMYclmYwCO6GJ4HHqnNpby/iH/nZ07hNX3lbxrlqHMwh
 MrSIidqLTsseXcyHBFc+42AsWs8unaYWVB0N3VFkEgl0BFyPObAtvwnHA6zywMvp
 EqJijXFG8VPcztE3eTIMbd0WOkxTjpMT6YHzpZqli/ENxCgu79OWELYrJ0/vC5Uj
 HK0qxgvIzUyJgmikkySDvd/Hc6HWItYonlcAht0VErNfTTfkMwWgRz1W4ZRB6bOJ
 7M83aytLyRYaPGEbwaoR
 =xOlP
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd

Pull MTD updates from Brian Norris:
 "NAND:
   - Add sunxi_nand randomizer support
   - begin refactoring NAND ecclayout structs
   - fix pxa3xx_nand dmaengine usage
   - brcmnand: fix support for v7.1 controller
   - add Qualcomm NAND controller driver

  SPI NOR:
   - add new ls1021a, ls2080a support to Freescale QuadSPI
   - add new flash ID entries
   - support bottom-block protection for Winbond flash
   - support Status Register Write Protect
   - remove broken QPI support for Micron SPI flash

  JFFS2:
   - improve post-mount CRC scan efficiency

  General:
   - refactor bcm63xxpart parser, to later extend for NAND
   - add writebuf size parameter to mtdram

  Other minor code quality improvements"

* tag 'for-linus-20160324' of git://git.infradead.org/linux-mtd: (72 commits)
  mtd: nand: remove kerneldoc for removed function parameter
  mtd: nand: Qualcomm NAND controller driver
  dt/bindings: qcom_nandc: Add DT bindings
  mtd: nand: don't select chip in nand_chip's block_bad op
  mtd: spi-nor: support lock/unlock for a few Winbond chips
  mtd: spi-nor: add TB (Top/Bottom) protect support
  mtd: spi-nor: add SPI_NOR_HAS_LOCK flag
  mtd: spi-nor: use BIT() for flash_info flags
  mtd: spi-nor: disallow further writes to SR if WP# is low
  mtd: spi-nor: make lock/unlock bounds checks more obvious and robust
  mtd: spi-nor: silently drop lock/unlock for already locked/unlocked region
  mtd: spi-nor: wait for SR_WIP to clear on initial unlock
  mtd: nand: simplify nand_bch_init() usage
  mtd: mtdswap: remove useless if (!mtd->ecclayout) test
  mtd: create an mtd_oobavail() helper and make use of it
  mtd: kill the ecclayout->oobavail field
  mtd: nand: check status before reporting timeout
  mtd: bcm63xxpart: give width specifier an 'int', not 'size_t'
  mtd: mtdram: Add parameter for setting writebuf size
  mtd: nand: pxa3xx_nand: kill unused field 'drcmr_cmd'
  ...
2016-03-24 19:57:15 -07:00
Linus Torvalds
88875667eb This pull request contains cleanups and a maintainer update
for UBI and UBIFS.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJW9C7iAAoJEEtJtSqsAOnWLSIP/3ijRdZdGCD9QlKQfb0qg+wH
 kS0yHiPRp6mabMqPrgvDvF5/+ULRkVdEJef1Pcvs24odB2cH8SmwY7sPXTJIAh49
 R9tbAQsgz+nPgplBnCnKaO1Io6aHZ3v4eJZL6UF5Y89B2i8d5zAzql/9Y+a2gFZP
 wTCSxVC+NsO1Bf822701pQXON9fTZbBEYomeb1xIIC2paOp3XrBuLIT6grKv+h2Z
 wvZ9iCuMzWdPkRoIi/qLUkrWVdsuCt2oN/GqKJn5uSbjybS/bIqUYe5dcm322c2s
 vPhcQptgv4zkjHoJd6hyf/hts+y1i1K6NNX3hCGZ1sgL1qldeq5VMAlf+Ksuy9V5
 NlfNTU7/AJEJT1DgcuRqvXBP+sB7W/4T3TTphW7sn4uTFMDQ94N3LBkUC7dOkWLg
 qOIvBJho1im/gZyZS9g3mwg0h7SCnYCNOoZFTU17f5wCyYCLgVaFWIzcBH4iwPXk
 BSADYNv9crYyyV2RDdcYif4QDZ9QwUPanvJdrdH3CsieymboHseMW9N3I53kr3Bq
 lU2pRSeQz9aTLCN1l06bkXHHtoE+RWb5oiL3Gy28a+QwW7dluz58NjIRkT6vEost
 UzIgvBl5GJTjO7JInrIgwDIuAeznwFkG7xDS0VG9lCki+HlIH4R5SGJdqkFtcVIF
 sXeRIQCDh03niv37vwAh
 =y4hr
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS updates from Richard Weinberger:
 "This contains cleanups and a maintainer update for UBI and UBIFS"

* tag 'upstream-4.6-rc1' of git://git.infradead.org/linux-ubifs:
  ubifs: Remove unused header
  MAINTAINERS: Update UBIFS entry
  mtd: ubi: Add logging functions ubi_msg, ubi_warn and ubi_err
  ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn
2016-03-24 19:55:41 -07:00
Linus Torvalds
8b306a2e7c Various bugfixes, a RDMA update from Chuck Lever, and support for a new
pnfs layout type from Christoph Hellwig.  The new layout type is a
 variant of the block layout which uses SCSI features to offer improved
 fencing and device identification.
 
 Note this pull request also includes the client side of SCSI layout,
 with Trond's permission.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW9D0/AAoJECebzXlCjuG+fYcP/ibluAOSRrQ523gQcJNS+QSV
 3B7YY6diJkfQNkm4oAROwPd1KHT2qhoVAO3JHXA3SZnjVVYQxAHeh2wsZJ2jL6Ft
 uyZARxix+F9alJVT3S+uYLwagjh9LXLhb0MaRTMheaWGsPKLQTU4JtsLsjAIhCah
 R0EIIdQfWcb83XoVPmiflVO4Nl/TQWmfA5wHfoVtITJcL3AaC9gzCGNbc8dHLnFC
 HRjGVgHr3nSL3suvUEFfxSEo4QoNPWIX4kBaWXgqbVgOQqmbtQtaXdnd3gIRtkzj
 9Q/lxiwaArtDjdAQdyNtRRBUpkpWo+xWp/vpnNUxTXKoRtpSyqYQX5FaPCPRVAAp
 GYGw2qHrvWn2hSajtVtKyWwsQ3lYsDmbkxAkgScO9kQdS+kuxNyIzYIEvakdtFyJ
 txFsauJczkNNFeHKzLPDoGbuX7KB/+pUsjmX5nYtMhwRriXA5S8zcO4AvTrmTPDF
 vQrLM97mqI60LWmpQUO1OE8CEFPVx5DUZ0KdLMvFNKPZph8BTPJxJMmxJK4R6stV
 /TWglRTEO8IGUh0ww8+3PfMfxVG5XHnQc99+VGVZOS9hJ4GOXbWYAqZ0m+sRJ2Pi
 JPawILie5x2gH1FrVYbcTZsQzdmdn/BF9yePNzAkMucjuEUHXFTlf3MMfEhKpYTl
 0l8LBCv6ZvtGU+PUJxZn
 =MToz
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux

Pull more nfsd updates from Bruce Fields:
 "Apologies for the previous request, which omitted the top 8 commits
  from my for-next branch (including the SCSI layout commits).  Thanks
  to Trond for spotting my error!"

This actually includes the new layout types, so here's that part of
the pull message repeated:

 "Support for a new pnfs layout type from Christoph Hellwig.  The new
  layout type is a variant of the block layout which uses SCSI features
  to offer improved fencing and device identification.

  Note this pull request also includes the client side of SCSI layout,
  with Trond's permission"

* tag 'nfsd-4.6-1' of git://linux-nfs.org/~bfields/linux:
  nfsd: use short read as well as i_size to set eof
  nfsd: better layoutupdate bounds-checking
  nfsd: block and scsi layout drivers need to depend on CONFIG_BLOCK
  nfsd: add SCSI layout support
  nfsd: move some blocklayout code
  nfsd: add a new config option for the block layout driver
  nfs/blocklayout: add SCSI layout support
  nfs4.h: add SCSI layout definitions
2016-03-24 19:50:32 -07:00
Chris Mason
232cad8413 Merge branch 'misc-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.6 2016-03-24 17:36:13 -07:00
Linus Torvalds
5b1e167d8d Various bugfixes, a RDMA update from Chuck Lever, and support for a new
pnfs layout type from Christoph Hellwig.  The new layout type is a
 variant of the block layout which uses SCSI features to offer improved
 fencing and device identification.
 
 (Also: note this pull request also includes the client side of SCSI
 layout, with Trond's permission.)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJW8+uhAAoJECebzXlCjuG+26YP/35DP4MPfszEJ5G0dYq5HMwl
 dJUni8ajSHRswZ/2FqiBsRwmg3Djfc+uoXdOneD1f6ogkDe7S16yp+FRyh8/VwUs
 Ym6LcxSjT28uqkxO0MblcnUl0G9nNSuOwqIsZ0HG7/UC7E6RmCF4o3r5fFUfOsA+
 B3koB5UcHNAFythAk+GDwOQ46Fr96VkZ7Y+OhdNAwmeXZIdKXIufweueI/o2uipB
 RoJFJ7lqrzAjFe+CqAUBr2l2k6lEKzdxbEH6HXQ5+cvVNwfVIgnrONpF78uF/p9T
 NNDnZ+fn3YdRhd+W9RxUHZq7ZL5YOEA8kHsAlloeBH74GqCy7IcS+DrKt1ReM3px
 bhgsXM3dqqJ9xiDGqmeE4VQwRF30SxgYZbO386E+cLHnCYV+vfY6RUaWPrk6On/r
 FL9g3iyVvhyC4HO06Xm+uvvERw8R+fTZY9KZQKH2RL0Tr5DkWRRNJfasMO+PwGOv
 Fdku01vyoA4Y6mbqUgQ9DmrbLO4gK3UyMiOTanQV9shrIDxI0MOuLK03zL25vZCM
 s1A4YBpXmg4gx3XsOFM+tygv6EVujDu6scICeb+hj0vi0oG82Lx7T9e3MJEiYC+T
 jbi8bu+x+0bX2obMprvDNVUzi/PgSUVpGCnRlbRTaXBa0lB6nV7uUiQ1HC9gGesm
 ZWWiOv7du+7WlFP5c6r5
 =mY8w
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux

Pull nfsd updates from Bruce Fields:
 "Various bugfixes, a RDMA update from Chuck Lever, and support for a
  new pnfs layout type from Christoph Hellwig.  The new layout type is a
  variant of the block layout which uses SCSI features to offer improved
  fencing and device identification.

  (Also: note this pull request also includes the client side of SCSI
  layout, with Trond's permission.)"

* tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux:
  sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
  nfsd: recover: fix memory leak
  nfsd: fix deadlock secinfo+readdir compound
  nfsd4: resfh unused in nfsd4_secinfo
  svcrdma: Use new CQ API for RPC-over-RDMA server send CQs
  svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs
  svcrdma: Remove close_out exit path
  svcrdma: Hook up the logic to return ERR_CHUNK
  svcrdma: Use correct XID in error replies
  svcrdma: Make RDMA_ERROR messages work
  rpcrdma: Add RPCRDMA_HDRLEN_ERR
  svcrdma: svc_rdma_post_recv() should close connection on error
  svcrdma: Close connection when a send error occurs
  nfsd: Lower NFSv4.1 callback message size limit
  svcrdma: Do not send Write chunk XDR pad with inline content
  svcrdma: Do not write xdr_buf::tail in a Write chunk
  svcrdma: Find client-provided write and reply chunks once per reply
  nfsd: Update NFS server comments related to RDMA support
  nfsd: Fix a memory leak when meeting unsupported state_protect_how4
  nfsd4: fix bad bounds checking
2016-03-24 10:41:00 -07:00
Martin Brandenburg
fecd86aac5 ornagefs: ensure that truncate has an up to date inode size
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:16 -04:00
Martin Brandenburg
e8da254c41 orangefs: move code which sets i_link to orangefs_inode_getattr
Everything else setting inode->i_ values is in there.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:16 -04:00
Martin Brandenburg
05d31c5cb3 orangefs: remove needless wrapper around GFP_KERNEL
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg
93d53a4885 orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg
266626339b orangefs: refactor inode type or link_target change detection
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg
5859d77e56 orangefs: use new getattr for revalidate and remove old getattr
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg
8f24928d19 orangefs: use new getattr in inode getattr and permission
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:15 -04:00
Martin Brandenburg
e2f7f0d798 orangefs: use new orangefs_inode_getattr to get size in write and llseek
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:14 -04:00
Martin Brandenburg
075cca50b6 orangefs: use new orangefs_inode_getattr to create new inodes
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:14 -04:00
Martin Brandenburg
3c9cf98d7b orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
This is motivated by orangefs_inode_old_getattr's habit of writing over
live inodes.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:14 -04:00
Martin Brandenburg
d57521a653 orangefs: remove inode->i_lock wrapper
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2016-03-23 17:36:13 -04:00
Benjamin Coddington
ac503e4a30 nfsd: use short read as well as i_size to set eof
Use the result of a local read to determine when to set the eof flag.  This
allows us to return the location of the end of the file atomically at the
time of the read.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
[bfields: add some documentation]
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2016-03-23 16:02:39 -04:00
Linus Torvalds
c130423620 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes the following issues:

  API:
   - Fix kzalloc error path crash in ecryptfs added by skcipher
     conversion.  Note the subject of the commit is screwed up and the
     correct subject is actually in the body.

  Drivers:
   - A number of fixes to the marvell cesa hashing code.
   - Remove bogus nested irqsave that clobbers the saved flags in ccp"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: marvell/cesa - forward devm_ioremap_resource() error code
  crypto: marvell/cesa - initialize hash states
  crypto: marvell/cesa - fix memory leak
  crypto: ccp - fix lock acquisition code
  eCryptfs: Use skcipher and shash
2016-03-23 06:12:39 -07:00
Linus Torvalds
a24e3d414e Merge branch 'akpm' (patches from Andrew)
Merge third patch-bomb from Andrew Morton:

 - more ocfs2 changes

 - a few hotfixes

 - Andy's compat cleanups

 - misc fixes to fatfs, ptrace, coredump, cpumask, creds, eventfd,
   panic, ipmi, kgdb, profile, kfifo, ubsan, etc.

 - many rapidio updates: fixes, new drivers.

 - kcov: kernel code coverage feature.  Like gcov, but not
   "prohibitively expensive".

 - extable code consolidation for various archs

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (81 commits)
  ia64/extable: use generic search and sort routines
  x86/extable: use generic search and sort routines
  s390/extable: use generic search and sort routines
  alpha/extable: use generic search and sort routines
  kernel/...: convert pr_warning to pr_warn
  drivers: dma-coherent: use memset_io for DMA_MEMORY_IO mappings
  drivers: dma-coherent: use MEMREMAP_WC for DMA_MEMORY_MAP
  memremap: add MEMREMAP_WC flag
  memremap: don't modify flags
  kernel/signal.c: add compile-time check for __ARCH_SI_PREAMBLE_SIZE
  mm/mprotect.c: don't imply PROT_EXEC on non-exec fs
  ipc/sem: make semctl setting sempid consistent
  ubsan: fix tree-wide -Wmaybe-uninitialized false positives
  kfifo: fix sparse complaints
  scripts/gdb: account for changes in module data structure
  scripts/gdb: add cmdline reader command
  scripts/gdb: add version command
  kernel: add kcov code coverage
  profile: hide unused functions when !CONFIG_PROC_FS
  hpwdt: use nmi_panic() when kernel panics in NMI handler
  ...
2016-03-22 17:09:14 -07:00
Paolo Bonzini
a484c3dd94 eventfd: document lockless access in eventfd_poll
Since commit e22553e2a25e ("eventfd: don't take the spinlock in
eventfd_poll", 2015-02-17), eventfd is reading ctx->count outside
ctx->wqh.lock.

However, things aren't as simple as the read barrier in eventfd_poll
would suggest.  In fact, the read barrier, besides lacking a comment, is
not paired in any obvious manner with another read barrier, and it is
pointless because it is sitting between a write (deep in poll_wait) and
the read of ctx->count.  The read barrier is acting just as a compiler
barrier, for which we can use READ_ONCE instead.  This is what the code
change in this patch does.

The documentation change is just as important, however.  The question,
posed by Andrea Arcangeli, is then why the thing is safe on
architectures where spin_unlock does not imply a store-load memory
barrier.  The answer is that it's safe because writes of ctx->count use
the same lock as poll_wait, and hence an acquire barrier implicit in
poll_wait provides the necessary synchronization between eventfd_poll
and callers of wake_up_locked_poll.  This is sort of mentioned in the
commit message with respect to eventfd_ctx_read ("eventfd_read is
similar, it will do a single decrement with the lock held") but it
applies to all other callers too.  It's tricky enough that it should be
documented in the code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Chris Mason <clm@fb.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Jann Horn
378c6520e7 fs/coredump: prevent fsuid=0 dumps into user-controlled directories
This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:

 - The fs.suid_dumpable sysctl is set to 2.
 - The kernel.core_pattern sysctl's value starts with "/". (Systems
   where kernel.core_pattern starts with "|/" are not affected.)
 - Unprivileged user namespace creation is permitted. (This is
   true on Linux >=3.8, but some distributions disallow it by
   default using a distro patch.)

Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.

To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Maciej S. Szmigiero
3873938068 fat: add config option to set UTF-8 mount option by default
FAT has long supported its own default file name encoding config
setting, separate from CONFIG_NLS_DEFAULT.

However, if UTF-8 encoded file names are desired FAT character set
should not be set to utf8 since this would make file names case
sensitive even if case insensitive matching is requested.  Instead,
"utf8" mount options should be provided to enable UTF-8 file names in
FAT file system.

Unfortunately, there was no possibility to set the default value of this
option so on UTF-8 system "utf8" mount option had to be added manually
to most FAT mounts.

This patch adds config option to set such default value.

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Andy Lutomirski
121cef8f17 ext4: in ext4_dir_llseek, check syscall bitness directly
ext4 treats directory offsets differently for 32-bit and 64-bit callers.
Check the caller type using in_compat_syscall, not is_compat_task.  This
changes behavior on SPARC slightly.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He
d56a8f32e4 ocfs2: check/fix inode block for online file check
Implement online check or fix inode block during reading a inode block
to memory.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He
a849d46816 ocfs2: create/remove sysfile for online file check
Create online file check sysfile when ocfs2 mount, remove the related
sysfile when ocfs2 umount.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He
a860f6eb4c ocfs2: sysfile interfaces for online file check
Implement online file check sysfile interfaces, e.g. how to create the
related sysfile according to device name, how to display/handle file
check request from the sysfile.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00
Gang He
9dde5e4f33 ocfs2: export ocfs2_kset for online file check
When there are errors in the ocfs2 filesystem, they are usually
accompanied by the inode number which caused the error.  This inode
number would be the input to fixing the file.  One of these options
could be considered:

A file in the sys filesytem which would accept inode numbers.  This
could be used to communication back what has to be fixed or is fixed.
You could write:

  $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/check

or

  $# echo "<inode>" > /sys/fs/ocfs2/devname/filecheck/fix

Compare with second version, I re-design filecheck sysfs interfaces,
there are three sysfs files (check, fix and set) under filecheck
directory (see above), sysfs will accept only one argument <inode>.
Second, I adjust some code in ocfs2_filecheck_repair_inode_block()
function according to upstream feedback, we cannot just add VALID_FL
flag back as a inode block fix, then we will not fix this field
corruption currently until having a complete solution.  Compare with
first version, I use strncasecmp instead of double strncmp functions.
Second, update the source file contribution vendor.

This patch (of 4):

Export ocfs2_kset object from ocfs2_stackglue kernel module, then online
file check code will create the related sysfiles under ocfs2_kset
object.  We're exporting this because it's built in ocfs2_stackglue.ko.

Signed-off-by: Gang He <ghe@suse.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-03-22 15:36:02 -07:00