900783 Commits

Author SHA1 Message Date
Mauro Carvalho Chehab
c849d86139 docs: kvm: Convert ppc-pv.txt to ReST format
- Use document title and chapter markups;
- Add markups for tables;
- Use list markups;
- Add markups for literal blocks;
- Add blank lines.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:05 +01:00
Mauro Carvalho Chehab
320f3f74d9 docs: kvm: Convert nested-vmx.txt to ReST format
This file is almost in ReST format. Just a small set of
changes were needed:

    - Add markups for lists;
    - Add markups for a literal block;
    - Adjust whitespaces.

While here, use the standard markup for the document title.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:04 +01:00
Mauro Carvalho Chehab
037d1f92ef docs: kvm: Convert mmu.txt to ReST format
- Use document title and chapter markups;
- Add markups for tables;
- Add markups for literal blocks;
- Add blank lines and adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:03 +01:00
Mauro Carvalho Chehab
75e7fcdb4a docs: kvm: Convert locking.txt to ReST format
- Use document title and chapter markups;
- Add markups for literal blocks;
- use :field: for field descriptions;
- Add blank lines and adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:02 +01:00
Mauro Carvalho Chehab
5a0af4806c docs: kvm: Convert hypercalls.txt to ReST format
- Use document title and chapter markups;
- Convert tables;
- Add markups for literal blocks;
- use :field: for field descriptions;
- Add blank lines and adjust indentation

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:02 +01:00
Mauro Carvalho Chehab
cec0e48be3 docs: kvm: arm/psci.txt: convert to ReST
- Add a title for the document;
- Adjust whitespaces for it to be properly formatted after
  parsed.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:01 +01:00
Mauro Carvalho Chehab
69bf758bc8 docs: kvm: convert arm/hyp-abi.txt to ReST
- Add proper markups for titles;
- Adjust whitespaces and blank lines to match ReST
  needs;
- Mark literal blocks as such.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:10:00 +01:00
Mauro Carvalho Chehab
106ee47dc6 docs: kvm: Convert api.txt to ReST format
convert api.txt document to ReST format while trying to keep
its format as close as possible with the authors intent, and
avoid adding uneeded markups.

- Use document title and chapter markups;
- Convert tables;
- Add markups for literal blocks;
- use :field: for field descriptions;
- Add blank lines and adjust indentation

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:59 +01:00
Mauro Carvalho Chehab
d3b52e4976 docs: kvm: convert devices/xive.txt to ReST
- Use title markups;
- adjust indentation and add blank lines as needed;
- adjust tables to match ReST accepted formats;
- mark code blocks as such.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:58 +01:00
Mauro Carvalho Chehab
5cccf37974 docs: kvm: convert devices/xics.txt to ReST
- Use title markups;
- adjust indentation and add blank lines as needed;
- adjust tables to match ReST accepted formats;
- use :field: markups.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:58 +01:00
Mauro Carvalho Chehab
6c972ba685 docs: kvm: convert devices/vm.txt to ReST
- Use title markups;
- adjust indentation and add blank lines as needed;
- use :field: markups;
- Use cross-references;
- mark code blocks as such.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:57 +01:00
Mauro Carvalho Chehab
aff7aeea54 docs: kvm: convert devices/vfio.txt to ReST
- Use standard title markup;
- adjust lists;
- mark code blocks as such.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:56 +01:00
Mauro Carvalho Chehab
e777a5bd98 docs: kvm: convert devices/vcpu.txt to ReST
- Use title markups;
- adjust indentation and add blank lines as needed;
- adjust tables to match ReST accepted formats;
- use :field: markups;
- mark code blocks as such.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:55 +01:00
Mauro Carvalho Chehab
e944743003 docs: kvm: convert devices/s390_flic.txt to ReST
- Use standard markup for document title;
- Adjust indentation and add blank lines as needed;
- use the notes markup;
- mark code blocks as such.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:54 +01:00
Mauro Carvalho Chehab
05c47036c6 docs: kvm: convert devices/mpic.txt to ReST
This document is almost in ReST format. The only thing
needed is to mark a list as such and to add an extra
whitespace.

Yet, let's also use the standard document title markup,
as it makes easier if anyone wants later to add sessions
to it.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:54 +01:00
Mauro Carvalho Chehab
bf6154dba0 docs: kvm: convert devices/arm-vgit.txt to ReST
- Use title markups;
- change indent to match ReST syntax;
- use proper table markups;
- use literal block markups.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:53 +01:00
Mauro Carvalho Chehab
c0d1c8a0af docs: kvm: devices/arm-vgit-v3.txt to ReST
- Use title markups;
- change indent to match ReST syntax;
- use proper table markups;
- use literal block markups.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:52 +01:00
Mauro Carvalho Chehab
d371c011fc docs: kvm: devices/arm-vgic-its.txt to ReST format
- Fix document title to match ReST format
- Convert the table to be properly recognized
- use proper markups for literal blocks
- Some indentation fixes to match ReST

While here, add an index for kvm devices.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:51 +01:00
Mauro Carvalho Chehab
263a19ff21 docs: virt: Convert msr.txt to ReST format
- Use document title markup;
- Convert tables;
- Add blank lines and adjust indentation.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:51 +01:00
Mauro Carvalho Chehab
2756df60d0 docs: virt: convert halt-polling.txt to ReST format
- Fix document title to match ReST format
- Convert the table to be properly recognized
- Some indentation fixes to match ReST syntax.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:50 +01:00
Mauro Carvalho Chehab
c09708ccb4 docs: virt: user_mode_linux.rst: fix URL references
Several URLs are pointing to outdated places.

Update the references for the URLs whose contents still exists,
removing the others.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:49 +01:00
Mauro Carvalho Chehab
72f8a49dc8 docs: virt: user_mode_linux.rst: update compiling instructions
Instead of pointing for a pre-2.4 and a seaparate patch,
update it to match current upstream, as UML was merged
a long time ago.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:48 +01:00
Mauro Carvalho Chehab
7d94ab169b docs: virt: convert UML documentation to ReST
Despite being an old document, it contains lots of information
that could still be useful.

The document has a nice style with makes easy to convert to
ReST. So, let's convert it to ReST.

This patch does:

	- Use proper markups for titles;
	- Mark and proper indent literal blocks;
	- don't use an 'o' character for lists;
	- other minor changes required for the doc to be parsed.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:47 +01:00
Mauro Carvalho Chehab
7bd460fc1d docs: kvm: add arm/pvtime.rst to index.rst
Add this file to a new kvm/arm index.rst, in order for it to
be shown as part of the virt book.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:47 +01:00
Paolo Bonzini
9446e6fce0 KVM: x86: fix WARN_ON check of an unsigned less than zero
The check cpu->hv_clock.system_time < 0 is redundant since system_time
is a u64 and hence can never be less than zero.  But what was actually
meant is to check that the result is positive, since kernel_ns and
v->kvm->arch.kvmclock_offset are both s64.

Reported-by: Colin King <colin.king@canonical.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Addresses-Coverity: ("Macro compares unsigned to 0")
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:46 +01:00
Eric Auger
ff47902534 selftests: KVM: Remove unused x86_register enum
x86_register enum is not used, let's remove it.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:45 +01:00
Sean Christopherson
f6ab0107a4 KVM: x86/mmu: Fix struct guest_walker arrays for 5-level paging
Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow
paging for 5-level guest page tables.  PT_MAX_FULL_LEVELS is used to
size the arrays that track guest pages table information, i.e. using a
"max levels" of 4 causes KVM to access garbage beyond the end of an
array when querying state for level 5 entries.  E.g. FNAME(gpte_changed)
will read garbage and most likely return %true for a level 5 entry,
soft-hanging the guest because FNAME(fetch) will restart the guest
instead of creating SPTEs because it thinks the guest PTE has changed.

Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS
gets to stay "4" for the PTTYPE_EPT case.

Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:44 +01:00
Sean Christopherson
148d735eb5 KVM: nVMX: Use correct root level for nested EPT shadow page tables
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU
currently also hardcodes the page walk level for nested EPT to be 4
levels.  The L2 guest is all but guaranteed to soft hang on its first
instruction when L1 is using EPT, as KVM will construct 4-level page
tables and then tell hardware to use 5-level page tables.

Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:43 +01:00
Miaohe Lin
ffdbd50dca KVM: nVMX: Fix some comment typos and coding style
Fix some typos in the comments. Also fix coding style.
[Sean Christopherson rewrites the comment of write_fault_to_shadow_pgtable
field in struct kvm_vcpu_arch.]

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:43 +01:00
Sean Christopherson
7a02674d15 KVM: x86/mmu: Avoid retpoline on ->page_fault() with TDP
Wrap calls to ->page_fault() with a small shim to directly invoke the
TDP fault handler when the kernel is using retpolines and TDP is being
used.  Single out the TDP fault handler and annotate the TDP path as
likely to coerce the compiler into preferring it over the indirect
function call.

Rename tdp_page_fault() to kvm_tdp_page_fault(), as it's exposed outside
of mmu.c to allow inlining the shim.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:42 +01:00
Miaohe Lin
331ca0f89f KVM: apic: reuse smp_wmb() in kvm_make_request()
kvm_make_request() provides smp_wmb() so pending_events changes are
guaranteed to be visible.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:41 +01:00
Miaohe Lin
20796447a1 KVM: x86: remove duplicated KVM_REQ_EVENT request
The KVM_REQ_EVENT request is already made in kvm_set_rflags(). We should
not make it again.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:40 +01:00
Eric Auger
1ea2cc0cd7 selftests: KVM: SVM: Add vmcall test
L2 guest calls vmcall and L1 checks the exit status does
correspond.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Wei Huang <wei.huang2@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:39 +01:00
Eric Auger
20ba262f86 selftests: KVM: AMD Nested test infrastructure
Add the basic infrastructure needed to test AMD nested SVM.
This is largely copied from the KVM unit test infrastructure.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:39 +01:00
Eric Auger
1ecaabed4e selftests: KVM: Replace get_{gdt,idt}_base() by get_{gdt,idt}()
get_gdt_base() and get_idt_base() only return the base address
of the descriptor tables. Soon we will need to get the size as well.
Change the prototype of those functions so that they return
the whole desc_ptr struct instead of the address field.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Wei Huang <wei.huang2@amd.com>
Reviewed-by: Krish Sadhukhan <krish.sadhukhan@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2020-02-12 20:09:38 +01:00
Trond Myklebust
efeda80da3 NFSv4: Fix revalidation of dentries with delegations
If a dentry was not initially looked up while we were holding a
delegation, then we do still need to revalidate that it still holds
the same name. If there are multiple hard links to the same file,
then all the hard links need validation.

Reported-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
[Anna: Put nfs_unset_verifier_delegated() under CONFIG_NFS_V4]
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-12 13:55:25 -05:00
Linus Torvalds
f2850dd5ee Kbuild fixes for v5.6
- fix memory corruption in scripts/kallsyms
 
  - fix the vmlinux link stage to correctly update compile.h
 -----BEGIN PGP SIGNATURE-----
 
 iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAl5EH2wVHG1hc2FoaXJv
 eUBrZXJuZWwub3JnAAoJED2LAQed4NsGX/UP/0j/U3FxvLba8Ke/hIObdfZQyuhp
 kxvhq4+Mgylx9JQ84nME5HlFpoW83qyvLQGcEnj5yErMiYfOX62h8TPVKjMg5tvu
 w5Wik777pc0bAbrwBCzmO19rDzWX7TH/Gg0HrmgsZFNynF/mlFXVYtCJ0cIgmzxH
 zI9W/UpZX9Mns7gzdn0kig50KoidcdgH3W78KApCBrMK75KC3E4qSWumm11ToCGe
 Yh4n/TsqWhsMdJpnZRYIZq144bOHGvTqj62fgR3UgOIp4fgYEQZ1Eh/phMFu3YXm
 63nr5tx7lsRIb1RrFg9rljktEUAB9I9Nia9+Dpp8lDGnifciZFnyZm6mee0c8RzE
 NMD3FMjp+jSE9LfahdkrunPepI1vlsS0XQbRlSOTu26lEZiYOHiYZomE2Z1Jt5bf
 ancQJUMoQ2rRWAN816r1p2+CiRn/kRI0vWtxJmQylHx5AIN9Gaw6bF1/b67C3jVn
 QNrS7i0nsVZ45tUuzVDxwUMtsqRzS892Me0WvEua2vd53623qiVQnLeyIACdg40X
 HzgF8Q3EChyjR5248x7C4q0Vr3srwUIgHcPHM3I64dmCXAVGOH1G62V7s1BqZ3w7
 74xKy8RsCmpmODOvMSUMajBqouXR9LtXe+vcoECMjksvRbv5DQYHjM52lmBHROYq
 Qg/98qM7Lhr9MH0E
 =4lc5
 -----END PGP SIGNATURE-----

Merge tag 'kbuild-fixes-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - fix memory corruption in scripts/kallsyms

 - fix the vmlinux link stage to correctly update compile.h

* tag 'kbuild-fixes-v5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kbuild: fix mismatch between .version and include/generated/compile.h
  scripts/kallsyms: fix memory corruption caused by write over-run
2020-02-12 10:28:42 -08:00
Kunihiko Hayashi
b9287f2ac3 net: ethernet: ave: Add capability of rgmii-id mode
This allows you to specify the type of rgmii-id that will enable phy
internal delay in ethernet phy-mode.

This adds all RGMII cases to all of get_pinmode() except LD11, because LD11
SoC doesn't support RGMII due to the constraint of the hardware. When RGMII
phy mode is specified in the devicetree for LD11, the driver will abort
with an error.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-12 09:55:04 -08:00
Firo Yang
0f90522591 enic: prevent waking up stopped tx queues over watchdog reset
Recent months, our customer reported several kernel crashes all
preceding with following message:
NETDEV WATCHDOG: eth2 (enic): transmit queue 0 timed out
Error message of one of those crashes:
BUG: unable to handle kernel paging request at ffffffffa007e090

After analyzing severl vmcores, I found that most of crashes are
caused by memory corruption. And all the corrupted memory areas
are overwritten by data of network packets. Moreover, I also found
that the tx queues were enabled over watchdog reset.

After going through the source code, I found that in enic_stop(),
the tx queues stopped by netif_tx_disable() could be woken up over
a small time window between netif_tx_disable() and the
napi_disable() by the following code path:
napi_poll->
  enic_poll_msix_wq->
     vnic_cq_service->
        enic_wq_service->
           netif_wake_subqueue(enic->netdev, q_number)->
              test_and_clear_bit(__QUEUE_STATE_DRV_XOFF, &txq->state)
In turn, upper netowrk stack could queue skb to ENIC NIC though
enic_hard_start_xmit(). And this might introduce some race condition.

Our customer comfirmed that this kind of kernel crash doesn't occur over
90 days since they applied this patch.

Signed-off-by: Firo Yang <firo.yang@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-12 09:43:26 -08:00
Anand Jain
1b9867eb61 btrfs: sysfs, move device id directories to UUID/devinfo
Originally it was planned to create device id directories under
UUID/devinfo, but it got under UUID/devices by mistake. We really want
it under definfo so the bare device node names are not mixed with device
ids and are easy to enumerate.

Fixes: 668e48af7a94 ("btrfs: sysfs, add devid/dev_state kobject and device attributes")
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 18:28:22 +01:00
Anand Jain
a013d141ec btrfs: sysfs, add UUID/devinfo kobject
Create directory /sys/fs/btrfs/UUID/devinfo to hold devices directories
by the id (unlike /devices).

Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 18:28:18 +01:00
Geert Uytterhoeven
d91771848f arm64: time: Replace <linux/clk-provider.h> by <linux/of_clk.h>
The arm64 time code is not a clock provider, and just needs to call
of_clk_init().

Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>.

Reviewed-by: Stephen Boyd <sboyd@kernel.org>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Will Deacon <will@kernel.org>
2020-02-12 17:26:38 +00:00
Filipe Manana
28553fa992 Btrfs: fix race between shrinking truncate and fiemap
When there is a fiemap executing in parallel with a shrinking truncate
we can end up in a situation where we have extent maps for which we no
longer have corresponding file extent items. This is generally harmless
and at the moment the only consequences are missing file extent items
representing holes after we expand the file size again after the
truncate operation removed the prealloc extent items, and stale
information for future fiemap calls (reporting extents that no longer
exist or may have been reallocated to other files for example).

Consider the following example:

1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0
   and a 1MiB prealloc extent at file offset 128KiB;

2) Task A starts doing a shrinking truncate of our inode to reduce it to
   a size of 64KiB. Before it searches the subvolume tree for file
   extent items to delete, it drops all the extent maps in the range
   from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache();

3) Task B starts doing a fiemap against our inode. When looking up for
   the inode's extent maps in the range from 128KiB to (u64)-1, it
   doesn't find any in the inode's extent map tree, since they were
   removed by task A.  Because it didn't find any in the extent map
   tree, it scans the inode's subvolume tree for file extent items, and
   it finds the 1MiB prealloc extent at file offset 128KiB, then it
   creates an extent map based on that file extent item and adds it to
   inode's extent map tree (this ends up being done by
   btrfs_get_extent() <- btrfs_get_extent_fiemap() <-
   get_extent_skip_holes());

4) Task A then drops the prealloc extent at file offset 128KiB and
   shrinks the 128KiB extent file offset 0 to a length of 64KiB. The
   truncation operation finishes and we end up with an extent map
   representing a 1MiB prealloc extent at file offset 128KiB, despite we
   don't have any more that extent;

After this the two types of problems we have are:

1) Future calls to fiemap always report that a 1MiB prealloc extent
   exists at file offset 128KiB. This is stale information, no longer
   correct;

2) If the size of the file is increased, by a truncate operation that
   increases the file size or by a write into a file offset > 64KiB for
   example, we end up not inserting file extent items to represent holes
   for any range between 128KiB and 128KiB + 1MiB, since the hole
   expansion function, btrfs_cont_expand() will skip hole insertion for
   any range for which an extent map exists that represents a prealloc
   extent. This causes fsck to complain about missing file extent items
   when not using the NO_HOLES feature.

The second issue could be often triggered by test case generic/561 from
fstests, which runs fsstress and duperemove in parallel, and duperemove
does frequent fiemap calls.

Essentially the problems happens because fiemap does not acquire the
inode's lock while truncate does, and fiemap locks the file range in the
inode's iotree while truncate does not. So fix the issue by making
btrfs_truncate_inode_items() lock the file range from the new file size
to (u64)-1, so that it serializes with fiemap.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 17:17:10 +01:00
David Sterba
10a3a3edc5 btrfs: log message when rw remount is attempted with unclean tree-log
A remount to a read-write filesystem is not safe when there's tree-log
to be replayed. Files that could be opened until now might be affected
by the changes in the tree-log.

A regular mount is needed to replay the log so the filesystem presents
the consistent view with the pending changes included.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 17:17:00 +01:00
David Sterba
e8294f2f6a btrfs: print message when tree-log replay starts
There's no logged information about tree-log replay although this is
something that points to previous unclean unmount. Other filesystems
report that as well.

Suggested-by: Chris Murphy <lists@colorremedies.com>
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 17:16:57 +01:00
Filipe Manana
ac05ca913e Btrfs: fix race between using extent maps and merging them
We have a few cases where we allow an extent map that is in an extent map
tree to be merged with other extents in the tree. Such cases include the
unpinning of an extent after the respective ordered extent completed or
after logging an extent during a fast fsync. This can lead to subtle and
dangerous problems because when doing the merge some other task might be
using the same extent map and as consequence see an inconsistent state of
the extent map - for example sees the new length but has seen the old start
offset.

With luck this triggers a BUG_ON(), and not some silent bug, such as the
following one in __do_readpage():

  $ cat -n fs/btrfs/extent_io.c
  3061  static int __do_readpage(struct extent_io_tree *tree,
  3062                           struct page *page,
  (...)
  3127                  em = __get_extent_map(inode, page, pg_offset, cur,
  3128                                        end - cur + 1, get_extent, em_cached);
  3129                  if (IS_ERR_OR_NULL(em)) {
  3130                          SetPageError(page);
  3131                          unlock_extent(tree, cur, end);
  3132                          break;
  3133                  }
  3134                  extent_offset = cur - em->start;
  3135                  BUG_ON(extent_map_end(em) <= cur);
  (...)

Consider the following example scenario, where we end up hitting the
BUG_ON() in __do_readpage().

We have an inode with a size of 8KiB and 2 extent maps:

  extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by
            a previous transaction

  extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet
            persisted but writeback started for it already. The extent map
	    is pinned since there's writeback and an ordered extent in
	    progress, so it can not be merged with extent map A yet

The following sequence of steps leads to the BUG_ON():

1) The ordered extent for extent B completes, the respective page gets its
   writeback bit cleared and the extent map is unpinned, at that point it
   is not yet merged with extent map A because it's in the list of modified
   extents;

2) Due to memory pressure, or some other reason, the MM subsystem releases
   the page corresponding to extent B - btrfs_releasepage() is called and
   returns 1, meaning the page can be released as it's not dirty, not under
   writeback anymore and the extent range is not locked in the inode's
   iotree. However the extent map is not released, either because we are
   not in a context that allows memory allocations to block or because the
   inode's size is smaller than 16MiB - in this case our inode has a size
   of 8KiB;

3) Task B needs to read extent B and ends up __do_readpage() through the
   btrfs_readpage() callback. At __do_readpage() it gets a reference to
   extent map B;

4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B
   while holding the write lock on the inode's extent map tree - this
   results in try_merge_map() being called and since it's possible to merge
   extent map B with extent map A now (the extent map B was removed from
   the list of modified extents), the merging begins - it sets extent map
   B's start offset to 0 (was 4KiB), but before it increments the map's
   length to 8KiB (4kb + 4KiB), task A is at:

   BUG_ON(extent_map_end(em) <= cur);

   The call to extent_map_end() sees the extent map has a start of 0
   and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so
   the BUG_ON() is triggered.

So it's dangerous to modify an extent map that is in the tree, because some
other task might have got a reference to it before and still using it, and
needs to see a consistent map while using it. Generally this is very rare
since most paths that lookup and use extent maps also have the file range
locked in the inode's iotree. The fsync path is pretty much the only
exception where we don't do it to avoid serialization with concurrent
reads.

Fix this by not allowing an extent map do be merged if if it's being used
by tasks other then the one attempting to merge the extent map (when the
reference count of the extent map is greater than 2).

Reported-by: ryusuke1925 <st13s20@gm.ibaraki-ct.ac.jp>
Reported-by: Koki Mitani <koki.mitani.xg@hco.ntt.co.jp>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211
CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 17:16:46 +01:00
Wenwen Wang
f311ade3a7 btrfs: ref-verify: fix memory leaks
In btrfs_ref_tree_mod(), 'ref' and 'ra' are allocated through kzalloc() and
kmalloc(), respectively. In the following code, if an error occurs, the
execution will be redirected to 'out' or 'out_unlock' and the function will
be exited. However, on some of the paths, 'ref' and 'ra' are not
deallocated, leading to memory leaks. For example, if 'action' is
BTRFS_ADD_DELAYED_EXTENT, add_block_entry() will be invoked. If the return
value indicates an error, the execution will be redirected to 'out'. But,
'ref' is not deallocated on this path, causing a memory leak.

To fix the above issues, deallocate both 'ref' and 'ra' before exiting from
the function when an error is encountered.

CC: stable@vger.kernel.org # 4.15+
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2020-02-12 17:16:31 +01:00
Arnaldo Carvalho de Melo
2a8d017d46 tools headers kvm: Sync linux/kvm.h with the kernel sources
To pick up the changes from:

  7de3f1423ff9 ("KVM: s390: Add new reset vcpu API")

So far we're ignoring those arch specific ioctls, we need to revisit
this at some time to have arch specific tables, etc:

  $ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh
  	egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \
  $

This addresses these tools/perf build warnings:

  Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h'
  diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Janosch Frank <frankja@linux.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-12 12:45:24 -03:00
Arnaldo Carvalho de Melo
391df72fbd tools headers kvm: Sync kvm headers with the kernel sources
To pick up the changes from:

  290a6bb06de9 ("arm64: KVM: Add UAPI notes for swapped registers")

No tools changes are caused by this.

This addresses these tools/perf build warnings:

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Jones <drjones@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-12 12:41:20 -03:00
Arnaldo Carvalho de Melo
71dd652897 tools arch x86: Sync asm/cpufeatures.h with the kernel sources
To pick up the changes from:

  85c17291e2eb ("x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured")
  f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB")

These don't cause any changes in tooling, just silences this perf build
warning:

  Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
  diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-02-12 12:33:34 -03:00