IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Modify the writepage handler to find and convert pending delalloc
extents to real allocations. Furthermore, when we're doing non-cow
writes to a part of a file that already has a CoW reservation (the
cowextsz hint that we set up in a subsequent patch facilitates this),
promote the write to copy-on-write so that the entire extent can get
written out as a single extent on disk, thereby reducing post-CoW
fragmentation.
Christoph moved the CoW support code in _map_blocks to a separate helper
function, refactored other functions, and reduced the number of CoW fork
lookups, so I merged those changes here to reduce churn.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Modify xfs_bmap_add_extent_delay_real() so that we can convert delayed
allocation extents in the CoW fork to real allocations, and wire this
up all the way back to xfs_iomap_write_allocate(). In a subsequent
patch, we'll modify the writepage handler to call this.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Wire up iomap_begin to detect shared extents and create delayed allocation
extents in the CoW fork:
1) Check if we already have an extent in the COW fork for the area.
If so nothing to do, we can move along.
2) Look up block number for the current extent, and if there is none
it's not shared move along.
3) Unshare the current extent as far as we are going to write into it.
For this we avoid an additional COW fork lookup and use the
information we set aside in step 1) above.
4) Goto 1) unless we've covered the whole range.
Last but not least, this updates the xfs_reflink_reserve_cow_range calling
convention to pass a byte offset and length, as that is what both callers
expect anyway. This patch has been refactored considerably as part of the
iomap transition.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Allow the creation of delayed allocation extents in the CoW fork. In
a subsequent patch we'll wire up iomap_begin to actually do this via
reflink helper functions.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Introduce a new in-core fork for storing copy-on-write delalloc
reservations and allocated extents that are in the process of being
written out.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Only non-rt files can be reflinked, so check that when we load an
inode. Also, don't leak the attr fork if there's a failure.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Report the reflink feature in the XFS geometry so that xfs_info and
friends know the filesystem has this feature.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Define all the tracepoints we need to inspect the runtime operation
of reflink/dedupe/copy-on-write.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Return the range of file blocks that bunmapi didn't free. This hint
is used by CoW and reflink to figure out what part of an extent
actually got freed so that it can set up the appropriate atomic
remapping of just the freed range.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pull security subsystem updates from James Morris:
SELinux/LSM:
- overlayfs support, necessary for container filesystems
LSM:
- finally remove the kernel_module_from_file hook
Smack:
- treat signal delivery as an 'append' operation
TPM:
- lots of bugfixes & updates
Audit:
- new audit data type: LSM_AUDIT_DATA_FILE
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (47 commits)
Revert "tpm/tpm_crb: implement tpm crb idle state"
Revert "tmp/tpm_crb: fix Intel PTT hw bug during idle state"
Revert "tpm/tpm_crb: open code the crb_init into acpi_add"
Revert "tmp/tpm_crb: implement runtime pm for tpm_crb"
lsm,audit,selinux: Introduce a new audit data type LSM_AUDIT_DATA_FILE
tmp/tpm_crb: implement runtime pm for tpm_crb
tpm/tpm_crb: open code the crb_init into acpi_add
tmp/tpm_crb: fix Intel PTT hw bug during idle state
tpm/tpm_crb: implement tpm crb idle state
tpm: add check for minimum buffer size in tpm_transmit()
tpm: constify TPM 1.x header structures
tpm/tpm_crb: fix the over 80 characters checkpatch warring
tpm/tpm_crb: drop useless cpu_to_le32 when writing to registers
tpm/tpm_crb: cache cmd_size register value.
tmp/tpm_crb: drop include to platform_device
tpm/tpm_tis: remove unused itpm variable
tpm_crb: fix incorrect values of cmdReady and goIdle bits
tpm_crb: refine the naming of constants
tpm_crb: remove wmb()'s
tpm_crb: fix crb_req_canceled behavior
...
1. Fabian Frederick submitted a nice cleanup that uses the BIT macro
rather than bit shifting.
2. Andreas Gruenbacher contributed a patch that fixes a long-standing
annoyance whereby GFS2 warned about dirty pages.
3. Andreas also fixed a problem with the recent extended attribute
readahead feature.
4. Chao Yu contributed a patch that checks the return code from function
register_shrinker and reacts accordingly. Previously, it was not checked.
5. Andreas Gruenbacher also fixed a problem whereby incore file timestamps
were forgotten if the file was invalidated. This merely moves the
assignment inside the inode glock where it belongs.
6. He also fixed a problem where incore timestamps were not initialized.
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJX8oqYAAoJENeLYdPf93o7m5YIAIvBQ4WAmMmNuLT0AkvXIKXW
ZHXtV5oizSOl+qOrb5x3ANbnZWZ5NnWRP6E0frDf3Y5wk6U4qWAqU0V8BTbdr2E+
IryOLQ+62CAa4UbHqgQRFCpwkPxEaCsOde7eQh/ppTyBKjP0da7tUvSfPcLrWU+9
qhYiqAv5qVk38JjFiwhw4zER+dOCPDIg1xkkMPG6fspjM8/CkXR9p4lh73qNJT/j
NDzyjHSBYK32lkcb5xagjpLjmN/fIm6gXvdk65bD1euqxfUeuSCg6AF8QWkEXkcB
pbqQVIOWrZixS9HMTqT7w8nNstsBKSrEwQhulZWBZygRAzJJAWu6IaHQ9gZkUsE=
=1Fjo
-----END PGP SIGNATURE-----
Merge tag 'gfs2-4.8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Bob Peterson:
"We've only got six GFS2 patches for this merge window. In patch
order:
- Fabian Frederick submitted a nice cleanup that uses the BIT macro
rather than bit shifting.
- Andreas Gruenbacher contributed a patch that fixes a long-standing
annoyance whereby GFS2 warned about dirty pages.
- Andreas also fixed a problem with the recent extended attribute
readahead feature.
- Chao Yu contributed a patch that checks the return code from
function register_shrinker and reacts accordingly. Previously, it
was not checked.
- Andreas Gruenbacher also fixed a problem whereby incore file
timestamps were forgotten if the file was invalidated. This merely
moves the assignment inside the inode glock where it belongs.
- Andreas also fixed a problem where incore timestamps were not
initialized"
* tag 'gfs2-4.8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: Initialize atime of I_NEW inodes
gfs2: Update file times after grabbing glock
gfs2: fix to detect failure of register_shrinker
gfs2: Fix extended attribute readahead optimization
gfs2: Remove dirty buffer warning from gfs2_releasepage
GFS2: use BIT() macro
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJX8EMFAAoJEAAOaEEZVoIV138QALm9BtIpuLeg3m2L7DffC6tk
uRhu0a+sZhES8n1YF8/Z40KqlGvZ8qlbRv08vYQ1xNGYQ/RMBEdVZUXuOvN1NDSt
CgU3JSEtBo1Qg8eNkAUwvzfyLsfTazLYf6rus2v2wwrH/1pF8yeU2OZUhv4FhKd2
EoIczZ5NsWabJLktb4drckD+Xng9WHLKyB5bE7VKXR38cK7HWbuY30wg03JyX/em
rkfw00rcRhh5JWqyL2NOO7INJSNXyJKBVZ/xeIQYnhj4ZA7aTFN+LgQebPqpfyzw
g5jVet1ygaI+/8lp3IpB8rrkpmVSbtqLgmbPOvnDltiZOQbBlGOsw84TX/Dxp9VH
7q04zCmcDWGD1ZMnQmXDPJxQZ8+pYdutfSNait0Q7lYSySqO0+1nSLpMQ2yIrebS
hSREgj/MyOWewn5todNCh102IpSPUvo0J9mcDijlUBFWmPrK30QDGWrG20Qzb6ON
olYRxztSX7cs0rNIOSjeRNCiy6E5Eoz8zm22JuDgKd2TGzES0ZoPea++1iqsTKbM
KZrjGw5oQPkRbOePxoIk8ZP1iGbZyXQgMsPVHe+cuKBhiPqujgRNex4bwGQzKBT0
O9o1YORl/wN2H04+K+HfsdAIh0cWeSZDiU7F9vPP5RmjVqzMwDc5YbP+KZFF3Nod
Yu292qD+EcZL25PDt/Da
=MUd+
-----END PGP SIGNATURE-----
Merge tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux
Pull file locking updates from Jeff Layton:
"Only a single patch from Nikolay this cycle, with a small change to
better handle /proc/locks in a containerized host"
* tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux:
locks: Filter /proc/locks output on proc pid ns
The caller of rpc_run_task also gets a reference that must be put.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
boot_time is represented as a struct timespec.
struct timespec and CURRENT_TIME are not y2038 safe.
Overall, the plan is to use timespec64 and ktime_t for
all internal kernel representation of timestamps.
CURRENT_TIME will also be removed.
boot_time is used to construct the nfs client boot verifier.
Use ktime_t to represent boot_time and ktime_get_real() for
the boot_time value.
Following Trond's request https://lkml.org/lkml/2016/6/9/22 ,
use ktime_t instead of converting to struct timespec64.
Use higher and lower 32 bit parts of ktime_t for the boot
verifier.
Use the lower 32 bit part of ktime_t for the authsys_parms
stamp field.
Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Log recovery will iget an inode to replay BUI items and iput the inode
when it's done. Unfortunately, if the inode was unlinked, the iput
will see that i_nlink == 0 and decide to truncate & free the inode,
which prevents us from replaying subsequent BUIs. We can't skip the
BUIs because we have to replay all the redo items to ensure that
atomic operations complete.
Since unlinked inode recovery will reap the inode anyway, we can
safely introduce a new inode flag to indicate that an inode is in this
'unlinked recovery' state and should not be auto-reaped in the
drop_inode path.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Implement deferred versions of the inode block map/unmap functions.
These will be used in subsequent patches to make reflink operations
atomic.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Pass BMAPI_ flags from bunmapi into bmap_del_extent and extend
BMAPI_REMAP (which means "don't touch the allocator or the quota
accounting") to apply to bunmapi as well. This will be used to
implement the unmap operation, which will be used by swapext.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Teach the bmap routine to know how to map a range of file blocks to a
specific range of physical blocks, instead of simply allocating fresh
blocks. This enables reflink to map a file to blocks that are already
in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Provide a mechanism for higher levels to create BUI/BUD items, submit
them to the log, and a stub function to deal with recovered BUI items.
These parts will be connected to the rmapbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create bmbt update intent/done log items to record redo information in
the log. Because we roll transactions multiple times for reflink
operations, we also have to track the status of the metadata updates
that will be recorded in the post-roll transactions in case we crash
before committing the final transaction. This mechanism enables log
recovery to finish what was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Here is the big TTY and Serial patch set for 4.9-rc1.
It also includes some drivers/dma/ changes, as those were needed by some
serial drivers, and they were all acked by the DMA maintainer. Also in
here is the long-suffering ACPI SPCR patchset, which was passed around
from maintainer to maintainer like a hot-potato. Seems I was the
sucker^Wlucky one. All of those patches have been acked by the various
subsystem maintainers as well.
All of this has been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iFYEABECABYFAlfyNjEPHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspwIcAn2uN
qCD8xQJ0Cs61hD1nUzhNygG8AJ94I4zz/fPGpyh/CtJfLQwtUdLhNA==
=Rken
-----END PGP SIGNATURE-----
Merge tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty and serial updates from Greg KH:
"Here is the big tty and serial patch set for 4.9-rc1.
It also includes some drivers/dma/ changes, as those were needed by
some serial drivers, and they were all acked by the DMA maintainer.
Also in here is the long-suffering ACPI SPCR patchset, which was
passed around from maintainer to maintainer like a hot-potato. Seems I
was the sucker^Wlucky one. All of those patches have been acked by the
various subsystem maintainers as well.
All of this has been in linux-next with no reported issues"
* tag 'tty-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (111 commits)
Revert "serial: pl011: add console matching function"
MAINTAINERS: update entry for atmel_serial driver
serial: pl011: add console matching function
ARM64: ACPI: enable ACPI_SPCR_TABLE
ACPI: parse SPCR and enable matching console
of/serial: move earlycon early_param handling to serial
Revert "drivers/tty: Explicitly pass current to show_stack"
tty: amba-pl011: Don't complain on -EPROBE_DEFER when no irq
nios2: dts: 10m50: Add tx-threshold parameter
serial: 8250: Set Altera 16550 TX FIFO Threshold
serial: 8250: of: Load TX FIFO Threshold from DT
Documentation: dt: serial: Add TX FIFO threshold parameter
drivers/tty: Explicitly pass current to show_stack
serial: imx: Fix DCD reading
serial: stm32: mark symbols static where possible
serial: xuartps: Add some register initialisation to cdns_early_console_setup()
serial: xuartps: Removed unwanted checks while reading the error conditions
serial: xuartps: Rewrite the interrupt handling logic
serial: stm32: use mapbase instead of membase for DMA
tty/serial: atmel: fix fractional baud rate computation
...
Here are the "big" driver core patches for 4.9-rc1. Also in here are a
number of debugfs fixes that cropped up due to the changes that happened
in 4.8 for that filesystem. Overall, nothing major, just a few fixes
and cleanups.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iFYEABECABYFAlfyNw4PHGdyZWdAa3JvYWguY29tAAoJEDFH1A3bLfspLVYAoNXr
FXBHGb2tNT/1PLfvUCwd5PqWAJ9Khb5WAHtvjTmEN1zabz45aSbcrA==
=Uz6V
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here are the "big" driver core patches for 4.9-rc1. Also in here are a
number of debugfs fixes that cropped up due to the changes that
happened in 4.8 for that filesystem. Overall, nothing major, just a
few fixes and cleanups.
All of these have been in linux-next with no reported issues"
* tag 'driver-core-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (23 commits)
drivers: dma-coherent: Move spinlock in dma_alloc_from_coherent()
drivers: dma-coherent: Fix DMA coherent size for less than page
MAINTAINERS: extend firmware_class maintainer list
debugfs: propagate release() call result
driver-core: platform: Catch errors from calls to irq_get_irq_data
sysfs print name of undiscoverable attribute group
carl9170: fix debugfs crashes
b43legacy: fix debugfs crash
b43: fix debugfs crash
debugfs: introduce a public file_operations accessor
device core: Remove deprecated create_singlethread_workqueue
drivers/base dmam_declare_coherent_memory leaks
platform: don't return 0 from platform_get_irq[_byname]() on error
cpu: clean up register_cpu func
dma-mapping: use vma_pages().
drivers: dma-coherent: use vma_pages().
attribute_container: Fix typo
base: soc: make it explicitly non-modular
drivers: base: dma-mapping: page align the size when unmap_kernel_range
platform driver: fix use-after-free in platform_device_del()
...
single-buffer analogue of splice_to_pipe(); vmsplice_to_pipe() switched
to that, leaving splice_to_pipe() only for ->splice_read() instances
(and that only until they are converted as well).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* splice_to_pipe() stops at pipe overflow and does *not* take pipe_lock
* ->splice_read() instances do the same
* vmsplice_to_pipe() and do_splice() (ultimate callers of splice_to_pipe())
arrange for waiting, looping, etc. themselves.
That should make pipe_lock the outermost one.
Unfortunately, existing rules for the amount passed by vmsplice_to_pipe()
and do_splice() are quite ugly _and_ userland code can be easily broken
by changing those. It's not even "no more than the maximal capacity of
this pipe" - it's "once we'd fed pipe->nr_buffers pages into the pipe,
leave instead of waiting".
Considering how poorly these rules are documented, let's try "wait for some
space to appear, unless given SPLICE_F_NONBLOCK, then push into pipe
and if we run into overflow, we are done".
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Make local filesystems treat a fault as shortened IO,
returning -EFAULT only if nothing had been transferred.
That's how everything else (NFS, FUSE, ceph, Lustre)
behaves.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull x86 vdso updates from Ingo Molnar:
"The main changes in this cycle centered around adding support for
32-bit compatible C/R of the vDSO on 64-bit kernels, by Dmitry
Safonov"
* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Use CONFIG_X86_X32_ABI to enable vdso prctl
x86/vdso: Only define map_vdso_randomized() if CONFIG_X86_64
x86/vdso: Only define prctl_map_vdso() if CONFIG_CHECKPOINT_RESTORE
x86/signal: Add SA_{X32,IA32}_ABI sa_flags
x86/ptrace: Down with test_thread_flag(TIF_IA32)
x86/coredump: Use pr_reg size, rather that TIF_IA32 flag
x86/arch_prctl/vdso: Add ARCH_MAP_VDSO_*
x86/vdso: Replace calculate_addr in map_vdso() with addr
x86/vdso: Unmap vdso blob on vvar mapping failure
Pull low-level x86 updates from Ingo Molnar:
"In this cycle this topic tree has become one of those 'super topics'
that accumulated a lot of changes:
- Add CONFIG_VMAP_STACK=y support to the core kernel and enable it on
x86 - preceded by an array of changes. v4.8 saw preparatory changes
in this area already - this is the rest of the work. Includes the
thread stack caching performance optimization. (Andy Lutomirski)
- switch_to() cleanups and all around enhancements. (Brian Gerst)
- A large number of dumpstack infrastructure enhancements and an
unwinder abstraction. The secret long term plan is safe(r) live
patching plus maybe another attempt at debuginfo based unwinding -
but all these current bits are standalone enhancements in a frame
pointer based debug environment as well. (Josh Poimboeuf)
- More __ro_after_init and const annotations. (Kees Cook)
- Enable KASLR for the vmemmap memory region. (Thomas Garnier)"
[ The virtually mapped stack changes are pretty fundamental, and not
x86-specific per se, even if they are only used on x86 right now. ]
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits)
x86/asm: Get rid of __read_cr4_safe()
thread_info: Use unsigned long for flags
x86/alternatives: Add stack frame dependency to alternative_call_2()
x86/dumpstack: Fix show_stack() task pointer regression
x86/dumpstack: Remove dump_trace() and related callbacks
x86/dumpstack: Convert show_trace_log_lvl() to use the new unwinder
oprofile/x86: Convert x86_backtrace() to use the new unwinder
x86/stacktrace: Convert save_stack_trace_*() to use the new unwinder
perf/x86: Convert perf_callchain_kernel() to use the new unwinder
x86/unwind: Add new unwind interface and implementations
x86/dumpstack: Remove NULL task pointer convention
fork: Optimize task creation by caching two thread stacks per CPU if CONFIG_VMAP_STACK=y
sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK
lib/syscall: Pin the task stack in collect_syscall()
x86/process: Pin the target stack in get_wchan()
x86/dumpstack: Pin the target stack when dumping it
kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
sched/core: Add try_get_task_stack() and put_task_stack()
x86/entry/64: Fix a minor comment rebase error
iommu/amd: Don't put completion-wait semaphore on stack
...
Pull locking updates from Ingo Molnar:
"The main changes in this cycle were:
- rwsem micro-optimizations (Davidlohr Bueso)
- Improve the implementation and optimize the performance of
percpu-rwsems. (Peter Zijlstra.)
- Convert all lglock users to better facilities such as percpu-rwsems
or percpu-spinlocks and remove lglocks. (Peter Zijlstra)
- Remove the ticket (spin)lock implementation. (Peter Zijlstra)
- Korean translation of memory-barriers.txt and related fixes to the
English document. (SeongJae Park)
- misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cmpxchg, locking/atomics: Remove superfluous definitions
x86, locking/spinlocks: Remove ticket (spin)lock implementation
locking/lglock: Remove lglock implementation
stop_machine: Remove stop_cpus_lock and lg_double_lock/unlock()
fs/locks: Use percpu_down_read_preempt_disable()
locking/percpu-rwsem: Add down_read_preempt_disable()
fs/locks: Replace lg_local with a per-cpu spinlock
fs/locks: Replace lg_global with a percpu-rwsem
locking/percpu-rwsem: Add DEFINE_STATIC_PERCPU_RWSEMand percpu_rwsem_assert_held()
locking/pv-qspinlock: Use cmpxchg_release() in __pv_queued_spin_unlock()
locking/rwsem, x86: Drop a bogus cc clobber
futex: Add some more function commentry
locking/hung_task: Show all locks
locking/rwsem: Scan the wait_list for readers only once
locking/rwsem: Remove a few useless comments
locking/rwsem: Return void in __rwsem_mark_wake()
locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()
locking/Documentation: Add Korean translation
locking/Documentation: Fix a typo of example result
locking/Documentation: Fix wrong section reference
...
Pull EFI updates from Ingo Molnar:
"Main changes in this cycle were:
- Refactor the EFI memory map code into architecture neutral files
and allow drivers to permanently reserve EFI boot services regions
on x86, as well as ARM/arm64. (Matt Fleming)
- Add ARM support for the EFI ESRT driver. (Ard Biesheuvel)
- Make the EFI runtime services and efivar API interruptible by
swapping spinlocks for semaphores. (Sylvain Chouleur)
- Provide the EFI identity mapping for kexec which allows kexec to
work on SGI/UV platforms with requiring the "noefi" kernel command
line parameter. (Alex Thorlton)
- Add debugfs node to dump EFI page tables on arm64. (Ard Biesheuvel)
- Merge the EFI test driver being carried out of tree until now in
the FWTS project. (Ivan Hu)
- Expand the list of flags for classifying EFI regions as "RAM" on
arm64 so we align with the UEFI spec. (Ard Biesheuvel)
- Optimise out the EFI mixed mode if it's unsupported (CONFIG_X86_32)
or disabled (CONFIG_EFI_MIXED=n) and switch the early EFI boot
services function table for direct calls, alleviating us from
having to maintain the custom function table. (Lukas Wunner)
- Miscellaneous cleanups and fixes"
* 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
x86/efi: Round EFI memmap reservations to EFI_PAGE_SIZE
x86/efi: Allow invocation of arbitrary boot services
x86/efi: Optimize away setup_gop32/64 if unused
x86/efi: Use kmalloc_array() in efi_call_phys_prolog()
efi/arm64: Treat regions with WT/WC set but WB cleared as memory
efi: Add efi_test driver for exporting UEFI runtime service interfaces
x86/efi: Defer efi_esrt_init until after memblock_x86_fill
efi/arm64: Add debugfs node to dump UEFI runtime page tables
x86/efi: Remove unused find_bits() function
fs/efivarfs: Fix double kfree() in error path
x86/efi: Map in physical addresses in efi_map_region_fixed
lib/ucs2_string: Speed up ucs2_utf8size()
firmware-gsmi: Delete an unnecessary check before the function call "dma_pool_destroy"
x86/efi: Initialize status to ensure garbage is not returned on small size
efi: Replace runtime services spinlock with semaphore
efi: Don't use spinlocks for efi vars
efi: Use a file local lock for efivars
efi/arm*: esrt: Add missing call to efi_esrt_init()
efi/esrt: Use memremap not ioremap to access ESRT table in memory
x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data
...
The free space tree format conversion functions were broken on
big-endian systems, but the sanity tests didn't catch it because all of
the operations were aligned to multiple words. This was meant to catch
any bugs in the extent buffer code's handling of high memory, but it
ended up hiding the endianness bug. Expand the tests to do both
sector-aligned and page-aligned operations.
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
The in-memory bitmap code manipulates words and is therefore sensitive
to endianness, while the extent buffer bitmap code addresses bytes and
is byte-order agnostic. Because the byte addressing of the extent buffer
bitmaps is equivalent to a little-endian in-memory bitmap, the extent
buffer bitmap tests fail on big-endian systems.
34b3e6c92af1 ("Btrfs: self-tests: Fix extent buffer bitmap test fail on
BE system") worked around another endianness bug in the tests but missed
this one because ed9e4afdb055 ("Btrfs: self-tests: Execute page
straddling test only when nodesize < PAGE_SIZE") disables this part of
the test on ppc64. That change lost the original meaning of the test,
however. We really want to test that an equivalent series of operations
using the in-memory bitmap API and the extent buffer bitmap API produces
equivalent results.
To fix this, don't use memcmp_extent_buffer() or write_extent_buffer();
do everything bit-by-bit.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Anatoly Pugachev <matorola@gmail.com>
Tested-by: Feifei Xu <xufeifei@linux.vnet.ibm.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
There are two separate issues that can lead to corrupted free space
trees.
1. The free space tree bitmaps had an endianness issue on big-endian
systems which is fixed by an earlier patch in this series.
2. btrfs-progs before v4.7.3 modified filesystems without updating the
free space tree.
To catch both of these issues at once, we need to force the free space
tree to be rebuilt. To do so, add a FREE_SPACE_TREE_VALID compat_ro bit.
If the bit isn't set, we know that it was either produced by a broken
big-endian kernel or may have been corrupted by btrfs-progs.
This also provides us with a way to add rudimentary read-write support
for the free space tree to btrfs-progs: it can just clear this bit and
have the kernel rebuild the free space tree.
Cc: stable@vger.kernel.org # 4.5+
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
We moved the code for creating the free space tree the first time that
it's enabled, but didn't move the clearing code along with it. This
breaks my (undocumented) intention that `mount -o
clear_cache,space_cache=v2` would clear the free space tree and then
recreate it.
Fixes: 511711af91f2 ("btrfs: don't run delayed references while we are creating the free space tree")
Cc: stable@vger.kernel.org # 4.5+
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
In convert_free_space_to_{bitmaps,extents}(), we buffer the free space
bitmaps in memory and copy them directly to/from the extent buffers with
{read,write}_extent_buffer(). The extent buffer bitmap helpers use byte
granularity, which is equivalent to a little-endian bitmap. This means
that on big-endian systems, the in-memory bitmaps will be written to
disk byte-swapped. To fix this, use byte-granularity for the bitmaps in
memory.
Fixes: a5ed91828518 ("Btrfs: implement the free space B-tree")
Cc: stable@vger.kernel.org # 4.5+
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
These functions will be used by the other reflink functions to find
the maximum length of a range of shared blocks.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.coM>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reduce the max AG usable space size so that we always have space for
the refcount btree root.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Identify refcountbt blocks in the log correctly so that we can
validate them during log recovery.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When we're unmapping blocks from a reflinked file, decrease the
refcount of the affected blocks and free the extents that are no
longer in use.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Plumb in the upper level interface to schedule and finish deferred
refcount operations via the deferred ops mechanism.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Provide functions to adjust the reference counts for an extent of
physical blocks stored in the refcount btree.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Provide a mechanism for higher levels to create CUI/CUD items, submit
them to the log, and a stub function to deal with recovered CUI items.
These parts will be connected to the refcountbt in a later patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Create refcount update intent/done log items to record redo
information in the log. Because we need to roll transactions between
updating the bmbt mapping and updating the reverse mapping, we also
have to track the status of the metadata updates that will be recorded
in the post-roll transactions, just in case we crash before committing
the final transaction. This mechanism enables log recovery to finish
what was already started.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Implement the generic btree operations required to manipulate refcount
btree blocks. The implementation is similar to the bmapbt, though it
will only allocate and free blocks from the AG.
Since the refcount root and level fields are separate from the
existing roots and levels array, they need a separate logging flag.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
[hch: fix logging of AGF refcount btree fields]
Signed-off-by: Christoph Hellwig <hch@lst.de>