IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
For file systems where we can afford to keep the buddy bitmaps cached,
we can speed up initial writes to large file systems by starting to
load the block allocation bitmaps as soon as the file system is
mounted. This won't work well for _super_ large file systems, or
memory constrained systems, so we only enable this when it is
requested via a mount option.
Addresses-Google-Bug: 159488342
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Modify the ext4_read_block_bitmap_load tracepoint so that it tells us
whether a block bitmap is being prefetched.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
If we free a metadata buffer which has been failed to async write out
in the background, the jbd2 checkpoint procedure will not detect this
failure in jbd2_log_do_checkpoint(), so it may lead to filesystem
inconsistency after cleanup journal tail. This patch abort the journal
if free a buffer has write_io_error flag to prevent potential further
inconsistency.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Link: https://lore.kernel.org/r/20200620025427.1756360-5-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
There is a risk of filesystem inconsistency if we failed to async write
back metadata buffer in the background. Because of current buffer's end
io procedure is handled by end_buffer_async_write() in the block layer,
and it only clear the buffer's uptodate flag and mark the write_io_error
flag, so ext4 cannot detect such failure immediately. In most cases of
getting metadata buffer (e.g. ext4_read_inode_bitmap()), although the
buffer's data is actually uptodate, it may still read data from disk
because the buffer's uptodate flag has been cleared. Finally, it may
lead to on-disk filesystem inconsistency if reading old data from the
disk successfully and write them out again.
This patch detect bdev mapping->wb_err when getting journal's write
access and mark the filesystem error if bdev's mapping->wb_err was
increased, this could prevent further writing and potential
inconsistency.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200620025427.1756360-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
cr=0 is supposed to be an optimization to save CPU cycles, but if
buddy data (in memory) is not initialized then all this makes no sense
as we have to do sync IO taking a lot of cycles. Also, at cr=0
mballoc doesn't choose any available chunk. cr=1 also skips groups
using heuristic based on avg. fragment size. It's more useful to skip
such groups and switch to cr=2 where groups will be scanned for
available chunks. However, we always read the first block group in a
flex_bg so metadata blocks will get read into the first flex_bg if
possible.
Using sparse image and dm-slow virtual device of 120TB was
simulated, then the image was formatted and filled using debugfs to
mark ~85% of available space as busy. mount process w/o the patch
couldn't complete in half an hour (according to vmstat it would take
~10-11 hours). With the patch applied mount took ~20 seconds.
Lustre-bug-id: https://jira.whamcloud.com/browse/LU-12988
Signed-off-by: Alex Zhuravlev <azhuravlev@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Reviewed-by: Artem Blagodarenko <artem.blagodarenko@gmail.com>
This should significantly improve bitmap loading, especially for flex
groups as it tries to load all bitmaps within a flex.group instead of
one by one synchronously.
Prefetching is done in 8 * flex_bg groups, so it should be 8
read-ahead reads for a single allocating thread. At the end of
allocation the thread waits for read-ahead completion and initializes
buddy information so that read-aheads are not lost in case of memory
pressure.
At cr=0 the number of prefetching IOs is limited per allocation
context to prevent a situation when mballoc loads thousands of bitmaps
looking for a perfect group and ignoring groups with good chunks.
Together with the patch "ext4: limit scanning of uninitialized groups"
the mount time (which includes few tiny allocations) of a 1PB
filesystem is reduced significantly:
0% full 50%-full unpatched patched
mount time 33s 9279s 563s
[ Restructured by tytso; removed the state flags in the allocation
context, so it can be used to lazily prefetch the allocation bitmaps
immediately after the file system is mounted. Skip prefetching
block groups which are uninitialized. Finally pass in the
REQ_RAHEAD flag to the block layer while prefetching. ]
Signed-off-by: Alex Zhuravlev <bzzz@whamcloud.com>
Reviewed-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Don't define EXT4_IOC_* aliases to ioctls that already have a generic
FS_IOC_* name. These aliases are unnecessary, and they make it unclear
which ioctls are ext4-specific and which are generic.
Exception: leave EXT4_IOC_GETVERSION_OLD and EXT4_IOC_SETVERSION_OLD
as-is for now, since renaming them to FS_IOC_GETVERSION and
FS_IOC_SETVERSION would probably make them more likely to be confused
with EXT4_IOC_GETVERSION and EXT4_IOC_SETVERSION which also exist.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200714230909.56349-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
A customer has reported a BUG_ON in ext4_clear_journal_err() hitting
during an LTP testing. Either this has been caused by a test setup
issue where the filesystem was being overwritten while LTP was mounting
it or the journal replay has overwritten the superblock with invalid
data. In either case it is preferable we don't take the machine down
with a BUG_ON. So handle the situation of unexpectedly missing
has_journal feature more gracefully. We issue warning and fail the mount
in the cases where the race window is narrow and the failed check is
most likely a programming error. In cases where fs corruption is more
likely, we do full ext4_error() handling before failing mount / remount.
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200710140759.18031-1-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.
Deterministic algorithm:
For each file:
If not .svg:
For each line:
If doesn't contain `\bxmlns\b`:
For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
If both the HTTP and HTTPS versions
return 200 OK and serve the same content:
Replace HTTP with HTTPS.
Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Link: https://lore.kernel.org/r/20200706190339.20709-1-grandmaster@al2klimov.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
If for any reason a directory passed to do_split() does not have enough
active entries to exceed half the size of the block, we can end up
iterating over all "count" entries without finding a split point.
In this case, count == move, and split will be zero, and we will
attempt a negative index into map[].
Guard against this by detecting this case, and falling back to
split-to-half-of-count instead; in this case we will still have
plenty of space (> half blocksize) in each split block.
Fixes: ef2b02d3e6 ("ext34: ensure do_split leaves enough free space in both blocks")
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/f53e246b-647c-64bb-16ec-135383c70ad7@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Callers of __jbd2_journal_unfile_buffer() and
__jbd2_journal_refile_buffer() assume that the b_transaction is set. In
fact if it's not, we can end up with journal_head refcounting errors
leading to crash much later that might be very hard to track down. Add
asserts to make sure that is the case.
We also make sure that b_next_transaction is NULL in
__jbd2_journal_unfile_buffer() since the callers expect that as well and
we should not get into that stage in this state anyway, leading to
problems later on if we do.
Tested with fstests.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20200617092549.6712-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Using a mutex for "print this warning only once" is so overdesigned as
to be actively offensive to my sensitive stomach.
Just use "pr_info_once()" that already does this, although in a
(harmlessly) racy manner that can in theory cause the message to be
printed twice if more than one CPU races on that "is this the first
time" test.
[ If somebody really cares about that harmless data race (which sounds
very unlikely indeed), that person can trivially fix printk_once() by
using a simple atomic access, preferably with an optimistic non-atomic
test first before even bothering to treat the pointless "make sure it
is _really_ just once" case.
A mutex is most definitely never the right primitive to use for
something like this. ]
Yes, this is a small and meaningless detail in a code path that hardly
matters. But let's keep some code quality standards here, and not
accept outrageously bad code.
Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull x86 fixes from Thomas Gleixner:
"A series of fixes for x86:
- Reset MXCSR in kernel_fpu_begin() to prevent using a stale user
space value.
- Prevent writing MSR_TEST_CTRL on CPUs which are not explicitly
whitelisted for split lock detection. Some CPUs which do not
support it crash even when the MSR is written to 0 which is the
default value.
- Fix the XEN PV fallout of the entry code rework
- Fix the 32bit fallout of the entry code rework
- Add more selftests to ensure that these entry problems don't come
back.
- Disable 16 bit segments on XEN PV. It's not supported because XEN
PV does not implement ESPFIX64"
* tag 'x86-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/ldt: Disable 16-bit segments on Xen PV
x86/entry/32: Fix #MC and #DB wiring on x86_32
x86/entry/xen: Route #DB correctly on Xen PV
x86/entry, selftests: Further improve user entry sanity checks
x86/entry/compat: Clear RAX high bits on Xen PV SYSENTER
selftests/x86: Consolidate and fix get/set_eflags() helpers
selftests/x86/syscall_nt: Clear weird flags after each test
selftests/x86/syscall_nt: Add more flag combinations
x86/entry/64/compat: Fix Xen PV SYSENTER frame setup
x86/entry: Move SYSENTER's regs->sp and regs->flags fixups into C
x86/entry: Assert that syscalls are on the right stack
x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted
x86/fpu: Reset MXCSR to default in kernel_fpu_begin()
Pull irq fixes from Thomas Gleixner:
"A set of interrupt chip driver fixes:
- Ensure the atomicity of affinity updates in the GIC driver
- Don't try to sleep in atomic context when waiting for the GICv4.1
to respond. Use polling instead.
- Typo fixes in Kconfig and warnings"
* tag 'irq-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic: Atomically update affinity
irqchip/riscv-intc: Fix a typo in a pr_warn()
irqchip/gic-v4.1: Use readx_poll_timeout_atomic() to fix sleep in atomic
irqchip/loongson-pci-msi: Fix a typo in Kconfig
Pull rcu fixlet from Thomas Gleixner:
"A single fix for a printk format warning in RCU"
* tag 'core-urgent-2020-07-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rcuperf: Fix printk format warning
Pull Kbuild fixes frin Masahiro Yamada:
- fix various bugs in xconfig
- fix some issues in cross-compilation using Clang
- fix documentation
* tag 'kbuild-fixes-v5.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
.gitignore: Do not track `defconfig` from `make savedefconfig`
kbuild: make Clang build userprogs for target architecture
kbuild: fix CONFIG_CC_CAN_LINK(_STATIC) for cross-compilation with Clang
kconfig: qconf: parse newer types at debug info
kconfig: qconf: navigate menus on hyperlinks
kconfig: qconf: don't show goback button on splitMode
kconfig: qconf: simplify the goBack() logic
kconfig: qconf: re-implement setSelected()
kconfig: qconf: make debug links work again
kconfig: qconf: make search fully work again on split mode
kconfig: qconf: cleanup includes
docs: kbuild: fix ReST formatting
gcc-plugins: fix gcc-plugins directory path in documentation
Pull SCSI fixes from James Bottomley:
"Four small fixes in three drivers.
The mptfusion one has actually caused user visible issues in certain
kernel configurations"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: mptfusion: Don't use GFP_ATOMIC for larger DMA allocations
scsi: libfc: Skip additional kref updating work event
scsi: libfc: Handling of extra kref
scsi: qla2xxx: Fix a condition in qla2x00_find_all_fabric_devs()
Pull block fixes from Jens Axboe:
- NVMe fixes from Christoph:
- Fix crash in multi-path disk add (Christoph)
- Fix ignore of identify error (Sagi)
- Fix a compiler complaint that a function should be static (Wei)
* tag 'block-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
block: make function __bio_integrity_free() static
nvme: fix a crash in nvme_mpath_add_disk
nvme: fix identify error status silent ignore
Pull io_uring fix from Jens Axboe:
"Andres reported a regression with the fix that was merged earlier this
week, where his setup of using signals to interrupt io_uring CQ waits
no longer worked correctly.
Fix this, and also limit our use of TWA_SIGNAL to the case where we
need it, and continue using TWA_RESUME for task_work as before.
Since the original is marked for 5.7 stable, let's flush this one out
early"
* tag 'io_uring-5.8-2020-07-05' of git://git.kernel.dk/linux-block:
io_uring: fix regression with always ignoring signals in io_cqring_wait()
Pull i2c fixes from Wolfram Sang:
"The usual driver fixes and documentation updates"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mlxcpld: check correct size of maximum RECV_LEN packet
i2c: add Kconfig help text for slave mode
i2c: slave-eeprom: update documentation
i2c: eg20t: Load module automatically if ID matches
i2c: designware: platdrv: Set class based on DMI
i2c: algo-pca: Add 0x78 as SCL stuck low status for PCA9665
Pull MIPS fixes from Thomas Bogendoerfer:
- fix for missing hazard barrier
- DT fix for ingenic
- DT fix of GPHY names for lantiq
- fix usage of smp_processor_id() while preemption is enabled
* tag 'mips_fixes_5.8_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: Do not use smp_processor_id() in preemptible code
MIPS: Add missing EHB in mtc0 -> mfc0 sequence for DSPen
MIPS: ingenic: gcw0: Fix HP detection GPIO.
MIPS: lantiq: xway: sysctrl: fix the GPHY clock alias names
This resolves the hazard between the mtc0 in the change_c0_status() and
the mfc0 in configure_exception_vector(). Without resolving this hazard
configure_exception_vector() could read an old value and would restore
this old value again. This would revert the changes change_c0_status()
did. I checked this by printing out the read_c0_status() at the end of
per_cpu_trap_init() and the ST0_MX is not set without this patch.
The hazard is documented in the MIPS Architecture Reference Manual Vol.
III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
6.03 table 8.1 which includes:
Producer | Consumer | Hazard
----------|----------|----------------------------
mtc0 | mfc0 | any coprocessor 0 register
I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
There the change_c0_status() function would activate the DSPen by
setting ST0_MX in the c0_status register. This was reverted and then the
system got a DSP exception when the DSP registers were saved in
save_dsp() in the first process switch. The crash looks like this:
[ 0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.107070] Kernel panic - not syncing: Unexpected DSP exception
[ 0.113470] Rebooting in 1 seconds..
We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
not on the 24Kc based SoCs. We only saw it with kernel 5.4 not with
kernel 4.19, in addition we had to use GCC 8.4 or 9.X, with GCC 8.3 it
did not happen.
In the kernel I bisected this problem to commit 9012d01166 ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
reverted it also happened after commit 172dcd935c ("MIPS: Always
allocate exception vector for MIPSr2+").
Commit 0b24cae4d5 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
does similar changes to a different file. I am not sure if there are
more places affected by this problem.
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Running `make savedefconfig` creates by default `defconfig`, which is,
currently, on git’s radar, for example, `git status` lists this file as
untracked.
So, add the file to `.gitignore`, so it’s ignored by git.
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull powerpc fixes from Michael Ellerman:
"One fix for a regression in our pkey handling, which exhibits as
PROT_EXEC mappings taking continuous page faults.
Thanks to: Jan Stancek, Aneesh Kumar K.V"
* tag 'powerpc-5.8-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/mm/pkeys: Make pkey access check work on execute_only_key
Pull arm64 fixes from Will Deacon:
"Nothing earth-shattering, really - some CPU errata workarounds (one
day they'll get it right, ha!) and a fix for a boot failure with very
large kernel images where the alternative patching gets confused when
patching relative branches using veneers.
- Fix alternative patching for very large kernel images and modules
- Hook up existing CPU errata workarounds for Qualcomm Kryo CPUs"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Add KRYO4XX silver CPU cores to erratum list 1530923 and 1024718
arm64: Add KRYO4XX gold CPU cores to erratum list 1463225 and 1418040
arm64: Add MIDR value for KRYO4XX gold CPU cores
arm64/alternatives: use subsections for replacement sequences
When switching to TWA_SIGNAL for task_work notifications, we also made
any signal based condition in io_cqring_wait() return -ERESTARTSYS.
This breaks applications that rely on using signals to abort someone
waiting for events.
Check if we have a signal pending because of queued task_work, and
repeat the signal check once we've run the task_work. This provides a
reliable way of telling the two apart.
Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
we don't have the dependency situation described in the original commit,
and we can get by with just using TWA_RESUME like we previously did.
Fixes: ce593a6c48 ("io_uring: use signal based task_work running")
Cc: stable@vger.kernel.org # v5.7
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
On Xen PV, #DB doesn't use IST. It still needs to be correctly routed
depending on whether it came from user or kernel mode.
Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the
logic. Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on
Xen, and do the right thing for #DB. Also add more warnings to the
exc_debug* handlers to make this type of failure more obvious.
This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.
Fixes: 4c0dcd8350 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org
Pull xen fixes from Juergen Gross:
"One small cleanup patch for ARM and two patches for the xenbus driver
fixing latent problems (large stack allocations and bad return code
settings)"
* tag 'for-linus-5.8b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/xenbus: let xenbus_map_ring_valloc() return errno values only
xen/xenbus: avoid large structs and arrays on the stack
arm/xen: remove the unused macro GRANT_TABLE_PHYSADDR
I2C_SMBUS_BLOCK_MAX defines already the maximum number as defined in the
SMBus 2.0 specs. I don't see a reason to add 1 here. Also, fix the errno
to what is suggested for this error.
Fixes: c9bfdc7c16 ("i2c: mlxcpld: Add support for smbus block read transaction")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Michael Shych <michaelsh@mellanox.com>
Tested-by: Michael Shych <michaelsh@mellanox.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
Pull sysctl fix from Al Viro:
"Another regression fix for sysctl changes this cycle..."
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Call sysctl_head_finish on error