What: /sys/block/zram<id>/disksize
Date: August 2010
Contact: Nitin Gupta <ngupta@vflare.org>
Description:
The disksize file is read-write and specifies the disk size,
which is the limit on the amount of *uncompressed* data
that can be stored on this disk.
Unit: bytes
What: /sys/block/zram<id>/initstate
Date: August 2010
Contact: Nitin Gupta <ngupta@vflare.org>
Description:
The initstate file is read-only and shows the initialization
state of the device.
What: /sys/block/zram<id>/reset
Date: August 2010
Contact: Nitin Gupta <ngupta@vflare.org>
Description:
The reset file is write-only and allows resetting the
device. The reset operation frees all the memory associated
with this device.
zram: add multi stream functionality
Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released. This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr). Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel. See TEST section later in commit message for
performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.
The following set of functions is added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations is performed:
- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to the wait queue. It will be
woken up by the zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper
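The find/release steps above can be modelled in user space. The following is only a rough sketch, not the kernel implementation: `StreamPool`, `peak_parallel_writers` and the stream names are hypothetical, and a Python `threading.Condition` stands in for both strm_lock and the wait queue (so the wake-up happens while the lock is held, unlike the kernel sequence).

```python
import threading
import time
from collections import deque


class StreamPool:
    """Toy model of an idle list protected by a lock plus a wait queue."""

    def __init__(self, max_strm):
        # Condition bundles a lock (stand-in for strm_lock) with a wait queue.
        self._cond = threading.Condition()
        self._idle = deque(f"strm{i}" for i in range(max_strm))

    def find(self):
        """Take a stream off the idle list, sleeping while the list is empty."""
        with self._cond:
            while not self._idle:
                self._cond.wait()
            return self._idle.popleft()

    def release(self, strm):
        """Put the stream back on the idle list and wake up one sleeper."""
        with self._cond:
            self._idle.append(strm)
            self._cond.notify()

    def idle_count(self):
        with self._cond:
            return len(self._idle)


def peak_parallel_writers(pool, n_writers, hold_seconds=0.01):
    """Run n_writers simulated writes; return the peak number holding a stream."""
    in_flight = 0
    peak = 0
    guard = threading.Lock()

    def writer():
        nonlocal in_flight, peak
        strm = pool.find()
        with guard:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(hold_seconds)  # pretend to compress a page
        with guard:
            in_flight -= 1
        pool.release(strm)

    threads = [threading.Thread(target=writer) for _ in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With a pool of N streams, the value returned by peak_parallel_writers() never exceeds N, which mirrors the "N write operations executing in parallel" behaviour described above.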
Minchan Kim reported that the spinlock-based locking scheme has demonstrated
a severe performance regression for the single compression stream case,
compared to the mutex-based one (see https://lkml.org/lkml/2014/2/18/16)
base                       spinlock                   mutex

==Initial write            ==Initial write            ==Initial write
records:  5                records:  5                records:  5
avg:  1642424.35           avg:   699610.40           avg:  1655583.71
std:    39890.95(2.43%)    std:   232014.19(33.16%)   std:    52293.96
max:  1690170.94           max:  1163473.45           max:  1697164.75
min:  1568669.52           min:   573429.88           min:  1553410.23
==Rewrite                  ==Rewrite                  ==Rewrite
records:  5                records:  5                records:  5
avg:  1611775.39           avg:   501406.64           avg:  1684419.11
std:    17144.58(1.06%)    std:    15354.41(3.06%)    std:    18367.42
max:  1641800.95           max:   531356.78           max:  1706445.84
min:  1593515.27           min:   488817.78           min:  1655335.73
When only one compression stream is available, mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up(). This
is why single stream is implemented as a special case with mutex locking.
Introduce and document zram device attribute max_comp_streams. This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).
max_comp_streams is used during initialisation as follows:
-- passing max_strm equal to 1 to zcomp_create() will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing max_strm greater than 1 to zcomp_create() will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).
The default max_comp_streams value is 1, meaning that zram with a single
stream will be initialised.
A later patch will introduce a configuration knob to change max_comp_streams
on an already initialised and used zcomp.
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test               base          1 strm (mutex)     3 strm (spinlock)
-----------------------------------------------------------------------
Initial write      589286.78      583518.39           718011.05
Rewrite            604837.97      596776.38          1515125.72
Random write       584120.11      595714.58          1388850.25
Pwrite             535731.17      541117.38           739295.27
Fwrite            1418083.88     1478612.72          1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
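The same echo/cat usage can be scripted. This is a minimal sketch, assuming the attribute directory is passed in by the caller; `write_attr` and `read_attr` are hypothetical helper names, and on a real system the directory would be something like /sys/block/zram0 and writing would need sufficient privileges.

```python
from pathlib import Path


def write_attr(dev_dir, name, value):
    """Write a value to a device attribute file, like `echo value > file`."""
    Path(dev_dir, name).write_text(f"{value}\n")


def read_attr(dev_dir, name):
    """Read a device attribute file, like `cat file`, without the newline."""
    return Path(dev_dir, name).read_text().strip()
```

For example, write_attr("/sys/block/zram0", "max_comp_streams", 4) followed by read_attr("/sys/block/zram0", "max_comp_streams") corresponds to the two commands above.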
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What: /sys/block/zram<id>/max_comp_streams
Date: February 2014
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The max_comp_streams file is read-write and specifies the
number of backend's zcomp_strm compression streams (number of
concurrent compress operations).
What: /sys/block/zram<id>/comp_algorithm
Date: February 2014
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The comp_algorithm file is read-write. It shows the available
and selected compression algorithms and allows changing the
compression algorithm selection.
What: /sys/block/zram<id>/mem_used_max
Date: August 2014
Contact: Minchan Kim <minchan@kernel.org>
Description:
The mem_used_max file is write-only and is used to reset
the counter of maximum memory zram has consumed to store
compressed data. To reset the value, write "0". Writing any
other value fails with -EINVAL.
Unit: bytes
What: /sys/block/zram<id>/mem_limit
Date: August 2014
Contact: Minchan Kim <minchan@kernel.org>
Description:
The mem_limit file is write-only and specifies the maximum
amount of memory ZRAM can use to store the compressed data.
The limit can be changed at run time and "0" disables
the limit. No limit is the initial state. Unit: bytes
What: /sys/block/zram<id>/compact
Date: August 2015
Contact: Minchan Kim <minchan@kernel.org>
Description:
The compact file is write-only and triggers compaction for
the allocator zram uses. The allocator moves some objects so
that it can free fragmented space.
What: /sys/block/zram<id>/io_stat
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The io_stat file is read-only and accumulates device's I/O
statistics not accounted by block layer. For example,
failed_reads, failed_writes, etc. File format is similar to
block layer statistics file format.
What: /sys/block/zram<id>/mm_stat
Date: August 2015
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The mm_stat file is read-only and represents device's mm
statistics (orig_data_size, compr_data_size, etc.) in a format
similar to block layer statistics file format.
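As an illustration of consuming mm_stat, the sketch below splits one line of whitespace-separated counters and derives a compression ratio. It assumes that orig_data_size and compr_data_size are the first and second columns; `parse_mm_stat` and `compression_ratio` are hypothetical names, and the remaining columns are returned unnamed.

```python
def parse_mm_stat(text):
    """Split one mm_stat line into integers, in file order."""
    return [int(field) for field in text.split()]


def compression_ratio(text):
    """Return orig_data_size / compr_data_size, or None if nothing is stored.

    Assumption: column 0 is orig_data_size and column 1 is compr_data_size.
    """
    values = parse_mm_stat(text)
    orig_data_size, compr_data_size = values[0], values[1]
    if compr_data_size == 0:
        return None
    return orig_data_size / compr_data_size
```

On a live system the input would be the text read from /sys/block/zram<id>/mm_stat.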
What: /sys/block/zram<id>/debug_stat
Date: July 2016
Contact: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Description:
The debug_stat file is read-only and represents various
device's debugging info useful for kernel developers. Its
format is not documented intentionally and may change
anytime without any notice.
What: /sys/block/zram<id>/backing_dev
Date: June 2017
Contact: Minchan Kim <minchan@kernel.org>
Description:
The backing_dev file is read-write and sets up the backing
device for zram to write incompressible pages to.
To use it, the user should enable CONFIG_ZRAM_WRITEBACK.
What: /sys/block/zram<id>/idle
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The idle file is write-only and marks zram slots as idle.
If debugfs is mounted, the user can see which slots
are idle via /sys/kernel/debug/zram/zram<id>/block_state
zram: support idle/huge page writeback
Add a new feature "zram idle/huge page writeback". In the zram-swap use
case, zram usually has many idle/huge swap pages. It's pointless to keep
them in memory (ie, zram).
To solve this problem, this feature introduces idle/huge page writeback to
the backing device so the goal is to save more memory space on embedded
systems.
Normal sequence to use idle/huge page writeback feature is as follows,
while (1) {
# mark allocated zram slot to idle
echo all > /sys/block/zram0/idle
# leave system working for several hours
# Blocks on zram that are not accessed in the meantime
# remain marked as IDLE.
echo "idle" > /sys/block/zram0/writeback
or/and
echo "huge" > /sys/block/zram0/writeback
# write the IDLE or/and huge marked slot into backing device
# and free the memory.
}
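The loop above can be scripted roughly as follows. This is a sketch under assumptions: the function names and the `idle_seconds` parameter are hypothetical, the device directory (for example /sys/block/zram0) is supplied by the caller, and only the "all", "idle" and "huge" values shown above are used.

```python
import time
from pathlib import Path

# Writeback modes taken from the sequence above.
WRITEBACK_MODES = ("idle", "huge")


def mark_all_idle(dev_dir):
    """Mark every allocated slot as idle: echo all > <dev_dir>/idle."""
    Path(dev_dir, "idle").write_text("all\n")


def trigger_writeback(dev_dir, modes=WRITEBACK_MODES):
    """Write each requested mode to <dev_dir>/writeback, one write per mode."""
    for mode in modes:
        if mode not in WRITEBACK_MODES:
            raise ValueError(f"unsupported writeback mode: {mode!r}")
    for mode in modes:
        Path(dev_dir, "writeback").write_text(f"{mode}\n")


def writeback_cycle(dev_dir, idle_seconds, modes=WRITEBACK_MODES):
    """One pass of the loop: mark idle, let the system run, then write back."""
    mark_all_idle(dev_dir)
    time.sleep(idle_seconds)  # slots accessed during this time lose the mark
    trigger_writeback(dev_dir, modes)
```

A caller would typically invoke writeback_cycle() repeatedly with an idle period of several hours, as in the sequence above.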
Per the discussion at
https://lore.kernel.org/lkml/20181122065926.GG3441@jagdpanzerIV/T/#u,
this patch removes the direct incompressible page writeback feature
(d2afd25114f4 ("zram: write incompressible pages to backing device")).
Below are the concerns from Sergey:
== &< ==
"IDLE writeback" is superior to "incompressible writeback".
"incompressible writeback" is completely unpredictable and uncontrollable;
it depends on data patterns and compression algorithms. While "IDLE
writeback" is predictable.
I even suspect, that, *ideally*, we can remove "incompressible writeback".
"IDLE pages" is a super set which also includes "incompressible" pages.
So, technically, we still can do "incompressible writeback" from "IDLE
writeback" path; but a much more reasonable one, based on a page idling
period.
I understand that you want to keep "direct incompressible writeback"
around. ZRAM is especially popular on devices which do suffer from flash
wearout, so I can see "incompressible writeback" path becoming a dead
code, long term.
== &< ==
Below are the concerns from Minchan:
== &< ==
My concern is if we enable CONFIG_ZRAM_WRITEBACK in this implementation,
both hugepage/idlepage writeback will turn on. However, some users want to
enable only idlepage writeback, so we need to introduce a turn on/off knob
for hugepage or a new CONFIG_ZRAM_IDLEPAGE_WRITEBACK for those use cases. I
don't want to make it complicated *if possible*.
Long term, I imagine we need to make the VM aware of a new swap hierarchy,
a little bit different from as-is. For example, the first high priority swap
can return -EIO or -ENOCOMP, and swap tries to fall back to the next lower
priority swap device. With that, hugepage writeback will work transparently.
So we could regard it as a regression because incompressible pages don't
go to backing storage automatically. Instead, the user should do it via
"echo huge > /sys/block/zram/writeback" manually.
== &< ==
Link: http://lkml.kernel.org/r/20181127055429.251614-6-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Joey Pabalinas <joeypabalinas@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
What: /sys/block/zram<id>/writeback
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The writeback file is write-only and triggers idle and/or
huge page writeback to the backing device.
What: /sys/block/zram<id>/bd_stat
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The bd_stat file is read-only and represents backing device's
statistics (bd_count, bd_reads, bd_writes) in a format
similar to block layer statistics file format.
What: /sys/block/zram<id>/writeback_limit_enable
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The writeback_limit_enable file is read-write and specifies
whether the writeback_limit feature is enabled. "1" means
enable the feature. "0" (no limit) is the initial state.
What: /sys/block/zram<id>/writeback_limit
Date: November 2018
Contact: Minchan Kim <minchan@kernel.org>
Description:
The writeback_limit file is read-write and specifies the maximum
amount of writeback ZRAM can do. The limit can be changed
at run time.
What: /sys/block/zram<id>/recomp_algorithm
Date: November 2022
Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The recomp_algorithm file is read-write and allows setting
or showing secondary compression algorithms.
What: /sys/block/zram<id>/recompress
Date: November 2022
Contact: Sergey Senozhatsky <senozhatsky@chromium.org>
Description:
The recompress file is write-only and triggers re-compression
with secondary compression algorithms.