IF YOU WOULD LIKE TO GET AN ACCOUNT, please write an
email to Administrator. User accounts are meant only to access repo
and report issues and/or generate pull requests.
This is a purpose-specific Git hosting for
BaseALT
projects. Thank you for your understanding!
Только зарегистрированные пользователи имеют доступ к сервису!
Для получения аккаунта, обратитесь к администратору.
Renoir uses integrated_system_info table v12. The table
has the same layout as v11 with respect to this data. Just
reuse the existing code for v12 for stable.
Fixes incorrectly reported vram info in the driver output.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
Changes that are fast don't require updating DLG parameters making
this call unnecessary. Considering this is an expensive call it should
not be done on every flip.
DML touches clocks, p-state support, DLG params and a few other DC
internal flags and these aren't expected during fast. A hang has been
reported with this change when called on every flip which suggests that
modifying these fields is not recommended behavior on fast updates.
[How]
Guard the validation to only happen if update type isn't FAST.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1191
Fixes: a24eaa5c51 ("drm/amd/display: Revalidate bandwidth before commiting DC updates")
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and
following started to happen on each boot:
...
BUG: kernel NULL pointer dereference, address: 0000000000000128
...
CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 #1
Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
...
Call Trace:
i2c_smbus_xfer+0x3d/0xf0
i2c_default_probe+0xf3/0x130
i2c_detect.isra.0+0xfe/0x2b0
? kfree+0xa3/0x200
? kobject_uevent_env+0x11f/0x6a0
? i2c_detect.isra.0+0x2b0/0x2b0
__process_new_driver+0x1b/0x20
bus_for_each_dev+0x64/0x90
? 0xffffffffc0f34000
i2c_register_driver+0x73/0xc0
do_one_initcall+0x46/0x200
? _cond_resched+0x16/0x40
? kmem_cache_alloc_trace+0x167/0x220
? do_init_module+0x23/0x260
do_init_module+0x5c/0x260
__do_sys_init_module+0x14f/0x170
do_syscall_64+0x5b/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
Error appears when some i2c device driver tries to probe for devices
using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
Code supporting this adapter requires `adev->psp.ras.ras` to be not
NULL, which is true only when `amdgpu_ras_init()` detects HW support by
calling `amdgpu_ras_check_supported()`.
Before 9015d60c9e, adapter was registered by
-> amdgpu_device_ip_init()
-> amdgpu_ras_recovery_init()
-> amdgpu_ras_eeprom_init()
-> smu_v11_0_i2c_eeprom_control_init()
after verifying that `adev->psp.ras.ras` is not NULL in
`amdgpu_ras_recovery_init()`. Currently it is registered
unconditionally by
-> amdgpu_device_ip_init()
-> pp_sw_init()
-> hwmgr_sw_init()
-> vega20_smu_init()
-> smu_v11_0_i2c_eeprom_control_init()
Fix simply adds HW support check (ras == NULL => no support) before
calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
Please note that there is a chance that similar fix is also required for
CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense
when there is no HW support.
Cc: stable@vger.kernel.org
Fixes: 9015d60c9e ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
Tested-by: Bjorn Nostvold <bjorn.nostvold@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The function kobject_init_and_add alloc memory like:
kobject_init_and_add->kobject_add_varg->kobject_set_name_vargs
->kvasprintf_const->kstrdup_const->kstrdup->kmalloc_track_caller
->kmalloc_slab, in err branch this memory not free. If use
kmemleak, this path maybe catched.
These changes are to add kobject_put in kobject_init_and_add
failed branch, fix potential memleak.
Signed-off-by: Bernard Zhao <bernard@vivo.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why]
Regression was introduced where setting max bpc property has no effect
on the atomic check and final commit. It has the same effect as max bpc
being stuck at 8.
[How]
Correctly propagate max bpc with the new connector state.
Signed-off-by: Stylon Wang <stylon.wang@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Add rename the gpu busy percentage for consistency and
add the mem busy percentage documentation.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The existing code used the major version number of the DRM driver
instead of the device major number of the DRM subsystem for
validating access for a devices cgroup.
This meant that accesses allowed by the devices cgroup weren't
permitted and certain accesses denied by the devices cgroup were
permitted (if they matched the wrong major device number).
Signed-off-by: Lorenz Brun <lorenz@brun.one>
Fixes: 6b855f7b83 ("drm/amdkfd: Check against device cgroup")
Reviewed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
When we want to use float point operation on Linux
we need to use within special kernel protection
(`kernel_fpu_{begin,end}()`.), otherwise the kernel
can clobber userspace FPU register state. For detecting
these issues we use a tool named objtool (with -Ffa
flags) to highlight the FPU problems, all warnings can
be summed up as follows:
./tools/objtool/objtool check -Ffa
drivers/gpu/drm/amd/display/dc/dml/dml_common_defs.o
[..] dc/dsc/rc_calc.o: warning: objtool: get_qp_set()+0x2f8:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: dsc_roundf()+0x5:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: dsc_ceil()+0x5:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: get_ofs_set()+0x3eb:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc.o: warning: objtool: calc_rc_params()+0x3c:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/dc_dsc.o: warning: objtool:
get_dsc_bandwidth_range.isra.0()+0x8d:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/dc_dsc.o: warning: objtool: setup_dsc_config()+0x2ef:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc_dpi.o: warning: objtool:copy_pps_fields()+0xbb:
FPU instruction outside of kernel_fpu_{begin,end}()
[..] dc/dsc/rc_calc_dpi.o: warning: objtool:
dscc_compute_dsc_parameters()+0x7b:
FPU instruction outside of kernel_fpu_{begin,end}()
This commit fixes the above issues by rework DSC as described:
1. Isolate all FPU operations in a single file;
2. Use FPU flags only in the file that handles FPU operations;
3. Isolate all functions that require float point operation in static
functions;
4. Add a mid-layer function that does not use any float point operation,
and that could be safely invoked in other parts of the code.
5. Keep float point operation under DC_FP_{START/END} macro.
CC: Christian König <christian.koenig@amd.com>
CC: Alexander Deucher <Alexander.Deucher@amd.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Tony Cheng <tony.cheng@amd.com>
CC: Harry Wentland <hwentlan@amd.com>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initializes Powertune data for a specific Hawaii card by fixing what
looks like a typo in the code. The device ID 66B1 is not a supported
device ID for this driver, and is not mentioned elsewhere. 67B1 is a
valid device ID, and is a Hawaii Pro GPU.
I have tested on my R9 390 which has device ID 67B1, and it works
fine without problems.
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Use kfree() instead of kvfree() to free rgb_user in
calculate_user_regamma_ramp() because the memory is allocated with
kcalloc().
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use kvfree() instead of kfree() to free coeff in build_regamma()
because the memory is allocated with kvzalloc().
Fixes: e752058b86 ("drm/amd/display: Optimize gamma calculations")
Cc: stable@vger.kernel.org
Signed-off-by: Denis Efremov <efremov@linux.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull drm fixes from Dave Airlie:
"These are the fixes from last week for the stuff merged in the merge
window. It got a bunch of nouveau fixes for HDA audio on some new
GPUs, some i915 and some amdpgu fixes.
i915:
- gvt: Fix one clang warning on debug only function
- Use ARRAY_SIZE for coccicheck warning
- Use after free fix for display global state.
- Whitelisting context-local timestamp on Gen9 and two scheduler
fixes with deps (Cc: stable)
- Removal of write flag from sysfs files where ineffective
nouveau:
- HDMI/DP audio HDA fixes
- display hang fix for Volta/Turing
- GK20A regression fix.
amdgpu:
- Prevent hwmon accesses while GPU is in reset
- CTF interrupt fix
- Backlight fix for renoir
- Fix for display sync groups
- Display bandwidth validation workaround"
* tag 'drm-next-2020-06-08' of git://anongit.freedesktop.org/drm/drm: (28 commits)
drm/nouveau/kms/nv50-: clear SW state of disabled windows harder
drm/nouveau: gr/gk20a: Use firmware version 0
drm/nouveau/disp/gm200-: detect and potentially disable HDA support on some SORs
drm/nouveau/disp/gp100: split SOR implementation from gm200
drm/nouveau/disp: modify OR allocation policy to account for HDA requirements
drm/nouveau/disp: split part of OR allocation logic into a function
drm/nouveau/disp: provide hint to OR allocation about HDA requirements
drm/amd/display: Revalidate bandwidth before commiting DC updates
drm/amdgpu/display: use blanked rather than plane state for sync groups
drm/i915/params: fix i915.fake_lmem_start module param sysfs permissions
drm/i915/params: don't expose inject_probe_failure in debugfs
drm/i915: Whitelist context-local timestamp in the gen9 cmdparser
drm/i915: Fix global state use-after-frees with a refcount
drm/i915: Check for awaits on still currently executing requests
drm/i915/gt: Do not schedule normal requests immediately along virtual
drm/i915: Reorder await_execution before await_request
drm/nouveau/kms/gt215-: fix race with audio driver runpm
drm/nouveau/disp/gm200-: fix NV_PDISP_SOR_HDMI2_CTRL(n) selection
Revert "drm/amd/display: disable dcn20 abm feature for bring up"
drm/amd/powerplay: ack the SMUToHost interrupt on receive V2
...
[Why]
Whenever we switch between tiled formats without also switching pixel
formats or doing anything else that recreates the DC plane state we
can run into underflow or hangs since we're not updating the
DML parameters before committing to the hardware.
[How]
If the update type is FULL then call validate_bandwidth again to update
the DML parmeters before committing the state.
This is basically just a workaround and protective measure against
update types being added DC where we could run into this issue in
the future.
We can only fully validate the state in advance before applying it to
the hardware if we recreate all the plane and stream states since
we can't modify what's currently in use.
The next step is to update DM to ensure that we're creating the plane
and stream states for whatever could potentially be a full update in
DC to pre-emptively recreate the state for DC global validation.
The workaround can stay until this has been fixed in DM.
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Hersen Wu <hersenxs.wu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull hmm updates from Jason Gunthorpe:
"This series adds a selftest for hmm_range_fault() and several of the
DEVICE_PRIVATE migration related actions, and another simplification
for hmm_range_fault()'s API.
- Simplify hmm_range_fault() with a simpler return code, no
HMM_PFN_SPECIAL, and no customizable output PFN format
- Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related
functionality"
* tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
MAINTAINERS: add HMM selftests
mm/hmm/test: add selftests for HMM
mm/hmm/test: add selftest driver for HMM
mm/hmm: remove the customizable pfn format from hmm_range_fault
mm/hmm: remove HMM_PFN_SPECIAL
drm/amdgpu: remove dead code after hmm_range_fault()
mm/hmm: make hmm_range_fault return 0 or -1
Pull power management updates from Rafael Wysocki:
"These rework the system-wide PM driver flags, make runtime switching
of cpuidle governors easier, improve the user space hibernation
interface code, add intel-speed-select interface documentation, add
more debug messages to the ACPI code handling suspend to idle, update
the cpufreq core and drivers, fix a minor issue in the cpuidle core
and update two cpuidle drivers, improve the PM-runtime framework,
update the Intel RAPL power capping driver, update devfreq core and
drivers, and clean up the cpupower utility.
Specifics:
- Rework the system-wide PM driver flags to make them easier to
understand and use and update their documentation (Rafael Wysocki,
Alan Stern).
- Allow cpuidle governors to be switched at run time regardless of
the kernel configuration and update the related documentation
accordingly (Hanjun Guo).
- Improve the resume device handling in the user space hibernarion
interface code (Domenico Andreoli).
- Document the intel-speed-select sysfs interface (Srinivas
Pandruvada).
- Make the ACPI code handing suspend to idle print more debug
messages to help diagnose issues with it (Rafael Wysocki).
- Fix a helper routine in the cpufreq core and correct a typo in the
struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang
Wenhu).
- Update cpufreq drivers:
- Make the intel_pstate driver start in the passive mode by
default on systems without HWP (Rafael Wysocki).
- Add i.MX7ULP support to the imx-cpufreq-dt driver and add
i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan).
- Convert the qoriq cpufreq driver to a platform one, make the
platform code create a suitable device object for it and add
platform dependencies to it (Mian Yousaf Kaukab, Geert
Uytterhoeven).
- Fix wrong compatible binding in the qcom driver (Ansuel Smith).
- Build the omap driver by default for ARCH_OMAP2PLUS (Anders
Roxell).
- Add r8a7742 SoC support to the dt cpufreq driver (Lad
Prabhakar).
- Update cpuidle core and drivers:
- Fix three reference count leaks in error code paths in the
cpuidle core (Qiushi Wu).
- Convert Qualcomm SPM to a generic cpuidle driver (Stephan
Gerhold).
- Fix up the execution order when entering a domain idle state in
the PSCI driver (Ulf Hansson).
- Fix a reference counting issue related to clock management and
clean up two oddities in the PM-runtime framework (Rafael Wysocki,
Andy Shevchenko).
- Add ElkhartLake support to the Intel RAPL power capping driver and
remove an unused local MSR definition from it (Jacob Pan, Sumeet
Pawnikar).
- Update devfreq core and drivers:
- Replace strncpy() with strscpy() in the devfreq core and use
lockdep asserts instead of manual checks for a locked mutex in
it (Dmitry Osipenko, Krzysztof Kozlowski).
- Add a generic imx bus scaling driver and make it register an
interconnect device (Leonard Crestez, Gustavo A. R. Silva).
- Make the cpufreq notifier in the tegra30 driver take boosting
into account and delete an unuseful error message from that
driver (Dmitry Osipenko, Markus Elfring).
- Remove unneeded semicolon from the cpupower code (Zou Wei)"
* tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits)
cpuidle: Fix three reference count leaks
PM: runtime: Replace pm_runtime_callbacks_present()
PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex
PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR
PM / devfreq: Replace strncpy with strscpy
PM / devfreq: imx: Register interconnect device
PM / devfreq: Add generic imx bus scaling driver
PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe()
PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting
PM: hibernate: Restrict writes to the resume device
PM: runtime: clk: Fix clk_pm_runtime_get() error path
cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver
ACPI: EC: PM: s2idle: Extend GPE dispatching debug message
ACPI: PM: s2idle: Print type of wakeup debug messages
powercap: RAPL: remove unused local MSR define
PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend()
Documentation: admin-guide: pm: Document intel-speed-select
PM: hibernate: Split off snapshot dev option
PM: hibernate: Incorporate concurrency handling
Documentation: ABI: make current_governer_ro as a candidate for removal
...
Pull drm fixes from Dave Airlie:
"A couple of amdgpu fixes and minor ingenic fixes:
amdgpu:
- display atomic test fix
- Fix soft hang in display vupdate code
ingenic:
- fix pointer cast
- fix crtc atomic check callback"
* tag 'drm-fixes-2020-05-29-1' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix potential integer wraparound resulting in a hang
drm/amd/display: drop cursor position check in atomic test
gpu/drm: Ingenic: Fix opaque pointer casted to wrong type
gpu/drm: ingenic: Fix bogus crtc_atomic_check callback
Return an error for sysfs and debugfs power interfaces during
gpu reset and suspend. Prevents access to the hw while it may
be in an unusable state.
v2: squash in fix to drop suspend check
Acked-by: Nirmoy Das <nirmoy.das@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
If VUPDATE_END is before VUPDATE_START the delay calculated can become
very large, causing a soft hang.
[How]
Take the absolute value of the difference between START and END.
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Reviewed-by: Nicholas Kazlauskas <Nicholas.Kazlauskas@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull cgroup fixes from Tejun Heo:
- Reverted stricter synchronization for cgroup recursive stats which
was prepping it for event counter usage which never got merged. The
change was causing performation regressions in some cases.
- Restore bpf-based device-cgroup operation even when cgroup1 device
cgroup is disabled.
- An out-param init fix.
* 'for-5.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
device_cgroup: Cleanup cgroup eBPF device filter code
xattr: fix uninitialized out-param
Revert "cgroup: Add memory barriers to plug cgroup_rstat_updated() race window"
the origin design will use varible of "attr->states" to save node
supported states on current gpu device, but for multi gpu device, when
probe second gpu device, the driver will check attribute node states
from previous gpu device wthether to create attribute node.
it will cause other gpu device create attribute node faild.
1. add member attr_list into amdgpu_device to link supported device attribute node.
2. add new structure "struct amdgpu_device_attr_entry{}" to track device attribute state.
3. drop member "states" from amdgpu_device_attr.
v2:
1. move "attr_list" into amdgpu_pm and rename to "pm_attr_list".
2. refine create & remove device node functions parameter.
fix:
drm/amdgpu: optimize amdgpu device attribute code
Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Previously we used the s3 codepath for gpu reset. This can lead to issues in
certain case where we end of waiting for fences which will never come (because
parts of the hw are off due to gpu reset) and we end up waiting forever causing
a deadlock.
[How]
Handle GPU reset separately from normal s3 case. We essentially need to redo
everything we do in s3, but avoid any drm calls.
For GPU reset case
suspend:
-Acquire DC lock
-Cache current dc_state
-Commit 0 stream/planes to dc (this puts dc into a state where it can be
powered off)
-Disable interrupts
resume
-Edit cached state to force full update
-Commit cached state from suspend
-Build stream and plane updates from the cached state
-Commit stream/plane updates
-Enable interrupts
-Release DC lock
v2:
-Formatting
-Release dc_state
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add some APU flags to simplify handling of different APU
variants. It's easier to understand the special cases
if we use names flags rather than checking device ids and
silicon revisions.
v2: rebase on latest code
Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Problem description]
1. Boot up picasso platform, launches desktop, Don't do anything (APU enter into "gfxoff" state)
2. Remote login to platform using SSH, then type the command line:
sudo su -c "echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level"
sudo su -c "echo 2 > /sys/class/drm/card0/device/pp_dpm_sclk" (fix SCLK to 1400MHz)
3. Move the mouse around in Window
4. Phenomenon : The screen frozen
Tester will switch sclk level during glmark2 run time.
APU will enter "gfxoff" state intermittently during glmark2 run time.
The system got hanged if fix GFXCLK to 1400MHz when APU is in "gfxoff"
state.
[Debug]
1. Fix SCLK to X MHz
1400: screen frozen, screen black, then OS will reboot.
1300: screen frozen.
1200: screen frozen, screen black.
1100: screen frozen, screen black, then OS will reboot.
1000: screen frozen, screen black.
900: screen frozen, screen black, then OS will reboot.
800: Situation Nomal, issue disappear.
700: Situation Nomal, issue disappear.
2. SBIOS setting: AMD CBS --> SMU Debug Options -->SMU Debug --> "GFX DLDO Psm Margin Control":
50 : Situation Nomal, issue disappear.
45 : Situation Nomal, issue disappear.
40 : Situation Nomal, issue disappear.
35 : Situation Nomal, issue disappear.
30 : screen black.
25 : screen frozen, then blurred screen.
20 : screen frozen.
15 : screen black.
10 : screen frozen.
5 : screen frozen, then blurred screen.
3. Disable GFXOFF feature
Situation Nomal, issue disappear.
[Why]
Through a period of time debugging with Sys Eng team and SMU team, Sys
Eng team said this is voltage/frequency marginal issue not a F/W or H/W
bug. This experiment proves that default targetPsm [for f=1400MHz] is
not sufficient when GFXOFF is enabled on Picasso.
SMU team think it is an odd test conditions to force sclk="1400MHz" when
GPU is in "gfxoff" state,then wake up the GFX. SCLK should be in the
"lowest frequency" when gfxoff.
[How]
Disable gfxoff when setting manual mode.
Enable gfxoff when setting other mode(exiting manual mode) again.
By the way, from the user point of view, now that user switch to manual
mode and force SCLK Frequency, he don't want SCLK be controlled by
workload.It becomes meaningless to "switch to manual mode" if APU enter "gfxoff"
due to lack of workload at this point.
Tips: Same issue observed on Raven.
Signed-off-by: chen gong <curry.gong@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Try to resize BAR0 to let CPU access all of VRAM on Navi. Syncs
code with previous gfx generations from commit d6895ad39f
("drm/amdgpu: resize VRAM BAR for CPU access v6").
Signed-off-by: Alan Swanson <reiver@improbability.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It's not implemented yet so just drop it so the sysfs
pcie bw file returns an appropriate error instead of
garbage.
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-By: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
1. Initialize the counters to 0 in case the callback
fails to initialize them.
2. The counters don't exist on APUs so return an error
for them.
3. Return an error if the callback doesn't exist.
Reviewed-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-By: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In free memory of gpu path, remove bo from validate_list to make sure
restore worker don't access the BO any more, then unregister bo MMU
interval notifier. Otherwise, the restore worker will crash in the
middle of validating BO user pages if MMU interval notifer is gone.
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>