331 Commits

Author SHA1 Message Date
Ben Skeggs
46fc98bfb8 drm/nouveau/pmu/gm20x: don't pretend we support loading with our custom FW
It technically loads, and runs, but is ultimately pointless outside of
a very narrow window (fanless systems where one wants to attempt using
the, broken for a lot of gm20x, memory reclocking code).

It's also potentially dangerous to override the VBIOS-provided "Pre-OS"
PMU, which would be responsible for fan control otherwise.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-07-24 18:50:48 +10:00
Ben Skeggs
de088372da drm/nouveau/acr: store a mask of LS falcons the controlling LSFW can bootstrap
This will prevent some pain with broken firmware trees, as under some
circumstances the HSFW can fail and leave the GPU in a state we don't
know how to recover from.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-07-24 18:50:48 +10:00
Timur Tabi
804f570502 drm/nouveau/tmr: fix nvkm_usec/nvkm_msec definitions
nvkm_timer_wait_init() takes a u64 as a duration parameter, but the
expression "(m) * 1000" will be promoted only to a 32-bit integer,
if 'm' is also an integer.  Changing the 1000 to 1000ULL ensures that
the expression will be 64 bits.

This change currently has no effect as there are no callers of
nvkm_msec() that exceed 2000ms.

Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-07-24 18:50:47 +10:00
Ben Skeggs
3fa8fe1572 drm/nouveau/acr/tu10x: initial support
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:30 +10:00
Ben Skeggs
22dcda45a3 drm/nouveau/acr: implement new subdev to replace "secure boot"
ACR is responsible for managing the firmware for LS (Low Secure) falcons,
this was previously handled in the driver by SECBOOT.

This rewrite started from some test code that attempted to replicate the
procedure RM uses in order to debug early Turing ACR firmwares that were
provided by NVIDIA for development.

Compared with SECBOOT, the code is structured into more individual steps,
with the aim of making the process easier to follow/debug, whilst making
it possible to support newer firmware versions that may have a different
binary format or API interface.

The HS (High Secure) binary(s) are now booted earlier in device init, to
match the behaviour of RM, whereas SECBOOT would delay this until we try
to boot the first LS falcon.

There's also additional debugging features available, with the intention
of making it easier to solve issues during FW/HW bring-up in the future.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:29 +10:00
Ben Skeggs
ebe52a58ac drm/nouveau/fb/gp102-: unlock VPR as part of FB init
We perform memory allocations long before we hit the code in SECBOOT that
would unlock the VPR, which could potentially result in memory allocation
within the locked region.

Run the scrubber binary right after VRAM init to ensure we don't.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:29 +10:00
Ben Skeggs
7a4dde711b drm/nouveau/secboot: move code to boot LS falcons to subdevs
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:29 +10:00
Ben Skeggs
d114a1393f drm/nouveau/flcn/msgq: move handling of init message to subdevs
When the PMU/SEC2 LS FWs have booted, they'll send a message to the host
with various information, including the configuration of message/command
queues that are available.

Move the handling for this to the relevant subdevs.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:29 +10:00
Ben Skeggs
86ce2a7153 drm/nouveau/flcn/cmdq: move command generation to subdevs
This moves the code to generate commands for the ACR unit of the PMU/SEC2 LS
firmwares to those subdevs.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:29 +10:00
Ben Skeggs
2e8a65973b drm/nouveau/flcn/cmdq: split the condition for queue readiness vs pmu acr readiness
This is to allow for proper separation of the LS interface code from the
queue handling code.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:28 +10:00
Ben Skeggs
22431189d6 drm/nouveau/flcn/msgq: explicitly create message queue from subdevs
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.

This is an incremental step towards that goal.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:28 +10:00
Ben Skeggs
acc466ab46 drm/nouveau/flcn/cmdq: explicitly create command queue(s) from subdevs
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.

This is an incremental step towards that goal.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:28 +10:00
Ben Skeggs
8763955ba7 drm/nouveau/flcn/qmgr: explicitly create queue manager from subdevs
Code to interface with LS firmwares is being moved to the subdevs where it
belongs, rather than living in the common falcon code.

This is an incremental step towards that goal.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:28 +10:00
Ben Skeggs
2952a2b42e drm/nouveau/pmu: initialise SW state for falcon from constructor
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Ben Skeggs
e905736c6d drm/nouveau/pmu/gp10b: split from gm20b implementation
ACR LS FW loading is moving out of SECBOOT and into their specific subdevs,
and the available GM20B/GP10B FWs have interface differences.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Ben Skeggs
334815ef31 drm/nouveau/gsp: initialise SW state for falcon from constructor
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Ben Skeggs
c63fe2e704 drm/nouveau/acr: add loaders for currently available LS firmware images
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Ben Skeggs
67e7c6cf8f drm/nouveau/acr: add stub implementation for all GPUs currently supported by SECBOOT
PMU, SEC2 and GR will be modified to register their falcons with ACR before
the main commit switching everything over.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Ben Skeggs
31bef57f6c drm/nouveau/core: define ACR subdev
This will replace the current SECBOOT subdev for handling firmware on
secure falcons.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Thierry Reding
0d0d498265 drm/nouveau/ltc/gp10b: Add custom L2 cache implementation
There are extra registers that need to be programmed to make the level 2
cache work on GP10B, such as the stream ID register that is used when an
SMMU is used to translate memory addresses.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:49:59 +10:00
Thierry Reding
0ac7facb70 drm/nouveau/fault: Add support for GP10B
There is no BAR2 on GP10B and there is no need to map through BAR2
because all memory is shared between the GPU and the CPU. Add a custom
implementation of the fault sub-device that uses nvkm_memory_addr()
instead of nvkm_memory_bar2() to return the address of a pinned fault
buffer.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:49:58 +10:00
Mark Menzynski
3c978f7395 drm/nouveau/gpio: check function 76 in the power check as well
Added GPIO is "Power Alert". It's uncertain if this
GPIO is set on GPU initialization or only if a change is detected by the
GPU at runtime.

This GPIO can be found on Tesla and sometimes on Fermi GPUs.

Untested, wrote according to documentation.

Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-08-23 12:55:34 +10:00
Mark Menzynski
940794b3dd drm/nouveau/gpio: check the gpio function 16 in the power check as well
Added GPIO is "Thermal and External Power Detect". It's uncertain if this
GPIO is set on GPU initialization or only if a change is detected by the
GPU at runtime.

This GPIO can be found in Rankine and Curie and rarely on Tesla GPUs
VBIOS.

Untested, wrote according to documentation.

Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-08-23 12:55:33 +10:00
Mark Menzynski
72251fac06 drm/nouveau/gpio: fail if gpu external power is missing
Currently, nouveau doesn't check if GPU is missing power. This
patch makes nouveau fail when this happens on latest GPUs.

It checks GPIO function 121 (External Power Emergency), which
should detect power problems on GPU initialization.

This can be disabled with nouveau.config=NvPowerChecks=1

Tested on TU104, GP106 and GF100.

v3:
*  Add config override for disabling power checks

Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-08-23 12:55:33 +10:00
Mark Menzynski
e79ef1c007 drm/nouveau/bios/gpio: sort gpios by values
One gpio was in wrong place, moved it for better readability.

Signed-off-by: Mark Menzynski <mmenzyns@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-08-23 12:55:33 +10:00
Ben Skeggs
69cbbb7b04 drm/nouveau/therm: don't attempt fan control where PMU is already managing it
There's already a condition in place which attempts to detect this, but
since we've begun to require a PMU subdev even on boards where we don't
load a custom FW, it's become inaccurate.

This will prevent unnecessarily running a periodic fan update thread on
GP100 and newer, where we don't yet override the default PMU FW.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-08-23 12:55:33 +10:00
Ben Skeggs
f0790cda65 drm/nouveau/therm: skip probing for devices not specified in thermal tables
Saves some time during driver load, as described by the relevant section[1]
of the DCB 4.x specification.

[1] https://nvidia.github.io/open-gpu-doc/DCB/DCB-4.x-Specification.html#_i2c_device_table

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-08-23 12:55:33 +10:00
Ilia Mirkin
b7019ac550 drm/nouveau: fix bogus GPL-2 license header
The bulk SPDX addition made all these files into GPL-2.0 licensed files.
However the remainder of the project is MIT-licensed, these files
(primarily header files) were simply missing the boiler plate and got
caught up in the global update.

Fixes: b24413180f5 (License cleanup: add SPDX GPL-2.0 license identifier to files with no license)
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Acked-by: Emil Velikov <emil.l.velikov@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-07-19 16:26:50 +10:00
Lyude Paul
342406e4fb drm/nouveau/i2c: Disable i2c bus access after ->fini()
For a while, we've had the problem of i2c bus access not grabbing
a runtime PM ref when it's being used in userspace by i2c-dev, resulting
in nouveau spamming the kernel log with errors if anything attempts to
access the i2c bus while the GPU is in runtime suspend. An example:

[  130.078386] nouveau 0000:01:00.0: i2c: aux 000d: begin idle timeout ffffffff

Since the GPU is in runtime suspend, the MMIO region that the i2c bus is
on isn't accessible. On x86, the standard behavior for accessing an
unavailable MMIO region is to just return ~0.

Except, that turned out to be a lie. While computers with a clean
concious will return ~0 in this scenario, some machines will actually
completely hang a CPU on certian bad MMIO accesses. This was witnessed
with someone's Lenovo ThinkPad P50, where sensors-detect attempting to
access the i2c bus while the GPU was suspended would result in a CPU
hang:

  CPU: 5 PID: 12438 Comm: sensors-detect Not tainted 5.0.0-0.rc4.git3.1.fc30.x86_64 #1
  Hardware name: LENOVO 20EQS64N17/20EQS64N17, BIOS N1EET74W (1.47 ) 11/21/2017
  RIP: 0010:ioread32+0x2b/0x30
  Code: 81 ff ff ff 03 00 77 20 48 81 ff 00 00 01 00 76 05 0f b7 d7 ed c3
  48 c7 c6 e1 0c 36 96 e8 2d ff ff ff b8 ff ff ff ff c3 8b 07 <c3> 0f 1f
  40 00 49 89 f0 48 81 fe ff ff 03 00 76 04 40 88 3e c3 48
  RSP: 0018:ffffaac3c5007b48 EFLAGS: 00000292 ORIG_RAX: ffffffffffffff13
  RAX: 0000000001111000 RBX: 0000000001111000 RCX: 0000043017a97186
  RDX: 0000000000000aaa RSI: 0000000000000005 RDI: ffffaac3c400e4e4
  RBP: ffff9e6443902c00 R08: ffffaac3c400e4e4 R09: ffffaac3c5007be7
  R10: 0000000000000004 R11: 0000000000000001 R12: ffff9e6445dd0000
  R13: 000000000000e4e4 R14: 00000000000003c4 R15: 0000000000000000
  FS:  00007f253155a740(0000) GS:ffff9e644f600000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00005630d1500358 CR3: 0000000417c44006 CR4: 00000000003606e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   g94_i2c_aux_xfer+0x326/0x850 [nouveau]
   nvkm_i2c_aux_i2c_xfer+0x9e/0x140 [nouveau]
   __i2c_transfer+0x14b/0x620
   i2c_smbus_xfer_emulated+0x159/0x680
   ? _raw_spin_unlock_irqrestore+0x1/0x60
   ? rt_mutex_slowlock.constprop.0+0x13d/0x1e0
   ? __lock_is_held+0x59/0xa0
   __i2c_smbus_xfer+0x138/0x5a0
   i2c_smbus_xfer+0x4f/0x80
   i2cdev_ioctl_smbus+0x162/0x2d0 [i2c_dev]
   i2cdev_ioctl+0x1db/0x2c0 [i2c_dev]
   do_vfs_ioctl+0x408/0x750
   ksys_ioctl+0x5e/0x90
   __x64_sys_ioctl+0x16/0x20
   do_syscall_64+0x60/0x1e0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe
  RIP: 0033:0x7f25317f546b
  Code: 0f 1e fa 48 8b 05 1d da 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff
  ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01
  f0 ff ff 73 01 c3 48 8b 0d ed d9 0c 00 f7 d8 64 89 01 48
  RSP: 002b:00007ffc88caab68 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  RAX: ffffffffffffffda RBX: 00005630d0fe7260 RCX: 00007f25317f546b
  RDX: 00005630d1598e80 RSI: 0000000000000720 RDI: 0000000000000003
  RBP: 00005630d155b968 R08: 0000000000000001 R09: 00005630d15a1da0
  R10: 0000000000000070 R11: 0000000000000246 R12: 00005630d1598e80
  R13: 00005630d12f3d28 R14: 0000000000000720 R15: 00005630d12f3ce0
  watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [sensors-detect:12438]

Yikes! While I wanted to try to make it so that accessing an i2c bus on
nouveau would wake up the GPU as needed, airlied pointed out that pretty
much any usecase for userspace accessing an i2c bus on a GPU (mainly for
the DDC brightness control that some displays have) is going to only be
useful while there's at least one display enabled on the GPU anyway, and
the GPU never sleeps while there's displays running.

Since teaching the i2c bus to wake up the GPU on userspace accesses is a
good deal more difficult than it might seem, mostly due to the fact that
we have to use the i2c bus during runtime resume of the GPU, we instead
opt for the easiest solution: don't let userspace access i2c busses on
the GPU at all while it's in runtime suspend.

Changes since v1:
* Also disable i2c busses that run over DP AUX

Signed-off-by: Lyude Paul <lyude@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-05-01 11:08:39 +10:00
Ben Skeggs
13e9572906 drm/nouveau/fault/gp100: expose MaxwellFaultBufferA
This nvclass exposes the replayable fault buffer, which will be used
by SVM to manage GPU page faults.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 09:00:00 +10:00
Ben Skeggs
ab2ee9ffa3 drm/nouveau/mmu/gp100-: support vmms with gcc/tex replayable faults enabled
Some GPU units are capable of supporting "replayable" page faults, where
the execution unit will wait for SW to fixup GPU page tables rather than
triggering a channel-fatal fault.

This feature isn't useful (it's harmful, even) unless something like HMM
is being used to manage events appearing in the replayable fault buffer,
so, it's disabled by default.

This commit allows a client to request it be enabled.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 09:00:00 +10:00
Ben Skeggs
a5ff307fe1 drm/nouveau/mmu: add a privileged method to directly manage PTEs
This provides a somewhat more direct method of manipulating the GPU page
tables, which will be required to support SVM.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 09:00:00 +10:00
Ben Skeggs
8e68271d7c drm/nouveau/mmu: store mapped flag separately from memory pointer
This will be used to support a privileged client providing PTEs directly,
without a memory object to use as a reference.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 09:00:00 +10:00
Ben Skeggs
2944b19b5c drm/nouveau/gsp/gv100-: instantiate GSP falcon
We need this for Turing ACR, but it's present from Volta onwards.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:58 +10:00
Ben Skeggs
eec9ffe47f drm/nouveau/top: add function to lookup PRI address for devices
Will be using this in upcoming changes to avoid the need for entirely
new subdevs to deal with Turing register moves.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:58 +10:00
Ben Skeggs
78cdadb840 drm/nouveau/core: define GSP subdev
Exact meaning of the acronym is unknown, but we need this for Turing ACR.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:58 +10:00
Ben Skeggs
954f97983c drm/nouveau/fault/tu102: rename implementation from tu104
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:58 +10:00
Ben Skeggs
ef7664d9df drm/nouveau/bar/tu102: rename implementation from tu104
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:57 +10:00
Ben Skeggs
c011b25421 drm/nouveau/mmu/tu102: rename implementation from tu104
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:57 +10:00
Ben Skeggs
fd95bfbdb9 drm/nouveau/mc/tu102: rename implementation from tu104
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:57 +10:00
Ben Skeggs
b51f9dfac7 drm/nouveau/devinit/tu102: rename implementation from tu104
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:57 +10:00
Ilia Mirkin
fc78224274 drm/nouveau/volt/gf117: fix speedo readout register
GF117 appears to use the same register as GK104 (but still with the
general Fermi readout mechanism).

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108980
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2019-02-20 08:59:57 +10:00
Ben Skeggs
17fb2807c6 drm/nouveau/fault/tu104: initial support
New registers.

Currently uncertain how exactly to mask fault buffer interrupts.  This will
likely be corrected at around the same time as the new MC interrupt stuff
has been properly figured out and implemented.

For the moment, it shouldn't matter too much.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:54 +10:00
Ben Skeggs
838efaa574 drm/nouveau/bar/tu104: initial support
New registers.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:53 +10:00
Ben Skeggs
7986f813c6 drm/nouveau/mmu/tu104: initial support
New flush method.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:53 +10:00
Ben Skeggs
f2e55b9ea9 drm/nouveau/mc/tu104: initial support
Things are a bit different here on Turing, and will require further changes
yet once I've investigated them more thoroughly.

For now though, the existing GP100 code is compatible enough with one small
hack to forward on fault buffer interrupts.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:52 +10:00
Ben Skeggs
43d61cda30 drm/nouveau/devinit/tu104: initial support
The GPU executes DEVINIT itself now, which makes our lives a bit easier.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:50 +10:00
Ben Skeggs
302daab1a7 drm/nouveau/fifo/gf100-: call into BAR to reset BARs after MMU fault
This is needed for Turing, but we're supposed to wait for completion after
re-writing the value on older GPUs anyway.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:47 +10:00
Ben Skeggs
e4f90a35c9 drm/nouveau/tmr: detect stalled gpu timer and break out of waits
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:45 +10:00
Ben Skeggs
7919faab51 drm/nouveau/bios: translate USB-C connector type
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2018-12-11 15:37:45 +10:00