Help with intermittent freezing after resume from suspend in 20.04 with AMD RX 570 graphics
I've had this problem with 19.10 and now 20.04. I did not have this problem with 18.04, which I built this computer with back in Feb 2020. I did a clean install for 20.04. Briefly, after some variable period (minutes up to an hour) of scrolling in Firefox, the mouse becomes inactive (I can move it but clicks don't register) and a few seconds later the system becomes completely unresponsive, often with a blank or false-color low-res screen, and requires a hard reboot to reset.
Typically, this happens after a resume from suspend, but it has also happened after a fresh boot (more rarely). It is, however, an intermittent issue and I can't say for sure what the preconditions are. Scrolling in Firefox seems to be more or less a constant trigger. My suspicion is that there is some race condition on resume or initialization that causes an improper condition in the amdgpu drivers. I have searched for this issue by the errors in syslog and followed what clues I can glean - re-installing the amdgpu drivers from the AMD site, updating the kernel (now at 5.8.1) - but nothing has helped. Syslog errors always start with:
Aug 18 21:05:26 mvlLinux-pc kernel: [28611.718399] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR Waiting for fences timed out!
Aug 18 21:05:31 mvlLinux-pc kernel: [28611.718497] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] ERROR Waiting for fences timed out!
Aug 18 21:05:31 mvlLinux-pc kernel: [28617.360497] [drm:amdgpu_job_timedout [amdgpu]] ERROR ring gfx timeout, signaled seq=624416, emitted seq=624418
Aug 18 21:05:31 mvlLinux-pc kernel: [28617.360584] [drm:amdgpu_job_timedout [amdgpu]] ERROR Process information: process gnome-shell pid 2328 thread gnome-shel:cs0 pid 2354
Aug 18 21:05:31 mvlLinux-pc kernel: [28617.360590] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
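For anyone comparing against their own logs, a filter along these lines may help pull out the same class of messages. It is only a sketch: the function name `gpu_errors` is made up here, and the patterns are taken from the messages quoted above, so other amdgpu failures would not match.

```shell
# Hypothetical helper: keep only the amdgpu/drm lines that mark a hang.
# Reads log text on stdin, so it works with syslog files or dmesg output.
gpu_errors() {
    grep -E 'Waiting for fences timed out|ring [[:alnum:]_.]+ timeout|GPU reset'
}

# Example usage (log paths may differ on other systems):
#   gpu_errors < /var/log/syslog
#   sudo dmesg | gpu_errors
```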
Hardware summary:
Motherboard: Asus PRIME X470-PRO
Processor: AMD Ryzen 5 2600X Six-Core Processor
Video: Asus Strix Radeon RX570
Ram: CRUCIAL 16 GiB
Further details available, of course. Any suggestions gratefully accepted. I've found using Linux lately to be just too crash-prone to use.
@heynnema
I don't think memory is the issue, but here it is:
free -h
total used free shared buff/cache available
Mem: 15Gi 2.7Gi 10Gi 235Mi 2.0Gi 12Gi
Swap: 2.0Gi 0B 2.0Gi
sudo dmidecode -s bios-version
5406
sudo lshw -C memory
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: 5406
date: 11/13/2019
size: 64KiB
capacity: 16MiB
capabilities: pci apm upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: 2e
slot: System board or motherboard
size: 16GiB
*-bank:0
description: [empty]
product: Unknown
vendor: Unknown
physical id: 0
serial: Unknown
slot: DIMM_A1
*-bank:1
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
product: BLS8G4D32AESBK.M8FE1
vendor: CRUCIAL
physical id: 1
serial: E316F686
slot: DIMM_A2
size: 8GiB
width: 64 bits
clock: 2400MHz (0.4ns)
*-bank:2
description: [empty]
product: Unknown
vendor: Unknown
physical id: 2
serial: Unknown
slot: DIMM_B1
*-bank:3
description: DIMM DDR4 Synchronous Unbuffered (Unregistered) 2400 MHz (0.4 ns)
product: BLS8G4D32AESBK.M8FE1
vendor: CRUCIAL
physical id: 3
serial: E316E264
slot: DIMM_B2
size: 8GiB
width: 64 bits
clock: 2400MHz (0.4ns)
*-cache:0
description: L1 cache
physical id: 30
slot: L1 - Cache
size: 576KiB
capacity: 576KiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=1
*-cache:1
description: L2 cache
physical id: 31
slot: L2 - Cache
size: 3MiB
capacity: 3MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=2
*-cache:2
description: L3 cache
physical id: 32
slot: L3 - Cache
size: 16MiB
capacity: 16MiB
clock: 1GHz (1.0ns)
capabilities: pipeline-burst internal write-back unified
configuration: level=3
@heynnema
Adding more of the error messages from freeze after suspend/resume:
Aug 29 08:36:17 mvlLinux-pc systemd-resolved[830]: Server returned error NXDOMAIN, mitigating potential DNS violation DVE-2018-0001, retrying transaction with reduced feature level UDP.
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248541] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248550] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248553] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00200000/04400000
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248556] pcieport 0000:00:03.1: AER: [21] ACSViol (First)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248559] amdgpu 0000:09:00.0: AER: can't recover (no error_detected callback)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248561] snd_hda_intel 0000:09:00.1: AER: can't recover (no error_detected callback)
Aug 29 08:39:37 mvlLinux-pc kernel: [ 8030.248587] pcieport 0000:00:03.1: AER: device recovery failed
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331741] pcieport 0000:00:03.1: AER: Multiple Uncorrected (Non-Fatal) error received: 0000:00:00.0
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331751] pcieport 0000:00:03.1: AER: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331756] pcieport 0000:00:03.1: AER: device [1022:1453] error status/mask=00200000/04400000
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331759] pcieport 0000:00:03.1: AER: [21] ACSViol (First)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331763] amdgpu 0000:09:00.0: AER: can't recover (no error_detected callback)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331765] snd_hda_intel 0000:09:00.1: AER: can't recover (no error_detected callback)
Aug 29 08:39:39 mvlLinux-pc kernel: [ 8032.331799] pcieport 0000:00:03.1: AER: device recovery failed
Aug 29 08:39:47 mvlLinux-pc kernel: [ 8040.390787] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 29 08:39:47 mvlLinux-pc kernel: [ 8040.390799] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:49:crtc-1] flip_done timed out
Aug 29 08:39:49 mvlLinux-pc kernel: [ 8042.438900] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22040, emitted seq=22042
Aug 29 08:39:49 mvlLinux-pc kernel: [ 8042.438988] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Aug 29 08:39:49 mvlLinux-pc kernel: [ 8042.438995] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
Aug 29 08:39:50 mvlLinux-pc kernel: [ 8043.146715] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Aug 29 08:39:50 mvlLinux-pc kernel: [ 8043.146795] [drm:gfx_v8_0_kcq_disable.isra.0 [amdgpu]] *ERROR* KCQ disable failed
Aug 29 08:39:50 mvlLinux-pc kernel: [ 8043.423697] amdgpu: cp is busy, skip halt cp
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8043.700692] amdgpu: rlc is busy, skip halt rlc
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8043.701711] amdgpu 0000:09:00.0: amdgpu: GPU BACO reset
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.346691] amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.348500] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.348515] [drm] VRAM is lost due to GPU reset!
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.678238] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.678302] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
Aug 29 08:39:51 mvlLinux-pc kernel: [ 8044.678328] amdgpu 0000:09:00.0: amdgpu: GPU reset(1) failed
Aug 29 08:39:52 mvlLinux-pc kernel: [ 8044.680626] amdgpu 0000:09:00.0: amdgpu: GPU reset end with ret = -110
Aug 29 08:39:54 mvlLinux-pc kernel: [ 8047.302923] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
Aug 29 08:40:02 mvlLinux-pc kernel: [ 8054.727115] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=22042, emitted seq=22042
Aug 29 08:40:02 mvlLinux-pc kernel: [ 8054.727203] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
Aug 29 08:40:02 mvlLinux-pc kernel: [ 8054.727216] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
Aug 29 08:40:46 mvlLinux-pc systemd-modules-load[388]: Inserted module 'lp'
Aug 29 08:40:46 mvlLinux-pc systemd-modules-load[388]: Inserted module 'ppdev'
Aug 29 08:40:46 mvlLinux-pc kernel: [ 0.000000] Linux version 5.8.1-050801-generic (kernel@sita) (gcc (Ubuntu 10.2.0-5ubuntu2) 10.2.0, GNU ld (GNU Binutils for Ubuntu) 2.35) #202008111432 SMP Tue Aug 11 14:34:42 UTC 2020
Aug 29 08:40:46 mvlLinux-pc kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.8.1-050801-generic root=UUID=566746e2-b4e2-42a6-b18a-fa84ebca61aa ro quiet splash vt.handoff=7
I have seen errors akin to this in bug reports, always involving AMD graphics, but mostly integrated APUs, not my discrete setup. This problem appeared for me with the move from Ubuntu 18.04 to 19.10. Others have indicated that newer kernels fixed it, but updating to 5.8.1 did not help me. Given the intermittent nature of the issue, it is possible that others only think it has gone away, and several posters I've seen note that it came back. I have seen no resolution in any of the literally dozens of threads I've read so far. I think I may try putting an older video card in just to see if that narrows it down. Thanks!
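The `error status/mask=00200000/04400000` pair in the AER lines is a bitfield, and the `[21] ACSViol` line is the kernel naming the one status bit that is set. As a small sketch of that arithmetic (the function name `set_bits` is invented for this example), the set bit positions can be listed like this:

```shell
# Hypothetical helper: print the positions of the bits set in a 32-bit hex value.
set_bits() {
    value=$((0x$1))
    bit=0
    while [ "$bit" -lt 32 ]; do
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            echo "$bit"
        fi
        bit=$((bit + 1))
    done
}

set_bits 00200000   # status: prints 21, matching "[21] ACSViol"
set_bits 04400000   # mask: prints 22 and 26
```

Since bit 21 is not among the masked bits, the error is reported rather than suppressed, which is consistent with the log above.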
@heynnema
After setting pci=noaer in the grub command line, I got the same error on resume from suspend. Dmesg output from the resume:
[ 2456.697121] ACPI: Low-level resume complete
[ 2456.697163] ACPI: EC: EC started
[ 2456.697164] PM: Restoring platform NVS memory
[ 2456.697710] Enabling non-boot CPUs ...
[ 2456.697747] x86: Booting SMP configuration:
[ 2456.697748] smpboot: Booting Node 0 Processor 1 APIC 0x2
[ 2456.697845] microcode: CPU1: patch_level=0x0800820d
[ 2456.700139] ACPI: \_PR_.C002: Found 2 idle states
[ 2456.700328] CPU1 is up
[ 2456.700344] smpboot: Booting Node 0 Processor 2 APIC 0x4
[ 2456.700442] microcode: CPU2: patch_level=0x0800820d
[ 2456.702609] ACPI: \_PR_.C004: Found 2 idle states
[ 2456.702779] CPU2 is up
[ 2456.702793] smpboot: Booting Node 0 Processor 3 APIC 0x8
[ 2456.702921] microcode: CPU3: patch_level=0x0800820d
[ 2456.705121] ACPI: \_PR_.C006: Found 2 idle states
[ 2456.705330] CPU3 is up
[ 2456.705344] smpboot: Booting Node 0 Processor 4 APIC 0xa
[ 2456.705468] microcode: CPU4: patch_level=0x0800820d
[ 2456.707683] ACPI: \_PR_.C008: Found 2 idle states
[ 2456.707886] CPU4 is up
[ 2456.707901] smpboot: Booting Node 0 Processor 5 APIC 0xc
[ 2456.708026] microcode: CPU5: patch_level=0x0800820d
[ 2456.710215] ACPI: \_PR_.C00A: Found 2 idle states
[ 2456.710422] CPU5 is up
[ 2456.710435] smpboot: Booting Node 0 Processor 6 APIC 0x1
[ 2456.710561] microcode: CPU6: patch_level=0x0800820d
[ 2456.712760] ACPI: \_PR_.C001: Found 2 idle states
[ 2456.713055] CPU6 is up
[ 2456.713084] smpboot: Booting Node 0 Processor 7 APIC 0x3
[ 2456.713186] microcode: CPU7: patch_level=0x0800820d
[ 2456.715367] ACPI: \_PR_.C003: Found 2 idle states
[ 2456.715594] CPU7 is up
[ 2456.715609] smpboot: Booting Node 0 Processor 8 APIC 0x5
[ 2456.715709] microcode: CPU8: patch_level=0x0800820d
[ 2456.717892] ACPI: \_PR_.C005: Found 2 idle states
[ 2456.718131] CPU8 is up
[ 2456.718143] smpboot: Booting Node 0 Processor 9 APIC 0x9
[ 2456.718271] microcode: CPU9: patch_level=0x0800820d
[ 2456.720463] ACPI: \_PR_.C007: Found 2 idle states
[ 2456.720728] CPU9 is up
[ 2456.720742] smpboot: Booting Node 0 Processor 10 APIC 0xb
[ 2456.720868] microcode: CPU10: patch_level=0x0800820d
[ 2456.723067] ACPI: \_PR_.C009: Found 2 idle states
[ 2456.723342] CPU10 is up
[ 2456.723356] smpboot: Booting Node 0 Processor 11 APIC 0xd
[ 2456.723483] microcode: CPU11: patch_level=0x0800820d
[ 2456.725687] ACPI: \_PR_.C00B: Found 2 idle states
[ 2456.725971] CPU11 is up
[ 2456.727331] ACPI: Waking up from system sleep state S3
[ 2456.728144] ACPI: EC: interrupt unblocked
[ 2456.810892] ACPI: EC: event unblocked
[ 2456.810961] usb usb1: root hub lost power or was reset
[ 2456.810962] usb usb2: root hub lost power or was reset
[ 2456.811202] usb usb3: root hub lost power or was reset
[ 2456.811203] usb usb4: root hub lost power or was reset
[ 2456.811595] sd 1:0:0:0: [sda] Starting disk
[ 2456.811933] serial 00:03: activated
[ 2457.124313] ata5: SATA link down (SStatus 0 SControl 330)
[ 2457.124331] ata6: SATA link down (SStatus 0 SControl 330)
[ 2457.124375] ata7: SATA link down (SStatus 0 SControl 330)
[ 2457.124474] ata1: SATA link down (SStatus 0 SControl 300)
[ 2457.124622] ata9: SATA link down (SStatus 0 SControl 300)
[ 2457.128321] ata3: SATA link down (SStatus 0 SControl 330)
[ 2457.168893] nvme nvme0: Shutdown timeout set to 8 seconds
[ 2457.181058] ata4: SATA link down (SStatus 0 SControl 330)
[ 2457.204000] nvme nvme0: 32/0/0 default/read/poll queues
[ 2457.215120] usb 4-1: reset SuperSpeed Gen 1 USB device number 2 using xhci_hcd
[ 2457.283762] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 2457.366979] usb 4-2: reset SuperSpeed Gen 1 USB device number 3 using xhci_hcd
[ 2457.403433] [drm] UVD and UVD ENC initialized successfully.
[ 2457.526411] [drm] VCE initialized successfully.
[ 2457.586664] usb 3-1: reset high-speed USB device number 2 using xhci_hcd
[ 2457.850542] ata8: failed to resume link (SControl 0)
[ 2457.850553] ata8: SATA link down (SStatus 0 SControl 0)
[ 2458.122724] usb 3-1.1: reset full-speed USB device number 3 using xhci_hcd
[ 2460.178827] igb 0000:07:00.0 enp7s0: igb: enp7s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2462.202613] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2462.379171] usb 5-2.2: reset low-speed USB device number 5 using xhci_hcd
[ 2462.607145] ata2.00: configured for UDMA/133
[ 2467.726718] PM: dpm_run_callback(): usb_dev_resume+0x0/0x20 returns -5
[ 2467.726722] PM: Device 5-2.2 failed to resume async: error -5
[ 2467.727071] OOM killer enabled.
[ 2467.727072] Restarting tasks ... done.
[ 2467.821378] PM: suspend exit
[ 2467.887621] usb 5-2.2: USB disconnect, device number 5
[ 2467.994352] usb 5-2.2: new low-speed USB device number 7 using xhci_hcd
[ 2468.103947] usb 5-2.2: New USB device found, idVendor=0764, idProduct=0501, bcdDevice= 0.01
[ 2468.103949] usb 5-2.2: New USB device strings: Mfr=3, Product=1, SerialNumber=0
[ 2468.103950] usb 5-2.2: Product: ST Series
[ 2468.103951] usb 5-2.2: Manufacturer: CPS
[ 2468.161509] hid-generic 0003:0764:0501.0008: hiddev2,hidraw5: USB HID v1.10 Device [CPS ST Series] on usb-0000:0a:00.3-2.2/input0
[ 2471.910903] igb 0000:07:00.0 enp7s0: igb: enp7s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 2472.022608] IPv6: ADDRCONF(NETDEV_CHANGE): enp7s0: link becomes ready
[ 2575.502700] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
[ 2575.502806] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]] *ERROR* Waiting for fences timed out!
[ 2580.632921] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=84864, emitted seq=84866
[ 2580.633010] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 1874 thread Xorg:cs0 pid 1877
[ 2580.633018] amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
[ 2581.335993] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 2581.336073] [drm:gfx_v8_0_kcq_disable.isra.0 [amdgpu]] *ERROR* KCQ disable failed
[ 2581.613633] amdgpu: cp is busy, skip halt cp
[ 2581.890354] amdgpu: rlc is busy, skip halt rlc
[ 2581.891376] amdgpu 0000:09:00.0: amdgpu: GPU BACO reset
[ 2582.546375] amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 2582.548207] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 2582.548220] [drm] VRAM is lost due to GPU reset!
[ 2582.878644] amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
[ 2582.878708] [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v8_0> failed -110
[ 2582.878764] amdgpu 0000:09:00.0: amdgpu: GPU reset(2) failed
[ 2582.881066] amdgpu 0000:09:00.0: amdgpu: GPU reset end with ret = -110
[ 2585.742804] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 2585.742817] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:49:crtc-1] flip_done timed out
[ 2588.558904] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:crtc-0] flip_done timed out
[ 2592.910983] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[ 2603.150799] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
At that point the screen is blank and the system is frozen. It looks pretty much the same as usual. The GPU reset is retried, times out, and fails, so to my mind what happens is that the GPU cannot be recovered or reset after suspend. I have seen it on a fresh reboot, but much more rarely, and I can usually work/play for hours - just so long as I don't allow it to suspend. Thanks!
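To put a number on how long after resume the hang begins, something like the following awk sketch could be run over the dmesg text. The function name `resume_to_hang` is hypothetical, and it assumes the plain dmesg format shown above, where each line starts with a `[ seconds.microseconds]` timestamp.

```shell
# Hypothetical helper: seconds between "PM: suspend exit" and the first
# "Waiting for fences timed out" that follows it. Reads dmesg text on stdin.
resume_to_hang() {
    awk '
        # Pull the number out of the leading "[ 1234.567890]" timestamp.
        function stamp(line) {
            sub(/^\[ */, "", line)
            sub(/\].*/, "", line)
            return line + 0
        }
        /PM: suspend exit/ { resumed = stamp($0); seen = 1 }
        seen && /Waiting for fences timed out/ {
            printf "%.1f\n", stamp($0) - resumed
            exit
        }
    '
}

# Example usage:
#   sudo dmesg | resume_to_hang
```

For the output above (suspend exit at 2467.821378, first fence timeout at 2575.502700) this gives 107.7, i.e. the hang started a little under two minutes after resume.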
BIOS
Asus PRIME X470-PRO
You have BIOS version 5406.
There's a newer BIOS available, version 5603, dated 8/10/2020, which can be downloaded here.
Note: Confirm that I have the correct web page for your motherboard model #.
Note: Have good backups before updating the BIOS.
memory
Ryzen processors are notorious for memory compatibility issues.
Go to https://www.memtest86.com/ and download/run their free memtest to test your memory. Get at least one complete pass of all the 4/4 tests to confirm good memory. This may take many hours to complete.
Memory Support page: https://www.asus.com/us/Motherboards/PRIME-X470-PRO/HelpDesk_QVL/
Update #1:
Swap
Let's increase your /swapfile from 2G to 4G...
Note: Incorrect use of the dd command can cause data loss. Suggest copy/paste.
sudo swapoff -a # turn off swap
sudo rm -i /swapfile # remove old /swapfile
sudo dd if=/dev/zero of=/swapfile bs=1M count=4096 # create new 4G /swapfile
sudo chmod 600 /swapfile # set proper file protections
sudo mkswap /swapfile # init /swapfile
sudo swapon /swapfile # turn on swap
free -h # confirm 16G RAM and 4G swap
Confirm this line in /etc/fstab... and confirm no other “swap” lines...
/swapfile none swap sw 0 0
reboot # reboot and verify operation
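The `/etc/fstab` check can also be done from the terminal. This is a sketch only; the function name `swap_entries` is made up for illustration, and it simply prints the first field of every uncommented line whose filesystem type is `swap`.

```shell
# Hypothetical helper: print the active swap entries in an fstab-style file.
# Comment lines are skipped; field 3 is the filesystem type.
swap_entries() {
    awk '$1 !~ /^#/ && $3 == "swap" { print $1 }' "${1:-/etc/fstab}"
}

# Example usage: expect a single line reading /swapfile
#   swap_entries
```

More than one line of output would mean there is another swap entry to look at.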
Update #2:
I hesitate to offer this, as I suspect that your AMD video card may be defective... but you can try this...
AER (Advanced Error Reporting)
sudo -H gedit /etc/default/grub # edit this file
Find:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash"
Change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash pci=noaer"
sudo update-grub # update GRUB
reboot # reboot the computer
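After the reboot, it is worth confirming that the new option actually reached the kernel. A minimal sketch, assuming the running command line is read from `/proc/cmdline`; the function name `has_kernel_param` is hypothetical:

```shell
# Hypothetical helper: succeed if a parameter appears as a whole word on the
# kernel command line. The file argument defaults to /proc/cmdline.
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example usage after the reboot:
#   has_kernel_param pci=noaer && echo "pci=noaer is active"
```

If nothing is printed, GRUB was probably not updated or a different boot entry was selected.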
Update #3:
Disconnected a bunch of USB stuff, and suspend/resume is working now. Suspect a couple of USB hubs.
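The dmesg output in the question does contain a `PM: Device 5-2.2 failed to resume async: error -5` line, which fits with a USB device being involved. As a rough sketch for spotting such devices (the function name `failed_resume_devices` is invented here, and the pattern is based only on that one message):

```shell
# Hypothetical helper: list the device names the kernel reported as failing to
# resume. Reads dmesg text on stdin.
failed_resume_devices() {
    sed -n 's/.*PM: Device \(.*\) failed to resume.*/\1/p' | sort -u
}

# Example usage:
#   sudo dmesg | failed_resume_devices
```

Each name printed (such as `5-2.2`) can then be matched against the later `New USB device found` lines in dmesg to see which physical device it is.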