Complete freeze on a ThinkPad T520 [closed]

Ubuntu 11.04 on my ThinkPad T520 has been freezing randomly ever since I installed it.

I asked the following question a long time ago, but the answers haven't really helped me: How do I debug when the system freezes or when it crashes back to login?

Here is a complete copy of my .xsession-errors file:

http://pastebin.com/7rBs0EBH

I also tried everything in the following question:

What should I do when Ubuntu freezes?

I tried REISUB and the other suggestions in that question, but nothing works. The only thing that helps is resetting the laptop.

Any help is appreciated. If I need to provide more information or logs, just ask; I am really motivated to fix this.

Update

Output of lspci and lsusb:

    00:00.0 Host bridge: Intel Corporation 2nd Generation Core Processor Family DRAM Controller (rev 09)
    00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
    00:16.0 Communication controller: Intel Corporation 6 Series Chipset Family MEI Controller #1 (rev 04)
    00:16.3 Serial controller: Intel Corporation 6 Series Chipset Family KT Controller (rev 04)
    00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 04)
    00:1a.0 USB Controller: Intel Corporation 6 Series Chipset Family USB Enhanced Host Controller #2 (rev 04)
    00:1b.0 Audio device: Intel Corporation 6 Series Chipset Family High Definition Audio Controller (rev 04)
    00:1c.0 PCI bridge: Intel Corporation 6 Series Chipset Family PCI Express Root Port 1 (rev b4)
    00:1c.1 PCI bridge: Intel Corporation 6 Series Chipset Family PCI Express Root Port 2 (rev b4)
    00:1c.3 PCI bridge: Intel Corporation 6 Series Chipset Family PCI Express Root Port 4 (rev b4)
    00:1c.4 PCI bridge: Intel Corporation 6 Series Chipset Family PCI Express Root Port 5 (rev b4)
    00:1d.0 USB Controller: Intel Corporation 6 Series Chipset Family USB Enhanced Host Controller #1 (rev 04)
    00:1f.0 ISA bridge: Intel Corporation 6 Series Chipset Family LPC Controller (rev 04)
    00:1f.2 SATA controller: Intel Corporation 6 Series Chipset Family 6 port SATA AHCI Controller (rev 04)
    00:1f.3 SMBus: Intel Corporation 6 Series Chipset Family SMBus Controller (rev 04)
    03:00.0 Network controller: Intel Corporation Centrino Ultimate-N 6300 (rev 35)
    0d:00.0 System peripheral: Ricoh Co Ltd Device e823 (rev 05)
    0d:00.3 FireWire (IEEE 1394): Ricoh Co Ltd FireWire Host Controller (rev 04)

    Bus 002 Device 003: ID 0bdb:1911 Ericsson Business Mobile Networks BV 
    Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
    Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
    Bus 001 Device 006: ID 04f2:b217 Chicony Electronics Co., Ltd 
    Bus 001 Device 005: ID 0a5c:217f Broadcom Corp. Bluetooth Controller
    Bus 001 Device 004: ID 147e:2016 Upek Biometric Touchchip/Touchstrip Fingerprint Sensor
    Bus 001 Device 003: ID 045e:0737 Microsoft Corp. 
    Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
    Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Solution 1:

I have the same issue, and the cause is easy to see by looking in /var/log/syslog. In essence, the GPU hangs, which causes compiz to segfault:

    Sep  9 10:29:46 helix kernel: [ 7946.237954] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer elapsed... GPU hung
    Sep  9 10:29:46 helix kernel: [ 7946.250096] [drm:i915_do_wait_request] *ERROR* i915_do_wait_request returns -11 (awaiting 3077849 at 3077840, next 3077850)
    Sep  9 10:30:10 helix kernel: [ 7970.376485] compiz[1571]: segfault at 0 ip 00007f4da365b7d1 sp 00007fff1dbd5690 error 6 in i965_dri.so[7f4da35ea000+ac000]
    Sep  9 10:30:15 helix kernel: [ 7975.150824] compiz[10649]: segfault at 0 ip 00007f059c445be8 sp 00007fff629e2d90 error 6 in i965_dri.so[7f059c3d4000+ac000]
    Sep  9 10:30:20 helix kernel: [ 7979.892104] compiz[10671]: segfault at 0 ip 00007f1b2cd1cbe8 sp 00007fff9ef21f40 error 6 in i965_dri.so[7f1b2ccab000+ac000]
    Sep  9 10:30:24 helix kernel: [ 7984.489864] compiz[10691]: segfault at 0 ip 00007f05d48debe8 sp 00007fffee43a810 error 6 in i965_dri.so[7f05d486d000+ac000]
    Sep  9 10:30:29 helix kernel: [ 7989.095058] compiz[10710]: segfault at 0 ip 00007f74d0326be8 sp 00007fff09f4a480 error 6 in i965_dri.so[7f74d02b5000+ac000]
    Sep  9 10:30:33 helix kernel: [ 7993.793423] compiz[10730]: segfault at 0 ip 00007fe855c9fbe8 sp 00007fff23af8570 error 6 in i965_dri.so[7fe855c2e000+ac000]
    Sep  9 10:30:38 helix kernel: [ 7998.316195] compiz[10750]: segfault at 0 ip 00007fa4facb3be8 sp 00007fffe0b08c10 error 6 in i965_dri.so[7fa4fac42000+ac000]
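To check your own log for the same pattern, something like the following should work. It is a minimal sketch: `find_gpu_hangs` is a hypothetical helper name, and the regular expressions are simply taken from the log lines above, so other hang messages will not be caught.

```shell
#!/usr/bin/env bash
# Print syslog lines matching the i915 GPU hang / compiz segfault messages
# quoted above. Defaults to /var/log/syslog; pass another file (for example
# a rotated /var/log/syslog.1) as the first argument.
find_gpu_hangs() {
    local log="${1:-/var/log/syslog}"
    # The three alternatives mirror the hangcheck error and the i965_dri.so segfaults.
    grep -E 'i915_hangcheck_elapsed|GPU hung|segfault at .* in i965_dri\.so' "$log"
}
```

Usage: `find_gpu_hangs /var/log/syslog`. An empty result does not rule out a GPU problem; it only means these particular messages were not logged.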

You can see that the kernel is using the i915 driver for this chipset by default:

    00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 21cf
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 43
        Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 6000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: <access denied>
        Kernel driver in use: i915
        Kernel modules: i915
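If you only want the driver name rather than the full listing, a small filter over `lspci -k` output does it. This is a sketch; `kernel_driver_in_use` is a hypothetical name, and it assumes the `Kernel driver in use: NAME` line format shown above.

```shell
#!/usr/bin/env bash
# Read lspci -k / lspci -v style output on stdin and print the value of
# each "Kernel driver in use:" field.
kernel_driver_in_use() {
    # Split on ": " so the second field is the driver name.
    awk -F': ' '/Kernel driver in use:/ { print $2 }'
}

# Example (00:02.0 is the VGA controller's slot in the listing above):
#   lspci -k -s 00:02.0 | kernel_driver_in_use
```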

This is a brand new machine with a new install of 11.04, so it is not related to an upgrade or anything like that.

To debug it, I would recommend the following. Make sure the Intel driver packages are installed (I am almost sure you have these), then install their debug symbols:

    sudo apt-get install xserver-xorg-video-intel libdrm-intel1
    sudo apt-get install libdrm-intel1-dbg xserver-xorg-video-intel-dbg

Then boot your kernel with DRM debugging turned on (add drm.debug=0x06 to the kernel command line) and mount debugfs:

    sudo mount -t debugfs debugfs /sys/kernel/debug
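One way to make drm.debug=0x06 persistent is to append it to `GRUB_CMDLINE_LINUX_DEFAULT` in /etc/default/grub (GRUB 2, as used by Ubuntu 11.04) and run `update-grub`. The value 0x06 enables the DRM driver (0x02) and KMS (0x04) debug categories. The helper below is a minimal sketch under those assumptions; `add_kernel_param` is a hypothetical name, it keeps a `.bak` copy of the file, and the parameter must not contain `/` or `&` because it is substituted into a sed expression.

```shell
#!/usr/bin/env bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB 2
# defaults file (normally /etc/default/grub), leaving a .bak backup.
add_kernel_param() {
    local param="$1" file="${2:-/etc/default/grub}"
    # Do nothing if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=\".*${param}" "$file"; then
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i.bak "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}

# Usage:
#   sudo bash -c "$(declare -f add_kernel_param); add_kernel_param drm.debug=0x06"
#   sudo update-grub
```

After rebooting, `grep -o 'drm.debug=[^ ]*' /proc/cmdline` should show whether the parameter was actually passed to the kernel.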

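Additionally, you can enable core dumps using ulimit. Note that ulimit only affects the current shell and the processes started from it:

    ulimit -c unlimited
    ulimit -s unlimited

(and similarly for any other limits you want to raise)

Verify the changes with ulimit -a.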
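To confirm that cores will actually be written, it helps to check both the limit and the kernel's core pattern. This is a minimal sketch with a hypothetical function name; it assumes a Linux system where /proc/sys/kernel/core_pattern exists. If the pattern starts with `|`, cores are piped to a helper program (on Ubuntu, typically apport) instead of being written as a plain file.

```shell
#!/usr/bin/env bash
# Report whether this shell allows core dumps and how the kernel names them.
core_dump_status() {
    echo "core file size limit: $(ulimit -c)"
    if [ -r /proc/sys/kernel/core_pattern ]; then
        echo "core pattern: $(cat /proc/sys/kernel/core_pattern)"
    fi
}
```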

When the issue happens again, you can use /usr/bin/intel_gpu_dump to get more details about the state of the GPU AFTER it has hung.

Additional information might be found under /sys/kernel/debug/dri/0/i915_error_state AFTER a crash has happened.
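Since debugfs contents are lost on reboot, it can be worth copying the error state to disk before resetting the machine, for example from another computer over ssh if the local display is frozen. This is a sketch; `save_i915_error_state` is a hypothetical helper, and it assumes debugfs is mounted at /sys/kernel/debug and the GPU is DRI card 0, as in the path above. Reading the file normally requires root.

```shell
#!/usr/bin/env bash
# Copy the i915 error state out of debugfs into a timestamped file.
# Arguments: debugfs mount point (default /sys/kernel/debug), output dir (default $HOME).
save_i915_error_state() {
    local debugfs="${1:-/sys/kernel/debug}" outdir="${2:-$HOME}"
    local src="$debugfs/dri/0/i915_error_state"
    if [ ! -r "$src" ]; then
        echo "cannot read $src (is debugfs mounted?)" >&2
        return 1
    fi
    local dest
    dest="$outdir/i915_error_state.$(date +%Y%m%d-%H%M%S)"
    # cat rather than cp: debugfs files may report a size of 0.
    cat "$src" > "$dest" && echo "$dest"
}
```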

You can also pull a stack trace from the generated core file, which is usually found under /.
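With gdb, a backtrace can be printed non-interactively using batch mode. This is a sketch: `backtrace_from_core` is a hypothetical name, and both /usr/bin/compiz and /core in the usage line are example paths, so check where your core file was actually written. The -dbg packages installed earlier give gdb symbols for the Intel driver libraries.

```shell
#!/usr/bin/env bash
# Print backtraces of all threads from a core file using gdb in batch mode.
# Arguments: path to the crashed binary, path to its core file.
backtrace_from_core() {
    local binary="$1" core="$2"
    if [ ! -r "$core" ]; then
        echo "no readable core file: $core" >&2
        return 1
    fi
    gdb -batch -ex 'thread apply all bt full' "$binary" "$core"
}

# Usage:
#   backtrace_from_core /usr/bin/compiz /core > compiz-backtrace.txt
```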

In summary, this looks like a bug to me. You can take this information, along with a report of your software stack, and file a formal bug report.
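For the software stack part, a few version strings usually suffice. The sketch below (hypothetical function name; the package list is an assumption based on the packages mentioned in this answer) prints the kernel version and the installed versions via `dpkg-query`. Alternatively, `ubuntu-bug xserver-xorg-video-intel` files the report through apport and attaches many of these details automatically.

```shell
#!/usr/bin/env bash
# Print the kernel version and installed versions of the graphics-related
# packages, for pasting into a bug report.
software_stack_report() {
    echo "kernel: $(uname -r)"
    local pkg ver
    for pkg in xserver-xorg-video-intel libdrm-intel1 compiz; do
        # Empty output or an error means the package is not installed.
        ver=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null) || ver=""
        echo "$pkg: ${ver:-not installed}"
    done
}
```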

Solution 2:

Check whether /var/log/syslog contains any error messages from the time your system froze. If you find any, try searching for them on Launchpad.
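To narrow the log down to the minutes around a freeze, a small awk filter works on the classic "Mon DD HH:MM:SS" syslog format. This is a minimal sketch: `syslog_errors_between` is a hypothetical name, and the keyword list (error, fail, segfault, hung) is an assumption, so widen it if nothing turns up.

```shell
#!/usr/bin/env bash
# Print error-looking syslog lines from one day between two times.
# Arguments: log file, month (e.g. Sep), day (e.g. 9), start HH:MM, end HH:MM.
syslog_errors_between() {
    local log="$1" month="$2" day="$3" from="$4" to="$5"
    awk -v m="$month" -v d="$day" -v from="$from" -v to="$to" '
        $1 == m && $2 == d {
            t = substr($3, 1, 5)   # HH:MM is zero-padded, so string comparison orders it correctly
            if (t >= from && t <= to && tolower($0) ~ /error|fail|segfault|hung/)
                print
        }' "$log"
}

# Usage:
#   syslog_errors_between /var/log/syslog Sep 9 10:25 10:35
```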

Your computer has a Sandy Bridge-based processor, and there are some known bugs related to it. For example, I recently ran into system hangs on a T520 caused by bug #761065.