How do I diagnose my issue when I'm not sure whether it is a hardware or software issue?

I'm having some issue. I'm not sure if it is a hardware issue or a software issue, but I'd like to determine the cause of the issue so I can fix it. How can I do that?


It's pretty easy. Take a USB stick. I suggest 32 GB, but other sizes work too. First copy everything important off this USB stick, because putting a live CD on it erases everything that is on it.

There are a few possible solutions from here. Pick the correct path depending on what the issue is.

Creating an Ubuntu LiveCD

This isn't a troubleshooting step on its own, but I refer to it several times, so I've given it its own section.

  1. Download the latest (not LTS, and not experimental, but latest and supported) release of Ubuntu Desktop from the Ubuntu website.
  2. Verify your download. You don't have to, but I suggest it, especially if your internet connection isn't great. Follow the steps on the Ubuntu Wiki.
  3. Next, write the ISO to the flash drive. You can use dd, but because it is easy to get wrong (and overwrite the wrong disk), I suggest balenaEtcher instead (no affiliation, but I like it). Download the Linux 64-bit version. Open a terminal and go to the folder containing the downloaded .zip (probably ~/Downloads). Unzip it with unzip balena-etcher-electron-X.Y.Z-linux-x64.zip, make the extracted AppImage executable with chmod +x balenaEtcher-X.Y.Z-x64.AppImage, and run it with ./balenaEtcher-X.Y.Z-x64.AppImage.
  4. Select the ISO and your flash drive, then press Flash!. Wait for it to finish, then reboot and boot from your flash drive.
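For step 2, here is a minimal sketch of the checksum part of the verification, assuming you saved the ISO and the SHA256SUMS file from the same Ubuntu release page into one folder (the folder is a placeholder, and verify_iso is a hypothetical helper name). The Ubuntu Wiki steps also check the GPG signature on SHA256SUMS, which this sketch skips.

```shell
#!/usr/bin/env bash
# Check a downloaded ISO against the SHA256SUMS file published next to it.
# Assumption: both files are in the same directory (e.g. ~/Downloads).
verify_iso() {
  local dir="$1"
  (
    cd "$dir" || exit 1
    # --ignore-missing skips lines for ISOs you did not download
    # (needs GNU coreutils 8.25 or newer). Prints "<file>: OK" on success.
    sha256sum --check --ignore-missing SHA256SUMS
  )
}

# Usage:
# verify_iso ~/Downloads && echo "Download OK"
```

If the check prints FAILED, download the ISO again before writing it to the stick.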

I think the memory/RAM is bad

  1. Download MemTest86.
  2. Open balenaEtcher. Select the image you downloaded in step 1 and the USB stick. This is your last chance to back up any data on the flash drive before it is gone forever. Press Flash!, then wait for it to finish writing and verifying.

Note: On many Dell computers you can skip the above steps, because Dell's built-in pre-boot diagnostics already include a memory test.

  1. Reboot your PC, and boot to the USB stick (in the boot menu, select the flash drive).
  2. Run a memory test. If it fails, you have a memory issue. Replace the bad stick(s) of RAM.
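If you can't boot from the USB stick at all, a rough fallback is memtester, a different tool (packaged in Ubuntu as memtester) that tests RAM from inside the running system. It is less thorough than MemTest86, because it can't test memory that the OS is already using. The sketch below picks a test size of about 75% of currently available memory; the 75% figure and the test_size_mib helper name are my own assumptions, chosen to leave room for the OS.

```shell
#!/usr/bin/env bash
# Pick a memtester size: roughly 75% of MemAvailable, in MiB.
# The 75% ratio is an arbitrary safety margin, not an official recommendation.
test_size_mib() {
  local meminfo="${1:-/proc/meminfo}"
  # /proc/meminfo reports MemAvailable in kB (KiB); %d truncates to an integer.
  awk '/^MemAvailable:/ { printf "%d\n", ($2 * 3 / 4) / 1024 }' "$meminfo"
}

# Usage (after: sudo apt install memtester):
# sudo memtester "$(test_size_mib)M" 1
```

Any FAILURE line in memtester's output points to bad RAM, but a clean run is weaker evidence than a clean MemTest86 run.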

I think my disk has an issue

You can do this either from a live CD (see above) or from your main system. For an HDD, check the SMART data (for example with the Disks app, or with smartctl from the smartmontools package). See here for how to interpret those numbers. If you have an SSD, you can check the SMART data the same way, and you can also check the wear indicator (a measure of how worn out your SSD is). For that, see this answer.
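A minimal sketch of the SMART check from a terminal, assuming smartmontools is installed (sudo apt install smartmontools). The smart_summary helper is hypothetical; it only picks out the overall health line and, on NVMe drives, the "Percentage Used" wear field. On SATA SSDs the wear attribute name varies by vendor (for example Wear_Leveling_Count or Media_Wearout_Indicator), so read the full attribute table for those.

```shell
#!/usr/bin/env bash
# Summarise smartctl output read from stdin, e.g.:
#   sudo smartctl -a /dev/sda   | smart_summary   # SATA HDD or SSD
#   sudo smartctl -a /dev/nvme0 | smart_summary   # NVMe SSD
# Device names are examples; list yours with: lsblk -d -o NAME,SIZE,MODEL
smart_summary() {
  awk '
    # "SMART overall-health self-assessment test result: PASSED"
    /overall-health self-assessment test result:/ {
      sub(/.*: */, ""); print "Health: " $0
    }
    # NVMe only: "Percentage Used:   3%" (share of rated endurance used)
    /^Percentage Used:/ { print "Wear (NVMe Percentage Used): " $3 }
  '
}
```

A health result other than PASSED, or a wear figure close to 100%, is a good reason to back up now.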

Why doesn't XYZ hardware work?

First, check the obvious: is it plugged in? Don't go through this entire process only to realize you plugged your USB stick into an Ethernet port. Yes, it happens.

Done that? OK. Create an Ubuntu live CD with the latest release of Ubuntu (not LTS, the latest release), following the directions above. Boot from it and choose Try Ubuntu. Now try to reproduce the issue, or check whether the hardware works. If it does, great: back up your data, upgrade to that version of Ubuntu, and it should work there too. Done.

If it doesn't, search online to see whether anyone else has had the same problem. If not, look at the output of dmesg for anything relevant. You can also try a live CD of the latest beta version of Ubuntu, but betas aren't stable, so I wouldn't suggest it unless you know what you're doing. Of course, feel free to ask a question here; remember to include your OS details (version, etc.) and what you've tried. Not all hardware works with Ubuntu, so you might be out of luck.
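If you do ask a question, a small script can gather the usual details into one file. This is a sketch with hypothetical helper names (run_if_present, collect_info) and a hypothetical default file name; it uses only standard tools (lsb_release, uname, lspci, lsusb, dmesg) and skips any that aren't installed. To watch kernel messages live while you plug the device in, you can also run sudo dmesg -w.

```shell
#!/usr/bin/env bash
# Run a command if its tool exists, under a "== command ==" header.
run_if_present() {
  echo "== $* =="
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || echo "(exited with status $?)"
  else
    echo "($1 is not installed)"
  fi
}

# Write OS details, hardware lists and kernel errors/warnings to one file.
collect_info() {
  local out="${1:-hardware-report.txt}"   # hypothetical default file name
  {
    run_if_present lsb_release -a   # Ubuntu version
    run_if_present uname -r         # kernel version
    run_if_present lspci -nnk       # PCI devices and the driver in use
    run_if_present lsusb            # USB devices
    echo "== dmesg --level=err,warn =="
    # Reading dmesg may be restricted to root, so allow it to fail.
    dmesg --level=err,warn 2>/dev/null || echo "(dmesg not readable; rerun with sudo)"
  } > "$out"
}

# Usage: collect_info ~/hardware-report.txt
```

Paste only the relevant parts of the report into your question, not the whole file.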


If your computer is freezing or crashing, you can stress test it to try to trigger the freeze or crash, and then check the logs for what happened just before it. stress-ng can stress various subsystems of a computer: it can load the CPU, cache, disk, memory, socket and pipe I/O, scheduling, and much more. To install stress-ng on any currently supported version of Ubuntu, open a terminal and type:

sudo apt install stress-ng

From man stress-ng:

DESCRIPTION
       stress-ng will stress test a computer system in various selectable
       ways. It was designed to exercise various physical subsystems of a 
       computer as well as the various operating system kernel interfaces.
       stress-ng also has a wide range of CPU specific stress tests that 
       exercise floating point, integer, bit manipulation and control flow.

       stress-ng  was originally intended to make a machine work hard and trip
       hardware issues such as thermal overruns as well as operating system
       bugs that only occur when a system is being thrashed hard. Use
       stress-ng with caution as some of the tests can make a system run hot
       on poorly designed hardware and also can cause excessive system 
       thrashing which may be difficult to stop.

       stress-ng can also measure test throughput rates; this can be useful to
       observe performance changes across different operating system releases
       or types of hardware, however it has never been intended to be used as
       a precise benchmark test suite, so do NOT use it in this manner.

       Running stress-ng with root privileges will adjust out of memory settings
       on Linux systems to make the stressors unkillable in low memory
       situations, so use this judiciously. With the appropriate privilege,
       stress-ng can allow the ionice class and ionice levels to be adjusted.
       Again this should be used with care.

       One can specify the number of processes to invoke per type of stress
       test; specifying a negative or zero value will select the number of
       processors available as defined by sysconf(_SC_NPROCESSORS_CONF).
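A minimal sketch of a time-limited run, using only options described in man stress-ng (--cpu, --vm, --vm-bytes, --timeout, --metrics-brief). The stressor mix, the 10-minute default and the run_stress/DRY_RUN names are my own example choices, not recommendations from the stress-ng authors. Heed the man page warnings above about heat and thrashing.

```shell
#!/usr/bin/env bash
# Build and run a time-limited stress-ng session.
# --cpu 0: one CPU stressor per configured processor (see the man page)
# --vm 1 --vm-bytes 75%: one memory stressor using ~75% of available RAM
# --timeout: stop automatically instead of running until you kill it
# --metrics-brief: print a short throughput summary at the end
run_stress() {
  local minutes="${1:-10}"
  local cmd=(stress-ng --cpu 0 --vm 1 --vm-bytes 75%
             --timeout "${minutes}m" --metrics-brief)
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"   # only show the command that would run
  else
    "${cmd[@]}"
  fi
}

# Usage: run_stress 30        (DRY_RUN=1 run_stress 30 just prints the command)
```

If the machine freezes and you have to force a reboot, sudo journalctl -b -1 -p err shows error-level messages from the previous boot. This only works if the systemd journal is stored on disk (the /var/log/journal directory exists).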

Faulty RAM is another known cause of freezing. To test the RAM, select the Memory test option from the GRUB menu while booting.

To get to the GNU GRUB menu screen on a BIOS system, quickly press and hold the Shift key immediately after the BIOS splash screen during boot. On a UEFI system, press the Esc key instead (perhaps several times). Sometimes the manufacturer's splash screen is part of the Windows bootloader, so when you power up the machine it goes straight to the GNU GRUB menu screen, and pressing Shift is unnecessary.
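If the Shift/Esc timing is hard to hit, another option on an installed Ubuntu system is to make GRUB always show its menu. A minimal sketch, assuming the stock /etc/default/grub layout: it sets GRUB_TIMEOUT_STYLE=menu and GRUB_TIMEOUT=10 (10 seconds is an arbitrary choice) and keeps a .bak copy. The helper names are hypothetical, and the change only takes effect after running sudo update-grub.

```shell
#!/usr/bin/env bash
# Set KEY=VALUE in a GRUB defaults file: replace an existing uncommented
# line for KEY, or append one if the key is absent.
set_grub_key() {
  local cfg="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$cfg"; then
    sed -i "s|^${key}=.*|${key}=${value}|" "$cfg"
  else
    printf '%s=%s\n' "$key" "$value" >> "$cfg"
  fi
}

# Make the GRUB menu appear on every boot for 10 seconds.
show_grub_menu() {
  local cfg="${1:-/etc/default/grub}"
  cp "$cfg" "$cfg.bak"   # backup, so the change is easy to undo
  set_grub_key "$cfg" GRUB_TIMEOUT_STYLE menu
  set_grub_key "$cfg" GRUB_TIMEOUT 10
}

# Usage: run show_grub_menu as root (e.g. in a root shell from sudo -i),
# then regenerate the boot menu with: update-grub
```

To undo it, copy /etc/default/grub.bak back over /etc/default/grub and run sudo update-grub again.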

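Older versions of memtest86+ (before 6.0) are 16-bit programs that work in BIOS mode but not in UEFI mode; version 6.0 and later also support UEFI, so on recent Ubuntu releases that ship it the Memory test entry may work in UEFI mode too. On computers that have BIOS firmware, you will see a GNU GRUB menu screen with a Memory test entry. Select Memory test and press Enter.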

On computers that have UEFI firmware, if the Memory test entry is missing or doesn't work, go to the BIOS settings, select the Boot tab, and try temporarily changing the settings so that your system loads in Legacy (BIOS) mode instead of UEFI mode. You may then be able to select Memory test from the GNU GRUB menu screen.
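Before changing firmware settings, you can check which mode the running system actually booted in. The Linux kernel creates /sys/firmware/efi only when the system was booted via UEFI; the boot_mode helper below is a hypothetical name, and its optional argument exists only to make it easy to test.

```shell
#!/usr/bin/env bash
# Report whether the running system was booted in UEFI or legacy BIOS mode.
# The kernel creates /sys/firmware/efi only on a UEFI boot.
boot_mode() {
  local efi_dir="${1:-/sys/firmware/efi}"
  if [ -d "$efi_dir" ]; then
    echo "UEFI"
  else
    echo "BIOS (legacy)"
  fi
}

# Usage: boot_mode
```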


Discrete graphics cards also have their own RAM. Faulty RAM on a discrete graphics card is a known cause of a particular kind of freeze while playing videos or games: the entire screen freezes, and an audio loop plays over and over until you shut the computer down by holding its power button.

After you have tried the software solutions, temporarily replacing hardware components one at a time is still an option, but there are several caveats. Please read them carefully before you temporarily replace a hardware component.

  1. In the case of multiple RAM sticks that are physically accessible to the user, removing one RAM stick at a time until you find the faulty one is an option. RAM sticks should only be swapped with RAM sticks of the same type (e.g. swap DDR3 only with DDR3, DDR4 only with DDR4, etc.).

  2. If you have a spare graphics card that is known to be in good condition, swapping the graphics card is an option. Don't just swap in any graphics card that you have on hand. Only swap the existing graphics card with a graphics card that you have used before.

  3. Warning: I don't recommend swapping any existing components unless you are sure that the new component is in good condition. Swapping an existing component with a damaged component can also damage your computer.


Wi-Fi

AP = Wi-Fi access point or router

  • A strong neighboring AP is creating contention
    Using a smartphone, install an app such as Wifi Analyzer (free, with ads) and let it compare your AP with the others visible nearby.
    One possible fix for poor speed or connection problems:
    Set your AP to a fixed channel that the app rates as a "good" choice. Some AP software has trouble switching channels when set to "Auto", even though Auto mode is supposed to make the same change.

  • Settings problems
    The easy way out of this situation is to reset the AP and start over. Maybe you changed a setting and forgot about it?
    Some APs can save their settings to a file. If yours can, and the file is readable text when opened in a text editor, save the settings before the reset and again after it, then compare the two files; this might reveal what the problem was.

  • Hardware problems
    This is harder, for example if a device has broken circuitry.
    Some simple things you can do: check that all cabling is fault-free, since even tiny nicks and kinks can be visible signs of damage that causes trouble. Swap out one item at a time and run the same fair test after each swap.
    Unplug and reseat every cable end a few times; if a cable has been in place a long time, the connector pins may have oxidized, creating contact resistance that degrades the electrical signal. If there is a visible antenna, check it the same way: is it connected properly? Is it damaged?
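If you'd rather check from an Ubuntu laptop than a phone, the helpers below sketch the three checks above. The function names are hypothetical; they rely on NetworkManager's nmcli for the channel scan, diff for comparing plain-text settings exports, and ping for a repeatable "fair test" to run before and after each swap. The sample size of 100 pings is an arbitrary choice.

```shell
#!/usr/bin/env bash
# 1. Count visible access points per channel, from NetworkManager's scan:
#    nmcli -t -f CHAN device wifi list | aps_per_channel
aps_per_channel() {
  sort -n | uniq -c | awk '{ printf "channel %s: %s AP(s)\n", $2, $1 }'
}

# 2. Show what changed between two exported AP settings files
#    (only useful if the export is plain text). Exits 1 if they differ.
settings_diff() {
  diff -u "$1" "$2"
}

# 3. Address of the default gateway (normally your AP/router).
default_gateway() {
  ip route show default | awk '{ print $3; exit }'
}

# 4. Extract the packet-loss percentage from ping's summary line:
#    ping -c 100 "$(default_gateway)" | packet_loss
packet_loss() {
  grep -oE '[0-9.]+% packet loss' | cut -d'%' -f1
}
```

Run the same ping test after each swap; a clear drop in packet loss after replacing one item points at that item.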