File browser and file dialogs take a long time to open or fail to open in all applications

I had the same problem for years because I have uptimes of at least a month quite often. After a while, all programs using the system-supplied file open dialog will seemingly hang or wait for many minutes.

pkill gvfsd-trash

This should fix the problem: all programs that were hanging while waiting for the file dialog should resume immediately after gvfsd-trash is killed.
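If you want to see whether gvfsd-trash was running at all before killing it, a small wrapper along these lines might help. This is only a sketch: the function name kill_if_running is made up for illustration, and it assumes pgrep and pkill from procps are installed.

```shell
# Kill a process by exact name, but only if it is currently running.
# kill_if_running is a hypothetical helper name, not part of any package.
kill_if_running() {
    local name="$1"
    # -x matches the process name exactly instead of as a substring
    if pgrep -x "$name" >/dev/null 2>&1; then
        pkill -x "$name" && echo "killed $name"
    else
        echo "$name is not running"
    fi
}
```

Calling kill_if_running gvfsd-trash then tells you whether there was anything to kill, which is handy when checking if the daemon has been restarted in the meantime.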

Another thing to note is that gvfsd-trash gets restarted after a while, probably by caja or another file browser, or by the system file open dialog itself. This is not the first problem I have had with gvfsd-trash, and I am beginning to bear a grudge. However, I don't want to uninstall gvfsd, which I need for mounting USB thumb drives and my smartphone over MTP. So I opted for the brute-force solution and made only the gvfsd-trash binary inaccessible, e.g., by renaming it:

sudo mv /usr/libexec/gvfsd-trash{,.bak}

It might be in another location on different systems. Try asking your package manager, e.g., by calling:

dpkg -L gvfs-daemons | grep trash
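On systems without dpkg, checking a few likely directories by hand is an alternative. The sketch below is an illustration only: find_in_dirs is a made-up helper, and the directories in the usage line are guesses that differ between distributions.

```shell
# Print the first existing path for file name $1 among the directories
# given as the remaining arguments; return 1 if none contains it.
# find_in_dirs is a hypothetical helper name.
find_in_dirs() {
    local name="$1"
    shift
    local dir
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

For example: find_in_dirs gvfsd-trash /usr/libexec /usr/lib/gvfs (both directories are assumptions, adjust them for your system).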

Beware of the implications. Without gvfsd-trash you won't be able to access the special trash:/// URI path via your file browser! Personally, I never used that anyway. You can also access deleted files manually in the .Trash folders, which exist for each mount point and in your home, e.g.:

  • ~/.local/share/Trash
  • /media/mounted-external-harddrive/.Trash
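To look at deleted files without the trash:/// URI, something like the following may be enough. It is a sketch that assumes the layout described in the freedesktop.org Trash specification, where the trashed files themselves sit in a files/ subdirectory of the trash folder. The function name list_trashed is made up.

```shell
# List the names of trashed items in a trash directory.
# Assumes the trashed files live in "$trash_dir/files".
# list_trashed is a hypothetical helper name.
list_trashed() {
    local trash_dir="$1"
    if [ ! -d "$trash_dir/files" ]; then
        echo "no files/ directory in $trash_dir" >&2
        return 1
    fi
    # -A includes dotfiles but skips . and ..
    ls -A -- "$trash_dir/files"
}
```

For the home trash this would be list_trashed ~/.local/share/Trash. The trash folders on external drives may be named or nested differently (often per user ID), so check what actually exists there.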

It looks like the problem is a combination of gvfsd-trash and D-Bus. There seems to be a matching issue in the gvfs bug tracker, which has already been fixed, but the fix has not been rolled out to my system yet.

For years, I used a different workaround: starting each program in a new D-Bus session by prefixing the most important programs, such as Firefox and text editors, with dbus-launch. However, that comes with its own share of problems. Each session spawns at least five gvfsd processes and possibly others, the D-Bus session does not end when the program started in it is closed, and the total number of D-Bus sessions is limited, so after a while you won't be able to start programs.
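To get a feeling for how many of these processes pile up, you could count them by name. A rough sketch, where count_by_prefix is a made-up helper that reads one process name per line on standard input:

```shell
# Count input lines (one process name per line) that start with prefix $1.
# count_by_prefix is a hypothetical helper name.
count_by_prefix() {
    local prefix="$1"
    # index(...) == 1 means the line begins with the prefix;
    # n + 0 prints 0 instead of an empty string when nothing matched
    awk -v p="$prefix" 'index($0, p) == 1 { n++ } END { print n + 0 }'
}
```

Usage would be ps -eo comm= | count_by_prefix gvfsd, and likewise with dbus-daemon as the prefix to count the bus daemons themselves.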


Diagnosis

I guess you are showing an excerpt from your strace log, pointing to where the lag occurs. It would be useful if you posted the full strace command you used.

Actions that may lead to the culprit:

  1. While an straced nautilus is running, in another terminal use

    $ pidof nautilus
    $ lsof -p <pidno>
    

    where <pidno> is the PID returned by the previous command. This lets you see which files the file descriptors in the strace log refer to, so the answer to your question is yes (source). You may want to pipe the output to less, since it is typically long.

  2. Option -c can further help identify the bottleneck system calls (source); yours is probably poll. From the strace man page:

    -c
    --summary-only
                Count time, calls, and errors for each system call and report a summary on program exit, suppressing the  regular
                output.  This attempts to show system time (CPU time spent running in the kernel) independent of wall clock time.
                If -c is used with -f, only aggregate totals for all traced processes are kept.
    

    This is an example of what the output looks like:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     18,29    0,009128           4      2058           mmap
     15,19    0,007580           7       989         1 poll
     11,96    0,005967           2      2465       318 openat
      6,76    0,003374           1      2187           close
    ...
    
  3. Check other instances of delayed nautilus and thunar launching, to confirm a similar pattern.

  4. Run the above, and any further diagnostic commands, for a root instance as well, and compare the results. (You mention that it works fine as root; I guess you meant at times when the delays have already appeared for a normal user.)

  5. Consider the problem might be related to availability of file descriptors, or the like, see this.

  6. Please post the complete output of

    $ bash -c 'time strace -fc nautilus'
    $ bash -c 'time strace -fc thunar'
    $ bash -c 'strace -fc time nautilus'
    $ bash -c 'strace -fc time thunar'
    
  7. Please mention whether you moved your home dir, whether you had a different Ubuntu version and upgraded, etc., as it may be relevant. Also, try creating a temporary user to see if the problem shows up. This would be additional evidence alongside the comparison with root.

    EDIT a: The comparison between what you get with users <you>, tester and root for bash -c 'time strace -fc nautilus' shows the following essential difference in what could account for the ~30 seconds spent at launch:

    <you>:

    (org.gnome.Nautilus:2641915): Tracker-WARNING **: 18:30:21.999: Falling back to bus backend, the direct backend failed to initialize: Could not find database file:'/home/coljac/.cache/tracker/meta.db'.
    

    tester:

    (org.gnome.Nautilus:2976790): dbind-WARNING **: 20:45:38.814: Couldn't register with accessibility bus: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.
    strace: Process 2976919 attached
    strace: Process 2976920 attached
    
    ** (org.gnome.Nautilus:2976790): WARNING **: 20:46:04.102: Unable to get contents of the bookmarks file: Error opening file /home/tester/.gtk-bookmarks: No such file or directory
    

    root:

    ** (org.gnome.Nautilus:2980894): WARNING **: 21:00:42.191: Unable to get contents of the bookmarks file: Error opening file /root/.gtk-bookmarks: No such file or directory
    
    ** (org.gnome.Nautilus:2980894): WARNING **: 21:00:42.191: Unable to get contents of the bookmarks file: Error opening file /root/.gtk-bookmarks: No such file or directory
    strace: Process 2980907 attached
    Nautilus-Share-Message: 21:00:42.241: Called "net usershare info" but it failed: Failed to execute child process “net” (No such file or directory)
    

    So, tester vs. root shows a warning that could help with diagnosis. The strange thing is that the same warning does not show for <you>.
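The strace -c summary from step 2 can be ranked mechanically, which makes it easier to compare the three users. The following is a sketch, not a polished tool: top_syscalls is a made-up name, and it assumes the column layout shown in the example output above, including the decimal comma that some locales print.

```shell
# Read an "strace -c" summary on stdin and print the N system calls with
# the most time, as "seconds syscall", largest first (default N is 3).
# top_syscalls is a hypothetical helper name.
top_syscalls() {
    local n="${1:-3}"
    awk '
        # Data rows start with a percentage such as 18,29 or 18.29;
        # the header, the dashed rule and the total row are skipped.
        $1 ~ /^[0-9]+[.,][0-9]+$/ && $NF != "total" {
            secs = $2
            sub(/,/, ".", secs)   # normalize a decimal comma to a dot
            print secs, $NF       # the syscall name is the last column
        }
    ' | LC_ALL=C sort -k1,1nr | head -n "$n"
}
```

Since strace writes the summary to standard error, one way to feed it in is to save it with strace's -o option, e.g. strace -fc -o summary.txt nautilus, and then run top_syscalls 5 < summary.txt. Keep in mind that, according to the man page excerpt above, -c reports system time rather than wall clock time, so a call that merely waits (such as poll) can rank lower than the delay it causes. If your strace version has the -w option, it should summarize wall clock time instead.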

Solution

I would wait until you post more info from the diagnosis suggested above. In the meantime, here is a (nuclear?) workaround. I am assuming your "several machines with very similar setups" are clones of each other, or at least started out that way. If so, and the diagnosis does not lead to a solution, and this bothers you enough, you might try cloning again from one of your working systems, e.g. with Clonezilla.

Related

  1. https://www.linuxquestions.org/questions/linux-server-73/strace-question-poll-taking-a-long-time-on-an-open-command-939714/
  2. Nautilus does not start: .cache/tracker/meta.db not found

There are a couple of items that might cause Nautilus to slow down. I've seen this happen on notebooks that are used primarily for artwork and contained — quite literally — millions of thumbnail files sitting in ~/.cache/thumbnails. Scrubbing the directory had an immediate impact on the speed of launching Nautilus (not to mention freeing up quite a bit of storage space).
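Before scrubbing anything, it may be worth checking how many thumbnails there actually are. A minimal sketch, with count_files being a made-up helper name:

```shell
# Print the number of regular files below a directory (recursively).
# Prints 0 if the directory does not exist.
# count_files is a hypothetical helper name.
count_files() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo 0
        return 0
    fi
    # tr strips the padding that some wc implementations add
    find "$dir" -type f | wc -l | tr -d ' '
}
```

Running count_files ~/.cache/thumbnails gives the count; a result in the hundreds of thousands or more would fit the situation described above.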

Another item to look at would be anything in ~/.local/share/nautilus that might be causing an issue, such as looking up a network share that is no longer available.

As a last-ditch effort, if you have a ~/.cache/dconf/user file, you can see if there's a configuration problem in the dconf settings for your user account. The simplest way to test this would be to rename ~/.cache/dconf/user to something like ~/.cache/dconf/user-old and then restart the shell by pressing Alt+F2 to bring up a command input, type r, then hit Enter. If the problem persists, you can return the renamed file to its proper place.
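The rename-and-restore test can be wrapped in two small functions so that putting the file back is a single command. This is only a sketch; stash_file and restore_file are made-up names, and the -old suffix follows the example in the paragraph above.

```shell
# Move a file aside by appending "-old" to its name.
# stash_file and restore_file are hypothetical helper names.
stash_file() {
    local path="$1"
    if [ ! -e "$path" ]; then
        echo "nothing to stash: $path" >&2
        return 1
    fi
    if [ -e "$path-old" ]; then
        echo "backup already exists: $path-old" >&2
        return 1
    fi
    mv -- "$path" "$path-old"
}

# Put the stashed file back, overwriting any file recreated in the meantime.
restore_file() {
    local path="$1"
    if [ ! -e "$path-old" ]; then
        echo "no backup found: $path-old" >&2
        return 1
    fi
    mv -f -- "$path-old" "$path"
}
```

With these, the test becomes stash_file ~/.cache/dconf/user, then restart the shell as described, and restore_file ~/.cache/dconf/user if nothing changed.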

Hopefully something here will help you out.