Process hangs before termination with ubuntu 18.04

When using ABAQUS 6.14 (but also ABAQUS 2018) on ubuntu 18.04 everything seems to work fine except the termination of the standard process (the process started when performing an implicit analysis -- if you are not familiar with this it doesn't matter).

The analysis indeed works as one can also see in a log file (the .sta file, for those who are familiar with abaqus) the message THE ANALYSIS HAS COMPLETED SUCCESSFULLY. The output database contains the analysis results. However, after the analysis has been completed, the process standard remains in a sleeping status using 0% CPU and keeping the same amount of RAM as when it was running.

From strace I get:

[pid 23191] close(8)                    = 0
[pid 23185] <... select resumed> )      = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> )      = 0 (Timeout)
[pid 23193] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23193] <... futex resumed> )       = 0
[pid 23191] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23191] munmap(0x7f3ab130b000, 327680) = 0
[pid 23191] munmap(0x7f3ab136b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab16db000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0fbb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0ddb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0a0b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab03fb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab050b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab00cb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab02eb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab14eb000, 1114112) = 0
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 8, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 12, NULL <unfinished ...>
[pid 23193] <... select resumed> )      = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> )      = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> )      = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> )      = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> )      = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> )      = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> )      = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> )      = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> )      = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>

Like if the two processes were in a deadlock state. Moreover, the commands

pid -p 7002

and

pid -p 7010

do give an empty output. The dirs /proc/7002 and /proc/7010 do not exist.

The only abaqus-related processes executing are

david  6995  0.0  0.1 295428 51388 pts/0    S    17:00   0:00 /opt/abaqus/6.14-1/code/bin/python /opt/abaqus/6.14-1
david  6998  0.0  0.2 368744 97948 pts/0    S    17:00   0:00 /opt/abaqus/6.14-1/code/bin/python std_inst.com
david  7001  0.1  0.0 122076 20096 pts/0    Sl   17:00   0:03 /opt/abaqus/6.14-1/code/bin/eliT_DriverLM -job std_in
david  7008  0.4  0.5 735812 185364 pts/0   Sl   17:00   0:07 /opt/abaqus/6.14-1/code/bin/standard -standard -acade

On ubuntu 16.04 the exact same version works like a charm. Here the same strace on ubuntu 16.04 (with the same kernel version as my 18.04, i.e. 4.15.0-29):

3890  close(8)                          = 0
3892  <... select resumed> )            = 0 (Timeout)
3892  futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890  futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892  <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
3892  futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890  futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892  <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
3892  futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
3890  futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3892  <... futex resumed> )             = 0
3890  <... futex resumed> )             = -1 EAGAIN (Resource temporarily unavailable)
3890  futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892  select(7, [4 5 6], NULL, NULL, {0, 20000} <unfinished ...>
3890  munmap(0x7f29c7adb000, 327680)    = 0
3890  munmap(0x7f29c7b3b000, 1114112)   = 0
3890  munmap(0x7f29c7eab000, 1114112)   = 0
3890  munmap(0x7f29c778b000, 1114112)   = 0
3890  munmap(0x7f29c75ab000, 1114112)   = 0
3890  munmap(0x7f29c71db000, 1114112)   = 0
3890  munmap(0x7f29c6bcb000, 1114112)   = 0
3890  munmap(0x7f29c6cdb000, 1114112)   = 0
3890  munmap(0x7f29c689b000, 1114112)   = 0
3890  munmap(0x7f29c6abb000, 1114112)   = 0
3890  munmap(0x7f29c7cbb000, 1114112)   = 0
3890  exit_group(0)                     = ?
3891  +++ exited with 0 +++
3893  +++ exited with 0 +++
3892  +++ exited with 0 +++
3890  +++ exited with 0 +++
3880  <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3890
3880  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3890, si_uid=1000, si_status=0, si_utime=107, si_stime=7} ---

Has someone a good idea how to solve this? Or in which direction should I investigate further.


Solution 1:

I found a solution that circumvents the deadlock by using a singularity container as proposed by Will Furnass here: http://learningpatterns.me/posts-output/2018-01-30-abaqus-singularity/

Although a bit complicated in the first place, it works like a charm when setup properly. I modified my aliases for abaqus on my host system (Manjaro/Arch linux) such that they point to the install in the singularity container and execute the command in the containers environment. However, since I need Intel Fortran Compiler, I generated a basic centos 7 container and modified it afterwards to install compilers and abaqus (v2019 in this case) rather than using the .def script as proposed by Will Furnass.

It takes some time to setup but now I have a container image I can work with on any system that runs singularity, which is quite nice :)

EDIT: I also tested copying a working install to a more recent linux system (and avoiding a fresh install of abaqus), I can confirm that this didn't work in my case (CentOS 7 install copied to Manajaro system).

Solution 2:

Dassaults System published a bug-fix this month:

You need to update to Abaqus 2018 to Abaqus 2018-HF16 from https://software.3ds.com/ more details can be found at https://github.com/willfurnass/abaqus-2017-centos-7-singularity/issues/5#issue-713025844

I tried it with updating Abaqus 2020 to Abaqus 2020-HF5 and it worked for Ubuntu 20.04 as well as Fedora 32.