Process hangs before termination with ubuntu 18.04
When using ABAQUS 6.14 (but also ABAQUS 2018) on ubuntu 18.04 everything seems to work fine except the termination of the standard
process (the process started when performing an implicit analysis -- if you are not familiar with this it doesn't matter).
The analysis indeed works as one can also see in a log file (the .sta
file, for those who are familiar with abaqus) the message THE ANALYSIS HAS COMPLETED SUCCESSFULLY
. The output database contains the analysis results. However, after the analysis has been completed, the process standard
remains in a sleeping status using 0% CPU and keeping the same amount of RAM as when it was running.
From strace
I get:
[pid 23191] close(8) = 0
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 23193] <... futex resumed> ) = 0
[pid 23191] <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3acd917db0, FUTEX_WAKE_PRIVATE, 1) = 0
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23191] munmap(0x7f3ab130b000, 327680) = 0
[pid 23191] munmap(0x7f3ab136b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab16db000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0fbb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0ddb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab0a0b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab03fb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab050b000, 1114112) = 0
[pid 23191] munmap(0x7f3ab00cb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab02eb000, 1114112) = 0
[pid 23191] munmap(0x7f3ab14eb000, 1114112) = 0
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 8, NULL) = -1 EAGAIN (Resource temporarily unavailable)
[pid 23191] futex(0x7f3ab8a5dd44, FUTEX_WAIT_PRIVATE, 12, NULL <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(10, [5 6 8 9], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
[pid 23185] <... select resumed> ) = 0 (Timeout)
[pid 23185] select(0, NULL, NULL, NULL, {tv_sec=0, tv_usec=50000} <unfinished ...>
[pid 23193] <... select resumed> ) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000}) = 0 (Timeout)
[pid 23193] select(7, [4 5 6], NULL, NULL, {tv_sec=0, tv_usec=20000} <unfinished ...>
Like if the two processes were in a deadlock state. Moreover, the commands
pid -p 7002
and
pid -p 7010
do give an empty output. The dirs /proc/7002
and /proc/7010
do not exist.
The only abaqus-related processes executing are
david 6995 0.0 0.1 295428 51388 pts/0 S 17:00 0:00 /opt/abaqus/6.14-1/code/bin/python /opt/abaqus/6.14-1
david 6998 0.0 0.2 368744 97948 pts/0 S 17:00 0:00 /opt/abaqus/6.14-1/code/bin/python std_inst.com
david 7001 0.1 0.0 122076 20096 pts/0 Sl 17:00 0:03 /opt/abaqus/6.14-1/code/bin/eliT_DriverLM -job std_in
david 7008 0.4 0.5 735812 185364 pts/0 Sl 17:00 0:07 /opt/abaqus/6.14-1/code/bin/standard -standard -acade
On ubuntu 16.04 the exact same version works like a charm. Here the same strace
on ubuntu 16.04 (with the same kernel version as my 18.04, i.e. 4.15.0-29):
3890 close(8) = 0
3892 <... select resumed> ) = 0 (Timeout)
3892 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3892 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3892 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
3890 futex(0x7f29e43e1db0, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
3892 <... futex resumed> ) = 0
3890 <... futex resumed> ) = -1 EAGAIN (Resource temporarily unavailable)
3890 futex(0x7f29e43e1db0, FUTEX_WAKE_PRIVATE, 1) = 0
3892 select(7, [4 5 6], NULL, NULL, {0, 20000} <unfinished ...>
3890 munmap(0x7f29c7adb000, 327680) = 0
3890 munmap(0x7f29c7b3b000, 1114112) = 0
3890 munmap(0x7f29c7eab000, 1114112) = 0
3890 munmap(0x7f29c778b000, 1114112) = 0
3890 munmap(0x7f29c75ab000, 1114112) = 0
3890 munmap(0x7f29c71db000, 1114112) = 0
3890 munmap(0x7f29c6bcb000, 1114112) = 0
3890 munmap(0x7f29c6cdb000, 1114112) = 0
3890 munmap(0x7f29c689b000, 1114112) = 0
3890 munmap(0x7f29c6abb000, 1114112) = 0
3890 munmap(0x7f29c7cbb000, 1114112) = 0
3890 exit_group(0) = ?
3891 +++ exited with 0 +++
3893 +++ exited with 0 +++
3892 +++ exited with 0 +++
3890 +++ exited with 0 +++
3880 <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3890
3880 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3890, si_uid=1000, si_status=0, si_utime=107, si_stime=7} ---
Has someone a good idea how to solve this? Or in which direction should I investigate further.
Solution 1:
I found a solution that circumvents the deadlock by using a singularity container as proposed by Will Furnass here: http://learningpatterns.me/posts-output/2018-01-30-abaqus-singularity/
Although a bit complicated in the first place, it works like a charm when setup properly. I modified my aliases for abaqus on my host system (Manjaro/Arch linux) such that they point to the install in the singularity container and execute the command in the containers environment. However, since I need Intel Fortran Compiler, I generated a basic centos 7 container and modified it afterwards to install compilers and abaqus (v2019 in this case) rather than using the .def script as proposed by Will Furnass.
It takes some time to setup but now I have a container image I can work with on any system that runs singularity, which is quite nice :)
EDIT: I also tested copying a working install to a more recent linux system (and avoiding a fresh install of abaqus), I can confirm that this didn't work in my case (CentOS 7 install copied to Manajaro system).
Solution 2:
Dassaults System published a bug-fix this month:
You need to update to Abaqus 2018
to Abaqus 2018-HF16
from https://software.3ds.com/ more details can be found at https://github.com/willfurnass/abaqus-2017-centos-7-singularity/issues/5#issue-713025844
I tried it with updating Abaqus 2020
to Abaqus 2020-HF5
and it worked for Ubuntu 20.04 as well as Fedora 32.