I want to capture a specific column from a command's output using Python 3

On an Ubuntu machine, I want to capture the second column of the output of the command "dpkg -l". I use Python 3 and want to use the subprocess module. The following code:

import subprocess

with open("/tmp/test.out", "wb") as fh:
    with subprocess.Popen(["dpkg", "-l"], stdout=subprocess.PIPE) as proc:
        fh.write(proc.stdout.read())

produces the following output:

ii yum 3.4.3-3 all Advanced front-end for rpm  
ii zeitgeist-core 1.0-0ubuntu4 amd64 event logging framework - engine  
ii zenity 3.24.0-1 amd64 Display graphical dialog boxes from shell scripts  
ii zenity-common 3.24.0-1 all Display graphical dialog boxes from shell scripts   
ii zip 3.0-11build1 amd64 Archiver for .zip files  
ii zita-ajbridge 0.7.0-1 amd64 alsa to jack bridge  
ii zita-at1 0.6.0-1 amd64 JACK autotuner  
ii zita-lrx 0.1.0-3 amd64 Command line jack application providing crossover filters  
ii zita-mu1 0.2.2-2 amd64 organise stereo monitoring for Jack Audio Connection Kit  
.....  
.....  

I want to get only the second column, for example:

....  
....  
yum  
zeitgeist-core  
zenity  
zenity-common  
zip  
....  
.... etc etc  

Please help

>>> with subprocess.Popen(["dpkg", "-l"], stdout=subprocess.PIPE) as proc:
...     line1=proc.stdout.read()
...     type(line1)
...
<class 'bytes'>

The type is bytes. How do I split it? When I try the following:

>>> with subprocess.Popen(["dpkg", "-l"], stdout=subprocess.PIPE) as proc:
...     line1=proc.stdout.read()
...     line2=str(line)  # the type is byte so I try to convert to string
...     print(line2)
...
10

(The output is messed up)


Solution 1:

Popen is lower-level than you need for simply capturing the output of an external command. Use

subprocess.check_output()

instead:

#!/usr/bin/env python3
import subprocess

f = "/home/jacob/Desktop/output.txt"

lines = subprocess.check_output(["dpkg", "-l"]).decode("utf-8").splitlines()
with open(f, "wt") as out:
    for l in lines:
        if l.startswith("ii"):
            out.write(l.split()[1] + "\n")

Replace f with the actual path of the output file.

Output file:

...
...
apg
app-install-data
app-install-data-partner
apparmor
apport
apport-gtk
apport-retrace
apport-symptoms
appstream
apt
apt-transport-https
...
...
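
As for the bytes problem in the question: calling str() on a bytes object gives its printable representation (b'...' with escaped newlines) rather than the text itself, which is why the script above uses .decode("utf-8"). Here is a minimal sketch of the parsing step on its own, using two lines from the question as sample input. The helper name second_column is made up for illustration.

```python
def second_column(raw: bytes) -> list:
    """Return the package names found in raw `dpkg -l` output."""
    names = []
    for line in raw.decode("utf-8").splitlines():
        fields = line.split()
        # Installed packages are flagged "ii" in the first column;
        # checking it also skips the header lines that dpkg -l prints.
        if len(fields) > 1 and fields[0] == "ii":
            names.append(fields[1])
    return names


# Sample input copied from the question, as bytes (what Popen returns).
sample = (
    b"ii yum 3.4.3-3 all Advanced front-end for rpm\n"
    b"ii zip 3.0-11build1 amd64 Archiver for .zip files\n"
)

print(str(sample)[:12])        # start of the repr, not the decoded text
print(second_column(sample))   # names taken from the decoded text
```

Splitting on whitespace works here because the package name never contains spaces; only the description column does, and it comes last.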

Note

The solution above writes a file that ends with a trailing newline. If that is a problem, use the variant below, which writes no newline after the last name.

#!/usr/bin/env python3
import subprocess

f = "/home/jacob/Bureaublad/output.txt"

lines = subprocess.check_output(["dpkg", "-l"]).decode("utf-8").splitlines()
with open(f, "wt") as out:
    out.write("\n".join(l.split()[1] for l in lines if l.startswith("ii")))
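
On Python 3.7 or later, subprocess.run with text=True hands back a str directly, so the explicit decode step can be dropped. The following is only a sketch of that variant: the function name second_column_of is hypothetical, and the demo command is a stand-in that prints one dpkg-style line so the example runs on any machine.

```python
import subprocess
import sys


def second_column_of(cmd):
    """Run cmd and return the second field of every line flagged "ii"."""
    # text=True decodes stdout to str; check=True raises if cmd fails.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    names = []
    for line in result.stdout.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] == "ii":
            names.append(fields[1])
    return names


# Stand-in for dpkg -l: a child Python process printing one sample line.
demo_cmd = [
    sys.executable,
    "-c",
    "print('ii zip 3.0-11build1 amd64 Archiver for .zip files')",
]
print(second_column_of(demo_cmd))
```

On an Ubuntu machine you would call second_column_of(["dpkg", "-l"]) instead of using the demo command.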

Solution 2:

Note that dpkg -l is essentially a frontend to dpkg-query, and dpkg-query allows you to format the output. From man dpkg:

dpkg-query actions
  See dpkg-query(1) for more information about the following actions.

  -l, --list package-name-pattern...
      List packages matching given pattern.

And man dpkg-query:

-l, --list [package-name-pattern...]
      List  packages  matching  given  pattern.
-W, --show [package-name-pattern...]
      Just like the --list option this will list all packages matching
      the  given  pattern.  However the output can be customized using
      the --showformat option.

So instead of dpkg -l, do:

dpkg-query -f '${Package}\n' -W

'${Package}' is not a shell variable here. It's a format specifier for dpkg-query, and the single quotes keep the shell from trying to expand it.
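
Since the question asks for Python, the same dpkg-query call can be made through subprocess. This is a rough sketch, not a drop-in replacement for Solution 1: the names QUERY and package_names are invented for illustration, and the call is guarded so the script does nothing on systems without dpkg-query.

```python
import shutil
import subprocess

# Each list element is passed to dpkg-query as-is; no shell is involved,
# so ${Package} needs no quoting. "\\n" sends a literal backslash-n,
# which dpkg-query itself turns into a newline.
QUERY = ["dpkg-query", "-W", "-f", "${Package}\\n"]


def package_names():
    """Return the package names printed by dpkg-query, one per entry."""
    output = subprocess.check_output(QUERY).decode("utf-8")
    return output.splitlines()


if shutil.which("dpkg-query"):  # only present on Debian-based systems
    names = package_names()
    print(len(names), "packages listed")
```

One difference to keep in mind: Solution 1 keeps only lines flagged "ii", while this sketch returns whatever dpkg-query -W reports, which as far as I know can include removed packages whose configuration files are still present.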