What is AppStream? Why is it installed? What is it doing?

There's another question about AppStream that has, for better or worse, focussed on what seems to be a particular bug in AppStream, but it left me realising I don't have a clue what this thing actually is.

man appstreamcli (the application taking the CPU off the deep end) throws some technical jargon at the confusion:

appstreamcli is a small helper tool to work with AppStream metadata and access the AppStream component index from the command-line. The AppStream component index contains a list of all available software components for your distribution, matched to their package names. It is generated using AppStream XML or Debian DEP-11 data, which is provided by your distributor.

So... What is it actually there for? Do all desktops (and their related package managers) use it?

By all of which I actually mean: what will break when I tear this out with my teeth?


The AppStream project page adds a little more jargon but does go on to say:

It provides the foundation to build software-center applications, by providing metadata necessary for an application-centric view on package repositories. AppStream additionally provides specifications for things like an unified software metadata database, screenshot services and various other things needed to create user-friendly application-centers for (Linux) distributions.

So this is basically where the new software centres get their data from, rather than reading it directly from Apt. appstreamcli refresh (the thing people are freaking out about because it uses all the CPU) is the process that runs after an apt update to regenerate AppStream's own data. As mentioned in the other question, there appears to be a bug here.

You can see what needs it based on the reverse dependencies:

$ apt rdepends appstream
appstream
Reverse Depends:
  Breaks: libapt-pkg5.0 (<< 0.9.0-3~)
  Depends: plasma-discover
  Depends: plasma-discover
  Depends: gnome-software
  Depends: plasma-discover
  Depends: isenkram-cli
  Recommends: check-all-the-things
  Depends: appstream-index (>= 0.9.4-1)
  Recommends: libappstreamqt1 (>= 0.9.4-1)
  Suggests: appstream-doc
  Depends: gnome-software

So at a base level, the answer to "What will break [...]?" is: the packages listed above.

Both GNOME's and KDE's software tools depend on it today. More may in the future. Most of these indirect package managers are themselves only pulled in as "Recommends", so if you only ever use apt or tools that use Apt's package library directly (too many to list), you can get rid of it without removing the whole desktop environment.
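To see only the packages that would actually be forced out, you can filter the reverse-dependency list down to hard "Depends" entries. This is a minimal sketch: the function name hard_rdepends is made up for illustration, and it assumes the "Type: package" line layout shown in the output above (other apt versions may format lines differently, for example by prefixing alternatives with a "|").

```shell
#!/bin/sh
# hard_rdepends (hypothetical helper): read `apt rdepends <pkg>` output on
# stdin and print the unique names of packages with a hard "Depends:" on it.
# Recommends, Suggests and Breaks lines are ignored.
hard_rdepends() {
    awk '$1 == "Depends:" { print $2 }' | sort -u
}

# Demo on the output quoted above. On a real system you would instead run:
#   apt rdepends appstream 2>/dev/null | hard_rdepends
hard_rdepends <<'EOF'
appstream
Reverse Depends:
  Breaks: libapt-pkg5.0 (<< 0.9.0-3~)
  Depends: plasma-discover
  Depends: plasma-discover
  Depends: gnome-software
  Depends: plasma-discover
  Depends: isenkram-cli
  Recommends: check-all-the-things
  Depends: appstream-index (>= 0.9.4-1)
  Recommends: libappstreamqt1 (>= 0.9.4-1)
  Suggests: appstream-doc
  Depends: gnome-software
EOF
```

For the list above this leaves appstream-index, gnome-software, isenkram-cli and plasma-discover; anything that merely recommends or suggests appstream stays installed when it is removed.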

You can simulate a removal with apt -s remove appstream:

$ apt -s remove appstream
NOTE: This is only a simulation!
      apt-get needs root privileges for real execution.
      Keep also in mind that locking is deactivated,
      so don't depend on the relevance to the real current situation!
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  linux-headers-4.5-2.dmz.4-liquorix-amd64 linux-headers-4.5-3.dmz.1-liquorix-amd64 linux-image-4.5-2.dmz.4-liquorix-amd64
  linux-image-4.5-3.dmz.1-liquorix-amd64
Use 'apt autoremove' to remove them.
The following packages will be REMOVED
  appstream muon-discover plasma-discover
0 to upgrade, 0 to newly install, 3 to remove and 80 not to upgrade.
Remv muon-discover [4:5.6.4+p16.04+git20160517.1518-0]
Remv plasma-discover [5.6.4+p16.04+git20160517.1518-0]
Remv appstream [0.9.4-1]

This is under KDE, so don't assume your system will give the same result; run the simulation yourself. Here it seems safe enough.
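If the simulation output is long, it can help to reduce it to just the package names that would go. A rough sketch, assuming the "Remv name [version]" line format shown above; would_remove is a hypothetical helper name, not part of apt:

```shell
#!/bin/sh
# would_remove (hypothetical helper): read the output of a simulated removal
# on stdin and print the name of each package apt says it would remove,
# i.e. the second field of every line that starts with "Remv".
would_remove() {
    awk '$1 == "Remv" { print $2 }'
}

# Demo on the tail of the simulation quoted above. On a real system:
#   apt-get -s remove appstream | would_remove
would_remove <<'EOF'
The following packages will be REMOVED
  appstream muon-discover plasma-discover
0 to upgrade, 0 to newly install, 3 to remove and 80 not to upgrade.
Remv muon-discover [4:5.6.4+p16.04+git20160517.1518-0]
Remv plasma-discover [5.6.4+p16.04+git20160517.1518-0]
Remv appstream [0.9.4-1]
EOF
```

If anything you care about shows up in that list (a desktop metapackage, say), stop before running the real removal.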


Update:

Running apt -s remove appstream in Kubuntu 18.04 gives a slightly different result:

Remv plasma-discover [5.12.6-0ubuntu0.1]
Remv apt-config-icons-large-hidpi [0.12.0-3ubuntu1]
Remv apt-config-icons-large [0.12.0-3ubuntu1]
Remv apt-config-icons [0.12.0-3ubuntu1] [apt-config-icons-hidpi:amd64 ]
Remv appstream [0.12.0-3ubuntu1] [apt-config-icons-hidpi:amd64 ]
Remv apt-config-icons-hidpi [0.12.0-3ubuntu1]