Why are Hadoop and Spark not in the official Ubuntu repositories?

Solution 1:

Both Hadoop and Spark were dropped from Debian years ago, mostly due to a lack of volunteer interest in maintaining those packages. Ubuntu gets most of its deb packages from Debian, so once the packages left Debian they disappeared from Ubuntu, too.

  • Hadoop: Debian tracker page - Debian Bug #630820
  • Spark: Debian tracker page - Debian Bug #946336

Any community volunteer willing to learn the process and put in the effort can re-introduce the packages to Debian, and they will then flow into future releases of Ubuntu. More volunteers means more, better, and more up-to-date software.

Also, according to https://wiki.debian.org/Hadoop, the Hadoop developers did not make packaging and maintenance easy for the Debian volunteers:

"There are a number of reasons for this; in particular the Hadoop build process will load various dependencies via Maven instead of using distribution-supplied packages. Java projects like this are unfortunately not easy to package because of interdependencies; and unfortunately the Hadoop stack is full of odd dependencies"

If this information is stale or incorrect, it is once again up to community volunteers to step up, make corrections, and implement changes. Debian and Ubuntu are driven by volunteers. More volunteers means better documentation.