How do we know we can trust the Maven Central Repository?
Sorry if this question isn't appropriate for StackOverflow; it's not a coding question.
I'm new to Maven and am curious how there can be a Maven Central Repository that appears to be accessible free of charge. As far as I can tell it's maintained by a company called Sonatype. Are they funding it? Why? Does it act as a lead generation vehicle for the rest of their business? I think if I understood their reasons I'd know if or how/when to trust it.
Excellent question, especially since using insecure third party libraries is now in the OWASP Top 10. Unfortunately, most people take the trust of Maven Central for granted. I think they are overly optimistic.
A lot of people think that because you have to sign files submitted to Maven, the software can be trusted. Unfortunately, what they overlook is that a signature is meaningless if you cannot be sure that the key that signed the library belongs to the original source that provided it to Maven. As far as I can tell, Maven Central gives you no such assurance.
Explanation Made Simple
To simplify the explanation, consider the following analogy. When you go to the airport to fly somewhere, you are required to provide ID to prove you are who you say you are. For local travel, a driver's license is sufficient. The person who checks your ID has a few things to check:
- SAME SOURCE: Does the name on the driver's license match the name on the ticket? This prevents Osama bin Laden from reserving a ticket under Barack Obama's name.
- AUTHENTICITY OF SOURCE: Does the picture on the driver's license look like the person checking in? This prevents Osama bin Laden from stealing Barack Obama's driver's license and using it to check in under his name.
- AUTHENTICITY OF DOCUMENT: Is the driver's license authentic? This prevents Osama bin Laden from getting a counterfeit driver's license that has his picture on it but otherwise has Barack Obama's name on it.
Now carry the same analogy forward to verifying an object coming from Maven. The Maven documentation claims PGP signatures are required (reference: Guide to uploading artifacts to the Central repository) on libraries that are uploaded there (although Sonatype claims many older packages do not have the signatures). Think of the driver's license as analogous to a key and the ticket as analogous to the Maven artifact, and ask the same questions:
- SAME SOURCE: Is the key that signed the library associated with someone from the original source that produced the library?
- AUTHENTICITY OF SOURCE: Can we verify that the key really does belong to that original source and not to someone merely claiming to be it?
- AUTHENTICITY OF DOCUMENT: Does the PGP signature on the artifact (library) check out to be a valid signature?
For checking the same source, one simply downloads the public key associated with the artifact from a key server, and verifies that it comes from an email address that is associated with the original source. For example, this can be done with the MIT PGP keyserver. For specific details on how this is done, see Verify Dependencies using PGP.
The third of these is actually the easiest to do because it can be entirely automated (see GPG quickstart guide, section on verifying detached signatures). This part requires no manual intervention.
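As a rough illustration of how little is involved in that third check, here is a minimal Python sketch that shells out to `gpg --verify` for a detached signature. The function names are mine, and it assumes `gpg` is installed and the signer's public key has already been imported into the local keyring.

```python
import subprocess
from pathlib import Path


def gpg_verify_command(artifact: Path, signature: Path) -> list[str]:
    """Build the gpg invocation that checks a detached signature."""
    # For detached signatures gpg expects the signature file first,
    # followed by the file that was signed.
    return ["gpg", "--verify", str(signature), str(artifact)]


def signature_is_valid(artifact: Path, signature: Path) -> bool:
    """Return True if gpg exits with status 0 for this artifact/signature pair."""
    # Assumption: gpg is on the PATH and the signer's public key is
    # already in the local keyring.
    result = subprocess.run(
        gpg_verify_command(artifact, signature),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

A call would look like `signature_is_valid(Path("library.jar"), Path("library.jar.asc"))`, where both file names are placeholders. Keep in mind that a passing result only says the file matches the signature made by that key. It says nothing about who owns the key, which is the subject of the second check.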
The second part is where the gap is. Even though the signature checks out and the MIT keyserver claims the key is associated with the original source, how do we know it really was created by that original source and not someone else? In fact, just for demonstration purposes, I have created a key for Mickey Mouse on the MIT key server. I could have just as easily created a key that appears to be associated with oracle.com or spring.io or whitehouse.gov.
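To make the gap concrete, here is a toy Python sketch of a "same source" check that only looks at the email address in a key's user ID. Everything in it is hypothetical: the `PublicKey` class, the `claims_domain` function, the `example.org` domain and the placeholder fingerprints are made up for illustration and do not come from any real keyserver.

```python
from dataclasses import dataclass
from email.utils import parseaddr


@dataclass(frozen=True)
class PublicKey:
    fingerprint: str  # derived from the key material itself
    user_id: str      # free text chosen by whoever generated the key


def claims_domain(key: PublicKey, expected_domain: str) -> bool:
    """The 'same source' check: does the user ID's email use the expected domain?"""
    _, address = parseaddr(key.user_id)
    return address.lower().endswith("@" + expected_domain.lower())


# Both fingerprints are made-up placeholders, not real keys.
genuine = PublicKey("AAAA1111", "Release Team <releases@example.org>")
forged = PublicKey("BBBB2222", "Release Team <releases@example.org>")

print(claims_domain(genuine, "example.org"))        # True
print(claims_domain(forged, "example.org"))         # True: the forgery passes too
print(genuine.fingerprint == forged.fingerprint)    # False: different keys
```

Both keys pass the check because the user ID is just text that the key's creator typed in. Only the fingerprint distinguishes them, and nothing in the process described above tells you which fingerprint is the real one.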
So What Really is the Threat?
In a world where people are sending emails out that contain executable malware and the NSA is breaking SSL, it would be naive to think that these same black hats are not targeting software repositories that are bundled into numerous applications around the world. In fact, there are at least three serious examples that were caught.
So to be clear, if I were a black hat, here is exactly what I would do (For the record: I am not a black hat!!!). I would take an open source package that is widely used. I would then sneak my backdoor into the source code of the package and build it locally. The next step is creating a key pair that is associated with an email address of the original source. I upload that key to the MIT key server (which accepts the email address without any validation), and then sign my malicious package. Finally, I upload my malicious package along with the signatures to Maven and snicker as my malware gets adopted in products little by little all over the world. I claim that it is unlikely that this attack would be caught any time soon.
What Can You Do to Trust the Software You Download From Maven?
Unfortunately, there is no simple answer to this. The more you do to trust the software you are using, the less productive you are going to become. You need to make realistic tradeoffs that balance security and productivity.
To be brief (since I have already huffed and puffed too much), I will just cite two sources which can help guide you in choosing your third party libraries better. The first is to follow the advice in Section 5 of the Fortify Attacking the Build paper, especially including the security vetting process.
The second recommendation is a Sonatype product called CLM, which can help your company analyze the software you are using, including providing information about known defects and how many other organizations are using the same product.
What's in It for Sonatype?
In addition to Sonatype's Nexus and CLM products which they can sell, it is also worth reading this article. Sonatype is leading the balancing act between software development efficiency and trust of open source solutions. They don't quite have everything solved yet (I will not reveal private emails I exchanged with them), but they're heading in the right direction.
The Sonatype terms and conditions are mentioned by Jason. Contained within is a link on how to submit content:
- http://central.sonatype.org/pages/ossrh-guide.html
The requirements section is particularly interesting. In brief all submitters are expected to provide the following:
- Javadoc and source code
- Digitally sign the submitted files
- Correct project metadata
- GAV identifiers (Group, Artifact, Version)
- Name and description fields and project URL
- Developers working on the project
- License information
- Location of source code repositories
This information tells you and me everything we need to know about the code, how it was built and, more importantly, who built it. The use of GPG enables us to verify that the binaries were built by the developers stated in the project POM file. Additionally, Maven Central automatically generates SHA checksums, enabling you to verify the integrity of files downloaded by your build process.
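For what it's worth, the checksum comparison is easy to reproduce by hand. The sketch below is only an illustration in Python, with function names of my own choosing. It maps GAV identifiers to the standard Maven repository path layout and compares a downloaded file against its accompanying `.sha1` file. It assumes the checksum file starts with the hex digest, possibly followed by a file name.

```python
import hashlib
from pathlib import Path


def repository_path(group_id, artifact_id, version, extension="jar"):
    """Map GAV coordinates to the standard Maven repository layout."""
    group_dir = group_id.replace(".", "/")
    return f"{group_dir}/{artifact_id}/{version}/{artifact_id}-{version}.{extension}"


def sha1_of(path, chunk_size=65536):
    """Hash a file in chunks so large artifacts are not loaded into memory at once."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(artifact, checksum_file):
    """Compare a downloaded artifact against its accompanying .sha1 file."""
    # Assumption: the first whitespace-separated token is the hex digest;
    # anything after it (such as a file name) is ignored.
    published = Path(checksum_file).read_text().split()[0].lower()
    return sha1_of(artifact) == published
```

For example, `repository_path("org.example", "demo", "1.0")` gives `org/example/demo/1.0/demo-1.0.jar`, where `org.example:demo:1.0` is a made-up coordinate. Note that a matching checksum only shows the file was not corrupted or altered after the checksum was generated; it does not by itself say who produced the file.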
So what does Sonatype get out of this?
- It's a great publicity tool when selling the professional edition of their repository hosting software.
- One useful professional feature is the ability to restrict the artifacts that may be downloaded from Maven Central. Useful for enforcing standards or concerns about 3rd party software.
- Maven Central has become the world's largest repository of open source Java software. Sonatype uses this to offer a number of products to their corporate customers.
- These provide detailed reports on the security vulnerabilities associated with 3rd party libraries used by a company's software. Impressively these tools can be integrated right into the software development and build processes.
- Sonatype can also provide reports on the software licencing associated with their code's 3rd party dependencies. Very important for compliance and difficult to do in practice without this kind of tooling.
Hope this helps. I would finish by pointing out that what Sonatype is doing is not very different to other open source software packaging initiatives. Red Hat, Debian and Canonical spend a lot of effort packaging software for safe and secure distribution with their operating systems. Maven Central is something that is perhaps more developer friendly.