How safe is it to host sensitive data on repository sites like GitHub, Bitbucket, etc.?

As always, it depends :-)

There can be two different meanings of "safety":

  1. Can I trust the hoster to keep my stuff (intellectual property, company secrets...) private?
  2. What happens to my code if the hoster suddenly goes out of service?

For 1., there is no 100% guarantee.
Of course, the big hosters like GitHub and Bitbucket won't intentionally share your code with third parties, but there is always the possibility that an attacker gets hold of the content of your private repositories.
(This could happen to you as well if you host your code internally in your company, but it is less likely: unless your company is as well known as, say, Google, the chance of someone trying to attack it is much smaller than the chance of someone trying to attack a well-known public hoster.)

Plus, you have to consider the laws of the country where the hoster resides.
A few weeks ago I read somewhere that if your hoster is in the USA, they can be forced by law to hand your data over to the US government under certain circumstances, and they are not even allowed to tell you about it (I don't remember the name of the law, but maybe someone else knows).

I guess that all this is why most "big" companies don't host their code on a public service (my company is mid-sized, and we host our code privately as well).

By the way, as you mentioned Google:
I'm sure that especially Google does not use Bitbucket or GitHub. They have the complete infrastructure for project hosting themselves, so I guess they are using it internally, too. Why should they use an external service? It's in the cloud, yes...but it's their cloud.

Concerning 2.: it's unlikely that GitHub or Bitbucket will go bankrupt tomorrow, but you never know.
IMO it's your responsibility to take backups of your code yourself.
The nature of DVCS makes sure that you have some local copies of your code anyway, but it might be difficult to search lots of developer machines for the newest versions of all of your projects.
I do this by regularly pulling all my repositories to my local machine (I wrote a tool that can do this for Bitbucket, which I use for my private projects).
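To illustrate the idea of regular local backups, here is a minimal sketch in Python. It is not the tool mentioned above, just one way this could look. It assumes your repositories are Git repositories, that `git` is on your PATH, and that you keep the list of clone URLs yourself; the function names (`mirror_name`, `backup_command`, `backup_all`) and the URL in the usage comment are made up for this example.

```python
import subprocess
from pathlib import Path


def mirror_name(url: str) -> str:
    """Derive a local directory name such as 'project.git' from a clone URL."""
    # Treat the ':' of scp-style URLs (git@host:user/project.git) like a '/'.
    name = url.rstrip("/").replace(":", "/").rsplit("/", 1)[-1]
    return name if name.endswith(".git") else name + ".git"


def backup_command(url: str, backup_root: Path) -> list[str]:
    """Return the git command that creates or refreshes the mirror of `url`."""
    target = backup_root / mirror_name(url)
    if target.exists():
        # Mirror already exists: fetch all refs and prune ones deleted upstream.
        return ["git", "--git-dir", str(target), "remote", "update", "--prune"]
    # First run: create a bare mirror clone containing every branch and tag.
    return ["git", "clone", "--mirror", url, str(target)]


def backup_all(urls: list[str], backup_root: Path) -> None:
    """Create or refresh a mirror for every URL; stop on the first failure."""
    backup_root.mkdir(parents=True, exist_ok=True)
    for url in urls:
        subprocess.run(backup_command(url, backup_root), check=True)


# Example usage (hypothetical URL, replace with your own repositories):
# backup_all(["git@bitbucket.org:example-user/example-project.git"],
#            Path.home() / "repo-backups")
```

Run from a scheduled job (cron, Task Scheduler), something like this gives you one place that holds the newest state of every repository, so you don't have to search developer machines. Two repositories with the same name under different accounts would collide in this sketch, so a real tool would need a better naming scheme.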


One key question is who has administrative access. That person (or those people) can always read your data and potentially leak it to third parties, knowingly or unknowingly, or just read it for their own entertainment or education. This is not only a problem for hosted services; it is also a problem if you store your data within your own company. But at least there you know the person. In small companies, the administrative password might be in the hands of the business owner.

The main point is that the public code hosting companies are such a huge target. There is a lot to gain from hacking such a large code repository. It is a very interesting target for government agencies, interesting enough that they could simply place an insider in the hosting company who takes a USB stick with all the data home with him. This might be as easy as applying for an admin job there, and the insider would even get paid with all benefits. I don't think we will ever see any news about this, simply because no traces are to be expected unless someone wants to brag about it. As far as I know, hosting companies don't require anything like the security clearances that government agencies do. And the fact that all of this would happen in stealth mode puts very little pressure on a hosting company to actually do anything about it.