Which Keras data format convention (channels_last or channels_first) should be used, and when?

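I believe the reason there are two data formats is that Keras supports Theano as another backend too. In Theano, the channels axis comes first among the non-batch axes, i.e. image tensors are laid out as (batch, channels, height, width), whereas TensorFlow defaults to (batch, height, width, channels).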
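
To make the two layouts concrete, here is a minimal sketch using only NumPy. The batch size and image dimensions are made up for illustration; the point is just where the channels axis sits and how to convert between the two conventions.

```python
import numpy as np

# Hypothetical batch for illustration: 8 RGB images of 32x32 pixels.
batch, height, width, channels = 8, 32, 32, 3

# channels_last layout: (batch, height, width, channels)
x_last = np.arange(batch * height * width * channels, dtype=np.float32)
x_last = x_last.reshape(batch, height, width, channels)

# channels_first layout: (batch, channels, height, width)
# Move the channels axis (index 3) to position 1.
x_first = np.transpose(x_last, (0, 3, 1, 2))

# Converting back: move the channels axis (index 1) to the end.
x_roundtrip = np.transpose(x_first, (0, 2, 3, 1))

print(x_last.shape)
print(x_first.shape)
```

In Keras itself, layers such as `Conv2D` take a `data_format` argument, and `keras.backend.image_data_format()` reports the configured default, so the input arrays need to match whichever convention those indicate.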