How do instance segmentation methods deal with partially labelled data?

It depends on how the learning setup is formulated. If you treat each image as a collection of samples, i.e. the objects it contains, each with a bounding box and a label in {cat, dog}, you should be fine. In essence, you ask the model to generate a bounding box and a label, and each prediction is matched to a ground-truth annotation. A wrong box or label produces an error signal when contrasted with the corresponding annotation, which gives you something to penalize the model for (or train it with). Objects with missing labels are not contrasted with anything, so you are simply underutilizing your images, which is not in itself a deal breaker.
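As a rough illustration of this first formulation, here is a minimal sketch in plain Python. The function names, the greedy IoU matching, the 0.5 threshold and the fixed miss penalty are all assumptions made for the example, not how any particular detector does it. The point is only that the loss is accumulated per annotation, so a prediction that matches no annotation contributes nothing.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def matched_only_loss(preds, truths, iou_threshold=0.5, miss_penalty=1.0):
    """Hypothetical loss that only looks at annotated objects.

    preds:  list of (box, {label: probability})
    truths: list of (box, label)
    Each annotation is greedily matched to its best-overlapping unused
    prediction. Predictions left unmatched are ignored entirely.
    """
    used = set()
    loss = 0.0
    for t_box, t_label in truths:
        best_j, best_iou = None, iou_threshold
        for j, (p_box, _) in enumerate(preds):
            if j in used:
                continue
            overlap = iou(p_box, t_box)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is None:
            loss += miss_penalty  # annotated object that was not found
            continue
        used.add(best_j)
        _, p_probs = preds[best_j]
        loss += -math.log(max(p_probs.get(t_label, 0.0), 1e-12))  # wrong label
        loss += 1.0 - best_iou                                    # wrong box
    return loss

# One annotated cat; the second prediction is a real but unlabeled dog.
truths = [((0, 0, 10, 10), "cat")]
cat_pred = ((0, 0, 10, 10), {"cat": 0.9, "dog": 0.1})
dog_pred = ((20, 20, 30, 30), {"cat": 0.05, "dog": 0.95})
loss_without_extra = matched_only_loss([cat_pred], truths)
loss_with_extra = matched_only_loss([cat_pred, dog_pred], truths)
```

In this toy case both losses are identical: the unlabeled dog neither helps nor hurts, it is simply unused.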

If, on the other hand, you generate all bounding boxes for an image at once and penalize the model for producing "redundant" boxes, i.e. boxes that match no annotation, you risk penalizing correct predictions of objects that happen to be unlabeled, which is indeed bad. If the number of penalized good predictions is comparable to the number of penalized bad ones, you would be doing more harm than good.
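To make the risk concrete, here is a toy sketch of such a "redundant box" penalty. Everything in it is hypothetical: the penalty form -log(1 - confidence) is just one common-looking choice, and the `is_real_object` flag is something you would only know in a constructed example (or after manual inspection), never during real training.

```python
import math

def no_object_penalty(confidence):
    """Penalty for a prediction that matched no annotation (assumed form)."""
    return -math.log(max(1.0 - confidence, 1e-12))

def penalty_breakdown(unmatched):
    """Split the total 'redundant box' penalty into two parts.

    unmatched: list of (confidence, is_real_object) for predictions that
    matched no annotation. Returns (penalty on real but unlabeled objects,
    penalty on genuinely spurious boxes).
    """
    on_good = sum(no_object_penalty(c) for c, real in unmatched if real)
    on_bad = sum(no_object_penalty(c) for c, real in unmatched if not real)
    return on_good, on_bad

# Two confident unmatched boxes: one is a real unlabeled dog, one is spurious.
on_good, on_bad = penalty_breakdown([(0.9, True), (0.9, False)])
```

Here the model is punished exactly as hard for finding the unlabeled dog as for hallucinating a box, which is the situation the paragraph above warns about.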

Perhaps the way to go is to start by training only on images that are fully and correctly labeled, and then gradually include the noisy ones, manually checking whether the "extra" predictions actually correspond to real but unlabeled objects and fixing the data as needed, à la human-in-the-loop training.
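The review step could look roughly like the sketch below. The function name `flag_for_review`, the thresholds and the data layout are assumptions for illustration. It only collects confident predictions that overlap no existing annotation, so that a person can decide whether each one is a missing label or a genuine false positive.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def flag_for_review(preds, truths, iou_threshold=0.5, min_confidence=0.5):
    """Return confident predictions that overlap no existing annotation.

    preds:  list of (box, label, confidence) from a model trained on the
            fully labeled subset
    truths: list of (box, label) currently annotated for the image
    """
    flagged = []
    for box, label, confidence in preds:
        if confidence < min_confidence:
            continue  # too uncertain to be worth a reviewer's time
        covered = any(iou(box, t_box) >= iou_threshold for t_box, _ in truths)
        if not covered:
            flagged.append((box, label, confidence))
    return flagged

truths = [((0, 0, 10, 10), "cat")]
preds = [
    ((0, 0, 10, 10), "cat", 0.95),    # already annotated
    ((20, 20, 30, 30), "dog", 0.90),  # candidate missing label
    ((50, 50, 60, 60), "dog", 0.20),  # low confidence, skipped
]
candidates = flag_for_review(preds, truths)
```

Boxes the reviewer confirms can be appended to the image's annotations before that image is added to the training set.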