How to fix TypeError: Caught TypeError in DataLoader worker process 1 in Detectron2

I'm trying to train a Detectron2 model on a COCO-format dataset. The dataset seems to load correctly, but when I train the model with DefaultTrainer I get:

TypeError: Caught TypeError in DataLoader worker process 1.

This is my setup:

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

# TOTAL_NUM_IMAGES = 10531

cfg = get_cfg()
cfg.OUTPUT_DIR = './output'
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR

# single_iteration = cfg.SOLVER.IMS_PER_BATCH
# iterations_for_one_epoch = TOTAL_NUM_IMAGES / single_iteration
# cfg.SOLVER.MAX_ITER = int(iterations_for_one_epoch) * 20

cfg.SOLVER.STEPS = []        # do not decay learning rate
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (person). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

And I get this error after a couple of iterations:

[01/06 15:14:00 d2.utils.events]:  eta: 11:25:20  iter: 125  total_loss: 0.9023  loss_cls: 0.1827  loss_box_reg: 0.1385  loss_mask: 0.5601  loss_rpn_cls: 0.009945  loss_rpn_loc: 0.0023  time: 0.5232  data_time: 0.3085  lr: 3.1219e-05  max_mem: 3271M
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-8c48e6e17647> in <module>()
     26 trainer = DefaultTrainer(cfg)
     27 trainer.resume_or_load(resume=False)
---> 28 trainer.train()

8 frames
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
    432             # instantiate since we don't know how to
    433             raise RuntimeError(msg) from None
--> 434         raise exception
    435 
    436 

TypeError: Caught TypeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
    data.append(next(self.dataset_iter))
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/common.py", line 201, in __iter__
    yield self.dataset[idx]
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/common.py", line 90, in __getitem__
    data = self._map_func(self._dataset[cur_idx])
  File "/usr/local/lib/python3.7/dist-packages/detectron2/utils/serialize.py", line 26, in __call__
    return self._obj(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 189, in __call__
    self._transform_annotations(dataset_dict, transforms, image_shape)
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 128, in _transform_annotations
    for obj in dataset_dict.pop("annotations")
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 129, in <listcomp>
    if obj.get("iscrowd", 0) == 0
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/detection_utils.py", line 297, in transform_instance_annotations
    p.reshape(-1) for p in transforms.apply_polygons(polygons)
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 297, in <lambda>
    return lambda x: self._apply(x, name)
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 291, in _apply
    x = getattr(t, meth)(x)
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 150, in apply_polygons
    return [self.apply_coords(p) for p in polygons]
  File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 150, in <listcomp>
    return [self.apply_coords(p) for p in polygons]
  File "/usr/local/lib/python3.7/dist-packages/detectron2/data/transforms/transform.py", line 150, in apply_coords
    coords[:, 0] = coords[:, 0] * (self.new_w * 1.0 / self.w)
TypeError: can't multiply sequence by non-int of type 'float'

Solution 1:

It turns out that some of the ids in "annotations" were written in scientific notation, which gave those ids the type float. Converting them to integers solved the problem.
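
For anyone hitting the same thing, here is a minimal sketch of that conversion on a COCO-style dict. The helper name coerce_ids_to_int is hypothetical, and which keys to check ("id", "image_id", "category_id") is my assumption, since your file may only be affected in some of them. It relies on the fact that Python's json module parses a number such as 1e3 as the float 1000.0.

```python
import json


def coerce_ids_to_int(coco, keys=("id", "image_id", "category_id")):
    """Convert whole-number float ids (e.g. 1e3 parsed as 1000.0) to int, in place.

    Hypothetical helper: the set of keys checked is an assumption.
    Returns the number of values that were converted.
    """
    changed = 0
    for section in ("images", "annotations", "categories"):
        for record in coco.get(section, []):
            for key in keys:
                value = record.get(key)
                if isinstance(value, float):
                    # Refuse to silently truncate something like 12.5.
                    if not value.is_integer():
                        raise ValueError(
                            f"{section}: {key}={value!r} is not a whole number"
                        )
                    record[key] = int(value)
                    changed += 1
    return changed


# Tiny in-memory example; ids written in scientific notation become floats.
coco = json.loads(
    '{"images": [{"id": 1e3}],'
    ' "annotations": [{"id": 2.0e0, "image_id": 1e3, "category_id": 1}],'
    ' "categories": [{"id": 1}]}'
)
n = coerce_ids_to_int(coco)
print(n, coco["annotations"][0])
```

On a real dataset you would load the annotation file with json.load, run the helper, write the result back with json.dump, and register the dataset from the cleaned file.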