How to fix TypeError: Caught TypeError in DataLoader worker process 1 in Detectron2
I'm trying to train a Detectron2 model on a COCO-format dataset. The dataset seems to load correctly, but when I train the model with DefaultTrainer, I get:
TypeError: Caught TypeError in DataLoader worker process 1.
This is my setup:
import os

from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer
# TOTAL_NUM_IMAGES = 10531
cfg = get_cfg()
cfg.OUTPUT_DIR = os.path.join('./output')
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml") # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.00025 # pick a good LR
# single_iteration = cfg.SOLVER.IMS_PER_BATCH
# iterations_for_one_epoch = TOTAL_NUM_IMAGES / single_iteration
# cfg.SOLVER.MAX_ITER = int(iterations_for_one_epoch) * 20
cfg.SOLVER.STEPS = [] # do not decay learning rate
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1 # only has one class (person). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.
os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
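The commented-out lines size MAX_ITER in epochs: one iteration processes IMS_PER_BATCH images (Detectron2 counts this across all machines), so one epoch is roughly TOTAL_NUM_IMAGES / IMS_PER_BATCH iterations. Here is that arithmetic as a small standalone sketch; max_iter_for_epochs is a hypothetical helper, not a Detectron2 function, and it floors the per-epoch count the same way int(...) does in the comments.

```python
def max_iter_for_epochs(num_images: int, ims_per_batch: int, epochs: int) -> int:
    """Hypothetical helper: number of iterations for a given number of epochs."""
    # One iteration consumes ims_per_batch images; floor like int(a / b) in the config comments
    iters_per_epoch = num_images // ims_per_batch
    return iters_per_epoch * epochs


print(max_iter_for_epochs(10531, 2, 20))  # 105300
```

The result would then be assigned with cfg.SOLVER.MAX_ITER = max_iter_for_epochs(10531, cfg.SOLVER.IMS_PER_BATCH, 20).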
Training runs for about 125 iterations and then fails with this error:
[01/06 15:14:00 d2.utils.events]: eta: 11:25:20 iter: 125 total_loss: 0.9023 loss_cls: 0.1827 loss_box_reg: 0.1385 loss_mask: 0.5601 loss_rpn_cls: 0.009945 loss_rpn_loc: 0.0023 time: 0.5232 data_time: 0.3085 lr: 3.1219e-05 max_mem: 3271M
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-8c48e6e17647> in <module>()
26 trainer = DefaultTrainer(cfg)
27 trainer.resume_or_load(resume=False)
---> 28 trainer.train()
/usr/local/lib/python3.7/dist-packages/torch/_utils.py in reraise(self)
432 # instantiate since we don't know how to
433 raise RuntimeError(msg) from None
--> 434 raise exception
435
436
TypeError: Caught TypeError in DataLoader worker process 1.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 32, in fetch
data.append(next(self.dataset_iter))
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/common.py", line 201, in __iter__
yield self.dataset[idx]
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/common.py", line 90, in __getitem__
data = self._map_func(self._dataset[cur_idx])
File "/usr/local/lib/python3.7/dist-packages/detectron2/utils/serialize.py", line 26, in __call__
return self._obj(*args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 189, in __call__
self._transform_annotations(dataset_dict, transforms, image_shape)
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 128, in _transform_annotations
for obj in dataset_dict.pop("annotations")
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/dataset_mapper.py", line 129, in <listcomp>
if obj.get("iscrowd", 0) == 0
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/detection_utils.py", line 297, in transform_instance_annotations
p.reshape(-1) for p in transforms.apply_polygons(polygons)
File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 297, in <lambda>
return lambda x: self._apply(x, name)
File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 291, in _apply
x = getattr(t, meth)(x)
File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 150, in apply_polygons
return [self.apply_coords(p) for p in polygons]
File "/usr/local/lib/python3.7/dist-packages/fvcore/transforms/transform.py", line 150, in <listcomp>
return [self.apply_coords(p) for p in polygons]
File "/usr/local/lib/python3.7/dist-packages/detectron2/data/transforms/transform.py", line 150, in apply_coords
coords[:, 0] = coords[:, 0] * (self.new_w * 1.0 / self.w)
TypeError: can't multiply sequence by non-int of type 'float'
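The traceback ends in polygon scaling (apply_coords multiplying coordinates by a float), which suggests that some annotation does not contain the plain numeric data the mapper expects. One way to locate such records before training is to scan the dataset dicts. This is a minimal sketch: find_bad_annotations is a hypothetical helper, and its checks (integer category_id, flat numeric polygons with at least 3 points) are deliberately conservative assumptions, not Detectron2's own validation.

```python
import numbers
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return isinstance(value, numbers.Real) and not isinstance(value, bool)


def find_bad_annotations(dataset_dicts: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical checker: describe annotations with suspicious field types.

    Assumes the standard Detectron2 dataset-dict layout: each record has an
    "annotations" list, and polygon masks are stored as
    "segmentation": [[x0, y0, x1, y1, ...], ...]. RLE dicts are skipped.
    """
    problems: List[str] = []
    for rec in dataset_dicts:
        where = rec.get("file_name", rec.get("image_id", "<unknown image>"))
        for i, ann in enumerate(rec.get("annotations", [])):
            cat = ann.get("category_id")
            if not isinstance(cat, numbers.Integral) or isinstance(cat, bool):
                problems.append(f"{where}: annotation {i} has category_id={cat!r}")
            seg = ann.get("segmentation")
            if not isinstance(seg, list):
                continue  # missing segmentation or RLE dict: not checked here
            for j, poly in enumerate(seg):
                if not isinstance(poly, list) or not all(_is_number(v) for v in poly):
                    problems.append(
                        f"{where}: annotation {i} polygon {j} is not a flat list of numbers"
                    )
                elif len(poly) < 6 or len(poly) % 2 != 0:
                    problems.append(
                        f"{where}: annotation {i} polygon {j} has {len(poly)} values"
                    )
    return problems
```

With Detectron2 installed, it could be run on the registered dataset as find_bad_annotations(DatasetCatalog.get("my_dataset_train")), with DatasetCatalog imported from detectron2.data.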
Solution 1:
It turned out that some of the IDs in "annotations" were written in scientific notation, so they were loaded as floats instead of integers. Converting these IDs to integers solved the problem.
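The conversion can be scripted. This is a minimal sketch, assuming a standard COCO-style JSON file with top-level "images", "annotations", and "categories" lists; fix_coco_ids and fix_coco_file are hypothetical helper names, and the set of ID keys is an assumption to extend if your file differs. Python's json module parses any number written with an exponent (such as 1e3) as a float, so the helper turns integral float IDs back into int and refuses fractional ones.

```python
import json
from typing import Any, Dict

# Keys assumed to hold IDs in a COCO-style file; extend if your file differs
ID_KEYS = ("id", "image_id", "category_id")


def _as_int_id(value: Any) -> Any:
    # json.load turns numbers written like 1e3 into floats (1000.0)
    if isinstance(value, float):
        if not value.is_integer():
            raise ValueError(f"non-integral ID: {value!r}")
        return int(value)
    return value


def fix_coco_ids(coco: Dict[str, Any]) -> Dict[str, Any]:
    """Convert float IDs to int in place and return the same dict."""
    for section in ("images", "annotations", "categories"):
        for entry in coco.get(section, []):
            for key in ID_KEYS:
                if key in entry:
                    entry[key] = _as_int_id(entry[key])
    return coco


def fix_coco_file(src_path: str, dst_path: str) -> None:
    """Read a COCO JSON file, fix its IDs, and write the result to dst_path."""
    with open(src_path) as f:
        coco = json.load(f)
    fix_coco_ids(coco)
    with open(dst_path, "w") as f:
        json.dump(coco, f)
```

If the tool that wrote the file rounded the mantissa (for example 1.23e+05), the original ID cannot be recovered, so it is worth checking afterwards that the converted IDs are still unique and that every annotation's image_id still matches an image.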