Order of Serializer Validation in Django REST Framework
Situation
While working with validation in the Django REST Framework's ModelSerializer
, I have noticed that the Meta.model
fields are always validated, even when it does not necessarily make sense to do so. Take the following example for a User
model's serialization:
- I have an endpoint that creates a user. As such, there is a
password
field and aconfirm_password
field. If the two fields do not match, the user cannot be created. Likewise, if the requestedusername
already exists, the user cannot be created. - The user POSTs improper values for each of the fields mentioned above
- An implementation of
validate
has been made in the serializer (see below), catching the non-matchingpassword
andconfirm_password
fields
Implementation of validate
:
def validate(self, data):
if data['password'] != data.pop('confirm_password'):
raise serializers.ValidationError("Passwords do not match")
return data
Problem
Even when the ValidationError
is raised by validate
, the ModelSerializer
still queries the database to check to see if the username
is already in use. This is evident in the error-list that gets returned from the endpoint; both the model and non-field errors are present.
Consequently, I would like to know how to prevent model validation until after non-field validation has finished, saving me a call to my database.
Attempt at solution
I have been trying to go through the DRF's source to figure out where this is happening, but I have been unsuccessful in locating what I need to override in order to get this to work.
Solution 1:
Since most likely your username
field has unique=True
set, Django REST Framework automatically adds a validator that checks to make sure the new username is unique. You can actually confirm this by doing repr(serializer())
, which will show you all of the automatically generated fields, which includes the validators.
Validation is run in a specific, undocumented order
- Field deserialization called (
serializer.to_internal_value
andfield.run_validators
) -
serializer.validate_[field]
is called for each field - Serializer-level validators are called (
serializer.run_validation
followed byserializer.run_validators
) -
serializer.validate
is called
So the problem that you are seeing is that the field-level validation is called before your serializer-level validation. While I wouldn't recommend it, you can remove the field-level validator by setting extra_kwargs
in your serilalizer's meta.
class Meta:
extra_kwargs = {
"username": {
"validators": [],
},
}
You will need to re-implement the unique
check in your own validation though, along with any additional validators that have been automatically generated.
Solution 2:
I was also trying to understand how the control flows during serializer validation and after carefully going through the source code of djangorestframework-3.10.3 I came up with below request flow diagram. I have described the flow and what happens in the flow to the best of my understanding without going into too much detail as it can be looked up from source.
Ignore the incomplete method signatures. Only focusing on what methods are called on what classes.
Assuming you have an overridden is_valid
method on your serializer class (MySerializer(serializers.Serializer)
) when you call my_serializer.is_valid()
the following takes place.
-
MySerializer.is_valid()
is executed. - Assuming you are calling the super class (
BaseSerializer
)is_valid
method (like:super(MySerializer, self).is_valid(raise_exception)
in yourMySerializer.is_valid()
method, that will be called. - Now since
MySerializer
is extendingserializers.Serializer
, therun_validation()
method fromserializer.Serializers
is called. This is validating only the data dict the first. So we haven't yet started field level validations. - Then the
validate_empty_values
fromfields.Field
gets called. This again happens on the entiredata
and not a single field. - Then the
Serializer.to_internal_method
is called. - Now we loop over each fields defined on the serializer. And for each field, first we call the
field.run_validation()
method. If the field has overridden theField.run_validation()
method then that will be called first. In case of aCharField
it is overridden and calls therun_validation
method ofField
base class. Step 6-2 in the figure. - On that field we again call the
Field.validate_empty_values()
- The
to_internal_value
of the type of field is called next. - Now there is a call to the
Field.run_validators()
method. I presume this is where the additional validators that we add on the field by specifying thevalidators = []
field option get executed one by one - Once all this is done, we are back to the
Serializer.to_internal_value()
method. Now remember that we are doing the above for each field within that for loop. Now the custom field validators you wrote in your serializer (methods likevalidate_field_name
) are run. If an exception occurred in any of the previous steps, your custom validators wont run. read_only_defaults()
- update validate data with defaults I think
- run object level validators. I think the
validate()
method on your object is run here.
Solution 3:
I don't believe the above solutions work any more. In my case, my model has fields 'first_name' and 'last_name', but the API will only receive 'name'.
Setting 'extra_kwargs' and 'validators' in the Meta class seems to have no effect, first_name and last_name are allways deemed required, and validators are always called. I can't overload the first_name/last_name character fields with
anotherrepfor_first_name = serializers.CharField(source=first_name, required=False)
as the names make sense. After many hours of frustration, I found the only way I could override the validators with a ModelSerializer instance was to override the class initializer as follows (forgive the incorrect indentation):
class ContactSerializer(serializers.ModelSerializer):
name = serializers.CharField(required=True)
class Meta:
model = Contact
fields = [ 'name', 'first_name', 'last_name', 'email', 'phone', 'question' ]
def __init__(self, *args, **kwargs):
self.fields['first_name'] = serializers.CharField(required=False, allow_null=True, allow_blank=True)
self.fields['last_name'] = serializers.CharField(required=False, allow_null=True, allow_blank=True)
return super(ContactSerializer, self).__init__(*args, **kwargs)
def create(self, validated_data):
return Contact.objects.create()
def validate(self, data):
"""
Remove name after getting first_name, last_name
"""
missing = []
for k in ['name', 'email', 'question']:
if k not in self.fields:
missing.append(k)
if len(missing):
raise serializers.ValidationError("Ooops! The following fields are required: %s" % ','.join(missing))
from nameparser import HumanName
names = HumanName(data['name'])
names.capitalize()
data['last_name'] = names.last
if re.search(r'\w+', names.middle):
data['first_name'] = ' '.join([names.first, names.middle])
else:
data['first_name'] = names.first
del(data['name'])
return data
Now the doc says that allowing blank and null with character fields is a no no, but this is a serializer, not a model, and as the API gets called by all kinds of cowboys, I need to cover my bases.
Solution 4:
Here's the approach that worked for me.
- Use a sentinel error type that gets caught in an overridden view function
- The sentinel is raised from the custom serializer
The sentinel error type:
from django.core.exceptions import ValidationError
class CustomValidationErrors(ValidationError):
""" custom validation error for the api view to catch the status code """
And in the serializer we override errors
, _errors
, validated_data
, and _validated_data
, as well as is_valid
:
class CustomSerializer(serializers.ModelSerializer):
# fields that usually run validation before parent serializer validation
child_field1 = Child1Serializer()
child_field2 = Child2Serializer()
# override DRF fields
errors = {}
_errors = None
validated_data = {}
_validated_data = []
def is_valid(self, *args, **kwargs):
# override drf.serializers.Serializer.is_valid
# and raise CustomValidationErrors from parent validate
self.validate(self.initial_data)
return not bool(self.errors)
def validate(self, attrs):
self._errors = {}
if len(attrs.get("child_field1", {}).get("name", "")) > 100:
self._errors.update({"child_field1": {"name": "child 1 name > 100"}})
if len(attrs.get("child_field2", {}).get("description", "")) > 1000:
self._errors.update({"child_field2.description": "child 2 description > 100"})
if len(self._errors):
# set the overriden DRF values
self.errors = self._errors
# raise the sentinel error type
raise CustomValidationErrors(self._errors)
# set the overriden DRF serializer values
self._errors = None
self.validated_data = attrs
self._validated_data = [[k, v] for k, v in attrs.items()]
return attrs
class Meta:
model = CustomModel
And in the view we can override the default method, and catch the sentinel error type:
class CustomSerializerView(ListCreateAPIView):
serializer_class = CustomeSerializer
def post(self, *args, **kwargs):
try:
# if this fails for any exception
# other than CustomValidationErrors
# it will return the default error
return super().post(*args, **kwargs)
except CustomValidationErrors as e:
############
# returns a 400 with the following
# {"child_field1":
# [[{"name": "child 1 name > 100"}]],
# "child_field2.description":
# [["child 2 description > 100"]]
# }
############
return Response(e.error_dict, status=400)
drf version:
djangorestframework==3.11.0