Validating inputs using Cerberus in Python

Cerberus:

A dog, usually represented as having three heads, that guarded the entrance to Hades

That is the original meaning of Cerberus in Greek mythology, and the name fits what the library does: it guards the entrance to your application by checking the data that comes in. Cerberus is a lightweight and extensible data validation library for Python.

Usage:

This library can be used in many scenarios. For example, you can validate a large amount of data you already have in an Excel sheet, or data you receive from an online form (such as a customer or student registration form).

Prerequisite:

Install the Cerberus library

pip install cerberus

Demo scenario:

For demo purposes, assume our site has a contact-us form and we need to validate the data it submits. The form has the following fields:

  • First name
  • Last name
  • Email
  • Message

First, we need to decide the type of input allowed in each field. All four fields are strings:

{
    "FirstName": {
        "type": "string"
    },
    "LastName": {
        "type": "string"
    },
    "Email": {
        "type": "string"
    },
    "Message": {
        "type": "string"
    }
}

After deciding the input types, we apply validation rules to each field. Cerberus' minlength and maxlength rules set length bounds, so the schema now looks like this:

{
    "FirstName": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255
    },
    "LastName": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255
    },
    "Email": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255
    },
    "Message": {
        "type": "string"
    }
}

Next, we need to make sure every field is present in the payload. Cerberus treats fields as optional by default, so we add "required": true to each field:

{
    "FirstName": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255,
        "required": true
    },
    "LastName": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255,
        "required": true
    },
    "Email": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255,
        "required": true
    },
    "Message": {
        "type": "string",
        "required": true
    }
}

We also need to check that the email address is in a valid format. Cerberus' regex rule handles that:

{
    "FirstName": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255,
        "required": true
    },
    "LastName": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255,
        "required": true
    },
    "Email": {
        "type": "string",
        "minlength": 2,
        "maxlength": 255,
        "required": true,
        "regex": "^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+$"
    },
    "Message": {
        "type": "string",
        "required": true
    }
}

Our schema is now complete. Next, let's validate the payload we receive when a user submits the contact-us form. The function below expects the final schema to be saved as contact_us_schema.json:

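import json
import os
import sys

from cerberus import Validator


def validation(payload: dict) -> None:
    """
    Validate the payload against the schema stored in contact_us_schema.json.
    :param payload: payload to be validated
    :return: None
    """
    # Load the schema; the JSON decoder turns it into a nested dict
    with open(os.path.abspath("contact_us_schema.json")) as f:
        schema = json.load(f)

    validator = Validator(schema)

    # validate() returns False and fills validator.errors if any rule is broken
    if not validator.validate(payload):
        sys.exit(f"Validation failed due to following errors: {validator.errors}")
    else:
        print("Please proceed for further activity")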

The function is fairly self-explanatory: it loads the schema from a JSON file and validates the payload against it. I use JSON because I am comfortable with it, but you can use your favorite serialization format, for example YAML. Any format whose decoder produces a nested dict can be used to define a schema. The validator checks whether the input payload conforms to the schema, and if it does not, it reports why through the Validator's errors property.

If validation reports any errors, we stop processing immediately by exiting and displaying them. If everything validates, we can proceed with further steps, such as storing the data in a database or emailing it to the relevant stakeholder.

That's all, guys. For the complete code, visit the repository. Let me know in the comments if you have any other suggestions for data validation in Python.