Data validation and classes

This is a machine-translated text that may contain errors!

In Python (as in most other languages), you can create your own objects, with their own values, rules and functions. You do this with classes. We use classes to gather related data and functions into one unit, e.g. an Ordre class that has data such as ordre_id, kunde_navn and produkter, and functions such as legg_til_produkt() and beregn_total().

Dictionary (JSON) vs Classes (Objects)

JSON (JavaScript Object Notation) is a format for storing and transferring data independent of programming language, while a class is a structure in a specific programming language.

When we need to send data over the network, or store it in a file, we often use JSON (or tables in databases).

When we need to work with structured data in code, we use classes.
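
As a rough sketch of how the two fit together, the snippet below turns JSON text into an object and back again. It reuses the Ordre example from the introduction; the field types and the sample values are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Ordre:
    # Field types are assumed for this sketch
    ordre_id: int
    kunde_navn: str
    produkter: list[str]

# JSON text as it might arrive over the network or from a file
raw = '{"ordre_id": 1, "kunde_navn": "Kari", "produkter": ["kaffe", "te"]}'

# JSON text -> dict -> object
ordre = Ordre(**json.loads(raw))
print(ordre.kunde_navn)  # Kari

# object -> dict -> JSON text
as_json = json.dumps(asdict(ordre))
print(as_json)
```

The dictionary is the in-between form: json.loads gives us a dict, and the class turns that dict into an object we can attach rules and functions to.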

In this module, we will look at how we can use classes to validate data.

The easiest way is to use the @dataclass decorator from the dataclasses module in the standard library. It saves us from writing boilerplate such as the __init__ (initializer) and __repr__ (representation) methods ourselves.

Example of a class without using the dataclass decorator:

class Car:
    def __init__(self, make: str, model: str, year: int):
        self.make = make
        self.model = model
        self.year = year

    def __repr__(self):
        return f"{self.year} {self.make} {self.model}"

my_car = Car("Toyota", "Corolla", 2020)
print(my_car)  # Output: 2020 Toyota Corolla

Example with the dataclass decorator, which achieves much the same as above with less code (note that the generated __repr__ prints in a different format):

from dataclasses import dataclass

@dataclass
class Car:
    make: str
    model: str
    year: int

my_car = Car("Toyota", "Corolla", 2020)
print(my_car)  # Output: Car(make='Toyota', model='Corolla', year=2020)
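
Besides __init__ and __repr__, @dataclass also generates an __eq__ method by default, so two objects with the same field values compare as equal. A small sketch of the difference (PlainCar is a hypothetical name used only here, to keep both versions side by side):

```python
from dataclasses import dataclass

class PlainCar:
    def __init__(self, make: str, model: str, year: int):
        self.make = make
        self.model = model
        self.year = year

@dataclass
class Car:
    make: str
    model: str
    year: int

# Without __eq__, two objects are only equal if they are the very same object
print(PlainCar("Toyota", "Corolla", 2020) == PlainCar("Toyota", "Corolla", 2020))  # False

# The dataclass compares field by field
print(Car("Toyota", "Corolla", 2020) == Car("Toyota", "Corolla", 2020))  # True
```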

Easy Task 1 - Create a Class

Create a class named Person. This class should have the following properties (attributes):

  • name: The name of the person
  • eye_color: The eye color of the person
  • phone_number: The phone number of the person
  • email: The email address of the person

Instantiate (create) an object of the Person class with valid values for all properties, as in the example below.

@dataclass
class Person:
    ... # Your code here

bob_kaare = Person(name="Bob Kåre",
                   eye_color="blue",
                   phone_number="12345678",
                   email="bob_kaare@example.com")
print(bob_kaare)

Solution: A dataclass for Person

Here is a possible solution:

from dataclasses import dataclass

@dataclass
class Person:
    name: str
    eye_color: str
    phone_number: str
    email: str

Medium Task 2 - Validation in the Class

In the example above, we have not added any validation. This means that we can create a Person with invalid values, such as:

@dataclass
class Person:
    ... # Your code here

invalid_person = Person(name="",
                        eye_color="yes",
                        phone_number="12345",
                        email="not-an-email")
print(invalid_person)
# Output: Person(name='', eye_color='yes', phone_number='12345', email='not-an-email')

This is (potentially) problematic and can easily lead to technical debt in the future. Fortunately, there are simple ways to add validation to classes.

We will start by looking at email validation. There are libraries that can help us with this, but because we are doing this to learn, we will write our own simple validation: we create a new class for “Email” and use the __post_init__ method, which a dataclass calls automatically right after __init__ (it only exists for dataclasses).

Note

We can also do this in the Person class itself, but it is often better to create separate classes for things that can be reused.

Example of a __post_init__ function
from dataclasses import dataclass

@dataclass
class Email:
    address: str

    def __post_init__(self):
        print(f"Validating email: {self.address}")
        # Your code here

For a simple email validation, we can, for example, check that the address contains both an @ and a . character. Optionally, you can check that the address matches a regex pattern. (More advanced, but feel free to search online!)

Solution: Code for simple email validation

Here is a possible solution. We use exceptions to “crash” if the email is invalid: unless the exception is caught, it stops the program immediately with an error message.

from dataclasses import dataclass

@dataclass
class Email:
    address: str

    def __post_init__(self):
        if "@" not in self.address:
            raise ValueError(f"Missing @ in email address: {self.address}")
        if "." not in self.address.split("@")[1]:
            raise ValueError(f"Missing . in the domain of the email address: {self.address}")
        if " " in self.address:
            raise ValueError(f"The email address cannot contain spaces: {self.address}")

# Test the code 
test = Email("hei@example.com")  # Valid
try:
    test = Email("heiexample.com")
except ValueError as e:
    print(e) # Invalid, missing @
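
The regex variant mentioned before the solution could be sketched roughly as follows. The pattern (and the name EMAIL_PATTERN) is an assumption chosen for this example and is deliberately simple; it is not a complete check of what counts as a valid email address.

```python
import re
from dataclasses import dataclass

# Deliberately simple pattern (an assumption for this sketch):
# something before the @, exactly one @, a dot inside the domain,
# and no whitespace anywhere.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

@dataclass
class Email:
    address: str

    def __post_init__(self):
        # fullmatch requires the whole string to match the pattern
        if not EMAIL_PATTERN.fullmatch(self.address):
            raise ValueError(f"Invalid email address: {self.address}")

# Test the code
test = Email("hei@example.com")  # Valid
try:
    Email("hei @example.com")
except ValueError as e:
    print(e)  # Invalid email address: hei @example.com
```

One pattern replaces the three separate checks, at the cost of a less specific error message.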

Medium Task 3 - Phone Number Validation

Create a class similar to the one you made for emails, but now for phone numbers.

Challenge with phone validation!

Can you make the phone number validation accept both text (str) and whole numbers (int)? For example, 12345678 and "12345678" should both be valid.

Also, try adding the country code as an attribute (a sub-value of the class). For example, 47 for Norway and 46 for Sweden.

Medium Task 4 - Use validation in the Person class

Now that we have created validation for email and phone number, we can use these in the Person class.

@dataclass
class Person:
    name: str
    eye_color: str
    phone_number: PhoneNumber  # Use the PhoneNumber class
    email: Email               # Use the Email class

New challenge arises!

Now that we have changed the Person class to use the PhoneNumber and Email classes, we must also change how we instantiate (create) a Person. We must now first create a PhoneNumber and an Email object, before we can create a Person.

bob_kaare = Person(name="Bob Kåre",
                   eye_color="blue",
                   phone_number=PhoneNumber("12345678"),  # Note the change here
                   email=Email("bob_kaare@example.com"))  # Note the change here
print(bob_kaare)

# Note: a change must be made in the way we retrieve the values as well
print(bob_kaare.email.address)
print(bob_kaare.phone_number.number)  # .country_code(?)
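
Two things are worth trying out at this point. First, an invalid value is rejected before the Person is ever created, because the Email object is built first. Second, Python does not check type hints at runtime, so nothing stops us from passing a plain string unless we check for it ourselves. The sketch below uses a trimmed-down Email and Person (only name and email) to keep it short; the isinstance check in __post_init__ is one possible way to guard against this, not the only one.

```python
from dataclasses import dataclass

@dataclass
class Email:
    address: str

    def __post_init__(self):
        if "@" not in self.address:
            raise ValueError(f"Missing @ in email address: {self.address}")

@dataclass
class Person:
    name: str
    email: Email

    def __post_init__(self):
        # Type hints are not enforced at runtime, so we check explicitly
        if not isinstance(self.email, Email):
            raise TypeError("email must be an Email object")

# Invalid address: Email raises before Person is created
try:
    Person(name="Bob Kåre", email=Email("not-an-email"))
except ValueError as e:
    print(e)  # Missing @ in email address: not-an-email

# Plain string instead of an Email object: caught by the isinstance check
try:
    Person(name="Bob Kåre", email="bob_kaare@example.com")
except TypeError as e:
    print(e)  # email must be an Email object
```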

Hard Task 5 - Properties in classes (Optional)

When we use objects to represent values such as email and phone numbers, we need to specify the sub-value (e.g. address for email and number for phone number) each time we want to retrieve the value. This can become cumbersome in the long run. Fortunately, the @property decorator solves this: it lets us retrieve the value directly from the object, without having to specify the sub-value.

This does, however, present another challenge: we need to write the __init__ function in the Person class ourselves. A field and a property cannot share a name in a dataclass, so the value is stored in a field named _verdi, and the generated __init__ would then expect an argument named _verdi rather than verdi. When a class defines its own __init__, @dataclass leaves it in place.

from dataclasses import dataclass

@dataclass
class EksempelVerdi:
    attributt: str

@dataclass
class Person:
    name: str
    _verdi: EksempelVerdi  # Internal variable (starts with _ to indicate that it is "private")

    def __init__(self, name: str, verdi: EksempelVerdi):
        self.name = name
        self._verdi = verdi

    @property
    def verdi(self):
        return self._verdi.attributt  # Retrieves the subvalue directly

# Test the code
person = Person(name="Alice", verdi=EksempelVerdi("Some text"))
print(person.verdi)  # Output: Some text

Note

A property is read like an attribute: it takes no arguments besides self and is accessed without parentheses. In the example, we retrieve person.verdi without parentheses (not person.verdi()), even though it is technically a function.

Alternatively, an example that accepts both str and EksempelVerdi
@dataclass
class Person:
    name: str
    _verdi: EksempelVerdi  # Always stored as EksempelVerdi, even when a str is passed in

    def __init__(self, name: str, verdi: str | EksempelVerdi):
        self.name = name
        if isinstance(verdi, EksempelVerdi):
            self._verdi = verdi
        elif isinstance(verdi, str):
            self._verdi = EksempelVerdi(verdi)
        else:
            raise TypeError("verdi must be of type str or EksempelVerdi")

    @property
    def verdi(self) -> str:
        return self._verdi.attributt  # Retrieves the underlying value directly

# Test the code
person = Person(name="Alice", verdi="Some text")
print(person.verdi)  # Output: Some text

Hard Task 6 - Properties with Logic (Optional)

Update Person by adding a new attribute called birthday. This should be of type datetime.date (from the datetime module).

Then, create the following properties:

  • Create a property age that calculates the person’s age based on birthday and today’s date.
  • Create a property is_adult that returns True if the person is 18 years or older, otherwise False.

Solution: Age and adult as properties

Here is a possible solution:

from dataclasses import dataclass
from datetime import date

@dataclass
class Person:
    name: str
    birthday: date

    @property
    def age(self) -> int:
        """Calculates the age based on the date of birth and today's date"""
        today = date.today()
        age = today.year - self.birthday.year
        # Adjust down by 1 if the person has not yet had their birthday this year
        if (today.month, today.day) < (self.birthday.month, self.birthday.day):
            age -= 1
        return age

    @property
    def is_adult(self) -> bool:
        """Returns True if the person is 18 years or older"""
        return self.age >= 18

# Test the code
person = Person(name="Alice", birthday=date(2005, 5, 15))
print(person.age)       # E.g. 18 if today's date is between May 15, 2023 and May 14, 2024
print(person.is_adult)  # True on or after May 15, 2023

Yet another challenge!

Can you get the instantiation of Person to accept both datetime.date and a text in the format "DD-MM-YYYY" for birthday? (Hint: datetime.strptime returns a datetime object, so call .date() on the result to get a datetime.date.)