Objective

By the end of this article, you should be able to:

  • Construct object in dataclasses.
  • Understand and Implment inheritance and composition using dataclasses.
  • Understand field dataclass.

Before reading this article you must first understand inheritance, composition and some basic python. Visit realpython’s for some refresher. Else lets start!.

Dataclasses

In simple terms, Dataclasses are python classes that are suitable for storing objects data.

Before dataclasses in python3.7+, creating a class with data was a bit of a headache and lots of repetition. If we wanted to construct an object class name Employee with data id and name, the data would be instantiated via the __init__() constructor method.

class Employee:
    def __init__(self, id: int, name: str):
        self.id = id 
        self.name = name

However, with dataclasses, one would construct the same class with data above as such;

from dataclass import dataclass

@dataclass
class Employee:
    id: int
    name: str

As we can see above, Dataclasses automates the creation of the __init__() constructor method and, all we need is to state the object and its type(s).

We can even go further and create a method in the dataclass object as done below;

from dataclass import dataclass

@dataclass
class Employee:
    id: int
    name: str

    def employeeinfo(self):
        return f"employee name is {self.name} with id {self.id}"

Back to Inhertiance

From realpython’s article, “Inheritance is the mechanism you’ll use to create hierarchies of related classes. These related classes will share a common interface that will be defined in the base classes. Derived classes can specialize the interface by providing a particular implementation where applies”.

As done in realpython’s article, we create an HR system to process payroll for the company’s employees.

from typing import List
from dataclasses import dataclass

class PayrollSystem:
    def calculate_payroll(self, employees: List[dataclass]) -> None:
        """Takes a collection of employees and prints their id: str, name: str and check amount: float
        using the .calculate_payroll() method expoed on each employee object.

        Args:
            employees (List): collection of employees
        """
        print("Calculating Payroll")
        print("===================")
        for employee in employees:
            print(f"Payroll for: {employee.id} - {employee.name}")
            print(f"- Check amount: {employee.calculate_payroll()}")
            print("")

As noted in the comments above, the class PayrollSystem has a method calculate_payroll which takes collection(i.e. List) of employees and displays their id, name and checks amount using calculate_payroll() method, which is part of the employee object.

We go ahead to implement a base class for an employee that handles interface for employee types. This base class implemented will further be inherited by other classes.

from dataclasses import dataclass
from typing import Union

@dataclass
class Employee:
    """Base class for Employees. Common interface for every employee type.
    Constructed with id: int, name: str
    """
    id: int
    name: str

We further create a class SalaryEmployee, a subclass of the parent class Employee (implemented earlier).

from dataclasses import dataclass
from typing import Union

@dataclass
class SalaryEmployee(Employee):
    """Salary Employee Inherits Employee base class and add weekly_salary data.
    """
    weekly_salary: Union[float, int]

    def calculate_payroll(self) -> Union[float, int]:
        """Calulate the payroll and returns pay.
        """
        return self.weekly_salary

From the code above, we did two things;

  • inherited the base class and added the data weekly_salary to it.
  • plus added a method calculate_payroll which returns weekly_salary data.

Running the code implemented above, we can see that the dataclass decorator automatically initialized weekly_salary data and, further initialized the members of the base class.

Guess what, what happened is one of dataclass superpowers!.

We can go extra and extend our base class to other classes such as HourEmployees and CommissionEmployee; Plus add new data and create methods as done below.

@dataclass
class HourlyEmployee(Employee):
    """Hourly Employee Inherits Employee base class and add hours_worked and hour_rate data.
    """
    hours_worked: Union[float, int]
    hour_rate: Union[float, int]

    def calculate_payroll(self) -> Union[float, int]:
        """Calulate the payroll and returns pay.
        """
        return self.hours_worked * self.hour_rate


@dataclass
class CommissionEmployee(SalaryEmployee):
    """Comission Employee Inherits SalaryEmployee base class and add commission data.
    """
    commission: int

    def calculate_payroll(self) -> Union[float, int]:
        """Calulate the payroll from SalaryEmployee class using super() and adds commisssion.
        """
        fixed = super().calculate_payroll()
        return fixed + self.commission

Testing our implement, we pass them to the PayrollSystem class to process payroll;

salary_employee = SalaryEmployee(1, "Sam May", 1500)
hourly_employee = HourlyEmployee(2, 'Jane Doe', 40, 15)
commission_employee = CommissionEmployee(3, 'Kevin Bacon', 1000, 250)

payroll_system = PayrollSystem()
payroll_system.calculate_payroll([salary_employee, hourly_employee, commission_employee])

Running the program and we get the following results.

# output below

Calculating Payroll
===================
Payroll for: 1 - Sam May
- Check amount: 1500

Payroll for: 2 - Jane Doe
- Check amount: 600

Payroll for: 3 - Kevin Bacon
- Check amount: 1250

As you can see, three employee objects were created, passed on to the payroll system, which in turn used the calculate_payroll method to calculate the payroll for each employee and printed the results.

Abstract Base Class and Dataclass in Pyton

As I said earlier, dataclasses automate automates the creation of the __init__() constructor method. We can also apply python abc module, which provide the functionality to prevent creating objects from abstract base classes.

Here is an example below;

from dataclasses import dataclass
from abc import ABC, abstractmethod
from typing import Union

@dataclass
class Employee_(ABC):
    id: int
    name: str

    @abstractmethod
    def calculate_payroll(self):
        pass

@dataclass
class SalaryEmployee(Employee_):
    weekly_salary: Union[float, int]

    def calculate_payroll(self) -> Union[float, int]:
        return self.weekly_salary

@dataclass
class HourlyEmployee(Employee_):
    hours_worked: Union[float, int]
    hour_rate: Union[float, int]

    def calculate_payroll(self) -> Union[float, int]:
        return self.hours_worked * self.hour_rate

@dataclass
class CommissionEmployee(SalaryEmployee):
    commission: int

    def calculate_payroll(self) -> Union[float, int]:
        fixed = super().calculate_payroll()
        return fixed + self.commission

Testing our implementation as done above, we get same results below;

# output below

Calculating Payroll
===================
Payroll for: 1 - Sam May
- Check amount: 1500

Payroll for: 2 - Jane Doe
- Check amount: 600

Payroll for: 3 - Kevin Bacon
- Check amount: 1250

As done before, let’s go further, extend the derived class to create other derived classes and further add method work, which prints out the work an employee does.

class Manager(SalaryEmployee):
    def work(self, hours: Union[float, int]) -> str:
        print(f"{self.name} screams and yells for {hours} hours")

class Secretary(SalaryEmployee):
    def work(self, hours: Union[float, int]) -> str:
        print(f"{self.name} expands {hours} hours doing office paperwork")


class SalesPerson(CommissionEmployee):
    def work(self, hours: Union[float, int]) -> str:
        print(f"{self.name} expands {hours} hours on the phone")


class FactoryWorker(HourlyEmployee):
    def work(self, hours: Union[float, int]) -> str:
        print(f"{self.name} manufactures gadgets for {hours} hours")

To test if our work method is work? We create a productivity platform, which displays the work and hours the employee does.

import Employee
from typing import Union, List

class ProductivitySystem:
    def track(self, employees: List[Employee], hours: Union[float, int]) -> None:
        print("Tracking Employee Productivity")
        print("==============================")
        for employee in employees:
            result = employee.work(hours)
            print(f'{employee.name}: {result}')
        print('')

So far, all our implementations are working and putting it together should work too! Putting it all together, we implement a program (as below) that tell us about the Employees productivity and their Calculate their Payroll.

import PayrollSystem
import Manager, Secretary, SalesPerson, FactoryWorker
import ProductivitySystem

manager = Manager(id=1, name='Mary Poppins', weekly_salary=3000)
secretary = Secretary(id=2, name='John Smith', weekly_salary=1500)
sales_guy = SalesPerson(id=3, name='Kevin Bacon', weekly_salary=1000, commission=250)
factory_worker = FactoryWorker(id=4, name='Jane Doe', hours_worked=40, hour_rate=15)


company_employees = [manager, secretary, sales_guy,factory_worker, ]

productivity_system = ProductivitySystem()
productivity_system.track(company_employees, 40)

payroll_system = PayrollSystem()
payroll_system.calculate_payroll(company_employees)

Running our code, we get the output below as expected (Sorry, I did not write tests), which indicate all is fine as expected.

# output below

Tracking Employee Productivity
==============================
Mary Poppins screams and yells for 40 hours
Mary Poppins: None
John Smith expands 40 hours doing office paperwork
John Smith: None
Kevin Bacon expands 40 hours on the phone
Kevin Bacon: None
Jane Doe manufactures gadgets for 40 hours
Jane Doe: None

Calculating Payroll
===================
Payroll for: 1 - Mary Poppins
- Check amount: 3000

Payroll for: 2 - John Smith
- Check amount: 1500

Payroll for: 3 - Kevin Bacon
- Check amount: 1250

Payroll for: 2 - Jane Doe
- Check amount: 600

To test how multiple inheritance works in dataclasses, we create other classes, which inherit two or three dataclass objects.

Let’s go ahead and create roles and policies, which will be inherited and used by other classes.

from dataclasses import dataclass
from typing import Union

class ManagerRole:
    def work(self, hours: Union[float, int]) -> str:
        return f"screams and yells for {hours} hours."

class SecretaryRole:
    def work(self, hours: Union[float, int]) -> str:
        return f"expands {hours} hours doing office paperwork."

class SalesRole:
    def work(self, hours: Union[float, int]) -> str:
        return f"expands {hours} hours on phone."

class FactoryRole:
    def work(self, hours: Union[float, int]) -> str:
        return f"manufactures gadgets for {hours} hours."

@dataclass
class SalaryPolicy:
    weekly_salary: Union[float, int]

    def calculate_payroll(self) -> Union[float, int]:
        return self.weekly_salary

@dataclass
class HourlyPolicy:
    hours_worked: Union[float, int]
    hour_rate: Union[float, int]

    def calculate_payroll(self) -> Union[float, int]:
        return self.hours_worked * self.hour_rate

@dataclass
class CommissionPolicy(SalaryPolicy):
    commission: int

    def calculate_payroll(self) -> Union[float, int]:
        fixed = super().calculate_payroll()
        return fixed + self.commission

We further modify some classes implemented earlier, which make use of the roles and policies created (Multiple inheritance here)


from dataclasses import dataclass
from typing import Union

import SalaryPolicy, CommissionPolicy, HourlyPolicy
import ManagerRole, SecretaryRole, SalesRole, FactoryRole


@dataclass
class Employee:
    id: int
    name: str

@dataclass
class Manager(Employee, ManagerRole, SalaryPolicy):

    def __post_init__(self):
        SalaryPolicy.__init__(self, self.weekly_salary)
        super().__init__(self.id, self.name)


@dataclass
class Secretary(Employee, SecretaryRole, SalaryPolicy):
    
    def __post_init__(self):
        SalaryPolicy.__init__(self, self.weekly_salary)
        super().__init__(self.id, self.name)


@dataclass
class SalesPerson(Employee, SalesRole, CommissionPolicy):

    def __post_init__(self):
        CommissionPolicy.__init__(self, self.weekly_salary, self.commission)
        super().__init__(self.id, self.name)


@dataclass
class FactoryWorker(Employee, FactoryRole, HourlyPolicy):

    def __post_init__(self):
        HourlyPolicy.__init__(self, self.hours_worked, self.hour_rate)
        super().__init__(self.id, self.name)


@dataclass
class TemporarySecretary(Employee, SecretaryRole, HourlyPolicy):

    def __post_init__(self):
        HourlyPolicy.__init__(self, self.hours_worked, self.hour_rate)
        super().__init__(self.id, self.name)

Lots of code above, right? Plus, I guess you’re wondering the use and purpose of the __post_init__ method.

Though the dataclass decorator automates the creation of the __init__ method, at some point, we would like to get control of this process to tune it for our use. Including __post_init__ in our class, we can provide other instructions for modifying fields or even instantiate other data as we please.

As you can see in the Manager class, we inherited Employee, ManagerRole, SalaryPolicy classes and used __post_init__ to initialize the SalaryPolicy class. We also used super(), which allows us to call methods of the superclass in our subclass.

To test our multiple inheritance, we modify our program as done below;

import Manager, Secretary, SalesPerson, FactoryWorker, TemporarySecretary
import PayrollSystem
import ProductivitySystem


manager = Manager(id=1, name='Mary Poppins', weekly_salary=3000)
secretary = Secretary(id=2, name='John Smith', weekly_salary=1500)
sales_guy = SalesPerson(id=3, name='Kevin Bacon', weekly_salary=1000, commission=250)
factory_worker = FactoryWorker(id=2, name='Jane Doe', hours_worked=40, hour_rate=15)
temporary_secretary = TemporarySecretary(id=5, name='Robin Williams', hours_worked=40,
                                                   hour_rate=9)

employees = [manager, secretary, sales_guy, factory_worker, temporary_secretary]

productivity_system = ProductivitySystem()
productivity_system.track(employees, 40)

payroll_system = PayrollSystem()
payroll_system.calculate_payroll(employees=employees)

Running the code above, as done earlier, we get the same output as earlier.

# output below

Tracking Employee Productivity
==============================
Mary Poppins screams and yells for 40 hours
Mary Poppins: None
John Smith expands 40 hours doing office paperwork
John Smith: None
Kevin Bacon expands 40 hours on the phone
Kevin Bacon: None
Jane Doe manufactures gadgets for 40 hours
Jane Doe: None

Calculating Payroll
===================
Payroll for: 1 - Mary Poppins
- Check amount: 3000

Payroll for: 2 - John Smith
- Check amount: 1500

Payroll for: 3 - Kevin Bacon
- Check amount: 1250

Payroll for: 2 - Jane Doe
- Check amount: 600

Compostion Here

From realpython’s article, “Composition is an OO design concept that models has a realationship. In composition, a class known as composite contains an object of another class known to as components”.

One advantage of composition compared to inheritance is; a change in one component rarely affects the composite class. Vice versa, a change in the composite class rarely affect the component class. In fact, this advantage enables code adaptability and code base changes without introducing problems.

This advantage enables code adaptability and code base changes without introducing problems.

Let’s go ahead and implement an Address class which components of an address using dataclass;

from dataclasses import dataclass, field
from typing import Optional, Union, Dict

@dataclass
class Address:
    street: str
    city: str
    state: str
    zipcode: str
    street2: Optional[str] = ''
    

    def __str__(self) -> str:
        """Provides pretty response of address."""
        lines = [self.street]
        if self.street2:
            lines.append(self.street2)
        lines.append(f"{self.city}, {self.state} {self.zipcode}")
        return "\n".join(lines)

We implemented the __str__ method in the code above to provides us with a pretty implementation of our Address object.

Testing our implementation works, we run the following code.

address1 = Address(street="55 main st.", city="concord", state="NH", zipcode="03301")

address2 = Address(street="55 main st.", city="concord", state="NH", zipcode="03301", street2="denso")

print(address1)
print(address2)

This gives output

# output below

55 main st.
concord, NH 03301
55 main st.
denso
concord, NH 03301

Since all is working as expected, we modify the Employee class by adding the Address class as a composite.

import Address
from dataclasses import dataclass
from typing import Union

@dataclass
class Employee:
    """We making the Employee an abstract base class.  There are two side effects here;
    * You telling users of the module that objects of type Employee can't be created.
    * You telling other devs working on the hr module hat if they derive from Employee, the they must 
    override the .calculate_payroll abstract method."""
    id: int
    name: str
    address: Address = None

We further modify the PayrollSystem class to print employees address if present.

from typing import List
from dataclasses import dataclass

class PayrollSystem:
    def calculate_payroll(self, employees: List[dataclass]) -> None:
        """Takes a collection of employees and prints their id: str, name: str and check amount: float
        using the .calculate_payroll() method expoed on each employee object.

        Args:
            employees (List): collection of employees
        """
        print("Calculating Payroll")
        print("===================")
        for employee in employees:
            print(f"Payroll for: {employee.id} - {employee.name}")
            print(f"- Check amount: {employee.calculate_payroll()}")
            if employee.address:
                print("- Sent to:")
                print(employee.address)
            print("")

We also modify our program to include Address as done below;

import Manager, Secretary, SalesPerson, FactoryWorker, TemporarySecretary
import Address
import PayrollSystem
import ProductivitySystem

manager = Manager(id=1, name='Mary Poppins', weekly_salary=3000)
manager.address = Address("121 Admin Rd", "Concord", "NH", "03301")

secretary = Secretary(id=2, name='John Smith', weekly_salary=1500)
secretary.address = Address('67 Paperwork Ave.', 'Manchester',  'NH', '03101')

sales_guy = SalesPerson(id=3, name='Kevin Bacon', weekly_salary=1000, commission=250)

factory_worker = FactoryWorker(id=2, name='Jane Doe', hours_worked=40, hour_rate=15)

temporary_secretary = TemporarySecretary(id=5, name='Robin Williams', hours_worked=40, hour_rate=9)

employees = [manager, secretary, sales_guy, factory_worker, temporary_secretary,]

productivity_system = ProductivitySystem()
productivity_system.track(employees, 40)

payroll_system = PayrollSystem()
payroll_system.calculate_payroll(employees)

Running the code above, as done earlier, we get the same output as earlier.

Tracking Employee Productivity
==============================
Mary Poppins: screams and yells for 40 hours.
John Smith: expands 40 hours doing office paperwork.
Kevin Bacon: expands 40 hours on phone.
Jane Doe: manufactures gadgets for 40 hours.
Robin Williams: expands 40 hours doing office paperwork.

Calculating Payroll
===================
Payroll for: 1 - Mary Poppins
- Check amount: 3000
- Sent to:
121 Admin Rd
Concord, NH 03301

Payroll for: 2 - John Smith
- Check amount: 1500
- Sent to:
67 Paperwork Ave.
Manchester, NH 03101

Payroll for: 3 - Kevin Bacon
- Check amount: 1250

Payroll for: 2 - Jane Doe
- Check amount: 600

Payroll for: 5 - Robin Williams
- Check amount: 360

As we can see, we print out the address if present. Also, this design is flexible and, we can change the Address class without having an impact on the Employee class.

Dataclass field

As you can see, using dataclass is super cool and simple.

However, in some cases, we will require or like to customize our dataclass field and, this is where the use of field comes to play.

With the default parameter in field, we can define the default value of the attributes declared. Below are some examples.

Example One: We use field to define default values from a function for an attribute:

import uuid
from dataclasses import dataclass, field

def gen_random_id():
    return uuid.uuid1().hex

@dataclass
class Employee:
    name: str
    id: str = field(default_factory=gen_random_id)

# Testing our implementation
Employee(name="kwesi")
#output 
Employee(name='kwesi', id='54073e27060b11ec85e6a44cc81af35c')

In the code above, we have a function that generates a random id using uuid and Employee class attribute id uses it.

Example Two: We use field to define default values for a class attribute.

import uuid
from dataclasses import dataclass, field

def gen_random_id():
    return uuid.uuid1().hex

@dataclass
class Employee:
    name: str
    id: str = field(default_factory=gen_random_id)
    working_hrs: int = field(default=40)

# Testing our implementation
Employee(name="kwesi")
# output 
Employee(name='kwesi', id='f6b1f661060d11ec9ec9a44cc81af35c', working_hrs=40)

In the code above, we extended the function in example one by defining a default value for working_hrs attribute.

Example Three: Here, things get a bit complicated. We create a new class, EmployeeDB and use field to define the class attribute _employees.

We further use __post_init__ to modify the _employees data defined earlier, and finally, we created a method to display all _employees when called.

import uuid
from dataclasses import dataclass, field
from typing import List, Dict

def gen_random_id():
    return uuid.uuid1().hex

@dataclass
class Employee:
    name: str
    id: str = field(default_factory=gen_random_id)
    working_hrs: int = field(default=40)

@dataclass
class EmployeesDB:
    _employees: List[Dict[str, Employee]] = field(default=Dict)

    def __post_init__(self):
        self._employees = [{
            "emp1": Employee("Doe"),
            "emp2": Employee("Jane", working_hrs=50),
            "emp3": Employee("Kwesi", working_hrs=20),
        }]

    @property
    def employees(self):
        return [employee for employee in self._employees]


# Testing our implementation
EmployeesDB().employees
#output 
[{'emp1': Employee(name='Doe', id='b6809f19061111ec86cba44cc81af35c', working_hrs=40),
  'emp2': Employee(name='Jane', id='b680a0cc061111ecb53ca44cc81af35c', working_hrs=50),
  'emp3': Employee(name='Kwesi', id='b680a120061111eca92ca44cc81af35c', working_hrs=20)}]

Reference

Check code and some other implemenation on my github repo github.