An In-Depth Look at Python's dataclasses Module

Contents

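  • 1. dataclass in dataclasses (defining data classes)
  • 1.1 dataclass🆚class
  • 1.2 Creating immutable data objects
  • 1.3 dataclass inheritance
  • 1.4 Custom initialization
  • 1.5 Custom ordering of data objects
  • 2. astuple and asdict in dataclasses (converting data classes to tuples and dicts)
  • 3. fields in dataclasses (configuring data class fields)
  • References: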
    Understanding Python Dataclasses
    docs.python.org
    Python dataclass
    heapq — Heap queue algorithm

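    The dataclasses module was added in Python 3.7 and is a tool for defining classes that store data with little boilerplate. This article covers:
    (1) The basic definition and features of dataclass
    (2) Ordering dataclass instances with a priority queue
    (3) Configuring dataclass fields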

    1. dataclass in dataclasses (defining data classes)

    1.1 dataclass🆚class

    A dataclass resembles a regular Python class, but it automatically provides basic instantiation, comparison, and printing functionality. Its syntax is as follows:

    dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)
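     - init: if True, an __init__() method is generated
     - repr: if True, a __repr__() method is generated
     - eq: if True, an __eq__() method is generated
     - order: if True, __lt__(), __le__(), __gt__(), and __ge__() methods are generated
     - unsafe_hash: if False (the default), __hash__() is generated according to how eq and frozen are set
     - frozen: if True, assigning to a field raises an exception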
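
The effect of the order and eq parameters can be illustrated with a short sketch. The Version and Token classes below are hypothetical names chosen for illustration and do not come from the original article.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    major: int
    minor: int


v1 = Version(1, 4)
v2 = Version(1, 10)

# With order=True, instances are compared as if they were
# tuples of their fields, in definition order: (major, minor).
print(v1 < v2)
print(sorted([v2, v1]))


@dataclass(eq=False)
class Token:
    value: str


# With eq=False no __eq__ is generated, so == falls back to identity
# and two distinct instances with equal contents are not equal.
print(Token("a") == Token("a"))
```

Because the generated ordering is tuple-based, the order in which fields are declared decides which field is compared first.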
    

    First, look at how a regular class handles instantiation, comparison, and printing:

    class Employee:
        def __init__(self, name, age, city):
            self.name = name
            self.age = age
            self.city = city
    
        def __repr__(self):
            return f'employee name:{self.name}, age:{self.age}, city:{self.city}'
    
        def __eq__(self, other):
            return (self.name, self.age, self.city) == (other.name, other.age, other.city)
    
    
    e1 = Employee('zoey', 18, 'patna')
    e2 = Employee('mike', 20, 'delhi')
    e3 = Employee('zoey', 18, 'patna')
    
    print('employee information:')
    print(e1)
    print(e2)
    print(f'e1 and e3 same? {e1 == e3}')
    print(f'e1 and e2 same? {e1 == e2}')
    
    employee information:
    employee name:zoey, age:18, city:patna
    employee name:mike, age:20, city:delhi
    e1 and e3 same? True
    e1 and e2 same? False
    

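    __init__ instantiates the object, __repr__ controls how it is printed, and __eq__ compares two objects by content. The main drawback is that every attribute has to be repeated by hand in each of these methods. That is tolerable for a few small classes, but it becomes tedious and error-prone as the number of classes and fields grows. dataclass was introduced to remove this boilerplate: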

    from dataclasses import dataclass
    
    @dataclass
    class Employee:
        name: str
        age: int
        city: str
    
    e1 = Employee('zoey', 18, 'patna')
    e2 = Employee('mike', 20, 'delhi')
    e3 = Employee('zoey', 18, 'patna')
    
    print('employee information:')
    print(e1)
    print(e2)
    print(f'e1 and e3 same? {e1 == e3}')
    print(f'e1 and e2 same? {e1 == e2}')
    
    employee information:
    Employee(name='zoey', age=18, city='patna')
    Employee(name='mike', age=20, city='delhi')
    e1 and e3 same? True
    e1 and e2 same? False
    

    The result is the same, but the dataclass version does not need hand-written __init__, __repr__, and __eq__ methods.

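    1.2 Creating immutable data objects

    By default, the field values of a data class instance can be modified after creation. To make the object immutable, set frozen=True; any attempt to assign to a field then raises an error.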

    @dataclass(frozen=True)
    class Employee:
        name: str
        age: int
        city: str
    
    
    e1 = Employee('zoey', 18, 'patna')
    e1.name = 'mike'
    
    dataclasses.FrozenInstanceError: cannot assign to field 'name'
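
Building on the frozen example above, the following sketch shows two common companions of frozen=True: creating a modified copy with dataclasses.replace, and using instances as dictionary keys (with eq=True and frozen=True a __hash__ method is generated). The variable names are illustrative only.

```python
from dataclasses import dataclass, replace, FrozenInstanceError


@dataclass(frozen=True)
class Employee:
    name: str
    age: int
    city: str


e1 = Employee('zoey', 18, 'patna')

error_message = None
try:
    e1.name = 'mike'  # assignment is rejected on a frozen instance
except FrozenInstanceError as exc:
    error_message = str(exc)

# replace() builds a new instance with the given fields changed;
# the original object is left untouched.
e2 = replace(e1, name='mike')
print(e1)
print(e2)

# Frozen instances are hashable, so they can serve as dict keys.
lookup = {e1: 'original'}
print(lookup[Employee('zoey', 18, 'patna')])
```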
    

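    1.3 dataclass inheritance

    Like a regular class, a dataclass can inherit from a parent class. The subclass receives all of the parent's fields, followed by its own.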

    @dataclass(unsafe_hash=True)
    class Staff:
        name: str
        age: int
        city: str
    
    @dataclass
    class Employee(Staff):
        salary: int
    
    e1 = Employee('zoey', 18, 'patna', 20000)
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', salary=20000)
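
One practical consequence of inheritance is field ordering: parent fields come first in the generated __init__, so once a parent field has a default, every subclass field needs one too. The sketch below illustrates this under that assumption; the Broken class is a hypothetical example that fails on purpose.

```python
from dataclasses import dataclass, fields


@dataclass
class Staff:
    name: str
    city: str = 'patna'


@dataclass
class Employee(Staff):
    # A default is required here because the parent field
    # `city` already has one.
    salary: int = 0


# Parent fields come first, then the subclass fields.
field_names = [f.name for f in fields(Employee)]
print(field_names)

broken_error = None
try:
    @dataclass
    class Broken(Staff):
        salary: int  # no default after a defaulted parent field
except TypeError as exc:
    broken_error = str(exc)

print(broken_error)
```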
    

    1.4 Custom initialization

    If a field's initial value depends on other fields, use the __post_init__ method and mark that field with field(init=False). field is covered in more detail later.

    from dataclasses import dataclass, field

    @dataclass
    class Employee:
        name: str
        age: int
        city: str
        adult: bool = field(init=False)
    
        def __post_init__(self):
            self.adult = 18 <= self.age <= 70
    
    e1 = Employee('zoey', 18, 'patna')
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', adult=True)
    

    The adult field is derived from age. However, if age is modified after instantiation, adult is not updated accordingly.

    e1 = Employee('zoey', 18, 'patna')
    print(e1)
    e1.age = 8
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', adult=True)
    Employee(name='zoey', age=8, city='patna', adult=True)
    

    age was changed to 8, but adult is still True.
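
If the derived value should always follow age, one possible alternative (not used in the original article) is to compute it with a property instead of storing it in __post_init__. A minimal sketch:

```python
from dataclasses import dataclass


@dataclass
class Employee:
    name: str
    age: int
    city: str

    @property
    def adult(self) -> bool:
        # Recomputed on every access, so it always reflects the current age.
        return 18 <= self.age <= 70


e1 = Employee('zoey', 18, 'patna')
before = e1.adult
e1.age = 8
after = e1.adult
print(before, after)
```

The trade-off is that a property is not a dataclass field, so adult no longer appears in the generated __repr__ or takes part in comparisons.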

    1.5 Custom ordering of data objects

    Python's rich comparison methods are listed below; they apply to any object.

  • object.__lt__(self, other): x < y
  • object.__le__(self, other): x <= y
  • object.__eq__(self, other): x == y
  • object.__ne__(self, other): x != y
  • object.__gt__(self, other): x > y
  • object.__ge__(self, other): x >= y
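
    To sort data objects, you can combine them with a priority queue. Python offers two priority queue implementations, queue.PriorityQueue and heapq. queue.PriorityQueue is itself built on heapq, which implements the heap queue algorithm. heapq does not accept a custom comparison function, but you can customize the order by overriding the data class's __lt__(self, other) method, which implements the < operator.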

    from dataclasses import dataclass, field
    from queue import PriorityQueue
    
    @dataclass
    class Employee:
        name: str = field(compare=False)
        age: int
        city: str = field(compare=False)
        work: int
    
        def __lt__(self, other):
            # Younger employees come first; when ages are equal,
            # the larger work value comes first.
            return (self.age, -self.work) < (other.age, -other.work)
    
    e1 = Employee('zoey', 18, 'patna', 20)
    e2 = Employee('joe', 19, 'patna', 21)
    e3 = Employee('mike', 19, 'deli', 20)
    e4 = Employee('judy', 17, 'india', 22)
    
    q = PriorityQueue()
    q.put(e1)
    q.put(e2)
    q.put(e3)
    q.put(e4)
    
    while not q.empty():
        next_item = q.get()
        print(next_item)
    
    Employee(name='judy', age=17, city='india', work=22)
    Employee(name='zoey', age=18, city='patna', work=20)
    Employee(name='joe', age=19, city='patna', work=21)
    Employee(name='mike', age=19, city='deli', work=20)
    

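    By overriding the data class's __lt__(self, other) method, smaller age values get higher priority and, for equal ages, larger work values get higher priority. Note that __lt__ implements <, so the work comparison has to be reversed (self.work > other.work) for larger work values to come first. If you define your own comparison method, you cannot also set order=True, because dataclass raises a TypeError when the class already defines one of the ordering methods. This is different from the compare parameter of field described later.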
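
The same ordering can be sketched directly with heapq, which queue.PriorityQueue builds on, together with a check of the order=True restriction mentioned above. The Conflicting class is a hypothetical example that fails on purpose.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str = field(compare=False)
    age: int
    city: str = field(compare=False)
    work: int

    def __lt__(self, other):
        # Smaller age first; for equal ages, larger work first.
        return (self.age, -self.work) < (other.age, -other.work)


staff = [
    Employee('zoey', 18, 'patna', 20),
    Employee('joe', 19, 'patna', 21),
    Employee('mike', 19, 'deli', 20),
    Employee('judy', 17, 'india', 22),
]

heap = []
for employee in staff:
    heapq.heappush(heap, employee)  # heapq only relies on __lt__

ordered_names = []
while heap:
    ordered_names.append(heapq.heappop(heap).name)
print(ordered_names)

conflict_error = None
try:
    @dataclass(order=True)
    class Conflicting:
        age: int

        def __lt__(self, other):
            return self.age < other.age
except TypeError as exc:
    # dataclass refuses to overwrite a hand-written ordering method.
    conflict_error = str(exc)
print(conflict_error)
```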

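    2. astuple and asdict in dataclasses (converting data classes to tuples and dicts)

    The dataclasses module also provides astuple() and asdict(), which convert a dataclass instance into a tuple and a dict respectively.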

    from dataclasses import dataclass, astuple, asdict
    
    
    @dataclass(unsafe_hash=True)
    class Employee:
        name: str
        age: int
        city: str
    
    e1 = Employee('zoey', 18, 'patna')
    
    print(astuple(e1))
    print(asdict(e1))
    
    ('zoey', 18, 'patna')
    {'name': 'zoey', 'age': 18, 'city': 'patna'}
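
Both functions also recurse into nested dataclasses and into lists, tuples, and dicts. The following sketch shows this with a hypothetical Address class that is not part of the original article.

```python
from dataclasses import dataclass, field, asdict, astuple
from typing import List


@dataclass
class Address:
    city: str
    country: str


@dataclass
class Employee:
    name: str
    address: Address
    skills: List[str] = field(default_factory=list)


e1 = Employee('zoey', Address('patna', 'india'), ['python'])

# The nested Address becomes a nested dict / tuple as well.
as_dict = asdict(e1)
as_tuple = astuple(e1)
print(as_dict)
print(as_tuple)

# Containers are copied, so changing the result leaves e1 untouched.
as_dict['skills'].append('sql')
print(e1.skills)
```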
    

    3. fields in dataclasses (configuring data class fields)

    dataclasses.field() lets you configure each field defined in a data class individually.

    dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None)
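
The settings passed to field() can be read back at runtime with dataclasses.fields(), which returns one Field object per field. The sketch below also uses the metadata parameter from the signature above; the 'unit' key and the notes field are arbitrary choices made for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Employee:
    name: str
    age: int = field(default=18, metadata={'unit': 'years'})
    notes: str = field(default='', repr=False)


all_fields = fields(Employee)
names = [f.name for f in all_fields]
age_field = all_fields[1]
notes_field = all_fields[2]

print(names)
print(age_field.default, age_field.metadata['unit'])
print(notes_field.repr)
```

metadata is not used by dataclasses itself; it is a read-only mapping intended for your own code or third-party tools.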
    

    (1) Parameter 1: default, which specifies the field's default value

    from dataclasses import dataclass, field
    
    @dataclass
    class Employee:
        name: str
        age: int
        city: str
        work: str = field(default='china')
    
    e1 = Employee('zoey', 18, 'patna')
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', work='china')
    

    The work field defaults to china.

    (2) Parameter 2: default_factory, a zero-argument callable that is called to produce the field's initial value

    from dataclasses import dataclass, field
    
    def get_work():
        return 'china'
    
    @dataclass
    class Employee:
        name: str
        age: int
        city: str = field(default='patna')
        work: str = field(default_factory=get_work)
    
    e1 = Employee('zoey', 18)
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', work='china')
    

    The work field takes the function get_work, which returns china.

    (3) Parameter 3: init; if True, the field is included as a parameter of the generated __init__() method
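
A common reason to reach for default_factory is a mutable default such as a list: dataclass rejects a plain list default, while default_factory=list gives every instance its own list. A minimal sketch with hypothetical Team classes:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Team:
    name: str
    # list() is called once per instance, so lists are not shared.
    members: List[str] = field(default_factory=list)


a = Team('a')
b = Team('b')
a.members.append('zoey')
print(a.members, b.members)

mutable_default_error = None
try:
    @dataclass
    class BrokenTeam:
        name: str
        members: List[str] = []  # mutable default is rejected
except ValueError as exc:
    mutable_default_error = str(exc)
print(mutable_default_error)
```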

    from dataclasses import dataclass, field
    
    @dataclass
    class Employee:
        name: str
        age: int
        city: str
        work: str = field(init=False, default='china')
    
    e1 = Employee('zoey', 18, 'patna')
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', work='china')
    

    Because the work field has init=False, it cannot be passed as an argument when creating e1; doing so raises an error.
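
The error mentioned above can be reproduced as follows. This is a sketch that reuses the Employee class from this section; the init_error variable is introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Employee:
    name: str
    age: int
    city: str
    work: str = field(init=False, default='china')


init_error = None
try:
    # work is not an __init__ parameter, so a fourth argument is one too many.
    Employee('zoey', 18, 'patna', 'korea')
except TypeError as exc:
    init_error = str(exc)
print(init_error)

# The field still exists and can be assigned after construction.
e1 = Employee('zoey', 18, 'patna')
e1.work = 'korea'
print(e1)
```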

    work: str = field(init=True, default='china')
    
    e1 = Employee('zoey', 18, 'patna', 'korea')
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', work='korea')
    

    With init=True, the argument can be passed and its value is kept.

    (4) Parameter 4: repr; if True, the field is included in the string returned by the generated __repr__() method

    @dataclass
    class Employee:
        name: str
        age: int
        city: str
        work: str = field(init=False, default='china', repr=False)
    
    e1 = Employee('zoey', 18, 'patna')
    print(e1)
    
    Employee(name='zoey', age=18, city='patna')
    

    Because the work field has repr=False, printing e1 does not show work='china'.

    work: str = field(init=False, default='china', repr=True)
    
    e1 = Employee('zoey', 18, 'patna')
    print(e1)
    
    Employee(name='zoey', age=18, city='patna', work='china')
    

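    With repr=True on the work field, printing e1 shows work='china'.

    (5) Parameter 5: compare; if True, the field is included in the generated equality and rich comparison methods

    To use ordering operators such as <, first set order=True on the decorator, then use compare to choose which fields take part in comparisons. compare defaults to True, and it also controls which fields the generated __eq__() looks at.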

    @dataclass(order=True)
    class Employee:
        name: str = field(compare=False)
        age: int = field(compare=False)
        city: str = field(compare=False)
        work: int
    
    e2 = Employee('joe', 19, 'patna', 21)
    e4 = Employee('judy', 17, 'india', 22)
    print(e2 < e4)
    
    True
    

    Only the work field is compared. To order by several attributes with custom rules, see the custom ordering in section 1.5.

    Author: bujbujbiu
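
Since compare also affects the generated __eq__(), two employees that differ only in fields marked compare=False are considered equal. The sketch below reuses the class from this section; the employee named amy is a made-up record added for illustration.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Employee:
    name: str = field(compare=False)
    age: int = field(compare=False)
    city: str = field(compare=False)
    work: int


e2 = Employee('joe', 19, 'patna', 21)
e4 = Employee('judy', 17, 'india', 22)
e5 = Employee('amy', 30, 'delhi', 21)

# Only work is compared, so e2 and e5 count as equal.
same = e2 == e5
print(same)

# sorted() uses the generated __lt__, which also looks at work only.
ranked = [e.name for e in sorted([e2, e4], reverse=True)]
print(ranked)
```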
