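Background: I started working on data science this year and have been learning Python for about half a year, but never found the time to organize my notes. This article is very basic: it summarizes the fundamentals following Liao Xuefeng's Python tutorial, plus some extensions of my own, so that I can look things up later.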

I. Basics

1. Introduction

(1) History

Python was created by the famous Guido van Rossum (Dutch), who began writing it during the Christmas holidays of 1989 to pass the time.

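(2) An interpreted language

Python is an interpreted language. Compared with a compiled language such as C, it has two drawbacks:

(1) Slower execution. CPython first compiles the source code into bytecode and then interprets that bytecode instruction by instruction at run time, which is much slower than running a C program, which is compiled ahead of time into machine code the CPU executes directly.

(2) The code cannot be hidden. Distributing a Python program essentially means distributing its source code. A C program, by contrast, is shipped as compiled machine code (for example the xxx.exe files you see on Windows), and the original C source cannot practically be recovered from it (decompilers only produce an approximation). Compiled languages do not have this problem, whereas an interpreted language has to ship its source.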

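2. Installation

Using macOS as an example:

Method 1: brew install python3

Method 2 (recommended): install Anaconda, which comes with Python built in. For details see my earlier post: Anaconda / Conda in Practice

Everything below assumes Python 3.7.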

3. Running

# Start the interactive shell (press Ctrl+D or call exit() to quit)

python

# Run a .py file from the command line

python hello.py

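4. Python interpreters

(1) CPython

CPython is the most widely used Python interpreter. When you download and install Python 3.x from the official Python website, you get the official interpreter, CPython. It is written in C, hence the name. Running python on the command line starts CPython.

(2) IPython

IPython is an interactive interpreter built on top of CPython: it only enhances the interactive experience, and executes Python code exactly as CPython does. It is like the many Chinese web browsers that look different on the outside but all use the IE engine underneath.

For example, CPython uses >>> as its prompt, while IPython uses In [n]: as its prompt.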

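5. Things to note

Python uses indentation to delimit code blocks; the convention is 4 spaces per level.

Python has no === strict equality operator, only ==.

Python has no ++ increment or -- decrement operators; use x += 1 and x -= 1 instead.

The body of an if statement, a function, a class or a loop cannot be empty; if you really want an empty body, write pass.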
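A quick sketch of the last two points (the names x, y, todo and Placeholder are only illustrative). Note that ++x is not a syntax error: it is parsed as two unary plus signs, so it silently does nothing.

```python
x = 5
x += 1          # the Python way to increment
y = ++x         # legal, but parsed as +(+x): x is NOT incremented
print(x, y)     # 6 6

def todo():
    pass        # placeholder body; calling todo() simply returns None

class Placeholder:
    pass        # an empty class body needs pass as well

print(todo())   # None
```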

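II. Data types

1. Booleans

Early Python 2 releases had no boolean type: 0 stood for False and 1 for True. A bool type was added in Python 2.3, and Python 3 made True and False keywords. bool is a subclass of int, so True and False still have the values 1 and 0 and can be added to numbers.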

print(False+True+True)     # 2

The and, or and not operators are supported:

print(5 > 3 and 3 > 1)     # True
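Because bool is a subclass of int, True and False behave like 1 and 0 in arithmetic. Also note that and / or short-circuit and return one of their operands rather than a strict True/False, which is why `value or default` is a common idiom. A minimal sketch:

```python
print(isinstance(True, int))     # True: bool is a subclass of int
print(True == 1, False == 0)     # True True
print(sum([True, False, True]))  # 2: handy for counting matches

# and / or return one of their operands, not necessarily a bool
print(3 and 5)         # 5: both are truthy, so the last operand is returned
print(0 and 5)         # 0: evaluation stops at the first falsy operand
print("" or "guest")   # guest: falls back to the second operand
print(not 0)           # True: not always returns a bool
```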

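2. Strings

# Single quotes

print('abc')

# Double quotes

print("abc")

# Multi-line content (line breaks and indentation are preserved)

print('''line1
    line2
line3''')

# Escaping

print('I\'m ok.')        # I'm ok.

# Concatenation (with +)

print("123" + "123")    # 123123

# Search (returns the index of the first match, or -1 if not found)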

'product_name'.find("product")    # 0

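3. None (the null value)

None is a singleton: there is only one None object in Python, so None is None is True (and None == None is also True). The idiomatic check is x is None.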

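4. Numbers

Not much to say here, but watch out for division:

# / is true division: the result is always a float

9 / 3        # 3.0

# // is floor division: the result is a float only if one of the operands is a float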

10 // 3     # 3

10.0 // 3     # 3.0
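Floor division rounds toward negative infinity, not toward zero, which matters for negative numbers; % is defined to match it, so that a == (a // b) * b + a % b always holds. A short sketch:

```python
print(10 // 3, 10 % 3)    # 3 1
print(-7 // 2, -7 % 2)    # -4 1: floor(-3.5) is -4, and -4 * 2 + 1 == -7
print(int(-7 / 2))        # -3: int() truncates toward zero instead
print(divmod(10, 3))      # (3, 1): quotient and remainder in one call

a, b = -7, 2
print(a == (a // b) * b + a % b)   # True
```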

5. Infinity

float("inf")   # positive infinity

float("-inf")  # negative infinity

6. NaN (Not a Number)

import math

float("nan")

math.isnan(float("nan"))    # True
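NaN is not equal to anything, including itself, which is why math.isnan() is needed instead of ==. Infinity compares greater than any finite number, and undefined operations on it produce NaN. A minimal sketch:

```python
import math

nan = float("nan")
inf = float("inf")

print(nan == nan)         # False: NaN never equals anything, even itself
print(math.isnan(nan))    # True: the reliable way to test for NaN
print(inf > 10 ** 100)    # True: larger than any finite number
print(inf - inf)          # nan: an undefined result becomes NaN
print(math.isinf(-inf))   # True
```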

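7. list and tuple

(1) list

# Create

empty = []                          # []

empty = list()                      # []

numbers = list(range(5))            # [0, 1, 2, 3, 4]

classmates = ['Michael', 'Bob', 'Tracy']

# Properties

len(classmates)        # 3

# Access by index (raises IndexError if out of range)

classmates[0]        # 'Michael'

classmates[-1]        # 'Tracy'

# Access with a default (no error)

# list has no get() with a default like dict does; use try...except to fall back to a default value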

try:

    a = classmates[4]

except IndexError:

    a = 'default'

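# Add

classmates.append('Adam')          # ['Michael', 'Bob', 'Tracy', 'Adam']

classmates.insert(1, 'Jack')       # ['Michael', 'Jack', 'Bob', 'Tracy', 'Adam']

# Remove

classmates.pop()                   # removes and returns the last element, 'Adam'

classmates.pop(1)                  # removes and returns the element at index 1, 'Jack'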

# ------

# Find the index of a value (raises ValueError if it is missing)

try:

    thing_index = [1, 2, 3].index(12)

except ValueError:

    thing_index = -1

# Membership test (no error)

4 in [1, 2, 3]       # False

# ------

# Iterate

for item in classmates:

    print(item)

# Iterate with the index

for i, item in enumerate(classmates):

    print(item,i)

Pitfall: strings

# A string is a sequence: it supports indexing and slicing like a list, but it is immutable

a = "asd"

a[2]        # 'd'
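Indexing works like a list, but unlike a list a string cannot be changed in place: item assignment raises TypeError, so you build a new string instead (or go through a real list). A short sketch:

```python
a = "asd"
print(a[-1], a[0:2])      # d as

try:
    a[0] = "x"            # strings cannot be modified in place
except TypeError as e:
    print("TypeError:", e)

b = "x" + a[1:]           # build a new string instead
chars = list(a)           # or convert to a real list of characters
chars[0] = "x"
c = "".join(chars)
print(b, c)               # xsd xsd
```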

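Feature: list comprehensions

A list comprehension is Python's built-in syntax for building lists: very simple, yet powerful.

# To generate the list [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] you can use list(range(1, 11)); the squares of its even numbers are: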

[x * x for x in range(1, 11) if x % 2 == 0]

# [4, 16, 36, 64, 100]
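The same syntax extends to several for clauses, to a conditional expression in front of for, and to dict and set comprehensions; replacing the brackets with parentheses gives a lazily evaluated generator expression. A few examples:

```python
# Two for clauses: every combination; the first for is the outer loop
pairs = [m + n for m in 'AB' for n in 'XY']
print(pairs)        # ['AX', 'AY', 'BX', 'BY']

# if ... else before for is a conditional expression: it maps every item
signs = ['even' if x % 2 == 0 else 'odd' for x in range(1, 4)]
print(signs)        # ['odd', 'even', 'odd']

# dict and set comprehensions
squares = {x: x * x for x in range(1, 4)}
print(squares)      # {1: 1, 2: 4, 3: 9}
remainders = {x % 3 for x in range(10)}
print(remainders)   # {0, 1, 2}

# Generator expression: values are produced one at a time, no list is built
total = sum(x * x for x in range(1, 11))
print(total)        # 385
```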

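(2) tuple: the immutable version of list

# Create

classmates = ('Michael', 'Bob', 'Tracy')

classmates = ()                    # ()

classmates = tuple()               # ()

classmates = tuple(range(5))       # (0, 1, 2, 3, 4)

# Read operations are the same as for list, but a tuple cannot be modified (no append, insert, pop or item assignment)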

Pitfall: the comma

# A tuple can be created with commas alone; the parentheses are optional

classmates = ('Michael', 'Bob', 'Tracy')

classmates = 'Michael', 'Bob', 'Tracy'

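# With a single element, remember the trailing comma! Without it the parentheses are just grouping, so ('Michael') is the string 'Michael' and indexing it gives a character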

classmates = ('Michael')

# classmates[0] : M

classmates = ('Michael',)

# classmates[0] : Michael

classmates = 'Michael',

# classmates[0] : Michael

# Note: a one-element tuple is also printed with the trailing comma: ('Michael',)

(3) Slicing

# Works for list and tuple (and for str as well)

a = list(range(10))

a[1:5]

# [1, 2, 3, 4]

a[:]

# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
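Slices also take an optional third step value, negative indices count from the end, and out-of-range slice bounds are clipped instead of raising an error. A few more examples:

```python
a = list(range(10))

print(a[-3:])      # [7, 8, 9]: the last three elements
print(a[:6:2])     # [0, 2, 4]: the first six, every second one
print(a[::-1])     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]: reversed
print(a[8:100])    # [8, 9]: slice bounds are clipped, no IndexError

b = a[:]           # a[:] is a shallow copy, not the same list
b.append(10)
print(len(a), len(b))   # 10 11

print('ABCDEFG'[:3], (0, 1, 2, 3)[1:3])   # ABC (1, 2)
```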

8. dict and set

(1) dict

# Create

d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}

# Access by key (raises KeyError if the key is missing)

d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}

try:

    thing_index = d["Colin"]

except KeyError:

    thing_index = None

# Access with a default (no error)

d.get('Colin')        # None

d.get('Colin', -1)        # -1

# Set

d['Colin'] = 123

# Delete

del d['Michael']

d.pop('Bob')        # like del, but also returns the removed value (75)

# ------

### Lookup (no error)

# Check whether a key exists

'Colin' in {'Michael': 95, 'Bob': 75, 'Tracy': 85}           # False

# Check whether a value exists

95 in {'Michael': 95, 'Bob': 75, 'Tracy': 85}.values()    # True

# ------

# Extract

{'Michael': 95, 'Bob': 75, 'Tracy': 85}.keys()        # a dict_keys view; list(...) turns it into a list

{'Michael': 95, 'Bob': 75, 'Tracy': 85}.values()     # a dict_values view; list(...) turns it into a list

# ------

# Iterate (over the keys)

for item in d:

    print(item)

# Iterate with a running counter (a dict is accessed by key, not by position)

for i, item in enumerate(d):

    print(item,i,d[item])
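To get keys and values together, items() is the usual choice; dict comprehensions build new dicts the same way list comprehensions build lists; and since Python 3.7 a dict keeps its insertion order. A minimal sketch:

```python
d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}

# items() yields (key, value) pairs
for name, score in d.items():
    print(name, score)

# Dict comprehension: build a new dict from an existing one
passed = {name: score for name, score in d.items() if score >= 80}
print(passed)            # {'Michael': 95, 'Tracy': 85}

# Insertion order is preserved (guaranteed since Python 3.7)
print(list(d))           # ['Michael', 'Bob', 'Tracy']

# setdefault: return the value for a key, inserting a default if it is missing
d.setdefault('Adam', 60)
print(d['Adam'])         # 60
```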

(2) set

# Create

s = set([1, 2, 3])        # {1, 2, 3}

# Add

s.add(4)

# Remove (raises KeyError if the element is missing)

s.remove(4)

### Lookup (no error)

22 in s            # False

# ------

### Set operations

s1 = set([1, 2, 3])

s2 = set([2, 3, 4])

# Intersection

s1 & s2            # {2, 3}

# Union

s1 | s2         # {1, 2, 3, 4}

# ------

# Iterate

for item in s:

    print(item)
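A few more set features worth knowing: difference and symmetric difference, discard() (which, unlike remove(), does not raise KeyError for a missing element), and the fact that {} creates an empty dict, not an empty set. A sketch:

```python
s1 = {1, 2, 3}            # set literal
s2 = {2, 3, 4}

print(s1 - s2)            # {1}: in s1 but not in s2
print(s1 ^ s2)            # {1, 4}: in exactly one of the two

s1.discard(99)            # no error for a missing element
try:
    s1.remove(99)         # remove() raises KeyError instead
except KeyError:
    print("99 not in set")

print(type({}), type(set()))   # <class 'dict'> <class 'set'>

# Remove duplicates from a list (a set does not preserve order, so sort afterwards)
print(sorted(set([3, 1, 3, 2, 1])))   # [1, 2, 3]
```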

9. Type conversion

>>> int('123')

123

>>> int(12.34)

12

>>> float('12.34')

12.34

>>> str(1.23)

'1.23'

>>> bool(1)

True
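A few conversions that commonly trip people up: int() does not accept a string containing a decimal point, it can parse other bases, and bool() of any non-empty string (even '0' or 'False') is True. A minimal sketch:

```python
print(int(float('12.34')))   # 12: parse as float first, then truncate
print(int('ff', 16))         # 255: int() can parse other bases
print(bool(''), bool('0'), bool('False'))   # False True True
print(bool([]), bool(0.0), bool(None))      # False False False

try:
    int('12.34')             # a decimal string is not a valid int literal
except ValueError as e:
    print('ValueError:', e)
```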

10. Checking types

In Python, everything is an object.

(1) type(): get the type

print(type(True))   # <class 'bool'>

print(type("1"))    # <class 'str'>

print(type(None))   # <class 'NoneType'>

print(type(2))  # <class 'int'>

# Note: NaN and infinity are both floats.

print(type(float("inf")))   # <class 'float'>

print(type(float("nan")))   # <class 'float'>

print(type([1,2]))  # <class 'list'>

print(type((1,2)))  # <class 'tuple'>

print(type({'Michael': 95, 'Bob': 75, 'Tracy': 85}))    # <class 'dict'>

print(type(set([1, 2, 3]))) # <class 'set'>

def a():

    pass

print(type(a))  # <class 'function'>

class b:

    pass

# Note: a class is itself an object, and its type is type.

print(type(b))  # <class 'type'>

(2) isinstance(): check the type

print(isinstance(True,bool))   # True

print(isinstance("1",str))    # True

# For None, use is rather than isinstance() or ==. For the reasons, see: https://stackoverflow.com/questions/3257919/what-is-the-difference-between-is-none-and-none

print(None is None)   # True

print(isinstance(2,int))  # True

import math

print(math.isinf(float("inf"))) # True

print(math.isnan(float("nan"))) # True

print(isinstance([1,2],list))  # True

print(isinstance((1,2),tuple))  # True

print(isinstance({'Michael': 95, 'Bob': 75, 'Tracy': 85},dict))    # True

print(isinstance(set([1, 2, 3]),set)) # True

import types

def a():

    pass

print(isinstance(a,types.FunctionType))  # True

# For checks on objects and classes, see the object-oriented programming chapter below

isinstance() can check against several types at once: isinstance(x, (int, float))
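One subtlety: isinstance() also accepts subclasses, while comparing type(x) == ... does not. Since bool is a subclass of int, isinstance(True, int) is True, which can be surprising when validating numbers. A sketch (is_number is a hypothetical helper):

```python
x = True

print(isinstance(x, int))           # True: bool is a subclass of int
print(type(x) == int)               # False: type() gives the exact class
print(isinstance(x, (int, float)))  # True: matches if any type in the tuple matches

def is_number(value):
    """Hypothetical helper: accept int and float but reject bool."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

print(is_number(3), is_number(2.5), is_number(True))   # True True False
```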

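11. Equality

== is a comparison operator: it checks whether two values are equal.

is is the identity operator: it checks whether two names refer to the same object (in CPython, the same memory address, as returned by id()).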
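The difference shows up with mutable objects: two lists can be equal in value without being the same object. A minimal sketch:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a                  # c is just another name for the same list

print(a == b)          # True: same value
print(a is b)          # False: two different list objects
print(a is c)          # True: the same object
print(id(a) == id(c))  # True

c.append(4)            # changing the object through c ...
print(a)               # [1, 2, 3, 4]: ... is visible through a
```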

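12. Other

(1) The collections module

collections is a built-in Python module that provides many useful container classes.

For example deque (usable as a queue or a stack), and handier versions of dict such as defaultdict, OrderedDict and Counter.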
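A short tour of some of those classes (all of them are real members of the standard collections module; the sample data is made up):

```python
from collections import Counter, OrderedDict, defaultdict, deque, namedtuple

# deque: fast appends and pops at both ends, usable as a queue or a stack
q = deque([1, 2, 3])
q.append(4)                    # push on the right
q.appendleft(0)                # push on the left
print(q.popleft(), q.pop())    # 0 4

# defaultdict: missing keys get a default value from the factory (here list())
groups = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    groups[word[0]].append(word)
print(dict(groups))            # {'a': ['apple', 'avocado'], 'b': ['banana']}

# Counter: count hashable items
counts = Counter('programming')
print(counts['r'], counts.most_common(1))

# namedtuple: a tuple whose fields can also be accessed by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(1, 2)
print(p.x, p[1])               # 1 2

# OrderedDict: a dict with order-related extras such as move_to_end()
od = OrderedDict(a=1, b=2)
od.move_to_end('a')
print(list(od))                # ['b', 'a']
```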

(2) To be written

Generators & iterators

III. Functions

1. Kinds of parameters

(1) Positional parameters, default parameters & return values

def my_abs(x, y=0, z=1):

    if not isinstance(x, (int, float)):

        raise TypeError('bad operand type')

    # Return a single value

    # if x >= 0:

    #     return x

    # else:

    #     return -x

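    # Return a tuple

    return x, y, z   # same as return (x, y, z)

    # (Even without a return statement, a function returns None by default.)

    return  # same as return None (unreachable here, shown only for illustration)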

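print(my_abs(-3))          # (-3, 0, 1)

print(my_abs(-3,2))        # (-3, 2, 1)

print(my_abs(-3,z=2))      # (-3, 0, 2): pass z by name to skip y

Note: parameters with default values must come after all parameters without defaults.

Pitfall: Python passes references to objects, and a default value is evaluated only once, when the def statement runs. So if the default value is a mutable object such as a list, every call that relies on the default shares that same object: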

def add_end(L=[]):

    L.append('END')

    return L

# Recommended: use None as the default and create the list inside the function:

def add_end(L=None):

    if L is None:

        L = []

    L.append('END')

    return L
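To see the pitfall concretely: the default list is created once, so every call that relies on the default appends to that same list. A sketch (add_end_bad is a hypothetical name for the original, problematic version):

```python
def add_end_bad(L=[]):       # the default list is created once, at definition time
    L.append('END')
    return L

def add_end(L=None):         # the recommended version from above
    if L is None:
        L = []               # a fresh list on every call
    L.append('END')
    return L

print(add_end_bad())   # ['END']
print(add_end_bad())   # ['END', 'END']: the same list was reused
print(add_end())       # ['END']
print(add_end())       # ['END']
```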

(2) Variable-length arguments & keyword arguments



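# Variable positional arguments (*y): extra positional arguments are collected into a tuple; a list or tuple can be unpacked into them with *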

def my_fun1(x, *y):

    return x, y

print(my_fun1(0))   # (0, ())

print(my_fun1(0, 1, 2, 3, 4, 5))    # (0, (1, 2, 3, 4, 5))

print(my_fun1(0, *(1, 2, 3, 4, 5)))  # (0, (1, 2, 3, 4, 5))

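# Variable keyword arguments (**y): extra keyword arguments are collected into a dict; a dict can be unpacked into them with **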

def my_fun2(x, **y):

    return x, y

print(my_fun2(0))   # (0, {})

# (0, {'city': 'Beijing', 'job': 'Engineer'})

print(my_fun2(0, city='Beijing', job='Engineer'))

# (0, {'city': 'Beijing', 'job': 'Engineer'})

print(my_fun2(0, **{'city': 'Beijing', 'job': 'Engineer'}))

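Note: *y and **y always receive a newly built tuple or dict, so modifying the received dict inside the function does not affect a dict the caller unpacked with **. The copy is shallow, though: the elements themselves are the same objects, so mutating a mutable element (such as a list) is visible to the caller.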
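A minimal sketch of what that means; the function update() and its data are illustrative. Adding a key to kwargs leaves the caller's dict untouched, but the list inside it is shared:

```python
def update(**kwargs):
    kwargs['job'] = 'Engineer'          # modifies the function's own dict
    kwargs['tags'].append('python')     # mutates an object shared with the caller
    return kwargs

info = {'city': 'Beijing', 'tags': []}
result = update(**info)

print(info)                             # {'city': 'Beijing', 'tags': ['python']}: no 'job' key
print(result is info)                   # False: kwargs is a new dict
print(result['tags'] is info['tags'])   # True: the list itself was not copied
```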

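(3) Summary

Notes:

1. Parameters must be defined in this order: required parameters, default parameters, variable positional parameters (*args), keyword-only parameters, and variable keyword parameters (**kwargs).

2. Although up to five kinds of parameters can be combined, don't use too many at once, or the function's interface becomes hard to understand.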
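The sketch below combines all five kinds in that order. Parameters after *args are keyword-only (they must be passed by name); without *args, a bare * marks where the keyword-only parameters start. The functions f and g are hypothetical:

```python
def f(a, b=2, *args, c, d=4, **kwargs):
    # a: required, b: default, *args: variable positional,
    # c and d: keyword-only, **kwargs: variable keyword
    return a, b, args, c, d, kwargs

print(f(1, c=3))                  # (1, 2, (), 3, 4, {})
print(f(1, 5, 6, 7, c=3, x=9))    # (1, 5, (6, 7), 3, 4, {'x': 9})

def g(name, *, city):             # bare *: city is keyword-only, no *args
    return name, city

print(g('Bob', city='Beijing'))   # ('Bob', 'Beijing')
```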

2. Functional programming

Python supports functional programming only partially, so it is not a pure functional language.

To be written.

IV. Modules and packages

1. Modules

In Python, a single .py file is called a module.

Below we define a module of our own, then import and call it.

(1) Definition

myModule.py

#!/usr/bin/env python3

# The shebang line above lets this file run directly on Unix/Linux/macOS (after chmod +x)

' it is my module '    # module docstring (convention)

__author__ = 'xjnotxj'    # module author (convention)

import sys

def test():

    args = sys.argv

    if len(args)==1:

        print('Hello, world!')

    elif len(args)==2:

        print('Hello, %s!' % args[1])    

    else:

        print('Too many arguments!')

def test2():

    print("test2")

test3 = "123"

if __name__=='__main__':

    test()

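The if __name__=='__main__' check works like this: when myModule.py is run directly, Python sets __name__ to '__main__'; when it is imported, __name__ is 'myModule'. Without the check, test() would be called both when myModule.py is run directly and when another file does import myModule. The latter is not what we want, and the check prevents it.

(2) Importing

test.py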

import myModule

myModule.test()

myModule.test2()

print(myModule.test3)

Assume myModule.py and test.py are in the same directory:

myModule.py
test.py

Running python test.py 111 prints:

Hello, 111!

test2

123

You can also import only part of a module, by changing test.py to:

from myModule import test

test()

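Note: don't give your module (or package) the same name as an existing system module. To check whether a module already exists, try importing it in the Python interactive shell, e.g. import abc; if the import succeeds, a module with that name already exists (abc itself is in fact a standard-library module).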

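2. Packages

(1) Definition

Again using myModule.py as the example, but the directory structure changes:

myPackage
- __init__.py
- myModule.py
test.py

Here the myPackage folder is a package: packages let you organize modules by directory.

(2) Importing

Change test.py to:

import myPackage.myModule

myPackage.myModule.test()

# Or rename it with as

import myPackage.myModule as myModule

myModule.test()

Note: each package directory normally contains an __init__.py file. In Python 2 it is mandatory, otherwise the directory is treated as an ordinary directory rather than a package; since Python 3.3 a directory without it can still be imported as a namespace package, but adding __init__.py remains the usual practice. __init__.py can be empty or contain Python code, because __init__.py is itself a module, and its module name is myPackage.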

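3. Installing third-party modules

(1) pip / pip3

(2) Anaconda

Recommended. For details see my earlier post: Anaconda / Conda in Practice

(3) Module search path

By default, the Python interpreter searches the directory of the script being run (or the current directory in the interactive shell), the directories listed in the PYTHONPATH environment variable, the standard library, and the installed third-party packages (site-packages).

These directories are stored in the path variable of the sys module:

>>> import sys

>>> print(sys.path)

If a module or package you wrote needs to be imported but is not in the default search path, there are two options:

1. Temporary change (made at run time; lost when the program exits)

Modify sys.path directly:

>>> import sys

>>> sys.path.append('/Users/michael/my_py_scripts')

2. Permanent change

Set the PYTHONPATH environment variable.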

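4. Dependency management

(1) Generating a requirements file

# Export every package installed in the current environment

pip freeze > requirements.txt

# Export only the packages imported by the project in the current directory (install it first: pip install pipreqs)

pipreqs ./

Either way you end up with a requirements.txt file in the current directory, which looks like this: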

alembic==1.0.10

appnope==0.1.0

astroid==2.2.5

……

(2) Installing from a requirements file

pip install -r requirements.txt

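5. Environment management

I recommend using Anaconda's own environment management instead of the dependency management above or virtualenv. For details see my earlier post: Anaconda / Conda in Practice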

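V. Object-oriented programming

1. Basic concepts

The demo below covers classes, instantiating objects, class variables vs instance variables, public vs private variables, class methods, static methods and properties.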

class Student(object):

    # Class variable (shared by all instances)

    class_name = "七年二班"

    # Class method: receives the class itself as cls

    @classmethod

    def class_foo(cls, x):

        print("executing class_foo(%s, %s)" % (cls, x))

    # Static method: receives neither the instance nor the class

    @staticmethod

    def static_foo(x):

        print("executing static_foo(%s)" % x)

    def __init__(self, name, score):

        # Public instance variable (accessible with .)

        self.name = name

        # "Protected" instance variable: the single underscore is only a convention; it is inherited and still accessible with .

        self._score = score

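        # "Private" instance variable: the name is mangled to _Student__max_score, so obj.__max_score fails outside the class and subclasses cannot reach it as self.__max_score (the mangled name _Student__max_score still works as a hack)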

        self.__max_score = 100

    # Make score readable like an attribute

    @property

    def score(self):

        return self._score

    # Make score assignable like an attribute, with validation

    @score.setter

    def score(self, value):

        if not isinstance(value, int):

            raise ValueError("score must be an integer!")

        if value < 0 or value > 100:

            raise ValueError("score must be between 0 and 100!")

        self._score = value

    def print_score(self):

        print("%s: %s(满分%s)" % (self.name, self._score, self.__max_score))

jay = Student("周杰伦", 59)

quinlivan = Student("昆凌", 59)

# Try it out

print(Student.class_name)

jay.class_foo("aaa")

jay.static_foo("bbb")

jay.print_score()

quinlivan.print_score()

quinlivan.score = 87

quinlivan.print_score()

Output:

七年二班

executing class_foo(<class '__main__.Student'>, aaa)

executing static_foo(bbb)

周杰伦: 59(满分100)

昆凌: 59(满分100)

昆凌: 87(满分100)

(1) Helpers for attributes: hasattr(), getattr(), setattr()

# Check whether an attribute exists: hasattr()

print(hasattr(jay,"score"))

# Get an attribute: getattr()

print(getattr(jay,"score"))

print(getattr(jay,"score",99))    # with a default, returned only if the attribute does not exist

# Set an attribute: setattr()

jay.score = 99

setattr(jay,"score",99)    # equivalent to the line above

print(getattr(jay,"score"))

(2) Adding methods dynamically

from types import MethodType

# Define a function to be used as an instance method

def say_hello(self):

    print("大家好，我是" + self.name)    # "Hi everyone, I am <name>"; Student stores the name in self.name

# Bind a method to a single instance (only quinlivan gets it)

quinlivan.say_hello = MethodType(say_hello, quinlivan)

quinlivan.say_hello()

# Bind a method to the class (all instances, including jay, get it)

Student.say_hello = say_hello

jay.say_hello()

2. Inheritance and polymorphism

(1) Basic concepts



class Animal(object):

    def run(self):

        print('Animal is running...')

class Dog(Animal):

    def run(self):

        print('Dog is running...')

class Cat(Animal):

    def run(self):

        print('Cat is running...')

class Duck(Animal):

    pass

dog = Dog()

dog.run()

cat = Cat()

cat.run()

duck = Duck()

duck.run()

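Polymorphism: the caller just calls the method and does not care about the details.

Explanation: since you know the object is an Animal, you simply call run(). Which run() actually executes is decided by the object's own class; if that class does not define run() (like Duck), Python looks it up in the parent class.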
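A minimal sketch of the idea, repeating the classes above so it runs on its own: the hypothetical function run_twice() relies only on its argument having a run() method. Because Python is dynamically typed, the object does not even have to inherit from Animal (this is usually called duck typing); Timer is a hypothetical class added to show that.

```python
class Animal(object):
    def run(self):
        print('Animal is running...')

class Dog(Animal):
    def run(self):
        print('Dog is running...')

class Duck(Animal):
    pass                        # no run() of its own: Animal.run is used

class Timer(object):            # hypothetical class, not an Animal at all
    def run(self):
        print('Start...')

def run_twice(animal):
    # The caller does not care which class this is, only that it has run()
    animal.run()
    animal.run()

for obj in (Dog(), Duck(), Timer()):
    run_twice(obj)
```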

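(2) Multiple inheritance

Purpose: instead of a complex, deep inheritance chain, we can combine features from different classes to quickly build the subclass we need. This design is usually called a MixIn.

In the example below, we create a Dog that is an Animal and also mixes in the features of a Pet.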



class Animal(object):

    def run(self):

        print('Animal is running...')

class Pet(object):

    def stay(self):

        print('I am a pet, I can stay at home....')

class Dog(Animal,Pet):

    pass

dog = Dog()

dog.run()

dog.stay()
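With multiple inheritance, Python looks methods up along the class's method resolution order (MRO), which you can inspect through __mro__: Dog itself comes first, then Animal, then the mixin, then object. A sketch; renaming Pet to PetMixIn follows a common naming convention for mixins and is an assumption, not a requirement:

```python
class Animal(object):
    def run(self):
        print('Animal is running...')

class PetMixIn(object):
    def stay(self):
        print('I am a pet, I can stay at home....')

class Dog(Animal, PetMixIn):
    pass

mro = [cls.__name__ for cls in Dog.__mro__]
print(mro)                      # ['Dog', 'Animal', 'PetMixIn', 'object']

dog = Dog()
dog.stay()                      # found on PetMixIn by following the MRO
print(isinstance(dog, Animal), isinstance(dog, PetMixIn))   # True True
```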

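3. Common helpers

(1) Checking what type a variable is: `isinstance()`

isinstance() was already mentioned in the data types chapter above; here is how it works with objects and classes:

# Using the code from [2. Inheritance and polymorphism] above: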

class Animal(object):

    def run(self):

        print('Animal is running...')

class Dog(Animal):

    def run(self):

        print('Dog is running...')

class Cat(Animal):

    def run(self):

        print('Cat is running...')

class Duck(Animal):

    pass

dog = Dog()

dog.run()

cat = Cat()

cat.run()

duck = Duck()

duck.run()

# Matches the object's own class

print(isinstance(dog, Dog))             # True

# Also matches a parent class

print(isinstance(dog, Animal))          # True

# Can check several classes at once (True if any one of them matches)

print(isinstance(dog, (Dog,Animal)))    # True

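(2) Getting all attributes and methods of an object: dir()

# Based on the code from [1. Basic concepts] above (a simplified variant with get_score/set_score instead of a property):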

class Student(object):

    # Class variable

    class_name = '七年二班'

    def __init__(self, name, score):

        # Public instance variable

        self.name = name

        # Private instance variable (name-mangled to _Student__score)

        self.__score = score

    def get_score(self):

        return self.__score

    def set_score(self, score):

        if 0 <= score <= 100:

            self.__score = score

        else:

            raise ValueError('bad score')

    def print_score(self):

        print('%s: %s' % (self.name, self.__score))

jay = Student('周杰伦', 59)

print(dir('ABC'))

print(dir(Student))

print(dir(jay))

Output:

['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith','expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'class_name', 'get_score', 'print_score', 'set_score']

['_Student__score', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'class_name', 'get_score', 'name', 'print_score', 'set_score']

4. Customizing classes

(1) Printing objects: `__str__()`, `__repr__()`

Before:

class Student(object):

    pass

print(Student())    # <__main__.Student object at 0x1041985c0>

After:

class Student(object):

    def __str__(self):

        return 'Student object'

    __repr__ = __str__

student = Student()

print(student)    # print() uses __str__: Student object

student    # typed at the interactive prompt, the value is shown with __repr__ (meant for debugging): Student object
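Writing __repr__ = __str__ makes the two identical. When they differ, print() and str() use __str__, while repr(), the interactive prompt and containers such as lists use __repr__. A sketch with a hypothetical Point class:

```python
class Point(object):
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __str__(self):           # readable form for end users
        return '(%s, %s)' % (self.x, self.y)

    def __repr__(self):          # unambiguous form for debugging
        return 'Point(%r, %r)' % (self.x, self.y)

p = Point(1, 2)
print(p)            # (1, 2)
print(repr(p))      # Point(1, 2)
print([p])          # [Point(1, 2)]: a list shows its items with repr()
```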

(2) Callable objects: `__call__()`

class Student(object):

    def __init__(self, name):

        self.name = name

    def __call__(self):

        print('My name is %s.' % self.name)

s = Student('Michael')

s() # My name is Michael.

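Going further: remember that everything in Python is an object? A function is simply an object that can be called, and an instance whose class defines __call__() can be called just like one. So callable(var) tells you whether a variable can be called (functions, classes, and instances with __call__() all count), rather than strictly whether it is a function.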

(3) Others

__len__() is used by len()

__iter__() is used by for ... in loops

__getitem__(), __setitem__() and __delitem__() are used by subscripting, such as f[3], f[3] = x and del f[3]
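A minimal sketch of a hypothetical read-only sequence class implementing __len__() and __getitem__() (for integer indexes and slices). When a class defines __getitem__ but no __iter__, for ... in still works: Python calls __getitem__(0), __getitem__(1), ... until IndexError is raised.

```python
class Squares(object):
    """Hypothetical read-only sequence of the first n square numbers."""

    def __init__(self, n):
        self.n = n

    def __len__(self):                   # used by len()
        return self.n

    def __getitem__(self, index):        # used by s[i] and s[a:b]
        if isinstance(index, slice):
            return [self[i] for i in range(*index.indices(self.n))]
        if index < 0:                    # support negative indexes like a list
            index += self.n
        if not 0 <= index < self.n:
            raise IndexError('index out of range')
        return index * index

s = Squares(5)
print(len(s), s[3], s[-1])   # 5 9 16
print(s[1:4])                # [1, 4, 9]
print(list(s))               # [0, 1, 4, 9, 16]: iteration falls back to __getitem__
```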

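5. Enum classes

Benefits of enum classes:

Members cannot be modified
With @unique, duplicate values are rejected
Related constants form one logical unit
Code becomes more readable

Each enum member has a name (the key) and a value.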

from enum import Enum, unique

@unique

class Weekday(Enum):

    Sun = 0

    Mon = 1

    Tue = 2

    Wed = 3

    Thu = 4

    Fri = 5

    Sat = 6

# Get a member: by attribute, by name, or by value

print(Weekday.Tue)  # Weekday.Tue

print(Weekday['Tue'])  # Weekday.Tue

print(Weekday(1))  # Weekday.Mon

# Get a member's value

print(Weekday.Tue.value)  # 2

print(type(Weekday))  # <class 'enum.EnumMeta'>

print(type(Weekday.Tue))  # <enum 'Weekday'>

# Equality

print(Weekday.Mon == Weekday.Mon)  # True

print(Weekday.Mon.value == Weekday.Thu.value)  # False

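# Check whether a name or a value belongs to the enum

print("Thu" in Weekday.__members__) # True

print(6 in Weekday._value2member_map_) # True (_value2member_map_ is internal and undocumented; 6 in [m.value for m in Weekday] is a documented alternative)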

# Iterate over the enum

for name, member in Weekday.__members__.items():

    print(name, '=>', member, member.value)

# Sun => Weekday.Sun 0

# Mon => Weekday.Mon 1

# Tue => Weekday.Tue 2

# Wed => Weekday.Wed 3

# Thu => Weekday.Thu 4

# Fri => Weekday.Fri 5

# Sat => Weekday.Sat 6
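A minimal sketch of two of the benefits listed above: @unique rejects duplicate values when the class is defined, and members cannot be reassigned afterwards. Color is a hypothetical enum, and Weekday is shortened here:

```python
from enum import Enum, unique

try:
    @unique
    class Color(Enum):
        RED = 1
        CRIMSON = 1      # duplicate value: @unique raises ValueError
except ValueError as e:
    print('ValueError:', e)

@unique
class Weekday(Enum):
    Sun = 0
    Mon = 1

try:
    Weekday.Mon = 10     # members cannot be reassigned
except AttributeError as e:
    print('AttributeError:', e)

print(Weekday.Mon.name, Weekday.Mon.value)   # Mon 1
print([m.value for m in Weekday])            # [0, 1]: iterating an Enum yields its members
```

Without @unique, CRIMSON would silently become an alias of RED instead of raising an error.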

6. Metaclasses

Topics: type(), metaclass, and implementing a simple ORM.

To be written.

VI. Errors, debugging & testing

1. Error handling

# A custom exception class

class NewZeroDivisionError(ZeroDivisionError):

    pass

try:

    print('try...')

    r = 10 / int('0') # caught by the ZeroDivisionError clause

    # raise NewZeroDivisionError('NewZeroDivisionError is thrown') # also caught by the ZeroDivisionError clause, because it is a subclass

    print('result:', r)

except ValueError as e:

    print('ValueError:', e)

except ZeroDivisionError as e:

    print('ZeroDivisionError:', e)

except BaseException as e:

    print('BaseException:', e)

    raise   # re-raise the same exception unchanged

else:

    print('no error!')

finally:

    print('finally...')

print('END')

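Python errors are classes, and all of them inherit from BaseException, so except BaseException catches every exception, including SystemExit and KeyboardInterrupt (Ctrl+C); that is why except Exception is usually the safer catch-all.
Only define your own error types when necessary (such as NewZeroDivisionError in the code above), and prefer Python's built-in error types (such as ValueError and TypeError) whenever possible. For the full list of error types, see: https://docs.python.org/3/library/exceptions.html#exception-hierarchy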
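A sketch of how the hierarchy works out in practice: except matches subclasses, so the custom NewZeroDivisionError from the code above is caught by except ZeroDivisionError and even by except ArithmeticError, while SystemExit and KeyboardInterrupt sit outside Exception:

```python
class NewZeroDivisionError(ZeroDivisionError):
    pass

# The inheritance chain of the custom error, from the class itself up to object
chain = [cls.__name__ for cls in NewZeroDivisionError.__mro__]
print(chain)
# ['NewZeroDivisionError', 'ZeroDivisionError', 'ArithmeticError', 'Exception', 'BaseException', 'object']

print(issubclass(SystemExit, Exception))          # False
print(issubclass(KeyboardInterrupt, Exception))   # False

try:
    raise NewZeroDivisionError('custom error')
except ArithmeticError as e:                      # a parent class also matches
    print(type(e).__name__, e)                    # NewZeroDivisionError custom error
```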

2. Debugging

(1) print

Drawback: the debugging prints stay in the code and have to be removed by hand afterwards.

(2) assert

a = 1

b = 2

c = a + b

assert c == 3, 'c is calculation error!'

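Note: if the assertion fails, an AssertionError is raised.

Feature: like print, the asserts stay in the code, but they can be switched off at run time: python -O test.py skips every assert statement.

(3) logging

Recommended: use Python's built-in logging module. For details see my earlier post: Node.js / Python Logging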
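A minimal sketch of why logging beats print: messages carry a level, and everything below the configured level is filtered out without touching the code. The logger name "demo" is a hypothetical choice, and the output goes to an in-memory StringIO stream (covered in the IO chapter below) only so it can be inspected; real code would normally log to the console or a file.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('demo')
logger.addHandler(handler)
logger.setLevel(logging.INFO)   # DEBUG messages are now filtered out
logger.propagate = False        # don't also pass records to the root logger

n = 0
logger.debug('not shown: below the INFO level')
logger.info('n = %d', n)
logger.warning('n is zero, the division below would fail')

print(stream.getvalue())
# INFO:demo:n = 0
# WARNING:demo:n is zero, the division below would fail
```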

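(4) VS Code

Recommended: debugging with VS Code's debugger (breakpoints, stepping, inspecting variables) is much more convenient.

3. Unit tests & doctests

To be written.

VII. IO programming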

1. Reading and writing files

# read / read(size)

with open('./log.txt', 'r') as f:

    print(f.read())

    # print(f.read(size)) # size is the maximum number of characters (bytes in binary mode) to read at a time

# readlines

with open('./log.txt', 'r') as f:

    for line in f.readlines():

        print(line.strip())

# ------------

# write

with open('crawler/demo/tutorial/spiders/log.txt', 'w') as f:

    f.write('1\n2\n3\n')

    # Equivalent to the single call above:

    f.write('1\n')

    f.write('2\n')

    f.write('3\n')

# writelines: the argument is an iterable of strings; no newlines are added between them

with open('crawler/demo/tutorial/spiders/log.txt', 'w') as f:

    f.write('1')

    f.write('2')

    f.write('3')

    # Equivalent to the three calls above:

    f.writelines(['1','2','3'])

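1. The mode "r" can be replaced with:

r - read
rb - read binary
w - write (the file is truncated first)
wb - write binary
a - append

2. In text mode, open() does not always default to UTF-8: the default encoding depends on the platform (locale.getpreferredencoding(False); usually UTF-8 on macOS and Linux, but for example GBK on Chinese Windows). It is safer to specify it explicitly, e.g. open('./log.txt', 'r', encoding='gbk')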
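A minimal round-trip sketch that always passes encoding explicitly, so it behaves the same on every platform. It writes into a temporary directory (via tempfile) instead of a real ./log.txt, an assumption made only to keep the example self-contained:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')

    with open(path, 'w', encoding='utf-8') as f:    # 'w' truncates the file
        f.write('第一行\n')
    with open(path, 'a', encoding='utf-8') as f:    # 'a' appends to the end
        f.writelines(['second line\n', 'third line\n'])

    with open(path, 'r', encoding='utf-8') as f:
        lines = [line.strip() for line in f]        # iterating f yields lines

    with open(path, 'rb') as f:                     # binary mode returns bytes
        raw = f.read(3)

print(lines)                 # ['第一行', 'second line', 'third line']
print(raw.decode('utf-8'))   # 第: one Chinese character takes 3 bytes in UTF-8
```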

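2. StringIO and BytesIO

StringIO and BytesIO let you work with str and bytes in memory through the same interface as reading and writing files (nothing is read from storage, i.e. there is no open()).

Use case: say you download a file from some API and only need to parse it to extract the information you want, without keeping it. The annoying part is that you would then have to delete the file by hand. With StringIO/BytesIO everything happens in memory, and the code barely changes, because you keep using the same file reading and writing interface.

# Write a str with StringIO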

from io import StringIO

f = StringIO()

f.write('hello')

f.write(' ')

f.write('world!')

print(f.getvalue())    # hello world!

# Read a str with StringIO

from io import StringIO

f = StringIO('Hello!\nHi!\nGoodbye!')

for line in f.readlines():

    print(line.strip())

# ------

# Write bytes with BytesIO

from io import BytesIO

f = BytesIO()

f.write('中文'.encode('utf-8'))

print(f.getvalue())    # b'\xe4\xb8\xad\xe6\x96\x87'

# Read bytes with BytesIO

from io import BytesIO

f = BytesIO(b'\xe4\xb8\xad\xe6\x96\x87')

print(f.read().decode('utf-8'))
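To illustrate the "same interface" point: a function written against a file object works unchanged with a StringIO. The function count_lines() is hypothetical. Note that after writing, the position is at the end of the buffer, so seek(0) is needed before reading it back.

```python
from io import StringIO

def count_lines(f):
    """Hypothetical parser: count non-empty lines in any text file-like object."""
    return sum(1 for line in f if line.strip())

# Works with an in-memory buffer exactly as it would with open('log.txt')
print(count_lines(StringIO('Hello!\n\nHi!\nGoodbye!')))   # 3

buf = StringIO()
buf.write('a\nb\n')
print(repr(buf.read()))   # '': the position is at the end after writing
buf.seek(0)               # rewind before reading
print(count_lines(buf))   # 2
```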

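3. Working with files and directories & getting OS information

The os module can get operating system information and environment variables, and work with files and directories.

The third-party psutil module can additionally report CPU, memory, disk, network and process information.

4. Serialization

Turning a variable in memory into something that can be stored or transmitted is called serialization. In Python it is called pickling; other languages also call it serialization, marshalling, flattening and so on, all meaning the same thing.

(1) pickle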

from io import BytesIO

import pickle

d = dict(name='Bob', age=20, score=88)

picklingData = pickle.dumps(d)

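# print(picklingData) # picklingData is now bytes, which can be saved to disk or sent over the network

f = BytesIO(picklingData)

d_new = pickle.load(f) # note: pickle.load takes a file-like object (such as the return value of open(), or a BytesIO); pickle.loads(picklingData) reads the bytes directly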

print(d_new)

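Drawback: like every language-specific serialization format, pickle can only be used from Python, and different Python versions may not be compatible with each other. Also, never unpickle data from an untrusted source: loading a pickle can execute arbitrary code.

(2) JSON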

import json

d = dict(name='Bob', age=20, score=88)

json_str = json.dumps(d)

print(json_str)

# {"name": "Bob", "age": 20, "score": 88}

print(json.loads(json_str))

# {'name': 'Bob', 'age': 20, 'score': 88}

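Advantage: compared with pickle, JSON is a standard, language-independent format, so the data can be shared across languages and platforms. Recommended.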
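Two json options come up often: ensure_ascii=False keeps non-ASCII text such as Chinese readable instead of \uXXXX escapes, and default= tells json.dumps() how to handle objects it cannot serialize, such as class instances (object_hook= does the reverse in json.loads()). A sketch; the Student class here is a simplified hypothetical:

```python
import json

print(json.dumps({'name': '小明'}))                      # {"name": "\u5c0f\u660e"}
print(json.dumps({'name': '小明'}, ensure_ascii=False))  # {"name": "小明"}

class Student(object):
    def __init__(self, name, age):
        self.name = name
        self.age = age

s = Student('Bob', 20)
# json cannot serialize a Student directly; default= converts it to a dict first
json_str = json.dumps(s, default=lambda obj: obj.__dict__)
print(json_str)                  # {"name": "Bob", "age": 20}

# Going back: object_hook receives each decoded dict
s2 = json.loads(json_str, object_hook=lambda d: Student(d['name'], d['age']))
print(s2.name, s2.age)           # Bob 20
```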

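(3) XML

To be written.

Extension: the with statement

The file reading and writing code above uses a with statement. It makes sure f.close() is called automatically after open(), even if an error occurs, so you don't have to write it yourself. Specifically: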

with open('./log.txt', 'r') as f:

    print(f.read())

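# is roughly equivalent to

f = None

try:

    f = open('./log.txt', 'r')

    print(f.read())

finally:

    if f:

        f.close()

Not only the file object returned by open() works with with. Any object that correctly implements the context manager protocol, i.e. whose class defines __enter__ and __exit__, can be used in a with statement. For example: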

class Query(object):

    def __init__(self, name):

        self.name = name

    def __enter__(self):

        print('Begin')

        return self

    def __exit__(self, exc_type, exc_value, traceback):

        if exc_type:

            print('Error')

        else:

            print('End')

    def query(self):

        print('Query info about %s...' % self.name)

with Query('Bob') as q:

    q.query()

# Prints:

Begin

Query info about Bob...

End
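The standard library's contextlib.contextmanager offers a shorter way to get the same behavior: the code before yield plays the role of __enter__, and the code after it the role of __exit__. A sketch of the Query example rewritten that way (create_query is a hypothetical name):

```python
from contextlib import contextmanager

class Query(object):
    def __init__(self, name):
        self.name = name

    def query(self):
        print('Query info about %s...' % self.name)

@contextmanager
def create_query(name):
    print('Begin')
    try:
        yield Query(name)      # this value is bound by "as"
    except Exception:
        print('Error')
        raise                  # re-raise, like __exit__ returning a false value
    else:
        print('End')

with create_query('Bob') as q:
    q.query()
# Begin
# Query info about Bob...
# End
```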
