Introduction

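This post grew out of an earlier task: I had to draw box plots in Python, and the requirements kept piling up, so I went through version after version. Today it occurred to me that the whole thing was worth writing up. I'm about to decide what to work on next and have some free time, so I'm taking the chance to review the basic concepts properly.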

How box plots work

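For the principles, I recommend two well-written articles from this site:

Matplotlib - Box plots: a detailed guide to every use of boxplot()

Drawing box plots in Python and extracting their characteristic values

I used both as references here. A box plot is built as shown in the schematic from the second article: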

[Figure: box plot schematic]
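The quantities in that schematic can be sketched in a few lines of NumPy. This is only an illustration under the usual 1.5×IQR convention; the helper name `box_summary` and the sample values are made up for the example, and it skips the edge cases matplotlib handles.

```python
import numpy as np

def box_summary(values, whis=1.5):
    """Return the quantities a default-style box plot is drawn from."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lo_fence = q1 - whis * iqr
    hi_fence = q3 + whis * iqr
    # Whiskers reach to the most extreme samples that still lie inside the fences
    inside = x[(x >= lo_fence) & (x <= hi_fence)]
    return {
        "q1": q1, "med": med, "q3": q3, "iqr": iqr,
        "whislo": inside.min(), "whishi": inside.max(),
        # Anything outside the fences is drawn as an outlier (flier)
        "fliers": x[(x < lo_fence) | (x > hi_fence)],
    }

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one outlier
summary = box_summary(sample)
print(summary)
```

Here the value 30 falls outside the upper fence, so the upper whisker stops at 9 and 30 is reported as a flier.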

To draw a box plot in Python, the relevant function in the matplotlib source is:

# Autogenerated by boilerplate.py.  Do not edit as changes will be lost.
@_copy_docstring_and_deprecators(Axes.boxplot)
def boxplot(
        x, notch=None, sym=None, vert=None, whis=None,
        positions=None, widths=None, patch_artist=None,
        bootstrap=None, usermedians=None, conf_intervals=None,
        meanline=None, showmeans=None, showcaps=None, showbox=None,
        showfliers=None, boxprops=None, labels=None, flierprops=None,
        medianprops=None, meanprops=None, capprops=None,
        whiskerprops=None, manage_ticks=True, autorange=False,
        zorder=None, capwidths=None, *, data=None):
    return gca().boxplot(
        x, notch=notch, sym=sym, vert=vert, whis=whis,
        positions=positions, widths=widths, patch_artist=patch_artist,
        bootstrap=bootstrap, usermedians=usermedians,
        conf_intervals=conf_intervals, meanline=meanline,
        showmeans=showmeans, showcaps=showcaps, showbox=showbox,
        showfliers=showfliers, boxprops=boxprops, labels=labels,
        flierprops=flierprops, medianprops=medianprops,
        meanprops=meanprops, capprops=capprops,
        whiskerprops=whiskerprops, manage_ticks=manage_ticks,
        autorange=autorange, zorder=zorder, capwidths=capwidths,
        **({"data": data} if data is not None else {}))

(Quoted from: https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/pyplot.py#L2473-L2494)
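As a rough illustration of how a few of these parameters are used, the sketch below calls `boxplot` on synthetic data and inspects what it returns. The data, the random seed, and the Agg backend are assumptions made so the example runs anywhere; they are not part of the original workflow.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Three synthetic groups of 50 samples each
samples = [rng.normal(loc=m, scale=1.0, size=50) for m in (8, 9, 10)]

fig, ax = plt.subplots()
artists = ax.boxplot(samples, widths=0.4, patch_artist=True, showfliers=False)

# boxplot returns a dict that maps component names to lists of artists
print(sorted(artists))
print(len(artists["boxes"]), len(artists["whiskers"]), len(artists["caps"]))
```

Each box contributes one box artist, two whiskers and two caps, which is handy when individual components need restyling after the call.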

Drawing on the explanations in those two articles, I revised some of the parameter descriptions.

Now let's go straight to practice.

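Drawing the box plot

Here is a simplified version. My points come from people detected in a drone video stream, so I'll skip the earlier details. The first step works through each pedestrian's tracked trajectory and extracts how long that person takes to cross the region of interest: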

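def throw_time(array, start_x, end_x, y):
    # Assumed column layout: frame, person ID, x, y, width, height
    indexs = []
    index = 1
    person_throw_time = []
    # IDs start at 1 (0 is skipped). As in the original loop, range() stops
    # before the largest ID; use max_id + 1 if that person should be included.
    max_id = int(array[:, 1].max())
    for i in range(1, max_id):
        each_person_data = array[array[:, 1] == i]

        # Keep only the detections inside the region of interest
        each_person_data = each_person_data[each_person_data[:, 2] > start_x]
        each_person_data = each_person_data[each_person_data[:, 2] < end_x]
        each_person_data = each_person_data[each_person_data[:, 3] > y]
        if each_person_data.shape[0] < 4:
            continue  # too few detections to be useful
        # Shift the coordinates from the box corner to the box centre
        each_person_data[:, 2] = each_person_data[:, 2] + (each_person_data[:, 4] / 2)
        each_person_data[:, 3] = each_person_data[:, 3] + (each_person_data[:, 5] / 2)
        # Frame span multiplied by 0.04 s per frame
        person_time = (each_person_data[-1, 0] - each_person_data[0, 0]) * 0.04
        print("person time = ", person_time)
        if person_time < 5:
            continue  # ignore tracks shorter than 5 s
        person_throw_time.append(person_time)
        indexs.append(index)
        index = index + 1
    return indexs, person_throw_time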

indexs1,person_throw_time1 = throw_time(array1,500,1400,400)      
# print(person_throw_time1)
# [10.36, 9.76, 9.48, 9.56, 6.16, 8.36, 8.6, 8.76, 5.6000000000000005, 9.84, 8.0, 9.88, 8.36, 9.16, 8.0, 8.92, 8.32, 9.68, 7.6000000000000005, 8.24, 7.08, 8.8, 8.6, 9.88, 9.64, 9.36, 10.16, 9.56, 7.4, 9.32, 8.48, 9.88, 9.16, 9.48, 9.64, 8.76]
indexs2,person_throw_time2 = throw_time(array2,500,1400,400)
indexs3,person_throw_time3 = throw_time(array3,450,1300,400)
indexs4,person_throw_time4 = throw_time(array4,600,1400,400)
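The same filtering can also be expressed with a pandas groupby, which is easy to check on synthetic tracks. The sketch below is an alternative, not the code used for the figures: the function name `crossing_times`, the column order (frame, id, x, y, w, h) and the demo array are all assumptions. Unlike the loop above, it visits every ID greater than 0, including the largest one.

```python
import numpy as np
import pandas as pd

def crossing_times(array, start_x, end_x, y, frame_time=0.04, min_rows=4, min_time=5):
    """Per-ID time spent inside the region, mirroring the filters of throw_time."""
    df = pd.DataFrame(array[:, :4], columns=["frame", "id", "x", "y"])
    inside = df[(df["id"] > 0) & (df["x"] > start_x)
                & (df["x"] < end_x) & (df["y"] > y)]
    times = []
    for _, track in inside.groupby("id"):
        if len(track) < min_rows:
            continue  # too few detections
        # Rows are assumed to be in frame order, as in the original loop
        duration = (track["frame"].iloc[-1] - track["frame"].iloc[0]) * frame_time
        if duration >= min_time:
            times.append(duration)
    return times

# Synthetic tracks: ID 1 stays 200 frames in the region, ID 2 only 50 frames,
# and ID 3 is outside the x range entirely.
demo = np.array(
    [[f, 1, 600, 500, 40, 80] for f in range(201)]
    + [[f, 2, 700, 500, 40, 80] for f in range(51)]
    + [[f, 3, 100, 500, 40, 80] for f in range(201)],
    dtype=float,
)
demo_times = crossing_times(demo, 500, 1400, 400)
print(demo_times)
```

Only ID 1 survives the filters here: ID 2 spends 2 s in the region, below the 5 s threshold, and ID 3 never enters it.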

This yields a list of times for each group, together with their indices, which can then be plotted:

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc("font", family='Times New Roman')

# Draw the plot
fig, ax = plt.subplots()
ax.boxplot([person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4],
           widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           # meanline and meanprops only take effect when showmeans=True
           showmeans=True, meanline=True, meanprops={'color': 'red', 'linewidth': 3})
ax.set_ylabel('time(s)', fontsize=18)
# Set the tick labels on the x axis
ax.set_xticklabels(['List 1', 'List 2', 'List 3', 'List 4'], fontsize=14)
plt.show()

[Figure: box plot of the four lists]

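This code creates a box plot with four boxes, one for each of the lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4]. The boxes are filled with sky blue and outlined in black, a red line is drawn at each box's mean, and outliers are not shown.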
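One detail worth checking is the mean line: as far as I can tell, `meanline` and `meanprops` only have an effect when `showmeans=True` is passed as well. The sketch below compares the two cases on made-up samples by counting the mean artists that `boxplot` returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up samples, two groups
groups = [[8.0, 8.5, 9.0, 9.5, 10.0], [7.0, 8.0, 9.0, 9.5, 11.0]]
mean_style = {'color': 'red', 'linewidth': 3}

fig, (ax_a, ax_b) = plt.subplots(1, 2)
# meanline alone: no mean artists are created
without = ax_a.boxplot(groups, meanline=True, meanprops=mean_style)
# showmeans + meanline: one red line per box at the mean
with_means = ax_b.boxplot(groups, showmeans=True, meanline=True, meanprops=mean_style)

print(len(without["means"]), len(with_means["means"]))
```

If the red mean line is missing from a plot, checking for `showmeans=True` is a quick first step.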

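For most people this would probably be enough, and at first I thought so too. What you see above is already a second version, which fixed the off colours and the wrong legend of the first. In the end, though, I got to version 6 and reworked the plotting logic once more.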

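Drawing a box plot from its summary statistics

Then came a twist. I was handed another dataset of unknown origin and asked to produce a comparison plot. It arrived without warning as an Excel sheet that contained only the five box-plot values for each group, with no raw samples. So I tidied up the data, converted my own lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4] into a DataFrame, and used describe to obtain the matching five-number summary (minimum, lower quartile, median, upper quartile, maximum). Because the real data is sensitive, simple numbers stand in for it here: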

import pandas as pd

# Suppose these are your four lists
person_throw_time1 = [1, 2, 3, 4, 5]
person_throw_time2 = [6, 7, 8, 9, 10]
person_throw_time3 = [11, 12, 13, 14, 15]
person_throw_time4 = [16, 17, 18, 19, 20]

# Combine the four lists into one DataFrame
data = pd.DataFrame({'data1': person_throw_time1, 'data2': person_throw_time2,
                     'data3': person_throw_time3, 'data4': person_throw_time4})

# Use describe to compute the summary statistics
statistics = data.describe()

print(statistics)

This gives the corresponding statistics:

           data1      data2      data3      data4
count   5.000000   5.000000   5.000000   5.000000
mean    3.000000   8.000000  13.000000  18.000000
std     1.581139   1.581139   1.581139   1.581139
min     1.000000   6.000000  11.000000  16.000000
25%     2.000000   7.000000  12.000000  17.000000
50%     3.000000   8.000000  13.000000  18.000000
75%     4.000000   9.000000  14.000000  19.000000
max     5.000000  10.000000  15.000000  20.000000
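To get from this table to rows of the form [label, min, q1, median, q3, max], the five relevant index entries can be selected with `.loc`. This is a minimal sketch using two of the placeholder lists; the variable names `frame`, `five` and `rows` are only for illustration.

```python
import pandas as pd

person_throw_time1 = [1, 2, 3, 4, 5]
person_throw_time2 = [6, 7, 8, 9, 10]

frame = pd.DataFrame({'data1': person_throw_time1, 'data2': person_throw_time2})

# Keep only the five-number summary from describe()
five = frame.describe().loc[['min', '25%', '50%', '75%', 'max']]

# One row per column: [label, min, q1, median, q3, max]
rows = [[name, *five[name].tolist()] for name in five.columns]
print(rows)
```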

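I reorganized the data and put the results of the three experiments together (PS: I modified some values, so these are not a standard five-number summary):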

[
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]

That makes little difference here. Plotting the data above again gives:

import matplotlib.pyplot as plt
import matplotlib

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]

# Extract the data and the labels
labels = [row[0] for row in data]
box_data = [row[1:] for row in data]

# Set the font
matplotlib.rc("font", family='Times New Roman')

# Draw the box plot
fig, ax = plt.subplots()
ax.boxplot(box_data, widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           meanline=True, meanprops={'color': 'red', 'linewidth': 3})

# Set the axis labels
ax.set_ylabel('time(s)', fontsize=18)
ax.set_xticklabels(labels, rotation=45, fontsize=12)

plt.show()

[Figure: box plot of the twelve groups drawn with boxplot]

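After plotting, one problem remained: some boxes had lost their upper or lower whisker. I didn't know why at the time. The cause is that boxplot treats each row of five values as raw samples: it recomputes the quartiles and limits the whiskers to 1.5 times the interquartile range, so a minimum or maximum beyond that range counts as an outlier, which showfliers=False then hides. To draw the given values exactly as they are, Python's other way of drawing box plots is needed, which means converting the data above into dictionaries: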

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]


def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data

draw_data = convert_to_dict(data)
print(draw_data)
# [{'whislo': 6.1, 'q1': 9.15, 'med': 9.84, 'q3': 10.44, 'whishi': 11.16}, {'whislo': 7.0, 'q1': 9.47, 'med': 10.05, 'q3': 10.81, 'whishi': 12.02}, {'whislo': 14.16, 'q1': 18.41, 'med': 20.19, 'q3': 21.08, 'whishi': 25.42}, {'whislo': 6.54, 'q1': 8.65, 'med': 9.1, 'q3': 9.39, 'whishi': 10.08}, {'whislo': 7.31, 'q1': 9.1, 'med': 9.5, 'q3': 10.31, 'whishi': 10.86}, {'whislo': 10.32, 'q1': 14.18, 'med': 15.42, 'q3': 18.08, 'whishi': 20.72}, {'whislo': 6.14, 'q1': 8.1, 'med': 8.44, 'q3': 9.1, 'whishi': 9.82}, {'whislo': 6.22, 'q1': 8.3, 'med': 8.7, 'q3': 9.2, 'whishi': 10.12}, {'whislo': 8.72, 'q1': 10.61, 'med': 12.71, 'q3': 16.11, 'whishi': 17.91}, {'whislo': 7.1, 'q1': 8.75, 'med': 8.84, 'q3': 9.1, 'whishi': 10.96}, {'whislo': 7.3, 'q1': 8.85, 'med': 9.04, 'q3': 9.1, 'whishi': 11.19}, {'whislo': 7.6, 'q1': 8.3, 'med': 8.4, 'q3': 9.0, 'whishi': 12.55}]
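To see what `boxplot` would have derived from one of these rows, the helper `matplotlib.cbook.boxplot_stats` can be called directly on the five values. The sketch below uses the "List 1" row; treating the five summary values as raw samples is exactly the situation being checked, not a recommended use.

```python
import numpy as np
from matplotlib import cbook

row = ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16]

# boxplot_stats computes the same statistics that boxplot draws from
stats = cbook.boxplot_stats(np.array(row[1:]))[0]

print(stats["whislo"], stats["q1"], stats["med"], stats["q3"], stats["whishi"])
print(stats["fliers"])
```

For this row the lower whisker collapses onto q1 and 6.1 is classified as a flier, so with `showfliers=False` nothing is drawn below the box.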

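Once the lists have been converted to dictionaries, ax.boxplot() is replaced with ax.bxp(). boxplot computes the statistics from raw samples and then draws them, whereas bxp draws directly from statistics that are already computed. Each box is described by five values (minimum, lower quartile, median, upper quartile and maximum). The code becomes: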


import matplotlib.pyplot as plt
import matplotlib




data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]


def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data

draw_data = convert_to_dict(data)

matplotlib.rc("font", family='Times New Roman')

fig, ax = plt.subplots()

# bxp draws from the precomputed statistics in draw_data.
# meanline and meanprops have no effect here: showmeans is off and the
# dictionaries contain no 'mean' entry.
ax.bxp(draw_data, widths=0.4, patch_artist=True, showfliers=False,
       boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
       meanline=True, meanprops={'color': 'red', 'linewidth': 3})

ax.set_ylabel('time(s)', fontsize=18)
# Reuse the labels stored in the first column of data
ax.set_xticklabels([row[0] for row in data], fontsize=14)
plt.show()

[Figure: box plot of the twelve groups drawn with bxp]
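A small variation, sketched here on the first three rows only: as far as I know, bxp reads an optional 'label' entry from each dictionary and uses it for the tick labels, so the separate set_xticklabels call can be dropped. The `keys` tuple and the Agg backend are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
]

# Map each row onto the keys bxp expects, including the optional 'label'
keys = ("label", "whislo", "q1", "med", "q3", "whishi")
draw_data = [dict(zip(keys, row)) for row in data]

fig, ax = plt.subplots()
artists = ax.bxp(draw_data, widths=0.4, patch_artist=True, showfliers=False,
                 boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'})

fig.canvas.draw()  # make sure the tick labels are populated before reading them
tick_text = [t.get_text() for t in ax.get_xticklabels()]
print(tick_text)
```

Because the whisker ends come straight from 'whislo' and 'whishi', the lower whisker of "List 1" now reaches 6.1 instead of being dropped.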

05-21 21:38