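After covering basic Python syntax in my first year, as arranged by my college's curriculum, I started the Python data analysis course in my second year.
This exercise comes from Problem B of the 2018 泰迪杯 (Teddy Cup) data analysis competition, available on the 数睿思 official site: https://www.tipdm.org/bdrace/index.html. I found it a useful reference.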
Data: 附件1 (Attachment 1)
Libraries needed:
1. pandas (pip install pandas)
2. datetime (part of the Python standard library, no installation needed)
3. numpy (pip install numpy)
4. matplotlib (pip install matplotlib)
Data processing:
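Task 1: From the data in the attachment, extract the sales records for each vending machine (A, B, C, D, E) and save them as
"task-1A.csv", "task-1B.csv", "task-1C.csv", "task-1D.csv", "task-1E.csv"
(the code below names the files task1-A.csv through task1-E.csv instead).
Example code: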
```python
import pandas as pd
from datetime import datetime
import numpy as np
import matplotlib.pyplot as plt


def task1(data1, level):
    # Read the raw data (附件1.csv, GB2312-encoded)
    data = pd.read_csv(r'附件1.csv', encoding='gb2312')
    data['支付时间'] = pd.to_datetime(data['支付时间'], format='%Y/%m/%d')
    # The device ID is not needed once the rows are split by location
    data = data.drop(columns='设备ID')
    # Keep only the rows for the requested vending machine
    data = data.loc[data['地点'] == level, :]
    data = data.drop(columns='地点')
    data.to_csv(r'D:/Study/Programming/Practice/Python/day50/' + data1, encoding='gbk')


task1('task1-A.csv', 'A')
task1('task1-B.csv', 'B')
task1('task1-C.csv', 'C')
task1('task1-D.csv', 'D')
task1('task1-E.csv', 'E')
```
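Each call to `task1` re-reads 附件1.csv. As a minimal sketch of an alternative, assuming the same 地点 and 设备ID columns, the table can be read once and split with `groupby`. The helper name `split_by_location` and the sample rows are made up for illustration and are not part of the original exercise.

```python
import pandas as pd


def split_by_location(df):
    """Return {location: sub-table} with the 设备ID and 地点 columns dropped."""
    parts = {}
    for location, group in df.groupby('地点'):
        parts[location] = group.drop(columns=['设备ID', '地点'])
    return parts


# Made-up sample rows standing in for 附件1.csv
sample = pd.DataFrame({
    '设备ID': ['m1', 'm2', 'm1'],
    '地点': ['A', 'B', 'A'],
    '商品': ['x', 'y', 'z'],
    '实际金额': [3.0, 3.5, 2.0],
})

parts = split_by_location(sample)
for location, part in parts.items():
    # In the real exercise each part would be written out, for example:
    # part.to_csv(f'task1-{location}.csv', encoding='gbk')
    print(location, len(part))
```

Reading the file once matters little for five machines, but it keeps the date parsing and column clean-up in a single place.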

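Task 2: Using the vending machine data, analyze the five best-selling products in June 2017 and draw a bar chart.
Approach: first extract the June data, then compute the sales volume of each product, sort by sales volume to pick out the top five, and finally draw the bar chart.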
```python
data = pd.read_csv(r'附件1.csv', encoding='gbk')
data['支付时间'] = pd.to_datetime(data['支付时间'], format='%Y/%m/%d')
data['月'] = data['支付时间'].dt.month
# Keep only the June rows
data_1 = data.loc[data['月'] == 6]
dalei = data_1['商品'].unique().tolist()
datasum = []  # total paid amount per product
datasem = []  # number of orders per product
for i in dalei:
    data_x = data_1[data_1['商品'] == i]['实际金额'].sum()
    data_t = data_1[data_1['商品'] == i]['商品'].size
    datasum.append(data_x)
    datasem.append(data_t)
task1_2 = pd.DataFrame({'商品': dalei, '总实际金额': datasum, '销售量': datasem})
task1_2.sort_values(by='销售量', ascending=False, inplace=True)
taen = task1_2.head()  # head() returns the first five rows
print(taen)

plt.style.use('ggplot')
plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render the Chinese labels
plt.bar(x=range(taen.shape[0]), tick_label=taen['商品'],
        height=taen['销售量'], color='blue')
plt.ylabel('销售量')
plt.title('6月份商品销售量前五排名')
# Write the sales volume above each bar
for x, y in enumerate(taen['销售量']):
    plt.text(x, y + 0.5, '%s' % round(y), ha='center')
plt.show()
```
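The loop above filters the table once per product. The same count-and-rank step can be sketched with `groupby` and `nlargest`, shown here on a few made-up orders (the product labels `a`, `b`, `c` and the amounts are illustrative assumptions, not data from 附件1.csv).

```python
import pandas as pd

# Made-up June orders, one row per order
orders = pd.DataFrame({
    '商品': ['a', 'b', 'a', 'c', 'a', 'b'],
    '实际金额': [3.0, 4.0, 3.0, 2.5, 3.0, 4.0],
})

# One pass over the data: total amount and order count per product
summary = (
    orders.groupby('商品')['实际金额']
    .agg(总实际金额='sum', 销售量='size')
    .reset_index()
)

# The five products with the most orders (only three exist in this sample)
top = summary.nlargest(5, '销售量')
print(top)
```

`top` has the same three columns as `taen`, so it could be passed to the same plotting code.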

Task 3: Analyze the monthly transaction amount of each vending machine.
```python
def task3(data1):
    data_2 = pd.read_csv(
        r'D:/Study/Programming/Practice/Python/day50/' + data1, encoding='gbk')
    # to_csv stored the parsed dates in ISO form, so no explicit format is given here
    data_2['支付时间'] = pd.to_datetime(data_2['支付时间'])
    data_2['月'] = data_2['支付时间'].dt.month
    # Days in each month of 2017 (not a leap year)
    date = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    data_vm = []    # average transaction value per order, by month
    data_vs = []    # average number of orders per day, by month
    data_mone = []  # month numbers
    data_sum = []   # total transaction amount, by month
    for a in range(1, 13):
        data_t = data_2.loc[data_2['月'] == a]
        data_meny = data_t['实际金额'].sum()
        data_xs = data_t['商品'].size
        # int() truncates the monthly total to a whole number before dividing
        data_vmeny = int(data_meny) / int(data_xs)
        data_vxs = int(data_xs) / int(date[a - 1])
        data_vm.append(data_vmeny)
        data_vs.append(data_vxs)
        data_mone.append(a)
        data_sum.append(data_meny)
    task = pd.DataFrame(
        {'月份': data_mone, '每月的每单平均交易额': data_vm, '日均订单量': data_vs})
    print(task)
    return data_sum


sum_A = task3('task1-A.csv')
sum_B = task3('task1-B.csv')
sum_C = task3('task1-C.csv')
sum_D = task3('task1-D.csv')
sum_E = task3('task1-E.csv')
```
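The `date` list in `task3` hard-codes the month lengths, which is correct for 2017 but would be off for a leap year. As a small sketch, the standard library's `calendar.monthrange` can supply the day count instead. The helper names `days_in_month` and `daily_average` are hypothetical and only illustrate the idea.

```python
import calendar


def days_in_month(year, month):
    # monthrange returns (weekday of the first day, number of days in the month)
    return calendar.monthrange(year, month)[1]


def daily_average(order_count, year, month):
    # Average number of orders per calendar day in the given month
    return order_count / days_in_month(year, month)


# For 2017 this reproduces the hard-coded list
print([days_in_month(2017, m) for m in range(1, 13)])
```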
Vending machine A

| Index | Month | Average transaction value per order | Average daily orders |
|---|---|---|---|
| 0 | 1 | 4.504478 | 10.806452 |
| 1 | 2 | 3.859649 | 4.071429 |
| 2 | 3 | 3.584314 | 8.225806 |
| 3 | 4 | 4.035794 | 14.900000 |
| 4 | 5 | 4.477513 | 24.387097 |
| 5 | 6 | 4.047334 | 55.633333 |
| 6 | 7 | 4.096639 | 15.354839 |
| 7 | 8 | 3.357357 | 21.483871 |
| 8 | 9 | 4.306731 | 34.666667 |
| 9 | 10 | 4.020447 | 50.483871 |
| 10 | 11 | 4.471552 | 38.666667 |
| 11 | 12 | 3.787818 | 64.612903 |
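As a quick sanity check on the machine A figures, the underlying counts can be recovered from the two printed averages. The sketch below works only from the rounded January values in the table, so the results are rounded back to whole numbers. The variable names are illustrative.

```python
DAYS_IN_JANUARY = 31

avg_per_order = 4.504478  # machine A, month 1: average transaction value per order
daily_orders = 10.806452  # machine A, month 1: average daily orders

# daily orders = orders / days, so orders = daily orders * days
orders = round(daily_orders * DAYS_IN_JANUARY)
# average per order = int(total) / orders, so int(total) = average * orders
total = round(avg_per_order * orders)

print(orders, total)
```

This gives 335 orders and a truncated total of 1509 for January, and 1509 / 335 rounds back to the 4.504478 shown in the table.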
Vending machine B

| Index | Month | Average transaction value per order | Average daily orders |
|---|---|---|---|
| 0 | 1 | 3.751366 | 11.806452 |
| 1 | 2 | 3.254054 | 6.607143 |
| 2 | 3 | 3.611321 | 8.548387 |
| 3 | 4 | 4.074627 | 20.100000 |
| 4 | 5 | 4.235903 | 28.032258 |
| 5 | 6 | 4.067888 | 61.866667 |
| 6 | 7 | 4.400000 | 11.129032 |
| 7 | 8 | 3.584098 | 31.645161 |
| 8 | 9 | 4.130086 | 58.166667 |
| 9 | 10 | 4.112043 | 65.354839 |
| 10 | 11 | 4.268341 | 67.700000 |
| 11 | 12 | 3.666968 | 71.290323 |
Vending machine C

| Index | Month | Average transaction value per order | Average daily orders |
|---|---|---|---|
| 0 | 1 | 4.327177 | 12.225806 |
| 1 | 2 | 3.826087 | 7.392857 |
| 2 | 3 | 3.768061 | 8.483871 |
| 3 | 4 | 4.403270 | 24.466667 |
| 4 | 5 | 4.726236 | 25.451613 |
| 5 | 6 | 4.501594 | 62.733333 |
| 6 | 7 | 3.988220 | 24.645161 |
| 7 | 8 | 3.913423 | 40.612903 |
| 8 | 9 | 4.427294 | 55.933333 |
| 9 | 10 | 4.273014 | 71.483871 |
| 10 | 11 | 4.352033 | 64.766667 |
| 11 | 12 | 3.942833 | 76.741935 |
Vending machine D

| Index | Month | Average transaction value per order | Average daily orders |
|---|---|---|---|
| 0 | 1 | 3.691120 | 8.354839 |
| 1 | 2 | 3.085106 | 5.035714 |
| 2 | 3 | 4.302083 | 6.193548 |
| 3 | 4 | 3.790068 | 14.766667 |
| 4 | 5 | 4.241135 | 18.193548 |
| 5 | 6 | 4.025962 | 34.666667 |
| 6 | 7 | 4.227129 | 10.225806 |
| 7 | 8 | 3.316084 | 23.064516 |
| 8 | 9 | 3.899288 | 32.766667 |
| 9 | 10 | 3.883642 | 38.258065 |
| 10 | 11 | 3.861983 | 40.333333 |
| 11 | 12 | 3.572459 | 53.645161 |
Vending machine E

| Index | Month | Average transaction value per order | Average daily orders |
|---|---|---|---|
| 0 | 1 | 4.677966 | 11.419355 |
| 1 | 2 | 3.635659 | 9.214286 |
| 2 | 3 | 4.305714 | 11.290323 |
| 3 | 4 | 4.159777 | 29.833333 |
| 4 | 5 | 4.410991 | 41.677419 |
| 5 | 6 | 3.817586 | 86.433333 |
| 6 | 7 | 3.918819 | 26.225806 |
| 7 | 8 | 3.804188 | 57.000000 |
| 8 | 9 | 4.125302 | 137.800000 |
| 9 | 10 | 3.675909 | 89.580645 |
| 10 | 11 | 4.283068 | 167.333333 |
| 11 | 12 | 4.168819 | 104.903226 |
Task 4: Draw a line chart from the data above.
```python
# Set the font first so the Chinese labels render correctly
plt.rcParams['font.sans-serif'] = ['SimHei']

mone = list(range(1, 13))  # months 1 to 12 for the x-axis

plt.plot(mone, sum_A, linestyle='-', linewidth=2, label='A')
plt.plot(mone, sum_B, linestyle='-', linewidth=2, label='B')
plt.plot(mone, sum_C, linestyle='-', linewidth=2, label='C')
plt.plot(mone, sum_D, linestyle='-', linewidth=2, label='D')
plt.plot(mone, sum_E, linestyle='-', linewidth=2, label='E')
plt.xlabel('月份')
plt.ylabel('交易额')
plt.title('每月交易额折线图')
plt.legend()
plt.show()
```
