2020.04.22

Note1

Renaming pandas columns:

# Method 1: rename df1's columns directly by assignment
df1.columns=['a','B','c']  
print('method1:\n',df1)

Reference: Renaming columns in pandas

Note2

Drawing a line chart with pygal:

import pygal

line_chart = pygal.HorizontalLine()
line_chart.title = 'Browser usage evolution (in %)'
line_chart.x_labels = map(str, range(2002, 2013))
line_chart.add('Firefox', [None, None,    0, 16.6,   25,   31, 36.4, 45.5, 46.3, 42.8, 37.1])
line_chart.add('Chrome',  [None, None, None, None, None, None,    0,  3.9, 10.8, 23.8, 35.3])
line_chart.add('IE',      [85.8, 84.6, 84.7, 74.5,   66, 58.6, 54.7, 44.8, 36.2, 26.6, 20.1])
line_chart.add('Others',  [14.2, 15.4, 15.3,  8.9,    9, 10.4,  8.9,  5.8,  6.7,  6.8,  7.5])
line_chart.range = [0, 100]
line_chart.render_to_file("line-horizontal-line.svg")

2020.04.23

Note3

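Sometimes, while using ModelArts, a Notebook can no longer run or save. This happens because the Notebook has already been shut down automatically by its auto-stop timer. In that case, copy the code out by hand to back it up.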

Note4

Using Python's datetime to get the current year, month, day, hour, minute, and second:

import datetime

now = datetime.datetime.now()

print('now is ', now)
print('the year of now is', now.year)
print('the month of now is ', now.month)
print('the day of now is ', now.day)
print('the hour of now is ', now.hour)
print('the minute of now is ', now.minute)
print('the second of now is ', now.second)

Output (this run was on Python 2, where print with two arguments prints a tuple):

('now is ', datetime.datetime(2019, 6, 14, 15, 42, 27, 601210))
('the year of now is', 2019)
('the month of now is ', 6)
('the day of now is ', 14)
('the hour of now is ', 15)
('the minute of now is ', 42)
('the second of now is ', 27)

Reference: Getting time information with datetime in Python

Note5

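Creating a datetime object from a string (the format must be passed positionally, since strptime does not accept keyword arguments):

datetime.strptime(day, "%Y-%m-%d")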
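
A minimal sketch of the call above, assuming datetime was imported with from datetime import datetime. The value of day is a made-up example string, not data from these notes.

```python
from datetime import datetime

# Hypothetical input string in year-month-day form
day = "2020-04-23"

# Parse the string into a datetime object; time fields default to zero
parsed = datetime.strptime(day, "%Y-%m-%d")
print(parsed)                        # 2020-04-23 00:00:00

# The reverse direction: format a datetime back into a string
print(parsed.strftime("%Y/%m/%d"))   # 2020/04/23
```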

Note6

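The objects produced by pandas' to_datetime are not plain datetime.datetime but Timestamp. Timestamp inherits from the datetime class in Python's standard library and represents a single instant on the time axis, so keep the distinction in mind.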
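
The relationship can be checked directly. This is a small sketch with a made-up date string; to_pydatetime() is the pandas method for getting a plain datetime back.

```python
from datetime import datetime

import pandas as pd

# Hypothetical date string, parsed by pandas
ts = pd.to_datetime("2019-01-12 08:30:00")

print(type(ts).__name__)           # Timestamp
print(isinstance(ts, datetime))    # True, Timestamp subclasses datetime

# Convert to a plain standard-library datetime when one is required
plain = ts.to_pydatetime()
print(type(plain) is datetime)     # True
```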

2020.04.24

Note7

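To use a local dataset during model inference in customize_service.py, upload the extra data and files needed by the model to the same OBS directory as the model file, then implement the reading and usage logic in the preprocess and inference methods of customize_service.py. Inside the script, self.model_path gives the path of the model file within the image (/home/work/predict/model/your_model_name.xxx). Trimming this path and joining it again gives you the other files you uploaded to OBS.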

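Because these files sit in the same folder as the model and are downloaded into the image together with it, they have to be located through model_path; a path relative to customize_service.py will not work. Take the weather.csv you mentioned, and suppose it sits relative to your model file like this:
obs_model_path
│-model_name.xxxx
│-data
│  │-weather.csv

That is, weather.csv is in a data folder at the same level as the model file. Its path can then be obtained as follows: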
import os

relative_path_to_weather = "data/weather.csv" # path relative to the directory that holds the model file
(path_to_model, model_file_name) = os.path.split(self.model_path)
path_to_weather = os.path.join(path_to_model, relative_path_to_weather)
# i.e. /home/work/predict/model/data/weather.csv
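
The same trimming and re-joining can be tried outside the service class. This is a sketch only: sibling_path is a hypothetical helper name, and posixpath is used so the result is the same on any operating system (the image paths are POSIX-style).

```python
import posixpath


def sibling_path(model_path, relative_path):
    """Return the path of a file uploaded next to the model file."""
    # Drop the model file name, keep its directory
    model_dir, _model_file = posixpath.split(model_path)
    # Re-join the directory with the path relative to it
    return posixpath.join(model_dir, relative_path)


# Example value taken from the path pattern quoted in the note
model_path = "/home/work/predict/model/your_model_name.xxx"
path_to_weather = sibling_path(model_path, "data/weather.csv")
print(path_to_weather)  # /home/work/predict/model/data/weather.csv
```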

Reference: https://developer.huaweicloud.com/hero/forum.php?mod=viewthread&tid=51715

2020.05.01

Note8

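Plotting a traffic-flow line chart:

from datetime import datetime
import matplotlib.pyplot as plt

# dataset is assumed to be a DataFrame that has already been loaded
dataset["number"]=dataset["straightFlow"]+dataset["leftFlow"]
# keep one intersection and sum the flow per timestamp
t1=dataset[dataset["cross"]=="wuhe_zhangheng"]
t1=t1.groupby(["time"]).sum().reset_index(level=["time"])

# parse the time strings; timeindex is the minute of the day
xs = list(map(lambda x : datetime.strptime(x, '%Y/%m/%d %H:%M:%S'),t1["time"]))
t1["timeindex"] = [(x.hour*60+x.minute) for x in xs]

fig1=plt.figure(num=1,figsize=(80,8),dpi=80)
ax1=fig1.add_subplot(1,1,1)
ax1.plot(xs,t1["number"])
plt.show()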

Note9

To show dates on the horizontal x-axis in matplotlib, pass datetime objects directly as the x values.

Note10

Use matplotlib.pyplot.subplots_adjust to adjust the margins.

subplots_adjust(left=None, bottom=None, right=None, top=None, wspace=None, hspace=None)

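Parameters
Six optional parameters control the subplot layout. left, bottom, right, and top are fractions of the figure size between 0 and 1, and the region they enclose is the area occupied by the subplots. wspace and hspace are the horizontal and vertical spacing between subplots, given as fractions of the average axes width and height. The actual default values come from the matplotlibrc file.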
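
A small sketch of these parameters in use. The figure size and the margin values are arbitrary choices for illustration, and the Agg backend line is only there so the snippet runs without a display (drop it in a notebook).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove when working in Jupyter
import matplotlib.pyplot as plt

# Two subplots side by side
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

# Margins as fractions of the figure; wspace as a fraction of the average axes width
fig.subplots_adjust(left=0.08, right=0.97, bottom=0.15, top=0.9, wspace=0.3)

# The values now in effect are stored on the figure
print(fig.subplotpars.left, fig.subplotpars.right, fig.subplotpars.wspace)
```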

Reference: https://blog.csdn.net/asty9000/article/details/88881499

Note11

Selecting rows within a given time range in pandas:

t1[(datetime(2019,1,12)<=t1["datetime"]) & (t1["datetime"]<datetime(2019,1,13))]

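Notes

When several filter conditions are combined, they cannot be joined with and; use a single & and wrap each condition in parentheses.
s_date <= df['trade_date'] <= e_date is evaluated as two comparisons joined by and, so it fails in the same way.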
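
Both points can be reproduced on a tiny frame. This is a sketch with made-up data; the trade_date and close columns are illustrative names only.

```python
from datetime import datetime

import pandas as pd

# Hypothetical three-row frame
df = pd.DataFrame({
    "trade_date": pd.to_datetime(["2019-01-11", "2019-01-12", "2019-01-13"]),
    "close": [1.0, 2.0, 3.0],
})
s_date, e_date = datetime(2019, 1, 12), datetime(2019, 1, 13)

# Element-wise combination: each condition in parentheses, joined with &
selected = df[(s_date <= df["trade_date"]) & (df["trade_date"] < e_date)]
print(selected)

# The chained form expands to "... and ...", which needs a single True/False
# for the whole Series and therefore raises ValueError
try:
    df[s_date <= df["trade_date"] <= e_date]
    chained_ok = True
except ValueError as err:
    chained_ok = False
    print("chained comparison failed:", err)
```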

Reference: Filtering data by date range in pandas

Note12

Showing a legend in matplotlib, placed in the upper-left corner:

ax1.plot(range(288),t1[(datetime(2019, 1, 20) <= t1["datetime"]) & (t1["datetime"] < datetime(2019, 1, 21))]["number"],label="1.20")
ax1.legend(loc='upper left')

2020.05.02

Note13

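Drawing horizontal and vertical reference lines in matplotlib:

Purpose: draw a horizontal reference line parallel to the x-axis

Call signature: plt.axhline(y=0.0, c="r", ls="--", lw=2)

y: the y-coordinate at which the horizontal line is drawn
c: line color
ls: line style
lw: line width

axvline() works the same way, taking x as the position of a vertical line.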

plt.axhline(y=0.0, c="r", ls="--", lw=2)
plt.axvline(x=4.0, c="r", ls="--", lw=2)

Reference: Python visualization: the axhline() function

2020.05.03

Note14

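To parse the string returned by json.dumps() back into a dict, use json.loads(), not json.load(). json.dump() and json.load() write to and read from a file object, while the variants ending in s work on strings.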
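
A short sketch of the two pairs side by side, using a made-up dict and an in-memory buffer in place of a real file:

```python
import io
import json

# Hypothetical data to round-trip
data = {"cross": "wuhe_zhangheng", "number": 42}

# String pair: dumps() returns a str, loads() parses a str
text = json.dumps(data)
restored = json.loads(text)

# File pair: dump() writes to a file object and returns None,
# load() reads from a file object
buffer = io.StringIO()
result = json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)

print(type(text).__name__, result, restored == from_file)
```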

2020.05.04

Note15

Counting and printing the missing values (NaN) per column in pandas:

print(feature.isnull().sum())
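
For a self-contained illustration, here is a sketch on a made-up frame; the column names are invented and only feature matches the name used above.

```python
import numpy as np
import pandas as pd

# Hypothetical feature table with a few gaps
feature = pd.DataFrame({
    "flow": [1.0, np.nan, 3.0],
    "speed": [np.nan, np.nan, 30.0],
    "cross": ["a", "b", "c"],
})

# isnull() marks each missing cell; sum() then counts them per column
missing = feature.isnull().sum()
print(missing)

# Show only the columns that actually contain missing values
print(missing[missing > 0])

# Total number of missing cells in the whole frame
total_missing = int(missing.sum())
```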

Note16

When switching the model to XGBoost, the configuration file has to be updated accordingly:

schema_model['model_algorithm'] = "gbtree_regression"
schema_model['model_type'] = "XGBoost"

Note17

To use the early_stopping_rounds parameter in XGBoost, also set 'eval_metric': 'rmse' in the model parameters and pass the evals argument when training; otherwise an error is raised.

2020.05.05

Note18

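Implementing the online scoring formula. The inputs are the predicted values and the true values for one day from 5:00 to 21:00 (lists of length 192), plus the true values for the same period on the day before the prediction. The return value is the weighted total score together with the separate classification and regression scores.

import numpy as np

def sigmoid(x):
    # Logistic function: maps any real x into (0, 1)
    return 1/(1 + np.exp(-x))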

def get_grade(pred,real,pre):
    grade_class=0
    grade_regre=0

    w_sum=sum(real)
    w_i=[]
    for i in range(16):
        w_i.append(sum(real[i*12:(i+1)*12])/w_sum)

    # classification
    w_pred=0
    for i in range(16):
        t=0
        f=lambda pred,real,pre: 0 if (pred-pre)*(real-pre)<0 else 1
        for j in range(12):
            t+=100*f(pred[i*12+j],real[i*12+j],pre[i*12+j])
        w_pred+=t*w_i[i]
    grade_class=w_pred/12

    # regression
    w_pred = 0
    for i in range(16):
        t = 0
        for j in range(0, 12):
            t += 100*sigmoid(30/(pow(real[i*12+j]-pred[i*12+j],2)+1e-9))
        w_pred += t * w_i[i]
    grade_regre = w_pred / 12

    return grade_class*0.4+grade_regre*0.6,grade_class,grade_regre
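
The same formula can also be written without explicit loops. The following is a sketch, not the official scorer: get_grade_vectorized is a hypothetical name, and the sample inputs are made up. It reshapes the 192 values into 16 hours of 12 points each and applies the weights per hour.

```python
import numpy as np


def get_grade_vectorized(pred, real, pre, hours=16, per_hour=12):
    # Arrange each series as (hour, sample within the hour)
    pred = np.asarray(pred, dtype=float).reshape(hours, per_hour)
    real = np.asarray(real, dtype=float).reshape(hours, per_hour)
    pre = np.asarray(pre, dtype=float).reshape(hours, per_hour)

    # Hourly weight: that hour's share of the day's true total
    w = real.sum(axis=1) / real.sum()

    # Classification: a point scores unless prediction and truth moved in
    # opposite directions relative to the previous day's value
    hit = ((pred - pre) * (real - pre) >= 0).astype(float)
    grade_class = float((100 * hit.mean(axis=1) * w).sum())

    # Regression: 100 * sigmoid(30 / (squared error + 1e-9))
    score = 1 / (1 + np.exp(-30 / ((real - pred) ** 2 + 1e-9)))
    grade_regre = float((100 * score.mean(axis=1) * w).sum())

    return 0.4 * grade_class + 0.6 * grade_regre, grade_class, grade_regre


# Made-up inputs: a perfect prediction should score close to 100 everywhere
real = list(range(1, 193))
pre = [50.0] * 192
total, grade_class, grade_regre = get_grade_vectorized(real, real, pre)
print(total, grade_class, grade_regre)
```

Because the sigmoid argument is never negative, each regression term lies between 50 and 100, so the regression score has a floor of 50 even for poor predictions.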

2020.05.06

Note19

To display matplotlib figures inside Jupyter Notebook, add the following as the first line of the code (before the imports is fine):

%matplotlib inline

Note20

Sorting in pandas:

In [1]: frame=pd.DataFrame(np.arange(12).reshape((4,3)),columns=['c','a','b'],index=['D','B','C','A'])
 
   c   a   b
D  0   1   2
B  3   4   5
C  6   7   8
A  9  10  11
 
In [2]: frame.sort_index(axis=0) # sort by row labels
Out[2]:
   c   a   b
A  9  10  11
B  3   4   5
C  6   7   8
D  0   1   2
 
In [3]: frame.sort_index(axis=1) # sort by column labels
Out[3]:
    a   b  c
D   1   2  0
B   4   5  3
C   7   8  6
A  10  11  9

Reference: Reordering columns by column name in pandas

2020.05.17

Note21

Calling json.dump() raised this error:

Object of type 'float32' is not JSON serializable

The fix is as follows:


...

class MyEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        else:
            return super(MyEncoder, self).default(obj)
        
if __name__ == "__main__":
    
    ...

    with open('res.json', 'w') as fw:
        json.dump(res,fw,cls=MyEncoder)

    ...

Reference: https://blog.csdn.net/jacke121/article/details/79231972
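
An alternative to subclassing JSONEncoder is the default argument of json.dump() and json.dumps(), which is called for any object the encoder cannot handle. This is a sketch under that approach; to_builtin and the sample res dict are hypothetical.

```python
import json

import numpy as np


def to_builtin(obj):
    """Convert NumPy values into plain Python values for json."""
    if isinstance(obj, np.generic):      # NumPy scalars such as float32, int64
        return obj.item()
    if isinstance(obj, np.ndarray):      # arrays become nested lists
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Made-up result containing NumPy types
res = {"score": np.float32(0.5), "ids": np.arange(3)}

text = json.dumps(res, default=to_builtin)
print(text)
```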

2020.05.28

Note22

Using Chinese labels in matplotlib:

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']

2020.05.29

Note23

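Selecting a time range with Series.isin(pd.date_range(...)) kept giving me unexpected results: many rows were missing from the selection. The cause turned out to be the freq parameter. Without it, date_range generates only one timestamp per day, so freq="5s" has to be set to control the granularity of the generated time series:

dataset[dataset['datetime'].isin(pd.date_range(start="2019/1/28 00:00:00", end="2019/2/3 23:55:00", freq="5s"))]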
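
The effect of freq is easy to see on synthetic data. This is a sketch assuming one row every 5 minutes over two days; the frame and the 5min frequency are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one row every 5 minutes for two days (2 * 288 rows)
stamps = pd.date_range("2019-01-28 00:00:00", "2019-01-29 23:55:00", freq="5min")
dataset = pd.DataFrame({"datetime": stamps, "number": range(len(stamps))})

start, end = "2019-01-28 00:00:00", "2019-01-29 23:55:00"

# Default frequency is one timestamp per day, so only the midnight rows match
daily = dataset[dataset["datetime"].isin(pd.date_range(start=start, end=end))]

# With a frequency as fine as the data, every row in the range matches
fine = dataset[dataset["datetime"].isin(pd.date_range(start=start, end=end, freq="5min"))]

print(len(daily), len(fine))  # 2 576
```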