Notes on Using HDF Files with pandas

By 飞扬 | September 8, 2017

Importing the library



import pandas as pd

The common Python libraries for reading and writing HDF files are h5py and PyTables; pandas uses PyTables.

Writing a DataFrame directly to an HDF file



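df.to_hdf(hdf_file_name, key=folder_father + '/' + folder_son, append=True, complib='zlib', complevel=9)

# append: whether to append to the key (folder) if it already exists in the HDF file

# complib: the compression library to use; complevel: the compression level (0-9)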
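As a rough illustration of what append=True does, the sketch below writes two small frames under the same key and reads the result back. The data, the temporary file, and the helper name append_in_chunks are made up for this example; the key "stock/day" only mimics the folder_father + '/' + folder_son layout used above.

```python
import os
import tempfile

import pandas as pd  # HDF support also requires the PyTables package ("tables")


def append_in_chunks(path, key):
    """Write two small frames under one key, then read the combined table back."""
    first = pd.DataFrame({"open": [10.0, 11.0], "close": [10.5, 11.2]})
    second = pd.DataFrame({"open": [12.0], "close": [12.4]})
    for chunk in (first, second):
        # append=True adds rows to the existing node instead of overwriting it
        chunk.to_hdf(path, key=key, append=True, complib="zlib", complevel=9)
    return pd.read_hdf(path, key=key)


with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name and key, chosen only for this demo
    combined = append_in_chunks(os.path.join(tmp_dir, "demo.h5"), "stock/day")

print(len(combined))
```

Note that appended rows keep their original index labels, so the combined frame can contain duplicate index values unless the chunks carry a meaningful index.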

Writing to an HDF file through an HDFStore object



store = pd.HDFStore('stock_day.hdf5','a') 

store.append("stock",df_stock,append=True,format="table") 

store.append("index",df_index,append=True,format="table")

store.close()
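A variant worth considering is to open the store in a with block, which closes the file even if a write raises an exception. The sketch below assumes two tiny made-up frames standing in for df_stock and df_index, and writes to a temporary directory rather than a real data file.

```python
import os
import tempfile

import pandas as pd  # requires PyTables ("tables") for HDF support

# Made-up stand-ins for the article's df_stock and df_index
df_stock = pd.DataFrame({"close": [10.5, 11.2]})
df_index = pd.DataFrame({"close": [3365.2, 3379.6]})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "stock_day.hdf5")
    # The with block closes the file on exit, so no explicit store.close() is needed
    with pd.HDFStore(path, mode="a") as store:
        store.append("stock", df_stock, format="table")
        store.append("index", df_index, format="table")
        stored_keys = sorted(store.keys())       # keys are reported with a leading "/"
        stock_rows = len(store.select("stock"))  # read the table back to count rows

print(stored_keys, stock_rows)
```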

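When data is appended with format="table" as above, the HDF file stores it as a table, but it appears that only the index can be queried; the other columns still cannot be used in queries. To make a specific column queryable as well, list it in the data_columns parameter.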



store.append("index",df_index,append=True,format="table",data_columns=["index_code","date","open"])
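To see the effect of data_columns, the following sketch declares one column and leaves another undeclared, then queries both. The frame and column values are invented for the example; in the pandas versions I am familiar with, querying an undeclared column raises a ValueError, which is what the except clause assumes.

```python
import os
import tempfile

import pandas as pd  # requires PyTables ("tables")

# Made-up data: "close" will be a data column, "volume" will not
df_index = pd.DataFrame({"close": [9.5, 10.8, 13.1], "volume": [100, 200, 300]})

with tempfile.TemporaryDirectory() as tmp_dir:
    with pd.HDFStore(os.path.join(tmp_dir, "demo.h5"), mode="w") as store:
        store.append("index", df_index, format="table", data_columns=["close"])

        # "close" was declared in data_columns, so it can appear in where
        high_close = store.select("index", where="close > 10")

        # "volume" was not declared, so the same kind of query is rejected
        try:
            store.select("index", where="volume > 150")
            volume_is_queryable = True
        except ValueError:
            volume_is_queryable = False

print(len(high_close), volume_is_queryable)
```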

Querying HDF data

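Queries on HDF data use the where parameter, but a column can appear in where only if it was listed in data_columns when the data was written. Note that declaring too many columns as data_columns degrades performance. The index can also be used in where (see store.create_table_index()); in my experience there is no real difference in speed. Both creating an index and declaring data_columns make the file larger.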
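For reference, here is one way store.create_table_index() can be used: write the data with index=False so no PyTables index is built during the append, then build the index once afterwards. This is only a sketch with made-up data; checking is_indexed through get_storer(...).table.cols is an inspection trick that relies on the underlying PyTables objects, not something the original code does.

```python
import os
import tempfile

import pandas as pd  # requires PyTables ("tables")

# Made-up data with a date column that will be declared as a data column
df_index = pd.DataFrame(
    {
        "date": pd.date_range("2017-01-01", periods=5, freq="D"),
        "close": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

with tempfile.TemporaryDirectory() as tmp_dir:
    with pd.HDFStore(os.path.join(tmp_dir, "demo.h5"), mode="w") as store:
        # index=False skips building PyTables indexes while writing
        store.append("index", df_index, format="table", data_columns=["date"], index=False)
        indexed_before = bool(store.get_storer("index").table.cols.date.is_indexed)

        # Build the index afterwards, once all the data has been appended
        store.create_table_index("index", columns=["date"], kind="full")
        indexed_after = bool(store.get_storer("index").table.cols.date.is_indexed)

print(indexed_before, indexed_after)
```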



store = pd.HDFStore('stock_day.hdf5','a')

index_df = store.select("index", where=['date>="%s" and date<="%s"' % (startdate, enddate)])

store.close()
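A self-contained version of this date-range query might look like the sketch below. It assumes the date column holds datetime values and was declared in data_columns; the seven-row frame, the date bounds, and the temporary file are all made up for the example, and the two conditions are joined with & as in the pandas documentation.

```python
import os
import tempfile

import pandas as pd  # requires PyTables ("tables")

# Made-up daily data covering 2017-01-01 through 2017-01-07
df_index = pd.DataFrame(
    {"date": pd.date_range("2017-01-01", periods=7, freq="D"), "close": range(7)}
)
startdate, enddate = "2017-01-03", "2017-01-05"

with tempfile.TemporaryDirectory() as tmp_dir:
    with pd.HDFStore(os.path.join(tmp_dir, "stock_day.hdf5"), mode="a") as store:
        # date must be a data column to be usable in where
        store.append("index", df_index, format="table", data_columns=["date"])
        where = 'date >= "%s" & date <= "%s"' % (startdate, enddate)
        index_df = store.select("index", where=where)

print(index_df["close"].tolist())
```

Both bounds are inclusive here, so the query returns the rows for January 3, 4, and 5.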

For more information, see the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/api.html?highlight=hdfstore
