Npz
- class hokohoko.standard.Npz(parameters, symbol_subset=None, origin=0, end=None, lock=None, load=True)
Bases: Data
Loads the data from a NumPy .npz file.
- Parameters
parameters (str) – A string of arguments, from HokohokoConfig.data_parameters. See Parameter Arguments below for details.
symbol_subset (str) – A comma-separated string of specific symbols to load. If the requested symbol doesn’t exist in the data set it is ignored. This list may be augmented by other currency pairs required to convert Orders to the Account base currency.
Other arguments are internal to Hokohoko.
Parameter Arguments:
filename [start_override=TIMESTAMP] [end_override=TIMESTAMP]
filename – The name of the data file to load.
start_override – Specify the timestamp within the data to load from. [Not implemented yet]
end_override – Specify the timestamp within the data to stop at. [Not implemented yet]
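As an illustration of the argument format above, the following is a minimal sketch of how such a string could be split into a filename and optional overrides. The helper `parse_data_parameters` is hypothetical and not part of Hokohoko, and treating TIMESTAMP as a numeric UTC timestamp is an assumption based on the float64 timestamps stored in the data file.

```python
import shlex


def parse_data_parameters(parameters):
    """Split 'filename [key=value ...]' into a filename and an options dict.

    Hypothetical helper for illustration only; not Hokohoko's own parser.
    """
    tokens = shlex.split(parameters)
    if not tokens:
        raise ValueError("expected at least a filename")
    filename, options = tokens[0], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep or key not in ("start_override", "end_override"):
            raise ValueError(f"unrecognised argument: {token!r}")
        # Assumption: TIMESTAMP is a numeric UTC timestamp.
        options[key] = float(value)
    return filename, options


filename, options = parse_data_parameters("data.npz start_override=1262304000")
```

Note that both overrides are documented as not yet implemented, so a string containing only the filename is the safe choice in practice.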
The Data File:
The Hokohoko data.npz file contains 3 records:
1. Symbols available. This is a comma-separated list of currency pairs available in the data.
2. Timestamps. This is an array of numpy.float64s representing the UTC timestamp for each data period.
3. Exchange rate data. This is a 2D array of data, ordered by symbol then timestamp. The rows are in the same order as the Symbols available, and the entries within each row are in the same order as the Timestamps. Each packet of data contains five data points: OPEN, HIGH, LOW, CLOSE and VOLUME.
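The three-record layout can be sketched with a tiny in-memory archive. The record names (`symbols`, `timestamps`, `rates`) and the use of a structured dtype to hold the five data points are assumptions made for illustration; the names and exact packing used in the real data.npz file may differ.

```python
import io

import numpy as np

# Assumption: each packet is a structured record, so the array stays 2D.
PACKET = np.dtype(
    [(name, np.float64) for name in ("open", "high", "low", "close", "volume")]
)

symbols = "AUDUSD,EURUSD"                   # comma-separated symbol list
timestamps = np.array([0.0, 60.0, 120.0])   # one UTC timestamp per period
rates = np.zeros((2, 3), dtype=PACKET)      # indexed as [symbol, timestamp]
rates["close"] = [[0.70, 0.71, 0.72], [1.10, 1.11, 1.12]]

buffer = io.BytesIO()                       # in-memory stand-in for data.npz
np.savez(buffer, symbols=symbols, timestamps=timestamps, rates=rates)
buffer.seek(0)

with np.load(buffer) as data:
    loaded_symbols = data["symbols"].item().split(",")
    loaded_timestamps = data["timestamps"]
    loaded_rates = data["rates"]

# Rows follow the order of the symbol list, so look the row up by name.
row = loaded_symbols.index("EURUSD")
eurusd_close = loaded_rates[row]["close"]
```

Because the row order matches the symbol list, a symbol's position in the list is all that is needed to find its data.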
Important
The initial data file for Hokohoko contains 7 years of per-minute data for 50 currency pairs, and is thus too large to include in the distribution. It is, however, available for download here:
Performance Issues:
Due to Windows caching issues, Hokohoko uses a shared multiprocessing.Lock to synchronise access to the data file. Also due to Windows, the RAM requirements [currently, version 0.1.0-alpha] are significant: approximately 1GB per current process.
- get_symbol_ids()
Returns the list of symbols currently available in the data source. If a subset was selected, this will be the intersection of symbols available and the requested subset. Additional symbols may be loaded for internal use if required.
- Returns
The list of available symbols (as ids).
- Return type
numpy.ndarray[numpy.int64]
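The intersection behaviour described for get_symbol_ids() can be sketched with plain NumPy. This sketch assumes that a symbol id is the symbol's index in the list of available symbols, which the documentation does not state; the symbol names are made up for the example.

```python
import numpy as np

# Symbols present in the data source (illustrative values).
available = np.array(["AUDUSD", "EURUSD", "GBPUSD", "USDJPY"])

# A requested subset; "XXXYYY" is not in the data and is silently ignored.
requested = "EURUSD,USDJPY,XXXYYY".split(",")

# Assumption: ids are positions within the available-symbols list.
symbol_ids = np.flatnonzero(np.isin(available, requested)).astype(np.int64)
```

Here `symbol_ids` holds the positions of EURUSD and USDJPY only, mirroring how an unknown symbol in symbol_subset is dropped rather than raising an error.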
- get_minutes()
Get how many minutes are available in this data source. This may vary depending on loading conditions.
- Returns
Number of minutes.
- Return type
int
- get_partial_data(origin, end)
Retrieve a block of data [origin, end) from the source.
- Parameters
origin (int) – The first minute to get. Note this is an index value, with 0 being the start of the available data.
end (int) – Get up to, but not including, this minute.
- Returns
Two arrays:
1. Per-minute timestamps.
2. Per-symbol, per-minute exchange rate data.
- Return type
tuple(numpy.ndarray, numpy.ndarray)
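The half-open range [origin, end) behaves like an ordinary Python slice. The stand-in function below is not the Npz implementation; it only illustrates the indexing on small made-up arrays, assuming the rate data is indexed as [symbol, minute].

```python
import numpy as np

# Ten minutes of made-up data for two symbols.
timestamps = np.arange(10, dtype=np.float64) * 60.0
rates = np.arange(20, dtype=np.float64).reshape(2, 10)  # [symbol, minute]


def partial_data(origin, end):
    """Illustrative stand-in: return minutes origin..end-1 for every symbol."""
    return timestamps[origin:end], rates[:, origin:end]


ts, block = partial_data(2, 5)  # minutes 2, 3 and 4; minute 5 is excluded
```

With this convention, consecutive calls such as (0, 5) and (5, 10) cover the data without overlapping or skipping a minute.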