Npz

class hokohoko.standard.Npz(parameters, symbol_subset=None, origin=0, end=None, lock=None, load=True)

Bases: Data

Loads the data from a NumPy .npz file.

Parameters
  • parameters (str) – A string of arguments, from HokohokoConfig.data_parameters. See Parameter Arguments below for details.

  • symbol_subset (str) – A comma-separated string of specific symbols to load. If a requested symbol doesn’t exist in the data set, it is ignored. This list may be augmented by other currency pairs required to convert Orders to the Account base currency.

Other arguments are internal to Hokohoko.

Parameter Arguments:

filename [start_override=TIMESTAMP] [end_override=TIMESTAMP]

    filename        The name of the data file to load.

    start_override  Specify the timestamp within the
                    data to load from.
                    [Not implemented yet]

    end_override    Specify the timestamp within the
                    data to stop at.
                    [Not implemented yet]
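
As a rough illustration of the argument format above, the following sketch splits a parameters string into the filename and any key=value overrides. The function name parse_parameters and the returned dictionary are hypothetical and not part of Hokohoko; the real parsing may differ, and both overrides are documented above as not implemented yet. It also assumes TIMESTAMP is numeric and that the filename contains no spaces.

```python
def parse_parameters(parameters):
    """Split 'filename [key=value ...]' into a filename and an options dict.

    Hypothetical helper for illustration only; not Hokohoko's own parser.
    """
    tokens = parameters.split()
    if not tokens:
        raise ValueError("parameters must start with a filename")
    filename, options = tokens[0], {}
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if not sep or key not in ("start_override", "end_override"):
            raise ValueError(f"unrecognised argument: {token!r}")
        options[key] = float(value)  # assumption: TIMESTAMP is numeric
    return filename, options


filename, options = parse_parameters("data.npz start_override=1262304000")
print(filename, options)
```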

The Data File:

The Hokohoko data.npz file contains 3 records:

  1. Symbols available. This is a comma-separated list of currency pairs available in the data.

  2. Timestamps. This is an array of numpy.float64 values representing the UTC timestamp for each data period.

  3. Exchange rate data. This is a 2D array of data, ordered by symbol then timestamp. The rows are in the same order as the Symbols available record, and the entries within each row are in the same order as the Timestamps record. Each packet of data contains five data points: OPEN, HIGH, LOW, CLOSE and VOLUME.
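
To make the three-record layout more concrete, here is a minimal sketch that writes and reads a file of that shape with NumPy. The record names (symbols, timestamps, rates), the structured packet dtype and the sample values are all assumptions made for illustration; the names and dtypes in the real Hokohoko data file are not documented here and may differ.

```python
import io

import numpy as np

# Assumed packet dtype: one OPEN/HIGH/LOW/CLOSE/VOLUME packet per cell.
packet = np.dtype([("open", "f8"), ("high", "f8"), ("low", "f8"),
                   ("close", "f8"), ("volume", "f8")])

symbols = "EUR/USD,GBP/USD"                    # record 1: comma-separated pairs
timestamps = np.array([0.0, 60.0, 120.0])      # record 2: float64 UTC timestamps
rates = np.zeros((2, 3), dtype=packet)         # record 3: indexed [symbol, minute]
rates[1, 2] = (1.60, 1.61, 1.59, 1.605, 42.0)  # GBP/USD at the third minute

buffer = io.BytesIO()                          # in-memory stand-in for data.npz
np.savez(buffer, symbols=symbols, timestamps=timestamps, rates=rates)
buffer.seek(0)

with np.load(buffer) as data:
    names = str(data["symbols"]).split(",")
    row = names.index("GBP/USD")               # row order matches the symbol list
    close = data["rates"][row, 2]["close"]
    print(names, data["timestamps"].dtype, close)
```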

Important

The initial data file for Hokohoko contains 7 years of per-minute data for 50 currency pairs, and is thus too large to include in the distribution. It is, however, available for download here:

Performance Issues:

Due to Windows caching issues, Hokohoko uses a shared multiprocessing.Lock to synchronise access to the data file. Also due to Windows, the RAM requirements (currently, as of version 0.1.0-alpha) are significant: approximately 1GB per concurrent process.
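
The locking pattern described above can be sketched as follows. This only illustrates guarding a load with a shared lock; load_with_lock and the loader callable are hypothetical names, and how Hokohoko passes the lock to its worker processes is not shown here.

```python
import multiprocessing


def load_with_lock(lock, loader):
    """Run loader() while holding lock, so only one process reads at a time."""
    with lock:
        return loader()


# In practice the lock would be created once and handed to every worker process.
shared_lock = multiprocessing.Lock()
result = load_with_lock(shared_lock, lambda: "loaded")
print(result)
```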

get_symbol_ids()

Returns the list of symbols currently available in the data source. If a subset was selected, this will be the intersection of symbols available and the requested subset. Additional symbols may be loaded for internal use if required.

Returns

The list of available symbols (as ids).

Return type

numpy.ndarray[numpy.int64]
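
The intersection behaviour described above might look roughly like the sketch below. It assumes that a symbol id is simply the symbol's position in the available list, which is not confirmed by this documentation, and select_symbol_ids is a hypothetical name. It does not model the extra symbols that may be loaded for currency conversion.

```python
def select_symbol_ids(available, symbol_subset=None):
    """Return positional ids of the available symbols named in symbol_subset.

    Assumption: an id is the symbol's position in the available list.
    """
    names = available.split(",")
    if symbol_subset is None:
        return list(range(len(names)))
    wanted = {s.strip() for s in symbol_subset.split(",")}
    # Requested symbols missing from the data are ignored.
    return [i for i, name in enumerate(names) if name in wanted]


ids = select_symbol_ids("AUD/USD,EUR/USD,GBP/USD", "GBP/USD,XXX/YYY,AUD/USD")
print(ids)
```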

get_minutes()

Returns the number of minutes available in this data source. This may vary depending on loading conditions.

Returns

Number of minutes.

Return type

int

get_partial_data(origin, end)

Retrieve a block of data [origin, end) from the source.

Parameters
  • origin (int) – The first minute to get. Note this is an index value, with 0 being the start of the available data.

  • end (int) – The minute index to read up to. This index is exclusive: the block returned is [origin, end).

Returns

Two arrays:
1. Per-minute timestamps.
2. Per-symbol, per-minute exchange rate data.

Return type

tuple(numpy.ndarray, numpy.ndarray)
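
The half-open [origin, end) indexing can be illustrated with a small stand-in. This is not the Npz implementation: the module-level arrays and the single value per cell (instead of a five-point packet) are simplifications assumed for the example.

```python
import numpy as np

# Stand-in data: 5 minutes of timestamps, 2 symbols with one value per minute.
timestamps = np.arange(5, dtype=np.float64) * 60.0
rates = np.arange(10, dtype=np.float64).reshape(2, 5)


def get_partial_data(origin, end):
    """Return (timestamps, rates) for minute indices origin <= i < end."""
    return timestamps[origin:end], rates[:, origin:end]


ts, block = get_partial_data(1, 4)
print(ts.shape, block.shape)
```

Because the interval excludes end, consecutive calls such as (0, 2) and (2, 5) cover the data without overlapping.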