Data

class hokohoko.entities.Data(parameters, symbol_subset=None, origin=0, end=None, lock=None, load=True)

Bases: object

Defines the interface for a data source. Hokohoko interacts with a Data instance through a with block, so __enter__ and __exit__ should both be overridden. The provided interface assumes an in-memory cache for speed, so data acquisition is generally expected to happen during __enter__, using the parameters already stored on the instance.

Parameters
  • parameters (str) – The parameters specific to this data source, e.g. filename. Stored in self.parameters.

  • symbol_subset (List[str]) – (Optional) The requested symbol list. Stored in self.symbol_subset.

  • origin (int) – (Optional) The minute to load from (inclusive). This is an index value, with 0 being the first minute available in the source. Stored in self.origin.

  • end (int) – (Optional) The minute to load up to. Stored in self.end.

  • lock (multiprocessing.Lock) – A single Lock shared by all instances of this type.

  • load (bool) –

    Indicates if the data is being loaded or not. For a significant speed boost, Hokohoko first initialises the data source once with this as False to request the information required to configure Periods. Each individual period process then loads it, with the value True.

    Because Hokohoko still asks for the available symbols and minutes when this is False, it should only be used to control lazy loading of self.data.

Internal data items available:

These internal items are initially None, and should be set (if used) during __enter__.

symbol_ids:     The intersection of available and
                requested symbols.

timestamps:     The per-minute UTC timestamps for the
                cached data.

data:           The data cache
                (len(symbol_ids), origin:end).

__enter__()

Called upon entry into a with block. This should connect to the data source using the already supplied parameters.

Returns

This should return self.

Return type

hokohoko.entities.Data

Important

This must be overridden.

__exit__(exc_type, exc_val, exc_tb)

Called when a with block is exited, either normally or through an exception. This should disconnect, close and release the data source.

Important

This must be overridden.
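
The with-block protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hokohoko's own implementation: DataBase is a hypothetical stand-in that mirrors the documented constructor so the sketch runs without Hokohoko installed, and InMemoryData, its hard-coded table, the use of list positions as symbol ids, and the plain lists standing in for arrays are all invented for the example.

```python
class DataBase:
    """Hypothetical stand-in mirroring the documented Data constructor."""

    def __init__(self, parameters, symbol_subset=None, origin=0, end=None,
                 lock=None, load=True):
        self.parameters = parameters
        self.symbol_subset = symbol_subset
        self.origin = origin
        self.end = end
        self.lock = lock
        self.load = load
        # Documented internal items: None until __enter__ sets them.
        self.symbol_ids = None
        self.timestamps = None
        self.data = None


class InMemoryData(DataBase):
    """Illustrative source: three minutes of made-up rates for two symbols."""

    _SYMBOLS = ["AAA", "BBB"]                      # assumed symbol names
    _RATES = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]    # assumed rates, one row per symbol
    _TIMESTAMPS = [0, 60, 120]                     # assumed timestamps, in seconds

    def __enter__(self):
        wanted = self._SYMBOLS if self.symbol_subset is None else self.symbol_subset
        # Intersection of available and requested symbols (ids are list positions here).
        self.symbol_ids = [i for i, s in enumerate(self._SYMBOLS) if s in wanted]
        if self.load:
            end = len(self._TIMESTAMPS) if self.end is None else self.end
            self.timestamps = self._TIMESTAMPS[self.origin:end]
            self.data = [self._RATES[i][self.origin:end] for i in self.symbol_ids]
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Release the cache; returning False lets any exception propagate.
        self.symbol_ids = self.timestamps = self.data = None
        return False


with InMemoryData("unused", symbol_subset=["BBB"], origin=1) as source:
    ids = source.symbol_ids
    cached = source.data
```

Inside the block the cache holds only the requested symbol from minute 1 onward; once the block exits, the internal items are reset to None.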

get_symbol_ids()

Returns the list of symbols currently available in the data source. If a subset was selected, this will be the intersection of symbols available and the requested subset. Note that Hokohoko requests additional symbols for internal use if required.

Returns

The list of available symbols (as ids).

Return type

numpy.ndarray[numpy.int64]

Important

This must be overridden.

get_minutes()

Returns the number of minutes available in this data source. This should be the total number of minutes in the data source if load == False; otherwise it should be the number that have been cached (ideally end - origin).

Returns

Number of minutes.

Return type

int

Important

This must be overridden.
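
The two meanings of get_minutes() can be illustrated with a small sketch. MinuteCountData and its TOTAL_MINUTES constant are hypothetical and exist only for this example; the class copies the documented constructor signature but does not derive from the real base class, so that it runs without Hokohoko installed.

```python
class MinuteCountData:
    """Hypothetical source with a fixed number of minutes (not part of Hokohoko)."""

    TOTAL_MINUTES = 1000  # assumed size of the underlying source

    def __init__(self, parameters, symbol_subset=None, origin=0, end=None,
                 lock=None, load=True):
        self.origin = origin
        self.end = end
        self.load = load
        self.data = None

    def __enter__(self):
        if self.load:
            end = self.TOTAL_MINUTES if self.end is None else min(self.end, self.TOTAL_MINUTES)
            # Placeholder cache: one value per cached minute.
            self.data = list(range(self.origin, end))
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.data = None
        return False

    def get_minutes(self):
        if not self.load:
            # Metadata pass: report everything the source holds.
            return self.TOTAL_MINUTES
        # Loaded pass: report what was actually cached.
        return len(self.data)


# First pass: no data loaded, only the total is requested.
with MinuteCountData("unused", load=False) as probe:
    total = probe.get_minutes()

# Later pass: a single period loads its own window.
with MinuteCountData("unused", origin=100, end=400, load=True) as period:
    cached = period.get_minutes()
```

Here the first call reports the whole source, while the second reports only the cached window of end - origin minutes.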

get_partial_data(origin, end)

Retrieve a block of data [origin, end) from the source.

Parameters
  • origin (int) – The first minute to load from. Note this is an index value, with 0 being the start of the available data.

  • end (int) – The minute to load up to (exclusive).

Returns

Two arrays:
1. Per-minute timestamps.
2. Per-symbol, per-minute exchange rate data.

Return type

tuple(numpy.ndarray, numpy.ndarray)

Important

This must be overridden.
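
A minimal sketch of the half-open [origin, end) retrieval, assuming the source is already held in NumPy arrays laid out as (symbols, minutes). ArrayData and the sample values are hypothetical, and the timestamps are assumed to be in seconds; a real data source would read from its backing store instead.

```python
import numpy as np


class ArrayData:
    """Hypothetical source backed by in-memory arrays (not part of Hokohoko)."""

    def __init__(self, timestamps, rates):
        self._timestamps = timestamps  # shape (minutes,)
        self._rates = rates            # shape (symbols, minutes)

    def get_partial_data(self, origin, end):
        # Half-open slice: minute `origin` is included, minute `end` is not.
        return self._timestamps[origin:end], self._rates[:, origin:end]


# Made-up data: 2 symbols, 5 minutes, timestamps 60 seconds apart.
timestamps = np.arange(5, dtype=np.int64) * 60
rates = np.arange(10, dtype=np.float64).reshape(2, 5)

source = ArrayData(timestamps, rates)
ts, block = source.get_partial_data(1, 4)
```

The call returns three minutes (indices 1, 2 and 3) for both symbols, so the second array has one row per symbol and end - origin columns.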