Data
- class hokohoko.entities.Data(parameters, symbol_subset=None, origin=0, end=None, lock=None, load=True)
Bases: object
Defines the interface for a data source. Hokohoko interacts with the Data through a with block, so __enter__ and __exit__ should both be overridden. The provided interface assumes a memory cache for speed, so data acquisition is generally expected to happen during __enter__, using the internal parameters provided.
- Parameters
parameters (str) – The parameters specific to this data source, e.g. filename. Stored in self.parameters.
symbol_subset (List[str]) – (Optional) The requested symbol list. Stored in self.symbol_subset.
origin (int) – (Optional) The minute to load from (inclusive). This is an index value, with 0 being the first minute available in the source. Stored in self.origin.
end (int) – (Optional) The minute to load up to. Stored in self.end.
lock (multiprocessing.Lock) – A single Lock shared by all instances of this type.
load (bool) – Indicates if the data is being loaded or not. For a significant speed boost, Hokohoko first initialises the data source once with this as False to request the information required to configure Periods. Each individual period process then loads it, with the value True. As Hokohoko asks for the available symbols and minutes, this should only be used to control lazy loading of self.data.
Internal data items available:
These internal items are initially None, and should be set (if used) during __enter__.
symbol_ids: The intersection of available and requested symbols.
timestamps: The per-minute UTC timestamps for the cached data.
data: The data cache (len(symbol_ids), origin:end).
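The two-phase use of load described above (one probe with load=False, then one loading instance per period) can be sketched as follows. This is a minimal illustration under stated assumptions, not Hokohoko's own code: ListBackedData is a hypothetical stand-in that does not inherit from hokohoko.entities.Data, its SOURCE table is invented, symbol_subset is stored but ignored, timestamps are assumed to be plain integers, and Python lists stand in for the cached arrays.

```python
# Stand-in sketch: a real data source would subclass hokohoko.entities.Data.
class ListBackedData:
    # Hypothetical in-memory "source": symbol id -> per-minute rates.
    SOURCE = {
        0: [1.10, 1.11, 1.12, 1.13, 1.14],
        1: [0.70, 0.71, 0.72, 0.73, 0.74],
    }

    def __init__(self, parameters, symbol_subset=None, origin=0, end=None,
                 lock=None, load=True):
        self.parameters = parameters
        self.symbol_subset = symbol_subset  # stored, but ignored in this sketch
        self.origin = origin
        self.end = end
        self.lock = lock
        self.load = load
        # Internal items start as None and are set during __enter__.
        self.symbol_ids = None
        self.timestamps = None
        self.data = None
        self._total_minutes = None

    def __enter__(self):
        # Cheap metadata is always gathered, even when load is False.
        self._total_minutes = len(next(iter(self.SOURCE.values())))
        self.symbol_ids = sorted(self.SOURCE)
        if self.load:
            end = self._total_minutes if self.end is None else self.end
            # Assumed timestamp format: seconds, one entry per minute.
            self.timestamps = [60 * m for m in range(self.origin, end)]
            self.data = [self.SOURCE[s][self.origin:end]
                         for s in self.symbol_ids]
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Release the cache; returning False lets exceptions propagate.
        self.symbol_ids = self.timestamps = self.data = None
        return False

    def get_minutes(self):
        # Total minutes when probing, cached minutes when loaded.
        if not self.load:
            return self._total_minutes
        return len(self.timestamps)


# Phase 1: probe the source without filling the cache.
with ListBackedData("unused", load=False) as probe:
    total_minutes = probe.get_minutes()

# Phase 2: a period loads only its own window of minutes.
with ListBackedData("unused", origin=1, end=4) as period:
    cached_minutes = period.get_minutes()
    first_row = period.data[0]
```

In this sketch the probe reports every minute in the source, while the loaded instance reports only the end - origin minutes it cached.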
- __enter__()
Called upon entry into a with block. This should connect to the data source using the already supplied parameters.
- Returns
This should return self.
- Return type
Important
This must be overridden.
- __exit__(exc_type, exc_val, exc_tb)
Called when a with block is exited, either normally or through an exception. This should disconnect, close and release the data source.
Important
This must be overridden.
- get_symbol_ids()
Returns the list of symbols currently available in the data source. If a subset was selected, this will be the intersection of symbols available and the requested subset. Note that Hokohoko requests additional symbols for internal use if required.
- Returns
The list of available symbols (as ids).
- Return type
numpy.ndarray[numpy.int64]
Important
This must be overridden.
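The intersection behaviour described for get_symbol_ids can be illustrated with a small helper. This is only a sketch: intersect_symbol_ids is a hypothetical function, the ids are invented, and plain lists are used here where the documented return type is a NumPy int64 array.

```python
def intersect_symbol_ids(available_ids, requested_ids=None):
    """Return the available ids, restricted to the requested subset if given."""
    if requested_ids is None:
        # No subset selected: everything in the source is available.
        return list(available_ids)
    wanted = set(requested_ids)
    # Keep the source's ordering; ids requested but not available are dropped.
    return [symbol_id for symbol_id in available_ids if symbol_id in wanted]


available = [0, 1, 2, 3]
selected = intersect_symbol_ids(available, [3, 1, 9])
everything = intersect_symbol_ids(available)
```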
- get_minutes()
Get how many minutes are available in this data source. This should be the total number of minutes in the data source if load == False, otherwise how many have been cached (hopefully end - origin).
- Returns
Number of minutes.
- Return type
int
Important
This must be overridden.
- get_partial_data(origin, end)
Retrieve a block of data [origin, end) from the source.
- Parameters
origin (int) – The first minute to load from. Note this is an index value, with 0 being the start of the available data.
end (int) – The minute to load up to (exclusive). Like origin, this is an index value.
- Returns
Two arrays:
1. Per-minute timestamps.
2. Per-symbol, per-minute exchange rate data.
- Return type
tuple(numpy.ndarray, numpy.ndarray)
Important
This must be overridden.
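The half-open range [origin, end) maps directly onto Python slicing. The sketch below shows that indexing only: slice_partial_data is a hypothetical free function rather than the method itself, the sample values are invented, the bounds check is an added assumption, and lists are used in place of the NumPy arrays the method is documented to return.

```python
def slice_partial_data(timestamps, rates, origin, end):
    """Return (timestamps, rates) for minutes m with origin <= m < end."""
    # Assumed policy: reject ranges that fall outside the available minutes.
    if not 0 <= origin <= end <= len(timestamps):
        raise IndexError("origin/end outside the available minutes")
    # rates holds one row per symbol and one column per minute.
    return timestamps[origin:end], [row[origin:end] for row in rates]


timestamps = [0, 60, 120, 180, 240]
rates = [
    [1.10, 1.11, 1.12, 1.13, 1.14],
    [0.70, 0.71, 0.72, 0.73, 0.74],
]
ts, block = slice_partial_data(timestamps, rates, 1, 3)
```

With origin=1 and end=3 the result covers minutes 1 and 2 only, so the block holds end - origin columns per symbol.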