src package

Submodules

src.utility_functions module

Helper functions

src.utility_functions.backtest_strategy(predictions: ndarray[int], X_test: DataFrame, spy: DataFrame, ml_df: DataFrame) → DataFrame

Create a basic strategy dataframe

Parameters:

predictions (np.ndarray) – array of model predictions
X_test (pd.DataFrame) – test set DataFrame
spy (pd.DataFrame) – quotation DataFrame
ml_df (pd.DataFrame) – ML DataFrame

Returns:

strategy DataFrame

Return type:

pd.DataFrame
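
As an illustration only, the sketch below builds a minimal strategy DataFrame from binary predictions. The helper name `toy_backtest`, the column names and the long-or-flat trading rule are all assumptions; the columns actually produced by `backtest_strategy` are not documented here.

```python
import numpy as np
import pandas as pd


def toy_backtest(predictions: np.ndarray, close: pd.Series) -> pd.DataFrame:
    """Hypothetical long-or-flat backtest; not the package's implementation."""
    out = pd.DataFrame(index=close.index)
    # Next-period simple return earned by a position opened at each row.
    out["returns"] = close.pct_change().shift(-1).fillna(0.0)
    # Assumption: prediction 1 means "hold the asset", 0 means "stay in cash".
    out["position"] = predictions
    out["strategy"] = out["position"] * out["returns"]
    # Cumulative growth of 1 unit for buy-and-hold and for the strategy.
    out["cum_buy_hold"] = (1.0 + out["returns"]).cumprod()
    out["cum_strategy"] = (1.0 + out["strategy"]).cumprod()
    return out


close = pd.Series([100.0, 110.0, 99.0, 108.9])
result = toy_backtest(np.array([1, 0, 1, 0]), close)
```

Here the toy strategy skips the one losing period, so its cumulative value ends above buy-and-hold.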

src.utility_functions.compare_qq_plots(df: DataFrame, simulation: ndarray, n: int) → None

QQ plot comparison

Parameters:

df (pd.DataFrame) – Original data
simulation (np.ndarray) – simulations
n (int) – number of steps
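
A QQ plot pairs the quantiles of two samples; if both follow the same distribution, the points fall near the line y = x. The sketch below computes those points without plotting them. The helper name `qq_points` and the synthetic normal samples are assumptions made for illustration and do not reflect how `compare_qq_plots` is implemented.

```python
import numpy as np


def qq_points(sample_a: np.ndarray, sample_b: np.ndarray, n_quantiles: int = 99):
    """Return matching quantiles of two samples (the points of a QQ plot)."""
    # Evenly spaced probabilities strictly inside (0, 1).
    probs = np.linspace(0.01, 0.99, n_quantiles)
    return np.quantile(sample_a, probs), np.quantile(sample_b, probs)


rng = np.random.default_rng(0)
historical = rng.normal(0.0, 0.01, size=5000)  # stand-in for observed log returns
simulated = rng.normal(0.0, 0.01, size=5000)   # stand-in for one simulated path

q_hist, q_sim = qq_points(historical, simulated)
# Largest vertical distance of the QQ points from the line y = x.
max_gap = float(np.max(np.abs(q_hist - q_sim)))
```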

src.utility_functions.compare_return_distribution(df: DataFrame, simulation: ndarray, n: int) → None

Plot distribution of returns for some sample data

Parameters:

df (pd.DataFrame) – Original data
simulation (np.ndarray) – simulations
n (int) – number of steps

src.utility_functions.display_report(y_test: ndarray, predictions: ndarray) → tuple[float, float, float, float]

Display classification report and confusion matrix

Parameters:

y_test (np.ndarray) – true values
predictions (np.ndarray) – predicted values

Returns:

true negative, false positive, false negative, true positive

Return type:

tuple[float, float, float, float]
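
The return order above matches what scikit-learn gives when a binary confusion matrix is flattened. The sketch below shows that idiom on made-up labels; whether `display_report` uses scikit-learn internally is an assumption.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

y_test = np.array([0, 0, 1, 1, 1, 0])
predictions = np.array([0, 1, 1, 1, 0, 0])

# For binary labels {0, 1}, ravel() flattens the 2x2 matrix row by row, giving
# the order: true negative, false positive, false negative, true positive.
tn, fp, fn, tp = confusion_matrix(y_test, predictions).ravel()

print(classification_report(y_test, predictions))
```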

src.utility_functions.get_df(path: str, start_date: str, end_date: str) → DataFrame

src.utility_functions.get_standard_returns(s: Series) → Series

Standardize series

Parameters:

s (pd.Series) – Price series

Returns:

standardized returns

Return type:

pd.Series
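
One common way to standardize returns is to z-score them: subtract the mean and divide by the standard deviation. The sketch below does this for log returns. The helper name `standardized_log_returns` and the choice of log returns rather than simple returns are assumptions; the documentation does not say which definition `get_standard_returns` uses.

```python
import numpy as np
import pandas as pd


def standardized_log_returns(prices: pd.Series) -> pd.Series:
    """Hypothetical helper: z-scored log returns of a price series."""
    # Log return between consecutive prices; the first value is undefined and dropped.
    log_ret = np.log(prices / prices.shift(1)).dropna()
    # Subtract the mean and divide by the sample standard deviation.
    return (log_ret - log_ret.mean()) / log_ret.std()


prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5])
z = standardized_log_returns(prices)
```

The result has one fewer element than the price series, with mean 0 and sample standard deviation 1.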

src.utility_functions.objective_catboost(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 5) → float

Objective function for CatBoost

Parameters:

trial (optuna.Trial) – optuna trial
X_train (pd.DataFrame) – train set
y_train (np.ndarray) – target
metric (str) – the metric used
n_splits (int, optional) – number of splits in the CV. Defaults to 5.

Returns:

score (with penalty)

Return type:

float

src.utility_functions.objective_lightgbm(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 5, seed: int = 1968, max_iter: int = 35, max_dep: int = 12) → float

Objective function for LightGBM

Parameters:

trial (optuna.Trial) – optuna trial
X_train (pd.DataFrame) – train set
y_train (np.ndarray) – target
metric (str) – the metric used
n_splits (int, optional) – number of splits in the CV. Defaults to 5
seed (int, optional) – seed for KFold. Defaults to 1968.
max_iter (int, optional) – maximum number of iterations, to prevent overfitting. Defaults to 35.
max_dep (int, optional) – maximum depth, to prevent overfitting. Defaults to 12.

Returns:

mean score (with penalty)

Return type:

float

src.utility_functions.objective_logistic_regression(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 10) → float

Objective function for Logistic Regression

Parameters:

trial (optuna.Trial) – optuna trial
X_train (pd.DataFrame) – train set
y_train (np.ndarray) – target
metric (str) – the metric used
n_splits (int, optional) – number of splits in the CV. Defaults to 10.

Returns:

metric

Return type:

float

src.utility_functions.objective_random_forest(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 10) → float

Objective function for Random Forest

Parameters:

trial (optuna.Trial) – optuna trial
X_train (pd.DataFrame) – train set
y_train (np.ndarray) – target
metric (str) – the metric used
n_splits (int, optional) – number of splits in the CV. Defaults to 10.

Returns:

metric

Return type:

float

src.utility_functions.objective_svc(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 10) → float

Objective function for SVC

Parameters:

trial (optuna.Trial) – optuna trial
X_train (pd.DataFrame) – train set
y_train (np.ndarray) – target
metric (str) – the metric used
n_splits (int, optional) – number of splits in the CV. Defaults to 10.

Returns:

metric

Return type:

float

src.utility_functions.plot_feature_imp(coefficients: ndarray[float], columns: list[str]) → None

Plot feature importance

Parameters:

coefficients (np.ndarray[float]) – coefficients
columns (list[str]) – feature names

src.utility_functions.plot_ks_comparison(res_list: list[float], title: str | None = None) → None

Bar plot for Kolmogorov-Smirnov test for goodness of fit results

Parameters:

res_list (list[float]) – list of results
title (str | None, optional) – plot title. Defaults to None.
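
For context, the sketch below shows one way a list of Kolmogorov-Smirnov results could be produced before plotting, using SciPy's two-sample test. The synthetic samples are made up, and whether `res_list` is meant to hold KS statistics or p-values is an assumption; statistics are used here.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
historical = rng.normal(0.0, 1.0, size=2000)  # stand-in for standardized returns
simulations = {
    "same distribution": rng.normal(0.0, 1.0, size=2000),
    "shifted mean": rng.normal(0.5, 1.0, size=2000),
}

# The KS statistic is the largest gap between the two empirical CDFs:
# 0 means identical samples, values near 1 mean almost no overlap.
res_list = [float(ks_2samp(historical, sim).statistic) for sim in simulations.values()]
```

A smaller statistic indicates a closer fit, so the shifted sample scores visibly worse than the matching one.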

src.utility_functions.plot_multiple_return_comparison(df: DataFrame, simulation: ndarray, n: int) → None

Plot multiple log returns comparison

Parameters:

df (pd.DataFrame) – Original data
simulation (np.ndarray) – simulations
n (int) – number of steps

src.utility_functions.plot_return_comparison(df: DataFrame, simulation: ndarray, n: int) → None

Plot the log returns comparison

Parameters:

df (pd.DataFrame) – Original data
simulation (np.ndarray) – simulations
n (int) – number of steps

src.utility_functions.plot_simulated_paths(df: DataFrame, simulation: ndarray, n: int) → None

Plot simulated paths (example)

Parameters:

df (pd.DataFrame) – Original data
simulation (np.ndarray) – simulations
n (int) – number of steps

src.utility_functions.plot_strategy(strategy_df: DataFrame, model_name: str, strategy_desc: str) → None

Plot the strategy vs bare strategy

Parameters:

strategy_df (pd.DataFrame) – strategy DataFrame
model_name (str) – name of the model used
strategy_desc (str) – additional strategy description

src.utility_functions.select_threshold(proba: ndarray[float], target: ndarray[int], fpr_max: float = 0.1) → float

Compute the best threshold given the maximum acceptable false positive rate

Parameters:

proba (np.ndarray[float]) – predicted probabilities
target (np.ndarray[int]) – true values
fpr_max (float, optional) – maximum acceptable false positive rate. Defaults to 0.1.

Returns:

best threshold

Return type:

float
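
A threshold of this kind can be read off the ROC curve: among the points whose false positive rate does not exceed the limit, take the one with the highest true positive rate. The sketch below does this with scikit-learn's `roc_curve`. The helper name `pick_threshold` and the "highest recall" tie-breaking rule are assumptions, not necessarily what `select_threshold` does.

```python
import numpy as np
from sklearn.metrics import roc_curve


def pick_threshold(proba: np.ndarray, target: np.ndarray, fpr_max: float = 0.1) -> float:
    """Hypothetical helper: highest-recall threshold with FPR <= fpr_max."""
    fpr, tpr, thresholds = roc_curve(target, proba)
    allowed = fpr <= fpr_max  # always includes the first ROC point (FPR = 0)
    # Among the allowed points, keep the one with the highest true positive rate.
    best = np.argmax(np.where(allowed, tpr, -1.0))
    return float(thresholds[best])


proba = np.array([0.05, 0.2, 0.3, 0.45, 0.6, 0.7, 0.8, 0.9])
target = np.array([0, 0, 1 - 1, 1, 0, 1, 1, 1])
threshold = pick_threshold(proba, target)  # default fpr_max of 0.1

# Samples with proba >= threshold are classified as positive.
labels = (proba >= threshold).astype(int)
```

With only four negatives in this toy data, the false positive rate moves in steps of 0.25, so the default limit of 0.1 allows no false positives at all.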

src.utility_functions.tune_params(X_train: DataFrame, y_train: ndarray, metric: str, timeout: int, max_iter: int = 35, max_dep: int = 12, n_splits: int = 5) → dict

Tune hyperparameters using Optuna

Parameters:

X_train (pd.DataFrame) – Train set
y_train (np.ndarray) – train labels
metric (str) – metric to use
timeout (int) – timeout
max_iter (int, optional) – maximum number of iterations, to prevent overfitting. Defaults to 35.
max_dep (int, optional) – maximum depth, to prevent overfitting. Defaults to 12.
n_splits (int, optional) – number of splits in KFold. Defaults to 5.

Returns:

tuned parameters

Return type:

dict

Module contents

Source code of your project
