src package

Subpackages

Submodules

src.utility_functions module

Helper functions

src.utility_functions.backtest_strategy(predictions: ndarray[int], X_test: DataFrame, spy: DataFrame, ml_df: DataFrame) DataFrame[source]

Create a basic strategy dataframe

Parameters:
  • predictions (np.ndarray) – predictions array

  • X_test (pd.DataFrame) – test df

  • spy (pd.DataFrame) – quotation df

  • ml_df (pd.DataFrame) – ml df

Returns:

strategy df

Return type:

pd.DataFrame

src.utility_functions.compare_qq_plots(df: DataFrame, simulation: ndarray, n: int) None[source]

QQ plot comparison

Parameters:
  • df (pd.DataFrame) – Original data

  • simulation (np.ndarray) – simulations

  • n (int) – number of steps

src.utility_functions.compare_return_distribution(df: DataFrame, simulation: ndarray, n: int) None[source]

Plot distribution of returns for some sample data

Parameters:
  • df (pd.DataFrame) – Original data

  • simulation (np.ndarray) – simulations

  • n (int) – number of steps

src.utility_functions.display_report(y_test: ndarray, predictions: ndarray) tuple[float, float, float, float][source]

Display classification report and confusion matrix

Parameters:
  • y_test (np.ndarray) – true values

  • predictions (np.ndarray) – predicted values

Returns:

true negative, false positive, false negative, true positive

Return type:

tuple[float,float,float,float]
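
The order of the four returned counts can be illustrated with a minimal sketch. It assumes binary labels coded as 0/1; the helper name `confusion_counts` is hypothetical and not part of `src.utility_functions`, whose actual implementation is not shown here and may rely on scikit-learn instead.

```python
import numpy as np


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels (hypothetical helper)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tn = float(np.sum(~y_true & ~y_pred))  # predicted 0, actually 0
    fp = float(np.sum(~y_true & y_pred))   # predicted 1, actually 0
    fn = float(np.sum(y_true & ~y_pred))   # predicted 0, actually 1
    tp = float(np.sum(y_true & y_pred))    # predicted 1, actually 1
    return tn, fp, fn, tp


# Illustrative data only, not taken from the project.
y_test = np.array([0, 0, 1, 1, 1, 0])
predictions = np.array([0, 1, 1, 0, 1, 0])
tn, fp, fn, tp = confusion_counts(y_test, predictions)
```

Unpacking the tuple in the documented order (true negative, false positive, false negative, true positive) keeps downstream metrics such as the false positive rate, `fp / (fp + tn)`, unambiguous.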

src.utility_functions.get_df(path: str, start_date: str, end_date: str) DataFrame[source]

src.utility_functions.get_standard_returns(s: Series) Series[source]

Compute standardized returns from a price series

Parameters:

s (pd.Series) – Price series

Returns:

standardized returns

Return type:

pd.Series
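
The exact definition used by `get_standard_returns` is not documented above. As a sketch only, one common reading is the z-score of log returns; both that definition and the helper name `standardized_returns` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def standardized_returns(prices: pd.Series) -> pd.Series:
    """Z-score of log returns (assumed definition; hypothetical helper)."""
    # Log return between consecutive prices; the first value is NaN and is dropped.
    log_returns = np.log(prices).diff().dropna()
    # Centre on the sample mean and scale by the sample standard deviation.
    return (log_returns - log_returns.mean()) / log_returns.std()


# Illustrative prices only, not taken from the project.
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 103.5])
z = standardized_returns(prices)
```

Under this assumption the result has one fewer element than the input, with sample mean 0 and sample standard deviation 1.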

src.utility_functions.objective_catboost(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 5) float[source]

Objective function for CatBoost

Parameters:
  • trial (optuna.Trial) – optuna trial

  • X_train (pd.DataFrame) – train set

  • y_train (np.ndarray) – target

  • metric (str) – the metric used

  • n_splits (int, optional) – number of splits in the CV. Defaults to 5.

Returns:

score (with penalty)

Return type:

float

src.utility_functions.objective_lightgbm(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 5, seed: int = 1968, max_iter: int = 35, max_dep: int = 12) float[source]

Objective function for LightGBM

Parameters:
  • trial (optuna.Trial) – optuna trial

  • X_train (pd.DataFrame) – train set

  • y_train (np.ndarray) – target

  • metric (str) – the metric used

  • n_splits (int, optional) – number of splits in the CV. Defaults to 5.

  • seed (int, optional) – seed for KFold. Defaults to 1968.

  • max_iter (int, optional) – maximum number of iterations, to prevent overfitting. Defaults to 35.

  • max_dep (int, optional) – maximum depth, to prevent overfitting. Defaults to 12.

Returns:

mean score (with penalty)

Return type:

float

src.utility_functions.objective_logistic_regression(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 10) float[source]

Objective function for Logistic Regression

Parameters:
  • trial (optuna.Trial) – optuna trial

  • X_train (pd.DataFrame) – train set

  • y_train (np.ndarray) – target

  • metric (str) – the metric used

  • n_splits (int, optional) – number of splits in the CV. Defaults to 10.

Returns:

metric

Return type:

float

src.utility_functions.objective_random_forest(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 10) float[source]

Objective function for Random Forest

Parameters:
  • trial (optuna.Trial) – optuna trial

  • X_train (pd.DataFrame) – train set

  • y_train (np.ndarray) – target

  • metric (str) – the metric used

  • n_splits (int, optional) – number of splits in the CV. Defaults to 10.

Returns:

metric

Return type:

float

src.utility_functions.objective_svc(trial: Trial, X_train: DataFrame, y_train: ndarray, metric: str, n_splits: int = 10) float[source]

Objective function for SVC

Parameters:
  • trial (optuna.Trial) – optuna trial

  • X_train (pd.DataFrame) – train set

  • y_train (np.ndarray) – target

  • metric (str) – the metric used

  • n_splits (int, optional) – number of splits in the CV. Defaults to 10.

Returns:

metric

Return type:

float

src.utility_functions.plot_feature_imp(coefficients: ndarray[float], columns: list[str]) None[source]

Plot feature importance

Parameters:
  • coefficients (np.ndarray[float]) – coefficients

  • columns (list[str]) – feature names

src.utility_functions.plot_ks_comparison(res_list: list[float], title: str | None = None) None[source]

Bar plot for Kolmogorov-Smirnov test for goodness of fit results

Parameters:
  • res_list (list[float]) – list of results

  • title (str | None, optional) – plot title. Defaults to None.
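
How the entries of `res_list` are produced is not documented above. As a rough sketch, a two-sample Kolmogorov-Smirnov statistic comparing original and simulated returns could be computed as follows; the helper name `ks_statistic` and the sample data are hypothetical, and `scipy.stats.ks_2samp` offers a tested alternative that also returns a p-value.

```python
import numpy as np


def ks_statistic(a, b) -> float:
    """Two-sample KS statistic: largest gap between the empirical CDFs (hypothetical helper)."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    pooled = np.concatenate([a, b])
    # Empirical CDF of each sample, evaluated at every pooled observation.
    cdf_a = np.searchsorted(a, pooled, side="right") / a.size
    cdf_b = np.searchsorted(b, pooled, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


# Illustrative samples only: a normal sample against a heavier-tailed one.
rng = np.random.default_rng(0)
original = rng.normal(size=500)
simulated = rng.standard_t(df=3, size=500)
stat = ks_statistic(original, simulated)
```

A statistic near 0 means the two empirical distributions are close; a value near 1 means they barely overlap.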

src.utility_functions.plot_multiple_return_comparison(df: DataFrame, simulation: ndarray, n: int) None[source]

Plot multiple log returns comparison

Parameters:
  • df (pd.DataFrame) – Original data

  • simulation (np.ndarray) – simulations

  • n (int) – number of steps

src.utility_functions.plot_return_comparison(df: DataFrame, simulation: ndarray, n: int) None[source]

Plot the log returns comparison

Parameters:
  • df (pd.DataFrame) – Original data

  • simulation (np.ndarray) – simulations

  • n (int) – number of steps

src.utility_functions.plot_simulated_paths(df: DataFrame, simulation: ndarray, n: int) None[source]

Plot simulated paths (example)

Parameters:
  • df (pd.DataFrame) – Original data

  • simulation (np.ndarray) – simulations

  • n (int) – number of steps

src.utility_functions.plot_strategy(strategy_df: DataFrame, model_name: str, strategy_desc: str) None[source]

Plot the strategy vs bare strategy

Parameters:
  • strategy_df (pd.DataFrame) – strategy df

  • model_name (str) – name of the model used

  • strategy_desc (str) – additional strategy description

src.utility_functions.select_threshold(proba: ndarray[float], target: ndarray[int], fpr_max: float = 0.1) float[source]

Compute the best threshold given the maximum acceptable false positive rate

Parameters:
  • proba (np.ndarray[float]) – predicted probabilities

  • target (np.ndarray[int]) – true values

  • fpr_max (float, optional) – maximum acceptable false positive rate. Defaults to 0.1.

Returns:

best threshold

Return type:

float
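
The selection rule is not spelled out above. The sketch below shows one plausible interpretation: among candidate thresholds whose false positive rate does not exceed `fpr_max`, keep the one with the highest true positive rate. The helper name `pick_threshold` and the tie-breaking behaviour are assumptions, not the project's implementation.

```python
import numpy as np


def pick_threshold(proba, target, fpr_max=0.1):
    """Highest-TPR threshold whose FPR stays within fpr_max (hypothetical helper)."""
    proba = np.asarray(proba, dtype=float)
    target = np.asarray(target).astype(bool)
    n_neg = max(int(np.sum(~target)), 1)  # guard against division by zero
    n_pos = max(int(np.sum(target)), 1)
    # Fallback of 1.0 is returned when no candidate satisfies the constraint.
    best_threshold, best_tpr = 1.0, -1.0
    # Every distinct predicted probability is a candidate, scanned from high to low.
    for threshold in np.unique(proba)[::-1]:
        predicted = proba >= threshold
        fpr = np.sum(predicted & ~target) / n_neg
        tpr = np.sum(predicted & target) / n_pos
        # Strict inequality keeps the higher threshold when TPR ties.
        if fpr <= fpr_max and tpr > best_tpr:
            best_threshold, best_tpr = float(threshold), tpr
    return best_threshold


# Illustrative data only, not taken from the project.
proba = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]
target = [0, 0, 1, 1, 1, 0]
strict = pick_threshold(proba, target, fpr_max=0.1)
loose = pick_threshold(proba, target, fpr_max=0.5)
```

Raising `fpr_max` can only lower or keep the selected threshold in this sketch, trading more false positives for more detected positives.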

src.utility_functions.tune_params(X_train: DataFrame, y_train: ndarray, metric: str, timeout: int, max_iter: int = 35, max_dep: int = 12, n_splits: int = 5) dict[source]

Tune hyperparameters using Optuna

Parameters:
  • X_train (pd.DataFrame) – Train set

  • y_train (np.ndarray) – train labels

  • metric (str) – metric to use

  • timeout (int) – timeout

  • max_iter (int, optional) – maximum number of iterations, to prevent overfitting. Defaults to 35.

  • max_dep (int, optional) – maximum depth, to prevent overfitting. Defaults to 12.

  • n_splits (int, optional) – number of splits in KFold. Defaults to 5.

Returns:

tuned parameters

Return type:

dict

Module contents

Source code of your project