
Pipeline

class vtacML.pipeline.VTACMLPipe(config_file='config/config.yaml')

Bases: object

A machine learning pipeline for training and evaluating an optimal model for optical identification of GRBs for the SVOM mission.

Parameters:

config_file (str, optional) – Path to the configuration file. Default is ‘config/config.yaml’.

evaluate(name, plot=False, score=<function f1_score>)

Evaluate the best model with various metrics and visualization.

Parameters:
  • name (str) – The name for the evaluation output.

  • plot (bool, optional) – If True, generates and saves evaluation plots, by default False.

  • score (callable, optional) – The scoring function to use for evaluation, by default f1_score.
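
The default score is f1_score, presumably scikit-learn's sklearn.metrics.f1_score (the signature shows only the function name). As a rough illustration of what that metric measures for a binary task such as GRB versus non-GRB, the pure-Python sketch below computes F1 from true and predicted labels. The helper f1_binary is hypothetical and not part of vtacML.

```python
def f1_binary(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (illustrative only, not SVOM data).
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
score = f1_binary(y_true, y_pred)
```

Any callable taking true labels and predicted labels in that order could plausibly be passed as score, assuming evaluate calls it the way scikit-learn metrics are called.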

load_best_model(model_name)

Load the model named ‘model_name’ into the current pipeline.

Parameters:

model_name (str) – The name of the model from the Outputs/models/ directory to be loaded.

load_config(config_file)

Load the configuration file and prepare the data.

Parameters:

config_file (str) – The path to the configuration file.

predict(X, prob=False)

Predict using the best model.

Parameters:
  • X (DataFrame) – The input features for prediction.

  • prob (bool, optional) – If True, returns the probability of the predictions, by default False.

Returns:

The predicted values or probabilities.

Return type:

ndarray
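
One reason to request probabilities with prob=True is to apply a custom decision threshold instead of the default class labels. The sketch below assumes a one-dimensional sequence of positive-class probabilities; the actual shape returned by predict is not documented here, and labels_from_probabilities is a hypothetical helper, not part of vtacML.

```python
def labels_from_probabilities(probabilities, threshold=0.5):
    """Map positive-class probabilities to 0/1 labels.

    A sample is labeled 1 only when its probability is strictly
    greater than the threshold.
    """
    return [1 if p > threshold else 0 for p in probabilities]


# Toy probabilities (illustrative only).
probs = [0.10, 0.80, 0.50, 0.35]
default_labels = labels_from_probabilities(probs)
# A lower threshold flags more candidates, trading precision for recall.
permissive_labels = labels_from_probabilities(probs, threshold=0.3)
```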

save_best_model(model_name='best_model', model_path=None)

Save the best model from training to the path specified in the config file. The model name and the path can optionally be overridden.

Parameters:
  • model_name (str, optional) – Name of the model to be saved. Default is ‘best_model’.

  • model_path (str, optional) – Path where the model is saved. Default is the ‘model_path’ entry in the config file.
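
The save and load round trip can be sketched with the standard library. This assumes models are serialized with pickle, which the ‘.pkl’ extension of the default model name used elsewhere in this module suggests but does not confirm (the library may use joblib instead). The functions save_model and load_model are hypothetical stand-ins, not the vtacML methods.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, model_name="best_model", model_path="."):
    """Pickle `model` to <model_path>/<model_name>.pkl and return the path."""
    target = Path(model_path) / f"{model_name}.pkl"
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("wb") as fh:
        pickle.dump(model, fh)
    return target


def load_model(path):
    """Load a pickled model. Only unpickle files from a trusted source."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# A plain dict stands in for a fitted estimator in this sketch.
stand_in_model = {"kind": "rfc", "params": {"n_estimators": 100}}

with tempfile.TemporaryDirectory() as tmp:
    saved = save_model(stand_in_model, model_path=tmp)
    restored = load_model(saved)
    saved_name = saved.name
```

Unpickling executes code embedded in the file, so model files should only be loaded from sources you trust.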

train(save_all_model=False, resample_flag=False, scoring='f1', cv=5)

Train the pipeline with the given data.

Parameters:
  • save_all_model (bool, optional) – Whether to save the best model of each model type to the output directory. Default is False.

  • resample_flag (bool, optional) – Whether to resample the data. Default is False.

  • scoring (str, optional) – The scoring function to use. Default is ‘f1’.

  • cv (int, optional) – The number of cross-validation folds to use. Default is 5.

Returns:

Trained machine learning pipeline.

Return type:

Pipeline
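
To make the cv parameter concrete: with cv=5 the training data is presumably split into five folds, each serving once as the validation set while the other four are used for fitting. The sketch below shows plain contiguous k-fold index splitting in pure Python. It is an illustration, not the library's implementation; k_fold_indices is a hypothetical helper. For classifiers, scikit-learn's own cross-validation utilities use stratified folds when cv is an integer, which this simple version does not do.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        val_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, val_idx
        start = stop


# 12 samples in 5 folds: validation sizes are 3, 3, 2, 2, 2.
folds = list(k_fold_indices(12, k=5))
val_sizes = [len(val) for _, val in folds]
```

Every sample lands in exactly one validation fold, so each model candidate is scored on data it was not fitted on in that round.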

vtacML.pipeline.predict_from_best_pipeline(X: DataFrame, prob_flag=False, model_name='0.974_rfc_best_model.pkl', config_path=None)

Predict using the best model pipeline.

Parameters:
  • X (DataFrame) – Features to predict.

  • prob_flag (bool, optional) – Whether to return probabilities, by default False.

  • model_name (str, optional) – Name of the model to use, by default ‘0.974_rfc_best_model.pkl’.

  • model_path (str, optional) – Path to the model to use for prediction, by default ‘None’

  • config_path (str, optional) – Path to the configuration file, by default None. When None, ‘../config/config.yaml’ is presumably used.

Returns:

Predicted values or probabilities.

Return type:

ndarray