Docs vtacML

Pipeline

class vtacML.pipeline.VTACMLPipe(config_file='config/config.yaml')

Bases: object

A machine learning pipeline for training and evaluating an optimal model for optical identification of GRBs for the SVOM mission.

Parameters: config_file (str, optional) – Path to the configuration file. Default is ‘config/config.yaml’.

evaluate(name, plot=False, score=<function f1_score>)

Evaluate the best model with various metrics and visualization.

Parameters:

name (str) – The name for the evaluation output.
plot (bool, optional) – If True, generates and saves evaluation plots, by default False.
score (callable, optional) – The scoring function to use for evaluation, by default f1_score.
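
To make the default metric concrete, the following is a minimal pure-Python sketch of what a binary F1 score computes. It assumes the `f1_score` in the signature above is scikit-learn's `sklearn.metrics.f1_score`, which the source does not state explicitly. The helper name `f1_binary` and the sample labels are hypothetical and are not part of vtacML.

```python
def f1_binary(y_true, y_pred, positive=1):
    """Return the F1 score of the positive class for two equal-length label sequences.

    Hypothetical helper for illustration only; it is not part of vtacML.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    denom = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Made-up labels: 1 marks a GRB candidate, 0 marks anything else.
example_true = [1, 0, 1, 1, 0]
example_pred = [1, 0, 0, 1, 1]
example_f1 = f1_binary(example_true, example_pred)
```

Because F1 balances precision and recall on the positive class, it is a common choice when the class of interest is rare, which is presumably why it is the default here.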

load_best_model(model_name)

Load the model named ‘model_name’ into the current pipeline.

Parameters: model_name (str) – The name of the model from the Outputs/models/ directory to be loaded.

load_config(config_file)

Load the configuration file and prepare the data.

Parameters: config_file (str) – The path to the configuration file.

predict(X, prob=False)

Predict using the best model.

Parameters:

X (DataFrame) – The input features for prediction.
prob (bool, optional) – If True, returns the predicted probabilities instead of the predicted values, by default False.

Returns:

The predicted values or probabilities.

Return type:

ndarray
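
As a rough illustration of how `load_best_model` and `predict` might be combined, the sketch below wraps the two documented calls in one function. The helper name `load_and_predict` is hypothetical, and `pipe` is assumed to be a `VTACMLPipe` instance. Only the method names and arguments shown in this reference are used.

```python
def load_and_predict(pipe, model_name, X, prob=False):
    """Load a saved model into `pipe`, then predict on `X`.

    Hypothetical helper, not part of vtacML. `pipe` is assumed to expose the
    documented `load_best_model(model_name)` and `predict(X, prob=False)`.
    """
    # Replace the pipeline's current model with the saved one.
    pipe.load_best_model(model_name)
    # prob=False yields predicted values; prob=True yields probabilities.
    return pipe.predict(X, prob=prob)
```

With vtacML installed, this would presumably be called as `load_and_predict(VTACMLPipe(), '0.974_rfc_best_model.pkl', X)`, where `X` is a DataFrame holding the feature columns the model was trained on.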

save_best_model(model_name='best_model', model_path=None)

Save the best model from training to the path specified in the config file. The name and/or path of the model can optionally be overridden.

Parameters:

model_name (str, optional) – Name of the model to be saved. Default is ‘best_model’.
model_path (str, optional) – Path where the model is saved. Defaults to ‘model_path’ from the config file.

train(save_all_model=False, resample_flag=False, scoring='f1', cv=5)

Train the pipeline with the given data.

Parameters:

save_all_model (bool, optional) – Whether to save the best model of each model type to the output directory. Default is False.
resample_flag (bool, optional) – Whether to resample the data. Default is False.
scoring (str, optional) – The scoring function to use. Default is ‘f1’.
cv (int, optional) – The number of cross-validation folds to use. Default is 5.

Returns:

Trained machine learning pipeline.

Return type:

Pipeline
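
The methods documented so far suggest a train, evaluate, save sequence. The sketch below strings them together under that assumption; the source does not prescribe an order. The helper name `train_evaluate_save` and the run name are hypothetical, and `pipe` is assumed to be a `VTACMLPipe` instance.

```python
def train_evaluate_save(pipe, run_name, model_name="best_model"):
    """Train `pipe`, evaluate the best model, and save it.

    Hypothetical helper, not part of vtacML. It only calls methods listed in
    this reference, with their documented defaults written out explicitly.
    """
    # Train with the documented defaults.
    model = pipe.train(save_all_model=False, resample_flag=False, scoring="f1", cv=5)
    # Evaluate under a run name; plot=True would also save evaluation plots.
    pipe.evaluate(run_name, plot=False)
    # Save under the given name; the path defaults to the config file's setting.
    pipe.save_best_model(model_name=model_name)
    return model
```

With vtacML installed, this would presumably be called as `train_evaluate_save(VTACMLPipe(config_file='config/config.yaml'), 'run_01')`, where `'run_01'` is an arbitrary example name.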

vtacML.pipeline.predict_from_best_pipeline(X: DataFrame, prob_flag=False, model_name='0.974_rfc_best_model.pkl', config_path=None)

Predict using the best model pipeline.

Parameters:

X (array-like) – Features to predict.
prob_flag (bool, optional) – Whether to return probabilities, by default False.
model_name (str, optional) – Name of the model to use, by default ‘0.974_rfc_best_model.pkl’
model_path (str, optional) – Path to the model to use for prediction, by default None.
config_path (str, optional) – Path to the configuration file, by default ‘../config/config.yaml’.

Returns:

Predicted values or probabilities.

Return type:

ndarray
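
A minimal sketch of calling this module-level function follows, using only the keyword arguments that appear in the signature above. The wrapper name `classify_candidates` and its `predict_fn` parameter are hypothetical. The parameter exists so the sketch can be exercised without vtacML installed, and the import runs only when no substitute is supplied.

```python
def classify_candidates(X, prob_flag=False, predict_fn=None):
    """Predict on `X` with the packaged default model.

    Hypothetical wrapper, not part of vtacML. `predict_fn` lets a caller
    substitute another callable that accepts the same keyword arguments.
    """
    if predict_fn is None:
        # Imported lazily so that vtacML is needed only when it is actually used.
        from vtacML.pipeline import predict_from_best_pipeline as predict_fn
    return predict_fn(
        X,
        prob_flag=prob_flag,                    # True requests probabilities
        model_name="0.974_rfc_best_model.pkl",  # default from the signature
        config_path=None,                       # default from the signature
    )
```

Spelling out `model_name` and `config_path` is redundant with the defaults; it is done here only to show where a different saved model or configuration file would be supplied.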
