vtacML
Pipeline
- class vtacML.pipeline.VTACMLPipe(config_file='config/config.yaml')
Bases: object
A machine learning pipeline for training and evaluating an optimal model for optical identification of GRBs for the SVOM mission.
- Parameters:
config_file (str, optional) – Path to the configuration file. Default is ‘config/config.yaml’.
- evaluate(name, plot=False, score=<function f1_score>)
Evaluate the best model with various metrics and visualization.
- Parameters:
name (str) – The name for the evaluation output.
plot (bool, optional) – If True, generates and saves evaluation plots, by default False.
score (callable, optional) – The scoring function to use for evaluation, by default f1_score.
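The default `score` is `f1_score`, presumably scikit-learn's `sklearn.metrics.f1_score` (an assumption; this page only shows the function name). As a reminder of what that metric measures for a binary GRB / non-GRB label, here is a minimal pure-Python sketch. `binary_f1` and the sample labels are illustrative only and are not part of vtacML.

```python
def binary_f1(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: treat F1 as 0 and avoid dividing by zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 TP, 1 FP, 1 FN, so precision = recall = F1 = 2/3.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
f1 = binary_f1(y_true, y_pred)
```

Any callable with the same `(y_true, y_pred)` calling convention could plausibly be passed as `score`, although the exact convention `evaluate` expects is not stated here.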
- load_best_model(model_name)
Load the model named ‘model_name’ into the current pipeline.
- Parameters:
model_name (str) – The name of the model to load from the Outputs/models/ directory.
- load_config(config_file)
Load the configuration file and prepare the data.
- Parameters:
config_file (str) – The path to the configuration file.
- predict(X, prob=False)
Predict using the best model.
- Parameters:
X (DataFrame) – The input features for prediction.
prob (bool, optional) – If True, returns the probability of the predictions, by default False.
- Returns:
The predicted values or probabilities.
- Return type:
ndarray
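The two modes of `predict` can be compared side by side. The sketch below is written against any object exposing `predict(X, prob=...)` as documented above. `StubPipe` is a hypothetical stand-in so the snippet runs without vtacML or a trained model, and the shape of the probability output (one value per row) is an assumption.

```python
class StubPipe:
    """Hypothetical stand-in for a VTACMLPipe with a loaded model."""

    def predict(self, X, prob=False):
        # Pretend score: the first feature of each row, clipped to [0, 1].
        scores = [min(max(row[0], 0.0), 1.0) for row in X]
        if prob:
            return scores
        # Hard labels from a 0.5 cut on the pretend score.
        return [1 if s >= 0.5 else 0 for s in scores]


def labels_and_probabilities(pipe, X):
    """Call predict once per mode and return both results."""
    labels = pipe.predict(X, prob=False)
    probabilities = pipe.predict(X, prob=True)
    return labels, probabilities


X_demo = [[0.9, 1.2], [0.1, 3.4]]  # made-up feature rows
labels, probabilities = labels_and_probabilities(StubPipe(), X_demo)
```

With the real class, `pipe` would be a `VTACMLPipe` instance after `load_best_model(...)` or `train(...)`, and `X` a DataFrame of input features.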
- save_best_model(model_name='best_model', model_path=None)
Save the best model from training to the path specified in the config file. The model name and path can optionally be overridden.
- Parameters:
model_name (str, optional) – Name under which the model is saved. Default is ‘best_model’.
model_path (str, optional) – Path where the model is saved. Defaults to the ‘model_path’ entry in the config file.
- train(save_all_model=False, resample_flag=False, scoring='f1', cv=5)
Train the pipeline with the given data.
- Parameters:
save_all_model (bool, optional) – Whether to save the best model of each model type to the output directory. Default is False.
resample_flag (bool, optional) – Whether to resample the data. Default is False.
scoring (str, optional) – The scoring function to use. Default is ‘f1’.
cv (int, optional) – The number of cross-validation folds to use. Default is 5.
- Returns:
Trained machine learning pipeline.
- Return type:
Pipeline
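The methods above suggest a train, evaluate, save sequence. The sketch below spells that out using only the signatures documented on this page. The ordering is an assumption about typical usage, and `RecordingPipe` is a hypothetical stand-in that only records calls, so the snippet runs without vtacML, a config file, or data.

```python
def train_evaluate_save(pipe, name="best_model"):
    """Run train -> evaluate -> save_best_model with the documented defaults spelled out."""
    pipe.train(save_all_model=False, resample_flag=False, scoring="f1", cv=5)
    pipe.evaluate(name, plot=False)
    pipe.save_best_model(model_name=name)
    return pipe


class RecordingPipe:
    """Hypothetical stand-in that records which methods were called."""

    def __init__(self):
        self.calls = []

    def train(self, **kwargs):
        self.calls.append(("train", kwargs))

    def evaluate(self, name, plot=False):
        self.calls.append(("evaluate", {"name": name, "plot": plot}))

    def save_best_model(self, model_name="best_model", model_path=None):
        self.calls.append(("save_best_model", {"model_name": model_name}))


demo = train_evaluate_save(RecordingPipe(), name="demo_run")
call_order = [step for step, _ in demo.calls]
```

With the library installed, the stand-in would be replaced by `VTACMLPipe(config_file='config/config.yaml')` imported from `vtacML.pipeline`.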
- vtacML.pipeline.predict_from_best_pipeline(X: DataFrame, prob_flag=False, model_name='0.974_rfc_best_model.pkl', config_path=None)
Predict using the best model pipeline.
- Parameters:
X (array-like) – Features to predict.
prob_flag (bool, optional) – Whether to return probabilities, by default False.
model_name (str, optional) – Name of the model to use, by default ‘0.974_rfc_best_model.pkl’
model_path (str, optional) – Path to the model to use for prediction, by default None.
config_path (str, optional) – Path to the configuration file, by default ‘../config/config.yaml’.
- Returns:
Predicted values or probabilities.
- Return type:
ndarray