src.pulsarsa.tools.pipe_methods package
Submodules
src.pulsarsa.tools.pipe_methods.data_gen_methods module
- src.pulsarsa.tools.pipe_methods.data_gen_methods.instantiate_simu_modules(path_to_work_folder: str)
Instantiate simulation modules based on the configuration file.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
- Returns:
tuple – A tuple containing the GenData instance, PulsarAnimator instance, and spark pattern list.
- src.pulsarsa.tools.pipe_methods.data_gen_methods.generate_data_payloads_for_training(path_to_work_folder: str)
Generates synthetic payloads using PulsarDT simulation modules.
- Parameters:
path_to_work_folder (str) – path to the working folder containing the config_pipe.yml file.
- Returns:
str – Message indicating the location of generated payloads.
- src.pulsarsa.tools.pipe_methods.data_gen_methods.run_sample_plotter(path_to_work_folder: str, num_samples=5)
Run the sample plotter script in the specified working directory.
- Parameters:
path_to_work_folder (str) – Path to the working folder where the sample plotter script will be executed.
num_samples (int, optional) – Number of sample plots to generate. Defaults to 5.
- Returns:
str – Message indicating the location of the generated sample plots.
src.pulsarsa.tools.pipe_methods.parameter_tuning module
- src.pulsarsa.tools.pipe_methods.parameter_tuning.run_pipeline_estimator_on_paramgrid(path_to_work_folder, param_grid: dict, file_type: str = 'fil')
Evaluate the pipeline's performance metrics for every combination of the parameters in param_grid.
- Parameters:
path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
param_grid (dict) – Dictionary whose keys are the names of the parameters to test and whose values are the candidate values for each parameter.
file_type (str, optional) – Type of validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.
- Returns:
dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.
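As a rough illustration of what "every combination" means for param_grid, the sketch below expands a grid into one dictionary per combination. The helper `expand_param_grid` and the parameter names are hypothetical and are not part of the package; how the package itself enumerates the grid is not shown in this reference.

```python
from itertools import product


def expand_param_grid(param_grid: dict) -> list:
    """Return one dict per combination of the candidate values in param_grid."""
    names = list(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical parameter names, chosen only for illustration.
grid = {"threshold": [0.3, 0.5], "min_points": [5, 10, 20]}
combos = expand_param_grid(grid)

print(len(combos))  # 6
print(combos[0])    # {'threshold': 0.3, 'min_points': 5}
```

The number of evaluations grows as the product of the list lengths, so a grid with many parameters can become expensive quickly.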
- src.pulsarsa.tools.pipe_methods.parameter_tuning.instantiate_tuner(pipeline_list: list[PipelineImageToFilterToCCtoLabels], variance_to_capture: float | int = 3)
Instantiate a tuner for mixing the pipelines' CNNs.
- Parameters:
pipeline_list (list[PipelineImageToFilterToCCtoLabels]) – List of pipeline instances.
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines' NN parameters to capture; if an int, the number of PCA components to keep (<= number of pipelines). Defaults to 3.
- Returns:
Tuner – returns the tuner instance
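The float-versus-int meaning of variance_to_capture can be sketched as follows. This is only an illustration of the convention described above, assuming the per-component variances are already known and sorted in descending order; the helper `n_components_to_keep` is hypothetical and does not reflect how Tuner is actually implemented.

```python
from __future__ import annotations


def n_components_to_keep(explained_variance: list[float],
                         variance_to_capture: float | int) -> int:
    """Resolve variance_to_capture into a number of principal components.

    explained_variance holds per-component variances in descending order.
    """
    n_available = len(explained_variance)
    if isinstance(variance_to_capture, int):
        # An int is taken as an explicit component count.
        if not 1 <= variance_to_capture <= n_available:
            raise ValueError("component count must be between 1 and the number of pipelines")
        return variance_to_capture
    # A float is taken as the fraction of total variance to retain.
    if not 0.0 < variance_to_capture <= 1.0:
        raise ValueError("variance fraction must be in (0, 1]")
    total = sum(explained_variance)
    running = 0.0
    for k, variance in enumerate(explained_variance, start=1):
        running += variance
        if running / total >= variance_to_capture:
            return k
    return n_available


variances = [5.0, 3.0, 1.5, 0.5]  # made-up values for four pipelines
print(n_components_to_keep(variances, 0.9))  # 3
print(n_components_to_keep(variances, 2))    # 2
```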
- src.pulsarsa.tools.pipe_methods.parameter_tuning.run_pipeline_estimator_on_pca_componentgrid(pipelines: list[PipelineImageToFilterToCCtoLabels], PCA_grid_resolution: float, path_to_work_folder: str, variance_to_capture: float | int = 0.9, file_type: str = 'fil') dict
Evaluate the performance metrics of the mixed pipeline built from several similar pipelines trained on different datasets.
- Parameters:
pipelines (list[PipelineImageToFilterToCCtoLabels]) – List of similar pipelines trained on different simulation datasets.
PCA_grid_resolution (float) – Resolution of the PCA component grid.
path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines' NN parameters to capture; if an int, the number of PCA components to keep (<= number of pipelines). Defaults to 0.9.
file_type (str, optional) – Type of validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.
- Returns:
dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.
src.pulsarsa.tools.pipe_methods.protocol module
- src.pulsarsa.tools.pipe_methods.protocol.run_pipeline(path_to_work_folder: str, need_data_gen: bool = True, need_training: bool = True, need_validation: bool = True, skip_filter_training: bool = False, val_file_type: str = 'fil')
Run the full PulsarSA pipeline including setup, training, and validation.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
need_data_gen (bool, optional) – Whether to generate data payloads. Defaults to True.
need_training (bool, optional) – Whether to train the pipeline neural networks. Defaults to True.
need_validation (bool, optional) – Whether to validate the pipeline. Defaults to True.
skip_filter_training (bool, optional) – Whether to skip training the filter neural network. Defaults to False.
val_file_type (str, optional) – File type for validation (‘fil’ or ‘npy’). Defaults to ‘fil’.
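The sketch below shows one way the boolean flags could select pipeline stages. It is an assumption-laden illustration: the stage order, the idea that workspace setup always runs first, and the helper `plan_stages` are all hypothetical; only the function and parameter names come from this reference.

```python
from __future__ import annotations


def plan_stages(
    need_data_gen: bool = True,
    need_training: bool = True,
    need_validation: bool = True,
    skip_filter_training: bool = False,
    val_file_type: str = "fil",
) -> list[tuple[str, dict]]:
    """Return the (stage name, keyword arguments) pairs selected by the flags."""
    if val_file_type not in ("fil", "npy"):
        raise ValueError(f"val_file_type must be 'fil' or 'npy', got {val_file_type!r}")
    stages = [("setup_workspace", {})]  # assumed to always run first
    if need_data_gen:
        stages.append(("generate_data_payloads_for_training", {}))
    if need_training:
        stages.append(("train_pipeline_neural_nets",
                       {"skip_filter_training": skip_filter_training}))
    if need_validation:
        stages.append(("validate_pipeline", {"file_type": val_file_type}))
    return stages


# Reuse existing payloads: retrain and validate against an npy dataset.
retrain_only = plan_stages(need_data_gen=False, val_file_type="npy")
for name, kwargs in retrain_only:
    print(name, kwargs)
```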
src.pulsarsa.tools.pipe_methods.scale module
- src.pulsarsa.tools.pipe_methods.scale.multiply_work_folder(path_to_parent_folder: str, keys_to_modify: dict[str, float] | None = None)
Create multiple working folders with modified configuration files.
- Parameters:
path_to_parent_folder (str) – Path to the parent folder containing the original configuration files. The folder should contain config_pipe.yml.
keys_to_modify (dict[str, float], optional) – Dictionary where keys are the configuration keys to modify (in dot notation) and values are lists of new values to set for those keys. If None, only one copy of the configuration file is created, without modifications.
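To make the dot-notation keys concrete, here is a minimal sketch of setting a nested configuration value from a dotted key. The helper `set_by_dotted_key` and the configuration keys are hypothetical; the actual layout of config_pipe.yml is not documented here.

```python
from copy import deepcopy


def set_by_dotted_key(config: dict, dotted_key: str, value) -> dict:
    """Return a copy of config with the nested key (dot notation) set to value."""
    updated = deepcopy(config)
    node = updated
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        # Walk down the nesting, creating intermediate levels if needed.
        node = node.setdefault(part, {})
    node[leaf] = value
    return updated


# Hypothetical configuration, for illustration only.
base = {"training": {"learning_rate": 0.001, "epochs": 10}}
variants = [set_by_dotted_key(base, "training.learning_rate", lr) for lr in (0.01, 0.1)]

print(variants[0]["training"])  # {'learning_rate': 0.01, 'epochs': 10}
print(base["training"])         # {'learning_rate': 0.001, 'epochs': 10}
```

Each variant would correspond to one child work folder; the original dictionary is left untouched.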
- src.pulsarsa.tools.pipe_methods.scale.get_list_of_work_folders(path_to_parent_folder: str, write_to_file: str | None = None) list[str]
Get a list of all child work folders in the parent folder.
- Returns:
list[str] – List of paths to child work folders.
- src.pulsarsa.tools.pipe_methods.scale.get_validation_scores_from_work_folders(path_to_parent_folder: str) dict[str, dict]
Get validation scores from all child work folders in the parent folder.
- Parameters:
path_to_parent_folder (str) – Path to the parent folder containing child work folders.
- Returns:
dict[str, dict] – Dictionary where keys are child work folder paths and values are their corresponding validation scores (loaded from logs/current_scores.json).
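A minimal sketch of gathering scores in this shape, assuming each child folder stores its scores at logs/current_scores.json as stated above. The helper `collect_scores`, the folder names, and the score values are hypothetical and only serve the demonstration.

```python
import json
import tempfile
from pathlib import Path


def collect_scores(path_to_parent_folder: str) -> dict:
    """Map each child folder path to the contents of its logs/current_scores.json."""
    scores = {}
    for child in sorted(Path(path_to_parent_folder).iterdir()):
        score_file = child / "logs" / "current_scores.json"
        if score_file.is_file():
            scores[str(child)] = json.loads(score_file.read_text())
    return scores


with tempfile.TemporaryDirectory() as parent:
    # Build two fake work folders with made-up scores.
    for name, f1 in (("run_0", 0.81), ("run_1", 0.88)):
        logs = Path(parent) / name / "logs"
        logs.mkdir(parents=True)
        (logs / "current_scores.json").write_text(json.dumps({"F1_score": f1}))
    (Path(parent) / "run_2").mkdir()  # no scores yet, so it is skipped
    scores = collect_scores(parent)

best = max(scores, key=lambda folder: scores[folder]["F1_score"])
print(Path(best).name)  # run_1
```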
src.pulsarsa.tools.pipe_methods.setup_work_space module
- src.pulsarsa.tools.pipe_methods.setup_work_space.make_work_space(path_to_work_folder: str)
Create the working directory structure for PulsarSA pipeline.
- Parameters:
path_to_work_folder (str) – Path to the working folder where the directory structure will be created.
- src.pulsarsa.tools.pipe_methods.setup_work_space.check_if_config_exists(path_to_work_folder: str) bool
Check if the configuration file exists in the working directory.
- Returns:
bool – True if the config_pipe.yml file exists, False otherwise.
- src.pulsarsa.tools.pipe_methods.setup_work_space.check_if_pipe_neural_network_models_exist(path_to_work_folder: str) bool
Check if the neural network models exist in the working directory.
- Returns:
bool – True if the pipe_neural_network_models.py file exists, False otherwise.
- src.pulsarsa.tools.pipe_methods.setup_work_space.setup_workspace(path_to_work_folder: str)
Set up the workspace.
- Parameters:
path_to_work_folder (str) – Path to the folder in which to set up the workspace.
src.pulsarsa.tools.pipe_methods.training_methods module
- src.pulsarsa.tools.pipe_methods.training_methods.train_pipeline_neural_nets(path_to_work_folder: str, skip_filter_training: bool = False)
Train the neural networks defined in the config file located in path_to_work_folder. The work folder must contain a config_pipe.yml file, along with pipe_neural_network_models.py if custom models are used.
- Parameters:
path_to_work_folder (str) – Path to the working folder where the config file is located
- Returns:
None
src.pulsarsa.tools.pipe_methods.utils module
- src.pulsarsa.tools.pipe_methods.utils.read_yaml_file(path_to_config_file: str)
Reads a yaml file
- Parameters:
path_to_config_file (str) – path to the yaml config file
- Raises:
FileNotFoundError – if the file doesn’t exist
- Returns:
dict – content of the yaml file
- src.pulsarsa.tools.pipe_methods.utils.get_network_from_target_folder(path_to_work_folder: str, network_name: str)
Get the neural network from the workfolder containing the pipe_neural_network_models.py file
- Raises:
FileNotFoundError – raises error if the pipe_neural_network_models.py file is not found
ValueError – raises error if the network name is not found in the pipe_neural_network_models.py file
- Returns:
nn.Module – returns the neural network class
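Looking up a class by name in a Python file inside the work folder can be sketched with the standard library's importlib. This is an assumption about the general technique, not the package's actual code; the helper `load_named_object` is hypothetical, and the stand-in model file avoids a PyTorch dependency.

```python
import importlib.util
import tempfile
from pathlib import Path

MODELS_FILE = "pipe_neural_network_models.py"


def load_named_object(path_to_work_folder: str, name: str):
    """Import MODELS_FILE from the work folder and return the attribute called name."""
    module_path = Path(path_to_work_folder) / MODELS_FILE
    if not module_path.is_file():
        raise FileNotFoundError(f"{module_path} does not exist")
    spec = importlib.util.spec_from_file_location("pipe_neural_network_models", module_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    try:
        return getattr(module, name)
    except AttributeError:
        raise ValueError(f"{name!r} is not defined in {module_path}") from None


with tempfile.TemporaryDirectory() as work_folder:
    # Stand-in model file; a real one would define torch.nn.Module subclasses.
    (Path(work_folder) / MODELS_FILE).write_text("class TinyNet:\n    layers = 2\n")
    network_class = load_named_object(work_folder, "TinyNet")
    try:
        load_named_object(work_folder, "MissingNet")
    except ValueError as err:
        missing_error = type(err).__name__

print(network_class.__name__, network_class.layers)  # TinyNet 2
print(missing_error)                                 # ValueError
```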
- src.pulsarsa.tools.pipe_methods.utils.get_loss_function_from_target_folder(path_to_work_folder: str, loss_function_name: str)
Get the loss function from the pipe_neural_network_models.py file in the workfolder
- Raises:
FileNotFoundError – raises error if the pipe_neural_network_models.py file is not found
ValueError – raises error if the loss_function name is not found in the pipe_neural_network_models.py file
- Returns:
nn.Module – returns the loss function class
src.pulsarsa.tools.pipe_methods.validation_methods module
- src.pulsarsa.tools.pipe_methods.validation_methods.create_validation_dataset_from_npy_file(path_to_work_folder: str, allow_randomness: bool = False)
Create a validation dataset from an npy file.
- Parameters:
path_to_work_folder (str) – path to the working folder
allow_randomness (bool, optional) – allow randomness in selection. Defaults to False.
- Returns:
tuple[np.memmap, np.ndarray, np.ndarray, int] – data source, all random indices, corresponding y true, random seed
- src.pulsarsa.tools.pipe_methods.validation_methods.create_validation_dataset_from_filterbank_file(path_to_work_folder: str, position_column: str = 'position', time_window: int = 512, allow_randomness: bool = False)
Create validation dataset from a real filterbank file and its metadata file.
The metadata file should be a CSV file with at least two columns: ‘index’ and ‘position’. The ‘index’ column should contain the index of the pulse in the filterbank file, and the ‘position’ column should contain the position of the pulse in the time series. The function will select random samples from the filterbank file based on the SNR band specified in the config file.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
position_column (str, optional) – Name of the column in the metadata file that contains the position of the pulse. Defaults to ‘position’.
time_window (int, optional) – Time window size for each sample. Defaults to 512.
allow_randomness (bool, optional) – Whether to allow randomness in the selection of samples. Defaults to False.
- Returns:
tuple[FilDataSource, np.ndarray, np.ndarray, int] – Data source object for the filterbank file. Array of indices of the selected samples. Array of true labels for the selected samples (1 for pulse, 0 for non-pulse). Random seed used for the selection of samples.
- src.pulsarsa.tools.pipe_methods.validation_methods.load_pipeline(path_to_work_folder: str)
creates an instance of the pipeline from the configuration file
- Parameters:
path_to_work_folder (str) – path to the working folder
- Returns:
PipelineImageToFilterToCCtoLabels – instance of the pipeline
- src.pulsarsa.tools.pipe_methods.validation_methods.calculate_y_pred(input_image, pipeline: PipelineImageToFilterToCCtoLabels)
Calculates the prediction given an input image
- Parameters:
input_image (np.ndarray) – input image
pipeline (PipelineImageToFilterToCCtoLabels) – pipeline instance
- Returns:
tuple[int, np.ndarray, np.ndarray, np.ndarray, list] – y_pred, pred_binarized, pred_binarized_filtered, labelled_skeleton, results
- src.pulsarsa.tools.pipe_methods.validation_methods.draw_detected_pulsar_results(results, ax)
Plots the results from pipeline
- Parameters:
results (tuple|list) – x_coors_sorted, y_coors_sorted, popt, func, category, num_points, snr_calculated
ax (matplotlib.pyplot.axes) – figure axis containing the plot
- src.pulsarsa.tools.pipe_methods.validation_methods.validate_pipeline(path_to_work_folder: str, file_type: str = 'fil')
Validate the trained pipeline on a real filterbank file and log the results to MLflow.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.
- src.pulsarsa.tools.pipe_methods.validation_methods.binary_cross_entropy(y_true: ndarray, y_pred: ndarray, eps=1e-15)
Calculates the binary cross-entropy error
- Parameters:
y_true (np.ndarray) – list containing 0 or 1
y_pred (np.ndarray) – list containing 0 or 1
eps (float, optional) – epsilon to avoid log(0). Defaults to 1e-15.
- Returns:
float – binary cross-entropy error
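For reference, a standard-library sketch of binary cross-entropy with clipping by eps. It assumes the error is averaged over samples and uses plain lists rather than np.ndarray; whether the package averages or sums is not stated in this reference.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy, clipping predictions to [eps, 1 - eps]."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keeps log() away from 0
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(y_true)


y_true = [1, 0, 1, 1]
y_pred = [1, 0, 0, 1]  # one hard misclassification out of four

bce = binary_cross_entropy(y_true, y_pred)
print(round(bce, 4))  # 8.6347
```

With hard 0/1 predictions, each misclassified sample contributes about -log(eps), so the choice of eps sets the size of the penalty.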
- src.pulsarsa.tools.pipe_methods.validation_methods.validate_pipeline_simple(path_to_work_folder: str, file_type: str = 'fil', pipeline: PipelineImageToFilterToCCtoLabels | None = None, plot_results: bool = False) dict[str, float]
Validate the trained pipeline on a real filterbank file or npy file.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.
pipeline (PipelineImageToFilterToCCtoLabels, optional) – If not provided, it is loaded from the pipeline definition in the work folder config file. Defaults to None.
- Returns:
dict[str,float] – dictionary containing F1_score, accuracy, precision, recall, entropy
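The classification scores in the returned dictionary follow the usual confusion-matrix definitions. The sketch below computes them from binary labels; the helper `classification_metrics` is hypothetical, the labels are made up, and the entropy entry is omitted.

```python
def classification_metrics(y_true, y_pred):
    """Compute F1_score, accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"F1_score": f1, "accuracy": accuracy, "precision": precision, "recall": recall}


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
# {'F1_score': 0.75, 'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75}
```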
- src.pulsarsa.tools.pipe_methods.validation_methods.validate_pipeline_for_diff_snr_bands(path_to_work_folder: str, snr_bands: list[list[float]], file_type: str = 'fil', pipe_line: PipelineImageToFilterToCCtoLabels | None = None, save_plots: bool = False) list[dict[str, float]]
Validate the trained pipeline on different SNR bands and log the results to MLflow.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
snr_bands (list[list[float]]) – List of SNR bands to validate on.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.
- src.pulsarsa.tools.pipe_methods.validation_methods.create_filterbank_filedataset(filterbank_file_path: str, metadata_path: str, position_column='position', time_window=512, allow_randomness=False, batch_size: int = 10, num_batches: int = 20, freq_range: list | None = None, device: device = device(type='cpu'))
- src.pulsarsa.tools.pipe_methods.validation_methods.create_filterbank_filedataset_for_decision_tree(filterbank_file_path: str, metadata_path: str, position_column='position', time_window=512, allow_randomness=False, batch_size: int = 10, num_batches: int = 20, freq_range: list | None = None, device: device = device(type='cpu'))
- src.pulsarsa.tools.pipe_methods.validation_methods.calculate_validation_metrics(pipeline, batches, y_trues, return_time: bool = True)
- src.pulsarsa.tools.pipe_methods.validation_methods.calculate_validation_metrics_using_decision_tree(pipeline, batches, y_trues, return_time: bool = True)
Module contents
- src.pulsarsa.tools.pipe_methods.generate_data_payloads_for_training(path_to_work_folder: str)
Generates synthetic payloads using PulsarDT simulation modules.
- Parameters:
path_to_work_folder (str) – path to the working folder containing the config_pipe.yml file.
- Returns:
str – Message indicating the location of generated payloads.
- src.pulsarsa.tools.pipe_methods.run_sample_plotter(path_to_work_folder: str, num_samples=5)
Run the sample plotter script in the specified working directory.
- Parameters:
path_to_work_folder (str) – Path to the working folder where the sample plotter script will be executed.
num_samples (int, optional) – Number of sample plots to generate. Defaults to 5.
- Returns:
str – Message indicating the location of the generated sample plots.
- src.pulsarsa.tools.pipe_methods.train_pipeline_neural_nets(path_to_work_folder: str, skip_filter_training: bool = False)
Train the neural networks defined in the config file located in path_to_work_folder. The work folder must contain a config_pipe.yml file, along with pipe_neural_network_models.py if custom models are used.
- Parameters:
path_to_work_folder (str) – Path to the working folder where the config file is located
- Returns:
None
- src.pulsarsa.tools.pipe_methods.validate_pipeline(path_to_work_folder: str, file_type: str = 'fil')
Validate the trained pipeline on a real filterbank file and log the results to MLflow.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.
- src.pulsarsa.tools.pipe_methods.run_pipeline(path_to_work_folder: str, need_data_gen: bool = True, need_training: bool = True, need_validation: bool = True, skip_filter_training: bool = False, val_file_type: str = 'fil')
Run the full PulsarSA pipeline including setup, training, and validation.
- Parameters:
path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
need_data_gen (bool, optional) – Whether to generate data payloads. Defaults to True.
need_training (bool, optional) – Whether to train the pipeline neural networks. Defaults to True.
need_validation (bool, optional) – Whether to validate the pipeline. Defaults to True.
skip_filter_training (bool, optional) – Whether to skip training the filter neural network. Defaults to False.
val_file_type (str, optional) – File type for validation (‘fil’ or ‘npy’). Defaults to ‘fil’.
- src.pulsarsa.tools.pipe_methods.run_pipeline_estimator_on_paramgrid(path_to_work_folder, param_grid: dict, file_type: str = 'fil')
Evaluate the pipeline's performance metrics for every combination of the parameters in param_grid.
- Parameters:
path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
param_grid (dict) – Dictionary whose keys are the names of the parameters to test and whose values are the candidate values for each parameter.
file_type (str, optional) – Type of validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.
- Returns:
dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.
- src.pulsarsa.tools.pipe_methods.run_pipeline_estimator_on_pca_componentgrid(pipelines: list[PipelineImageToFilterToCCtoLabels], PCA_grid_resolution: float, path_to_work_folder: str, variance_to_capture: float | int = 0.9, file_type: str = 'fil') dict
Evaluate the performance metrics of the mixed pipeline built from several similar pipelines trained on different datasets.
- Parameters:
pipelines (list[PipelineImageToFilterToCCtoLabels]) – List of similar pipelines trained on different simulation datasets.
PCA_grid_resolution (float) – Resolution of the PCA component grid.
path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines' NN parameters to capture; if an int, the number of PCA components to keep (<= number of pipelines). Defaults to 0.9.
file_type (str, optional) – Type of validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.
- Returns:
dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.
- src.pulsarsa.tools.pipe_methods.instantiate_tuner(pipeline_list: list[PipelineImageToFilterToCCtoLabels], variance_to_capture: float | int = 3)
Instantiate a tuner for mixing the pipelines' CNNs.
- Parameters:
pipeline_list (list[PipelineImageToFilterToCCtoLabels]) – List of pipeline instances.
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines' NN parameters to capture; if an int, the number of PCA components to keep (<= number of pipelines). Defaults to 3.
- Returns:
Tuner – returns the tuner instance