src.pulsarsa.tools.pipe_methods package

Submodules

src.pulsarsa.tools.pipe_methods.data_gen_methods module

src.pulsarsa.tools.pipe_methods.data_gen_methods.instantiate_simu_modules(path_to_work_folder: str)

Instantiate simulation modules based on the configuration file.

Parameters:: path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
Returns:: tuple – A tuple containing the GenData instance, PulsarAnimator instance, and spark pattern list.

src.pulsarsa.tools.pipe_methods.data_gen_methods.generate_data_payloads_for_training(path_to_work_folder: str)

Generates synthetic payloads using PulsarDT simulation modules.

Parameters:: path_to_work_folder (str) – path to the working folder containing the config_pipe.yml file.
Returns:: str – Message indicating the location of generated payloads.

src.pulsarsa.tools.pipe_methods.data_gen_methods.run_sample_plotter(path_to_work_folder: str, num_samples=5)

Run the sample plotter script in the specified working directory.

Parameters:

path_to_work_folder (str) – Path to the working folder where the sample plotter script will be executed.
num_samples (int, optional) – Number of sample plots to generate. Defaults to 5.

Returns:

str – Message indicating the location of the generated sample plots.

src.pulsarsa.tools.pipe_methods.parameter_tuning module

src.pulsarsa.tools.pipe_methods.parameter_tuning.run_pipeline_estimator_on_paramgrid(path_to_work_folder, param_grid: dict, file_type: str = 'fil')

Evaluates the performance metrics of the pipeline for every combination of the parameters in the grid.

Parameters:

path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
param_grid (dict) – Dictionary whose keys are the names of the parameters to test and whose values are the candidate values for each parameter.
file_type (str, optional) – Type of the validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.

Returns:

dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.
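The grid search described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the parameter names (threshold, min_points) and the toy_score function are hypothetical stand-ins for a real validation run, and only the "try every combination, keep the best per metric" idea comes from the description.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Yield one dict per combination of the candidate values in param_grid."""
    names = list(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def best_params_per_metric(param_grid, score_fn, metrics):
    """Return, for each metric, the first combination that reaches the highest score."""
    best = {}
    for params in expand_param_grid(param_grid):
        scores = score_fn(params)
        for metric in metrics:
            if metric not in best or scores[metric] > best[metric][1]:
                best[metric] = (params, scores[metric])
    return {metric: params for metric, (params, _) in best.items()}


# Hypothetical scoring function standing in for a real validation run.
def toy_score(params):
    threshold = params["threshold"]
    return {"precision": threshold, "recall": 1.0 - threshold}


grid = {"threshold": [0.2, 0.5, 0.8], "min_points": [3, 5]}
combos = list(expand_param_grid(grid))
best = best_params_per_metric(grid, toy_score, ["precision", "recall"])
```

The number of combinations is the product of the list lengths, so the cost of a full grid grows quickly with each added parameter.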

src.pulsarsa.tools.pipe_methods.parameter_tuning.instantiate_tuner(pipeline_list: list[PipelineImageToFilterToCCtoLabels], variance_to_capture: float | int = 3)

Instantiates a tuner for mixing the pipelines’ CNNs.

Parameters:

pipeline_list (list[PipelineImageToFilterToCCtoLabels]) – list of pipeline instances
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines’ NN parameters to capture; if an int, the number of PCA components to consider (<= number of pipelines). Defaults to 3.

Returns:

Tuner – returns the tuner instance

src.pulsarsa.tools.pipe_methods.parameter_tuning.run_pipeline_estimator_on_pca_componentgrid(pipelines: list[PipelineImageToFilterToCCtoLabels], PCA_grid_resolution: float, path_to_work_folder: str, variance_to_capture: float | int = 0.9, file_type: str = 'fil') → dict

Evaluates the performance metrics of the mixed pipeline built from several similar pipelines trained on different datasets.

Parameters:

pipelines (list[PipelineImageToFilterToCCtoLabels]) – List of similar pipelines trained on different simulation datasets.
PCA_grid_resolution (float) – Grid resolution.
path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines’ NN parameters to capture; if an int, the number of PCA components to consider (<= number of pipelines). Defaults to 0.9.
file_type (str, optional) – Type of the validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.

Returns:

dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.
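The float-versus-int meaning of variance_to_capture can be illustrated with a small sketch. It assumes the usual PCA convention (a float is a cumulative explained-variance target, an int is a component count); the helper name n_components_to_keep and the example ratios are hypothetical and not taken from the package.

```python
def n_components_to_keep(explained_variance_ratio, variance_to_capture):
    """Pick how many leading PCA components to keep.

    int   -> keep that many components (capped at the number available)
    float -> keep the smallest number whose cumulative ratio reaches the target
    """
    if isinstance(variance_to_capture, int):
        return min(variance_to_capture, len(explained_variance_ratio))
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        if cumulative >= variance_to_capture:
            return count
    # Target not reachable: keep every component.
    return len(explained_variance_ratio)


# Made-up explained-variance ratios, sorted from largest to smallest.
ratios = [0.6, 0.25, 0.1, 0.05]
kept_for_fraction = n_components_to_keep(ratios, 0.9)
kept_for_count = n_components_to_keep(ratios, 2)
```

With these ratios, a target of 0.9 needs three components, while an integer argument is used directly as the component count.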

src.pulsarsa.tools.pipe_methods.protocol module

src.pulsarsa.tools.pipe_methods.protocol.run_pipeline(path_to_work_folder: str, need_data_gen: bool = True, need_training: bool = True, need_validation: bool = True, skip_filter_training: bool = False, val_file_type: str = 'fil')

Run the full PulsarSA pipeline including setup, training, and validation.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
need_data_gen (bool, optional) – Whether to generate data payloads. Defaults to True.
need_training (bool, optional) – Whether to train the pipeline neural networks. Defaults to True.
need_validation (bool, optional) – Whether to validate the pipeline. Defaults to True.
skip_filter_training (bool, optional) – Whether to skip the filter neural network. Defaults to False.
val_file_type (str, optional) – File type for validation (‘fil’ or ‘npy’). Defaults to ‘fil’.

src.pulsarsa.tools.pipe_methods.scale module

src.pulsarsa.tools.pipe_methods.scale.multiply_work_folder(path_to_parent_folder: str, keys_to_modify: dict[str, float] | None = None)

Create multiple working folders with modified configuration files.

Parameters:

path_to_parent_folder (str) – Path to the parent folder containing the original configuration files. The folder should contain config_pipe.yml.
keys_to_modify (dict[str, float], optional) – Dictionary where keys are the configuration keys to modify (in dot notation) and values are lists of new values to set for those keys. If None, only one copy of the configuration file is created, without modifications.
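A minimal sketch of how dot-notation keys could be applied to a nested configuration is given below. The helper names and the example keys (training.lr, training.epochs) are hypothetical, and generating one variant per combination of values is an assumption; the source does not state how multiple keys are combined.

```python
import copy
from itertools import product


def set_by_dotted_key(config, dotted_key, value):
    """Set config['a']['b']['c'] = value for the dotted key 'a.b.c'."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def config_variants(base_config, keys_to_modify=None):
    """Return one modified deep copy of base_config per combination of values."""
    if not keys_to_modify:
        # No modifications requested: a single unmodified copy.
        return [copy.deepcopy(base_config)]
    keys = list(keys_to_modify)
    variants = []
    for values in product(*(keys_to_modify[key] for key in keys)):
        variant = copy.deepcopy(base_config)
        for key, value in zip(keys, values):
            set_by_dotted_key(variant, key, value)
        variants.append(variant)
    return variants


# Hypothetical configuration content, for illustration only.
base = {"training": {"lr": 0.001, "epochs": 10}}
variants = config_variants(base, {"training.lr": [0.01, 0.1]})
unmodified = config_variants(base)
```

Each variant is a deep copy, so the base configuration is left untouched.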

src.pulsarsa.tools.pipe_methods.scale.get_list_of_work_folders(path_to_parent_folder: str, write_to_file: str | None = None) → list[str]

Get a list of all child work folders in the parent folder.

Returns:: list[str] – List of paths to child work folders.

src.pulsarsa.tools.pipe_methods.scale.get_validation_scores_from_work_folders(path_to_parent_folder: str) → dict[str, dict]

Get validation scores from all child work folders in the parent folder.

Parameters:

path_to_parent_folder (str) – Path to the parent folder containing child work folders.

Returns:

dict[str, dict] – Dictionary where keys are child work folder paths and values are their corresponding validation scores (loaded from logs/current_scores.json).
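The shape of the returned dictionary can be sketched with the standard library alone. The function below is an illustration, not the package's code: the name collect_scores, the choice to skip folders without a score file, and the demo folder names are assumptions. Only the logs/current_scores.json location comes from the description above.

```python
import json
import tempfile
from pathlib import Path


def collect_scores(parent_folder):
    """Map each child folder path to the contents of its logs/current_scores.json."""
    scores = {}
    for child in sorted(Path(parent_folder).iterdir()):
        score_file = child / "logs" / "current_scores.json"
        # Assumption: children without a score file are skipped silently.
        if child.is_dir() and score_file.is_file():
            with score_file.open() as handle:
                scores[str(child)] = json.load(handle)
    return scores


# Build a throwaway parent folder with two child work folders and read it back.
with tempfile.TemporaryDirectory() as parent:
    for name, f1 in [("run_a", 0.8), ("run_b", 0.9)]:
        logs = Path(parent) / name / "logs"
        logs.mkdir(parents=True)
        (logs / "current_scores.json").write_text(json.dumps({"F1_score": f1}))
    (Path(parent) / "empty_run").mkdir()
    demo_scores = collect_scores(parent)
```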

src.pulsarsa.tools.pipe_methods.setup_work_space module

src.pulsarsa.tools.pipe_methods.setup_work_space.make_work_space(path_to_work_folder: str)

Create the working directory structure for PulsarSA pipeline.

Parameters:: path_to_work_folder (str) – Path to the working folder where the directory structure will be created.

src.pulsarsa.tools.pipe_methods.setup_work_space.check_if_config_exists(path_to_work_folder: str) → bool

Check if the configuration file exists in the working directory.

Returns:: bool – True if the config_pipe.yml file exists, False otherwise.

src.pulsarsa.tools.pipe_methods.setup_work_space.check_if_pipe_neural_network_models_exist(path_to_work_folder: str) → bool

Check if the neural network models exist in the working directory.

Returns:: bool – True if the pipe_neural_network_models.py file exists, False otherwise.
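Both checks amount to testing for a named file inside the work folder. A minimal sketch using pathlib follows; the helper has_file is hypothetical, and only the two file names come from the descriptions above.

```python
import tempfile
from pathlib import Path


def has_file(work_folder, file_name):
    """Return True if file_name exists as a regular file inside work_folder."""
    return (Path(work_folder) / file_name).is_file()


with tempfile.TemporaryDirectory() as work_folder:
    config_before = has_file(work_folder, "config_pipe.yml")
    (Path(work_folder) / "config_pipe.yml").write_text("# placeholder\n")
    config_after = has_file(work_folder, "config_pipe.yml")
    models_present = has_file(work_folder, "pipe_neural_network_models.py")
```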

src.pulsarsa.tools.pipe_methods.setup_work_space.setup_workspace(path_to_work_folder: str)

Set up the workspace.

Parameters:: path_to_work_folder (str) – path to setup the workspace

src.pulsarsa.tools.pipe_methods.training_methods module

src.pulsarsa.tools.pipe_methods.training_methods.train_pipeline_neural_nets(path_to_work_folder: str, skip_filter_training: bool = False)

Trains the neural networks defined in the config file located in path_to_work_folder. The work folder must contain a config_pipe.yml file, along with pipe_neural_network_models.py if custom models are used.

Parameters:: path_to_work_folder (str) – Path to the working folder where the config file is located
Returns:: None

src.pulsarsa.tools.pipe_methods.utils module

src.pulsarsa.tools.pipe_methods.utils.read_yaml_file(path_to_config_file: str)

Reads a yaml file

Parameters:: path_to_config_file (str) – path to the yaml config file
Raises:: FileNotFoundError – if the file doesn’t exist
Returns:: dict – content of the yaml file

src.pulsarsa.tools.pipe_methods.utils.get_network_from_target_folder(path_to_work_folder: str, network_name: str)

Get the neural network from the workfolder containing the pipe_neural_network_models.py file

Raises:

FileNotFoundError – raises error if the pipe_neural_network_models.py file is not found
ValueError – raises error if the network name is not found in the pipe_neural_network_models.py file

Returns:

nn.Module – returns the neural network class
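Looking up a class by name in a Python file that lives in the work folder can be done with importlib. The sketch below shows one way to do this with the documented error behaviour; it is not the package's implementation, and the function name and the TinyNet class are hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_attribute_from_models_file(work_folder, attribute_name):
    """Import pipe_neural_network_models.py from work_folder and return one attribute."""
    models_file = Path(work_folder) / "pipe_neural_network_models.py"
    if not models_file.is_file():
        raise FileNotFoundError(f"{models_file} not found")
    spec = importlib.util.spec_from_file_location(
        "pipe_neural_network_models", models_file
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, attribute_name):
        raise ValueError(f"{attribute_name!r} is not defined in {models_file}")
    return getattr(module, attribute_name)


with tempfile.TemporaryDirectory() as work_folder:
    # Hypothetical models file with a plain class standing in for a network.
    (Path(work_folder) / "pipe_neural_network_models.py").write_text(
        "class TinyNet:\n    layers = 2\n"
    )
    network_class = load_attribute_from_models_file(work_folder, "TinyNet")
    try:
        load_attribute_from_models_file(work_folder, "MissingNet")
        missing_raises = False
    except ValueError:
        missing_raises = True
```

The same lookup pattern would apply to a loss function defined in the same file.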

src.pulsarsa.tools.pipe_methods.utils.get_loss_function_from_target_folder(path_to_work_folder: str, loss_function_name: str)

Get the loss function from the pipe_neural_network_models.py file in the workfolder

Raises:

FileNotFoundError – raises error if the pipe_neural_network_models.py file is not found
ValueError – raises error if the loss_function name is not found in the pipe_neural_network_models.py file

Returns:

nn.Module – returns the loss function class

src.pulsarsa.tools.pipe_methods.validation_methods module

src.pulsarsa.tools.pipe_methods.validation_methods.create_validation_dataset_from_npy_file(path_to_work_folder: str, allow_randomness: bool = False)

Create a validation dataset from an npy file.

Parameters:

path_to_work_folder (str) – path to the working folder
allow_randomness (bool, optional) – allow randomness in selection. Defaults to False.

Returns:

tuple[np.memmap, np.ndarray, np.ndarray, int] – data source, all random indices, corresponding y true, random seed

src.pulsarsa.tools.pipe_methods.validation_methods.create_validation_dataset_from_filterbank_file(path_to_work_folder: str, position_column: str = 'position', time_window: int = 512, allow_randomness: bool = False)

Create validation dataset from a real filterbank file and its metadata file. The metadata file should be a CSV file with at least two columns: ‘index’ and ‘position’. The ‘index’ column should contain the index of the pulse in the filterbank file, and the ‘position’ column should contain the position of the pulse in the time series. The function will select random samples from the filterbank file based on the SNR band specified in the config file.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
position_column (str, optional) – Name of the column in the metadata file that contains the position of the pulse. Defaults to ‘position’.
time_window (int, optional) – Time window size for each sample. Defaults to 512.
allow_randomness (bool, optional) – Whether to allow randomness in the selection of samples. Defaults to False.

Returns:

tuple[FilDataSource, np.ndarray, np.ndarray, int] – Data source object for the filterbank file. Array of indices of the selected samples. Array of true labels for the selected samples (1 for pulse, 0 for non-pulse). Random seed used for the selection of samples.
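The metadata layout and the role of the returned random seed can be illustrated with a small, self-contained sketch. It is not the package's code. It assumes that allow_randomness=False means a fixed seed is used so that the selection is reproducible, and the helper name, the fixed_seed parameter, and the CSV values are hypothetical. SNR-band filtering and the filterbank data itself are left out.

```python
import csv
import io
import random


def select_samples(metadata_csv_text, num_samples, position_column="position",
                   allow_randomness=False, fixed_seed=0):
    """Pick pulse positions from metadata rows and return them with the seed used."""
    rows = list(csv.DictReader(io.StringIO(metadata_csv_text)))
    # Assumption: without randomness, a fixed seed makes the selection repeatable.
    seed = random.randrange(2**32) if allow_randomness else fixed_seed
    rng = random.Random(seed)
    chosen = rng.sample(rows, k=min(num_samples, len(rows)))
    positions = [int(row[position_column]) for row in chosen]
    return positions, seed


# Hypothetical metadata with the two required columns.
metadata = "index,position\n0,1024\n1,4096\n2,8192\n3,16384\n"
first_positions, first_seed = select_samples(metadata, num_samples=2)
second_positions, second_seed = select_samples(metadata, num_samples=2)
```

Returning the seed alongside the selection lets a later run reproduce the same samples.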

src.pulsarsa.tools.pipe_methods.validation_methods.load_pipeline(path_to_work_folder: str)

Creates an instance of the pipeline from the configuration file.

Parameters:: path_to_work_folder (str) – path to the working folder
Returns:: PipelineImageToFilterToCCtoLabels – instance of the pipeline

src.pulsarsa.tools.pipe_methods.validation_methods.calculate_y_pred(input_image, pipeline: PipelineImageToFilterToCCtoLabels)

Calculates the prediction given an input image

Parameters:

input_image (np.ndarray) – input image
pipeline (PipelineImageToFilterToCCtoLabels) – pipeline instance

Returns:

tuple[int, np.ndarray, np.ndarray, np.ndarray, list] – y_pred, pred_binarized, pred_binarized_filtered, labelled_skeleton, results

src.pulsarsa.tools.pipe_methods.validation_methods.draw_detected_pulsar_results(results, ax)

Plots the results from pipeline

Parameters:

results (tuple|list) – x_coors_sorted, y_coors_sorted, popt, func, category, num_points, snr_calculated
ax (matplotlib.pyplot.axes) – figure axis containing the plot

src.pulsarsa.tools.pipe_methods.validation_methods.validate_pipeline(path_to_work_folder: str, file_type: str = 'fil')

Validate the trained pipeline on a real filterbank file and log the results to MLflow.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.

src.pulsarsa.tools.pipe_methods.validation_methods.binary_cross_entropy(y_true: ndarray, y_pred: ndarray, eps=1e-15)

Calculates the binary cross-entropy error

Parameters:

y_true (np.ndarray) – list containing 0 or 1
y_pred (np.ndarray) – list containing 0 or 1
eps (float, optional) – epsilon to avoid log(0). Defaults to 1e-15.

Returns:

float – binary cross-entropy error
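For reference, the standard binary cross-entropy with clipping by eps can be written in a few lines. This is a sketch of the textbook formula, averaged over samples; whether the package averages or sums is not stated above, so the mean is an assumption, and the name bce_sketch is hypothetical.

```python
import math


def bce_sketch(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy, with predictions clipped to [eps, 1 - eps]."""
    total = 0.0
    for truth, pred in zip(y_true, y_pred):
        # Clipping keeps log() away from 0 when a prediction is exactly 0 or 1.
        clipped = min(max(pred, eps), 1.0 - eps)
        total += truth * math.log(clipped) + (1 - truth) * math.log(1.0 - clipped)
    return -total / len(y_true)


perfect = bce_sketch([1, 0], [1, 0])
half = bce_sketch([1], [0.5])
wrong = bce_sketch([1], [0])
```

With hard 0/1 predictions, every misclassified sample contributes about -log(eps), so the value is dominated by the error count.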

src.pulsarsa.tools.pipe_methods.validation_methods.validate_pipeline_simple(path_to_work_folder: str, file_type: str = 'fil', pipeline: PipelineImageToFilterToCCtoLabels | None = None, plot_results: bool = False) → dict[str, float]

Validate the trained pipeline on a real filterbank file or npy file.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.
pipeline (PipelineImageToFilterToCCtoLabels, optional) – If not provided, it will be loaded from the definition in the work folder config file. Defaults to None.

Returns:

dict[str,float] – dictionary containing F1_score, accuracy, precision, recall, entropy
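The classification metrics in the returned dictionary follow the usual confusion-matrix definitions. The sketch below computes them from binary labels; it is an illustration under that assumption, not the package's code, and it omits the entropy entry. The function name and the sample labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs)
    return {"F1_score": f1, "accuracy": accuracy,
            "precision": precision, "recall": recall}


# Made-up labels: 2 true positives, 1 false negative, 1 false positive, 4 true negatives.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 1, 0, 0, 0, 0])
```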

src.pulsarsa.tools.pipe_methods.validation_methods.validate_pipeline_for_diff_snr_bands(path_to_work_folder: str, snr_bands: list[list[float]], file_type: str = 'fil', pipe_line: PipelineImageToFilterToCCtoLabels | None = None, save_plots: bool = False) → list[dict[str, float]]

Validate the trained pipeline on different SNR bands and log the results to MLflow.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
snr_bands (list[list[float]]) – List of SNR bands to validate on.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.

src.pulsarsa.tools.pipe_methods.validation_methods.create_filterbank_filedataset(filterbank_file_path: str, metadata_path: str, position_column='position', time_window=512, allow_randomness=False, batch_size: int = 10, num_batches: int = 20, freq_range: list | None = None, device: device = device(type='cpu'))

src.pulsarsa.tools.pipe_methods.validation_methods.create_filterbank_filedataset_for_decision_tree(filterbank_file_path: str, metadata_path: str, position_column='position', time_window=512, allow_randomness=False, batch_size: int = 10, num_batches: int = 20, freq_range: list | None = None, device: device = device(type='cpu'))

src.pulsarsa.tools.pipe_methods.validation_methods.calculate_validation_metrics(pipeline, batches, y_trues, return_time: bool = True)

src.pulsarsa.tools.pipe_methods.validation_methods.calculate_validation_metrics_using_decision_tree(pipeline, batches, y_trues, return_time: bool = True)

Module contents

src.pulsarsa.tools.pipe_methods.generate_data_payloads_for_training(path_to_work_folder: str)

Generates synthetic payloads using PulsarDT simulation modules.

Parameters:: path_to_work_folder (str) – path to the working folder containing the config_pipe.yml file.
Returns:: str – Message indicating the location of generated payloads.

src.pulsarsa.tools.pipe_methods.run_sample_plotter(path_to_work_folder: str, num_samples=5)

Run the sample plotter script in the specified working directory.

Parameters:

path_to_work_folder (str) – Path to the working folder where the sample plotter script will be executed.
num_samples (int, optional) – Number of sample plots to generate. Defaults to 5.

Returns:

str – Message indicating the location of the generated sample plots.

src.pulsarsa.tools.pipe_methods.train_pipeline_neural_nets(path_to_work_folder: str, skip_filter_training: bool = False)

Parameters:: path_to_work_folder (str) – Path to the working folder where the config file is located
Returns:: None

src.pulsarsa.tools.pipe_methods.validate_pipeline(path_to_work_folder: str, file_type: str = 'fil')

Validate the trained pipeline on a real filterbank file and log the results to MLflow.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file and trained models.
file_type (str, optional) – Type of input file either in ‘fil’ or ‘npy’. Defaults to ‘fil’.

src.pulsarsa.tools.pipe_methods.run_pipeline(path_to_work_folder: str, need_data_gen: bool = True, need_training: bool = True, need_validation: bool = True, skip_filter_training: bool = False, val_file_type: str = 'fil')

Run the full PulsarSA pipeline including setup, training, and validation.

Parameters:

path_to_work_folder (str) – Path to the working folder containing the config_pipe.yml file.
need_data_gen (bool, optional) – Whether to generate data payloads. Defaults to True.
need_training (bool, optional) – Whether to train the pipeline neural networks. Defaults to True.
need_validation (bool, optional) – Whether to validate the pipeline. Defaults to True.
skip_filter_training (bool, optional) – Whether to skip the filter neural network. Defaults to False.
val_file_type (str, optional) – File type for validation (‘fil’ or ‘npy’). Defaults to ‘fil’.

src.pulsarsa.tools.pipe_methods.run_pipeline_estimator_on_paramgrid(path_to_work_folder, param_grid: dict, file_type: str = 'fil')

Evaluates the performance metrics of the pipeline for every combination of the parameters in the grid.

Parameters:

path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
param_grid (dict) – Dictionary whose keys are the names of the parameters to test and whose values are the candidate values for each parameter.
file_type (str, optional) – Type of the validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.

Returns:

dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.

src.pulsarsa.tools.pipe_methods.run_pipeline_estimator_on_pca_componentgrid(pipelines: list[PipelineImageToFilterToCCtoLabels], PCA_grid_resolution: float, path_to_work_folder: str, variance_to_capture: float | int = 0.9, file_type: str = 'fil') → dict

Evaluates the performance metrics of the mixed pipeline built from several similar pipelines trained on different datasets.

Parameters:

pipelines (list[PipelineImageToFilterToCCtoLabels]) – List of similar pipelines trained on different simulation datasets.
PCA_grid_resolution (float) – Grid resolution.
path_to_work_folder (str) – Path to the folder containing the config.yml file that describes the pipeline.
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines’ NN parameters to capture; if an int, the number of PCA components to consider (<= number of pipelines). Defaults to 0.9.
file_type (str, optional) – Type of the validation dataset, either ‘npy’ or ‘fil’. Defaults to ‘fil’.

Returns:

dict – The best parameter combination for each performance metric: ‘F_score’, ‘accuracy’, ‘precision’, ‘recall’.

src.pulsarsa.tools.pipe_methods.instantiate_tuner(pipeline_list: list[PipelineImageToFilterToCCtoLabels], variance_to_capture: float | int = 3)

Instantiates a tuner for mixing the pipelines’ CNNs.

Parameters:

pipeline_list (list[PipelineImageToFilterToCCtoLabels]) – list of pipeline instances
variance_to_capture (float | int, optional) – If a float, the fraction of variance in the pipelines’ NN parameters to capture; if an int, the number of PCA components to consider (<= number of pipelines). Defaults to 3.

Returns:

Tuner – returns the tuner instance