src package

Subpackages

src.pulsarsa package

Module contents

class src.Payload(freqs: list[float], bandwidths: list[float] | None = None)

Bases: object

This class is used to store radio packet data.

plot(type: str = 'img')

Plots the dataframe

Parameters:: type (str, optional) – kind of plot to draw. Defaults to ‘img’.
Returns:: plt.axes – returns the axes of the plot

add_flux(radio_packet: list[list[float]])

This function adds the flux to the dataframe. The flux is a list of lists, where each inner list is one row of the dataframe: its first element is the flux and its second element is the frequency. The function checks that the frequencies of the radio packet match the frequencies of the payload. If they match, the flux is added to the dataframe; if they do not, an error is raised.

Parameters:: radio_packet (list[list[float]]) – list of lists, where each list is a row of the dataframe. The first element of the list is the flux and the second element is the frequency.
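
The frequency check described above can be sketched in plain Python as follows. This is only an illustration, not the library's implementation: the helper name `append_flux_rows`, the use of `math.isclose` for the comparison, and the list used in place of the dataframe are all assumptions.

```python
import math


def append_flux_rows(freqs, rows, radio_packet):
    """Hypothetical helper: append flux values if the packet frequencies match `freqs`.

    Each packet row is [flux, frequency], as described for add_flux.
    """
    if len(radio_packet) != len(freqs):
        raise ValueError("radio packet length does not match the number of frequencies")
    for (flux, freq), expected in zip(radio_packet, freqs):
        if not math.isclose(freq, expected):
            raise ValueError(f"frequency {freq} does not match expected {expected}")
    # All frequencies matched: store one flux value per frequency channel.
    rows.append([flux for flux, _ in radio_packet])
    return rows


freqs = [100.0, 200.0, 300.0]
rows = append_flux_rows(freqs, [], [[0.5, 100.0], [0.7, 200.0], [0.1, 300.0]])
print(rows)
```

A packet whose frequencies differ from `freqs` raises `ValueError` in this sketch; the exception type raised by the real method is not stated in this reference.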

add_description(description: dict)

This function adds a description to the payload. The description is a dictionary with the keys ‘Pulsar’, ‘NBRFI’, ‘BBRFI’. The values of the dictionary are the descriptions of the pulsar, NBRFI and BBRFI respectively.

Parameters:: description (dict) – dictionary with the keys ‘Pulsar’, ‘NBRFI’, ‘BBRFI’. The values of the dictionary are the number of the pulsar, NBRFI and BBRFI respectively.

assign_bandwidths_to_freqchannels(bandwidths: list[float])

This function assigns bandwidths to the freqs

Parameters:: bandwidths (list[float]) – list of bandwidths.

Note

The bandwidths are assigned to the frequencies in the same order as the frequencies. The bandwidths list length must match the length of the frequencies.
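
The ordering and length rule in the note above might look like this in plain Python. The helper name `pair_bandwidths_with_freqs` and the dictionary return type are assumptions made for illustration; how the class stores the assignment is not documented here.

```python
def pair_bandwidths_with_freqs(freqs, bandwidths):
    """Hypothetical helper: pair each frequency with the bandwidth at the same index."""
    if len(bandwidths) != len(freqs):
        raise ValueError(f"expected {len(freqs)} bandwidths, got {len(bandwidths)}")
    # zip keeps the original order, so bandwidths[i] belongs to freqs[i].
    return dict(zip(freqs, bandwidths))


channels = pair_bandwidths_with_freqs([100.0, 200.0], [10.0, 20.0])
print(channels)
```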

assign_rot_phases(rot_phases: list[float])

This function assigns the rotation phases to the dataframe. The rotation phases are a list of floats.

Parameters:: rot_phases (list[float]) – list of rotation phases

write_payload_to_jsonfile(file_name: str)

This method writes the payload to a json file.

Parameters:: file_name (str) – path + name of the file to write to

classmethod read_payload_from_jsonfile(filename: str)

This method reads the payload from a json file.

Parameters:: filename (str) – path + name of the file to read from
Returns:: Payload – returns the payload object
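
A write/read round trip of the kind these two methods provide can be sketched with the standard `json` module. `MiniPayload` and its fields are hypothetical stand-ins; the actual JSON layout used by `Payload` is not specified in this reference.

```python
import json
import os
import tempfile


class MiniPayload:
    """Hypothetical stand-in for Payload, holding only plain-Python fields."""

    def __init__(self, freqs, flux=None, description=None):
        self.freqs = freqs
        self.flux = flux or []
        self.description = description or {}

    def write_to_jsonfile(self, file_name):
        # Serialize the instance attributes as one JSON object.
        with open(file_name, "w", encoding="utf-8") as fh:
            json.dump(self.__dict__, fh)

    @classmethod
    def read_from_jsonfile(cls, filename):
        # Rebuild an instance from the stored attributes.
        with open(filename, encoding="utf-8") as fh:
            return cls(**json.load(fh))


original = MiniPayload([100.0, 200.0], [[0.5, 0.7]], {"Pulsar": 1, "NBRFI": 0, "BBRFI": 0})
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "payload.json")
    original.write_to_jsonfile(path)
    restored = MiniPayload.read_from_jsonfile(path)
print(restored.freqs, restored.description)
```

Using a classmethod for reading lets the caller obtain an object without constructing an empty one first, which matches the `classmethod` signature shown above.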

class src.PrepareFreqTimeImage(do_rot_phase_avg: bool = True, do_resize: bool = True, do_binarize: bool = False, resize_size: tuple = (256, 256), binarize_engine: ~src.pulsarsa.preprocessing.BinarizeToMask = <src.pulsarsa.preprocessing.BinarizeToMask object>)

Bases: object

Class implementing methods to load and preprocess radio payloads into freq-time images

preparation_protocol(payload: Payload)

Protocol to call methods to prepare freq-time graphs

Parameters:: payload (Payload) – Payload class object made during the simulation
Returns:: image (np.ndarray) – freq-time image

plot(payload_address: str)

plot the freq-time graph loaded from payload file

Parameters:: payload_address (str) – path to the payload file

average_payload_rotphase(payload: Payload)

Method to average a payload containing multiple rotations onto the 0-360 rotphase range

Parameters:: payload (Payload) – Payload class object made during the simulation
Returns:: image (np.ndarray) – freq-time image
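
Averaging several rotations onto one 0-360 range can be illustrated for a single frequency channel as below. This is a sketch under assumptions: phases are taken to be in degrees, samples are grouped into `n_bins` equal phase bins, and `fold_and_average` is a hypothetical name. The real method works on a `Payload` and returns a 2D image.

```python
def fold_and_average(rot_phases, flux, n_bins=4):
    """Hypothetical helper: average flux samples that fall in the same 0-360 phase bin."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    bin_width = 360.0 / n_bins
    for phase, value in zip(rot_phases, flux):
        folded = phase % 360.0  # map every rotation onto 0-360 degrees
        idx = min(int(folded // bin_width), n_bins - 1)
        sums[idx] += value
        counts[idx] += 1
    # Bins that received no samples are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Two rotations sampled every 90 degrees.
phases = [0, 90, 180, 270, 360, 450, 540, 630]
flux = [1.0, 2.0, 3.0, 4.0, 3.0, 4.0, 5.0, 6.0]
print(fold_and_average(phases, flux))  # [2.0, 3.0, 4.0, 5.0]
```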

class src.ImageReader(file_type: type = <src.pulsarsa.information_packet_formats.Payload object>, resize_size: tuple = (256, 256), do_average: bool = False, do_binarize: bool = False)

Bases: object

Class ImageReader acts as an engine to load/prepare images from payload files or numpy arrays

static read_from_payload(filename: str, resize_size: tuple = (128, 128), do_average: bool = False, do_binarize: bool = False)

Method to load freq-time image from payload files

Parameters:

filename (str) – full address to payload file
resize_size (tuple, optional) – output shape of the loaded image. Defaults to (128, 128).
do_average (bool, optional) – If True then the image is created by averaging phase values over many rotations. Defaults to False.
do_binarize (bool, optional) – If True then the image is binarized. Defaults to False.

Returns:

(np.ndarray) – loaded image

class src.ImageDataSet(image_tag: str, image_directory: str, image_reader_engine: ~src.pulsarsa.pipeline_methods.ImageReader | ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = <src.pulsarsa.pipeline_methods.ImageReader object>)

Bases: object

This class is used to represent or memory map a set of images

plot(idx)

plots the image from the set represented by the idx

Parameters:: idx (int) – index of the image
Returns:: plt.axis – axis of the plot

class src.LabelReader(file_type: type = <src.pulsarsa.information_packet_formats.Payload object>)

Bases: object

This class acts as an engine to read the labels from files of type payload

static read_from_payload(filename: str)

Method to read the label from payload file

Parameters:: filename (str) – full path to the payload file
Returns:: dict – dictionary containing details of the payload file

static read_from_str(label_memomory_map: str): Method to read the label from numpy file

static correct_key_names(description: dict)

class src.LabelDataSet(image_tag: str, image_directory: str, label_reader_engine: ~src.pulsarsa.pipeline_methods.LabelReader = <src.pulsarsa.pipeline_methods.LabelReader object>)

Bases: object

This class is used to represent or memory map a set of labels of the images

plot(idx)

prints the label of idx image

Parameters:: idx (int) – index representing the image
Returns:: dict – description

class src.ImageToMaskDataset(image_tag: str, mask_tag: str, image_directory: str, mask_directory: str, image_engine: ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = PrepareFreqTimeImage class object with attributes {'do_rot_phase_avg': True, 'do_resize': True, 'resize_size': (256, 256), 'do_binarize': False, 'binarize_engine': <src.pulsarsa.preprocessing.BinarizeToMask object>}, mask_engine: ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = PrepareFreqTimeImage class object with attributes {'do_rot_phase_avg': True, 'do_resize': True, 'resize_size': (256, 256), 'do_binarize': True, 'binarize_engine': <src.pulsarsa.preprocessing.BinarizeToMask object>}, device: ~torch.device = device(type='cpu'))

Bases: Dataset

Class to represent Image Mask dataset

plot(index)

Plot image mask pair represented by index

Parameters:: index (int) – index of the pair

class src.InMaskToMaskDataset(image_tag: str, mask_tag: str, image_directory: str, mask_directory: str, mask_maker_engine: ~src.pulsarsa.pipeline_methods.PipelineImageToMask, image_engine: ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = PrepareFreqTimeImage class object with attributes {'do_rot_phase_avg': True, 'do_resize': True, 'resize_size': (256, 256), 'do_binarize': False, 'binarize_engine': <src.pulsarsa.preprocessing.BinarizeToMask object>}, mask_engine: ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = PrepareFreqTimeImage class object with attributes {'do_rot_phase_avg': True, 'do_resize': True, 'resize_size': (256, 256), 'do_binarize': True, 'binarize_engine': <src.pulsarsa.preprocessing.BinarizeToMask object>}, device: ~torch.device = device(type='cpu'))

Bases: Dataset

Class to represent InMask Mask dataset

plot(index)

Plot InMask and Mask pair

Parameters:: index (int) – index of the pair to plot

class src.ImageInMaskToMaskDataset(image_tag: str, mask_tag: str, image_directory: str, mask_directory: str, mask_maker_engine: ~src.pulsarsa.pipeline_methods.PipelineImageToMask, image_engine: ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = PrepareFreqTimeImage class object with attributes {'do_rot_phase_avg': True, 'do_resize': True, 'resize_size': (256, 256), 'do_binarize': False, 'binarize_engine': <src.pulsarsa.preprocessing.BinarizeToMask object>}, mask_engine: ~src.pulsarsa.preprocessing.PrepareFreqTimeImage = PrepareFreqTimeImage class object with attributes {'do_rot_phase_avg': True, 'do_resize': True, 'resize_size': (256, 256), 'do_binarize': True, 'binarize_engine': <src.pulsarsa.preprocessing.BinarizeToMask object>}, device: ~torch.device = device(type='cpu'))

Bases: Dataset

Class to represent Image+InMask Mask dataset

plot(index): Plot the 2-channel input (image + in_mask) and the target mask.

class src.TrainImageToMaskNetworkModel(num_epochs: int, loss_criterion: Module = CustomLossUNet(), store_trained_model_at: str = './syn_data/model/trained_unet_test_v0.pt', model: Module | None = None, train_test_split: float = 0.8, learning_rate: float = 0.001, batch_size: int = 10)

Bases: object

Class involving methods to train Image (or InMask) to Mask Network

train_model(image_mask_pairset: ImageToMaskDataset | InMaskToMaskDataset | ImageInMaskToMaskDataset, early_stopping_patience: int = 5, plot: bool = False, plot_path: str | None = None, label_dataset: LabelDataSet | None = None, preferred_label: str | None = None, plot_validation_samples_at: str | None = None, ml_flow_folder: str | None = None, ml_flow_exp_name: str | None = None)

Train the network with validation, early stopping, and optional plotting.

Parameters:

image_mask_pairset (ImageToMaskDataset|InMaskToMaskDataset|ImageInMaskToMaskDataset) – Dataset containing image-mask pairs.
early_stopping_patience (int) – Epochs to wait after no val loss improvement.
plot (bool) – Whether to show a loss curve plot. Default is False.

Returns:

(list, list, list) – epochs, training losses, validation losses
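
The role of `early_stopping_patience` can be sketched independently of any network. The function below is a hypothetical illustration of a common patience rule (stop after that many consecutive epochs without a new best validation loss); the exact criterion used by `train_model` is not documented here.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop.

    Hypothetical helper: stops once `patience` consecutive epochs
    pass without a new best validation loss.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # patience never exhausted


losses = [1.0, 0.8, 0.9, 0.85, 0.7, 0.75, 0.72, 0.71]
print(early_stopping_epoch(losses, patience=3))  # 8
print(early_stopping_epoch(losses, patience=2))  # 4
```

A smaller patience stops earlier but can miss later improvements, as the second call shows: it stops at epoch 4 even though epoch 5 reaches a lower loss.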

test_model(image: tensor, plot_pred: bool = False)

Method to test the network

Parameters:

image (torch.tensor) – image to convert to mask
plot_pred (bool, optional) – If True, plots the prediction from the NN. Defaults to False.

Returns:

(np.ndarray) – prediction by the network as predicted mask

class src.TrainSignalToLabelModel(num_epochs: int, loss_criterion: Module = BCELoss(), store_trained_model_at: str = './syn_data/model/trained_OneDconvEncoder_test_v0.pt', model: Module | None = None)

Bases: object

Class involving methods to train nn to classify signal into labels

train_model(signal_label_pairset: SignalToLabelDataset)

Method to train the NN

Parameters:: signal_label_pairset (SignalToLabelDataset) – signal label pair dataset
Returns:: (list,list) – epoch number and the loss in each epoch

test_model_from_signal(signal: tensor, plot_pred: bool = False)

Method to test the nn to classify a signal

Parameters:

signal (torch.tensor) – signal to classify
plot_pred (bool, optional) – If True, plots the signal together with the prediction probability of each category. Defaults to False.

Returns:

np.ndarray – predicted labels with their probabilities

test_model(mask: ndarray, plot_pred: bool = False)

Method to test model from mask

Parameters:

mask (np.ndarray) – mask from which category probability is predicted after generating the signal
plot_pred (bool, optional) – If True plots the results. Defaults to False.

Returns:

np.ndarray – predicted labels with their probabilities

class src.UNet(in_channels=1, out_channels=1, init_features=32)

Bases: Module

This model is based on the U-Net implementation available on the PyTorch website, which follows the original U-Net paper. The input and output channels and the kernel size have been modified.

class src.FilterCNN(in_channels=1, out_channels=1, init_features=32)

Bases: Module

This is a simple encoder-decoder architecture for filtering the segmented pulsar signals.

class src.PipelineImageToCCtoLabels(image_to_mask_network: Module, trained_image_to_mask_network_path: str, min_cc_size_threshold: int = 10)

Bases: object

Class implementing methods in sequence to generate a segmented freq-time image, extract its connected components (CCs), and assign the CCs to categories

get_nns_from_pipeline()

This method returns the neural networks used in the pipeline

Returns:: list – list of neural networks

image_to_mask_method(image: ndarray)

Method to convert image to mask

Parameters:: image (np.ndarray) – image
Returns:: image (np.ndarray) – mask

mask_to_labelled_skeleton_method(mask: ndarray)

Method to make labelled skeleton from mask

Parameters:: mask (np.ndarray) – segmented mask
Returns:: (np.ndarray) – labelled skeleton

labelled_skeleton_to_labels_method(labelled_skeleton: ndarray, return_detailed_results: bool = False)

Method to analyze each label of the labelled skeleton

Parameters:

labelled_skeleton (np.ndarray) – labelled skeleton
return_detailed_results (bool, optional) – If True, return detailed results. Defaults to False.

Returns:

list of results, one per label

display_results_in_batch(image_data_set: ImageDataSet, mask_data_set: ImageDataSet, label_data_set: LabelDataSet, randomize: bool = True, ids_toshow: list = [0, 1], batch_size: int = 2)

Plot results of the pipeline with step outputs and comparison with pre-labelled dataset

Parameters:

image_data_set (ImageDataSet) – Image dataset
label_data_set (LabelDataSet) – Label dataset of the images
mask_data_set (ImageDataSet) – Mask dataset of the image dataset
randomize (bool, optional) – If True, randomly chooses images from the dataset. Defaults to True.
ids_toshow (list, optional) – If randomize=False, then shows the images with the ids in ids_toshow. Defaults to [0, 1].
batch_size (int, optional) – If randomize=True, then chooses batch_size images from set. Defaults to 2.
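
How `randomize`, `ids_toshow` and `batch_size` interact can be sketched as a small index-selection helper. `choose_ids` and its `seed` argument are hypothetical and only illustrate the documented behaviour of the three options.

```python
import random


def choose_ids(dataset_size, randomize=True, ids_toshow=(0, 1), batch_size=2, seed=None):
    """Hypothetical helper mirroring the randomize / ids_toshow / batch_size options."""
    if randomize:
        # Draw batch_size distinct indices from the dataset.
        rng = random.Random(seed)
        return rng.sample(range(dataset_size), batch_size)
    # Otherwise show exactly the requested indices.
    return list(ids_toshow)


print(choose_ids(10, randomize=False, ids_toshow=[3, 4]))
print(choose_ids(10, randomize=True, batch_size=3, seed=0))
```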

test_on_real_data_from_npy_files(image_data_set: memmap, image_label_set: memmap | None = None, plot_randomly: bool = True, batch_size: int = 5)

Method to test pipeline on .npy file dataset

Parameters:

image_data_set (np.memmap) – image dataset as numpy array
image_label_set (np.memmap | None, optional) – label dataset as numpy array. Defaults to None.
plot_details (bool, optional) – if True then plot the results. Defaults to False.
plot_randomly (bool, optional) – If True then randomly choose images from dataset. Defaults to True.
batch_size (int, optional) – number of images to test. minimum is 2. Defaults to 5.

measure_accuracy(image_data_set: memmap, label_data_set: memmap, plot_results: bool = False, return_specific_key: str | None = None)

validate_efficiency(image_data_set: ImageDataSet, label_data_set: LabelDataSet, plot_results: bool = True, return_specific_key: str | None = None)

Method to validate efficiency of the pipeline from image and label dataset

Parameters:

image_data_set (ImageDataSet) – image dataset
label_data_set (LabelDataSet) – label dataset
plot_results (bool, optional) – If True then plot the results. Defaults to True.
return_specific_key (str | None, optional) – If provided, returns only the results for this specific key. Defaults to None.

Returns:

tuple – true_scores, true_negative_scores, false_scores, signal_present, f1_scores, precision, recall, accuracy
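
For readers unfamiliar with the last four entries of the returned tuple, the standard definitions of precision, recall, F1 and accuracy are sketched below from confusion-matrix counts. How the pipeline derives these counts per label, and how they relate to `true_scores` and `false_scores`, is not documented here, so `classification_scores` is only a hypothetical illustration.

```python
def classification_scores(tp, fp, tn, fn):
    """Standard precision, recall, F1 and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f1, accuracy


precision, recall, f1, accuracy = classification_scores(tp=8, fp=2, tn=5, fn=2)
print(precision, recall, f1, accuracy)
```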

class src.PipelineImageToDelGraphtoIsPulsar(image_to_mask_network: Module, trained_image_to_mask_network_path: str, signal_to_label_network: Module, trained_signal_to_label_network: str)

Bases: object

Class implementing methods in sequence to generate segmented freq-time Image then Delay graph then to determine if pulsar is there

get_nns_from_pipeline()

image_to_mask_method(image: ndarray)

Method to convert image to mask

Parameters:: image (np.ndarray) – image
Returns:: image (np.ndarray) – mask

mask_to_signal_method(mask)

Method to convert mask to delaygraph and extract lags as signal

Parameters:: mask (np.ndarray) – segmented mask
Returns:: np.ndarray – x_lags as signal

signal_to_label_method(signal: ndarray)

Method to determine if pulsar is there based on signal

Parameters:: signal (np.ndarray) – x_lags as signal
Returns:: float – probability that a pulsar is present

validate_efficiency(image_data_set: ImageDataSet, label_data_set: LabelDataSet)

Method to validate efficiency of the pipeline from image and label dataset

Parameters:

image_data_set (ImageDataSet) – image dataset
label_data_set (LabelDataSet) – label dataset

Returns:

float – efficiency measure

Plot results of the pipeline with step outputs and comparison with pre-labelled dataset

Parameters:

image_data_set (ImageDataSet) – Image dataset
mask_data_set (ImageDataSet) – Mask dataset of the image dataset
label_data_set (LabelDataSet) – Label dataset of the images
randomize (bool, optional) – If True, randomly chooses images from the dataset. Defaults to True.
ids_toshow (list, optional) – If randomize=False, then shows the images with the ids in ids_toshow. Defaults to [0, 1].
batch_size (int, optional) – If randomize=True, then chooses batch_size images from set. Defaults to 2.

test_on_real_data_from_npy_files(image_data_set: memmap, image_label_set: memmap | None = None, plot_details: bool = False, plot_randomly: bool = True, batch_size: int = 5)

Method to test pipeline on .npy file dataset

Parameters:

image_data_set (np.memmap) – image dataset as numpy array
image_label_set (np.memmap | None, optional) – label dataset as numpy array. Defaults to None.
plot_details (bool, optional) – if True then plot the results. Defaults to False.
plot_randomly (bool, optional) – If True then randomly choose images from dataset. Defaults to True.
batch_size (int, optional) – number of images to test. minimum is 2. Defaults to 5.

class src.PipelineImageToFilterDelGraphtoIsPulsar(image_to_mask_network: Module, trained_image_to_mask_network_path: str, mask_filter_network: Module, trained_mask_filter_network_path: str, signal_to_label_network: Module, trained_signal_to_label_network: str, skip_filter: bool = False)

Bases: object

Class implementing methods in sequence to generate segmented freq-time Image, filter it, then Delay graph then to determine if pulsar is there

skip_filtering(skip_filter: bool = False)

Method to skip filter step

Parameters:: skip_filter (bool, optional) – If True then skip filter step. Defaults to False.

get_nns_from_pipeline()

image_to_mask_method(image: ndarray)

Method to convert image to mask

Parameters:: image (np.ndarray) – image
Returns:: image (np.ndarray) – mask

filter_mask_method(pred_binarized: ndarray)

Method to filter out wrong segments in the segmented mask

Parameters:: pred_binarized (np.ndarray) – segmented mask to filter
Returns:: np.ndarray – filtered segmented mask

mask_to_signal_method(mask)

Method to convert mask to delaygraph and extract lags as signal

Parameters:: mask (np.ndarray) – segmented mask
Returns:: np.ndarray – x_lags as signal

signal_to_label_method(signal: ndarray)

Method to determine if pulsar is there based on signal

Parameters:: signal (np.ndarray) – x_lags as signal
Returns:: float – probability that a pulsar is present

validate_efficiency(image_data_set: ImageDataSet, label_data_set: LabelDataSet)

Method to validate efficiency of the pipeline from image and label dataset

Parameters:

image_data_set (ImageDataSet) – image dataset
label_data_set (LabelDataSet) – label dataset

Returns:

float – efficiency measure

Plot results of the pipeline with step outputs and comparison with pre-labelled dataset

Parameters:

image_data_set (ImageDataSet) – Image dataset
mask_data_set (ImageDataSet) – Mask dataset of the image dataset
label_data_set (LabelDataSet) – Label dataset of the images
randomize (bool, optional) – If True, randomly chooses images from the dataset. Defaults to True.
ids_toshow (list, optional) – If randomize=False, then shows the images with the ids in ids_toshow. Defaults to [0, 1].
batch_size (int, optional) – If randomize=True, then chooses batch_size images from set. Defaults to 2.

test_on_real_data_from_npy_files(image_data_set: memmap, image_label_set: memmap | None = None, plot_details: bool = False, plot_randomly: bool = True, batch_size: int = 5)

Method to test pipeline on .npy file dataset

Parameters:

image_data_set (np.memmap) – image dataset as numpy array
image_label_set (np.memmap | None, optional) – label dataset as numpy array. Defaults to None.
plot_details (bool, optional) – if True then plot the results. Defaults to False.
plot_randomly (bool, optional) – If True then randomly choose images from dataset. Defaults to True.
batch_size (int, optional) – number of images to test. minimum is 2. Defaults to 5.

class src.PipelineImageToFilterToCCtoLabels(image_to_mask_network: Module, trained_image_to_mask_network_path: str, mask_filter_network: Module, trained_mask_filter_network_path: str, min_cc_size_threshold: int = 10, skip_filter: bool = False, min_axis_ratio: float = 4.0, allow_image_as_input_to_filter: bool = False, remove_backgound: bool = True, one_pulse_mode: bool = True, return_snr: bool = False, box_func_window=5, snr_thresh=2, corr_thresh=5)

Bases: object

Class implementing methods in sequence to generate a segmented frequency-time image, filter it, perform connected components (CC) analysis, and categorize the components.

The processing pipeline consists of the following steps:

Image-to-mask conversion using a neural network.
Mask filtering using a neural network.
Mask-to-labeled skeleton conversion using connected components.
Labeled skeleton to category/label identification.

Parameters:

image_to_mask_network (nn.Module) – Neural network to convert image to mask.
trained_image_to_mask_network_path (str) – Path to trained weights of the image-to-mask network.
mask_filter_network (nn.Module) – Neural network to filter mask.
trained_mask_filter_network_path (str) – Path to trained weights of the mask filter network.
min_cc_size_threshold (int) – Minimum size of a connected component to be considered valid.
skip_filter (bool) – If True, skip the filtering step.
min_axis_ratio (float) – Minimum axis ratio of a connected component to be considered valid.
allow_image_as_input_to_filter (bool) – If True, allow the image as input to the filter network.
remove_background (bool) – If True, remove background from the image before processing.
one_pulse_mode (bool) – If True, use one-pulse mode. Clustering and regularization select the best cluster of CCs, which is then identified as a single category.
return_snr (bool) – If True, return SNR in the results.
box_func_window (int) – Window size for boxcar function to align pulses in each channel on the lines from each cluster.
snr_thresh (float) – Threshold for SNR to consider a cluster as a valid pulse.
corr_thresh (float) – Threshold for correlation to identify a valid pulse component in a channel for regularization.
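
The connected-components step and the `min_cc_size_threshold` filter can be illustrated with a small pure-Python labelling routine. This is a sketch under assumptions: it uses 4-connectivity, works on nested lists instead of NumPy arrays, and does not skeletonize the mask; `label_components` is a hypothetical name and the library's own routine may differ in all of these respects.

```python
from collections import deque


def label_components(mask, min_size=10):
    """Label 4-connected components of a binary mask, dropping those below min_size.

    Hypothetical helper; returns a grid of the same shape with 0 for background
    and 1, 2, ... for each kept component.
    """
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    seen = [[False] * width for _ in range(height)]
    next_label = 1
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects every pixel of this component.
            component, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Keep only components that reach the size threshold.
            if len(component) >= min_size:
                for y, x in component:
                    labels[y][x] = next_label
                next_label += 1
    return labels


mask = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
]
labelled = label_components(mask, min_size=3)
for row in labelled:
    print(row)
```

In this example the isolated pixel in the top-right corner is discarded because its component has fewer than three pixels, while the two three-pixel components receive labels 1 and 2.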

__call__(image: np.ndarray, return_steps: bool = False): Runs the pipeline on the given image.

plot(image: ndarray, return_steps: bool = False)

Method to plot the results of the pipeline

Parameters:

image (np.ndarray) – input image
return_steps (bool, optional) – If True, return intermediate steps. Defaults to False.

Returns:: matplotlib.axes.Axes – Axes object with the plots

remove_background(image: ndarray) → ndarray

Method to remove background from the image using gaussian filtering for gradient type background

Parameters:: image (np.ndarray) – input image
Returns:: np.ndarray – image with background removed

skip_filtering(skip_filter: bool = True)

Method to skip filtering step in the pipeline

Parameters:: skip_filter (bool, optional) – If True, then skip filtering step. Defaults to True.

get_nns_from_pipeline()

This method returns the neural networks used in the pipeline

Returns:: list – list of neural networks used in the pipeline

image_to_mask_method(image: ndarray)

Method to convert image to mask

Parameters:: image (np.ndarray) – image
Returns:: image (np.ndarray) – mask

mask_to_labelled_skeleton_method(mask: ndarray)

Method to make labelled skeleton from mask

Parameters:: mask (np.ndarray) – segmented mask
Returns:: (np.ndarray) – labelled skeleton

filter_mask_method(pred_binarized: ndarray)

Method to filter out wrong segments in the segmented mask

Parameters:: pred_binarized (np.ndarray) – segmented mask to filter
Returns:: np.ndarray – filtered segmented mask

filter_imagemask_method(image_pred_binarized: ndarray) → ndarray

Filter out wrong segments in the segmented mask.

Parameters:: image_pred_binarized (np.ndarray) – input with 2 channels (image + in_mask), shape (H, W, 2)
Returns:: np.ndarray – filtered and binarized segmented mask, shape (H, W)

analyse_signal_noise_segement_ratio(binary_filtered_mask: ndarray, thresh: float = 0.3)

Method to analyze the signal to noise segment ratio in the filtered mask

Parameters:

binary_filtered_mask (np.ndarray) – filtered mask
thresh (float, optional) – minimum signal to noise segment ratio to consider a segment as valid. Defaults to 0.3.

Returns:: bool – True if the signal to noise segment ratio is less than the threshold, False otherwise

labelled_skeleton_to_labels_method(labelled_skeleton: ndarray, return_detailed_results: bool = False)

Method to analyze each label of the labelled skeleton

Parameters:

labelled_skeleton (np.ndarray) – labelled skeleton
return_detailed_results (bool, optional) – If True, return detailed results. Defaults to False.

Returns:

list of results, one per label

labelled_skeleton_to_labels_for_one_pulse_method(labelled_skeleton: ndarray, real_image: ndarray, return_detailed_results: bool = False, return_snr: bool = False, snr_thresh=2, box_func_window=5, signal_length=20, top_n=5, corr_thresh=5)

Plot results of the pipeline with step outputs and comparison with pre-labelled dataset

Parameters:

image_data_set (ImageDataSet) – Image dataset
label_data_set (LabelDataSet) – Label dataset of the images
randomize (bool, optional) – If True, randomly chooses images from the dataset. Defaults to True.
ids_toshow (list, optional) – If randomize=False, then shows the images with the ids in ids_toshow. Defaults to [0, 1].
batch_size (int, optional) – If randomize=True, then chooses batch_size images from set. Defaults to 2.

test_on_real_data_from_npy_files(image_data_set: memmap, image_label_set: memmap | None = None, plot_randomly: bool = True, batch_size: int = 5, save_plot_path: str | None = None)

Method to test pipeline on .npy file dataset

Parameters:

image_data_set (np.memmap) – image dataset as numpy array
image_label_set (np.memmap | None, optional) – label dataset as numpy array. Defaults to None.
plot_details (bool, optional) – if True then plot the results. Defaults to False.
plot_randomly (bool, optional) – If True then randomly choose images from dataset. Defaults to True.
batch_size (int, optional) – number of images to test. minimum is 2. Defaults to 5.
save_plot_path (str, optional) – If provided saves the plot in that location. Defaults to None

measure_accuracy(image_data_set: memmap, label_data_set: memmap, plot_results: bool = False, return_specific_key: str | None = None)

validate_efficiency(image_data_set: ImageDataSet, label_data_set: LabelDataSet, plot_results: bool = True, return_specific_key: str | None = None)

Method to validate efficiency of the pipeline from image and label dataset

Parameters:

image_data_set (ImageDataSet) – image dataset
label_data_set (LabelDataSet) – label dataset
plot_results (bool, optional) – If True then plot the results. Defaults to True.
return_specific_key (str | None, optional) – If provided, returns only the results for this specific key. Defaults to None.

Returns:

tuple – true_scores, true_negative_scores, false_scores, signal_present, f1_scores, precision, recall, accuracy

class src.PipelineImageToMask(image_to_mask_network: Module, trained_image_to_mask_network_path: str)

Bases: object

Class implementing methods in sequence to generate segmented freq-time Image from freq-time Image

image_to_mask_method(image: ndarray)

Method to convert image to mask

Parameters:: image (np.ndarray) – image
Returns:: image (np.ndarray) – mask

plot(image: ndarray)

plots the result of the pipeline for the given image

Parameters:: image (np.ndarray) – image
Returns:: plt.axis – axis of the plot

class src.Tuner(sample_of_objects: list[Module | PipelineImageToCCtoLabels | PipelineImageToFilterToCCtoLabels | PipelineImageToFilterDelGraphtoIsPulsar | PipelineImageToDelGraphtoIsPulsar], show_steps: bool = True, reset_components: bool = True, variance_to_capture: float | int = 0.95, all_sliders: bool = False)

Bases: object

This class is used to tune the parameters of a neural network or a pipeline. It uses PCA to reduce the dimensionality of the parameters and then allows the user to modify the PCA components to generate new data points. This feature is in beta and is subject to change.

Parameters:

sample_of_objects (list) – A list of objects that are either nn.Module or PipelineImageToCCtoLabels or PipelineImageToFilterToCCtoLabels or PipelineImageToFilterDelGraphtoIsPulsar or PipelineImageToDelGraphtoIsPulsar.
show_steps (bool, optional) – If True, it will print the steps of the tuning process. Defaults to True.
reset_components (bool, optional) – If True, it will reset the PCA components to the average scaled PCA factors. Defaults to True.
variance_to_capture (float|int, optional) – The variance to capture in PCA. Defaults to 0.95.
all_sliders (bool, optional) – If True, it will generate sliders for all PCA components. Defaults to False.

When a tuner instance is called with a folder_to_save argument, it will generate a mixed model and save the state_dict of the neural networks in the pipeline to the specified folder, and return the mixed model object.
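
The idea of generating a new parameter set from PCA factors can be sketched as a linear reconstruction: the mean parameter vector plus a weighted sum of the principal components. The sketch assumes the components are already computed and ignores whatever scaling the Tuner applies to its "scaled PCA factors"; `mix_parameters` is a hypothetical name.

```python
def mix_parameters(mean, components, factors):
    """Hypothetical helper: rebuild a flat parameter vector from PCA factors.

    new_params = mean + sum_k factors[k] * components[k]
    """
    if len(factors) != len(components):
        raise ValueError("one factor is needed per PCA component")
    mixed = list(mean)
    for factor, component in zip(factors, components):
        for i, value in enumerate(component):
            mixed[i] += factor * value
    return mixed


mean = [1.0, 2.0, 3.0]
components = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
print(mix_parameters(mean, components, [0.5, -1.0]))  # [1.5, 1.0, 2.0]
```

Setting all factors to zero returns the mean vector, which is one plausible reading of what resetting the components to the average means.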

generate_mixed_model_from_current_components(): Generates a mixed model obj from the current PCA components and returns it.

get_sample_pca_components(): Returns the sample PCA components used for tuning.

get_current_scaled_pca_factors(): Returns the current scaled PCA factors.

set_scaled_pca_factors(scaled_pca_factors: ndarray): Sets the scaled PCA factors to the given values.

generate_mixed_model_from_input_pca_factors(pca_factors: ndarray)

get_component_ranges()

class src.PipelineTunerLossFunction(tuner: Tuner, image_dataset, label_dataset, sample_size, weights, save_model: bool = False, save_path: str = './syn_data/model/')

Bases: object

This class calculates the loss function of the mixed model generated by mixing PCA components. To create an instance, pass the Tuner object, the image dataset, the label dataset, the sample size and the weights for the loss function.

Parameters:

tuner (Tuner) – The Tuner object that contains the PCA engine and other parameters.
image_dataset (np.ndarray) – numpy array of 2d images to be used for testing the mixed model.
label_dataset (np.ndarray) – numpy array of labels corresponding to the images, for example [‘Pulsar+NBRFI’, ‘Pulsar’].
sample_size (int) – Number of samples to randomly select from the dataset for calculating loss.
weights (list) – Weights to be applied to the loss function for each label component.
save_model (bool, optional) – Whether to save the best mixed model or not. Defaults to False.
save_path (str, optional) – Path to save the best mixed model if save_model is True. Defaults to ‘./syn_data/model/’.

Note

The Tuner object should have been initialized with the PCA engine and the number of components to be used for mixing.
If the sample size is smaller than the number of images in the dataset, samples are selected at random from the dataset, which adds noise to the loss function. If the sample size is close to the number of images in the dataset, the loss function is more stable, but for large datasets it takes a long time to compute.
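
The trade-off in the note above can be seen with a toy example in which each image has a fixed loss and the reported loss is the mean over a random subset. `sampled_loss` and the per-image loss values are hypothetical; the real class computes losses by running the mixed pipeline on the sampled images.

```python
import random


def sampled_loss(per_image_losses, sample_size, rng):
    """Hypothetical helper: mean loss over a random subset of the dataset."""
    chosen = rng.sample(per_image_losses, sample_size)
    return sum(chosen) / sample_size


losses = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
rng = random.Random(0)
small = [sampled_loss(losses, 2, rng) for _ in range(5)]  # may differ between calls
full = [sampled_loss(losses, len(losses), rng) for _ in range(5)]  # always the dataset mean
print(small)
print(full)
```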

catalogue_data_set_containing_pulsar_nbrfi_bbrfi_none(): This function reads the input label numpy set and catalogues the indices containing Pulsar, NBRFI, BBRFI, or None.

create_normalized_ori_labels(idx: list)

create_normalized_pred_labels(predictor, idx: list)

src.find_the_best_mixed_pipeline(f: PipelineTunerLossFunction, n_calls=200, n_initial_points=10, minimizer: str = 'gp_minimize', random_seed: int | None = None, niter=1, min_minima: bool = True, save_model: bool = False, save_path: str = './syn_data/model/')

This function finds the best mixed model by optimizing the PCA components using a loss function. It uses the skopt library to minimize the loss function by varying the PCA components. The function returns the best mixed model and the results of the optimization.

Parameters:

f (PipelineTunerLossFunction) – An instance of the PipelineTunerLossFunction class that contains the loss function to be minimized.
n_calls (int, optional) – Number of calls to the loss function during optimization. Defaults to 200.
n_initial_points (int, optional) – Number of initial points to sample before optimization apart from the x0 points. Defaults to 10.
minimizer (str, optional) – Minimization method to use. Options are ‘gp_minimize’, ‘forest_minimize’. Defaults to ‘gp_minimize’.
random_seed (int, optional) – Random seed for reproducibility. Defaults to None.
niter (int, optional) – Number of iterations to run the optimization. Defaults to 1.
min_minima (bool, optional) – The type of minima to return. If True, returns the minimum minima found. If False, returns the minima closest to the mean of all minima found. Defaults to True.
save_model (bool, optional) – Whether to save the best mixed model or not. Defaults to False.
save_path (str, optional) – Path to save the best mixed model if save_model is True. Defaults to ‘./syn_data/model/’.

Returns:

tuple – A tuple containing the best mixed model and a list of results from the optimization.
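
The effect of `min_minima` when `niter` is greater than 1 can be sketched as a choice among the minima found by the individual runs. The sketch assumes that "closest to the mean of all minima" refers to the loss values rather than to positions in parameter space; that reading, and the helper name `select_minimum`, are assumptions.

```python
def select_minimum(minima, min_minima=True):
    """Hypothetical helper: pick one (params, loss) pair from several optimization runs."""
    if min_minima:
        # Lowest loss over all runs.
        return min(minima, key=lambda item: item[1])
    # Otherwise the run whose loss is closest to the mean loss of all runs.
    mean_loss = sum(loss for _, loss in minima) / len(minima)
    return min(minima, key=lambda item: abs(item[1] - mean_loss))


runs = [([0.1, 0.2], 0.30), ([0.0, 0.5], 0.10), ([0.3, 0.3], 0.22)]
print(select_minimum(runs, min_minima=True))   # ([0.0, 0.5], 0.1)
print(select_minimum(runs, min_minima=False))  # ([0.3, 0.3], 0.22)
```

Choosing the run nearest the mean trades the lowest observed loss for a result that is less likely to be an outlier caused by noisy sampling.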

src.save_best_mixed_model(best_mixed_model, save_path: str = './syn_data/model/')

Saves the best mixed model’s dynamic parameters (specifically the neural nets) to a specified path.

Parameters:

best_mixed_model (callable) – The best mixed model to be saved.
save_path (str) – The directory where the model should be saved.

src.plot_loss_in_2d_in_pairwise_parameter_combi(res, figsize_per_subplot=4, save_path=None)

Plot all pairwise 2D partial dependence plots of parameters in res on a grid. Adds a colorbar and axis labels for each subplot and shows the figure.

Parameters:

res – skopt result object (must have res.space.dimensions).
figsize_per_subplot – size per subplot in inches (default 4).
save_path – path to save the figure (default None, which means it won’t be saved).

Returns:

matplotlib.figure.Figure object

src.make_best_mixed_model_from_params(f: PipelineTunerLossFunction, best_pca_combi: list, save_model: bool = False, save_path: str = './syn_data/model/')

Creates the best mixed model from the given PCA components and saves it if required.

Parameters:

f (PipelineTunerLossFunction) – An instance of the PipelineTunerLossFunction class that contains the loss function to be minimized.
best_pca_combi (list) – The best PCA components to be used for creating the mixed model.
save_model (bool, optional) – Whether to save the best mixed model or not. Defaults to False.
save_path (str, optional) – Path to save the best mixed model if save_model is True. Defaults to ‘./syn_data/model/’.

Returns:

callable – The best mixed model created from the given PCA components.