Getting Started
===============

Welcome to the **Pulsar Segmentation and Analysis** :pulsarsa:`PulsarSA` package of **ML-PPA**.

.. image:: /static/animation_pipeline.gif
   :alt: animation_pipeline
   :width: 600px

Introduction
------------

This package provides robust pipelines for the segmentation and analysis of pulsar signals from frequency-time dispersion graphs. The pipeline can be divided into three sequential steps:

- **Segmentation**: Using a CNN model (preferably of the encoder-decoder type) to segment pulsar signals and RFIs from the dispersion graph.
- **Filtering** (optional): Using another CNN model (preferably of the encoder-decoder type) to filter out noisy segments.
- **Labelling**: Using legacy vision tools to label the segments by signal source type (Pulsar, NBRFI, BBRFI).

.. image:: /static/pulsarsadotpipeline.png
   :alt: pulsarsadotpipeline
   :width: 600px

:pulsarsa:`PulsarSA` is compatible with :pulsarsa:`PulsarDT`. The CNNs used in the pipeline can be trained on synthetic data generated by :pulsarsa:`PulsarDT`, using the modules provided in :pulsarsa:`PulsarSA`. The trained models can then be used in the pipeline to perform the segmentation and filtering tasks.

Algorithm
---------

**Step 1: Segmentation**

The first step is to segment the pulsar signals from the frequency-time dispersion graph using a CNN model (preferably of the encoder-decoder type). The model takes the dispersion graph as input and outputs a binary mask indicating the locations of pulsar signals.

.. image:: /static/pulsarsadotpipelinedotsegmentation.png
   :alt: pulsarsadotpipeline
   :width: 600px

The CNN is trained on synthetic data generated by :pulsarsa:`PulsarDT`. The training data consists of pairs of dispersion graphs and their corresponding binary masks. The binary masks are generated by marking the locations of pulsar signals in the dispersion graphs.

Example of synthetic training data:

.. image:: /static/training_synthetic_data.png
   :alt: training_synthetic_data
   :width: 600px

**Step 2: Filtering (Optional)**

The second step is to filter out noisy segments from the binary mask using another CNN model (preferably of the encoder-decoder type). This step is optional and can be skipped if the segmentation model is robust enough. The filtering model takes the binary mask as input (or a 2-channel input combining the mask with the original dispersion graph) and outputs a filtered binary mask.

.. image:: /static/pulsarsadotpipelinedotfiltration.png
   :alt: pulsarsadotpipelinedotfiltration
   :width: 600px

The filtering CNN is also trained on synthetic data generated by :pulsarsa:`PulsarDT`. The training data consists of pairs of binary masks (or 2-channel inputs) and their corresponding filtered binary masks. The filtered binary masks are generated by the trained CNN used in the segmentation step.

Example of synthetic training data for filtering:

.. image:: /static/training_synthetic_data_for_filtering.png
   :alt: training_synthetic_data_for_filtering
   :width: 600px

**Step 3: Labelling**

The final step is to label the segments in the filtered binary mask by signal source type (Pulsar, NBRFI, BBRFI) using legacy vision tools.

.. image:: /static/pulsarsadotpipelinedotlabelling.png
   :alt: pulsarsadotpipelinedotlabelling
   :width: 600px

The legacy image-processing tools used to identify pulsar signals are described below.

- **Connected Component Analysis (CCA)**: This tool identifies connected regions in the binary mask. Each connected region is treated as a potential signal source. Steps involved in CCA:

  1. **Skeletonization**: Reduces the connected regions to their skeletal form.
  2. **Unbranching**: Removes small branches from the skeleton to simplify the structure and make it resemble a line.
  3. **Filtering based on ellipse axis ratio** (optional): Filters out segments that do not meet an axis ratio threshold (=4), which helps to eliminate noise and irrelevant structures.
  4. **Filtering based on length**: Filters out segments that are shorter than a specified length threshold (=12), which helps to eliminate small noise artifacts.
  5. **Line fitting**: Fits a line to each segment using linear regression; the signal type is then determined as pulse or RFI based on the slope.

  .. image:: /static/cc_algorithm_steps.png
     :alt: skeletonization
     :width: 600px

- **Connected Component clustering**: This method uses the same steps as CCA up to and including filtering based on length. It then continues as follows:

  1. **Major Axis of CC**: For each segment (connected component), PCA is applied to extract the longest axis along which the points of the CC are distributed. For each point of the CC, the ``x_intercept`` and ``y_intercept`` of a line through that point with the same slope as the longest axis are calculated.
  2. **Filter vertical and horizontal traces**: Segments with slopes close to horizontal (near 0 degrees) or vertical (near 90 degrees) are removed to filter out likely RFI or noise artifacts. The remaining segments are considered potential traces of pulsar signals.
  3. **Cluster CC points**: The DBSCAN clustering algorithm is applied to the (``x_intercept``, ``y_intercept``) points of the remaining segments to group them into possible single-pulse traces.
  4. **Line Scan using Box window function**: For each cluster of segments, a line scan is performed for each line in the cluster using a box window function to identify the best-fitting line that represents the pulsar signal trace. A regularization method is used to align the signal trace away from the line to avoid overfitting. The SNR is calculated for each line scan, and the top-SNR lines are averaged to obtain the final SNR of the cluster. The line with the highest SNR is taken as the best-fit line for the cluster, representing the pulsar signal trace, provided that its SNR is above a certain threshold.

  .. image:: /static/cc_algorithm_cluster_steps.png
     :alt: cc_algorithm_cluster_steps
     :width: 600px

  Below are some examples of the line scan method from each cluster, used to calculate the SNR from the mean intensity profile:

  .. image:: /static/cc_algorithm_clusterlinescan_steps.png
     :alt: cc_algorithm_clusterlinescan_steps
     :width: 600px

  The SNR is calculated from the mean intensity profile as shown below:

  .. image:: /static/snr_method.png
     :alt: snr_method
     :width: 300px

- **DelayGraph method** (Beta): This tool analyses the temporal characteristics of the connected components. It helps to distinguish between pulsar signals and RFIs based on their time-frequency behaviour, with the help of a 1D CNN encoder model. Details of this method will be documented once Beta testing is finished.

The output of this step is a labelled mask in which each segment is assigned a label corresponding to its signal source type.

Installation
------------

You can install PulsarSA from the Git repository as follows:

.. code-block:: bash

   git clone https://gitlab.dzastro.de/punch/ml-ppa/pulsarsa.git
   cd pulsarsa
   python -m venv .venv
   source .venv/bin/activate
   pip install uv
   uv pip install .

Example Hello World
-------------------

Here is how to use :pulsarsa:`PulsarSA` in a simple script:

.. code-block:: python

   import numpy as np
   from pulsarsa import PipelineImageToFilterToCCtoLabels
   from pulsarsa.tools.example_neural_nets import ImageToMaskNetv3

   #: Instantiate the pipeline
   ppl2fensemble1 = PipelineImageToFilterToCCtoLabels(
       image_to_mask_network=ImageToMaskNetv3(),
       trained_image_to_mask_network_path='./docs/example_dats/trained_cnn_eg_test_set_1_at_alpha.pt',
       mask_filter_network=ImageToMaskNetv3(),
       trained_mask_filter_network_path='./docs/example_dats/trained_cnnfilter_eg_test_set_1_at_alpha.pt',
       snr_thresh=4,
       box_func_window=5,
       corr_thresh=2,
   )

   #: Load example data to test pulsar signal detection
   example_fake_real_pulsar_data = np.load(
       file='./docs/example_dats/example_fake_real_pulsar_data.npy', mmap_mode='r'
   )
   example_fake_real_pulsar_data_label = np.load(
       file='./docs/example_dats/example_fake_real_pulsar_data_label.npy', mmap_mode='r'
   )

   #: Run and detect pulsar signals
   ppl2fensemble1.test_on_real_data_from_npy_files(
       image_data_set=example_fake_real_pulsar_data,
       image_label_set=example_fake_real_pulsar_data_label,
       plot_randomly=False,
       batch_size=10,
       save_plot_path='./outputs/result.png'
   )

The following image shows the results of pulsar signal detection generated by the above code:

.. image:: /static/example_pipeline_results.png
   :alt: example_pipeline_results
   :width: 800px

More examples can be found on the examples page.
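
As a rough illustration of the "Major Axis of CC" and "Filter vertical and horizontal traces" steps described in the Algorithm section, the sketch below estimates the longest axis of a set of connected-component points and computes per-point intercepts. It is not part of the PulsarSA API: the function names, the 10 degree tolerance, and the convention that ``x`` is the time bin and ``y`` the frequency channel are all assumptions made for this example. A closed-form 2D PCA stands in for whatever PCA routine the package uses, and the DBSCAN clustering step is left out.

.. code-block:: python

   import math


   def major_axis_angle_deg(points):
       """Orientation of the longest (principal) axis of 2D points, in [0, 180) degrees."""
       n = len(points)
       mean_x = sum(x for x, _ in points) / n
       mean_y = sum(y for _, y in points) / n
       # Entries of the 2x2 covariance matrix of the point cloud.
       sxx = sum((x - mean_x) ** 2 for x, _ in points) / n
       syy = sum((y - mean_y) ** 2 for _, y in points) / n
       sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
       # Closed-form direction of the leading eigenvector (2D PCA).
       angle = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
       return math.degrees(angle) % 180.0


   def classify_orientation(points, tol_deg=10.0):
       """Return 'horizontal', 'vertical' or 'candidate'; tol_deg is a hypothetical tolerance."""
       angle = major_axis_angle_deg(points)
       if min(angle, 180.0 - angle) < tol_deg:
           return "horizontal"
       if abs(angle - 90.0) < tol_deg:
           return "vertical"
       return "candidate"


   def intercepts(points):
       """(x_intercept, y_intercept) of the line through each point with the major-axis slope.

       Only meaningful for 'candidate' components, whose slope is non-zero and finite.
       """
       slope = math.tan(math.radians(major_axis_angle_deg(points)))
       return [(x - y / slope, y - slope * x) for x, y in points]


   # Synthetic sloped component lying on y = 2x + 1.
   trace = [(x, 2 * x + 1) for x in range(12)]
   print(classify_orientation(trace))  # prints: candidate
   print(intercepts(trace)[0])         # approximately (-0.5, 1.0)

Points that belong to the same straight trace share nearly identical intercepts, which is why clustering in (``x_intercept``, ``y_intercept``) space can group fragments of one pulse.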