Examples
========

This page shows example code snippets demonstrating how to use :pulsarsa:`PulsarSA`.

Configure workspace
-------------------

The Workfolder for running the :pulsarsa:`PulsarSA` have the following structure to use the in-built methods:

.. code-block:: bash

  work_folder/
  ├── payloads/ # folder for PulsarDT synthetic files
  ├── runtime/ # folder for PulsarDT simulation parameters log files
  ├── outputs/ # folder for generated results, outputs from the pipeline
  ├── models/ # folder for Pre-trained or trained model files
  ├── config_pipe.yml # Configuration file for pipeline setup
  └── pipe_neural_network_models.py # neural network models definations

Details about the configuration file is provided below.

.. toctree::
   :maxdepth: 2

   config_pipe

pipe_neural_network_models.py contains the neural network model definitions used in the pipeline. You can define your own models or use the pre-defined model UNet provided in :pulsarsa:`PulsarSA`. 


1. Use the pipeline to detect pulsar signal from a single image

.. code-block:: python

   import numpy as np
   from pulsarsa import PipelineImageToFilterToCCtoLabels
   from pulsarsa.tools.example_neural_nets import ImageToMaskNetv3
   #: Instantiate the pipeline
   ppl2fensemble1 = PipelineImageToFilterToCCtoLabels(image_to_mask_network=ImageToMaskNetv3(),
                              trained_image_to_mask_network_path='./docs/example_dats/trained_cnn_eg_test_set_1_at_alpha.pt',
                              mask_filter_network=ImageToMaskNetv3(),
                              trained_mask_filter_network_path='./docs/example_dats/trained_cnnfilter_eg_test_set_1_at_alpha.pt',
                              snr_thresh=4,
                              box_func_window=5,
                              corr_thresh=2,
                              )

   #: Load example data to test pulsar signal detection
   example_fake_real_pulsar_data = np.load(file='./docs/example_dats/example_fake_real_pulsar_data.npy',mmap_mode='r')   
   image = example_fake_real_pulsar_data[9,:,:]
   results = ppl2fensemble1(image=image)
   ax = ppl2fensemble1.plot(image=image)
   
   fig = ax[0].get_figure() 
   fig.savefig("./outputs/pipeline_output.png", dpi=150, bbox_inches="tight")
   print(results)

output: 

.. code-block:: bash

   {'Pulsar': np.float64(77.0), 'Noise (NBRFI)': 0, 'Noise (BBRFI)': 0, 'Noise (None)': 0}
   
And the saved image output is: 

.. image:: /static/pipeline_output.png
   :alt: pipeline_output
   :width: 400px


2. Run pipeline :pulsarsa:`PulsarSA` 

   - Synthetic Data Generation:

      The :pulsarsa:`PulsarDT` simulation parameters are defined in the config_pipe.yml file. Below is the code snippets

      .. code-block:: python

         from pulsarsa.tools.pipe_methods.data_gen_methods import generate_data_payloads_for_training,run_sample_plotter
         generate_data_payloads_for_training(path_to_work_folder='./')
         run_sample_plotter(path_to_work_folder='./')
      
      or, 

      .. code-block:: python

         from pulsarsa.tools.pipe_methods import run_pipeline
         run_pipeline(path_to_work_folder='./',
                        need_data_gen=True,
                        need_training=False,
                        need_validation=False)
   
   - Training:

      The PulsarSA training parameters are defined in the config_pipe.yml file. Below is the code snippets

      .. code-block:: python

         from pulsarsa.tools.pipe_methods.training_methods import train_pipeline_neural_nets
         train_pipeline_neural_nets(path_to_work_folder='./', skip_filter_training = False)
      
      or,

      .. code-block:: python

         from pulsarsa.tools.pipe_methods import run_pipeline
         run_pipeline(path_to_work_folder='./',
                        need_data_gen=False,
                        need_training=True,
                        need_validation=False)
   
   - Validation:

      After training a pipeline instance can be loaded for pulse detection by the following script based on the congig_pipe.yml file:

      .. code-block:: python

         from pulsarsa import PipelineImageToFilterToCCtoLabels
         from pulsarsa.tools.pipe_methods.validation_methods import load_pipeline
         pipeline_instance:PipelineImageToFilterToCCtoLabels = load_pipeline(path_to_work_folder='./') # Note for now only PipelineImageToFilterToCCtoLabels is supported
                                                                                                       # for the following methods in this example

      The :pulsarsa:`PulsarSA` validation parameters are defined in the config_pipe.yml file. Entries related to fil files are 

      .. code-block:: yaml

         validation:
            mlflow_folder: ./mlflows_validation/
            num_samples: 10
            path_to_metadata_file: ../data/metadata_file.csv
            path_to_real_filterbank_file: ../data/filterbank_file.fil                       
            random_seed: 
            snr_band:
            - 10
            - 100
            resize_input: [128,128]
            time_window: 512
            position_column: position
            allow_randomness: 0


      Entries related to npy files are: 

      .. code-block:: yaml

         validation:
            mlflow_folder: ./mlflows_validation/
            num_samples: 10            
            path_to_metadata_npy_file:    ../docs/example_dats/example_fake_real_pulsar_data_label.npy
            path_to_real_npy_file: ../docs/example_dats/example_fake_real_pulsar_data.npy            
            random_seed:           
            resize_input: [128,128]
            allow_randomness: 0

      Below is the code snippets
      
      .. code-block:: python

         from pulsarsa.tools.pipe_methods.validation_methods import validate_pipeline
         validate_pipeline(path_to_work_folder='./',file_type='npy') # In future versions fil mode will be preffered
      
      or, 

      .. code-block:: python

         from pulsarsa.tools.pipe_methods import run_pipeline
         run_pipeline(path_to_work_folder='./',
                        need_data_gen=False,
                        need_training=False,
                        need_validation=True
                        val_file_type='npy')
   
      Running the above code prints the performance metrics in terminal as shown below:

      .. code-block:: bash

         F1 Score: 1.0, Accuracy: 1.0, Precision: 1.0, Recall: 1.0

      Running the above script also saves 5 plots of pulse detections and non-pulse detections as shown below.

      Running the above script also saves 5 plots of pulse and non-pulse detections shown below.

      .. list-table::
         :widths: 50 50
         :align: center

         * - **Pulse Detections**
           - **Non-Pulse Detections**
         * - 
            .. image:: /static/pulsar_examples.png
               :alt: pulsar_examples
               :scale: 50
           - 
            .. image:: /static/nonpulsar_examples.png
               :alt: nonpulsar_examples
               :scale: 50

           
      Also it activates the mlflow logging in a folder defined in config_pipe.yml for validation. Therefore multiple runs can 
      be visualized using mlflow for detailed information.
   
   For running the pipeline as a whole all keys related to each subprocess of the pipeline can be set to True like below

   .. code-block:: python

      from pulsarsa.tools.pipe_methods import run_pipeline
      run_pipeline(path_to_work_folder='./',
                     need_data_gen=True,
                     need_training=True,
                     need_validation=True)

   - Fine Tuning:

      Fine tuning of pipeline parameters like 'min_axis_ratio', 'min_cc_size_threshold', 'box_func_window' etc can be tuned for the
      best performance scores. This can be acheived by defining a dict called param_grid where possible values of such parameters can be
      defined and the function run_pipeline_estimator_on_paramgrid will return the parameter mix for the best performance scores.

      Below is the code snippet:

      .. code-block:: python

         from pulsarsa.tools.pipe_methods.parameter_tuning import run_pipeline_estimator_on_paramgrid
         param_grid = {'min_axis_ratio':[4,5],
              'min_cc_size_threshold':[10,11,12],
              'box_func_window':[3,4,5]}
         best_params = run_pipeline_estimator_on_paramgrid(path_to_work_folder='./',param_grid=param_grid,file_type='npy')
         print(best_params)

      this will return:

      .. code-block:: bash

         {'best_F_score_param': {'min_axis_ratio': 4, 'min_cc_size_threshold': 10, 'box_func_window': 3, 'score': 1.0},
          'best_accuracy_param': {'min_axis_ratio': 4, 'min_cc_size_threshold': 10, 'box_func_window': 3, 'score': 1.0},
           'best_precision_param': {'min_axis_ratio': 4, 'min_cc_size_threshold': 10, 'box_func_window': 3, 'score': 1.0},
            'best_recall_param': {'min_axis_ratio': 4, 'min_cc_size_threshold': 10, 'box_func_window': 3, 'score': 1.0}}
      
      Fine tuning is also possible, incase we have pipelines trained on different types of simulation datasets. The idea is use
      the Principal component analysis (PCA) to find the principal components (PCs) to later mix the pipelines (basically the NN models 
      used in the pipelines) to come up with the best of all pipelines. This can be acheived with the following code below

      Below is the code snippet:

      .. code-block:: python

         from pulsarsa.tools.pipe_methods.validation_methods import load_pipeline
         from pulsarsa.tools.pipe_methods.parameter_tuning import run_pipeline_estimator_on_pca_componentgrid

         pipeline0 = load_pipeline('./ensemble0/')
         pipeline1 = load_pipeline('./ensemble1/')         

         best_params = run_pipeline_estimator_on_pca_componentgrid(pipelines=[pipeline0,pipeline1],
                                                                   PCA_grid_resolution=0.1,
                                                                   variance_to_capture=1,
                                                                   path_to_work_folder='./',
                                                                   file_type='npy')
         
         print(best_params)

      this will return:

      .. code-block:: bash

         {'best_F_score_param': {'PC0': np.float64(-0.9), 'score': 1.0}, 
         'best_accuracy_param': {'PC0': np.float64(-0.9), 'score': 1.0}, 
         'best_precision_param': {'PC0': np.float64(-1.0), 'score': 1.0}, 
         'best_recall_param': {'PC0': np.float64(-0.9), 'score': 1.0}}


Subsequent examples are focused more on the applications of the core :pulsarsa:`PulsarSA` and :pulsarsa:`PulsarDT` function and methods. 

2. Generate data for training

.. code-block:: python

   from pulsardt import generate_example_payloads_for_training
   generate_example_payloads_for_training(tag='train_set_1_',
                                        num_payloads=1000,
                                        freq_channels = np.arange(0.42, .85, 0.0025),
                                        rot_phases= (0, 360*1, 1),
                                        antenna_sensitivity=0.5,
                                        plot_a_example=False,
                                        param_folder='./syn_data/runtime/',
                                        payload_folder='./syn_data/payloads/',
                                        num_cpus=10, #: choose based on the number of nodes/cores in your system,
                                        prob_bbrfi=0.2,
                                        prob_nbrfi=0.2,                                        
                                        )


3. Train a neural network

.. code-block:: python

   import numpy as np
   from pulsarsa import PrepareFreqTimeImage, ImageToMaskDataset, TrainImageToMaskNetworkModel
   from pulsarsa.tools.example_neural_nets import ImageToMaskNetv3
   from pulsarsa.neural_network_models import  WeightedBCELoss
   
   #: Instantiate the trainer
   image2mask_network_trainer = TrainImageToMaskNetworkModel(
                                model=ImageToMaskNetv3(),
                                num_epochs=20,
                                store_trained_model_at="./syn_data/model/trained_net_test.pt",
                                loss_criterion = WeightedBCELoss(pos_weight=3,neg_weight=1)                                
                                )
   #: Define datasets and dataloaders
   image_loader = PrepareFreqTimeImage(
                                       do_rot_phase_avg=True,
                                       do_binarize=False,
                                       do_resize=True,
                                       resize_size=(128,128),
                                       )
   mask_loader = PrepareFreqTimeImage(
                                    do_rot_phase_avg=True,
                                    do_binarize=True,
                                    do_resize=True,
                                    resize_size=(128,128),                                    
                                    )
   image_tag='train_set_1_*_payload_detected.json'
   image_directory='./syn_data/payloads/'
   mask_tag = 'train_set_1_*_payload_flux.json'
   mask_directory='./syn_data/payloads/'

   image_mask_dataset = ImageToMaskDataset(
                        image_tag = image_tag,
                        mask_tag= mask_tag,
                        image_directory = image_directory,
                        mask_directory = mask_directory,
                        image_engine=image_loader,
                        mask_engine=mask_loader
                        )
   #: Start training
   image2mask_network_trainer(image_mask_pairset=image_mask_dataset)

   #: Plot prediction
   idx = int(np.random.rand()*1000)
   image = image_mask_dataset[idx][0]
   image2mask_network_trainer.test_model(image=image,plot_pred=True)


.. image:: /static/example_prediction_after_training.png
   :alt: example_prediction_after_training
   :width: 800px


4. Use Tuner to manually tune mixed pipelines (Note: This part needs to be run in notebook)


   - Node 1 of the notebook should contain the following code to instantiate the tuner:

.. code-block:: python

   from pulsarsa import PipelineImageToFilterToCCtoLabels
   from pulsarsa.tools.example_neural_nets import ImageToMaskNetv3

   #: Instantiate 4 pipelines trained on different dataset ensembles
   ppl2fensemble0 = PipelineImageToFilterToCCtoLabels(image_to_mask_network=ImageToMaskNetv3(),
                                trained_image_to_mask_network_path='./docs/example_dats/trained_cnn_eg_test_set_0_at_alpha.pt',
                                mask_filter_network=ImageToMaskNetv3(),
                                trained_mask_filter_network_path='./docs/example_dats/trained_cnnfilter_eg_test_set_0_at_alpha.pt',                                
                                )
   
   ppl2fensemble1 = PipelineImageToFilterToCCtoLabels(image_to_mask_network=ImageToMaskNetv3(),
                                trained_image_to_mask_network_path='./docs/example_dats/trained_cnn_eg_test_set_1_at_alpha.pt',
                                mask_filter_network=ImageToMaskNetv3(),
                                trained_mask_filter_network_path='./docs/example_dats/trained_cnnfilter_eg_test_set_1_at_alpha.pt',                                
                                )   
   
   #: Instantiate the tuner
   tn = Tuner(sample_of_objects=[ppl2fensemble0,ppl2fensemble1],reset_components=False,variance_to_capture=2,all_sliders=True)

   To get the slider functionality run it in a jupyter notebook with all_sliders=True

.. image:: /static/sliders_params.png
   :alt: sliders_params
   :width: 200px    

- The mixed neural nets can be saved in the folder as shown below in the next node block: 

.. code-block:: python

   ppl2fmixed = tn(folder_to_save='./syn_data/model/') 


in script mixed pipeline can be created by passing the normalized PCA factors (having range (-1.0 to 1.0)) as:

.. code-block:: python

   ppl2fmixed  = tuner.generate_mixed_model_from_input_pca_factors([-0.9,0])