Analysis Workflow#

Motivation#

  • Partial wave analysis can be complicated (fitting, bookkeeping, result handling, …)

  • PyAmpTools sacrifices flexibility for ease of use and focuses on Python as the core language

  • Features:

    • Scientific Python ecosystem is massive (achieve most of the benefits of low level languages without the hassle)

    • Optimization uses gradients provided by JAX automatic differentiation (scales better than numerical diff.)

    • Hooks into several optimization frameworks spanning MLE, MCMC, to Variational Inference

      • iminuit, scipy, numpyro, iftpwa

    • YAML file is used to configure the analysis for consistency and automation

    • Fitting is generally done through the command line

Workflow Example#

The following analysis chain is an example of an input/output study. If you have data then you can skip you basically skip the first two steps.

  1. We draw a sample from the nifty prior distribution

  2. generate simulations (using gen_amp + halld_sim/Piecewise amplitude mimicking prior sample)

  3. bin the data

  4. fit the simulated data using:

    • Maximum Likelihood Estimation (MLE)

    • Markov Chain Monte Carlo (MCMC)

    • Information Field Theory (IFT)

  5. plot the results

We would like to show a high probability of reconstruction. If this occurs for a wide variety of samples (and we believe our data could be approximately described by the very-flexible nifty prior) then we have successfully demonstrated that inference would be robust.

alt text

import os
import subprocess
from pyamptools.utility.resultManager import ResultManager
from pyamptools.utility.resultManager import (
    plot_binned_intensities, 
    plot_binned_complex_plane, 
    plot_overview_across_bins, 
    plot_moments_across_bins, 
    plot_gen_curves,
    montage_and_gif_select_plots
)
from pdf2image import convert_from_path

def run_command(command):
    """Captures output so we do not have to dump everything to the terminal"""
    result = subprocess.run(command, capture_output=True, shell=True, text=True)
    return result.stdout, result.stderr

def show_pdf(pdf_path, dpi=150):
    """
    Show a PDF in the notebook cell, rasterized since vscode jupyter cannot render PDFs
    
    """
    img = convert_from_path(pdf_path, dpi=dpi)[0]
    display(img)

PYAMPTOOLS_HOME = os.getenv("PYAMPTOOLS_HOME")
BASE_DIRECTORY = "RESULTS_TEST"
Welcome to JupyROOT 6.28/06
print("Doing some pre-run cleanup...")
os.system('rm -rf main.yaml local_beam.conf *.lock')
os.system(f'rm -rf {BASE_DIRECTORY}')
Doing some pre-run cleanup...
0

Default YAML file#

First we create a copy of the default YAML file which contains comments that explain each key/knob in the analysis

main_yaml = "main.yaml"

print("Generating default YAML file...")
stdout, stderr = run_command(f"pa from_default -o {main_yaml} -t twops -c") # -c for starting clean slate

# Update required fields the user is supposed to update manually for testing
cwd = os.getcwd()
update_required_fields = [
    f"sed -i 's|_BASE_DIRECTORY_|{cwd}/{BASE_DIRECTORY}|g' {main_yaml}",
    f"sed -i 's|_DATA_SOURCES_|DATA_SOURCES|g' {main_yaml}",
]
for cmd in update_required_fields:
    os.system(cmd)
print("Updated required fields in YAML file...")
Generating default YAML file...
Updated required fields in YAML file...

Processing the event sample#

Have Data?#

If you already have a data sample: You obviously will not need to draw a sample from the prior. You will need to update the YAML file to properly use your data sample. Most of the keys you need to modify are top level keys. These keys specify your dataset, reaction, and kinematic binning. Once that is done, you can run the following commands and you will be ready to perform fits. Depending on your dataset size and model, the default settings for the optimization schemes may not be appropriate.

# Update <main_yaml> to use your data sample, reaction, kinematic binning, etc.
# In your shell, run the following commands
pa run_cfgGen <main_yaml>
pa run_divideData <main_yaml>
pa run_processEvents <main_yaml>

Drawing Simulations#

Draw a sample from the iftpwa prior model then simulate data from the model and phase space MC. Then divide the data into kinematic bins.

seed = 42 # rng seed for reproducibility
t_slope = 3.0 # slope of the momentum transfer (exponentially distributed)
min_ebeam = 8.2 # minimum energy of the photon beam
max_ebeam = 8.8 # maximum energy of the photon beam
n_data = 10000 # number of data samples to generate
n_ps = 50000   # number of phase space samples to generate

# -c for starting clean
print("Running prior simulation...")
stdout, stderr = run_command(f"pa run_priorSim {main_yaml} -s {seed} -emin {min_ebeam} -emax {max_ebeam} -t {t_slope} -nd {n_data} -np {n_ps} -c")
Running prior simulation...
print("Prior simulation complete, drawing results...")
resultManager = ResultManager(main_yaml, silence=True)
resultManager.attempt_load_all()
plot_gen_curves(resultManager, file_type='pdf')
show_pdf(f"{BASE_DIRECTORY}/PLOTS/gen_curves.pdf")
del resultManager
Prior simulation complete, drawing results...
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
../_images/425ede03522752efe040cd7be415c887073921ef31f99713ec7ea8d5263613db.png

Run the MLE fits using iminuit#

We begin by running a set of maximum likelihood fits on the generated data and compare the results to the generated amplitudes. The specifications for this optimization are specified in the YAML file under the mle key.

The ResultManager class, when initialized with the path to the YAML file, will automatically attempt to load all results into memory. Each result (from MLE, MCMC, IFT) is stored a separate directory for unique access. All results (complex amplitude values, intensities, and metadata like mass, likelihood, etc.) are pulled into a flat pandas dataframe for efficient plotting and analysis.

The main “money” plot (intensity and relative phases of all partial waves) can be generated with the default plotting function, plot_overview_across_bins. The MLE results a slightly jittered to avoid too much overlap. The dashed lines are the generated amplitudes

print("Running MLE fits...")
stdout, stderr = run_command(f"pa run_mle {main_yaml}")
Running MLE fits...
print("MLE fits complete, plotting results...")
resultManager = ResultManager(main_yaml, silence=True) # Silence all the helpful output for clean documentation
resultManager.attempt_load_all()
plot_overview_across_bins(resultManager, file_type='pdf')
montage_and_gif_select_plots(resultManager, file_type='pdf')
show_pdf(f"{BASE_DIRECTORY}/PLOTS/intensity_and_phases/montage_output.pdf")
del resultManager
MLE fits complete, plotting results...
=================================================================

=================================================================
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
../_images/b72a62da08f0137afc4b65e09791ef31d4a017c9ee95247d41b0780084e30696.png
montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.

Run MCMC fits using NUTS algorithm#

MCMC is a set of techniques that draws samples from the posterior distribution. This framework uses Hamiltonian Monte Carlo (HMC) implemented in the No U-Turn Sampler (NUTS) algorithm. For this study, we choose 8 independent chains (each starting off in a different region of parameter space, traverses and draws samples from the posterior). n_warmup designates the number of samples to discard as the chains move toward a posterior mode and is used to adapt internal NUTS parameters. n_samples is the number of samples to draw for each chain. n_chains is drawn for each mass bin independently and the work is pooled across all n_processes processes

The newly created MCMC results is automatically loaded by the ResultManager class and the same plotting calls are made

# We can also explicitly overwrite the options in the YAML file using these command line args
n_processes = 17
n_chains = 8
n_warmup = 500
n_samples = 100
print("Running MCMC fits...")
stdout, stderr = run_command(f"pa run_mcmc {main_yaml} -np {n_processes} -nc {n_chains} -nw {n_warmup} -ns {n_samples}")
Running MCMC fits...
print("MCMC fits complete, plotting results...")
resultManager = ResultManager(main_yaml, silence=True)
resultManager.attempt_load_all()
plot_overview_across_bins(resultManager, file_type='pdf')
montage_and_gif_select_plots(resultManager, file_type='pdf')
show_pdf(f"{BASE_DIRECTORY}/PLOTS/intensity_and_phases/montage_output.pdf")
del resultManager
MCMC fits complete, plotting results...
=================================================================

=================================================================
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
../_images/1b53c5c35ed65e2a20aecf50c106addb62c5faf7c50c1551ad7ab0decd9b246c.png
montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.

Run iftpwa fit#

Finally we run a IFT analysis with the same prior model used to generate the synthetic data to obtain a posterior estimate. These are shown in the solid lines. The settings for the IFT model are specified in the YAML file under the nifty key.

print("Running IFT fit...")
stdout, stderr = run_command(f"pa run_ift {main_yaml}")
Running IFT fit...
print("IFT fit complete, plotting results...")
resultManager = ResultManager(main_yaml, silence=True)
resultManager.attempt_load_all()
plot_overview_across_bins(resultManager, file_type='pdf')
montage_and_gif_select_plots(resultManager, file_type='pdf')
show_pdf(f"{BASE_DIRECTORY}/PLOTS/intensity_and_phases/montage_output.pdf")
del resultManager
IFT fit complete, plotting results...
=================================================================

=================================================================
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
../_images/7983a7066b280d2a3626e5192718427f1047279e0ae81541d6f0e5f2cfa69e4d.png
montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.

Additional Plots#

This sections contains a few additional default plots that highlight different aspects of the analysis:

  1. Plot projected moments

  2. Plot inferred amplitudes in the complex plane

  3. Plot intensity distributions

We first need to create the plots which also requires projecting the amplitudes onto the moment basis.

print("Plotting additional results...")
resultManager = ResultManager(main_yaml, silence=True)
resultManager.attempt_load_all()
resultManager.attempt_project_moments(normalization_scheme=0, pool_size=4, batch_size=200)
plot_binned_complex_plane(resultManager, file_type='pdf', silence=True)
plot_binned_intensities(resultManager, file_type='pdf', silence=True)
plot_moments_across_bins(resultManager, file_type='pdf')
montage_and_gif_select_plots(resultManager, file_type='pdf')
del resultManager
Plotting additional results...
=================================================================

=================================================================
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
/d/grid17/ln16/miniconda/envs/pyamptools/lib/python3.9/site-packages/nifty8/library/correlated_fields.py:460: UserWarning: unable to add JAX operator for total_N=8
  warn(f"unable to add JAX operator for total_N={total_N}")
MomentManagerTwoPS| Calculating moments assuming a TwoPseudoscalar system with max J = 4
MomentManagerTwoPS| Calculating moments assuming a TwoPseudoscalar system with max J = 4
MomentManagerTwoPS| Calculating moments assuming a TwoPseudoscalar system with max J = 4
MomentManagerTwoPS| Calculating moments assuming a TwoPseudoscalar system with max J = 4
montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.
WARNING: The convert command is deprecated in IMv7, use "magick" instead of "convert" or "magick convert"

montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.
WARNING: The convert command is deprecated in IMv7, use "magick" instead of "convert" or "magick convert"

montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.
montage: unable to read font `' @ error/annotate.c/RenderFreetype/1658.

Plot projected moments#

We can project the inferred amplitudes into moments to understand whether the scatter in the amplitude basis collapses onto a tighter space in the moment basis. This allows one to answer questions related to ambiguities and stability of the inference in the amplitude basis.

Here, we only plot the non-zero moments (given the set of inferred partial waves). The moments are normalized to the zeroth moment. The MLE point estimates are only shown (errors not propagated)

show_pdf(f"{BASE_DIRECTORY}/PLOTS/moments/montage_output.pdf")
../_images/b3be7ca74d0d1641460d4b684dc698fcb3140c367380cc254da1e8a4836a6ef7.png

Plots of the complex plane:#

The inferred amplitudes live in the complex plane. We can scatter the inferred amplitudes and compare the results from the different techniques to see how they compare. A comparison is only shown for one bin. The reference waves are shown as histograms since they have no imaginary component. The color of the scattered points in the non-reference waves is mapped to the intensity of the corresponding reference wave. In this example, the MLE and MCMC results agree and that the MLE errors track the MCMC samples. The band of MCMC solutions is from the smaller reference wave intensity not locking in a well defined phase

show_pdf(f"{BASE_DIRECTORY}/PLOTS/complex_plane/bin3_complex_plane.pdf")
../_images/455cc90135f1be6ed0d5a05712a555f607eed1b77ce6a446b773fc15bf673520.png

Plots of the intensity distributions:#

The overview plot (intensity and relative phases as a function of mass) gives a qualitative description of the results. This plot shows the intensity distribution of the results from the various techniques in a single bin. The \(D_2^-\) wave in this bin is small. The estimated MLE uncertainties are bad in this case and covers negative values. The MCMC results shows a highly skewed distribution.

show_pdf(f"{BASE_DIRECTORY}/PLOTS/intensity/bin1_intensities.pdf")
../_images/cea5a00f7a5a163c8a7de801681d5fe839218d571355707a063999926be27970.png

Cleanup#

print("Cleaning up...")
os.system('rm -rf main.yaml local_beam.conf *.lock')
os.system(f'rm -r {BASE_DIRECTORY}')
Cleaning up...
0