Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DeerLab models incur heavy overhead while fitting #100

Closed
luisfabib opened this issue Mar 18, 2021 · 1 comment · Fixed by #101
Closed

DeerLab models incur heavy overhead while fitting #100

luisfabib opened this issue Mar 18, 2021 · 1 comment · Fixed by #101
Assignees
Milestone

Comments

@luisfabib
Copy link
Collaborator

luisfabib commented Mar 18, 2021

While running tests on fitsignal I noticed a slowdown in performance.

=================================================== test session starts 
platform win32 -- Python 3.8.0, pytest-6.2.2, py-1.10.0, pluggy-0.13.1
rootdir: D:\lufa\projects\DeerLab\DeerLab
collected 44 items                                                                                                                                                                                                                                                                                                  

test\test_fitsignal.py ............................................                                                                                                                                                                                                                                          [100%] 

==================================================slowest 10 durations 
24.30s call     test/test_fitsignal.py::test_ridme5
11.39s call     test/test_fitsignal.py::test_cost_value
10.31s call     test/test_fitsignal.py::test_ovl4pdeer
9.51s call     test/test_fitsignal.py::test_ridme3
9.00s call     test/test_fitsignal.py::test_5pdeer
6.87s call     test/test_fitsignal.py::test_7pdeer
6.12s call     test/test_fitsignal.py::test_global_4pdeer
4.63s call     test/test_fitsignal.py::test_global_full_parametric
4.31s call     test/test_fitsignal.py::test_global_scale_4pdeer
3.83s call     test/test_fitsignal.py::test_boundaries_adjust_ex

A simple profiling of fitsignal

import deerlab as dl 
import numpy as np 
import cProfile

@profile(sort_by='cumulative', lines_to_print=20, strip_dirs=True)
def fitfcn():
    t = np.linspace(0,5,300)
    r = np.linspace(2,6,250)
    P = dl.dd_gauss(r,[4.5, 0.25])
    Bmodel = lambda t: dl.bg_exp(t,0.4)
    K = dl.dipolarkernel(t,r,0.4,Bmodel)
    V = K@P
    fit = dl.fitsignal(V,t,r,'P',dl.bg_exp,dl.ex_4pdeer,uqanalysis=False)
fitfcn()

revealed a very slow performance for such a simple fit.


import deerlab as dl ...
         15067173 function calls (15039888 primitive calls) in 9.770 seconds

   Ordered by: cumulative time
   List reduced from 625 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    9.777    9.777 <ipython-input-7-7f1b715129a0>:5(fitfcn)
        1    0.000    0.000    9.635    9.635 fitsignal.py:16(fitsignal)
        1    0.000    0.000    9.605    9.605 fitsignal.py:582(separable_nonlinear_lsq_analysis)
        1    0.000    0.000    9.488    9.488 snlls.py:18(snlls)
        1    0.000    0.000    9.228    9.228 least_squares.py:240(least_squares)
       43    0.009    0.000    9.204    0.214 snlls.py:287(ResidualsFcn)
        1    0.000    0.000    8.507    8.507 trf.py:112(trf)
        1    0.001    0.001    8.507    8.507 trf.py:205(trf_bounds)
       46    0.001    0.000    6.206    0.135 fitsignal.py:320(multiPathwayModel)
       45    0.000    0.000    6.091    0.135 utils.py:59(<lambda>)
       45    0.000    0.000    6.090    0.135 fitsignal.py:593(<lambda>)
      188    0.000    0.000    6.023    0.032 inspect.py:1512(stack)
      188    0.019    0.000    6.023    0.032 inspect.py:1484(getouterframes)
      188    0.005    0.000    5.999    0.032 bg_models.py:314(bg_exp)    <-------------
    10063    0.054    0.000    5.995    0.001 inspect.py:1445(getframeinfo)
      187    0.003    0.000    5.993    0.032 bg_models.py:12(_parsargs)    <-------------
       93    0.003    0.000    5.983    0.064 dipolarbackground.py:10(dipolarbackground)
      186    0.000    0.000    5.970    0.032 dipolarbackground.py:79(<lambda>)
      184    0.000    0.000    5.882    0.032 fitsignal.py:339(<lambda>)
       14    0.001    0.000    4.755    0.340 least_squares.py:882(jac_wrapped)

While there are other factors that contribute to the slower performance, bg_models.py is strongly unexpected. These functions are just called to perform a simple mathematical operation and return a vector.

After closer inspection, I performed a different benchmark to see if there was a strong overhead for calling models in bg_models.py

@profile(sort_by='cumulative', lines_to_print=5, strip_dirs=True)
def Bmodel():
    for _ in range(500):
        t = np.linspace(0,5,100)
        B = dl.bg_exp(t,0.4)
Bmodel()

@profile(sort_by='cumulative', lines_to_print=5, strip_dirs=True)
def Bmath():
    for _ in range(500):
        t = np.linspace(0,5,100)
        B = np.exp(-t*0.4)
Bmath() 

which reveals a clear overhead for calling these models

       36717502 function calls (36716002 primitive calls) in 14.316 seconds

   Ordered by: cumulative time
   List reduced from 83 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.003    0.003   14.325   14.325 <ipython-input-8-2bded830471e>:1(Bmodel)
      500    0.012    0.000   14.278    0.029 bg_models.py:314(bg_exp)
      500    0.007    0.000   14.264    0.029 bg_models.py:12(_parsargs)
      500    0.001    0.000   14.244    0.028 inspect.py:1512(stack)
      500    0.036    0.000   14.243    0.028 inspect.py:1484(getouterframes)

---------------------------------------------------------------------------------------

         16002 function calls (14502 primitive calls) in 0.022 seconds

   Ordered by: cumulative time
   List reduced from 29 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002    0.022    0.022 <ipython-input-8-2bded830471e>:7(Bmath)
      500    0.000    0.000    0.020    0.000 <__array_function__ internals>:2(linspace)
 2000/500    0.003    0.000    0.020    0.000 {built-in method numpy.core._multiarray_umath.implement_array_function}
      500    0.007    0.000    0.019    0.000 function_base.py:23(linspace)
      500    0.000    0.000    0.004    0.000 <__array_function__ internals>:2(any)

I can reproduce these results for both dd_models.py and ex_models.py.

While this overhead is negligible for most operations within DeerLab, fitsignal relies heavily on calling these models each time a residual vector is computed during optimization. This propagates strongly into the performance of the fit as seen in the profile above.

@luisfabib luisfabib self-assigned this Mar 18, 2021
@luisfabib
Copy link
Collaborator Author

My current approach has been to implement the model functions is such a way, that they return a function handle model containing the basic math as part of the info dictionary, for example

info = dict(
    Parameters = ['Intercept','1st-order coefficient','2nd-order coefficient','3rd-order coefficient'],
    Units = ['','μs⁻¹','μs⁻²','μs⁻³'],
    Start = np.asarray([ 1,  -1  , -1,   -1  ]),
    Lower = np.asarray([ 0,  -200, -200, -200]),
    Upper = np.asarray([200,  200,  200,  200]),
    ModelFcn = model
     )
    return info

This way, the basic function containing the math is requested once from the model in fitsignal and then can be directly called without the need to access one of the modules bgmodels.py, ex_models.py, or dd_models.py and the corresponding model inside there, avoiding the call overhead entirely.

Taking the same benchmark example as before:

@profile(sort_by='cumulative', lines_to_print=20, strip_dirs=True)
def fitfcn():
    t = np.linspace(0,5,300)
    r = np.linspace(2,6,250)
    P = dl.dd_gauss(r,[4.5, 0.25])
    Bmodel = lambda t: dl.bg_exp(t,0.4)
    K = dl.dipolarkernel(t,r,0.4,Bmodel)
    V = K@P
    fit = dl.fitsignal(V,t,r,'P',dl.bg_exp,dl.ex_4pdeer,uqanalysis=False)
fitfcn()

Before:

15067173 function calls (15039888 primitive calls) in 9.770 seconds

Proposed change:

1509554 function calls (1470428 primitive calls) in 4.565 seconds

So this change would already reduce the runtime of fitsignal by half.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant