\( \def\bold#1{{\bf #1}} \newcommand{\Dt}{\Delta t_{d}} \newcommand{\Fo}{F_{1}} \newcommand{\Tg}{T_{g}} \newcommand{\tTg}{t/\Tg} \newcommand{\fod}{f_{1d}} \newcommand{\ftd}{f_{2d}} \newcommand{\tx}[1]{t_{\mathrm{#1}}} \newcommand{\Rz}[1]{R_{0 #1}} \newcommand{\Nz}[1]{N_{0 #1}} \newcommand{\Nzb}[1]{{\overline{N}}_{0 #1}} \newcommand{\Gt}[1]{G_{#1}(t)} \newcommand{\Nt}[1]{N_{#1}(t)} \newcommand{\Gof}[1]{G(#1)} \newcommand{\Nof}[1]{N(#1)} \newcommand{\dN}{\Delta N} \newcommand{\dNofs}{\Delta N(\Delta t=7)} \newcommand{\lnRz}[1]{\ln(\Rz{#1})} \newcommand{\dkdRz}[1]{\frac{\partial k}{\partial\Rz{#1}}} \newcommand{\dDtdRz}[1]{\frac{\partial\Dt}{\partial\Rz{#1}}} \newcommand{\dFodRz}[1]{\frac{\partial\Fo}{\partial\Rz{#1}}} \newcommand{\dNdRz}[1]{\frac{\partial \Nt{}}{\partial\Rz{#1}}} \newcommand{\dNdNz}[1]{\frac{\partial \Nt{}}{\partial\Nz{#1}}} \newcommand{\Unc}[1]{\sigma(#1)} \newcommand{\Var}[1]{\sigma^2(#1)} \newcommand{\Rho}[2]{\rho(#1,#2)} \newcommand{\Jac}[2]{\left(\begin{array}{c} #1\\ #2 \end{array}\,\, \right)} \newcommand{\Cov}[3]{\left(\begin{array}{cc} #1 & #3\\ #3 & #2 \end{array}\,\,\right)} \)

Performance of the algorithms on known inputs

Latest update 18.06.2020.

Abstract: This page collects information on the expected performance and limitations of the algorithms applied to real data. This information is obtained by analysing simulated input data whose underlying parameters are known. Three examples are shown. Starting from an ideal case, in which the number of registered cases exactly follows the underlying distribution, the input data are deteriorated by smearing the generated number of cases per day and by delaying the registration of part of the cases by one or two days.

The mean time delay between infection and registration, averaged over all cases per day, is taken to be a fixed number of days and is therefore not simulated. Taking it into account would shift the horizontal axis by a fixed amount; the shape of the distributions would not change. The uncertainty in this time offset, however, is simulated by delaying the registration of part of the cases.


The general setup

The simulation is based on two quantities commonly used in population biology, $\Rz{}$ and $\Tg$. The net reproduction rate $\Rz{}$ denotes the mean number of offspring, here the number of people infected by a single person, and $\Tg$ is the generation time, i.e. the time interval between two consecutive generations. The data simulate the evolution over seventy days, starting from $t=0$ on 01.01, for a population of 50M people. Seven periods X, called A to G, are used. For each period the same functional form is used, albeit with different values of $\Nz{X}$ and $\Rz{X}$. For period A this is: $$\Nt{A} = \Nz{A} \cdot \frac{1 - \Rz{A}^{(\tTg+1)}}{1 - \Rz{A}}$$ with the starting value $N_{A}(0)=\Nz{A}=10$. This function is valid for the time interval $t = \left[0 - 7\right]$. Period B then starts with a net reproduction rate $\Rz{B}$, leading to $$\Nt{B} = \Nz{B} \cdot \frac{1 - \Rz{B}^{(\tTg+1)}}{1 - \Rz{B}}$$ valid for $t = \left[8 - 13\right]$. To obtain a continuous function across the switch from period A to B, $\Nz{B}$ is chosen such that $N_{B}(t=7)=N_{A}(t=7)$, i.e. $$\Nz{B} = \Nz{A} \cdot \frac{1 - \Rz{A}^{(7/\Tg+1)}}{1 - \Rz{A}} \cdot \frac{1 - \Rz{B}}{1 - \Rz{B}^{(7/\Tg+1)}}$$ The analogous continuity relations, evaluated at the last day of the preceding period, are applied at the remaining period boundaries. The only freely chosen parameters of this simulation are $\Nz{A}$, the lengths of the periods and their $\Rz{X}$. These values, together with the resulting parameters of the different periods, are given in Table 1.
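The chaining of the seven periods can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures: the helper names `shape`, `period_n0` and `cumulative_cases` are hypothetical, and the continuity condition is assumed to hold on the last day of each preceding period. Under this assumption the resulting $\Nz{}$ values agree with the $\Nz{}$ column of Table 1 to within rounding.

```python
# Minimal sketch (assumption): cumulative cases of the seven periods A..G,
# with each N0 fixed by continuity on the last day of the preceding period.
TG = 4.0  # generation time in days, as used for all periods

# (first day, R0) of each period, taken from Table 1
PERIODS = [(0, 3.2), (8, 2.1), (14, 2.6), (25, 1.8), (37, 1.3), (55, 1.1), (65, 0.9)]


def shape(t, r0, tg=TG):
    """(1 - R0**(t/Tg + 1)) / (1 - R0), i.e. the fit function for N0 = 1."""
    n = t / tg + 1.0
    if abs(r0 - 1.0) < 1e-12:  # limit R0 -> 1
        return n
    return (1.0 - r0 ** n) / (1.0 - r0)


def period_n0(n0_a=10.0):
    """N0 of every period, chained by N_new(t_switch) = N_old(t_switch)."""
    n0 = [n0_a]
    for (start, r_new), (_, r_old) in zip(PERIODS[1:], PERIODS[:-1]):
        t_switch = start - 1  # last day of the preceding period
        n0.append(n0[-1] * shape(t_switch, r_old) / shape(t_switch, r_new))
    return n0


def cumulative_cases(n_days=70, n0_a=10.0):
    """Cumulative number of cases N(t) for t = 0 .. n_days - 1."""
    n0 = period_n0(n0_a)
    cases = []
    for t in range(n_days):
        # index of the period that contains day t
        idx = max(i for i, (start, _) in enumerate(PERIODS) if start <= t)
        cases.append(n0[idx] * shape(t, PERIODS[idx][1]))
    return cases


print([round(x) for x in period_n0()])
```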

The above function relates to the growth rate of the simple exponential used previously as follows. For $\Rz{}\gt1$ and $t$ beyond a few days, $\Rz{}^{(\tTg+1)}\gg1$, so the 1 in the numerator can be neglected. With this one derives: $$\Nt{}=\Nz{}\cdot\frac{1-\Rz{}^{(\tTg+1)}}{1-\Rz{}}\approx \frac{\Nz{}}{\Rz{}-1}\cdot\Rz{}^{(\tTg+1)}= \frac{\Nz{}\Rz{}}{\Rz{}-1}\cdot\,e^{\tTg\,\ln(\Rz{})}= \Nzb{}\cdot\,e^{\tTg\,\ln(\Rz{})} = \Nzb{}\cdot\,e^{k\,t}$$ where $\Nzb{}=\frac{\Nz{}\Rz{}}{\Rz{}-1}$ is simply a different constant, and $A^{x} = e^{x \cdot \ln(A)}$ was used. The growth rate is $k(\Rz{}, \Tg)=\frac{\ln(\Rz{})}{\Tg}$. As an example, $k(3, 4)=\frac{\ln(3)}{4}=0.275$.
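The quality of this approximation can be checked numerically. The short sketch below (function names are illustrative only) evaluates the growth rate of the example and the ratio of the full function to its exponential limit for $\Rz{}=3$, $\Tg=4$ days and $\Nz{}=10$:

```python
import math


def fit_function(t, n0, r0, tg):
    """N(t) = N0 (1 - R0**(t/Tg + 1)) / (1 - R0); R0 != 1 assumed."""
    return n0 * (1.0 - r0 ** (t / tg + 1.0)) / (1.0 - r0)


def exponential_limit(t, n0, r0, tg):
    """N0bar exp(k t) with N0bar = N0 R0 / (R0 - 1) and k = ln(R0) / Tg."""
    k = math.log(r0) / tg
    return n0 * r0 / (r0 - 1.0) * math.exp(k * t)


print(f"k(3, 4) = {math.log(3.0) / 4.0:.3f}")  # 0.275, as quoted above

# The ratio of the two functions approaches 1 within a few generations.
for t in (0, 4, 8, 16):
    print(t, fit_function(t, 10.0, 3.0, 4.0) / exponential_limit(t, 10.0, 3.0, 4.0))
```

Algebraically the ratio equals $1-\Rz{}^{-(\tTg+1)}$, e.g. $1-1/27\approx0.96$ for $\Rz{}=3$ at $t=8$ days, so the neglected 1 matters only during the first generations.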

The above approximation no longer holds if $\Rz{}\approx1$. In this limit, the growth rate evaluates to $k=\frac{\ln(1)}{\Tg}=0$, and the exponential growth stops; the fit function itself tends to $\Nt{}=\Nz{}\cdot(\tTg+1)$, i.e. the same number of new cases in every generation. For the region $\Rz{}\lt1$ the logarithm $\ln(\Rz{})$ is negative, while the denominator $1-\Rz{}$ is positive. In this situation, a similar calculation gives: $$\Nt{}=\Nz{}\cdot\frac{1-\Rz{}^{(\tTg+1)}}{1-\Rz{}}= \frac{\Nz{}}{1-\Rz{}}\cdot\,\left(1-\Rz{}e^{\tTg\,\ln(\Rz{})}\right)$$ Given the negative logarithm, for large times $t$ the exponential goes to zero and the number of cases saturates at a constant value, $\Nof{t=\infty} = \frac{\Nz{}}{1-\Rz{}}$. As an example, for $\Rz{}=0.9$, $\Nof{t=\infty}=10\cdot\Nz{}$. This behavior is similar to charging a capacitor. For many regions the net reproduction rate is decreasing. Therefore, the previously used simple exponential is no longer a good approximation and has been replaced by the above fit function. The new function is somewhat more complicated, but it is applicable for all values of $\Rz{}$ and enables a smooth transition from the region $\Rz{}\gt1$ to the region $\Rz{}\lt1$. A numerical comparison of the two functions for various values of $\Rz{}$ is shown below.
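Both limits can be verified with a small sketch; it assumes $\Tg=4$ days and $\Nz{}=10$, and the function name is illustrative. For $\Rz{}=0.9$ the cumulative number approaches $\Nz{}/(1-\Rz{})=100$, and for $\Rz{}$ close to 1 the values approach the linear limit $\Nz{}\cdot(\tTg+1)$:

```python
def fit_function(t, n0, r0, tg=4.0):
    """Fit function; for R0 == 1 the limit N0 (t/Tg + 1) is returned."""
    n = t / tg + 1.0
    if r0 == 1.0:
        return n0 * n  # same number of new cases in every generation
    return n0 * (1.0 - r0 ** n) / (1.0 - r0)


n0 = 10.0
# R0 < 1: the cumulative number saturates at N0 / (1 - R0) = 100 for R0 = 0.9
print(fit_function(1000.0, n0, 0.9), n0 / (1.0 - 0.9))

# R0 -> 1: values close to R0 = 1 approach N0 (t/Tg + 1) = 110 at t = 40
for r0 in (0.99, 0.999, 1.0, 1.001, 1.01):
    print(r0, fit_function(40.0, n0, r0))
```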

With the above, the doubling time and the fractional increase can also be expressed in terms of $\Rz{}$. Using $\Tg=4$ days for all periods, the values of the various parameters are listed in Table 1. Graphical representations are given below in the table of variables.

Table 1: Parameters of the generated number of cases for the different periods of the examples.
Period | Dates | Days | $\Rz{}$ | $\Nz{}$ | $\Dt$ [days] | $\Fo$ [$\%$] | $k$ [1/day]
A | 01.01 - 08.01 | 00 - 07 | 3.2 | 10 | 1.8 | 39.1 | 0.29
B | 09.01 - 14.01 | 08 - 13 | 2.1 | 18 | 3.4 | 22.8 | 0.18
C | 15.01 - 25.01 | 14 - 24 | 2.6 | 10 | 2.9 | 27.4 | 0.24
D | 26.01 - 06.02 | 25 - 36 | 1.8 | 67 | 4.7 | 16.1 | 0.15
E | 07.02 - 24.02 | 37 - 54 | 1.3 | 698 | 10.0 | 7.3 | 0.07
F | 25.02 - 05.03 | 55 - 64 | 1.1 | 3423 | 23.6 | 3.2 | 0.02
G | 06.03 - 10.03 | 65 - 69 | 0.9 | 16658 | n/a | 0.5 | -0.03
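Table 1 does not state at which time $\Dt$ and $\Fo$ are evaluated. One reading that reproduces the tabulated values is the exact fit function evaluated on the first day of each period, with $\Fo$ taken as the relative increase of the cumulative number from one day to the next. A minimal sketch under this assumption (helper names hypothetical):

```python
import math

TG = 4.0  # generation time in days
# (period, first day, R0) as listed in Table 1
PERIODS = [("A", 0, 3.2), ("B", 8, 2.1), ("C", 14, 2.6), ("D", 25, 1.8),
           ("E", 37, 1.3), ("F", 55, 1.1), ("G", 65, 0.9)]


def shape(t, r0, tg=TG):
    """N(t) / N0 of the fit function (R0 != 1 assumed)."""
    return (1.0 - r0 ** (t / tg + 1.0)) / (1.0 - r0)


def doubling_time(t0, r0, tg=TG):
    """Days until N reaches 2 N(t0), or None if N never doubles."""
    # N(t) = 2 N(t0)  <=>  R0**(t/Tg + 1) = 2 R0**(t0/Tg + 1) - 1
    target = 2.0 * r0 ** (t0 / tg + 1.0) - 1.0
    if target <= 0.0:
        return None  # saturating case, only possible for R0 < 1
    return tg * (math.log(target) / math.log(r0) - 1.0) - t0


def fractional_increase(t0, r0, tg=TG):
    """Increase of N from day t0 to day t0 + 1, in percent (assumed definition)."""
    return 100.0 * (shape(t0 + 1, r0, tg) / shape(t0, r0, tg) - 1.0)


for name, start, r0 in PERIODS:
    dt = doubling_time(start, r0)
    print(name,
          "n/a" if dt is None else f"{dt:.1f}",
          f"{fractional_increase(start, r0):.1f}",
          f"{math.log(r0) / TG:.3f}")
```

With this reading the $\Dt$ column and, for periods B to G, the $\Fo$ column are reproduced within rounding; for period A it gives $\Fo\approx49\%$ rather than the listed $39.1\%$. Since $\Nz{}$ cancels in both ratios, only $\Rz{}$, $\Tg$ and the first day of each period enter. For $\Rz{}\lt1$ the cumulative number may never double, consistent with the missing doubling time of period G.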

The three examples use the same distribution of cases explained above, but differently deteriorate the number of registered cases per day as follows:

  1. Ideal example. The registered cases equal the generated values.

  2. The number of cases per day $\dN$ is smeared by the corresponding statistical uncertainty $\Unc{\dN} = \sqrt{\dN}$. On each day, the registration of a fraction $\fod=0.20\pm0.05$ of the cases is delayed by one day, while an additional fraction $\ftd=0.05\pm0.02$ is delayed by two days. Generating the actual fractions $\fod$, $\ftd$ per day with random numbers can lead to negative fractions on some days. Consequently, the boundary condition $\fod, \ftd\ge0$ is imposed in addition.

  3. The changes are as for example 2, but this time the variations and their fluctuations are much larger. This example uses $\Unc{\dN}=3\cdot\sqrt{\dN}$, $\fod=0.35\pm0.25$ and $\ftd=0.25\pm0.15$.
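The deterioration used in examples 2 and 3 can be sketched as follows. The page does not specify the random-number distributions, so Gaussian numbers are assumed here; negative draws are clipped to zero as described above, and fractions that sum to more than one are rescaled, which is an additional assumption. The function name `deteriorate` is hypothetical.

```python
import random


def deteriorate(daily, smear=1.0, f1=(0.20, 0.05), f2=(0.05, 0.02), seed=None):
    """Smear daily counts and delay part of their registration by one or two days.

    Assumptions: Gaussian random numbers for the smearing and the fractions,
    negative draws clipped to zero, fractions rescaled if they sum above 1.
    Cases delayed beyond the last day are dropped.
    """
    rng = random.Random(seed)
    n = len(daily)
    registered = [0.0] * (n + 2)
    for day, dn in enumerate(daily):
        smeared = max(0.0, rng.gauss(dn, smear * dn ** 0.5))  # sigma = smear * sqrt(dN)
        frac1 = max(0.0, rng.gauss(*f1))  # boundary condition f1d >= 0
        frac2 = max(0.0, rng.gauss(*f2))  # boundary condition f2d >= 0
        total = frac1 + frac2
        if total > 1.0:  # assumption: never shift more than all cases
            frac1, frac2 = frac1 / total, frac2 / total
        keep = max(0.0, 1.0 - frac1 - frac2)
        registered[day] += smeared * keep
        registered[day + 1] += smeared * frac1
        registered[day + 2] += smeared * frac2
    return registered[:n]
```

Example 2 then corresponds to `deteriorate(daily, 1.0, (0.20, 0.05), (0.05, 0.02))` and example 3 to `deteriorate(daily, 3.0, (0.35, 0.25), (0.25, 0.15))`, where `daily` holds the generated new cases per day.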

The analysis is identical to the one performed on real data. The results are shown next.


Results for the three examples

As for the real data, the results for each example are given in a table of six figures, arranged in the layout below, together with the corresponding slide shows.

Table 2: Layout of the summary tables
Daily cases Net reproduction rate $\Rz{}$
Doubling time $\Dt$ Fractional increase per day $\Fo$
Fit to latest data (logarithmic scale) Fit to latest data (linear scale)

Since for this simulation the underlying generated data are known, they are also shown in the figures as green lines. This applies to the accumulated number of cases, the net reproduction rate, the doubling time and the fractional increase.


Table 3: Results and slide shows (log, lin) for Example 1.
Example 1 new cases Example 1 net reproduction rate
Example 1 doubling time Example 1 fractional increase
Example 1 fit latest data (log) Example 1 fit latest data (lin)

By construction, the simulated points and the green lines are identical in the fits to the data on linear and logarithmic scale. The changes in the net reproduction rate are easily visible in the histogram and as kinks in the slope of the number of cases in the logarithmic display. The slide shows reveal that the numbers in Table 1 are recovered by the fits. The steps in the net reproduction rate, the doubling time and the fractional increase are tracked with a delay of a few days, caused by the sliding-window fit. The step back in the doubling time, caused by the choice $\Rz{C}\gt\Rz{B}$, is also clearly seen. For this ideal case, smoothing would not be needed.
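The delay can be illustrated with a minimal sliding-window fit. The fit below is an assumption, not the analysis code: it performs an unweighted least-squares fit of $\Nz{}$ and $\Rz{}$ to seven consecutive days of cumulative cases, using absolute time $t$ and $\Tg=4$ days. For each trial $\Rz{}$ the best $\Nz{}$ follows analytically; $\Rz{}$ itself is found by a grid scan followed by a golden-section refinement. All names are hypothetical.

```python
import math

TG = 4.0  # generation time in days (assumption)


def shape(t, r0, tg=TG):
    """Fit function for N0 = 1; the R0 -> 1 limit is handled explicitly."""
    n = t / tg + 1.0
    if abs(r0 - 1.0) < 1e-9:
        return n
    return (1.0 - r0 ** n) / (1.0 - r0)


def profile(days, cases, r0):
    """Best N0 for fixed R0 (linear least squares) and the residual sum of squares."""
    g = [shape(t, r0) for t in days]
    n0 = sum(y * gi for y, gi in zip(cases, g)) / sum(gi * gi for gi in g)
    return n0, sum((y - n0 * gi) ** 2 for y, gi in zip(cases, g))


def fit_window(days, cases, r_lo=0.5, r_hi=5.0, step=0.01):
    """Unweighted least-squares fit of (N0, R0) to one window of cumulative cases."""
    grid = [r_lo + i * step for i in range(int(round((r_hi - r_lo) / step)) + 1)]
    r_best = min(grid, key=lambda r: profile(days, cases, r)[1])
    a, b = r_best - step, r_best + step
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):  # golden-section refinement around the grid minimum
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if profile(days, cases, c)[1] < profile(days, cases, d)[1]:
            b = d
        else:
            a = c
    r0 = 0.5 * (a + b)
    return profile(days, cases, r0)[0], r0


def sliding_fit(cumulative, window=7):
    """Fitted R0 for each window of `window` consecutive days."""
    results = []
    for end in range(window - 1, len(cumulative)):
        days = list(range(end - window + 1, end + 1))
        results.append(fit_window(days, [cumulative[t] for t in days])[1])
    return results
```

On ideal data of a single period every window returns the generated $\Rz{}$. Across a step, windows that mix two periods return intermediate values, so the new value is only reached once the whole window is described by the function of the new period.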


Table 4: Results and slide shows (log, lin) for Example 2.
Example 2 new cases Example 2 net reproduction rate
Example 2 doubling time Example 2 fractional increase
Example 2 fit latest data (log) Example 2 fit latest data (lin)

For example 2, the simulated points and the green lines in the fits to the data on linear and logarithmic scale are no longer identical. The changes in the net reproduction rate are clearly washed out in the histogram. Despite this, the steps in the net reproduction rate, the doubling time and the fractional increase are still tracked, but only once a sufficient total number of cases has accumulated. As for some of the real data, in the early phase with few cases the underlying trend is hidden by the fluctuations introduced by the deterioration. The slide shows reveal that, after this phase, the numbers in Table 1 are again recovered by the fits. Still, smoothing is not imperative.


Table 5: Results and slide shows (log, lin) for Example 3.
Example 3 new cases Example 3 net reproduction rate
Example 3 doubling time Example 3 fractional increase
Example 3 fit latest data (log) Example 3 fit latest data (lin)

For example 3, the simulated points and the green lines differ even more. The underlying changes in the net reproduction rate are no longer visible in the histogram. Nevertheless, the steps in the net reproduction rate, the doubling time and the fractional increase are still tracked, albeit with much larger fluctuations around their generated values. Without smoothing, these fluctuations would be even larger.

A comparison of the three examples for five quantities is shown in the next table. The differences caused by the deteriorated registration of cases are rather small compared with the differences in these quantities observed for real data. More details on the analysis and on the ranking in these figures are given with the comparison of the evolution of real data.

Table 6: Comparison of the time evolution of the pandemic for the three examples.
Total cases per 100k people Net reproduction rate
Doubling time Fractional increase
Seven days incidence per 100k people

In summary, the examples demonstrate that, as soon as a significant number of cases has accumulated, the applied method leads to trustworthy results close to the underlying values. This holds even for strongly distorted distributions of new cases per day. Reconstructing the early phase, however, remains difficult.


Functional relationships of the various variables

The variables used in the calculations are the net reproduction rate $\Rz{}$, the doubling time $\Dt$, the fractional increase per day $\Fo$, and the growth rate $k$. The generation time $\Tg$ is treated as a parameter and varied within $[3, 5]$ days. The variables and their derivatives with respect to $\Rz{}$ are shown in the following table.

Table 7: Graphical representations of the variables and their derivatives with respect to $\Rz{}$.
$\Rz{}$ $k$ $\dkdRz{}$ $\Dt$ [days] $\dDtdRz{}$ [days] $\Fo$ $\dFodRz{}$
1.1 Exponential growth R0=1.1 Derivative exponential growth R0=1.1 Doubling time R0=1.1 Derivative doubling time R0=1.1 Fractional increase R0=1.1 Derivative fractional increase R0=1.1
 
1.5 Exponential growth R0=1.5 Derivative exponential growth R0=1.5 Doubling time R0=1.5 Derivative doubling time R0=1.5 Fractional increase R0=1.5 Derivative fractional increase R0=1.5
 
2.6 Exponential growth R0=2.6 Derivative exponential growth R0=2.6 Doubling time R0=2.6 Derivative doubling time R0=2.6 Fractional increase R0=2.6 Derivative fractional increase R0=2.6


Comparison of the fit function to an exponential

To investigate its region of applicability, a simple exponential $\Nt{}=\Nzb{}\cdot\,e^{k t}$, with $k=\frac{\ln(\Rz{})}{\Tg}$, is compared to the fit function introduced above. The functions are shown for one hundred days, i.e. in the range $t=[01.01, 09.04]$. For all figures $\Nz{} = 10$ is used. The value of $\Nzb{}$ is chosen such that both functions give identical values in the middle of the range, i.e. for $\Nof{t=19.02}$. Comparing the two functions over the entire range is a stringent test, since in the experimental analysis only ranges of seven days are fitted, so the functions only need to agree over short periods of time. The fit function is shown in red together with its uncertainty in black. The uncertainty is obtained using typical values achieved in fits to real data, $\Unc{\Nz{}}=3\%$ and $\Unc{\Rz{}}=2\%$, and a correlation of $-1$. See below for details of the calculation of the uncertainty in $\Nt{}$.

For $\Rz{}=1.5$, the two functions are indistinguishable after a few days. For $\Rz{}=1.2$, they are very close over most of the range, starting from about $t=10.02$. For $\Rz{}=1.1$, the slopes of the two functions differ visibly, although the exponential stays within the uncertainty band of the fit function for long periods of time. Finally, for $\Rz{}=0.9$, the two functions are completely different. In summary, after a few days the exponential leads to trustworthy results for $\Rz{}\ge 1.2$.
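The matching of the two functions can be reproduced numerically. The minimal sketch below assumes $t=0$ on 01.01, so that 19.02 corresponds to $t=49$, and uses $\Nz{}=10$ and $\Tg=4$ days (the value of $\Tg$ is an assumption for this comparison; helper names are hypothetical). It prints the ratio of the exponential to the fit function on a few days:

```python
import math

TG, N0 = 4.0, 10.0  # Tg = 4 days is an assumption for this comparison
T_MID = 49          # 19.02, counting t = 0 on 01.01


def fit_function(t, r0):
    """Fit function with N0 = 10 (R0 != 1)."""
    return N0 * (1.0 - r0 ** (t / TG + 1.0)) / (1.0 - r0)


def matched_exponential(r0):
    """N0bar exp(k t), k = ln(R0)/Tg, with N0bar set so both agree at T_MID."""
    k = math.log(r0) / TG
    n0_bar = fit_function(T_MID, r0) / math.exp(k * T_MID)
    return lambda t: n0_bar * math.exp(k * t)


for r0 in (1.5, 1.2, 1.1, 0.9):
    expo = matched_exponential(r0)
    ratios = [expo(t) / fit_function(t, r0) for t in (10, 30, 49, 70, 99)]
    print(r0, [round(x, 3) for x in ratios])
```

Algebraically the ratio is $\left(1-\Rz{}^{-(49/\Tg+1)}\right)/\left(1-\Rz{}^{-(\tTg+1)}\right)$, which approaches one quickly for $\Rz{}=1.5$ but stays noticeably different from one for $\Rz{}$ close to 1.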

Table 8: Comparison of the fit function with a simple exponential for different values of $\Rz{}$.
Function comparison R0=1.5 Function comparison R0=1.2
 
Function comparison R0=1.1 Function comparison R0=0.9


Some details on the fit uncertainties

In general terms, the fit function depends on two parameters, $N(t\,\vert\,\Nz{},\,\Rz{})$. The fit determines the values of the parameters, their uncertainties $\sigma$ and the correlation $\rho$ between them: $$\Rz{}\pm\Unc{\Rz{}},\quad\Nz{}\pm\Unc{\Nz{}} \quad\mathrm{and}\quad\Rho{\Nz{}}{\Rz{}}.$$ Given the functional form, a larger $\Rz{}$ steepens the curve, which the fit compensates with a smaller $\Nz{}$. Consequently, the two fitted parameters are anti-correlated, $\Rho{\Nz{}}{\Rz{}}\lt0$.

The symmetric matrix containing the uncertainties and the correlation is called the covariance matrix. Its off-diagonal element, the covariance, is the product of the correlation of the two parameters and their uncertainties, $\Rho{\Nz{}}{\Rz{}}\,\Unc{\Nz{}}\,\Unc{\Rz{}}$. For the same variable, e.g. for $\Nz{}$, this evaluates to $\Rho{\Nz{}}{\Nz{}}\,\Unc{\Nz{}}\,\Unc{\Nz{}}=1\cdot\Var{\Nz{}}$, which is the uncertainty squared, called the variance. The Jacobian is the vector of the partial derivatives of $\Nt{}$ with respect to the two parameters. The uncertainty in $\Nt{}$ is obtained by propagating the uncertainties of $\Nz{}$ and $\Rz{}$; using these matrices one obtains the variance: $$\Var{N(t\,\vert\,\Nz{},\, \Rz{})}=\left(\dNdNz{}\quad\dNdRz{}\right)\cdot \Cov{\Var{\Nz{}}}{\Var{\Rz{}}}{\Rho{\Nz{}}{\Rz{}}\,\Unc{\Nz{}}\,\Unc{\Rz{}}}\cdot \Jac{\dNdNz{}}{\dNdRz{}}$$ From this, the uncertainty in $N(t\,\vert\,\Nz{},\,\Rz{})$ is obtained by taking the square root.
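The propagation can be written out explicitly. The sketch below uses the analytic derivatives of the fit function and the typical values quoted above, $\Unc{\Nz{}}=3\%$ and $\Unc{\Rz{}}=2\%$ (read here as relative uncertainties, an assumption) with a correlation of $-1$; the function names and the choice $\Nz{}=10$, $\Rz{}=1.5$, $\Tg=4$ days are for illustration only.

```python
import math


def fit_function(t, n0, r0, tg=4.0):
    """N(t | N0, R0) = N0 (1 - R0**n) / (1 - R0) with n = t/Tg + 1 (R0 != 1)."""
    return n0 * (1.0 - r0 ** (t / tg + 1.0)) / (1.0 - r0)


def jacobian(t, n0, r0, tg=4.0):
    """Partial derivatives (dN/dN0, dN/dR0) of the fit function."""
    n = t / tg + 1.0
    g = (1.0 - r0 ** n) / (1.0 - r0)
    dg_dr0 = ((1.0 - r0 ** n) - n * r0 ** (n - 1.0) * (1.0 - r0)) / (1.0 - r0) ** 2
    return g, n0 * dg_dr0


def sigma_n(t, n0, r0, sig_n0, sig_r0, rho, tg=4.0):
    """Square root of J^T C J, C being the covariance matrix of (N0, R0)."""
    j = jacobian(t, n0, r0, tg)
    cov_off = rho * sig_n0 * sig_r0
    cov = ((sig_n0 ** 2, cov_off), (cov_off, sig_r0 ** 2))
    var = sum(j[a] * cov[a][b] * j[b] for a in range(2) for b in range(2))
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


n0, r0 = 10.0, 1.5
for t in (0, 20, 49, 99):
    print(t, fit_function(t, n0, r0), sigma_n(t, n0, r0, 0.03 * n0, 0.02 * r0, -1.0))
```

With $\rho=-1$ the two contributions partly cancel, so that $\Unc{\Nt{}}=\left|\dNdNz{}\,\Unc{\Nz{}}-\dNdRz{}\,\Unc{\Rz{}}\right|$. At $t=0$ the derivative with respect to $\Rz{}$ vanishes and $\Unc{\Nt{}}=\Unc{\Nz{}}$.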

For fits to early data, the curve indicating the lower boundary of the uncertainty band turns negative in some of the slide shows. This happens because, for larger values of $\Unc{\Rz{}}$ and increasing time $t$, the uncertainty in $\Nt{}$ exceeds its value, i.e. $\Unc{\Nt{}}\gt\Nt{}$. An example is shown in the following figures for the case $\Rz{}=1.5$ from above, with $\Unc{\Rz{}}$ increased from $2\%$ to $4\%$ and finally to $6\%$. For later periods, the larger number of cases yields a smaller $\Unc{\Rz{}}$, and this no longer happens.
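For the correlation $\Rho{\Nz{}}{\Rz{}}=-1$ used above, the propagation can be rewritten in relative terms, which makes the condition $\Unc{\Nt{}}\gt\Nt{}$ explicit. Writing $\Nt{}=\Nz{}\,g(t)$ with the shorthand $g(t)=\frac{1-\Rz{}^{(\tTg+1)}}{1-\Rz{}}$ (introduced here for brevity), one has $\dNdNz{}=g$ and $\dNdRz{}=\Nt{}\,\frac{\partial\ln g}{\partial\Rz{}}$, so that $$\frac{\Unc{\Nt{}}}{\Nt{}}=\left|\frac{\Unc{\Nz{}}}{\Nz{}}-\frac{\partial\ln g}{\partial\Rz{}}\,\Unc{\Rz{}}\right|,\qquad \frac{\partial\ln g}{\partial\Rz{}}=\frac{(\tTg+1)\,\Rz{}^{\tTg}}{\Rz{}^{(\tTg+1)}-1}-\frac{1}{\Rz{}-1}\approx\frac{\tTg+1}{\Rz{}}-\frac{1}{\Rz{}-1}$$ where the approximation assumes $\Rz{}^{(\tTg+1)}\gg1$. The second term inside the absolute value grows linearly with $t$, so the lower edge of the band eventually turns negative. As a worked example, assuming the percentages are relative uncertainties and $\Tg=4$ days: for $\Rz{}=1.5$, $\Unc{\Nz{}}/\Nz{}=3\%$ and $\Unc{\Rz{}}=6\%\cdot1.5=0.09$, the condition $\left(\frac{t/4+1}{1.5}-2\right)\cdot0.09-0.03\gt1$ is met for $t\gtrsim77$ days, whereas for $\Unc{\Rz{}}=4\%$ the ratio $\Unc{\Nt{}}/\Nt{}$ only reaches about $0.88$ at the end of the hundred-day range.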

Table 9: Fit function for $\Rz{}=1.5$ and different uncertainties $\Unc{\Rz{}}$.
Function R0=1.5 Sig(R0)=4% Function R0=1.5 Sig(R0)=6%