Statistics Course

Homework17_A

Add second order regression to your statistical application.

Code VB.Net

https://drive.google.com/file/d/1ENlAz_rxyDN45E1At6IrUjRHS00-26Cr/view?usp=sharing

https://math.stackexchange.com/questions/267865/equations-for-quadratic-regression

    ' Second-order (quadratic) least-squares regression: y = b1 + b2*x + b3*x^2.
    ' b1, b2, b3, current_onlineMeanX and current_onlineMeanY are declared elsewhere in the class.
    Public Sub calc_regressionLineSecondOrder()
        Dim N As Integer = listOfBDataP.Count

        Dim sumX As Double        ' sum of x
        Dim sumPowX As Double     ' sum of x^2
        Dim sumProd As Double     ' sum of x^3
        Dim sumPowPow As Double   ' sum of x^4
        Dim sumY As Double        ' sum of y
        Dim sumXY As Double       ' sum of x*y
        Dim sumXYPow As Double    ' sum of x^2*y

        For Each x In listOfBDataP
            sumX += x.x1
            sumPowX += x.x1 * x.x1
            sumProd += x.x1 * x.x1 * x.x1
            sumPowPow += x.x1 * x.x1 * x.x1 * x.x1
            sumXY += x.x1 * x.x2
            sumY += x.x2
            sumXYPow += x.x2 * x.x1 * x.x1
        Next

        ' Centered sums of squares and cross products (see the math.stackexchange link above)
        Dim s11 As Double = sumPowX - (sumX * sumX) / N
        Dim s12 As Double = sumProd - (sumX * sumPowX) / N
        Dim s22 As Double = sumPowPow - (sumPowX * sumPowX) / N
        Dim sy1 As Double = sumXY - (sumY * sumX) / N
        Dim sy2 As Double = sumXYPow - (sumY * sumPowX) / N

        Dim meanXPow As Double = sumPowX / N

        ' The denominator is zero when there are fewer than three distinct x values
        Dim det As Double = s22 * s11 - s12 * s12

        b2 = (sy1 * s22 - sy2 * s12) / det
        b3 = (sy2 * s11 - sy1 * s12) / det
        b1 = current_onlineMeanY - b2 * current_onlineMeanX - b3 * meanXPow
    End Sub
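
For reference, the formulas that the routine above implements can be written out as follows. This is only a transcription of the code into mathematical notation (S_{11} corresponds to s11, and so on); the fitted model y = b_1 + b_2 x + b_3 x^2 is assumed.

```latex
\begin{aligned}
S_{11} &= \sum x_i^2 - \frac{\left(\sum x_i\right)^2}{N}, &
S_{12} &= \sum x_i^3 - \frac{\sum x_i \sum x_i^2}{N}, &
S_{22} &= \sum x_i^4 - \frac{\left(\sum x_i^2\right)^2}{N}, \\
S_{y1} &= \sum x_i y_i - \frac{\sum x_i \sum y_i}{N}, &
S_{y2} &= \sum x_i^2 y_i - \frac{\sum x_i^2 \sum y_i}{N}, \\
b_2 &= \frac{S_{y1} S_{22} - S_{y2} S_{12}}{S_{22} S_{11} - S_{12}^2}, &
b_3 &= \frac{S_{y2} S_{11} - S_{y1} S_{12}}{S_{22} S_{11} - S_{12}^2}, &
b_1 &= \bar{y} - b_2 \bar{x} - b_3 \overline{x^2}.
\end{aligned}
```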

Simple variation of your application to simulate stochastic processes.
Add the following to your previous program 13_A.
Use the same scheme as in the previous program, but change the way the values are computed at each time: starting from value 0 at time 0, at each new time compute Y(i) = Y(i-1) + Random step(i), where Random step(i) is a Rademacher random variable ( https://en.wikipedia.org/wiki/Rademacher_distribution ).
At time n (the last time) and at one other chosen inner time 1<j<n (j is a program parameter), build the distribution of Y(i) and represent it with a histogram.

Code VB.Net

https://drive.google.com/file/d/1H_QLl2qtHPt5sRzxeM1ib5tlOHbmNYJy/view?usp=sharing

In probability theory and statistics, the Rademacher distribution (which is named after Hans Rademacher) is a discrete probability distribution where a random variate X has a 50% chance of being +1 and a 50% chance of being -1.

Public Function Generate_randomWalk() As Path
    …
    ' Rademacher step: +1 or -1, each with probability 0.5
    Dim p As Double = 0.5
    Dim v As Double = R.NextDouble
    If v <= p Then
        path.jump.Add(1)
    Else
        path.jump.Add(-1)
    End If
    …
End Function
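
The fragment above only shows how a single step is drawn. A minimal self-contained sketch of the whole exercise could look like the following; the module and function names (RandomWalkSketch, GenerateRandomWalk, DistributionAt) are hypothetical and are not taken from the linked project, and the distribution is printed as a frequency table instead of being drawn as a histogram.

```vbnet
Imports System
Imports System.Collections.Generic

Module RandomWalkSketch
    ' Returns the positions Y(0), ..., Y(n) of a walk with Rademacher steps, starting from 0.
    Function GenerateRandomWalk(n As Integer, rng As Random) As List(Of Integer)
        Dim y As New List(Of Integer) From {0}
        For i As Integer = 1 To n
            Dim stepValue As Integer = If(rng.NextDouble() < 0.5, 1, -1)
            y.Add(y(i - 1) + stepValue)
        Next
        Return y
    End Function

    ' Frequencies of the values taken by Y(t) over many simulated paths of length n.
    Function DistributionAt(t As Integer, n As Integer, paths As Integer,
                            rng As Random) As SortedDictionary(Of Integer, Integer)
        Dim freq As New SortedDictionary(Of Integer, Integer)
        For k As Integer = 1 To paths
            Dim value As Integer = GenerateRandomWalk(n, rng)(t)
            If freq.ContainsKey(value) Then freq(value) += 1 Else freq(value) = 1
        Next
        Return freq
    End Function

    Sub Main()
        Dim rng As New Random(42)
        Dim n As Integer = 100   ' last time
        Dim j As Integer = 30    ' inner time (program parameter)
        For Each t As Integer In New Integer() {j, n}
            Console.WriteLine("Distribution of Y({0}):", t)
            For Each kvp In DistributionAt(t, n, 1000, rng)
                Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value)
            Next
        Next
    End Sub
End Module
```

Since every step is +1 or -1, Y(t) always has the same parity as t, so for even t only even values appear in the table.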

Homework14_A

Add total variance decomposition and computation of the coefficient of determination (make sure all your computations are done with online algorithms, e.g. https://www.johndcook.com/blog/running_regression// etc.).

For online algorithms, see also:
https://blog.rapid7.com/2016/10/19/overview-of-online-algorithm-using-standard-deviation-example/
https://francescopigini.wordpress.com/2017/12/14/online-algorithm-for-variance-and-covariance-extra-2/

For coeff of determination:
https://en.wikipedia.org/wiki/Coefficient_of_determination

For total variance decomposition:
https://en.wikipedia.org/wiki/Law_of_total_variance

Code VB.Net

https://drive.google.com/file/d/18u8SXRmRItAd97nkQ4Pl7M82iHyZfyLG/view?usp=sharing

        ' Computation of the coefficient of determination

        ' Explained sum of squares (ESS): deviance of the fitted values around the mean
        Dim sum_ess As Double = 0
        For i_ess As Integer = 0 To y_est.Count - 1
            sum_ess += (y_est(i_ess) - current_onlineMeanY) * (y_est(i_ess) - current_onlineMeanY)
        Next

        ' Total sum of squares (TSS): deviance of the observed values around the mean
        Dim sum_tss As Double = 0
        Dim y_obs As List(Of Double) = listOfBDataP.Select(Function(p) p.x2).ToList
        For i_tss As Integer = 0 To y_obs.Count - 1
            sum_tss += (y_obs(i_tss) - current_onlineMeanY) * (y_obs(i_tss) - current_onlineMeanY)
        Next

        ' Residual sum of squares (RSS): deviance of the observed values around the fitted values
        Dim sum_rss As Double = 0
        For i_rss As Integer = 0 To y_obs.Count - 1
            sum_rss += (y_obs(i_rss) - y_est(i_rss)) * (y_obs(i_rss) - y_est(i_rss))
        Next

        ' For a least-squares fit with intercept TSS = ESS + RSS, so the two values coincide
        Dim coeffDet As Double = sum_ess / sum_tss
        Dim coeffDet2 As Double = 1 - sum_rss / sum_tss
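
The three sums computed above are the terms of the variance decomposition. Written as formulas, with \hat{y}_i the fitted values (y_est in the code) and \bar{y} the mean of the observed values, and assuming a least-squares fit that includes an intercept:

```latex
\underbrace{\sum_i (y_i - \bar{y})^2}_{\text{TSS}}
  = \underbrace{\sum_i (\hat{y}_i - \bar{y})^2}_{\text{ESS}}
  + \underbrace{\sum_i (y_i - \hat{y}_i)^2}_{\text{RSS}},
\qquad
R^2 = \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}.
```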

Variance and covariance calculation (online algorithm)

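        ...
        For Each d In listOfBDataP
            Calc_onlCov(d.x1, d.x2)
            Calc_onVar(d.x1)
        Next
        ...

    ' Online update of b_online, the running sum of (x - meanX) * (y - meanY).
    ' Divide by n - 1 to obtain the sample covariance.
    Public Sub Calc_onlCov(x As Double, y As Double)
        n += 1
        Dim dx As Double = x - meanX        ' deviation from the previous mean of x
        meanX += dx / CDbl(n)
        meanY += (y - meanY) / CDbl(n)
        b_online += dx * (y - meanY)        ' uses the updated mean of y
    End Sub

    ' Online (Welford) update of sigma_online, the running sum of (x - meanX_v)^2.
    ' Divide by n_v - 1 to obtain the sample variance.
    Public Sub Calc_onVar(x As Double)
        n_v += 1
        Dim delta As Double = meanX_v + (x - meanX_v) / CDbl(n_v)   ' delta is the updated mean
        sigma_online += (x - meanX_v) * (x - delta)
        meanX_v = delta
    End Sub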
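
To check the two update rules in isolation, they can be put together in one small accumulator. The sketch below is self-contained; the class OnlineStats and its member names are hypothetical and only mirror the fields used above (Sxx plays the role of sigma_online, Sxy of b_online).

```vbnet
Imports System

Module OnlineStatsSketch
    ' Hypothetical accumulator combining the online mean, variance and covariance updates.
    Class OnlineStats
        Public Count As Integer
        Public MeanX As Double
        Public MeanY As Double
        Public Sxx As Double   ' running sum of (x - MeanX)^2
        Public Sxy As Double   ' running sum of (x - MeanX) * (y - MeanY)

        Public Sub Add(x As Double, y As Double)
            Count += 1
            Dim dx As Double = x - MeanX     ' deviation from the previous mean of x
            MeanX += dx / Count
            MeanY += (y - MeanY) / Count
            Sxx += dx * (x - MeanX)          ' previous deviation times updated deviation
            Sxy += dx * (y - MeanY)          ' previous deviation of x times updated deviation of y
        End Sub

        Public Function Variance() As Double
            Return Sxx / (Count - 1)
        End Function

        Public Function Covariance() As Double
            Return Sxy / (Count - 1)
        End Function

        ' Slope of the regression line, Cov(X,Y) / Var(X).
        Public Function Slope() As Double
            Return Sxy / Sxx
        End Function
    End Class

    Sub Main()
        Dim s As New OnlineStats()
        Dim xs() As Double = {1, 2, 3, 4}
        Dim ys() As Double = {2, 4, 6, 8}
        For i As Integer = 0 To xs.Length - 1
            s.Add(xs(i), ys(i))
        Next
        ' The points lie on y = 2x, so the slope is 2
        Console.WriteLine("variance={0}, covariance={1}, slope={2}", s.Variance(), s.Covariance(), s.Slope())
    End Sub
End Module
```

Each observation is processed once and no list of past values is kept, which is what makes the algorithm "online".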

Homework12_A

Add regression lines to your revised statistical application (parser + statistical/charting engine).

Code VB.Net

https://drive.google.com/file/d/1TFEmibUFSHRXVqiH9frzuWL0XIIPsjli/view?usp=sharing

To calculate the regression line we use:

y = a + bx

        ' Draw the regression line as a sequence of segments between consecutive data points
        Dim x_prec As Double = listOfBDataP(0).x1
        Dim y_dipPrec As Double = a + b * x_prec

        For Each d In listOfBDataP.Skip(1)
            Dim y_dipCurr As Double = a + b * d.x1
            Dim p_xPrec As Double = X_Viewport(x_prec, viewport, min_x, range_x)
            Dim p_yPrec As Double = Y_Viewport(y_dipPrec, viewport, min_y, range_y)
            Dim p_xCurr As Double = X_Viewport(d.x1, viewport, min_x, range_x)
            Dim p_yCurr As Double = Y_Viewport(y_dipCurr, viewport, min_y, range_y)
            gc.DrawLine(Pens.Red, New Point(p_xPrec, p_yPrec), New Point(p_xCurr, p_yCurr))
            x_prec = d.x1
            y_dipPrec = y_dipCurr
        Next

where

b = Cov(X,Y) / Var(X)
a = mean(Y) - b * mean(X)
(mean(Y) and mean(X) are the averages of the Y and X values)

Remember:

        Dim sigma = Calc_var()
        Dim b As Double = Calc_cov() / sigma
        Dim a As Double = current_onlineMeanY - b * current_onlineMeanX

    ' Sample covariance of x1 and x2
    Public Function Calc_cov() As Double
        Dim sum As Double
        Dim n As Double = listOfBDataP.Count
        For Each d In listOfBDataP
            sum += d.x1 * d.x2
        Next
        Return (sum - n * current_onlineMeanX * current_onlineMeanY) / (n - 1)
    End Function

    ' Sample variance of x1
    Public Function Calc_var() As Double
        Dim sigma As Double
        For Each d In listOfBDataP
            sigma += Math.Pow((d.x1 - current_onlineMeanX), 2)
        Next

        Return sigma / (listOfBDataP.Count - 1)
    End Function

https://www.webtutordimatematica.it/materie/statistica-e-probabilita/modelli-di-regressione/regressione-lineare-semplice/calcolo-parametri-retta-regressione

Homework10_A

Implement your own algorithm to compute a frequency distribution of the words from any text (possibly judiciously scraped from websites) and draw some personal graphical representation of the “word cloud”.

Code VB.NET

https://drive.google.com/file/d/1nHYV1BiQIPjvwI08zaqnm7j_e2BoSfGO/view?usp=sharing

Connect to the web page

        ' Suppress script error pop-ups
        WebBrowser1.ScriptErrorsSuppressed = True

        If String.IsNullOrEmpty(url1) Then Return
        If url1.Equals("about:blank") Then Return
        If Not url1.StartsWith("http://") AndAlso
           Not url1.StartsWith("https://") Then
            url1 = "https://" & url1
        End If

        ' Load the page at the given URL in the WebBrowser control
        Try
            WebBrowser1.Navigate(New Uri(url1))
        Catch ex As System.UriFormatException
            Return
        End Try

Copy all the words

        ' Select the whole page, copy it to the clipboard and paste the text into the RichTextBox
        WebBrowser1.Document.ExecCommand("SelectAll", False, Nothing)
        WebBrowser1.Document.ExecCommand("Copy", False, Nothing)
        WebBrowser1.Document.ExecCommand("Unselect", False, Type.Missing)
        RTXBCloud.Text = Clipboard.GetText
        Clipboard.Clear()
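
Once the text is available, the frequency distribution of the words has to be computed and sorted. That step is not shown in the snippets here, so the following is only a minimal sketch of one way to do it; the function name WordFrequencies is hypothetical, and treating a "word" as a run of letters compared case-insensitively is an assumption.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module WordFrequencySketch
    ' Counts the occurrences of each word and returns them in descending order of frequency.
    Function WordFrequencies(text As String) As List(Of KeyValuePair(Of String, Integer))
        Dim counts As New Dictionary(Of String, Integer)
        ' A word is taken to be a maximal run of letters, compared in lower case
        For Each m As Match In Regex.Matches(text.ToLowerInvariant(), "\p{L}+")
            Dim current As Integer = 0
            counts.TryGetValue(m.Value, current)
            counts(m.Value) = current + 1
        Next
        Return counts.OrderByDescending(Function(kvp) kvp.Value).ToList()
    End Function

    Sub Main()
        For Each kvp In WordFrequencies("The cat and the dog and the bird.")
            Console.WriteLine("{0}: {1}", kvp.Key, kvp.Value)
        Next
    End Sub
End Module
```

A list sorted this way can play the role of orderDistrW in the drawing code: its first element is the most frequent word, which is what the font scaling relies on.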

Build rectangles

        ' orderDistrW is in descending order of frequency
        For Each kvp In orderDistrW
            Dim rect As RectangleF
            ' Font size proportional to the frequency (the most frequent word gets size 200)
            Dim f As New Font("arial", (kvp.Value * 200) / orderDistrW.First.Value)
            Dim s As SizeF = g.MeasureString(kvp.Key, f)

            ' Reset explicitly: a variable declared inside a loop keeps its value across iterations
            Dim tries As Integer = 0
            Do
                ' Build a rectangle with the size of the word at a random position and use the
                ' 'Occupied' function to check whether that position in the PictureBox is free.
                ' The x and y of the rectangle are chosen so that it does not leave the PictureBox.
                Dim x = viewport.Left + ((viewport.Right + 1 - s.Width) - viewport.Left) * R.NextDouble()
                Dim y = viewport.Top + ((viewport.Bottom + 1 - s.Height) - viewport.Top) * R.NextDouble()

                rect = New RectangleF(New PointF(x, y), s)
                If Not Occupied(rect, listOfRect) Then Exit Do
                tries += 1
                ' Give up on this word after too many attempts
                If tries >= 100000 Then Continue For
            Loop

            g.DrawString(kvp.Key, f, New SolidBrush(Color.FromArgb(R.Next(256), R.Next(256), R.Next(256))), New PointF(rect.X, rect.Y))
            listOfRect.Add(rect)
        Next
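
The 'Occupied' function used in the loop is not shown. A minimal sketch of what such a check could look like is given below; it is an assumption about its behaviour (a position is occupied when the candidate rectangle overlaps any rectangle already placed), not the code of the linked project.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Drawing

Module OccupiedSketch
    ' Hypothetical version of 'Occupied': True if the candidate overlaps any rectangle already placed.
    Function Occupied(candidate As RectangleF, placed As List(Of RectangleF)) As Boolean
        For Each r In placed
            If r.IntersectsWith(candidate) Then Return True
        Next
        Return False
    End Function

    Sub Main()
        Dim placed As New List(Of RectangleF) From {New RectangleF(0, 0, 100, 40)}
        Console.WriteLine(Occupied(New RectangleF(50, 20, 80, 30), placed))    ' prints True
        Console.WriteLine(Occupied(New RectangleF(200, 200, 80, 30), placed))  ' prints False
    End Sub
End Module
```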

Homework_14RA

SDE and methods for numerical solution.

A solution to an SDE is itself a stochastic function, which means that its value X(t) at any given time t is a random variable.

The Euler-Maruyama (EM) and Milstein methods

In many cases analytic solutions are not available for SDEs, so we are required to use numerical methods to approximate the solution. The Euler-Maruyama and Milstein methods are both based on the truncated Itô-Taylor expansion.

If we add one or more random elements to a deterministic differential equation, we pass from an ordinary differential equation to an SDE.

In the literature, some authors have considered numerical approximations of random periodic solutions for SDEs, while others have constructed a Milstein scheme with an added error correction term for solving stiff SDEs.

Remember the SDE form:
dX(t,w) = μ(t,X(t,w))dt + σ(t,X(t,w))dW(t,w), with X(t₀,w) = X₀
where μ is the drift coefficient and σ is the diffusion coefficient.

The Monte Carlo simulation:

Monte Carlo methods are numerical methods, where random numbers are used to conduct a computational experiment. Numerical solution of stochastic differential equations can be viewed as a type of Monte Carlo calculation.

In Monte Carlo simulation, the entire system is simulated a large number of times, so that a set of suitable sample paths is produced on [t₀,T].
Each simulation, referred to as a realization of the system, is equally likely. For each realization, all of the uncertain parameters are sampled.

Stochastic Taylor series expansion to produce a sample path solution to the SDE on [t₀,T].

The Taylor formula plays a very significant role in numerical analysis. We can obtain the approximation of a sufficiently smooth function in a neighborhood of a given point to any desired order of accuracy with the Taylor formula.

Ito-Taylor expansion obtained via Ito’s formula

First we obtain an Itô-Taylor expansion for the stochastic case:

dX(t) = μ(X(t))dt + σ(X(t))dW(t)
where μ and σ satisfy a linear growth bound and are sufficiently smooth.

Once we have the Itô-Taylor expansion, we can construct numerical integration schemes for the process.

The Itô-Taylor expansion is based upon the use of multiple stochastic integrals, and Itô-Taylor expansions are characterized by the choice of the multiple integrals which appear in them.

Euler-Maruyama method

If we truncate the stochastic Taylor series after the first order terms, we obtain the Euler method, also called the Euler-Maruyama method.

Milstein method

The Milstein method is another technique for the approximate numerical solution of a stochastic differential equation. If we truncate the stochastic Taylor series after the second order terms, we obtain the Milstein method.
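
Written out for the equation dX(t) = μ(X(t))dt + σ(X(t))dW(t), on a grid with step Δt and Brownian increments ΔW_n = W(t_{n+1}) - W(t_n) ~ N(0, Δt), the two schemes take the following standard form (the subscript n and the symbols Δt and ΔW_n are notation introduced here):

```latex
\begin{aligned}
\text{Euler-Maruyama:}\quad
X_{n+1} &= X_n + \mu(X_n)\,\Delta t + \sigma(X_n)\,\Delta W_n, \\
\text{Milstein:}\quad
X_{n+1} &= X_n + \mu(X_n)\,\Delta t + \sigma(X_n)\,\Delta W_n
          + \tfrac{1}{2}\,\sigma(X_n)\,\sigma'(X_n)\left((\Delta W_n)^2 - \Delta t\right).
\end{aligned}
```

The only difference is the last term of the Milstein scheme, which requires the derivative σ' of the diffusion coefficient.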

There are many other approximation schemes for SDEs, e.g. the Runge-Kutta method (which achieves the same convergence properties as the Milstein method, but without the need to compute derivatives of the σ(⋅) function).

What has been observed
(https://www.researchgate.net/publication/313191865_On_the_convergence_of_the_Euler-Maruyama_method_and_Milstein_scheme_for_the_solution_of_stochastic_differential_equations)

The Milstein method is more accurate than its counterpart, the Euler-Maruyama method. For the Milstein method, the errors decrease as N increases for all N values.
Overall, the Milstein method performed consistently better than the Euler-Maruyama method with respect to accuracy.
The Euler-Maruyama method is the simplest numerical method for solving stochastic differential equations but has slow convergence (strong order 0.5, against 1.0 for the Milstein method).
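
This kind of comparison can be reproduced on a small scale. The sketch below is not the experiment of the cited paper: it assumes a geometric Brownian motion dX = μX dt + σX dW as test equation, because its exact solution is known, and it measures the mean absolute error at the final time of both schemes, driven by the same Brownian increments. All names and parameter values are illustrative.

```vbnet
Imports System

Module SdeSchemesSketch
    ' Standard normal sample via the Box-Muller transform.
    Function NextGaussian(rng As Random) As Double
        Dim u1 As Double = 1.0 - rng.NextDouble()   ' in (0, 1], avoids Log(0)
        Dim u2 As Double = rng.NextDouble()
        Return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2)
    End Function

    ' Mean absolute error at time T of both schemes against the exact solution
    ' of dX = mu X dt + sigma X dW, using the same increments for all three.
    Sub MeanErrors(mu As Double, sigma As Double, x0 As Double, T As Double,
                   steps As Integer, paths As Integer, rng As Random,
                   ByRef errEuler As Double, ByRef errMilstein As Double)
        Dim dt As Double = T / steps
        errEuler = 0
        errMilstein = 0
        For p As Integer = 1 To paths
            Dim xe As Double = x0   ' Euler-Maruyama approximation
            Dim xm As Double = x0   ' Milstein approximation
            Dim w As Double = 0     ' Brownian motion W(t)
            For i As Integer = 1 To steps
                Dim dW As Double = Math.Sqrt(dt) * NextGaussian(rng)
                xe += mu * xe * dt + sigma * xe * dW
                ' Here sigma(x) = sigma * x, so sigma(x) * sigma'(x) = sigma^2 * x
                xm += mu * xm * dt + sigma * xm * dW + 0.5 * sigma * sigma * xm * (dW * dW - dt)
                w += dW
            Next
            Dim exact As Double = x0 * Math.Exp((mu - 0.5 * sigma * sigma) * T + sigma * w)
            errEuler += Math.Abs(xe - exact) / paths
            errMilstein += Math.Abs(xm - exact) / paths
        Next
    End Sub

    Sub Main()
        Dim rng As New Random(1)
        For Each steps As Integer In New Integer() {16, 64, 256}
            Dim eE As Double, eM As Double
            MeanErrors(0.05, 0.4, 1.0, 1.0, steps, 2000, rng, eE, eM)
            Console.WriteLine("N={0}: Euler-Maruyama {1:F5}, Milstein {2:F5}", steps, eE, eM)
        Next
    End Sub
End Module
```

With parameters like these, the Milstein error should come out clearly smaller and shrink faster as N grows, in line with the strong orders 0.5 and 1.0.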

https://www.sciencedirect.com/science/article/pii/S0377042703004643
https://en.wikipedia.org/wiki/Euler%E2%80%93Maruyama_method
https://advancesindifferenceequations.springeropen.com/articles/10.1186/s13662-018-1466-5

Recommended
https://hautahi.com/sde_simulation

Homework_24R

Stochastic differential equations (SDE). What are the differences with respect to ordinary differential equations (ODE)? Try to understand and explain in your own words why the Itô calculus has been introduced and what is the main intuition behind the Itô integral.

SDE vs ODE

An equation containing the derivatives of one or more dependent variables, with respect to one or more independent variables, is said to be a differential equation.
Solving a differential equation means finding the function g(x) (called the solution or integral) that makes the expression identically satisfied.

An ordinary differential equation (ODE) is an equation that involves an unknown function y(x) of a single independent variable and its derivatives, for example its first derivative y'(x).
A stochastic differential equation (SDE) is a differential equation in which one or more of the terms is a stochastic process, resulting in a solution which is also a stochastic process.

So:

An ordinary differential equation: dx(t)/dt = a(t)x(t), with x(0) = x₀
When we take the ODE and assume that a(t) is not a deterministic parameter but rather a stochastic parameter, we get a stochastic differential equation.

Itô calculus and Itô integral

Itô calculus extends the methods of calculus to stochastic processes such as Brownian motion. It has important applications in mathematical finance and stochastic differential equations.
The central concept is the Itô stochastic integral, a stochastic generalization of the Riemann–Stieltjes integral in analysis.

The Itô stochastic integral amounts to an integral with respect to a function which is not differentiable at any point and has infinite variation over every time interval.
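
The intuition can be made concrete with the definition. For a suitable adapted process H, the Itô integral is the limit of Riemann-type sums in which the integrand is always evaluated at the left endpoint of each interval, so that it does not "see" the increment it multiplies (the partition 0 = t_0 < t_1 < ... < t_n = t and the symbol H are notation introduced here):

```latex
\int_0^t H_s \, dW_s
  = \lim_{n \to \infty} \sum_{i=1}^{n} H_{t_{i-1}} \left( W_{t_i} - W_{t_{i-1}} \right),
\qquad\text{for example}\qquad
\int_0^t W_s \, dW_s = \tfrac{1}{2} W_t^2 - \tfrac{1}{2}\, t .
```

The extra term -t/2 in the example, which would not appear in ordinary calculus, comes from the non-zero quadratic variation of Brownian motion, and it is the reason why a separate calculus is needed.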

https://people.unica.it/claudioconversano/files/2015/10/Dispense01_EN.pdf
https://ethz.ch/content/dam/ethz/special-interest/mavt/dynamic-systems-n-control/idsc-dam/Lectures/Stochastic-Systems/SDE.pdf
https://en.wikipedia.org/wiki/It%C3%B4_calculus#Differentiation_in_It%C3%B4_calculus

Homework_23R

The Geometric Brownian motion and its importance for applications.
The Ornstein-Uhlenbeck / Vasicek models and the concept of mean reversion.

The Geometric Brownian motion

A geometric Brownian motion (GBM), also known as exponential Brownian motion, is a continuous-time stochastic process in which the logarithm of the randomly varying quantity follows a Brownian motion (the random motion of particles suspended in a medium) with drift (the change of the average value of a stochastic process).

A stochastic process S_t is said to follow a GBM if it satisfies the following stochastic differential equation (SDE):

dS_t = μ S_t dt + σ S_t dW_t

Brownian motion is often used to explain the movement of time series variables, and in corporate finance the movement of asset prices.
A common assumption for stock markets is that they follow Brownian motion, with asset prices constantly changing, often by random amounts.
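
The SDE of the GBM can be solved explicitly with Itô's formula. For a starting value S_0 > 0 the solution is:

```latex
S_t = S_0 \exp\!\left( \left( \mu - \tfrac{\sigma^2}{2} \right) t + \sigma W_t \right),
\qquad
\ln \frac{S_t}{S_0} \sim N\!\left( \left( \mu - \tfrac{\sigma^2}{2} \right) t,\; \sigma^2 t \right).
```

So S_t is log-normally distributed and always stays positive, which is one reason why the model is convenient for asset prices.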

Ornstein-Uhlenbeck or Vasicek process

The Ornstein-Uhlenbeck or Vasicek process is a stochastic process which is stationary, Gaussian, and Markovian.
Over time, the process tends to drift towards its long-term mean: such a process is called mean-reverting.
The Vasicek process is the unique solution to the following stochastic differential equation:

dX_t = k(p – X_t) dt + σ dW_t

When p = 0 we speak of the Ornstein-Uhlenbeck model.
k is the “speed of reversion”: it characterizes the velocity at which the trajectories regroup around p over time.

The Ornstein-Uhlenbeck process is often used to model interest rates because of its mean-reverting property.
Mean reversion means that when the short rate X_t is high, it tends to be pulled back towards the long-term average level; when the rate is low, it has an upward drift towards the average level. In Vasicek’s model the short rate is pulled to a mean level p at a rate of k. The mean reversion is governed by the drift term k(p – X_t)dt, while the stochastic term σdW_t, which is normally distributed, adds random fluctuations around it.
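
Mean reversion is easy to observe in a simulation. The following is a minimal sketch that discretizes the Vasicek equation with the Euler-Maruyama scheme; the function names and the parameter values are illustrative assumptions, not those of the course application.

```vbnet
Imports System

Module VasicekSketch
    ' Standard normal sample via the Box-Muller transform.
    Function NextGaussian(rng As Random) As Double
        Dim u1 As Double = 1.0 - rng.NextDouble()   ' in (0, 1], avoids Log(0)
        Dim u2 As Double = rng.NextDouble()
        Return Math.Sqrt(-2.0 * Math.Log(u1)) * Math.Cos(2.0 * Math.PI * u2)
    End Function

    ' Euler-Maruyama path of dX = k(p - X)dt + sigma dW on [0, T]; returns steps + 1 values.
    Function SimulateVasicek(x0 As Double, k As Double, p As Double, sigma As Double,
                             T As Double, steps As Integer, rng As Random) As Double()
        Dim dt As Double = T / steps
        Dim x(steps) As Double
        x(0) = x0
        For i As Integer = 1 To steps
            Dim dW As Double = Math.Sqrt(dt) * NextGaussian(rng)
            ' Drift pulls the value towards p, diffusion adds Gaussian noise
            x(i) = x(i - 1) + k * (p - x(i - 1)) * dt + sigma * dW
        Next
        Return x
    End Function

    Sub Main()
        Dim rng As New Random(7)
        ' Start well above the long-term mean p = 0.03
        Dim path As Double() = SimulateVasicek(0.1, 2.0, 0.03, 0.01, 5.0, 500, rng)
        For i As Integer = 0 To 500 Step 100
            Console.WriteLine("t={0:F1}  X={1:F4}", i * 0.01, path(i))
        Next
    End Sub
End Module
```

Starting from 0.1, the path should decay towards p = 0.03 and then fluctuate around it; a larger k makes the decay faster.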

https://en.wikipedia.org/wiki/Stochastic_drift
https://en.wikipedia.org/wiki/Geometric_Brownian_motion
https://towardsdatascience.com/geometric-brownian-motion-559e25382a55
https://ro.uow.edu.au/cgi/viewcontent.cgi?article=1705&context=aabfj
https://en.wikipedia.org/wiki/Vasicek_model
https://rstudio-pubs-static.s3.amazonaws.com/19584_ce31e798cffb430982fd2f8979b1a87f.html
https://www.sciencedirect.com/topics/economics-econometrics-and-finance/mean-reversion

Homework_24A

Refine your statistical application and your simulation station in the following way. Do a complete test, both “smart monkey” and “dumb monkey”, ( https://en.wikipedia.org/wiki/Monkey_testing ), fixing all issues and making all the desired final refinements to your 2 main applications developed during the course.

Code VB.NET

Simulation Application

https://drive.google.com/file/d/1OXINcmOoxhJx-ptDG324MYrnrg_xfqZx/view?usp=sharing

What I fixed:
Glivenko-Cantelli process
Vasicek process
Form interface and Button enable/disable
Calculation of histograms
Input control

Statistical application

https://drive.google.com/file/d/1QOazNsVqjGlmlNBqgmUSb53zzyTSTnJt/view?usp=sharing

What I fixed:
Form interface and button control
Input control
Drawing of histograms

Unit test:

https://docs.microsoft.com/it-it/visualstudio/test/walkthrough-creating-and-running-unit-tests-for-managed-code?view=vs-2019

https://docs.microsoft.com/it-it/visualstudio/test/getting-started-with-unit-testing?view=vs-2019

We can test programs manually, by entering random values, or build programs that test the application automatically, as described in the links above.
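
As an illustration of the "dumb monkey" approach, a test program can feed random strings to an input handler and count the inputs that make it throw. In the sketch below ParseNumber is a hypothetical stand-in for an input routine of the application under test; all names are illustrative.

```vbnet
Imports System
Imports System.Text

Module MonkeyTestSketch
    ' Hypothetical stand-in for an input handler of the application under test.
    Function ParseNumber(text As String) As Double?
        Dim value As Double
        If Double.TryParse(text, value) Then Return value
        Return Nothing
    End Function

    ' Random string of printable ASCII characters, of length between 0 and maxLength.
    Function RandomInput(rng As Random, maxLength As Integer) As String
        Dim sb As New StringBuilder()
        Dim length As Integer = rng.Next(maxLength + 1)
        For i As Integer = 1 To length
            sb.Append(Convert.ToChar(rng.Next(32, 127)))
        Next
        Return sb.ToString()
    End Function

    ' Returns the number of random inputs that made the handler throw.
    Function RunDumbMonkey(iterations As Integer, rng As Random) As Integer
        Dim failures As Integer = 0
        For i As Integer = 1 To iterations
            Dim text As String = RandomInput(rng, 20)
            Try
                ParseNumber(text)
            Catch ex As Exception
                failures += 1
                Console.WriteLine("Input '{0}' raised {1}", text, ex.GetType().Name)
            End Try
        Next
        Return failures
    End Function

    Sub Main()
        Console.WriteLine("Failures: {0}", RunDumbMonkey(10000, New Random(123)))
    End Sub
End Module
```

A "smart monkey" would instead generate inputs with some knowledge of the application, for example well-formed numbers at the limits of the accepted range.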

Homework_20A

Add to your statistical application, on each variable histogram, and across the scatterplot, 3 lines indicating the 3 quartiles (use online algos for computations).

Code VB.NET

https://drive.google.com/file/d/1q1K4mAiT2GLsFMaTx5Fi3c7cAaWymjte/view?usp=sharing
