Homework14_A

Add total variance decomposition and computation of the coefficient of determination (make sure all your computations are done with online algorithms (e.g. with online algorithms (e.g. https://www.johndcook.com/blog/running_regression// etc.).

For online algo U can also see:

https://blog.rapid7.com/2016/10/19/overview-of-online-algorithm-using-standard-deviation-example/
https://francescopigini.wordpress.com/2017/12/14/online-algorithm-for-variance-and-covariance-extra-2/
R2 Coeff of determination

For coeff of determination:

https://en.wikipedia.org/wiki/Coefficient_of_determination

For total variance decomposition:

https://en.wikipedia.org/wiki/Law_of_total_variance

Code VB.Net

https://drive.google.com/file/d/18u8SXRmRItAd97nkQ4Pl7M82iHyZfyLG/view?usp=sharing

 'calcolo per il coefficiente di determinazione

        'Devianza totale
        Dim sum_ess As Double = 0
        For i_ess As Integer = 0 To y_est.Count - 1
            sum_ess += (y_est(i_ess) - current_onlineMeanY) * (y_est(i_ess) - current_onlineMeanY)
        Next

        'Devianza tra gruppi
        Dim sum_tss As Double = 0
        Dim y_obs As New List(Of Double)
        y_obs = listOfBDataP.Select(Function(p) p.x2).ToList
        For i_tss As Integer = 0 To y_obs.Count - 1
            sum_tss += (y_obs(i_tss) - current_onlineMeanY) * (y_obs(i_tss) - current_onlineMeanY)
        Next

        'Devianza entro i gruppi
        Dim sum_rss As Double = 0
        For i_rss As Integer = 0 To y_obs.Count - 1
            sum_rss += (y_obs(i_rss) - y_est(i_rss)) * (y_obs(i_rss) - y_est(i_rss))
        Next

        Dim coeffDet As Double = sum_ess / sum_tss
        Dim coeffDet2 As Double = 1 - sum_rss / sum_tss

Variance and covariance calculation (online algorithm)

 ...
        For Each d In listOfBDataP
            Calc_onlCov(d.x1, d.x2)
            Calc_onVar(d.x1)
        Next
...


 Public Sub Calc_onlCov(x As Double, y As Double)
        n += 1
        Dim dx As Double = x - meanX
        meanX += dx / CDbl(n)
        meanY += (y - meanY) / CDbl(n)
        b_online += dx * (y - meanY)
    End Sub

    Public Sub Calc_onVar(x As Double)
        n_v += 1
        Dim delta As Double = meanX_v + (x - meanX_v) / CDbl(n_v)
        sigma_online += (x - meanX_v) * (x - delta)
        meanX_v = delta
    End Sub

Homework14_A

Generate and represent m paths of n point each (m, n are program parameters), where each point represents a pair of:

  • time index
  • relative frequency of success (i, f(i))
    where f(i) is the sum of i Bernoulli random variables p^x(1-p)^(1-x) observed at the times j=1, …, i.

At time n (last time) and any other inner time 1<j<n (j is a program parameter) create and represent with histogram the distribution of f(i).
At the same times (j and n), compute the absolute and relative frequency of the f(i)’s contained in the interval (p-ε, p+ε), where ε is a program parameter.

Code VB.Net

https://drive.google.com/file/d/1r2JZnVztQlfUp-w0ZCvnNpYgTZ3BU9X_/view?usp=sharing

Homework10_RA

Do a research about the various methods proposed to compute the running median (one pass, online algorithms). Store (cite sources) the algorithm that you think is a good candidate and explaining briefly how it works and possibly show a quick demo.

The running median

The median of a set of integers is the midpoint value of the data set for which an equal number of integers are less than and greater than the value.

  • If we keep each number in a sorted sequence then cost of single entry is O(n) and finding median is O(n).
    A slight modification can be done by keeping the middle pointer and adjusting it based on the insertion on its left side and right side. In that case finding median after insertion is O(1). But the overall cost for finding median still remains O(n) as insertion in sorted sequence is necessary after each number is entered.
  • We can keep two heaps which divides the entered number in two almost equal halves. Half of the number would be greater than the median and the rest would be lesser. The upper half will be maintained in a min heap and the lower half will be maintained in a max heap. In this arrangement we can find out in O(1) time whether a new number would go to the upper half or lower half. All we need to do is to compare the new number with the head of two heaps. After deciding we can insert in a heap in O(log n) time. After this insertion if the heaps are unbalanced, we can just move from one heap to another. which is again of O(log n) complexity. And now we can find the median in O(1) time. If two heaps contain same number of elements then median is the average of the head of two heaps. If one is greater, then median is the head of the larger heap.
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Random;


public class RunningMedian
{
    PriorityQueue<Integer> upperQueue;
    PriorityQueue<Integer> lowerQueue;

    public RunningMedian()
    {
        lowerQueue=new PriorityQueue<Integer>(
          20,new Comparator<Integer>()
        {

            @Override
            public int compare(Integer o1, Integer o2)
            {

                return -o1.compareTo(o2);
            }

        });
        upperQueue=new PriorityQueue<Integer>();
        upperQueue.add(Integer.MAX_VALUE);
        lowerQueue.add(Integer.MIN_VALUE);
    }

    public double getMedian(int num)
    {
        //adding the number to proper heap
            if(num>=upperQueue.peek())
                upperQueue.add(num);
            else
                lowerQueue.add(num);
        //balancing the heaps
        if(upperQueue.size()-lowerQueue.size()==2)
            lowerQueue.add(upperQueue.poll());
        else if(lowerQueue.size()-upperQueue.size()==2)
            upperQueue.add(lowerQueue.poll());
        //returning the median
        if(upperQueue.size()==lowerQueue.size())
            return(upperQueue.peek()+lowerQueue.peek())/2.0;
        else if(upperQueue.size()>lowerQueue.size())
            return upperQueue.peek();
        else
            return lowerQueue.peek();

    }
    public static void main(String[] args)
    {
        Random random=new Random();
        RunningMedian runningMedian=new RunningMedian();
        System.out.println("num\tmedian");
        for(int i=0;i<50;++i)
        {
            int num=random.nextInt(100);
            System.out.print(num);
            System.out.print("\t");
            System.out.println(
              runningMedian.getMedian(num));
        }

    }

}

http://www.dsalgo.com/2013/02/RunningMedian.php.html

https://dcc-backup.ligo.org/public/0027/T030168/000/T030168-00.pdf