Homework12_A

Add regression lines to your revised statistical application (parser + statistical/charting engine).

Code VB.Net

https://drive.google.com/file/d/1TFEmibUFSHRXVqiH9frzuWL0XIIPsjli/view?usp=sharing

To calculate the regression line we use:

y = a + bx

Dim x_prec As Double = listOfBDataP(0).x1
Dim y_dipPrec As Double = a + b * x_prec

For Each d In listOfBDataP.Skip(1)
    Dim y_dipCurr As Double = a + b * d.x1
    Dim p_xPrec As Double = X_Viewport(x_prec, viewport, min_x, range_x)
    Dim p_yPrec As Double = Y_Viewport(y_dipPrec, viewport, min_y, range_y)
    Dim p_xCurr As Double = X_Viewport(d.x1, viewport, min_x, range_x)
    Dim p_yCurr As Double = Y_Viewport(y_dipCurr, viewport, min_y, range_y)
    'Point takes Integer coordinates, so convert the Doubles explicitly
    gc.DrawLine(Pens.Red, New Point(CInt(p_xPrec), CInt(p_yPrec)), New Point(CInt(p_xCurr), CInt(p_yCurr)))
    x_prec = d.x1
    y_dipPrec = y_dipCurr
Next

where

b = Cov(X, Y) / Var(X)
a = ȳ − b·x̄

(x̄ and ȳ are the means of the X and Y values)

Remember:

Variance and covariance:

Dim sigma = Calc_var()
Dim b As Double = Calc_cov() / sigma
Dim a As Double = current_onlineMeanY - b * current_onlineMeanX
  Public Function Calc_cov() As Double
        Dim sum As Double
        Dim n As Double = listOfBDataP.Count
        For Each d In listOfBDataP
            sum += d.x1 * d.x2
        Next
        Return (sum - n * current_onlineMeanX * current_onlineMeanY) / (n - 1)
    End Function

    Public Function Calc_var() As Double
        Dim sigma As Double
        For Each d In listOfBDataP
            sigma += Math.Pow((d.x1 - current_onlineMeanX), 2)
        Next

        Return sigma / (listOfBDataP.Count - 1)
    End Function
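The same calculation can be sketched language-neutrally in Python, with hypothetical data. Note that the (n − 1) denominators of the sample covariance and sample variance cancel in the ratio b = Cov(X, Y) / Var(X), so they are omitted here:

```python
def regression_line(points):
    # points: list of (x, y) pairs; returns (a, b) for the line y = a + b*x
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # numerators of the sample covariance and variance;
    # the common (n - 1) denominator cancels in the ratio b = cov / var
    cov = sum(x * y for x, y in points) - n * mean_x * mean_y
    var = sum((x - mean_x) ** 2 for x, _ in points)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
print(regression_line([(0, 1), (1, 3), (2, 5), (3, 7)]))  # (1.0, 2.0)
```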

https://www.webtutordimatematica.it/materie/statistica-e-probabilita/modelli-di-regressione/regressione-lineare-semplice/calcolo-parametri-retta-regressione

Homework11_A

Make a short demonstrative program where you apply both the Riemann and Lebesgue approach to integration to compute numerically (with an increasingly large number of subdivisions) the integral on a bounded continuous function of your choice and compare the results. [Optionally, show with an animation, using the graphics object, the convergence to a limit, as the number of subdivisions of the function domain (for Riemann) or range (for Lebesgue) increases.]

Update: Code VB.Net v2.0

https://drive.google.com/file/d/1wH6vjj-yGOwr3C8F_GePlK2EpNK2HS-7/view?usp=sharing

Code VB.Net

https://drive.google.com/file/d/1m_H-ZOTsFOgt_2liKjBmuKxZbNFeZkqj/view?usp=sharing

Riemann integral (left) and Lebesgue integral (right)
'x_1 and x_2 denote the range of the function on the x axis
'interval is the number of subdivisions of the axis (here the y axis)
'f maps x to y according to the given function
'g is the inverse of f, used to return to the x axis

Public Function Lebesgue_int(x_1 As Integer, x_2 As Integer, interval As Integer, f As Func(Of Double, Double), g As Func(Of Double, Double)) As Double

...

'calculate the 'interval size' as height of horizontal rectangle
Dim dy As Double = range / CDbl(interval)

...

'calculate the area of rectangle and sum it
sum += dy * (x_2 - g(low_bnd))

...

End Function


  

While the Riemann integral considers the area under a curve as made out of vertical rectangles, the Lebesgue definition considers horizontal slabs that are not necessarily just rectangles, and so it is more flexible. For this reason, the Lebesgue definition makes it possible to calculate integrals for a broader class of functions.

One approach to constructing the Lebesgue integral uses so-called simple functions: finite real-linear combinations of indicator functions. Simple functions can approximate a measurable function by partitioning its range into layers; the integral of a simple function is the sum, over the layers, of the measure of each layer times its height.
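A minimal Python sketch of the two sums, assuming (as in the VB function above) that f is increasing on [x1, x2] and g is its known inverse. For f(x) = x² on [0, 2], both sums approach ∫₀² x² dx = 8/3:

```python
import math

def riemann_int(x1, x2, n, f):
    # Riemann: vertical rectangles of width dx, left endpoints
    dx = (x2 - x1) / n
    return sum(f(x1 + i * dx) * dx for i in range(n))

def lebesgue_int(x1, x2, n, f, g):
    # Lebesgue: horizontal slabs partitioning the range [f(x1), f(x2)];
    # assumes f increasing on [x1, x2] with inverse g, so the set
    # {x : f(x) > y} has measure x2 - g(y)
    y1, y2 = f(x1), f(x2)
    dy = (y2 - y1) / n
    total = y1 * (x2 - x1)  # area below the lowest level
    for i in range(n):
        low = y1 + i * dy
        total += dy * (x2 - g(low))
    return total

print(riemann_int(0, 2, 10_000, lambda x: x * x))              # approaches 8/3
print(lebesgue_int(0, 2, 10_000, lambda x: x * x, math.sqrt))  # approaches 8/3
```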

https://www.quantstart.com/articles/Function-Objects-Functors-in-C-Part-1/
https://stackoverflow.com/questions/29318682/integration-with-riemann-sum-python
https://towardsdatascience.com/integrals-are-easy-visualized-riemann-integration-in-python-87dd02e90994
https://commons.wikimedia.org/wiki/File:Lebesgue_Integration_and_Lower_Sums.gif
https://en.wikipedia.org/wiki/Lebesgue_integration

Homework16_R

Do some practical examples where you explain how the elements of an abstract probability space relate to more concrete concepts when doing statistics.

Statistics is the discipline that concerns the collection, organization, analysis, interpretation and presentation of data. When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples.

A probability space or a probability triple ( Ω , F , P ) is a mathematical construct that provides a formal model of a random process or “experiment”.
In order to provide a sensible model of probability, these elements must satisfy a number of axioms.

We have descriptive statistics and inferential statistics.

In inferential statistics we need the concept of probability to draw conclusions from data. Probability theory provides a mathematical structure for statistical inference.
Once we make our best statistical guess about what the probability model is (what the rules are), based on looking backward, we can then use that probability model to predict the future.
The purpose of statistics is to make inference about unknown quantities from samples of data.

So

  • Statistics is applied to situations in which we have questions that cannot be answered definitively, typically because of variation in data.
  • Probability is used to model the variation observed in the data. Statistical inference is concerned with using the observed data to help identify the true probability distribution (or distributions) producing this variation, and thus gain insight into the answers to the questions of interest.
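As a concrete illustration of how the abstract triple (Ω, F, P) maps onto everyday statistics, here is a hypothetical Python sketch for a fair die: the sample space Ω, the events F, and the measure P are written out, and the empirical frequency of an event estimates its probability:

```python
import random

# A concrete probability space for one roll of a fair die:
#   Omega = {1,...,6}             (sample space: the possible outcomes)
#   F     = all subsets of Omega  (the observable events)
#   P     = uniform measure, P({k}) = 1/6
omega = [1, 2, 3, 4, 5, 6]

random.seed(0)
rolls = [random.choice(omega) for _ in range(100_000)]

# In statistics, the empirical frequency of an event estimates P:
event = {2, 4, 6}  # "the roll is even", an element of F
freq = sum(r in event for r in rolls) / len(rolls)
print(freq)  # close to P(event) = 3/6 = 0.5
```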

https://people.montefiore.uliege.be/kvansteen/MATH0008-2/ac20112012/Class4/Chapter4_ac1112_v5a2.pdf
https://www.britannica.com/science/probability/Risks-expectations-and-fair-contracts
https://towardsdatascience.com/basic-probability-theory-and-statistics-3105ab637213
http://www.utstat.toronto.edu/mikevans/jeffrosenthal/book.pdf (section 5.1)
https://statanalytica.com/blog/uses-of-statistics/

Homework9_RA

Do some research on the various methods to generate, from a Uniform([0,1)), all the most important random variables (discrete and continuous).

Random Variable

Informally, a random variable is a variable whose values depend on the outcomes of a random phenomenon.

Formally, a random variable is a measurable function defined on a probability space that maps the sample space to the real numbers.
So a random variable is defined as a function, which must be measurable, that maps the outcomes of a random process to numeric values.

Random variables can be either discrete or continuous.

Not long after research began at RAND in 1946, the need arose for random numbers that could be used to solve problems in various kinds of experimental probability procedures. These applications, called Monte Carlo methods (a class of techniques for randomly sampling a probability distribution), required a large supply of random digits and normal deviates of high quality.

Random number generators

The purpose of random number generators (RNGs) is to produce sequences of numbers that appear as if they were generated randomly from a specified probability distribution.


A true random number generator produces truly random numbers: the results are unpredictable. These are generally produced by physical devices, also known as noise generators, coupled with a computer. In computing, an apparatus that produces random numbers from a physical process is called a hardware random number generator or TRNG (true random number generator).
Computers are deterministic in nature, so producing truly random numbers with a computer is challenging, which is why we generally resort to noise generators when we need “true” randomness. What we can do on a computer, however, is develop an algorithm that generates a sequence of numbers approximating the properties of random numbers.
When numbers are produced by some algorithm or formula that simulates the values of a random variable X, they are called pseudorandom numbers, and the algorithm is called a pseudorandom number generator (PRNG). The term “simulate” is important here: it means the algorithm can generate sequences of numbers whose statistical properties are similar (and this can be tested) to those of the random variable we want to simulate. For instance, if we need to simulate a random variable X with probability distribution D, we will need to test whether the sequence of numbers produced by our PRNG has that distribution.

For some applications, such as cryptography, it is necessary to have pseudo-random number sequences for which prediction is computationally infeasible.

From a uniform generator to the main distributions

  • The binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question and each with its own Boolean-valued outcome: success/yes (with probability p) or failure/no (with probability q = 1 − p).
  • The Bernoulli distribution, a special case of the binomial distribution with a single trial (n = 1), is the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 − p.

  • The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

The exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless.
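Both a discrete and a continuous case can be generated from Uniform([0,1)) draws. A minimal Python sketch (illustrative, not from the source code): the Bernoulli case compares a uniform draw against p, and the exponential case uses the inverse-transform method, inverting F(x) = 1 − e^(−λx):

```python
import math
import random

random.seed(1)

def bernoulli(p):
    # Discrete case: a Uniform([0,1)) draw falls below p with probability p
    return 1 if random.random() < p else 0

def exponential(lam):
    # Continuous case, inverse-transform method:
    # F(x) = 1 - exp(-lam*x)  =>  F^{-1}(u) = -ln(1 - u) / lam
    u = random.random()
    return -math.log(1.0 - u) / lam

samples = [exponential(2.0) for _ in range(100_000)]
print(sum(samples) / len(samples))  # sample mean, close to 1/lam = 0.5
```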

The general form of Normal Distribution.
https://en.wikipedia.org/wiki/Normal_distribution

A normal (or Gaussian or Gauss or Laplace–Gauss) distribution is a type of continuous probability distribution for a real-valued random variable.
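The normal distribution has no closed-form inverse CDF, so the inverse-transform method is awkward here; a standard alternative is the Box–Muller transform, which turns two independent uniform draws into a standard normal draw. A Python sketch (illustrative only):

```python
import math
import random

random.seed(42)

def box_muller():
    # Two independent uniform draws -> one standard normal draw
    u1 = 1.0 - random.random()  # in (0, 1], avoids log(0)
    u2 = random.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)

zs = [box_muller() for _ in range(100_000)]
mean = sum(zs) / len(zs)
var = sum(z * z for z in zs) / len(zs) - mean ** 2
print(mean, var)  # sample mean ≈ 0, sample variance ≈ 1
```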

https://towardsdatascience.com/understanding-random-variable-a618a2e99b93
https://towardsdatascience.com/how-to-generate-random-variables-from-scratch-no-library-used-4b71eb3c8dc7
https://www.britannica.com/science/statistics/Random-variables-and-probability-distributions
https://statweb.stanford.edu/~owen/mc/Ch-unifrng.pdf
https://www.scratchapixel.com/lessons/mathematics-physics-for-computer-graphics/monte-carlo-methods-in-practice/generating-random-numbers
https://machinelearningmastery.com/monte-carlo-sampling-for-probability/
https://en.wikipedia.org/wiki/Poisson_distribution

Homework14_R

Think and explain in your own words what is the role that probability plays in Statistics and the relation between “empirical” objects – such as the observed distribution and frequencies etc – and “theoretical” counterparts.

Probabilities are numbers that reflect the likelihood that a particular event will occur. A probability of 0 indicates that there is no chance that a particular event will occur, whereas a probability of 1 indicates that an event is certain to occur.
While descriptive statistics describes, in terms of frequencies, a well-known population, inferential statistics needs the concept of probability: a sample provides information about a small data selection, and we use this information to infer something about the larger population from which it was taken.

Empirical and theoretical

Probability is an abstraction of frequencies.
In inference we start from the evidence (the empirical distribution) and try to identify the shape that most likely produced it.
So the goal is to find the ‘state of nature’ most likely to have generated the sample.
The sample could have come from any of infinitely many populations with different characteristics, and those characteristics define the theoretical model.
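As a toy illustration of finding the ‘state of nature’ most likely to have generated a sample, here is a hypothetical Python sketch: a few candidate theoretical Bernoulli models are compared against observed coin-flip data by likelihood (all numbers invented for illustration):

```python
import math

# Hypothetical sample: 100 coin flips with 62 heads observed.
heads, n = 62, 100

# Candidate "states of nature": a few theoretical Bernoulli models
candidates = [0.3, 0.5, 0.62, 0.8]

def log_likelihood(p):
    # log-probability of observing `heads` successes in n flips under model p
    return heads * math.log(p) + (n - heads) * math.log(1 - p)

best = max(candidates, key=log_likelihood)
print(best)  # 0.62: the empirical frequency is the most likely model
```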

Objects in empirical and theoretical distribution.

https://stats.stackexchange.com/questions/237237/what-is-the-difference-between-the-theoretical-distribution-and-the-empirical-di

Homework15_R

Explain how parametric inference works and the main ideas of statistical induction, including the role of Bayes theorem and the different approach between “bayesian” and “frequentist”.

Inferential statistics

You have a population which is too large to study fully, so you use statistical techniques to estimate its properties from samples taken from that population.
So, in inferential statistics, we try to infer something about a population from data coming from a sample taken from it; for this we need the concept of probability.

Statistical inference is a particular form of statistical induction.
Induction is a method of reasoning in which the premises are viewed as supplying some evidence, but not full assurance, for the truth of the conclusion. Inductive reasoning is distinct from deductive reasoning, where the conclusion of a deductive argument is certain.

Parametric statistics is a branch of statistics which assumes that sample data comes from a population that can be adequately modeled by a probability distribution that has a fixed set of parameters.
Example:
The normal family of distributions all have the same general shape and are parameterized by mean and standard deviation. That means that if the mean and standard deviation are known and if the distribution is normal, the probability of any future observation lying in a given range is known.

Frequentist and bayesian approach

The problem is that we don’t know the prior probability. So inferential statistics splits in two:

  • give it a uniform distribution
  • give it a shape

The frequentist way

Sampling is infinite and decision rules can be sharp. Data are a repeatable random sample – there is a frequency. Underlying parameters are fixed i.e. they remain constant during this repeatable sampling process.

The Bayesian way

Unknown quantities are treated probabilistically and the state of the world can always be updated. Data are observed from the realised sample. Parameters are unknown and described probabilistically. It is the data which are fixed.

Are you a Bayesian or a Frequentist?

You have a coin that when flipped ends up head with probability p and ends up tail with probability 1−p. (The value of p is unknown.)

Trying to estimate p, you flip the coin 14 times. It ends up head 10 times.

Then you have to decide on the following event: “In the next two tosses we will get two heads in a row.”

Would you bet that the event will happen or that it will not happen?

Using frequentist statistics, we would say that the best (maximum likelihood) estimate for p is p = 10/14 ≈ 0.714.
In this case, the probability of two heads in a row is (10/14)² ≈ 0.51.

The Bayesian approach, instead, treats p as a random variable with its own distribution of possible values, defined by the existing evidence. The logic goes as follows: what is the probability of a given value of p, given the data? We find it with Bayes’ theorem.
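Assuming a uniform Beta(1, 1) prior (a common but not unique choice), the posterior for p after 10 heads in 14 flips is Beta(11, 5), and both answers can be computed directly. This Python sketch shows the contrast: the Bayesian predictive probability of two heads falls just below 0.5, so the two approaches bet differently:

```python
# Frequentist vs Bayesian answer for the coin example above
# (10 heads in 14 flips; uniform Beta(1,1) prior assumed on the Bayesian side).

heads, tails = 10, 4

# Frequentist: plug in the maximum-likelihood estimate p = 10/14
p_mle = heads / (heads + tails)
freq_answer = p_mle ** 2  # ~0.510 > 0.5: bet that HH happens

# Bayesian: posterior is Beta(heads+1, tails+1); the predictive probability
# of HH is E[p^2] under that posterior, which for Beta(a, b) equals
# a*(a+1) / ((a+b)*(a+b+1)).
a, b = heads + 1, tails + 1
bayes_answer = (a * (a + 1)) / ((a + b) * (a + b + 1))  # ~0.485 < 0.5: bet against

print(freq_answer, bayes_answer)
```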

https://www.behind-the-enemy-lines.com/2008/01/are-you-bayesian-or-frequentist-or.html

https://www.youtube.com/watch?v=TSkDZbGS94k
https://en.wikipedia.org/wiki/Confidence_interval
https://link.springer.com/chapter/10.1007/978-0-387-09612-4_9
https://en.wikipedia.org/wiki/Frequentist_inference
https://www.nasa.gov/consortium/FrequentistInference
https://en.wikipedia.org/wiki/Bayesian_inference
https://en.wikipedia.org/wiki/Parametric_statistics
https://en.wikipedia.org/wiki/Inductive_reasoning
https://stats.stackexchange.com/questions/22/bayesian-and-frequentist-reasoning-in-plain-english