Lesson_08 – Statistics Course

Homework_12RA

Find out what you have just generated in exercise 18_A. How can you interpret what you see? Can you find out about all the well-known distributions that “naturally” (and “magically”) arise in this process?

Poisson Distribution

A Poisson process is a model for a series of discrete events where the average time between events is known, but the exact timing of each event is random.

A Poisson process meets the following criteria:

Events are independent of each other. The occurrence of one event does not affect the probability another event will occur.
The average rate (events per time period) is constant.
Two events cannot occur at the same time.

A discrete random variable X is said to have a Poisson distribution with parameter λ > 0 if, for k = 0, 1, 2, …, the probability mass function of X is given by:

$$P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

where:

k is the number of occurrences
λ = E(X) = Var(X)
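A minimal Python sketch of the Poisson probability mass function, used here to check numerically that the probabilities sum to 1 and that the mean and the variance both equal λ. The function name `poisson_pmf`, the value λ = 4 and the truncation at k = 99 are assumptions made for illustration only.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): lam^k * exp(-lam) / k!"""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 4.0            # illustrative rate (assumption, not from the lesson)
ks = range(100)      # truncated support; the tail beyond 99 is negligible here

probs = [poisson_pmf(k, lam) for k in ks]
total = sum(probs)                                         # should be close to 1
mean = sum(k * p for k, p in zip(ks, probs))               # should be close to lam
var = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))  # should be close to lam

print("sum of probabilities:", total)
print("mean:", mean, "variance:", var)
```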

Homework_19A

Consider the general scheme we have used so far to simulate some stochastic processes (such as the relative frequency of success in a sequence of trials, the sample mean and the random walk) and now add this new process to our simulator.
Use the same scheme as the previous program (random walk), changing only the way the values of the paths are computed at each time. Starting from value 0 at time 0, for each of m paths, at each new time compute P(i) = P(i-1) + Random step(i), for i = 1, …, n, where Random step(i) is now a Bernoulli random variable with success probability equal to λ * (1/n) (where λ is a user parameter, e.g. 50, 100, …).
At time n (the last time) and at one (or more) other chosen inner time 1<j<n (j is a program parameter), create the distribution of P(i) and represent it with a histogram.

Represent also the distributions of the following quantities (and any other quantity that you think of interest):
– Distance (time elapsed) of individual jumps from the origin
– Distance (time elapsed) between consecutive jumps
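The scheme described above can be sketched in a few lines of Python (the course application itself is written in VB.Net; Python is used here only for brevity). The sketch assumes illustrative values n = 5000, λ = 50 and m = 400, and the names `jump_times`, `mean` and `variance` are hypothetical helpers. Since each path is a sum of n Bernoulli(λ/n) steps, P(n) is Binomial(n, λ/n), which for large n is close to Poisson(λ), so its sample mean and variance should both be near λ. The gaps between consecutive jumps are geometric, and their mean should be roughly n/λ steps.

```python
import random

def jump_times(n, lam, rng):
    """Times i in 1..n at which the Bernoulli(lam / n) step equals 1."""
    p = lam / n
    return [i for i in range(1, n + 1) if rng.random() < p]

def mean(values):
    return sum(values) / len(values)

def variance(values):
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(0)        # fixed seed so the run is repeatable
n, lam, m = 5000, 50.0, 400   # illustrative values (assumptions)

final_counts = []  # P(n): value of each path at the last time
gaps = []          # time elapsed between consecutive jumps, in steps
for _ in range(m):
    times = jump_times(n, lam, rng)
    final_counts.append(len(times))
    gaps.extend(later - earlier for earlier, later in zip(times, times[1:]))

count_mean = mean(final_counts)
count_var = variance(final_counts)
gap_mean = mean(gaps)

print("mean of P(n):", count_mean)
print("variance of P(n):", count_var)
print("mean gap between jumps (steps):", gap_mean, "vs n / lambda =", n / lam)
```

A histogram of `final_counts` approximates the Poisson(λ) mass function, and a histogram of `gaps` (rescaled by 1/n) approximates an exponential density with rate λ.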

19_A Add to your statistical application, on each variable histogram, and across the scatterplot, 3 lines indicating the 3 quartiles (use online algorithms for the computations).

Update: Code VB.Net v2.0

https://drive.google.com/file/d/1j3oEg5IOTGTaJl7pHzdeTjOuToOulNcl/view?usp=sharing

Code VB.Net

I still have to fix the histogram and add the distances, so an update is coming soon.

https://drive.google.com/file/d/1p_8JK3LvxbFk4SnOp73wPlG0z-_Ba2Qp/view?usp=sharing

Homework_20R

General correlation coefficient for ranks and the most common indices that can be derived by it. Can you make some interesting example of computation of these correlation coefficients for ranks?

Ranks

Ranks are the positions that the observations occupy in the ordered sample.

A rank correlation is any of several statistics that measure an ordinal association (the relationship between rankings of different ordinal variables or different rankings of the same variable). The “ranking” is the assignment of the ordering labels “first”, “second”, “third”, etc. to different observations of a particular variable.

A rank correlation coefficient measures the degree of similarity between two rankings, and can be used to assess the significance of the relation between them.

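Kendall (1970) showed that his τ and Spearman’s ρ are particular cases of a general correlation coefficient Γ.

Suppose we have a set of n objects characterized by two properties x and y. To any pair of individuals, say the i-th and the j-th, one can assign an x-score a_ij and a y-score b_ij, with the only requirement that the scores be antisymmetric, a_ij = -a_ji and b_ij = -b_ji, so:

$$\Gamma = \frac{\sum_{i,j=1}^{n} a_{ij} b_{ij}}{\sqrt{\sum_{i,j=1}^{n} a_{ij}^{2} \, \sum_{i,j=1}^{n} b_{ij}^{2}}}$$

If r_i and s_i are the ranks of the i-th object with respect to x and y, the choice a_ij = sgn(r_j − r_i), b_ij = sgn(s_j − s_i) gives Kendall’s τ, while a_ij = r_j − r_i, b_ij = s_j − s_i gives Spearman’s ρ.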
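A small Python sketch of the general coefficient Γ = Σ a_ij b_ij / sqrt(Σ a_ij² · Σ b_ij²), with the score function passed as a parameter. Using the sign of the rank difference as score gives Kendall’s τ, and using the rank difference itself gives Spearman’s ρ. The function names (`general_gamma`, `sign_score`, `diff_score`) are hypothetical; the two rank vectors are the ones used in the wine example of this section.

```python
import math

def general_gamma(x, y, score):
    """General correlation coefficient for an antisymmetric score function."""
    n = len(x)
    num = sum_a2 = sum_b2 = 0.0
    for i in range(n):
        for j in range(n):
            a = score(x[i], x[j])   # x-score a_ij
            b = score(y[i], y[j])   # y-score b_ij
            num += a * b
            sum_a2 += a * a
            sum_b2 += b * b
    return num / math.sqrt(sum_a2 * sum_b2)

def sign_score(u, v):
    """sgn(v - u): leads to Kendall's tau."""
    return (v > u) - (v < u)

def diff_score(u, v):
    """v - u: leads to Spearman's rho when u and v are ranks."""
    return v - u

r1 = [1, 3, 2, 4]   # ranks given by the first expert
r2 = [1, 4, 2, 3]   # ranks given by the second expert

tau = general_gamma(r1, r2, sign_score)
rho = general_gamma(r1, r2, diff_score)
print("Kendall tau:", tau)
print("Spearman rho:", rho)
```

For these ranks Spearman’s ρ can be cross-checked with the classical formula 1 − 6 Σ d² / (n(n² − 1)), where d is the vector of rank differences.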

Example

Suppose that two experts order four wines called {a,b,c,d}.
The first expert gives the following order: O₁ = [a,c,b,d], which corresponds to the following ranks R₁ = [1,3,2,4].
The second expert orders the wines as O₂ = [a,c,d,b], which corresponds to the following ranks R₂ = [1,4,2,3]. The order given by the first expert is composed of the following 6 ordered pairs:

P₁ = {[a,c], [a,b], [a,d], [c,b], [c,d], [b,d]}

The order given by the second expert is composed of the following 6 ordered pairs:

P₂ = {[a,c], [a,b], [a,d], [c,b], [c,d], [d,b]}

The set of pairs which are in only one set of ordered pairs is

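{[b,d], [d,b]}

which gives a value of d_∆(P₁,P₂) = 2.
With this value of the symmetric difference distance we compute the value of the Kendall rank correlation coefficient between the orders given by these two experts (n = 4 wines) as:

$$\tau = 1 - \frac{2\, d_{\Delta}(P_1, P_2)}{n(n-1)} = 1 - \frac{2 \times 2}{4 \times 3} = 1 - \frac{1}{3} \approx 0.67$$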

This large value of τ indicates that the two experts strongly agree on their evaluation of the wines (in fact they agree about everything but one pair).
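The wine example can be reproduced with a short Python sketch that builds the two sets of ordered pairs, takes their symmetric difference, and applies τ = 1 − 2 d_∆ / (n(n − 1)). The function names `ordered_pairs` and `kendall_tau_from_orders` are hypothetical, and the sketch assumes orderings without ties.

```python
from itertools import combinations

def ordered_pairs(order):
    """All pairs (u, v) such that u is placed before v in the ordering."""
    return set(combinations(order, 2))

def kendall_tau_from_orders(order_1, order_2):
    """Return (tau, d) where d is the symmetric difference distance."""
    n = len(order_1)
    d = len(ordered_pairs(order_1) ^ ordered_pairs(order_2))
    return 1 - 2 * d / (n * (n - 1)), d

o1 = ["a", "c", "b", "d"]   # order given by the first expert
o2 = ["a", "c", "d", "b"]   # order given by the second expert

tau, d = kendall_tau_from_orders(o1, o2)
print("symmetric difference distance:", d)
print("Kendall tau:", tau)
```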

https://www.sciencedirect.com/science/article/pii/S0888613X16300172
https://en.wikipedia.org/wiki/Rank_correlation
https://personal.utdallas.edu/~herve/Abdi-KendallCorrelation2007-pretty.pdf

Homework_19R

Distributions of the order statistics: look on the web for the most simple (but still rigorous) and clear derivations of the distributions, explaining in your own words the methods used.

Order statistics

In statistics, the k-th order statistic of a statistical sample is equal to its k-th smallest value.
Together with rank statistics, order statistics are among the most fundamental tools in non-parametric statistics and inference.

Important special cases of the order statistics are the minimum and maximum value of a sample, and the sample median.
The sample median is a particular quantile: with q equal to two, the distribution is split into two parts with the same frequencies.

Given an empirical sample, we can sort it to obtain the ordered empirical sample:

Empirical sample

X₁,X₂,…,X_n

Empirical ordered sample

x₍₁₎,x₍₂₎,…,x₍ₙ₎

x₍ₖ₎ denotes the k-th smallest value.

Note: In the case where the distribution F is continuous, ties occur with probability 0, so we can make the stronger statement that x₍₁₎ < x₍₂₎ < … < x₍ₙ₎ (strict inequalities) with probability 1.

Probability Density

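The PDF of x₍ₖ₎ is, by definition, the function f₍ₖ₎ such that, for an infinitesimal dx:

$$f_{(k)}(x)\,dx = P\{x_{(k)} \in (x, x + dx)\}$$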

So we can find easily:

Density for the FIRST order statistic (the minimum):

P{x₍₁₎ ∈ (x, x+dx)} = P(one of the X’s ∈ (x, x + dx) and all others > x) = n f(x) dx (1 − F(x))ⁿ⁻¹, because the X’s are i.i.d., with:

n f(x) dx = P{one of the X’s ∈ (x, x + dx)}, from the definition of the density and the n ways to choose which observation falls in the interval
(1 − F(x))ⁿ⁻¹ = P{all the other n−1 observations > x}; note that 1 − F(x) is the complement of the CDF
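As a worked instance of the formula n f(x) (1 − F(x))ⁿ⁻¹ for the minimum, assume (this choice is not part of the homework) that the X’s are exponential with rate λ. The minimum then turns out to be exponential again, with rate nλ:

```latex
% Assumption: X_1, ..., X_n i.i.d. Exponential(\lambda), x \ge 0
f(x) = \lambda e^{-\lambda x}, \qquad 1 - F(x) = e^{-\lambda x}
% Plug into the density of the first order statistic
f_{(1)}(x) = n f(x) \bigl(1 - F(x)\bigr)^{n-1}
           = n \lambda e^{-\lambda x} \, e^{-(n-1)\lambda x}
           = n \lambda \, e^{-n \lambda x}
% This is the Exponential(n\lambda) density, so E[X_{(1)}] = 1 / (n\lambda)
```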

Density for the MAX order statistic:

P{x₍ₙ₎ ∈ (x, x+dx)} = P(one of the X’s ∈ (x, x + dx) and all others < x) = n f(x) dx F(x)ⁿ⁻¹, because the X’s are i.i.d., with:

n f(x) dx = P{one of the X’s ∈ (x, x + dx)}, from the definition of the density and the n ways to choose which observation falls in the interval
F(x)ⁿ⁻¹ = P{all the other n−1 observations < x}; note that F(x) is the CDF

Density for GENERAL case:

We must now calculate the density for the general case. This is a combinatorial problem, because there are different ways to distribute the remaining observations on the two sides of x.

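P{x₍ₖ₎ ∈ (x, x + dx)} = P(one of the X’s ∈ (x, x + dx) and exactly k−1 of the others < x):

$$P\{x_{(k)} \in (x, x + dx)\} = n f(x)\,dx \binom{n-1}{k-1} F(x)^{k-1} \bigl(1 - F(x)\bigr)^{n-k}$$

Note that this looks like the Beta(r, s) distribution: when the X’s are Uniform(0, 1), so that F(x) = x and f(x) = 1, it is exactly the Beta density with r = k and s = n − k + 1.

We have the binomial coefficient because we must choose which k−1 of the remaining n−1 observations fall below x.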
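The general formula can be checked numerically in the Uniform(0, 1) case, where F(x) = x and f(x) = 1. The Python sketch below (the function name `order_stat_pdf_uniform` and the values n = 5, k = 2 are assumptions for illustration) verifies that the density integrates to 1, that its mean is k/(n + 1) as expected for a Beta(k, n − k + 1) distribution, and that a simulation of sorted samples gives a similar mean.

```python
import math
import random

def order_stat_pdf_uniform(x, k, n):
    """Density of the k-th order statistic of n i.i.d. Uniform(0, 1) draws:
    n * C(n-1, k-1) * x^(k-1) * (1 - x)^(n-k)."""
    return n * math.comb(n - 1, k - 1) * x ** (k - 1) * (1 - x) ** (n - k)

n, k = 5, 2   # illustrative sample size and order (assumptions)

# Midpoint-rule integration of the density on (0, 1)
steps = 10000
h = 1.0 / steps
xs = [(i + 0.5) * h for i in range(steps)]
area = sum(order_stat_pdf_uniform(x, k, n) for x in xs) * h
mean_formula = sum(x * order_stat_pdf_uniform(x, k, n) for x in xs) * h

# Simulation: k-th smallest value of n uniform draws, repeated many times
rng = random.Random(1)   # fixed seed so the run is repeatable
draws = [sorted(rng.random() for _ in range(n))[k - 1] for _ in range(20000)]
mean_sim = sum(draws) / len(draws)

print("integral of the density:", area)
print("mean from the formula:", mean_formula, "vs k / (n + 1) =", k / (n + 1))
print("mean from the simulation:", mean_sim)
```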

https://www2.stat.duke.edu/courses/Spring12/sta104.1/Lectures/Lec15.pdf
https://en.wikipedia.org/wiki/Order_statistic
https://en.wikipedia.org/wiki/Binomial_coefficient

Homework in collaboration with Luca Scarmozzino: https://stats4cyber.wordpress.com/