Homework 4_R

A characteristic (or attribute, feature or property) of the units of observation can be measured and operationalized on different “levels” on a given unit of observation, giving rise to possibly different operative variables. Find out about the proposed classifications of variables and express your opinion about their respective usefulness.

Level of measurement

The level of measurement, also called the scale of measure, is a classification that describes the nature of the information within the values assigned to variables.
Stanley Smith Stevens developed the best-known classification with four levels, or scales, of measurement.

More generally, we have:

Qualitative/Categorical

  • Nominal: values that cannot be put in any order
  • Ordinal: values that, even if they are not numbers, can be ordered, but the order does not tell us the relative degree of difference between them

Quantitative/Numerical

  • Interval: differences are meaningful (the values have an order, like ordinal data, but there are also equal intervals between adjacent values)
  • Ratio: differences are meaningful (like interval data), but there is also a true zero point
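As a rough illustration of how the four levels differ in practice, the sketch below picks a measure of central tendency that is usually considered meaningful for each level (mode, median, arithmetic mean, geometric mean), following Stevens' idea of permissible statistics. The function name `central_tendency` and the sample values are hypothetical, chosen only for this example.

```python
from statistics import mode, median_low, mean, geometric_mean

def central_tendency(values, level, order=None):
    """Return a summary that is meaningful for the given level of measurement.

    `order` lists the categories from lowest to highest and is only
    needed for ordinal data whose values are not numbers.
    """
    if level == "nominal":
        return mode(values)                  # only counting makes sense
    if level == "ordinal":
        rank = {category: i for i, category in enumerate(order)}
        ranks = sorted(rank[v] for v in values)
        return order[median_low(ranks)]      # middle category by rank
    if level == "interval":
        return mean(values)                  # differences are meaningful
    if level == "ratio":
        return geometric_mean(values)        # ratios need a true zero
    raise ValueError(f"unknown level: {level}")

print(central_tendency(["red", "blue", "red"], "nominal"))
print(central_tendency(["low", "high", "medium", "low", "medium"],
                       "ordinal", order=["low", "medium", "high"]))
print(central_tendency([10.0, 20.0, 30.0], "interval"))  # e.g. temperatures in °C
print(central_tendency([2.0, 8.0], "ratio"))             # e.g. weights in kg
```

Each level also admits the summaries of the levels before it, so the mode is still valid for ratio data, while the mean is not valid for nominal data.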

Usefulness

While these levels are reasonable, they are not exhaustive. Other statisticians have proposed new typologies, but Stevens' classification seems to be the most widely used; the extended levels of measurement are rarely used outside of academic geography.

We need to pay attention, because the same variable may have a different scale type depending on how it is measured and on the goals of the analysis:

Age is usually ratio data (quantitative), but in some cases it can be treated as qualitative, for example when it is recorded in age classes.
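A minimal sketch of how the same characteristic can move from one scale to another: age in years (ratio) is mapped to an ordered category (ordinal). The function name `age_group`, the labels and the cut-offs at 18 and 65 are assumptions made for this example only.

```python
def age_group(age):
    """Map an age in years (ratio scale) to an ordered category (ordinal scale)."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age < 18:
        return "minor"
    if age < 65:
        return "adult"
    return "senior"

ages = [4, 17, 18, 42, 65, 80]
groups = [age_group(a) for a in ages]
print(groups)
# ['minor', 'minor', 'adult', 'adult', 'senior', 'senior']
```

After the conversion we can still order the groups, but we can no longer compute a meaningful difference or ratio between two people's ages.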

Example of advantages and disadvantages

Ordinal measurement is normally used for surveys and questionnaires. Statistical analysis is applied to the responses once they are collected, to place the people who took the survey into the various categories. The data is then compared to draw inferences and conclusions about the whole surveyed population with regard to the specific variables. The advantage of using ordinal measurement is ease of collation and categorization. If you ask a survey question without providing predefined answer options, the answers are likely to be so diverse that they cannot be converted to statistics.

The same characteristics of ordinal measurement that create its advantages also create certain disadvantages. The responses are often so narrow in relation to the question that they create or magnify bias that is not factored into the survey. For example, on a question about satisfaction with a governor, people might be satisfied with his job performance but upset about a recent sex scandal. The survey question might lead respondents to state their dissatisfaction about the scandal, in spite of their satisfaction with his job performance, but the statistical conclusion will not differentiate between the two.

Statistics and geostatistics

https://petrowiki.org/Statistical_concepts

Sources:

https://en.wikipedia.org/wiki/Level_of_measurement

https://www.youtube.com/watch?v=KIBZUk39ncI

https://www.youtube.com/watch?v=eghn__C7JLQ

https://sciencing.com/advantages-disadvantages-using-ordinal-measurement-12043783.html

Homework 3_R

A frequency distribution

The frequency distribution of events is the number of times each event occurred in an experiment or study. For grouped data there is no “best” number of bins, and different bin sizes can reveal different features of the data.
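The effect of the bin size can be seen in a small sketch. The helper `frequency_distribution` and the `scores` list are hypothetical; each bin is identified by its lower edge and covers the half-open range from that edge up to the edge plus the bin width.

```python
from collections import Counter

def frequency_distribution(values, bin_width):
    """Count how many values fall into each bin [k*w, (k+1)*w)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

scores = [1, 2, 2, 3, 5, 8, 8, 9, 12, 13]

print(frequency_distribution(scores, 5))
# {0: 4, 5: 4, 10: 2}
print(frequency_distribution(scores, 2))
# {0: 1, 2: 3, 4: 1, 8: 3, 12: 2}
```

With a width of 5 the data look evenly spread over the first two bins, while a width of 2 shows that no value falls between 6 and 7 or between 10 and 11.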

A frequency distribution provides a visual representation of the distribution of observations within a particular test, with the goal of simplifying the organization and presentation of data. Some of the graphs that can be used with frequency distributions are histograms, line charts, bar charts and pie charts. Frequency distributions are used for both qualitative and quantitative data.

Speaking of “univariate”

Univariate is a term commonly used in statistics to describe a type of data that consists of observations on only a single characteristic or attribute. In other words, the data has only one variable.

Information: From dataset to frequency distribution

A frequency distribution provides a snapshot view of the characteristics of a dataset. It packs the information provided by the dataset into “buckets” or ranges, changing the amount of information that we can see.


Even if we gain knowledge, we lose information: once the values are grouped, we generally cannot reconstruct the original dataset from its distribution.
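A small sketch of this loss of information: two different datasets (both invented for the example) end up with exactly the same binned frequency distribution, so the distribution alone cannot tell us which dataset produced it. The helper name `binned_counts` and the bin width of 10 are assumptions.

```python
from collections import Counter

def binned_counts(values, bin_width=10):
    """Frequency distribution over bins [k*w, (k+1)*w), keyed by lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

dataset_a = [1, 2, 3, 11, 12, 25]
dataset_b = [7, 8, 9, 15, 19, 20]

print(binned_counts(dataset_a))                              # {0: 3, 10: 2, 20: 1}
print(binned_counts(dataset_a) == binned_counts(dataset_b))  # True
print(dataset_a == dataset_b)                                # False
```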

References:

Frequency distribution

Frequency distribution Wiki

Univariate

For further information

Homework 2_R

Variables and dataset

A variable is an attribute that describes a person, place, thing or idea, and its value can change from one entity to another.

Variables can be classified as:

  • Qualitative: Values that are names or labels
  • Quantitative: numeric variables that represent a measurable quantity

and also as:

  • Discrete: the set of values it can take is finite or countable
  • Continuous: the set of values it can take is the set of real numbers or a range of real numbers

A dataset is a collection of data (information that is collected through the observation of some objects, called statistical units).
More formally, a dataset can be seen as a collection of rows or as a collection of columns, where:

  • each row contains all the attributes for one unit
  • each column contains one attribute for all units

The values may be numbers or nominal data. More generally, values may be of any of the kinds described as a level of measurement.
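The two views of a dataset, by rows and by columns, can be sketched with plain Python structures. The units, attribute names and values below are invented for the example, and `to_columns` is a hypothetical helper.

```python
# Hypothetical dataset: each row holds all the attributes of one unit.
rows = [
    {"name": "Anna", "age": 23, "city": "Rome"},
    {"name": "Luca", "age": 31, "city": "Milan"},
    {"name": "Sara", "age": 27, "city": "Rome"},
]

def to_columns(rows):
    """Turn a list of rows into a dict of columns (one attribute for all units)."""
    return {key: [row[key] for row in rows] for key in rows[0]}

columns = to_columns(rows)

print(rows[1])          # all attributes of one unit
print(columns["age"])   # one attribute for all units: [23, 31, 27]
```

Here `name` and `city` are nominal values, while `age` is a ratio value, so a single dataset can mix several levels of measurement.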

Generation of a dataset

In statistics, datasets usually come from actual observations obtained by sampling a statistical population, and each row corresponds to the observations on one element of that population.

Datasets may further be generated by algorithms for the purpose of testing certain kinds of software.
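A minimal sketch of an algorithmically generated dataset, of the kind that could be used to test software. The attribute names, the distribution parameters and the function `generate_dataset` are all assumptions made for illustration.

```python
import random

def generate_dataset(n_units, seed=0):
    """Generate n_units synthetic rows with hypothetical attributes."""
    rng = random.Random(seed)  # a fixed seed makes the test data reproducible
    return [
        {
            "id": i,
            "height_cm": round(rng.gauss(170, 10), 1),  # quantitative, continuous
            "group": rng.choice(["A", "B", "C"]),       # qualitative, nominal
        }
        for i in range(n_units)
    ]

data = generate_dataset(5)
for row in data:
    print(row)
```

Using a private `random.Random` instance with a fixed seed means that every run produces the same rows, which is usually what a software test needs.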

References:

Variables and attributes

Datasets

Homework 1_R

Statistical Population

A population is a set of similar items or events which is of interest for some question or experiment.

A statistical population can be a group of existing objects or a hypothetical group of objects conceived as a generalization from experience.

Statistical Inference

Statistical inference is the process of using data analysis (a set of processes for modeling data with the goal of discovering useful information, drawing conclusions and supporting decision-making) to infer properties of an underlying probability distribution (a mathematical function that gives the probabilities of occurrence of the different possible outcomes of an experiment).

We need to define the population and then devise a sampling plan that produces a representative sample. The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population.

Therefore, in inferential statistics the population is represented by a subset of it (a sample).
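The idea of using a sample, together with a measure of its uncertainty, to learn about a population can be sketched as follows. The population is simulated, and its size, its parameters and the helper `estimate_mean` are assumptions made for this example; the interval uses the usual normal approximation.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# Hypothetical population: 10,000 measurements we pretend we cannot observe in full.
population = [rng.gauss(50, 10) for _ in range(10_000)]

def estimate_mean(sample):
    """Return the sample mean and its standard error."""
    standard_error = stdev(sample) / len(sample) ** 0.5
    return mean(sample), standard_error

sample = rng.sample(population, 100)  # simple random sample without replacement
estimate, se = estimate_mean(sample)

print(f"sample mean     = {estimate:.2f} ± {1.96 * se:.2f} (approx. 95% interval)")
print(f"population mean = {mean(population):.2f}")
```

The standard error is the part of the result that carries the uncertainty: a larger sample gives a smaller standard error and therefore a narrower interval.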

Descriptive Statistics

For descriptive statistics, we choose a group that we want to describe and then measure all subjects in that group. The statistical summary describes this group with complete certainty.

There is no uncertainty because you are describing only the people or items that you actually measure. You’re not trying to infer properties about a larger population.

In this case, the observed group is not a sample of a larger population, and the results describe only the observed group.
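A minimal sketch of a descriptive summary, assuming a small invented group (the exam scores of every student in one class) that has been measured in full.

```python
from statistics import mean, median, pstdev

# Hypothetical group: exam scores of every student in one class.
class_scores = [18, 22, 25, 25, 27, 28, 30]

summary = {
    "n": len(class_scores),
    "mean": mean(class_scores),
    "median": median(class_scores),
    # pstdev uses the population formula, since the group is measured in full
    "std_dev": pstdev(class_scores),
    "min": min(class_scores),
    "max": max(class_scores),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

No standard error or confidence interval is reported here, because nothing is being generalized beyond the class itself.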

Summary

Descriptive statistics: techniques used to summarize, organize and simplify the data observed in the population of interest.

Inferential statistics: techniques used to study samples and then make generalizations about the populations from which they were selected.

References:

Statistical_population

Difference between descriptive and inferential statistics

Population in descriptive and in inferential

General description