Descriptive Statistics

Written by Piyush Bhartiya, MBA

Published on Mon, February 17, 2020 3:43 PM • Updated on Sat, July 4, 2020 12:36 PM • 10 mins read

A descriptive statistic is a summary statistic that quantitatively describes or summarizes features of a collection of information, while descriptive statistics (in the mass noun sense) is the process of using and analyzing those statistics. Descriptive statistics involves summarizing and organizing the data so they can be easily understood. Unlike inferential statistics, descriptive statistics seeks to describe the data but does not attempt to make inferences from the sample to the whole population.

Here, we typically describe the data in a sample. This generally means that descriptive statistics, unlike inferential statistics, is not developed on the basis of probability theory. Descriptive statistics provide simple summaries about the sample and about the observations that have been made. Such summaries may be either quantitative, i.e. summary statistics, or visual, i.e. simple-to-understand graphs. For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. It is calculated as the number of shots made divided by the number of shots taken.
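A minimal Python sketch of the shooting-percentage calculation, using made-up numbers. The function name shooting_percentage is just an illustrative choice; it divides shots made by shots taken and scales the result to a percentage:

```python
def shooting_percentage(shots_made: int, shots_taken: int) -> float:
    """Return shots made as a percentage of shots taken."""
    if shots_taken <= 0:
        raise ValueError("shots_taken must be positive")
    return 100 * shots_made / shots_taken


# Hypothetical example: 7 shots made out of 20 attempts
print(shooting_percentage(7, 20))  # 35.0
```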

The use of descriptive and summary statistics has an extensive history and, indeed, the simple tabulation of populations and of economic data was the first way the topic of statistics appeared. More recently, a collection of summarization techniques has been formulated under the heading of exploratory data analysis: an example of such a technique is the box plot. In the business world, descriptive statistics provide a useful summary of many types of data.

Descriptive statistics and inferential statistics

In today’s fast-paced world, statistics plays a major role in research, helping with the collection, analysis, and presentation of data in measurable form. It can be hard to tell whether a piece of research relies on descriptive or inferential statistics, because many people are unfamiliar with these two branches of statistics. As the name suggests, descriptive statistics describes the data you have, whether that is a sample or an entire population.

On the other hand, inferential statistics is used to make generalizations about a population based on samples. So the big difference between descriptive and inferential statistics is what you do with your data. With inferential statistics, you take data from samples and make generalizations about a population. For example, you might stand in a mall, ask a sample of 100 people whether they like shopping at Sears, and then use their answers to draw conclusions about shoppers in general.

Inferential statistics are often used to compare differences between treatment groups.
Inferential statistics use measurements from the sample of subjects in the experiment to compare the treatment groups and make generalizations about the larger population of subjects.
Descriptive statistics uses the data to provide descriptions of a sample or population, either through numerical calculations or graphs or tables.
Inferential statistics make inferences and predictions about a population based on a sample of data taken from the population in question.

Example of descriptive statistics

Descriptive statistics summarizes or describes characteristics of a data set. Descriptive statistics consists of two basic categories of measures: measures of central tendency and measures of variability or spread. Measures of central tendency describe the center of a data set. Measures of variability or spread describe the dispersion of data within the set.

Types of descriptive statistics

There are four main types of descriptive statistics:

Measures of Frequency

Count, Percent, Frequency
Shows how often something occurs
Use this when you want to show how often a response is given.
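The following Python sketch shows one way to compute counts and percents for categorical responses using collections.Counter from the standard library. The survey answers are invented for illustration:

```python
from collections import Counter

# Hypothetical survey responses (illustrative data, not from a real study)
responses = ["yes", "no", "yes", "undecided", "yes", "no"]

counts = Counter(responses)  # count: how often each response occurs
total = len(responses)

# Report each response with its count and percent, most frequent first
for answer, count in counts.most_common():
    percent = 100 * count / total
    print(f"{answer}: count={count}, percent={percent:.1f}%")
```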

Measures of Central Tendency

Mean, Median, and Mode
Locates the center of the distribution using different points
Use this when you want to show the average or most commonly indicated response
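In Python, the standard-library statistics module provides mean, median, and mode. A short sketch on a small, hypothetical set of scores:

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # hypothetical data

print(statistics.mean(scores))    # mean: 30 / 6 = 5
print(statistics.median(scores))  # median: (3 + 5) / 2 = 4.0
print(statistics.mode(scores))    # mode: 3 occurs most often
```

With an even number of values, the median is the average of the two middle values, so it need not be one of the data points.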

Measures of Dispersion or Variation

Range, Variance, Standard Deviation
Identifies the spread of scores by stating intervals
Range = highest point minus lowest point
Variance = average of the squared differences between each observed score and the mean; Standard Deviation = square root of the variance
Use this when you want to show how “spread out” the data are. It is helpful to know when your data are so spread out that it affects the mean.
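A Python sketch of these dispersion measures on hypothetical data. The statistics module offers both a population version (pvariance, pstdev, dividing by N) and a sample version (variance, stdev, dividing by N - 1); which one to report depends on whether the data are treated as the whole population or as a sample:

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data; mean is 40 / 8 = 5

# Range: highest point minus lowest point
data_range = max(scores) - min(scores)     # 9 - 2 = 7

# Population variance: mean of squared deviations from the mean (divide by N)
pop_var = statistics.pvariance(scores)     # 32 / 8 = 4
pop_sd = statistics.pstdev(scores)         # square root of 4 = 2.0

# Sample variance and standard deviation divide by N - 1 instead
sample_var = statistics.variance(scores)   # 32 / 7, about 4.571
sample_sd = statistics.stdev(scores)       # about 2.14

print(data_range, pop_var, pop_sd, sample_var, sample_sd)
```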

Measures of Position

Percentile Ranks, Quartile Ranks
Describes how scores fall in relation to one another. Relies on standardized scores
Use this when you need to compare scores to a normalized score (e.g., a national norm)
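A sketch of measures of position in Python. The percentile_rank helper is hypothetical and uses one of several common definitions (the percentage of scores at or below a value); statistics.quantiles (Python 3.8 and later) returns the three quartile cut points, and its default method may differ slightly from other software. The scores are made up:

```python
import statistics


def percentile_rank(scores, value):
    """Percentage of scores at or below `value` (one of several common definitions)."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


scores = [55, 61, 67, 70, 72, 75, 78, 81, 88, 94]  # hypothetical test scores

print(percentile_rank(scores, 78))        # 7 of 10 scores are <= 78 -> 70.0
print(statistics.quantiles(scores, n=4))  # [Q1, Q2 (the median), Q3]
```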

Difference between inferential statistics and descriptive statistics

DESCRIPTIVE STATISTICS	INFERENTIAL STATISTICS
It is used to summarize and graph the data for a group that you choose.	It takes data from a sample and makes inferences about the larger population.
It is used to describe a sample, which is fairly straightforward.	It draws conclusions about a population from a sample.
It involves taking a large amount of data and reducing it to a few summary values.	We need confidence that the sample accurately represents the population.
It helps to gain insights and visualize the data.	There are many requirements that affect the process.
The materials and data collected are raw in nature.	Random sampling is the preferred approach.

Descriptive statistics in SPSS

Descriptive statistics can be used to summarize the data. If your data are categorical, try the Frequencies or Crosstabs procedures. If your data are scale level, try the Summarize or Descriptives procedures. If you have multiple-response questions, use multiple response sets. The Summarize procedure can be used to get descriptive information about the data.

Descriptive statistics in psychology

The field of statistics is often misunderstood, but it plays an essential role in our everyday lives. Statistics, done correctly, allows us to extract knowledge from the vague, complex, and difficult real world. Wielded incorrectly, statistics can be used to harm and mislead. A clear understanding of statistics and the meanings of various statistical measures is important to distinguish between truth and misdirection.

Python’s standard library does provide a median function, statistics.median in the statistics module (available since Python 3.4), and the median can also be found by hand in several other ways. We must remember that the mean is calculated by summing all the values and dividing by the number of items, while the median is found by sorting the items and taking the middle value (or the average of the two middle values when there is an even number of items). If our data contain outliers, items that are much higher or lower than the other values, they can have an adverse effect on the mean. That is to say, the mean is not robust to outliers. The median depends only on the order of the values, not on how extreme the largest or smallest ones are, so it is robust to them.
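To make the robustness point concrete, here is a Python sketch that finds the median by sorting (the helper median_by_sorting is illustrative, not a library function) and compares the mean and median before and after adding one extreme value. The numbers are hypothetical:

```python
import statistics


def median_by_sorting(values):
    """Median: sort, then take the middle value (or average the two middle values)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


incomes = [30, 32, 35, 38, 40]   # hypothetical values
with_outlier = incomes + [400]   # one extreme value added

print(statistics.mean(incomes), median_by_sorting(incomes))
print(statistics.mean(with_outlier), median_by_sorting(with_outlier))
```

The single outlier pulls the mean from 35 up to about 95.8, while the median only moves from 35 to 36.5.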

It’s easy to get mired in the equations and details of statistics, but it’s important to understand what these concepts represent. Working through the details of basic descriptive statistics on a real data set, such as wine data, helps ground the concepts.

Descriptive statistics interpretation

When interpreting descriptive statistics, use the mean to describe the sample with a single value that represents the center of the data. Many statistical analyses use the mean as a standard measure of the center of the distribution of the data. However, unusual values, called outliers, affect the median less than they affect the mean, so the median can be the better summary when outliers are present.

Descriptive statistics for data science

Descriptive statistical analysis helps you understand your data and is a very important part of machine learning. Machine learning is all about making predictions, while statistics is all about drawing conclusions from data, which is a necessary initial step. In this post you will learn about the most important descriptive statistical concepts. They will help you better understand what your data is trying to tell you, which will result in a better machine learning model and a better overall understanding.

Doing a descriptive statistical analysis of your dataset is absolutely crucial. A lot of people skip this part and therefore lose a lot of valuable insights about their data, which often leads to wrong conclusions.
Take your time, carefully run descriptive statistics, and make sure that the data meet the requirements for further analysis.
Statistics is a branch of mathematics that deals with the collection, organization, analysis, and interpretation of data.
In descriptive statistics, you describe, present, summarize, and organize your data (a sample or a population), either through numerical calculations or graphs or tables.

Descriptive statistics book

There are many books available, both online and in print, from various publishing houses. Some of them are listed below:

Fundamentals of Descriptive Statistics
Statistics with JMP
Understanding and Using Statistics in Psychology
Naked Statistics
Descriptive and Inferential Statistics
Statistics Workbook for Dummies
Statistics in MATLAB
Statistics in Plain English
How to Lie with Statistics
Principles of Statistics

How to write a descriptive statistics analysis?

Complete the following steps to interpret descriptive statistics. Key output includes N, the mean, the median, the standard deviation, and several graphs.
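A minimal Python sketch that produces the key numeric outputs listed above (N, mean, median, standard deviation) for one variable. The function name describe and the wait-time values are hypothetical, and the sketch reports the sample standard deviation, which needs at least two values:

```python
import statistics


def describe(values):
    """Return N, mean, median, and sample standard deviation for one variable."""
    return {
        "N": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample SD (divides by N - 1)
    }


wait_minutes = [3.1, 4.0, 2.5, 5.2, 3.8, 4.4, 2.9, 6.0]  # hypothetical data
for name, value in describe(wait_minutes).items():
    print(name, value)
```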

Step 1: Describe the size of your sample

You should collect a medium-to-large sample of data. Samples that have at least 20 observations are often adequate to represent the distribution of your data. However, to better represent the distribution with a histogram, some practitioners recommend that you have at least 50 observations. Larger samples also provide more precise estimates of process parameters such as the mean and standard deviation.

Step 2: Describe the center of your data

Use the mean to describe the sample with a single value that represents the center of the data. Many statistical analyses use the mean as a standard measure of the center of the distribution of the data. The median and the mean both measure central tendency. But unusual values, called outliers, affect the median less than they affect the mean. When you have unusual values, you can compare the mean and the median to decide which is the better measure to use. If your data are symmetric, the mean and median are similar.

Step 3: Describe the spread of your data

Use the standard deviation to determine how spread out the data is from the mean. A higher standard deviation value indicates a greater spread in the data.

Step 4: Assess the shape and spread of your data distribution

Use the histogram, the individual value plot, and the box plot to assess the shape and spread of the data, and to identify any potential outliers. Examine the shape of your data to determine whether your data appear to be skewed. When data are skewed, the majority of the data are located on the high or low side of the graph. Often, skewness is easiest to detect with a histogram.
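Skewness can also be quantified. The Python sketch below uses one common definition, the moment coefficient g1 = m3 / m2^1.5 computed from population moments; statistical packages may instead report an adjusted sample version, so exact values can differ. The data are invented:

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments).

    Positive: long right tail (most values on the low side).
    Negative: long left tail (most values on the high side).
    Near 0: roughly symmetric. Undefined if all values are equal.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # hypothetical data with a long right tail

print(skewness(symmetric))     # 0.0
print(skewness(right_skewed))  # positive
```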

For example, a manager at a bank collects wait-time data and creates a simple histogram. The histogram appears to have two peaks. After further investigation, the manager determines that the wait times for customers who are cashing checks are shorter than the wait times for customers who are applying for home equity loans. The manager adds a group variable for customer tasks and then creates a histogram with groups.
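Adding a group variable amounts to summarizing each group separately. A Python sketch with invented wait times (the task labels and values are hypothetical) shows how the pooled mean hides the two groups that per-group summaries reveal:

```python
import statistics
from collections import defaultdict

# Hypothetical (customer task, wait time in minutes) records, invented for illustration
records = [
    ("cash check", 2.0), ("cash check", 3.0), ("cash check", 2.5),
    ("home equity loan", 14.0), ("home equity loan", 16.0), ("home equity loan", 15.0),
]

# Group wait times by customer task
by_task = defaultdict(list)
for task, minutes in records:
    by_task[task].append(minutes)

# The pooled mean falls between the two peaks and describes neither group well
print("all tasks:", statistics.mean(m for _, m in records))
for task, waits in by_task.items():
    print(task, statistics.mean(waits), statistics.median(waits))
```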

Look for outliers

Outliers, which are data values that are far away from other data values, can strongly affect the results of your analysis. Often, outliers are easiest to identify on a box plot.
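Box plots usually flag as potential outliers the points lying more than 1.5 times the interquartile range (IQR) beyond the quartiles (Tukey's fences). The Python sketch below applies that rule; it assumes the "inclusive" quartile method of statistics.quantiles, and other software may compute quartiles slightly differently. The data are hypothetical:

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 45]  # hypothetical; 45 looks unusual
print(iqr_outliers(data))  # [45]
```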