
Applied Statistics For Data Analysis/science - Programming - Nairaland


Applied Statistics For Data Analysis/science by ibromodzi: 8:38am On Dec 11, 2020
Data analysis involves inspecting data to gain insights that inform conclusions and support decision making. Its theoretical framework is built on statistics and mathematics, while its implementation relies heavily on computer science. These three fields gave birth to data analysis/science, so this tutorial series highlights the important statistical concepts used in different data analytics tasks, with implementation in whichever statistical tools readers prefer.

I have come in contact with a sizeable number of people who are just finding their way into data science, and I've noticed that most use a top-down approach: they first concentrate on using different statistical tools while paying little or no attention to the underlying theoretical concepts on which these tools are built. As a result, many start to question why they got into data science in the first place, because they find it difficult to pinpoint the kind of problems they are solving with these tools or the exact questions they are trying to answer with the data.
If you find yourself in this category, don't be frustrated; I was once in the same dilemma. Just take a deep breath, grab your tools, and follow along in this series.


Re: Applied Statistics For Data Analysis/science by ibromodzi: 6:40pm On Dec 12, 2020
Topics to cover
We are going to cover several topics that span descriptive and inferential statistics.
Descriptive statistics
1. Types of variables
2. Measures of central tendency
3. Measures of spread
4. Graphs and plots

Inferential statistics
1. Hypothesis testing
2. Parametric assumptions
3. Sample inference (one sample and two samples)
4. Chi-square test of independence
5. One-Way ANOVA
6. Linear regression and correlation
7. Logistic regression

Tools to use
The tremendous improvement in technology has made it possible to implement almost any statistical concept using different tools. In light of this, you can follow this series using any tool(s) of your choice, but personally, I'll be combining a number of Python libraries together with Microsoft Excel.




Re: Applied Statistics For Data Analysis/science by ibromodzi: 3:54pm On Dec 14, 2020
Series 1: Variables
At the end of this series, you should be able to:
1. Define variables
2. Know different types of variables
3. Know the difference between variables and constants
4. Know what explanatory and response variables are

What are variables?

Variables are characteristics that are measured and can take on different values; in other words, a variable is something that varies between cases or observations. In contrast, a constant remains unchanged for all observations in a research study. Let's take a few examples to understand this better.

Example 1: A researcher wants to study the relationship between educational qualification and the level of awareness of COVID-19 protocols in a sample of 100 male passengers. The variables are:
(a) Educational qualification, which could range from none to tertiary education
(b) Awareness level, which could be defined using a Likert scale
We also have 100 observations (cases), and biological sex (male) is a constant because it is the same for every passenger.
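
To make the distinction concrete, here is a minimal sketch that checks which fields in a small dataset vary and which stay constant. The records and field names (education, awareness, sex) are made up for illustration and only mirror Example 1; they are not real survey data.

```python
# Hypothetical records mirroring Example 1 (made-up values, not real data)
passengers = [
    {"education": "none", "awareness": 2, "sex": "male"},
    {"education": "secondary", "awareness": 4, "sex": "male"},
    {"education": "tertiary", "awareness": 5, "sex": "male"},
    {"education": "primary", "awareness": 3, "sex": "male"},
]

def split_variables_and_constants(records):
    """Return (variables, constants): field names that vary vs. stay the same."""
    variables, constants = [], []
    for field in records[0]:
        distinct_values = {record[field] for record in records}
        if len(distinct_values) > 1:
            variables.append(field)   # more than one value seen: a variable
        else:
            constants.append(field)   # same value in every record: a constant
    return variables, constants

variables, constants = split_variables_and_constants(passengers)
print("Variables:", variables)
print("Constants:", constants)
```

Here sex comes out as a constant because every record holds the same value, just as in the example.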

Types of Variables

1. Categorical variables: these are names or labels (e.g., gender, race, state) with no logical order, or with a logical order but inconsistent differences between groups (e.g., rankings). Categorical variables are also known as qualitative variables.

2. Numerical variables: these are variables with quantifiable measurements, e.g., height, weight, and average rainfall. They are also known as quantitative variables.

Example 2: A team of clinical researchers wants to study the relationship between age and obesity, and records each participant's weight and gender alongside age. Weight can be quantified in kilograms or other units, so it is a quantitative (numerical) variable, while gender is a category (or label) and is therefore a categorical (qualitative) variable.
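
The type of a variable determines how it can be summarised. The sketch below uses made-up weights and gender labels (both hypothetical) to show that a numerical variable supports arithmetic such as averaging, while a categorical variable is summarised by counting how often each label occurs.

```python
from collections import Counter
import statistics

# Hypothetical measurements for six participants (illustrative only)
weights_kg = [62.5, 80.0, 71.2, 95.4, 58.3, 77.6]  # numerical (quantitative)
genders = ["female", "male", "male", "female", "female", "male"]  # categorical

# Numerical variables support arithmetic, so an average is meaningful
average_weight = statistics.mean(weights_kg)

# Categorical variables are labels, so we count occurrences instead
gender_counts = Counter(genders)

print(f"Average weight: {average_weight:.1f} kg")
print(f"Gender counts: {dict(gender_counts)}")
```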

Variables can also be grouped into explanatory (independent) and response (dependent) variables. In this case, we use one variable to predict or explain differences in another variable.

Example 3: A researcher wants to predict nutritional status using racial origin. The researcher takes a random sample of 100 individuals of different races. The explanatory variable here is race and the response variable is nutritional status.

The next series is going to be on how to describe different variables.


Re: Applied Statistics For Data Analysis/science by ibromodzi: 7:46pm On Dec 18, 2020
Series 2: Measures of Central Tendency
At the end of this series, you should be able to:
1. Know what a measure of central tendency is
2. Describe your data using measures of central tendency
3. Know the appropriate measure to use in describing your data

What are measures of central tendency?
A measure of central tendency is a single value that describes a set of data by identifying the central position within it. For this reason, measures of central tendency are also referred to as measures of central location. Because they give us a summary of our data, they are also classed as summary statistics. The most commonly encountered measure of central tendency is the mean (also called the average), but it is not the only one: we also have the median and the mode. Now let us look at what each measure tells us about our data.

The mean is the average of all the values in the dataset; it is calculated by summing the values and dividing by the number of values. For example, if we collect the ages of five boys as 9, 12, 7, 8 and 10, the mean is the sum of these values (46) divided by the number of boys (5), which gives 9.2. The mean therefore describes the central value of your data rather than the most common one; in fact, it is often not a value actually observed in your data (no boy here is 9.2 years old).
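
The arithmetic above can be checked with a few lines of Python, using the five ages from the example. The function name arithmetic_mean is just an illustrative choice.

```python
def arithmetic_mean(values):
    """Sum the values and divide by how many there are."""
    return sum(values) / len(values)

ages = [9, 12, 7, 8, 10]  # ages of the five boys from the example
total = sum(ages)
mean_age = arithmetic_mean(ages)

print(f"Sum of ages: {total}")   # 46
print(f"Mean age: {mean_age}")   # 9.2
# The mean does not have to be one of the observed values
print(f"Is the mean an observed age? {mean_age in ages}")  # False
```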
Despite being the most common measure of central tendency, the mean has two major drawbacks:

(a) Susceptibility to outliers: Outliers are unusually large or small numerical values compared to the rest of the data set. For example, suppose the wages of the developers in a tech company are 30k, 37k, 45k, 16k, 25k and 100k. The mean wage for these six staff is about 42.2k (253k divided by 6). However, careful inspection of the raw data reveals that the mean might not be the best measure to describe this data: five of the six wages are 45k or below, and the mean is being pulled upward by the single large value of 100k. In this situation, a more robust measure of central tendency (such as the median) should be considered.

(b) Skewness: When the dataset has a long tail on one side, the mean no longer provides the best central location, because the values in the tail drag it away from the typical value.
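
A quick way to see both drawbacks is to compare the mean and the median on the wage figures from (a). This sketch uses Python's statistics module; the variable names are illustrative.

```python
import statistics

# Wages (in thousands) from the outlier example; 100 is the outlier
wages_k = [30, 37, 45, 16, 25, 100]

mean_with_outlier = statistics.mean(wages_k)
median_with_outlier = statistics.median(wages_k)

# Drop the largest value to see how much it was pulling the mean
largest = max(wages_k)
wages_without_outlier = [w for w in wages_k if w != largest]
mean_without_outlier = statistics.mean(wages_without_outlier)
median_without_outlier = statistics.median(wages_without_outlier)

print(f"With outlier:    mean = {mean_with_outlier:.1f}k, median = {median_with_outlier}k")
print(f"Without outlier: mean = {mean_without_outlier:.1f}k, median = {median_without_outlier}k")
```

Removing the single large wage moves the mean from about 42.2k to 30.6k, while the median only moves from 33.5k to 30k.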

When our data is arranged in order of magnitude, the median is the middle score; if the number of scores is even, it is the average of the two middle scores. It is less affected by skewed data and outliers than the mean.

The mode is the most frequent score in our data set.

Let's see how we can get these measures using Python:

import statistics  # for calculating our mean, median and mode

def calc_measure():
    # Let's create a sample list
    num_list = [4, 2, 1, 9, 3, 6, 4]

    # we can calculate the measures here
    mean = statistics.mean(num_list)
    median = statistics.median(num_list)
    mode = statistics.mode(num_list)

    print(f"The mean is {mean}")
    print(f"The median is {median}")
    print(f"The mode is {mode}")

calc_measure()  # call the function to print the results


Challenge: implement mean, median and mode in Python without using any library
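
One possible approach to the challenge, offered as a sketch rather than the only answer. The function names are illustrative, and returning every value tied for most frequent is a design choice of this sketch.

```python
def mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2

def modes(values):
    """All values that occur most often, in order of first appearance."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

num_list = [4, 2, 1, 9, 3, 6, 4]
print(f"The mean is {mean(num_list)}")
print(f"The median is {median(num_list)}")
print(f"The modes are {modes(num_list)}")
```

Only built-in functions (sum, len, sorted, max) are used, so nothing needs to be imported.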


Re: Applied Statistics For Data Analysis/science by vakjay08(m): 12:31pm On Dec 20, 2020
well done boss
Re: Applied Statistics For Data Analysis/science by StevDesmond(m): 1:11pm On Dec 20, 2020
Re: Applied Statistics For Data Analysis/science by Kingray10: 11:45pm On Dec 21, 2020
I want to learn data analysis, please can you guide me through
I don't know where to start with.
It's kind of broad...
Re: Applied Statistics For Data Analysis/science by kingreign(m): 1:18pm On Feb 06

Hello, you tried contacting me via the mail feature. Pls contact me via my mobile number. 07034581213.
Re: Applied Statistics For Data Analysis/science by ibromodzi: 6:57am On Feb 22
Hmmm, it has been a while here.
Re: Applied Statistics For Data Analysis/science by ibromodzi: 7:00am On Feb 22
Kingray10:
I want to learn data analysis, please can you guide me through
I don't know where to start with.
It's kind of broad...

Where exactly do you need help? You can go through the Chronicles thread created by sir Ejiod, it has a load of resources that can help you. Meanwhile, you can send a pm via email if you want to have a discussion.


Nairaland - Copyright © 2005 - 2021 Oluwaseun Osewa. All rights reserved.