Re: Chronicle Of A Data Scientist/analyst by Dahyormine(m): 1:29pm On Sep 28 
Sellout: Who has taken the Microsoft Data Analyst Associate certificate? I'm planning on taking it and just want to know about it generally. Thank you.
How much is it?
Re: Chronicle Of A Data Scientist/analyst by Sellout: 6:17pm On Sep 28 
$165
Re: Chronicle Of A Data Scientist/analyst by Dahyormine(m): 6:34pm On Sep 28 
Sellout: $165
That's pretty cheap, but I don't know how recognized it is.
Re: Chronicle Of A Data Scientist/analyst by Sellout: 7:50pm On Sep 28 
Dahyormine: That's pretty cheap, but I don't know how recognized it is.
It is recognized, though it's just an entry-level certificate.
Re: Chronicle Of A Data Scientist/analyst by KlausMichaelson: 8:04pm On Sep 28 
Good day House.
I really need someone to enlighten me on the real-life application of data analysis. I know how to produce a good visualization for any data, and I can use the different functions in Excel to an extent. I can also use its statistical analysis tools, especially when analyzing a typical student's project.
But my question is, "Is that all?" If I actually get employed as a data analyst, do I just need to provide visualizations so the company can see how different factors (demand, supply, and so on) are playing out at the end of the month?
I honestly don't get the whole thing. The only applications I have come to realize so far are visualization and statistical analysis. Or is the statistical analysis all that matters? Things like ANOVA, regression, correlation, etc.?
Please, can someone tell me the real application of data analysis in real life? I'm very much confused. Thank you. 1 Like 
Re: Chronicle Of A Data Scientist/analyst by Marveaux(m): 8:49pm On Sep 28 
KlausMichaelson: Please can someone tell me the real application of Data Analysis in real life. I'm very much confused. Thank you.
I kind of feel that data on its own is really just noise. Data analysis involves extracting the hidden information within a set of related (or seemingly unrelated) data in order to make guided decisions in a particular setting. 2 Likes 
Re: Chronicle Of A Data Scientist/analyst by Hardheolar: 10:52pm On Sep 28 
Magma012: Today I start my journey in becoming a data analyst... so help me God
You have taken the first step; now do not relent, as it will get tiring at some point. Good luck on your journey.
Re: Chronicle Of A Data Scientist/analyst by Hardheolar: 10:54pm On Sep 28 
abdeiz: Really great stuff going on in this thread.
Lots of experienced advice, enquiries by newbies and so forth. What a thread.
I started my data analyst journey a few months ago, but juggling my job and learning has been tougher than I expected, hence the slow pace.
I was lucky, though, to get scholarships with Dataquest and educative.io. I'm currently enrolled in the Hamoye virtual internship, but there are a lot of question marks hanging over that.
I can't say I am competent yet, as I have low confidence doing projects to showcase my skills. Impostor syndrome, or whatever it is called. I need to take care of that pretty soon, because projects show what you can do and get you the job.
Namaste
Hi, which track are you on in Hamoye? 
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 10:54pm On Sep 28 
KlausMichaelson: Please can someone tell me the real application of Data Analysis in real life. I'm very much confused. Thank you.
From your write-up, I can deduce that two things are missing from your data analytics armoury. The first is theoretical knowledge of data analysis (most people don't take this seriously), and the second is the domain knowledge needed to navigate whatever data you are working on. You can't draw any meaningful conclusion from a dataset without these two.
Now, back to the question. Honestly, a simple Google search would have answered your questions, but I'll still try to break it down according to my own little understanding of data science.
Data collected in its raw form has no meaning until it is processed according to what the company requires, so that it can be used for decision making, which in turn helps the company grow. There are two important things here:
1. The company's requirement (i.e. why was the data collected in the first place?). Let me give you a couple of real-life examples:
a. The government wants to know whether a newly developed drug will be effective in the treatment of Covid-19. They do this by conducting a round of clinical trials on patients willing to participate. Remember why the clinical trials were conducted? To test the effectiveness of the drug.
b. A financial institution (a bank, for instance) wants to develop an advanced system for detecting fraudulent transactions among its customers. It does this by getting a large record of transactions that have taken place in the past. Again, do you remember why it needed the record of past transactions? I hope you are getting the gist by now.
2. The goal/objective of obtaining the data in the first place: to make decisions or draw conclusions from the data.
Going back to the examples above: the government can conclude whether or not the drug is effective based on the outcome of the clinical trial; the financial institution can spot fraudulent transactions using a model built from the data collected; a country can project its future population from the available data and make appropriate policies to address any resulting economic or social challenges; Netflix can recommend movies based on your history, which keeps you glued to it and eventually generates more revenue; and YouTube suggests videos based on your past activity, so you are tempted to watch more, which means more money for it.
So what's data analysis in real life all about? Three basic things:
1. Why are we collecting the data?
2. What conclusions (actionable insights) are we drawing from the analysis?
3. How do the conclusions affect the growth, or otherwise, of the company?
The takeaways...
When carrying out any analysis, ask yourself or the client these questions:
1. What is the objective of the analysis?
2. Do we have research questions we are trying to answer?
3. Do we have any research hypotheses (which eventually lead to conclusions)?
The answers to these are what your visualizations and statistical analysis tell you. The more experienced you are, the more accurately you are likely to follow these steps.
I hope you find this useful and grow with it. 21 Likes 4 Shares 
Re: Chronicle Of A Data Scientist/analyst by Hardheolar: 11:43pm On Sep 28 
ibromodzi:
The takeaways...
When carrying out any analysis, ask yourself or the client these questions; 1. What is the objective of the analysis? 2. Do we have research questions we are trying to answer? 3. Do we have any research hypothesis (which eventually leads to conclusion) ?
Your takeaway sums everything up. Just as my recent boss always says, most people don't ask the business what it wants to derive from the model or analysis; they just do what they think it wants, which is wrong. 1 Like 
Re: Chronicle Of A Data Scientist/analyst by KlausMichaelson: 5:13am On Sep 29 
ibromodzi:
I hope you find this useful and grow with it.
Sir, thank you for this. I really appreciate it. You nailed it! 1 Like 
Re: Chronicle Of A Data Scientist/analyst by BelieverDE: 8:24am On Sep 29 
ibromodzi:
a. The government wants to know whether a newly developed drug will be effective in the treatment of Covid19. So they do this by conducting a round of clinical trials on patients willing to participate. Remember why the clinical trials were conducted? To test the effectiveness of the drug. b. A financial institution (bank for instance) wants to develop an advanced system for detecting fraudulent transactions among their customers. They do this by getting a large record of transactions that have taken place in the past. Again, you remember why they needed the record of past transactions? I hope you are getting the gist by now?
You have spoken well. Thanks for your detailed explanation; I really picked up some insights from it.
I would like to ask, regarding the emboldened part:
a. In the first example, I suppose a paired-samples hypothesis test would give us the answer, right?
b. In the second, I couldn't think of any way of statistically analyzing the data. I guess that's where machine learning comes into play, since we want to predict an outcome.
Are my answers correct? 
Re: Chronicle Of A Data Scientist/analyst by Najdorf: 9:09am On Sep 29 
This is a bit unrelated, but is there any way of getting a Microsoft 365 activation key free or is it something you just have to buy? 
Re: Chronicle Of A Data Scientist/analyst by abdeiz(m): 9:34am On Sep 29 
Hardheolar:
Hi, which track are you on in Hamoye?
Data engineering. 
Re: Chronicle Of A Data Scientist/analyst by sonofakin: 12:47pm On Sep 29 
Can you please share some of the platforms where you get freelance jobs?
randomShek: YouTube and CourseDrive have everything you need.
I used the 365 Careers courses for SQL, Statistics and Intro to Python (available on CourseDrive), while I used the ExcelIsFun channel on YouTube for MS Excel and Power BI.
For Python libraries and machine learning, I just got started with Jose Portilla's Python bootcamp course.
About the job, I was given sample data to work on and submitted documentation on how I worked on it, what I used and why I used it, plus an interpretation of the visualizations. 
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 7:01pm On Sep 29 
BelieverDE:
a. In the example, I suppose a paired hypothesis test would give us the answer, right?
b. I couldn't think of any way of statistically analyzing the data. I guess that's where Machine Learning comes into play since we want to predict an outcome.
Are my answers correct?
I'm glad you were able to pick up some things from it. Frankly speaking, I asked myself whether I wasn't typing rubbish while trying to come up with a simple, yet wordy, answer to the question.
Pertaining to your question: yes, your answers are correct to some extent. I like the way you think.
In the first example, if the trial tests the new drug against a placebo on the same set of patients, a paired t-test applies, or a non-parametric alternative if the paired differences are not approximately normal. If separate groups of patients receive the drug and the placebo, an independent two-sample t-test is the counterpart; that one also assumes homogeneity of variance in its classic form.
The second example is predictive analytics, and that's where ML comes in. Logistic regression is a good algorithm to start with. 
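For anyone following along, here is a minimal sketch of the paired t-test statistic mentioned above, using only the Python standard library. The patient scores are made up for illustration, and the function name `paired_t_statistic` is hypothetical. It only computes the t statistic and degrees of freedom; for a p-value you would look up the t distribution or use a library routine such as `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    # A paired test works on the per-patient differences, not on the raw groups.
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Made-up scores for the same 5 patients under placebo and under the drug.
placebo = [10, 12, 9, 11, 13]
drug = [12, 15, 10, 14, 14]

t_stat, df = paired_t_statistic(placebo, drug)
print(f"t = {t_stat:.3f}, df = {df}")
```

With these numbers the differences are 2, 3, 1, 3, 1, so the mean difference is 2 and its standard deviation is 1. The statistic is then compared with the critical value of the t distribution for 4 degrees of freedom.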
Re: Chronicle Of A Data Scientist/analyst by BelieverDE: 8:16pm On Sep 29 
ibromodzi: In the first example, if the trial tests the new drug against a placebo on the same set of patients, a paired t-test applies, or a non-parametric alternative if the paired differences are not approximately normal.
Thanks, I'm glad I am on the right track.
Let's assume the data for a t-test, after checking kurtosis and skewness, is not normal. What alternative test do you think can be carried out for hypothesis testing? 
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 9:13pm On Sep 29 
BelieverDE: Let's assume the population of a T test, after checking for kurtosis and skewness, is not normal. What alternative test do you think can be carried out for hypothesis testing?
While some texts support using skewness and kurtosis to assess normality, the more widely accepted approach is a formal test (e.g. the Shapiro-Wilk test) or a visual check with Q-Q/P-P plots. That being said, the non-parametric alternative to a paired t-test is the Wilcoxon signed-rank test. 
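As a rough illustration of the Wilcoxon signed-rank test named above, the sketch below ranks the absolute paired differences and computes an exact two-sided p-value from the null distribution of the rank sums. It uses only the standard library, the data are made up, and the function name `wilcoxon_signed_rank` is hypothetical. In practice a library routine such as `scipy.stats.wilcoxon` would normally be used; this version is only meant to show the mechanics.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, exact two-sided p-value) for paired samples x and y.

    Zero differences are dropped and tied absolute differences share an average rank.
    """
    diffs = [b - a for a, b in zip(x, y) if b != a]
    n = len(diffs)

    # Rank the absolute differences, averaging the ranks of ties.
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2, so count how many sign patterns give each rank sum.
    counts = {0.0: 1}
    for r in ranks:
        updated = {}
        for total, c in counts.items():
            updated[total] = updated.get(total, 0) + c
            updated[total + r] = updated.get(total + r, 0) + c
        counts = updated

    w = min(w_plus, w_minus)
    tail = sum(c for total, c in counts.items() if total <= w)
    p_value = min(1.0, 2 * tail / 2 ** n)
    return w_plus, w_minus, p_value


# Made-up scores for the same 5 patients under placebo and under the drug.
placebo = [10, 12, 9, 11, 13]
drug = [12, 15, 10, 14, 14]

w_plus, w_minus, p_value = wilcoxon_signed_rank(placebo, drug)
print(w_plus, w_minus, p_value)
```

Here every patient improved, so all ranks are positive. Only 1 of the 32 possible sign patterns is that extreme in each direction, which gives a two-sided p-value of 2/32.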
Re: Chronicle Of A Data Scientist/analyst by wisemania(m): 11:24pm On Sep 29 
ibromodzi: That being said, an alternative, non-parametric test for a paired t-test will be the Wilcoxon paired signed-rank test.
Alright
Share if I'm not the only one lost here.
Let me know what you guys know
I've been found wanting in statistics for a while. Please, how can you help me?
Is there a practical Udemy course that explains these concepts in depth? 
Re: Chronicle Of A Data Scientist/analyst by lovelybobo: 1:11am On Sep 30 
4 Likes 
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 8:10am On Sep 30 
wisemania: Is there a practical udemy course that explains these concepts indepth?
Lol... You can try www.pythonfordatascience.org 
Re: Chronicle Of A Data Scientist/analyst by BelieverDE: 7:56pm On Sep 30 
ibromodzi: That being said, an alternative, non-parametric test for a paired t-test will be the Wilcoxon paired signed-rank test.
My God! When will I finish learning everything?!
I read this post in the morning and decided to do more research before asking for more clarity, but I only became more confused.
Different people on ResearchGate were proffering different answers on the best normality test. Some said that when the sample size is less than 50 we should use the Shapiro-Wilk test, and when it is greater than 50 we should use the Kolmogorov-Smirnov test. Then different researchers began talking about the sensitivity of the two tests, which confused me even more.
To me, if the sample size is greater than 50, why perform another test when, thanks to the Central Limit Theorem, you can already assume normality?
About two researchers went basic and talked about using a histogram. I've got no time for that, and would rather use the tests for skewness and kurtosis instead.
You must come from a Statistics/Mathematics background. What course would you suggest for me if I want to perform a paired t-test?
If the sample size is above 100, would you still use the Shapiro-Wilk test? If not, why? 
Re: Chronicle Of A Data Scientist/analyst by QueTeddy: 8:42pm On Sep 30 
I have been on this thread since day one and I'd like to say thanks to Ejiod for creating a wonderful thread.
So far, I started this year and am used to: Python, Pandas, Matplotlib, Excel and a bit of Power BI.
I'll drop a few analyses.
But I want to ask: I find Matplotlib and Plotly a bit boring and I prefer Bokeh. Is visualization just visualization, or does it require a particular library?
Another thing: if I learn Power BI, is it still necessary to learn Tableau? 1 Like 
Re: Chronicle Of A Data Scientist/analyst by dauddy97(m): 9:19pm On Sep 30 
QueTeddy: But I want to ask, I believe matplotlib and plotly are a bit boring. I prefer Bokeh but I want to ask, is visualization just visualization or it requires a particular library?
Visualization with pandas, Matplotlib or Bokeh is all the same thing. Some libraries just give you a better view or more features. Also, some libraries can create certain charts that others can't. But I believe that, since it's a programming language, you can always tweak it to build your own customized chart. I am open to correction. My 2 cents. 1 Like 
Re: Chronicle Of A Data Scientist/analyst by Ejiod(m): 9:25pm On Sep 30 
QueTeddy: Another thing, if I learn PowerBI is it still necessary I learn Tableau?
Nice keeping up with the learning path. Great feat. If I'm to advise, learn both. 1 Like 
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 9:36pm On Sep 30 
BelieverDE: You must come from a Statistics/Mathematics background, what course would you suggest for me if I want to perform a two paired T test?
If the sample size is above 100, would you still use the Shapiro Wilk test? If no, why?
That's the power of research. The more you read, the more information you have at your disposal.
Visual inspection of the data distribution (histogram, box plot, Q-Q plot, P-P plot) is informal, unreliable and does not guarantee that the distribution is normal. Nevertheless, presenting your data visually lets your audience judge the distribution for themselves.
There are many formal tests for normality besides the Shapiro-Wilk and KS tests. However, the former is recommended by many researchers regardless of the sample size, owing to the fact that the KS test is sensitive to extreme values.
You can read more about normality tests here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3693611/
@ the emboldened: I'm a final-year student of optometry and vision science. No statistics/math background, although I took some basic math/stat/CS courses in 100/200L. 1 Like 
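Since skewness and kurtosis came up earlier in the thread as a quick check, here is a minimal standard-library sketch of how those two numbers are computed from a sample. The function name and the data are made up for illustration, and this uses the simple moment-based formulas, which is only one of several conventions. It is a descriptive check, not a formal normality test; for the Shapiro-Wilk test a library routine such as `scipy.stats.shapiro` would be the usual route.

```python
import statistics


def skewness_and_excess_kurtosis(data):
    """Return (skewness, excess kurtosis) using moment-based formulas.

    A normal distribution has skewness 0 and excess kurtosis 0, so values far
    from 0 hint at non-normality. Small samples make both estimates noisy.
    """
    n = len(data)
    mean = statistics.mean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


# Made-up samples: one symmetric, one with a single large value on the right.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]

sym_skew, sym_kurt = skewness_and_excess_kurtosis(symmetric)
right_skew, right_kurt = skewness_and_excess_kurtosis(right_skewed)
print(sym_skew, sym_kurt)
print(right_skew, right_kurt)
```

The symmetric sample gives a skewness of 0, while the sample with the extreme value gives a clearly positive skewness, which is the kind of signal these statistics are meant to pick up.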
Re: Chronicle Of A Data Scientist/analyst by BelieverDE: 10:32pm On Sep 30 
ibromodzi: There are many formal tests that can be used for normality aside Shapiro Wilk and KS tests. However, the former is recommended by many researchers regardless of the sample size, owing to the fact that KS test is sensitive to extreme values.
I like you, brother!
I suppose I'll have to focus more on the Shapiro-Wilk test. There's no need to explore the KS test when it's sensitive to extreme values. I could have many rows of data, and I wouldn't want to worry that extreme anomalous values could affect my result. Thanks a lot, brother.
Cheers! 1 Like 
Re: Chronicle Of A Data Scientist/analyst by Hardheolar: 10:53pm On Sep 30 
abdeiz:
Data engineering
That's nice, data storytelling here. Sophia has been bagging all the awards in that track. 1 Like 1 Share 
Re: Chronicle Of A Data Scientist/analyst by ibromodzi: 3:04am On Oct 01 
BelieverDE: Thanks a lot, brother. Cheers!
You're welcome, bro. 
Re: Chronicle Of A Data Scientist/analyst by Amoto94(m): 9:29am On Oct 02 
We need 5 Big Data Engineers with strong hands-on experience in Hadoop V2, MapReduce, HDFS, Kafka, Spark, and Cassandra or MongoDB.
Send your CV to vacancies@tenece.com and CC dare.sunday@tenece.com
Remote job. 
Re: Chronicle Of A Data Scientist/analyst by SDJosh: 2:41am On Oct 04 
cochtrane:
The sections column is the section to which the post belongs. Could be politics, romance, etc. The frontpagedate is the date and time the post made the front page, while the posteddate is the date and time the thread was actually created. When you parse each title on the front page, you will see that it has a date and time associated with it. For the section and posteddate, you would have to go into the links and extract that. So, for each link you obtain, you go inside the link and you can obtain as much information as you want about that link. You could even obtain all the users who posted there and the time they posted.
Hi. Can you kindly explain the process of getting the frontpagedate and posteddate? I've been trying to replicate your work, but I'm having issues with the dates (specifically the format you used). 
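On the date-format problem, one possible approach is sketched below. It is not cochtrane's actual code, which is not shown in this thread. It assumes the timestamps look like the ones in the post headers here (for example "1:29pm On Sep 28"), which carry no year, so the year has to be supplied by the caller. The function name `parse_post_timestamp` and the year 2020 are hypothetical, and the month abbreviations are assumed to be English.

```python
from datetime import datetime


def parse_post_timestamp(text, year):
    """Parse a header timestamp such as '1:29pm On Sep 28' into a datetime.

    The header has no year, so the caller must pass one in (an assumption).
    """
    # Append the year so strptime sees a complete date.
    return datetime.strptime(f"{text.strip()} {year}", "%I:%M%p On %b %d %Y")


# Timestamps copied from post headers in this thread; the year is assumed.
posted = parse_post_timestamp("1:29pm On Sep 28", 2020)
later = parse_post_timestamp("3:04am On Oct 01", 2020)

print(posted.isoformat())
print(later.isoformat())
```

Once both the front-page date and the posted date are real `datetime` objects, they can be written out in whatever single format the dataset needs, for example with `isoformat()` or `strftime`.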
Re: Chronicle Of A Data Scientist/analyst by ashok1525: 7:27am On Oct 04 
