
What you must consider before becoming a Data Scientist - Programming - Nairaland

What you must consider before becoming a Data Scientist by Professor2196(m): 2:30am On May 14, 2022
I am a computer engineering undergrad, and I have been studying machine learning for a cumulative period of one year. I still consider myself a newbie, because every time I think I have reached the end of the road with a concept, I discover ten more roads waiting behind it, and so on. Through my relatively short journey I have gone through quite a few rough patches and gotten stuck in a few quagmires. I got through them with the help of the authors of the books I read, helpful answers from Stack Overflow, and pure grit. Along the way I have gained insights that I think will be helpful to fellow newbies who are not far behind me.

Passion isn’t everything. Just as in the real world, so it is in the programming world: there are challenges that can and will try their best to crush the life out of that passion. Most people who go into data science are mesmerized by what can be achieved with it and what has already been achieved with it, which leads them to drool at the prospect of what they themselves could achieve with it.

And as the Bible passage says, “Faith cometh by hearing, and hearing by the word of God”: their passion grows stronger the more they read articles about the amazing projects and developments going on in data science. Yes, they are prepared to study harder, sleep late, and do all the other things that people who are passionate about something do to reach a competent level; they have never been more passionate about anything else in their lives.

You may have gotten to the stage where you are now purely on the strength of that passion, but if you don’t find something else to support it, it will be nearly impossible to go much further.

Most newbies never once consider that their own computer could be their greatest challenge. Data science is the field of programming that places the greatest demands on your computer's processing capability. Other fields such as web, desktop (GUI) and database development are not heavily dependent on your RAM and processor speed; they feel near-instantaneous and can be done on a low-end computer just as easily as on a high-end one.

When I say low-end computers, I am not including machines with 16GB of RAM or more and a dedicated GPU; I mean computers with no more than 8GB of RAM and 1GB or less of GPU memory. Over 80% of developers in Nigeria use computers with these specs, and machines with 4GB of RAM and no GPU are the overwhelming majority.

Throughout my journey studying machine learning, my faithful companion has been a Dell laptop running Windows 10 with 4GB of RAM, a 2nd-generation Intel Core i5 processor and no GPU. The beginning of the journey was extremely smooth: training on the Iris and Boston Housing datasets was near-instantaneous (or close enough to it). It was amazing. All my datasets could fit into memory, and any error that occurred could easily be debugged; more often than not it was a syntax error or an improperly formatted dataset containing values that trip up the machine learning model (as I said, easily overcome). But as my expertise grew, one particular error became increasingly frequent: the NumPy out-of-memory error. I could no longer run code written by authors without triggering it.

Obviously, the authors use computers with much more RAM and, I bet, additional GPUs to execute their code.

The out-of-memory error became so ubiquitous that we became old friends. It got to the point that whenever I ran code from a textbook and it didn't raise the error, I became suspicious and wondered whether I had made a mistake in implementing the author's code.

This was an almost insurmountable obstacle for me, to the extent that it forced me to give up on projects whose datasets couldn't all be loaded into memory because some machine learning models need them all at once (e.g. non-linear models), or whose dataset could be loaded into memory but left no space for the model to train its parameters (which is even worse).

The spirit (your passion) indeed is willing, but the flesh (your computer) is weak. - Matt 26:41

This error forced me to gain a deeper understanding of memory management. I no longer treat data types as an afterthought; I now ask which data type with the smallest memory footprint will best represent my dataset with minimal loss of information. I also adopted a functional programming style (I run instructions inside functions and return only the needed value(s), leaving Python to free the variables created during execution once they go out of scope), so that unused variables are not left around hogging memory.
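To make that concrete, here is a minimal sketch of both habits; the array sizes, class count and function names are made up for illustration, not taken from any particular project.

import numpy as np

# A one-hot matrix only ever holds 0s and 1s, so uint8 (1 byte per cell)
# represents it losslessly; the default int64 would use 8 times the memory.
def one_hot(labels, num_classes):
    encoded = np.zeros((len(labels), num_classes), dtype=np.uint8)
    encoded[np.arange(len(labels)), labels] = 1
    return encoded

def standardise(raw):
    # Do the work inside a function: intermediates like `centered` go out
    # of scope on return, so Python can reclaim their memory.
    centered = raw - raw.mean(axis=0)
    return (centered / raw.std(axis=0)).astype(np.float32)  # float32, not float64

labels = np.random.randint(0, 10, size=1_000_000)
print(one_hot(labels, 10).nbytes / 1e6, "MB")  # about 10 MB instead of 80 MB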

Thank GOD for deep learning algorithms, which for all their power are not greedy enough to want all the data at once but are always hungry for chunks of it (no matter the size), and for the TensorFlow data API and NumPy's memmap function, which provide the means to feed them such chunks.
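For anyone curious, here is a rough sketch of how that chunk feeding can work; the file names, shapes and batch size are placeholders, and the on-disk arrays are assumed to have been written beforehand.

import numpy as np
import tensorflow as tf

# The arrays live on disk; np.memmap only pulls the slices you touch into RAM.
features = np.memmap("features.dat", dtype=np.float32, mode="r",
                     shape=(1_000_000, 64))
labels = np.memmap("labels.dat", dtype=np.uint8, mode="r",
                   shape=(1_000_000,))

def generate_rows():
    for i in range(len(labels)):
        yield features[i], labels[i]

# tf.data streams the rows in small batches, so the model never needs the
# whole dataset in memory at once.
dataset = (tf.data.Dataset.from_generator(
               generate_rows,
               output_signature=(
                   tf.TensorSpec(shape=(64,), dtype=tf.float32),
                   tf.TensorSpec(shape=(), dtype=tf.uint8)))
           .batch(256)
           .prefetch(tf.data.AUTOTUNE))

# model.fit(dataset, epochs=5)  # any tf.keras model can consume this directly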

Believe me, if you persevere, you will become a more tenacious machine learning engineer than your peers who cruise along with their huge-RAM computers, creating one-hot encoded arrays with an np.int64 data type, or worse, not knowing what data type was used to build the array at all.

And should you eventually acquire a computer with plenty of RAM, you will still maintain that frugality in how you spend it.

I urge all machine learning beginners to look within themselves carefully and consider whether they have the grit needed to forge on when their passion no longer pulls them forward. Don't waste months pushing ahead only to back down when faced with an overwhelming obstacle because you thought you could fake it; those are months that could have been spent learning another field of programming (like web development). If you read this and still think you have what it takes, then forge on, and see every obstacle as an opportunity to learn a new technical skill. Use tools in ways they weren't designed to be used, and if that still doesn't work, build your own toolset.

God loves you!

3 Likes

Re: What you must consider before becoming a Data Scientist by willian10: 3:56am On May 14, 2022
Professor2196:
Is it more difficult than web development? Because I just started learning Python a few weeks ago and I'm enjoying the lectures, at least the few things I have learnt so far.
Re: What you must consider before becoming a Data Scientist by sheddysk: 12:18pm On Apr 03
Diving into data science in Nigeria, or anywhere, can feel like trying to learn a new language overnight. It’s thrilling, sure, but packed with its fair share of “Oh no, what did I get myself into?” moments. Let’s unravel some common hurdles beginners face and, more importantly, how to overcome them. Click here to read the full article: https://datasetnexustech.com/data-science-in-nigeria-common-hurdles-for-beginners/
Re: What you must consider before becoming a Data Scientist by shreygautam: 7:14am On Apr 05
Building a machine learning model involves several key steps. First, you need to define your problem and gather relevant data. Next, you must preprocess the data, which includes handling missing values, encoding categorical variables, and scaling numerical features. After preprocessing, you split the data into training and testing sets.

The next step is to choose a suitable model for your problem and dataset. This involves selecting the algorithm and its hyperparameters. Once the model is chosen, you train it using the training data. Training involves feeding the model the input data and the correct output labels, allowing it to learn the underlying patterns in the data.

After training, you evaluate the model using the testing data to assess its performance. This step helps you understand how well the model generalizes to new, unseen data. Finally, you can fine-tune the model by adjusting hyperparameters or trying different algorithms to improve its performance.

In conclusion, building a machine learning model involves defining the problem, gathering and preprocessing data, choosing and training a model, evaluating its performance, and fine-tuning it for better results. This process is foundational to the field of data science and to any machine learning certification.
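A compact sketch of those steps with scikit-learn might look like the following; the file name, column names and the choice of a random forest are placeholders for illustration only.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("data.csv")                  # hypothetical dataset
X, y = df.drop(columns="target"), df["target"]

numeric = ["age", "income"]                   # placeholder column names
categorical = ["city"]

# Preprocess: impute missing values, scale numeric features, one-hot encode categoricals
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("encode", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

model = Pipeline([("prep", preprocess),
                  ("clf", RandomForestClassifier(random_state=0))])

# Split, then tune hyperparameters on the training set only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
search = GridSearchCV(model, {"clf__n_estimators": [100, 300]}, cv=5)
search.fit(X_train, y_train)

# Evaluate how well the tuned model generalises to unseen data
print("test accuracy:", search.score(X_test, y_test))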
