What is Big Data?
What are the skills required for a Data Analyst or a Data Scientist?
The present era is variously called the digital era, the era of information explosion, or the ICT era (ICT stands for Information and Communications Technology). Whatever the label, it is certainly the era of Big Data. People working in the Big Data industry agree that “Today’s Big Data is not tomorrow’s Big Data”. So explaining the concept of Big Data to beginners on their very first day is always a challenge.
I have been teaching a course on Data Mining to M.Sc. Statistics students for the last ten years, and I explain the concept in very simple words. I take this opportunity to share my way of demonstrating it through this blog.
How big is Big Data?
- Today, almost every science student with at least a 12th standard qualification is familiar with Microsoft Excel. Recent versions of MS-Excel allow 1,048,576 rows and 16,384 columns per worksheet, and a workbook can hold many such sheets. In my view, data that needs more space than this may be treated as Big Data. I think this is sufficient to get a first idea.
- Thanks to the advent of ICT, networking, expanding hardware capabilities, and low-cost data storage, people in almost all sectors now give great importance to storing, managing, and updating data. In fact, people treat data as their most valuable asset, above real estate, gold, silver, cash, and shares.
- In the old days, some 20 years ago, the scope of data was restricted to integer, floating-point, character, date, and string types. Nowadays the scope of data is much wider. Apart from these traditional data types, audio, video, images, animated images, and voice are also treated as data. Moreover, language translation software dissolves language barriers. An Excel sheet is not suitable for these new data types, so traditional data analysis concepts are insufficient to address the problems of the industry concerned.
- Constant innovation in the Internet of Things (IoT) field has produced many handy, simple, low-cost tools. This class of tools includes mobile phones, cameras, sensors, weather mapping stations, etc. These tools capture real-time data quite efficiently.
- The data is continually updated and integrated, and data analysts look for something new in the same data every time. When the problems of the industry take a new shape, there is scope for data scientists to build new algorithms. This cycle continues, and new concepts like image processing, neural networks, deep learning, AI, ML, classification, and so on have emerged. Thus data acquisition, industry problems, data management issues, data analytics limitations, data science challenges, and hardware issues are all growing rapidly, and hence the requirement for skilled manpower in this field is unending. In my opinion, in the next 50 years there will be sexier jobs in this field than in any other, and if someone asks me about the 50 years beyond that, my reply will be the same.
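The Excel rule of thumb from the first point above can be written down as a tiny check. This is only a sketch of that rule, assuming the documented per-worksheet limits of 1,048,576 rows and 16,384 columns; the function name and the dataset shapes are hypothetical, chosen for illustration.

```python
# Per-worksheet limits of recent Excel versions (.xlsx format).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLS = 16_384


def fits_in_excel_sheet(n_rows: int, n_cols: int) -> bool:
    """Return True if a table of this shape fits in a single Excel worksheet."""
    return n_rows <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS


# Hypothetical dataset shapes, for illustration only.
print(fits_in_excel_sheet(50_000, 20))     # a small survey table
print(fits_in_excel_sheet(5_000_000, 20))  # e.g. a long sensor log
```

By this rule of thumb, a table that fails the check is already "big" for a beginner's purposes. The check only looks at rows and columns, so it says nothing about images, audio, or video, which do not fit the spreadsheet model at all.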
Getting into big data and analytics
The skill sets required of budding data analysts or data scientists are one of the most important issues for discussion. In fact, anyone with the following abilities, combined with practice and hard work, can build his/her identity as a Data Scientist.
- Mathematical ability: This is one of the basic requirements, because there is always a need to convert the industry problem into a mathematical setup. Once the problem is expressed mathematically, the next step is to solve it and come up with an acceptable solution. Both steps demand keen mathematical thinking. To quote one example, Linear Algebra is one of the most important courses for AI and ML, because it gives a beginner the vision to look into higher-dimensional problems.
- Statistical Thinking: Almost every industry problem involves uncertainty. A probabilistic or Bayesian approach is required to solve problems involving uncertainty. Similarly, predictive modeling requires knowledge of regression analysis. In the same way, every problem needs a sound statistical background.
- Programming Skill: As the data is big, manual analysis is practically impossible, so the ability to operate software, together with programming knowledge, is an integral part of this system. Every algorithm has a life cycle, so at some point it needs to be modified, upgraded, or scrapped and written anew. The expansion in data types means that the programmer should have sufficient knowledge of graphics and of programming with multi-platform integration.
- Communication: This is often referred to as ‘Storytelling’. The whole process, the findings, and the prospective business strategy that emerge from the data science process need to be summarized in language that reads like a story. Telling and listening to a story is engaging for both sides and gives even a layman a better understanding.
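As a small illustration of how the mathematical, statistical, and programming skills above meet, here is a minimal sketch of simple linear regression fitted by ordinary least squares. It uses only plain Python; the function name `fit_line` and the data are made up for this example and are not taken from any real study.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted y for a new x = 6
print(slope, intercept, prediction)
```

Real data is noisy, so the fitted line will not pass through every point. Judging how far to trust such a prediction is where statistical thinking comes in.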
In conclusion, anyone with a deep interest in Data Science can pursue it with great zeal. With all my best wishes to budding Data Scientists, I conclude my thoughts here.
We at beingdatum also publish content on hands-on technical problems; one such great read covers working with JSON schema in big data using the Spark API.
Keep following our posts at beingdatum.com, cheers!!