What is the starting point for Big Data? Where can I start? How do I start?
Many people ask these questions before they begin working with Big Data, and anyone starting out should have answers to them.
Big Data is all about analysing patterns in data sets of variable size. Variable size refers to the growing size of data sets, i.e. more data is added from more sources as needs grow. We use advanced analytics to find the patterns in these data sets.
Large enterprises use big data analysis to their advantage, to understand the patterns and behaviour of data coming from different sources. A beginner, however, should start small.
Big data is characterised by three “Vs” (volume, velocity and variety) plus a “C” (complexity). Let’s discuss the points to consider when evaluating the business opportunities for a big data initiative.
Aim: Before jumping into solving any big data problem, we should step back and invest time and effort in understanding the problem, i.e. what we are trying to solve. Then we can move step by step towards the solution.
Step-by-step approach: The rule of thumb for analysing big data problems is to approach them step by step. First split the whole problem into several smaller problems, then tackle each part in turn.
Collecting the data: The first step in starting with big data is to collect the data that is being produced. Collect more data than you think is necessary. We don’t need to keep this data forever, but we won’t learn anything about it until we start collecting it.
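One way to "collect more than necessary" without keeping it forever is to store every raw record with a timestamp and periodically prune anything older than a retention window. The sketch below assumes an in-memory list and a 90-day retention period purely for illustration. `RawStore` and `Record` are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


@dataclass
class Record:
    source: str
    payload: Dict[str, Any]
    collected_at: datetime


@dataclass
class RawStore:
    """Hypothetical append-only store that keeps raw data for a limited time."""
    retention: timedelta
    records: List[Record] = field(default_factory=list)

    def collect(self, source: str, payload: Dict[str, Any], now: datetime) -> None:
        # Keep the full payload: we don't know yet which fields will matter.
        self.records.append(Record(source, payload, now))

    def prune(self, now: datetime) -> int:
        """Drop records older than the retention window; return how many were removed."""
        cutoff = now - self.retention
        kept = [r for r in self.records if r.collected_at >= cutoff]
        removed = len(self.records) - len(kept)
        self.records = kept
        return removed


# Usage with an assumed 90-day retention window.
start = datetime(2024, 1, 1, tzinfo=timezone.utc)
store = RawStore(retention=timedelta(days=90))
store.collect("web", {"page": "/home", "ms": 120}, now=start)
store.collect("app", {"event": "login"}, now=start + timedelta(days=100))
removed = store.prune(now=start + timedelta(days=100))
```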
3Vs: The three Vs of Big Data are volume, velocity and variety. Volume refers to the amount of data generated, velocity refers to how fast the data is generated and processed to meet demand, and variety refers to the range of data types and sources. A data analyst must know and be aware of the types of data being handled.
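To make the three Vs concrete, the sketch below computes rough figures for one batch of records. It measures volume as the serialised size in bytes, velocity as records per second over the batch's time span, and variety as the number of distinct source and field-layout combinations. It assumes each record is a dict with an ISO-8601 `ts` field and a `source` field. Those field names and the function `three_vs` are hypothetical.

```python
import json
from datetime import datetime
from typing import Any, Dict, List


def three_vs(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Rough volume/velocity/variety figures for a batch of records.

    Assumes each record has an ISO-8601 'ts' timestamp and a 'source' field.
    """
    if not records:
        return {"volume_bytes": 0, "velocity_per_s": 0.0, "variety": 0}

    # Volume: total size of the batch when serialised as JSON, in bytes.
    volume_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in records)

    # Velocity: records per second over the time span the batch covers
    # (undefined for a batch that spans zero seconds, reported as NaN).
    times = sorted(datetime.fromisoformat(r["ts"]) for r in records)
    span = (times[-1] - times[0]).total_seconds()
    velocity = len(records) / span if span > 0 else float("nan")

    # Variety: number of distinct (source, set of fields) combinations.
    shapes = {(r["source"], tuple(sorted(r.keys()))) for r in records}

    return {"volume_bytes": volume_bytes, "velocity_per_s": velocity, "variety": len(shapes)}


batch = [
    {"ts": "2024-01-01T00:00:00+00:00", "source": "web", "page": "/home"},
    {"ts": "2024-01-01T00:00:01+00:00", "source": "app", "event": "login"},
    {"ts": "2024-01-01T00:00:02+00:00", "source": "web", "page": "/cart"},
]
stats = three_vs(batch)
print(stats)
```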
Complexity: Managing data can become very complex when large amounts of different types of data come from different sources. During analysis, the data must be linked and correlated so that the analyst can find useful information.
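As an illustration of linking and correlating, the sketch below joins two sources on a shared customer id and then measures how strongly the two quantities move together. It assumes a hypothetical web-visit log and a hypothetical sales table. The join keeps only ids present in both sources (an inner join), and the coefficient is the standard Pearson correlation, written out by hand so it needs no third-party library.

```python
from math import sqrt
from typing import Dict, List, Tuple


def link(visits: Dict[str, int], purchases: Dict[str, float]) -> List[Tuple[str, int, float]]:
    """Inner-join two sources on a shared customer id (ids present in both)."""
    shared = sorted(set(visits) & set(purchases))
    return [(cid, visits[cid], purchases[cid]) for cid in shared]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series (assumes neither is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical data from two different systems.
web_visits = {"c1": 3, "c2": 10, "c3": 6, "c9": 4}         # from web logs
sales = {"c1": 30.0, "c2": 95.0, "c3": 61.0, "c7": 12.0}   # from a billing system

linked = link(web_visits, sales)  # c9 and c7 appear in only one source
r = pearson([v for _, v, _ in linked], [s for _, _, s in linked])
print(linked, round(r, 3))
```

Records that cannot be linked (here `c7` and `c9`) are dropped by the inner join. In practice they are often worth inspecting separately, because they can reveal gaps between sources.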
Grouping of data: We need to categorise the data according to its logical meaning. For example, data useful for business purposes should go in one cluster, and data useful for improving quality should go in another. The data should be well categorised and prioritised.
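A simple rule-based version of this grouping is sketched below. Each record is assigned to a cluster based on which marker fields it contains, and the clusters are returned in priority order. The category names, marker fields and priorities are hypothetical examples. A real project would derive them from its own business and quality goals.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical rules: fields that mark a record as useful for a given purpose.
CATEGORY_RULES = {
    "business": {"order_id", "revenue", "customer_id"},
    "quality": {"defect_code", "error", "latency_ms"},
}
# Hypothetical priorities: lower number = handled first.
PRIORITY = {"business": 1, "quality": 2, "uncategorised": 3}


def categorise(record: Dict[str, Any]) -> str:
    """Return the first category whose marker fields appear in the record."""
    for category, markers in CATEGORY_RULES.items():
        if markers.intersection(record):
            return category
    return "uncategorised"


def group_and_prioritise(records: List[Dict[str, Any]]) -> List[Tuple[str, List[Dict[str, Any]]]]:
    """Group records into clusters and order the clusters by priority."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[categorise(record)].append(record)
    return sorted(groups.items(), key=lambda item: PRIORITY[item[0]])


records = [
    {"order_id": 1, "revenue": 20.0},
    {"defect_code": "D7", "line": 3},
    {"note": "free text"},
    {"customer_id": "c2", "error": "timeout"},
]
clusters = group_and_prioritise(records)
for category, members in clusters:
    print(category, len(members))
```

A record that matches several rules (the last one above) goes to the first matching category, so the order of `CATEGORY_RULES` also acts as a tie-breaker.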
Volume: As the name suggests, ‘Big Data’ implies a large amount of data, but at the start we shouldn’t assume the data will be measured in petabytes or exabytes. Apart from some Fortune 100 customers, most organisations don’t have that much information. So initially we will probably be working with gigabytes of data.
Future perspective: Suppose your company doesn’t need a big data solution now because your database can handle the current amount of data. Even so, when you try to extract useful information from this data, you may not get the results you expect. That is where a big data initiative can still be worthwhile.
Impact: Consider the impact of the solution on the business and the organisation. Try to answer the question: “How will this data analysis affect the business and the organisation?”
Selecting the right technologies: Choose technologies that suit your needs. The best-known technology for big data is Hadoop, but there are many other tools and technologies for big data problems.