Picture all the data flowing over the web: structured data generated by business transactions, varied content on millions of websites, sense and nonsense on social media, log files recording site visits and visitor actions, and data detected by the sensors attached to machines. Big data analysis seeks to capture, cleanse, mine and analyze all these data to generate meaningful insights.
Business Benefits from Big Data Analysis
Businesses can make a killing if they can identify consumer preferences as revealed by all the things we do online. We already see ads that seem uncannily targeted at us. If we click an ad for a particular product, ads for that product pop up again and again while we browse the web. What happens is that the click is remembered, typically in a cookie stored by the browser, and ad servers use this click data to serve ads based on our past actions.
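The click-based targeting described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory click log keyed by a visitor ID (such as one stored in a cookie). The names `click_log`, `record_click` and `choose_ad` are invented for illustration and do not reflect how any real ad server works.

```python
from collections import Counter, defaultdict

# Hypothetical click log: visitor ID (e.g. from a cookie) -> count of clicks per product category.
click_log = defaultdict(Counter)

def record_click(visitor_id, category):
    """Remember that this visitor clicked an ad for the given category."""
    click_log[visitor_id][category] += 1

def choose_ad(visitor_id, default="general"):
    """Pick the category the visitor clicked most often, or a default ad for unknown visitors."""
    clicks = click_log.get(visitor_id)  # .get() does not create an entry for unknown visitors
    if not clicks:
        return default
    return clicks.most_common(1)[0][0]

record_click("visitor-42", "running shoes")
record_click("visitor-42", "running shoes")
record_click("visitor-42", "headphones")

print(choose_ad("visitor-42"))  # running shoes
print(choose_ad("visitor-99"))  # general
```

A visitor with no recorded clicks gets the default ad, while a returning visitor keeps seeing the category clicked most often, which is the "popping up again and again" effect.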
Big data can help businesses improve their marketing, customer service, operational efficiencies and competitive advantage.
As for us, the consumers, we stand to lose much of our privacy. Every click, post, comment and purchase becomes "common knowledge" for the machines that show us ads!
Big Data Analysis: Processes and Tools
Advanced analytic tools capture, clean, analyze and interpret data, and deliver useful information or predictions to decision makers. A big data project will broadly involve:
- Clarifying business goals: what business outcome do you want?
- Identifying the concrete actions that the business can take to achieve the outcome.
- Determining the data sources, and how to collect data from these.
- Capturing the data and storing them in databases that can accommodate all kinds of raw data.
- Cleaning the data to eliminate useless junk.
- Extracting the raw data, transforming them and loading them into structured databases.
- Creating models that reveal what the data say about how to obtain the desired outcomes.
- Presenting insights, information and predictions in a visual or other easily understood format.
- Checking whether the model is working as intended in predicting and enabling desired outcomes.
- Fine-tuning the model.
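The cleaning, extract-transform-load, modeling and checking steps above can be sketched end to end on a toy scale. This is a minimal illustration under stated assumptions, not a real big data pipeline: the records, the `visits` table, and the threshold "model" are all hypothetical, an in-memory SQLite database stands in for the structured database, and the model is checked on the same rows it was built from only to keep the sketch short.

```python
import sqlite3
from statistics import mean

# Hypothetical raw records as they might arrive from a web log; some are junk.
raw_records = [
    {"visitor": "a", "pages_viewed": "5", "purchased": "yes"},
    {"visitor": "b", "pages_viewed": "1", "purchased": "no"},
    {"visitor": "", "pages_viewed": "3", "purchased": "no"},     # missing visitor ID
    {"visitor": "c", "pages_viewed": "n/a", "purchased": "no"},  # unparseable count
    {"visitor": "d", "pages_viewed": "7", "purchased": "yes"},
    {"visitor": "e", "pages_viewed": "2", "purchased": "no"},
]

def clean(records):
    """Cleaning: drop records with no visitor ID or a non-numeric page count."""
    return [r for r in records if r["visitor"] and r["pages_viewed"].isdigit()]

def transform(records):
    """Transforming: convert raw strings into typed tuples ready for loading."""
    return [(r["visitor"], int(r["pages_viewed"]), r["purchased"] == "yes") for r in records]

def load(rows):
    """Loading: store the typed rows in a relational table (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (visitor TEXT, pages_viewed INTEGER, purchased INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn

def fit_threshold(conn):
    """Modeling: a toy rule that predicts a purchase when pages viewed exceeds
    the midpoint between the buyers' and non-buyers' average page counts."""
    buyers = [p for (p,) in conn.execute("SELECT pages_viewed FROM visits WHERE purchased = 1")]
    others = [p for (p,) in conn.execute("SELECT pages_viewed FROM visits WHERE purchased = 0")]
    return (mean(buyers) + mean(others)) / 2

def accuracy(conn, threshold):
    """Checking: share of visits where the rule's prediction matches what happened."""
    rows = conn.execute("SELECT pages_viewed, purchased FROM visits").fetchall()
    return sum((pages > threshold) == bool(bought) for pages, bought in rows) / len(rows)

conn = load(transform(clean(raw_records)))
threshold = fit_threshold(conn)
print(threshold)                  # 3.75
print(accuracy(conn, threshold))  # 1.0
```

Fine-tuning would then mean adjusting the rule (or replacing it with a better model) and re-running the check, ideally on data the model has not seen.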
What are the tools that help in the analysis?
- Hadoop: A software platform that can store and process massive amounts of data in a distributed fashion over a computer cluster. Data from sensors, website clicks and all kinds of sources can be imported into Hadoop in a raw format. This data can then be extracted, transformed and loaded into, say, a relational database for easier processing.
- Data Cleaners: Tools like Apache NiFi can clean data before ingestion into the Hadoop file system. There are also tools like OpenRefine that can cleanse huge volumes of messy data.
- Data Mining: Data mining can reveal patterns not known before and is critical to gaining valuable insights from the mass of data. Tools like Apache Mahout help mine the data in the Hadoop system.
- Data Analysis: Instead of revealing unknown patterns, data analysis involves asking specific questions and analyzing the data to find answers to these questions.
- Visualization: Information presented as a mass of numbers takes time to understand. The same information presented in the form of maps, charts and other visuals can be understood quickly. Hadoop can work with common tools like MS Excel, or you can use specialized visualization tools like Tableau.
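The difference between data mining and data analysis in the list above can be shown with a small sketch. It assumes a hypothetical list of shopping baskets and uses plain Python rather than Mahout or any Hadoop tool. `frequent_pairs` looks for patterns nobody asked about in advance, while `share_containing` answers one specific question.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per order.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"coffee", "milk"},
    {"bread", "butter", "milk"},
    {"coffee", "milk", "sugar"},
]

def frequent_pairs(baskets, min_count=2):
    """Data mining: discover product pairs bought together at least min_count times."""
    counts = Counter()
    for basket in baskets:
        # Sort so that each pair is always counted under the same (a, b) ordering.
        counts.update(combinations(sorted(basket), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

def share_containing(baskets, product):
    """Data analysis: answer a specific question, what share of orders include this product?"""
    return sum(product in basket for basket in baskets) / len(baskets)

print(frequent_pairs(baskets))            # {('bread', 'butter'): 3, ('coffee', 'milk'): 2}
print(share_containing(baskets, "milk"))  # 0.6
```

The mining step surfaces the bread-and-butter and coffee-and-milk pairings without being told to look for them; the analysis step only reports on the product it was asked about.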
As Ginni Rometty, CEO of IBM, says: “Many more decisions will be based on predictive elements versus gut instincts.” Use that as your selling point and sell data services to smaller businesses that cannot afford to hire highly paid data scientists and analysts.