Understanding Big Data
The Berkeley researchers estimated that the world had produced about 1.5 billion gigabytes of information in 1999, and a 2003 replication of the study found that the amount had roughly doubled in three years. Even then, data was clearly getting bigger and bigger.
According to the World Economic Forum, we created 44 zettabytes of data in 2020 and 74 zettabytes in 2021. By 2025, it’s estimated that 463 exabytes of data will be created each day globally. That’s the equivalent of 212,765,957 DVDs per day!
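(For a rough sense of the DVD comparison, assuming a standard 4.7 GB single-layer DVD: one exabyte alone is 10^18 bytes, and 10^18 ÷ (4.7 × 10^9) ≈ 212,765,957 DVDs.)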
But how do we generate so much data? You check your email, send replies, browse websites, and click on things. Every move you make online creates data.
In 2021, some 3,026,626 emails were sent every second; Facebook users sent 31 million messages every minute; more than 2.5 billion blog posts went up over the course of the year; Google handled on the order of 2 trillion searches; and WhatsApp users shared 41,666,667 messages every minute.
As the number of devices connected to the network increases, the amount of data produced will increase even more.
How can we define big data while the amount of data produced every year increases so much?
What is Big Data?
The first documented use of the term “big data” appeared in a 1997 paper by scientists at NASA. They had a problem with visualization: data sets were generally quite large, taxing the capacities of main memory, local disk, and even remote disk. They called this the problem of big data.
Here’s how the OED (Oxford English Dictionary) defines big data: “data of a very large size, typically to the extent that its manipulation and management present significant logistical challenges.”
According to Wikipedia, big data refers to data sets that are too large or complex to be dealt with by traditional data-processing application software.
The Berkeley researchers say: big data isn’t a concept. It’s a problem to solve.
Big Data Sources
Social data comes from the Likes, Tweets & Retweets, Comments, Video Uploads, and general media that are uploaded and shared via the world’s favorite social media platforms.
Machine data is information generated by industrial equipment, sensors installed in machinery, and even web logs that track user behavior. Sources include medical devices, smart meters, road cameras, satellites, games, and the rapidly growing number of IoT devices.
Transactional data is generated from all the daily transactions that take place both online and offline: invoices, payment orders, storage records, and delivery receipts.
Characteristics of Big Data
According to Gartner, big data comprises high-volume, high-velocity, and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insights and decision-making.
However, more recent treatments add two further components that describe big data: veracity and value.
Volume: the size and amounts of big data that companies manage and analyze.
Velocity: the speed at which companies receive, store and manage data.
Variety: the diversity and range of different data types.
Veracity: the “truth” or accuracy of data and information assets, which often determines executive-level confidence.
Value: the value of big data usually comes from insight discovery and pattern recognition that lead to more effective operations, stronger customer relationships and other clear and quantifiable business benefits.
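To make these a little more concrete, here is a minimal sketch, in Python with entirely made-up records and field names, of how a team might put crude numbers on volume, velocity, variety, and veracity for a small batch of incoming events. Value is the one V that only shows up downstream, in the decisions the data ends up supporting.

```python
import json

# A hypothetical batch of raw events collected over a 10-second window.
raw_events = [
    '{"source": "web",    "user": "alice", "bytes": 1200}',
    '{"source": "mobile", "user": "bob",   "bytes": 800}',
    '{"source": "sensor", "temp_c": 21.4,  "bytes": 64}',
    'not-valid-json',  # a malformed record that drags veracity down
]
window_seconds = 10

parsed, malformed = [], 0
for line in raw_events:
    try:
        parsed.append(json.loads(line))
    except json.JSONDecodeError:
        malformed += 1

volume_bytes = sum(e.get("bytes", 0) for e in parsed)    # Volume: how much data arrived
velocity_eps = len(raw_events) / window_seconds          # Velocity: events per second
variety = {e.get("source", "unknown") for e in parsed}   # Variety: distinct kinds of sources
veracity = 1 - malformed / len(raw_events)               # Veracity: share of well-formed records

print(f"volume={volume_bytes} B, velocity={velocity_eps:.1f} events/s, "
      f"variety={sorted(variety)}, veracity={veracity:.0%}")
```

In real systems these measurements come from monitoring and data-quality tooling rather than a hand-rolled loop, but the idea is the same: each V can be observed and tracked, not just listed.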
Types of Big Data
We can distinguish three main types based on how the data is organized.
Structured data has certain predefined organizational properties and is present in structured or tabular schema, making it easier to analyze and sort.
Unstructured data entails information with no predefined conceptual definitions and is not easily interpreted or analyzed by standard databases or data models.
Semi-structured data is a hybrid of structured and unstructured data. This means that it inherits a few characteristics of structured data but nonetheless contains information that fails to have a definite structure and does not conform with relational databases or formal structures of data models.
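To make the distinction concrete, here is a minimal sketch (in Python, with made-up records) of what each type can look like in practice:

```python
import json

# Structured: a fixed, predefined schema that fits naturally into a table or relational database.
structured_row = {"order_id": 1001, "customer": "Alice", "amount": 49.90, "currency": "EUR"}

# Unstructured: free-form content with no predefined fields, such as text, images, audio, or video.
unstructured_text = "Loved the fast delivery, but the packaging was slightly damaged. Would order again!"

# Semi-structured: self-describing keys and nesting, but no rigid relational schema;
# two documents in the same collection may carry different fields.
semi_structured_doc = json.loads("""
{
  "user": "alice_92",
  "likes": 42,
  "comments": [
    {"author": "bob", "text": "Nice post!"},
    {"author": "carol", "text": "Where was this taken?", "location": null}
  ]
}
""")

print(structured_row["amount"], len(unstructured_text), semi_structured_doc["comments"][0]["author"])
```

The structured row drops straight into a SQL table, the free-text review needs text analytics before it yields anything queryable, and the JSON document sits in between: its keys give it shape, but its shape can vary from record to record.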
Application Areas of Big Data
The finance and insurance industries utilize big data and predictive analytics for fraud detection, risk assessments, credit rankings, brokerage services and blockchain technology, among other uses.
Hospitals, researchers and pharmaceutical companies are adopting big data solutions to improve and advance healthcare.
Media companies analyze our reading, viewing and listening habits to build individualized experiences.
From engineering seeds to predicting crop yields with remarkable accuracy, big data and automation are rapidly enhancing the farming industry.
Data constantly informs marketing teams of customer behaviors and industry trends, and is used to optimize future efforts, create innovative campaigns and build lasting relationships with customers.
Big data is helping retailers and e-commerce stores to better understand consumer habits and optimize inventory, and it even gives customers a better shopping journey from try-on to checkout.
Educators are using big data to craft personalized lesson plans, predict learning outcomes and even help students find colleges and majors that fit their interests and skills.
Coaches rely on analytics to scout opponents and optimize play calls in-game, while front offices use it to prioritize player development. Analytics also play a major role off the field, providing fans with both sports betting and fantasy sports insights.