Q&A about Big Data

Robin. J - Aug 20 - Dev Community

Why does every website ask you to accept "cookies"? In short, because your browsing data is more valuable than you think. We live in the IT era, and large-scale data gathering has become a controversial topic. Many users are unaware that their information is collected by profit-driven companies that want to reach into their lifestyles with tailored recommendations, which they call "personalization".

1. What is big data?

Big data is characterized by three V's: volume, variety, and velocity. As the name suggests, it refers to amounts of data so massive that they must be analysed by computational methods. Many different types of data are produced quickly, so they are stored and updated in real time.

It has 3 categories: structured, semi-structured, and unstructured.

  • Structured data is discrete and fits in a table, organized by columns such as "name", "address", and "age". Examples are employee information forms and accounting tables in a company's database.

  • Semi-structured data is a combination of structured and unstructured data. Examples include web pages, emails, CSV, XML, and JSON documents, and zipped files.

  • Unstructured data includes .pdf, .doc, and .txt files and videos, which cannot fit into any data table.
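
To make the three categories above more concrete, here is a minimal Python sketch. The table name `employee`, the sample records, and the sample sentence are all made up for illustration; the point is only how differently each kind of data can be queried.

```python
import json
import sqlite3

# Structured: a table with fixed columns, so it can be queried by column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, address TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "1 Main St", 36), ("Lin", "9 Oak Ave", 29)],
)
ages = [row[0] for row in conn.execute("SELECT age FROM employee ORDER BY age")]

# Semi-structured: keys describe the data, but records may differ in shape.
records = json.loads(
    '[{"name": "Ada", "tags": ["admin"]},'
    ' {"name": "Lin", "contact": {"email": "lin@example.com"}}]'
)
record_keys = [sorted(record.keys()) for record in records]

# Unstructured: free text with no schema; only generic operations apply.
note = "Ada called on Monday to say the delivery arrived late."
word_count = len(note.split())

print(ages)
print(record_keys)
print(word_count)
```

The structured rows can be filtered and sorted by column, the JSON records have to be inspected key by key, and the free text offers nothing to query until it is processed further.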

2. Where does big data come from and how is it organized?

  • Architecture: Big data is managed with a distributed architecture. This means a central repository of data works together with other systems such as data warehouse platforms and NoSQL databases.

  • Source: Big data comes from everywhere.
    Examples from internal systems: banking transactions, social media activity, healthcare records, and sensor data generated by machines.
    Examples from external data: geographical conditions, markets, and scientific research.

  • Methodology: comparative, marketing, and sentiment analyses.
    Comparative analysis enables a company to compare the quality of its products and services with that of its competitors.
    Marketing analysis extracts information from social media hashtags and consumer data so that the company can improve its marketing campaigns and keep the initiative.
    Sentiment analysis focuses on customer feedback about the experience and aims to troubleshoot potential problems with the service.
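
As a rough illustration of the sentiment analysis described above, the sketch below scores feedback with a tiny hand-written word list. The word lists and reviews are invented for this example, and real systems typically rely on trained language models rather than simple word counting.

```python
# Hypothetical word lists; a real lexicon would be far larger.
POSITIVE = {"great", "fast", "helpful", "love"}
NEGATIVE = {"slow", "broken", "rude", "late"}

def sentiment_score(text):
    """Return (# positive words) - (# negative words) for one piece of feedback."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

reviews = [
    "Great service and fast delivery!",
    "The app is slow and the checkout is broken.",
    "Package arrived on Tuesday.",
]
scores = [sentiment_score(r) for r in reviews]

# Feedback with a negative score is flagged for the service team to look at.
flagged = [r for r, s in zip(reviews, scores) if s < 0]

print(scores)
print(flagged)
```

Flagging only the negative feedback is one simple way to surface potential service problems without reading every comment.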

3. What are some applications of big data?

  • Website Cookies:
    Returning to the question at the very beginning: your browsing behavior is part of the big data pool. If you do not manage your cookie preferences, cookies can be used to track not just the keywords you type on a website, but also your activity on other sites that share the same trackers. For example, if Amazon recommends stationery discounts even though you never searched for anything related there, it may be that cookies have recorded how often you logged in to your student account.

  • Social media:
    In 2006, there was an online protest against Facebook's "personalized" news feed. The news feed showed every profile update and moment to one's friends. Users described it as "spooky" and "intrusive" because their personal data was being tracked and exposed to people they did not trust.
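
To show what a website actually receives in the cookie case above, here is a minimal sketch using Python's standard `http.cookies` module. The cookie names `session_id` and `last_search` and their values are hypothetical; real sites choose their own.

```python
from http.cookies import SimpleCookie

# A hypothetical Cookie header that a browser sends back with each request.
header = "session_id=abc123; last_search=stationery"

cookie = SimpleCookie()
cookie.load(header)

# Each cookie is a small name/value pair the site stored earlier.
values = {name: morsel.value for name, morsel in cookie.items()}

print(values)
```

Because the browser sends these pairs back on every visit, a site can recognize the same visitor over time and build up a record of their behavior.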

4. Why should users beware of big data?

  • Big data puts users into filter bubbles. If you search for the same keyword, "Canada", on your computer and on your friend's, the top results will differ if you usually click on political or immigration news while your friend tends to open travel guides. Likewise, on social media you may have noticed that most people seem interested in the things you like and share your opinions. In fact, you are being fooled by big data algorithms. Apps and cookies track your browsing behavior all the time; they know what you value and cater to your tastes in order to earn good reviews or make money from personalized ads.

  • Confirmation bias appears when you only have access to opinions that resemble your own and take them for granted. You then find only evidence that supports your hypothesis while neglecting counter-evidence. As a result, when you think you are learning, you are actually reinforcing your current bias with more biased information.

  • Big data invades users' privacy. When your transactions and medical records are monitored by whoever holds the data, you face the risk of them selling that data to people in your network or to your employer. For example, if you have depression and see a psychiatrist weekly, your employer could learn of it from data they secretly bought and might stigmatize you. Although data management companies are also advancing data encryption to earn trust, it is always good to be a skeptic.
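
The filter-bubble effect from the first point above can be sketched in a few lines of Python. This is a deliberately simplified model, not how any real search engine ranks results; the function name, topics, and click histories are all assumptions made for illustration.

```python
from collections import Counter

def personalized_ranking(results, click_history):
    """Order results so the topics the user clicked most often come first."""
    topic_clicks = Counter(click_history)
    # sorted() is stable, so results with equal click counts keep their order.
    return sorted(results, key=lambda r: -topic_clicks[r["topic"]])

results = [
    {"title": "Canada travel guide", "topic": "travel"},
    {"title": "Canada immigration news", "topic": "politics"},
    {"title": "Canada weather", "topic": "weather"},
]

# Same query, different click histories, different top result.
you = personalized_ranking(results, ["politics", "politics", "travel"])
friend = personalized_ranking(results, ["travel", "travel", "travel"])

print(you[0]["title"])
print(friend[0]["title"])
```

Each click feeds back into the next ranking, which is how the bubble tightens over time.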

5. What is the prospect of big data?

  • Data scientists forecast future trends from the patterns in big data. For instance, they can visualize consumer data with line charts to determine which items, or which parts of an existing item, will remain popular in the market.

  • Machine learning: AI can be trained on big data to automate decision-making. For example, before a low-interest loan offer arrives in your inbox, your bank's automated email system identifies potential borrowers by analyzing their circumstances in the data pool. Furthermore, the development of natural language processing systems will allow AI to understand unstructured data, such as long texts.
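
As a small illustration of the trend forecasting mentioned in the first point above, the sketch below fits a straight line to a made-up series of monthly sales and extends it one month ahead. The numbers and the function name `fit_line` are hypothetical, and real forecasting uses far richer models than a single line.

```python
def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical units sold in four consecutive months.
monthly_sales = [100, 110, 120, 130]

slope, intercept = fit_line(monthly_sales)

# Extend the fitted line to the next month (x = 4).
forecast = slope * len(monthly_sales) + intercept

print(slope, intercept, forecast)
```

This is the same idea as reading the direction of a line chart, only expressed as arithmetic.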

References:

https://www.naukri.com/code360/library/difference-between-structured-semi-structured-and-unstructured-data
https://www.avenga.com/magazine/trends-and-future-forecasts-in-big-data/
https://www.techtarget.com/searchdatamanagement/definition/big-data
