6 Illusions Execs Have About Big Data

A decade ago, I thought I understood big data. I had worked in information technology for more than a decade and had run a department that handled docs for some of Boston’s more infamous litigation. I remember having to order new drives and storage appliances to handle the gigabytes and gigabytes of documents and emails that our hapless associates had to search and read through. That was a lot of data… or so I thought.

Fast-forward seven years and one career change, and I found myself at Amazon running SQL queries against its data warehouse. The scale of that database honestly blew my mind; I had to figure out tricks to pull down even a week of summary data without the query choking or the results overflowing Excel. I thought I’d understood what big data was, but it turns out that I had no clue.

Big data has become a buzzword so prevalent that it’s practically meaningless. At a party last week, I heard someone say, “Every company is a big data company now.” When I asked him to clarify, he said that every company buys and sells big data. While I certainly agree that all companies can use big data or applications based on big data, not all companies base their business models on it. I’ve tripped over this kind of misconception a lot over my career and have even shared some of the misconceptions myself. Now that I work at a big data company, I know better.

Here are six of the biggest mistakes I see execs make when they talk about big data:

1. All data is big data.

According to Gartner, big data must be high-volume, high-velocity and/or high-variety data. This means that if your data can fit in an Excel file, you’re not dealing with big data. If you’re only handling a dataset that measures in the gigabytes and your PC can handle it, you’re not dealing with big data. Maybe you’re dealing with many gigabytes of emails and you can’t figure out how to deal with them, but that doesn’t mean that it’s big data.

2. Big data solves every problem.

I’ve run into a few execs who believe that big data fixes everything. Many of them grasp at big data analysis to solve problems rather than using common sense. I once sat in a room of executives who were trying to figure out why our week-over-week website visit numbers and sales had dipped precipitously during a week in April, but that same week the year before hadn’t experienced the same decrease. They asked for analysis after analysis until someone said, “Well we see a decrease at Easter every year, and Easter was in March last year.” Big data and analysis didn’t help us figure that out, but common sense and a calendar did.

3. Big data is meaningless.

The flip side of the “big data solves everything” misconception is this one: that big data doesn’t matter. I find this opinion more understandable, because the definition of big data indicates that it’s hard to process and understand. If you can’t pull insights out of big data or use it to power your systems, it is, indeed, meaningless. I suspect execs in this camp have learned about big data but have never learned anything from it.

To make big data less meaningless, you need to be able to process and use it, which big data companies make easier. They do this by gathering the data, cleaning it up, organizing it, and outputting it in a way that data scientists or other systems can process. Once a data scientist pulls stories out of the data or your systems use data to execute business operations like supply chains, execs will start seeing value in big data.

4. Big data is easy.

Many things about big data sound easy, like getting the information and pricing for every single product in the world or tracking every single visitor to every single website. Because it’s easy to conceptualize a large dataset, many executives believe that gathering and manipulating that dataset should be just as easy.

Unfortunately, this is a common misconception. Take getting the information and pricing for every single product in the world (disclaimer: that’s what my company does). For a single product, like one pair of my shoes, we’d need to gather the following data:

Brand
Category
Style
Color
Heel height
Materials
Size
Width
Stores that sell it
Prices at each of those stores
Prices at each of those stores over time
Whether it’s in stock each time we look at the price

Here’s the math: Our database says that 11 different retailers carry this shoe, and it’s in one color and one width. Let’s assume that we’re gathering the price and in-stock data at each store weekly and the shoe stays on the market for one year. This means we have 572 records for this shoe. If we want to track pricing and in-stock information for all 16 women’s sizes (4½ - 12), this number goes to 9,152. And this is for a single pair of shoes -- gathering data for every pair in my shoe closet would create more data points than I’d ever admit.

Adding complexity, we gather prices more often than once a week during high-demand times and for volatile sites. Daily price and in-stock information would mean 4,015 data points a year for a single size of that one shoe. Add in the descriptive product information and the possibility that each size may have a different price on sites like Amazon, and data for one pair of shoes rapidly expands. Imagine multiplying this by billions of products and putting that into your spreadsheet. Big data’s scale challenges traditional systems of gathering and analysis.
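For anyone who wants to check the shoe math, here is a minimal sketch in Python. The function name `price_records` and its parameters are made up for illustration; the only inputs are the numbers quoted above (11 retailers, 16 sizes, weekly or daily checks for one year).

```python
WEEKS_PER_YEAR = 52
DAYS_PER_YEAR = 365


def price_records(retailers: int, checks_per_year: int, sizes: int = 1) -> int:
    """Count price/in-stock records: one per retailer, per check, per size."""
    return retailers * checks_per_year * sizes


# Women's sizes 4.5 through 12 in half-size steps (an assumption about how
# the 16 sizes are spaced).
sizes = [4.5 + 0.5 * i for i in range(16)]

weekly_one_size = price_records(11, WEEKS_PER_YEAR)               # 572
weekly_all_sizes = price_records(11, WEEKS_PER_YEAR, len(sizes))  # 9152
daily_one_size = price_records(11, DAYS_PER_YEAR)                 # 4015

print(weekly_one_size, weekly_all_sizes, daily_one_size)
```

The count grows by simple multiplication, which is exactly why it gets out of hand: every extra retailer, size, or check frequency multiplies everything that came before it.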

5. Imperfect big data is useless.

This mistake drives me most crazy, because perfection at scale is basically impossible. Let’s say we hold one billion products with 520 data points each accountable to the coveted “five-nines” standard (99.999 percent) of perfection that IT departments attempt to achieve. That’s 520 billion data points in total, so even at five nines there would still be 5.2 million incorrect data points in this dataset.
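The five-nines arithmetic is easy to verify with a rough sketch. The variable names are illustrative only, and the calculation assumes errors are spread evenly at a rate of 1 in 100,000 (that is, 100 percent minus 99.999 percent).

```python
PRODUCTS = 1_000_000_000       # one billion products
DATA_POINTS_PER_PRODUCT = 520  # data points tracked per product

# 99.999 percent correct leaves 0.001 percent wrong, i.e. 1 in 100,000.
# Integer math avoids floating-point rounding on numbers this large.
ONE_ERROR_PER = 100_000

total_data_points = PRODUCTS * DATA_POINTS_PER_PRODUCT
incorrect_data_points = total_data_points // ONE_ERROR_PER

print(f"{total_data_points:,} data points")    # 520,000,000,000 data points
print(f"{incorrect_data_points:,} incorrect")  # 5,200,000 incorrect
```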

Big data rarely achieves this level of perfection, for several reasons. Many big data sources are far from perfect. The websites that my company crawls as one of our big data sources can easily have typos in product names. Big data also requires a fair amount of machine learning and other algorithms to structure and organize it; in the world of product data, these could easily mis-categorize products based on titles or names. For example, would an algorithm put a Marcy Playground album in playground equipment or in music?

Imperfection doesn’t indicate uselessness, however. A competent data analyst can remove outliers and pull vital insights out of big data even if imperfections abound. Developers can add filters so that fewer mistakes slip into your systems and can train algorithms on huge datasets to improve data quality over time. One of the biggest benefits of big data is that the volume will compensate for the occasional imperfection, giving you better insights.
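As one illustration of what removing outliers can look like, here is a minimal sketch that drops prices far from the median. The function name, the 50 percent tolerance, and the sample prices are all invented for the example; a real pipeline would pick its own rule and thresholds.

```python
from statistics import median


def drop_price_outliers(prices: list[float], tolerance: float = 0.5) -> list[float]:
    """Keep only prices within `tolerance` (as a fraction) of the median price."""
    mid = median(prices)
    return [p for p in prices if abs(p - mid) <= tolerance * mid]


# Hypothetical prices for one shoe across six stores; 5.99 and 600.00 look
# like a typo and a misplaced decimal point rather than real prices.
observed = [59.99, 61.50, 60.00, 58.75, 5.99, 600.00]
cleaned = drop_price_outliers(observed)

print(cleaned)  # [59.99, 61.5, 60.0, 58.75]
```

The median is used instead of the average because a single wildly wrong price would drag the average toward itself, while the median barely moves.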

6. Only big companies need big data.

Small marketing companies need website traffic and keyword search numbers. Small social shopping companies need links to as many products as possible from the big retailers with affiliate programs. Small on-demand delivery services need reliable location data. This is only a small subset of the endless list of small companies that need big data.

Big companies may produce more of their own big data, but nearly every company in our modern economy uses big data or applications built on it. This means that all companies can get the benefit of access to the insights and information these huge datasets provide without having to build and manage the infrastructure required to create and analyze big data.

There’s no escaping big data in business these days, no matter the size of your company. Hopefully, this clears up any misconceptions you might have had -- after all, I had quite a few before living in the big data world. If executives better understand the complexity, pitfalls, and power of big data, they’ll run better businesses, make better decisions and make fewer stupid comments at parties.
