Here is a simple metaphor to explain the difference between Big Data and Business Analytics. If you think of Big Data as a mining operation, you can think of Business Analytics as metallurgy.
I use this metaphor because Big Data provides very raw material in very large quantities. The output of Big Data requires a great deal of work to turn it into something valuable. This is the 'value' problem in Big Data (value is not one of the original V's, but it is increasingly included among them). It is hard to look at the ore coming out of a mine and see expensive jewelry. And it is hard to look at the vast amount of Big Data being created and see useful data products.
Business Analytics, on the other hand, takes refined data products that already exist and combines them in an attempt to find a new composite or alloy that has desirable properties not available in the component products. While it is still difficult to know beforehand what value might be found in the composite data products, there are two benefits that Business Analytics has over Big Data. First, you have a better idea what the outcome might be since you are working with refined data products rather than raw materials. Second, the volume is much, much smaller in Business Analytics, so you can afford to do more trial and error.
Here is an example of attempting to use Big Data. I should mention that this example is constructed to make Big Data comprehensible to people who are not familiar with it, as most examples are a little too arcane for the average person. Let's say you get a data feed from the Internet of Things. The Internet of Things is an emerging concept which is becoming more real over time and will continue to do so. In this feed you get information from parking meters, soda vending machines, websites, smart home appliances, EZ-pass toll booths, weather transponders, automobile computers and so on. This data comes at you as it is created. It is huge in volume with a great deal of variety. You want to use it to get a better understanding of your customers, but you have no idea whether they are distinguished by regular car maintenance, how often they fail to feed parking meters, or whether they have milk going sour in their refrigerators. You have a huge volume of very raw data and need to figure out how to get some value out of it.
Compare that with an example of attempting to use Business Analytics. You have a chain of grocery stores and want to know if people in different locations have different purchasing behaviors resulting from weather forecasts. So, you pool all of the sales data from all of your stores, acquire some information on weather forecasts, and look for correlations. The data you are using is much more orderly and the things you are looking for are understood much better.
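To make the grocery example a little more concrete, here is a minimal sketch of what "look for correlations" might mean in practice. Everything in it is made up for illustration: the store locations, the forecast temperatures, the product (ice cream), and the function names are all my assumptions. It pools the records, groups them by location, and computes a plain Pearson correlation between the forecast and the units sold.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical pooled records:
# (store location, forecast high in degrees F, ice cream units sold)
records = [
    ("coastal", 70, 120), ("coastal", 80, 150), ("coastal", 90, 180),
    ("inland", 70, 100), ("inland", 80, 95), ("inland", 90, 90),
]


def correlation_by_location(rows):
    """Group rows by location and correlate forecast with units sold."""
    grouped = defaultdict(lambda: ([], []))
    for location, forecast, units in rows:
        grouped[location][0].append(forecast)
        grouped[location][1].append(units)
    return {loc: pearson(xs, ys) for loc, (xs, ys) in grouped.items()}


result = correlation_by_location(records)
for location, r in result.items():
    print(f"{location}: r = {r:+.2f}")
```

The invented numbers are perfectly linear, so the two locations come out at opposite extremes. Real sales data would be far noisier, and a correlation on its own would not tell you why the locations differ.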
Friday, December 26, 2014
Wednesday, December 17, 2014
What is Big Data?
Let's start with a question that a lot of people are wondering about. What is Big Data? First, I want to say that Big Data is a Big Deal. While technology has fueled the engines of transformation for the past few decades, data will fuel the engines of transformation for the next few. But don't worry. I am not going to go off on a philosophical rant. Let's get right down to brass tacks.
There is a lot of hype surrounding Big Data and a lot of misuse of the term. There are many definitions of Big Data, none of which is particularly satisfying. But I will make use of two of my favorites.
Many people define Big Data in terms of three V's: volume, variety and velocity. This means that Big Data is a huge amount (volume) of complicated data (variety) coming at you very fast (velocity). I have read papers and seen presentations where more V's are added. For example, veracity and value are popular as well. Both of these V's raise important issues, but they are not central to the essence of Big Data.
Another definition that I like is that data is big when it cannot be processed using traditional relational database technology. Relational databases require information to be highly structured (i.e. anti-variety), and the transaction models used to update the database limit the number of transactions (or updates) per second (i.e. anti-volume and anti-velocity).
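Here is a small sketch of the "anti-variety" point, using the SQLite database that ships with Python. The table, column names, and records are hypothetical, invented only for this illustration. A relational table accepts the record it was designed for and rejects one that arrives with a different shape.

```python
import sqlite3

# In-memory database with one rigidly structured table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter_readings ("
    "meter_id TEXT NOT NULL, paid_minutes INTEGER NOT NULL)"
)


def try_insert(record):
    """Try to store a dict as a row; return False if the schema rejects it."""
    # Column names are interpolated only because this is a toy example
    # with trusted keys; values still go through placeholders.
    columns = ", ".join(record)
    placeholders = ", ".join("?" for _ in record)
    sql = f"INSERT INTO meter_readings ({columns}) VALUES ({placeholders})"
    try:
        conn.execute(sql, tuple(record.values()))
        return True
    except sqlite3.Error:
        return False


# A record that matches the table's structure is accepted.
fits = try_insert({"meter_id": "M-17", "paid_minutes": 30})

# A record from a different kind of device does not fit the table.
rejected = try_insert({"fridge_id": "F-2", "milk_age_days": 9})

print(fits, rejected)
```

You could, of course, design another table for refrigerators. The point is that someone has to do that design work for every new kind of record before the data can be stored at all.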
It is probably best to think of Big Data as a large volume of raw material from which data products can be made. These data products, in turn, can be used to make decisions which create value for a company (another V). These decisions can be large strategic decisions or small individual decisions. A problem with Big Data is that it is often unclear what the data refers to. This is the veracity problem (yes, I snuck another V in there). Until the data is tamed (i.e. we know what it refers to), it is difficult to use it in decisions.
Note that Big Data is largely defined by the amount of it. If there were a gigantic improvement in processing power, say parallel or quantum computers, which led to computers tens of thousands of times faster, there would no longer be such a thing as Big Data. It would just be data. Unlike relational databases, which contain a particular kind of data (categorical), Big Data is largely defined by the amount and messiness of it, both of which lead to processing constraints.
Should you be concerned with Big Data? As I said in the first paragraph, Big Data is a Big Deal. However, there is a lot of very valuable data that does not rise to the level of Big Data. If you are not yet doing everything you can with your Not So Big Data (terabytes and less), it makes more sense to focus on that first. Once you are getting all the value you can from that, it would be appropriate to start taking on Big Data.
Monday, December 15, 2014
Making Sense of Information Technology
When I first started in Information Systems, more years ago than I care to admit, there were only a few technologies we had to worry about. There were operating systems, teleprocessing monitors, databases, applications and programming languages. Everybody knew how to program and everybody specialized in one of the other four. It was still daunting, but nothing like it is today.
Since then we have had to adjust to personal computers, networks, artificial intelligence, web technologies, social interaction technologies, mobile devices, and more new programming and scripting languages than I even want to think about. But, as if that were not enough to worry about, we now have analytics and big data to contend with. And on the horizon we have virtual worlds, video games, drones, and a resurgence of artificial intelligence. A bit further off we have complexity theory and agent-based modelling threatening to change a game that has already changed so many times that it can hardly even be considered the same game. This list, by the way, is by no means comprehensive. I am doing this off the top of my head. So I apologize if I have left out your pet emerging technology.
How does one keep up with all this stuff? How does one know what to be concerned about and what to ignore? I routinely hear people confusing Big Data with Analytics or Relational Databases with Data Warehousing. Most people know that Facebook is a Social Interaction Technology, but what about YouTube and Wikipedia? And what is the difference between a wiki and Wikipedia? While we are at it, what is the difference between a wiki, a blog and a forum? What is the difference between a web server and a web service? If your business had $10,000 to play around with an emerging information technology, which one would it be? What about $100,000 or a million?
My biggest challenge since those salad days of mainframes has been to keep up with emerging technologies. And, in the process, I have learned a few things and picked up a few tricks. I routinely explain things like this in my classes. So, I thought I would create a blog to reach a wider audience. This is not my first blog. In fact I have many. But I love to write and I love to figure things out. When I can figure things out and write about them, that is as good as it gets.
I should warn you, up front, about my erratic blogging habits, based on the other blogs that I have created. I write when and where I feel like it because I do my best work that way. Often, I will post a flurry of pieces to a blog and then ignore it for a while as I use other outlets for my writing. Eventually, I will come back and write some more. My goal with this blog will be to post something of interest every week or two on average. So, if this looks interesting, please bookmark it or follow it. I also have a Twitter account, @DrJohnArtz, which you can follow. The only thing I post to the Twitter account is a notice when there are new postings to a blog that has been fallow for a while. So, I won't fill your feed with tweets about what I had for breakfast.