Big data is one of the most recent “buzz words” on the Internet. The term is normally associated with data sets so big that they are really complicated to store, process, and search through.
Big data is often described as a three-dimensional problem (as defined by Gartner, Inc.*), i.e. it has three challenges associated with it:
1. increasing volume (amount of data)
2. velocity (speed of data in/out)
3. variety (range of data types, sources)
Why Big Data?
As datasets grow, you can extract more information from them, and the precision of your results improves (assuming you’re using the right models, but that is not relevant for this post). You can also run better and more diverse analyses against the data. Many corporations are growing their datasets to get more “juice” out of them: some to build better business models, others to improve the user experience, others to get to know their audience better. The possibilities are virtually unlimited.
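To make the point about precision a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions of my own: the data is simulated from a normal distribution with a known mean, and the function name `mean_estimate_error` and all the numbers in it are made up for this example, not taken from any real data set. It estimates the mean from samples of different sizes and reports how far off the estimate is on average.

```python
import random
import statistics


def mean_estimate_error(sample_size, trials=100, seed=42):
    """Average absolute error when estimating a known mean from a sample.

    All values here (true mean 50, standard deviation 10) are
    hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    true_mean = 50.0
    errors = []
    for _ in range(trials):
        # Draw one simulated data set of the requested size.
        sample = [rng.gauss(true_mean, 10.0) for _ in range(sample_size)]
        # Record how far the sample mean lands from the true mean.
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)


if __name__ == "__main__":
    # Larger samples should give a smaller average error.
    for n in (10, 100, 1000, 10000):
        print(n, round(mean_estimate_error(n), 3))
```

In this simulated setting the average error shrinks as the sample grows, roughly with the square root of the sample size. Real data is rarely this well behaved, so treat it as an intuition aid rather than a guarantee.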
In the end, and in my opinion, big data analysis/management can be a competitive advantage for corporations. In some cases, a crucial one.
Big Data Management
Big data management software is not something you normally buy on the market as an “off-the-shelf” product (maybe Oracle wants to change this?). One of the biggest questions in big data management is: what do you want to do with the data? Knowing this is essential to minimize the problems that come with huge data sets. Of course, you can just store everything and try to make sense of it later. But again, in my opinion, that is a way to end up with a problem rather than a solution or an advantage.
Since you cannot just buy a big data management solution, a strategy has to be designed and followed until you find something that works as a competitive advantage for the product/company.
Internally at LeaseWeb we have a big data set, and we can work on it at real-time speed (we are using Cassandra** at the moment) and obtain the results we need. To get this working we went through several trial-and-error iterations, but in the end we got what we needed, and so far it is living up to expectations. How much hardware? How much development time? It all depends. The question you have to ask yourself is “What do I need?”, and once you have an answer to that, normal software planning and development time applies. It may even be the case that you don’t need Big Data at all, or that you can solve your problem using standard SQL technologies.
In the end, our answer to “What do I need?” gave us all the information we needed to find what was best for us. In our case that was a mix of technologies, one of them being a NoSQL database.