Big Data


We live in a new age: the age of information, new technologies, and progress. Technological progress also drives progress in communication. New devices and social networking sites allow us to stay in touch with others and to download, gather, exchange, and send data.


We have reached an age where everything is flooded with information, and everyone is storing it. Every person who uses technology leaves behind tracks of data, and that data is stored somewhere. Every person and every company has data stored somewhere, and a bigger company means more data to store.

The problem arises when we have a lot of data of many different kinds. A company has this data stored somewhere, but when it wants to use it, it does not know what to do or how to do it.

This means the company has a "Big Data" problem and needs to figure out how to extract useful information from all of this data.



Big Data

Big Data is data that cannot be processed and analyzed with traditional processes or tools in a reasonable time. Big Data is mainly defined by three basic characteristics:


Volume – how much data we have


Velocity – how fast we are able to process the data


Variety – what kinds of data we have


Even though size is important in this definition, it is not the only decisive factor. Other factors are the format and structure of the data and the time in which we are able to analyze it. Most of the data is unstructured or in varying formats, and it is usually stored in many different places, so companies have to solve not only how to analyze this data to extract correct and useful information, but also how and where to store it.



Streamlined Data Architecture 

Our company currently cooperates with two technology leaders for storing and analyzing Big Data; these platforms are:

Hadoop - a platform based on an open-source software framework used for distributed storage and processing of large, unstructured datasets using the MapReduce programming model.
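To illustrate the MapReduce model that Hadoop uses, here is a minimal single-process sketch in Python. The function names and sample documents are hypothetical, and this is not the Hadoop API: a map step emits key/value pairs, a shuffle groups them by key (the framework does this between the map and reduce stages), and a reduce step aggregates each group.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the input split.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in groups.items()}

# Hypothetical input documents standing in for distributed file splits.
documents = ["big data is big", "data is everywhere"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

In a real Hadoop cluster, the map tasks run in parallel on the nodes where the data blocks live, and the shuffle moves intermediate pairs across the network; this sketch only shows the logical flow.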

Vertica - a clustered, column-oriented platform designed to manage large, fast-growing volumes of data and to provide very fast query performance for data warehouses and other query-intensive applications.
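The column-oriented idea behind Vertica can be sketched as follows (a simplified illustration with made-up sample data, not Vertica's actual storage format): instead of keeping each row's fields together, each column is stored contiguously, so an aggregate over one column only has to read that column's values.

```python
# Row-oriented layout: each record is stored together (hypothetical data).
rows = [
    {"id": 1, "region": "EU", "sales": 100},
    {"id": 2, "region": "US", "sales": 250},
    {"id": 3, "region": "EU", "sales": 175},
]

# Column-oriented layout: the same data, with each column stored contiguously.
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "sales": [100, 250, 175],
}

# A row store must touch every field of every record to sum one column...
total_row = sum(record["sales"] for record in rows)

# ...while a column store scans only the "sales" column.
total_col = sum(columns["sales"])

print(total_row, total_col)  # 525 525
```

Both layouts hold the same data; the columnar one wins for analytic queries that aggregate a few columns over many rows, which is exactly the data-warehouse workload the text describes.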