Federation Of Big Data Analytics As A Service
For years, every area of our lives has been undergoing global digitization. Companies, government administrations, access to information and even our entertainment increasingly depend on technologies such as streaming platforms, social networks and video games. This unstoppable process has a side effect: the amount of data generated. It is estimated that in 2020, 1.7 MB of data was generated every second per person. This is where Big Data Analytics comes in, as a means of extracting added value from this data, for example by optimizing production processes.
Companies can access Big Data Analytics services offered in the cloud by large technology companies, but the cost of these services is often not affordable for small and medium-sized companies, which opt instead for on-premise solutions that exploit computing resources owned by the company itself.
Another factor to take into account when deciding where to deploy a Big Data service is data privacy. Many data sets contain sensitive information that companies need to protect from outside eyes. Companies that offer Big Data Analytics services in the cloud may guarantee that no one else will access your data once you start processing it, but the truth is that the user has no control over what is really happening on the computing nodes in the cloud.
When a company decides to deploy on-premise Big Data Analytics services on its own computing resources, certain complexities arise. On the one hand, there is the complexity of installing, configuring, maintaining and updating each of the technologies that make up each service. On the other hand, there is the question of how to get the most out of computing resources that are divided into different clusters with different communication networks.
The Radiatus project emerged as a response to all these problems. Radiatus is a Big Data Analytics platform as a service that allows data analysis technologies to be deployed in a very easy and intuitive way. The project is in its fourth year and already integrates more than twenty community technologies, such as Jupyter, Zeppelin, Spark, Flink, Cassandra, Kafka, MySQL, HDFS, MinIO, … In addition, we have developed our own technologies, such as DistributedML, a framework for training Machine Learning models. Radiatus also has a multi-tenancy user management system, which allows users to be organized into groups and assigned dedicated computing resources for the deployment of their services. Radiatus can also make use of GPU resources for high-performance computing.
One of the latest developments in Radiatus is the federation system, which allows different platforms to be interconnected so that the computational resources of all of them are shared with their users. This functionality is very useful, for example, to link different Radiatus platforms at the Edge, Fog and Cloud levels, thus allowing data flow processing services to run with a Data Continuum model, or to use computational resources hosted in different clusters for the deployment of Big Data services.
Looking ahead, we continue to investigate new technologies that add value to Radiatus, and we continue to integrate new services and update existing ones in order to offer the most complete platform possible.