ScaDS.AI Dresden/Leipzig eases the development of data analysis applications by providing (1) fundamental building blocks for data analytics, (2) tools for generating synthetic training data, and (3) large-scale data analytics platforms.
Within ScaDS.AI Dresden/Leipzig, we investigate the fundamentals of how scalable Machine Learning algorithms can be built following a LEGO-style construction and reuse metaphor. Our mission is to keep the human in the loop by addressing users' needs with the methods and tools required to build application-specific Data Analytics platforms. We aim to provide modular analytics that abstract the essentials of an analytic domain into a set of universal building blocks.
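To make the building-block metaphor concrete, here is a minimal sketch of how such modular analytics could be composed; the names Block and Pipeline and the two example blocks are purely illustrative assumptions, not actual ScaDS.AI APIs.

```python
# Illustrative sketch of LEGO-style composition: each building block wraps
# one analytics step behind a uniform data-in/data-out interface, so blocks
# can be freely recombined into application-specific pipelines.
# (Block, Pipeline, and the example blocks are hypothetical names.)
from typing import Callable, List
import numpy as np

class Block:
    """One reusable analytics step: a named function from data to data."""
    def __init__(self, name: str, fn: Callable[[np.ndarray], np.ndarray]):
        self.name = name
        self.fn = fn

    def __call__(self, data: np.ndarray) -> np.ndarray:
        return self.fn(data)

class Pipeline:
    """Snaps blocks together; the output of each block feeds the next."""
    def __init__(self, blocks: List[Block]):
        self.blocks = blocks

    def run(self, data: np.ndarray) -> np.ndarray:
        for block in self.blocks:
            data = block(data)
        return data

# Two generic blocks that could be reused across analytic domains.
standardize = Block("standardize", lambda x: (x - x.mean()) / x.std())
clip_outliers = Block("clip_outliers", lambda x: np.clip(x, -3.0, 3.0))

pipeline = Pipeline([standardize, clip_outliers])
data = np.random.default_rng(0).normal(5.0, 2.0, size=1000)
result = pipeline.run(data)
print(result.mean().round(3), result.std().round(3))  # ~0.0 and ~1.0
```

The point of the design is that every block exposes the same interface, so blocks built for one application can be snapped into pipelines for another.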
Today’s Machine Learning methods, in particular deep learning models, are very data-hungry. Synthetic data, however, is cheap to produce and can support the development and testing of deep learning models. Synthetic data generation tools produce data that matches given sample data while preserving its important statistical properties. Within ScaDS.AI Dresden/Leipzig, we develop time series generation approaches that extract important characteristics from a given dataset and reproduce them in a generated dataset. This is essential both for evaluating Machine Learning systems and for providing data in cases where no real data is available.
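As an illustration of the underlying idea, the following sketch fits a simple AR(1) model to a sample series and draws a synthetic series that reproduces its mean, variance, and lag-1 autocorrelation; the function name generate_like is hypothetical, and a real generator would capture far richer characteristics (trend, seasonality, distribution shape) than this minimal example.

```python
# Minimal sketch of characteristic-preserving time series generation
# (not the actual ScaDS.AI tooling): estimate simple statistics of a
# sample series, then sample a synthetic series from an AR(1) process
# with the same characteristics.
import numpy as np

def generate_like(sample: np.ndarray, length: int, seed: int = 0) -> np.ndarray:
    mu, sigma = sample.mean(), sample.std()
    x = sample - mu
    phi = np.dot(x[:-1], x[1:]) / np.dot(x, x)       # lag-1 autocorrelation
    noise_std = sigma * np.sqrt(max(1.0 - phi**2, 1e-12))
    rng = np.random.default_rng(seed)
    out = np.empty(length)
    out[0] = rng.normal(0.0, sigma)
    for t in range(1, length):
        out[t] = phi * out[t - 1] + rng.normal(0.0, noise_std)
    return out + mu

# Usage: fit on an observed series, then draw a synthetic one of any length.
observed = np.sin(np.linspace(0, 20, 500)) + np.random.default_rng(1).normal(0, 0.3, 500)
synthetic = generate_like(observed, length=500)
# Variance and lag-1 autocorrelation are comparable by construction.
print(round(observed.std(), 3), round(synthetic.std(), 3))
```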
We build custom Data Analytics platforms that ease the use of large-scale Data Analytics methods. For example, with the advent of novel satellites and models in the Earth system sciences, data volumes have become far too large to analyze with conventional approaches. We therefore apply “analysis-ready data cubes” of arbitrary dimension in the cloud, with efficient approaches for trimming and slicing enabled by novel data formats (see the sketch below). For analyzing large-scale web data, PIs of ScaDS.AI Dresden/Leipzig are involved in the Immersive Web Observatory, which maintains a peta-scale data center to enable web analytics at industry level. Providing access to a web archive of petabyte scale raises new methodological challenges in multi-tenancy, distributed infrastructures, processing frameworks for analytics, online Machine Learning algorithms, and networked data analytics.
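To illustrate trimming and slicing of an analysis-ready data cube, the sketch below uses xarray, a library commonly used for such cubes in the Earth sciences. The variable name, coordinates, and values are invented for the example; a real cloud-hosted cube would typically be opened from a cloud-native format such as Zarr (e.g. via xarray.open_zarr) rather than built in memory.

```python
# Toy "analysis-ready data cube": time x latitude x longitude on a 2° grid.
import numpy as np
import xarray as xr

cube = xr.DataArray(
    np.random.default_rng(0).random((365, 90, 180)),
    dims=("time", "lat", "lon"),
    coords={
        "time": np.arange("2023-01-01", "2024-01-01", dtype="datetime64[D]"),
        "lat": np.arange(-89.0, 90.0, 2.0),
        "lon": np.arange(-179.0, 180.0, 2.0),
    },
    name="surface_temperature",  # illustrative variable name
)

# Trimming: restrict the cube to a region and season of interest.
europe_summer = cube.sel(
    time=slice("2023-06-01", "2023-08-31"),
    lat=slice(35.0, 70.0),
    lon=slice(-10.0, 40.0),
)

# Slicing: extract a 2-D map for one day, or a 1-D series for one location.
one_day_map = europe_summer.sel(time="2023-07-15")
one_pixel_series = cube.sel(lat=51.05, lon=13.74, method="nearest")  # Dresden
print(europe_summer.shape, one_day_map.shape, one_pixel_series.shape)
```

Chunked formats such as Zarr make these operations efficient at scale, since only the chunks overlapping the requested region or time span need to be read from cloud storage.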