The mission of ScaDS.AI is to advance research not only by contributing new methods, software, and infrastructure, but also by fostering the creation and publication of valuable datasets. The availability of high-quality, large-scale data has been one of the driving forces of recent AI technologies. Nevertheless, the curation and provisioning of such datasets do not come for free; they require technical support as well as organizational commitment.
ScaDS.AI Dresden/Leipzig is very active in the creation and maintenance of open data and open models. This includes knowledge graphs, labeled training data, web data, medical information, and Natural Language Processing (NLP) resources for German texts. ScaDS.AI Dresden/Leipzig is also active in several consortia of the National Research Data Infrastructure (NFDI).
In general, the publication of trained models will be of great importance for enabling further research and applications in AI, and ScaDS.AI encourages and supports the publication and archiving of such models from all domains where the project is active. Some of the largest and most visible results of these efforts are the large knowledge bases DBpedia and Wikidata, both of which are now widely used in AI research and applications (including intelligent agents such as Apple’s Siri and Amazon’s Alexa). ScaDS.AI is strongly involved in the creation of both resources and the underlying infrastructure. Moreover, we develop valuable datasets for evaluation, analysis, and training, such as the Dresden Web Table Corpus, which consists of 125 million tables extracted from Web pages. ScaDS.AI researchers are also active in the Linked Data Benchmark Council, which publishes open datasets and data generators.
ScaDS.AI is advancing data-centric research in key application areas of AI. We improve access to patient cohort data from all German university hospitals via a national infrastructure being developed within the Medical Informatics Initiative (MII). The universities of Leipzig and Dresden lead or participate in MII consortia (SMITH and MIRACUM) and run so-called Data Integration Centers. Researchers at ScaDS.AI are also contributing to the Immersive Web Observatory, a project that maintains a petascale data center to enable web analytics at the industry level.
We offer infrastructure and AI-related services to all project members, as well as to researchers at other AI-related German competence centers and research projects. To that end, ScaDS.AI focuses on research data management over the complete data life cycle and develops related concepts and services in many interdisciplinary projects.
Across all of our open data activities, ScaDS.AI is linked to consortia of the National Research Data Infrastructure (NFDI), e.g., NFDI4DataScience, NFDI4Health, NFDI4Earth, NFDI4Biodiversity, NFDI4Chem, and NFDI4Cat.