DPS 2021 Training Class – Big Data & Analytics With SQL Server 2019 Big Data Cluster
Abstract: This training session covers big data, data virtualization, and analytics in SQL Server 2019 Big Data Cluster.
SQL Server 2019 Big Data Cluster (BDC for short) is the evolution of SQL Server; its goal is to integrate, manage, and analyze data in the petabyte range.
The session starts with an overview of what the BDC is, to get everyone on the same page. After the overview, we look at data virtualization: how we can use the BDC as a data hub. We then drill down and look at how storage pools and data pools work in the BDC. From there we look at how we can ingest data into the BDC, and how to analyze the data using the built-in Apache Spark engine. As part of this, we also look at using Apache Kafka and Apache Spark together to ingest data into the BDC.
We also see how we can “mount” external HDFS and Azure Data Lake sources in the Big Data Cluster.
1. Overview of SQL Server 2019 Big Data Cluster
2. Data Virtualization in the BDC
3. Storage Pools & Data Pools
4. Apache Spark
5. Data Ingestion into the BDC
6. Analytics with Apache Spark