
For an instructor lead, in-depth look at learning SQL click below.
Structured Query Language (SQL) is a must-know language for any data analyst or database developer. One of the powerful features of SQL Server is PolyBase, a technology designed to process queries that access both semi-structured and relational data stored in Hadoop or Azure Blob Storage.
What is Polybase?
PolyBase allows you to use Transact-SQL statements to access data stored in Hadoop or Blob Storage, diverting processing to Hadoop. In other words, you can manipulate data stored in Hadoop or Azure Blob Storage just like the data in your SQL Server instances.
Working with PolyBase
To start working with PolyBase, you will need to install it during the SQL Server installation process. You then need to configure and enable PolyBase, which we will outline below.
|
1 2 3 4 5 6 |
EXEC sp_configure @configname = 'hadoop connectivity', @configvalue = 7; GO RECONFIGURE; GO |
Creating an External Data Source
The first step to using PolyBase is to create an external data source. Below is an example of the T-SQL necessary to create an external data source.
|
1 2 3 4 5 6 7 8 |
CREATE EXTERNAL DATA SOURCE RemoteHadoopCluster WITH ( TYPE = HADOOP, LOCATION = 'hdfs://HadoopServer:8020' ); |
Creating an External File Format
After the data source is created, you will need to create an external file format if one does not already exist. This format specifies the type of data stored in the external data source.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
CREATE EXTERNAL FILE FORMAT TextFileFormat WITH ( FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS ( FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2 ) ); |
Creating an External Table
The final step to accessing external data is to create an external table that maps to the data on your external data source.
|
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE EXTERNAL TABLE [dbo].[csv_external_table] ( [date] Date NOT NULL, warehouse VARCHAR(250) NOT NULL, quantity INT NOT NULL ) WITH ( DATA_SOURCE = RemoteHadoopCluster, LOCATION = '/data/', FILE_FORMAT = TextFileFormat ); |
With these steps, you will have the necessary structure to connect to and query data stored in an external source through SQL Server’s PolyBase feature.
