• No results found

Scripts for creating tables, stored procedures, and functions

There’s another tool, called Windows PowerShell, that you can use to set up your data science environment and automate the entire process.

Windows Powershell is Microsoft’s scripting environment, like Perl but much more powerful and versatile—Windows Powershell also can work with any of the .NET libraries in the Microsoft ecostructure as well as just about anything that runs in the Microsoft Windows environment. You don’t need to install anything to make this work on your server: all modern versions of Windows come with Windows PowerShell installed by default.

Note Although you don’t need to learn all about Windows PowerShell for this example, if you’d like to explore it further, go to

https://technet.microsoft.com/library/bb978526 .aspx.

Downloading scripts and data

For this example, we’ve provided Windows PowerShell and T-SQL scripts to download the

data and perform the necessary SQL Server operations, create the necessary tables, and load the data into SQL Server.3,4,5 The Windows PowerShell script RunSQL_R_Walkthrough.ps1 uses other T-SQL scripts to create the database, tables, stored procedures, and functions—it even loads data into the data table.

On the computer where you are doing

development—typically a client workstation with R-IDE installed—open a Windows PowerShell command prompt as an administrator. If you have not run Windows PowerShell before on this instance, or you do not have permission to run scripts, you might encounter an error. If so, run the following command before running the script, to temporarily allow scripts without changing system defaults:

SetExecutionPolicy Unrestricted Scope Process -Force

Run the command that follows (see Figure 3-2) to download the script files to a local directory. If you do not specify a different directory, by default folder c:\tempR is created and all files are saved there. If you want to save the files to a different directory, edit the values of the

parameter DestDir to a folder on your computer.

If you specify a folder name that does not exist, the Windows PowerShell script will create the folder for you.

$source =

'https://raw.githubusercontent.com/Azure/Azure-

MachineLearning-DataScience/master/Misc/RSQL/Download_Scripts_R_Walk through.ps1'

$ps1_dest =

"$pwd\Download_Scripts_R_Walkthrough.ps1"

$wc = New-Object System.Net.WebClient

$wc.DownloadFile($source, $ps1_dest)

.\Download_Scripts_R_Walkthrough.ps1 –DestDir 'C:\tempR'

Figure 3-2: Windows PowerShell commands for downloading scripts and data for the end-to-end data science walk-through.

After you download and run this script and sign in to the SQL Server by using SSMS, you’ll see the database, tables, functions, and stored procedures that were created (Figure 3-3). These tables and functions are used in subsequent steps of the walk-through.

Figure 3-3: A list of files downloaded after running the Windows PowerShell script. The files contain data to be loaded to the database (nytaxi1pct.csv), several SQL (.sql) script files, and an R-script (.R) file.

Creating tables, stored procedures, and functions

To set up the SQL Server data, run the Windows PowerShell script RunSQL_R_Walkthrough.ps1 (highlighted in Figure 3-3). This script creates the tables, stored procedures, and functions that you need to prepare the model. Figure 3-4 shows the resulting RSQL_Walkthrough database. Unless specified in the command line as options, the script will prompt the user to input the database name, password, and path to the data file (nyctaxi1pct.csv) to be loaded. By default, we’re connecting to SQL Server using the Named Pipes protocol.

The script performs these actions:

Checks whether the SQL Native Client and command-line utilities for SQL Server are installed

Connects to the specified instance of SQL Server and runs some T-SQL scripts that configure the database and create the tables for the model and data

Runs a SQL script to create several stored procedures

Loads the data you downloaded previously into the table nyctaxi_sample

Rewrites the arguments in the R script file to use the database name that you specify

Figure 3-4: Tables, stored procedures, and functions that are created in the database after running the Windows PowerShell script.

The following tables, stored procedures, and functions are created in the database:

Tables:

nyctaxi_sample Contains the main NYC Taxi dataset. A clustered columnstore index is added to the table to improve storage and query performance. The one-percent sample of the NYC Taxi dataset will be inserted into this table.

nyc_taxi_models Used to persist the trained models.

Stored procedures:

PersistModel Called to save a trained model. The stored procedure takes a model that has been serialized in a varbinary data type and writes it to the specified table.

PredictTipBatchMode Calls the trained model to create predictions using the model.

The stored procedure accepts a query as its input parameter and returns a column of numeric values containing the scores for the input rows.

PredictTipSingleMode Calls the trained model to create predictions using the model.

This stored procedure accepts a new observation as input, with individual feature values passed as in-line parameters and returns a value that predicts the outcome for the new observation.

Functions:

fnCalculateDistance Creates a scalar-valued function that calculates the direct distance between pickup and dropoff locations.

fnEngineerFeatures Creates a table-valued function that creates new data features for model training.

An example of running the script with parameters is presented here:

.\RunSQL_R_Walkthrough.ps1 -server

SQLinstance.subnet.domain.com -dbname MyDB –u SqlUserName –p SqlUsersPassword -csvfilepath C:\tempR\nyctaxi1pct.csv

The preceding example does the following:

Connects to the specified instance and database using the credentials of SqlUserName.

Gets data from the file C:\tempR\nyctaxi1pct.csv.

Loads the data in nyctaxi1pct.csv into the table nyctaxi_sample, in the database MyDB on the SQL Server instance named

SQLinstance.

Note If the database objects already exist, they cannot be created again. If a table already exists, data will be appended, not overwritten.

Therefore, be sure to drop any existing objects before running the scripts.

Input data and SQLServerData

Related documents