• No results found

Model consumption: scoring data with a saved model

Now that the model is built and saved in the SQL Server, you can load and use it to score datasets to predict whether a driver is likely to get a tip on a future trip. For predictions, we can use the rxPredict function from the RevoScaleR package to create a score, and, as always, you can save the scored data back to a table in SQL Server.

You can write and save the prediction or scoring functions as stored procedures. The prediction stored procedure can then simply be used from R-IDE using R-scripts, or from SSMS using T-SQL to score more data.

There are two different ways that you can call a model for scoring:

Batch scoring mode This lets you create multiple predictions based on input from a SELECT query.

Individual or single scoring mode This lets you create predictions one at a time, by passing a set of feature values for individual observations to the stored procedure, which

returns a single prediction value as the result.

We’ll use the batch scoring mode only. For individual scoring, refer to Deploying or Operationalization.1,2

Using an R environment

You can use the model object created earlier, logitObj, to score datasets, and save the scored results in SQL Server. For this, you need to define the data object to use for storing the scored results:

# Create a SQL server data object to store scored results

scoredOutput <- RxSqlServerData(

connectionString = connStr, table = "taxiScoreOutput"

)

In this example, taxiScoreOutput is the table in which scored results are stored. Figure 3-8 show what the table looks like.

Figure 3-8: The taxiScoreOutput table for saving scored results. Here the “Score” column indicates the

scored output (probability of receiving a tip), the rest of the columns are features (based on which the scores are generated), or the feature (“tipped”).

Note The schema for this table is not defined when you create it by using rxSqlServerData; rather, it is obtained from the scoredOutput object output from rxPredict. To create the table that stores the predicted values, the SQL sign-in running the rxSqlServer data function must have table creation privileges in the database. If the sign-in cannot create tables, the statement will fail.

Next, call the rxPredict function from the RevoScaleR package to score the input data, featureDataSource, and then insert the results into the taxiScoreOutputtable:

# Predict using rxPredict

rxPredict(modelObject = logitObj, data = featureDataSource, outData = scoredOutput, predVarNames = "Score", type =

"response",

writeModelVars = TRUE, overwrite

= TRUE)

In another approach to consume the model, you can create a stored procedure

(PredictTipBatchMode; see the code that follows) and use it for scoring from an R-IDE:

-- Create prediction stored procedure CREATE PROCEDURE [dbo].[PredictTipBatchMode]

@inquery nvarchar(max) AS

BEGIN

DECLARE @lmodel2 varbinary(max) = (SELECT TOP 1 model

FROM nyc_taxi_models);

EXEC sp_execute_external_script @language = N'R', @script = N' mod <- unserialize(as.raw(model));

print(summary(mod))

OutputDataSet<-rxPredict(modelObject = mod, data = InputDataSet, outData = NULL,

predVarNames = "Score", type = "response", writeModelVars = FALSE,

overwrite = TRUE); WITH RESULT SETS ((Score float));

END

This stored procedure performs the following steps:

1. The SELECT statement gets a serialized model from the database and stores the model in the R variable, mod, for further processing using R.

2. The new cases to be scored are obtained from the T-SQL query specified in @inquery, the first parameter to the stored procedure.

As the query data is read, the rows are saved in the default data frame, InputDataSet. This data frame is passed to the rxPredict

function available from the RevoScaleR library, which generates the scores.

3. rxPredict scores and returns results data to OutputDataSet, the default output data frame. The predicted values are floats representing the probability of a tip (of any amount) being given.

You can run the R script that follows in an R-IDE to query input data and score it by using the batch scoring stored procedure,

PredictTipBatchMode. First, you need to define the input query:

# Define input data using a SQL query. Here we use the table, NYCTaxiDirectDistFeatures,

# that was created earlier to store the engineered feature, direct_distance.

input = "N' SELECT passenger_count,

trip_time_in_secs, trip_distance, direct_distance FROM NYCTaxiDirectDistFeatures '"

Using the input data obtained from the query, you can run the following R script to obtain scored data in the scoredData data frame.

# The query is sent to be executed with input data, scored data frame is returned

q <- paste("EXEC PredictTipBatchMode @inquery = ", input, sep="")

scoredData <- sqlQuery (conn, q)

Using T-SQL

With the stored procedure PredictTipBatchMode, you can score data by using T-SQL. First, you need to define the query string to pass into the stored procedure. The stored procedure will execute this query, get the data to be scored, and pass the data into the rxPredict function in the stored procedure for generating the

predictions. The following code shows how to define the query for input data and use the stored procedure for scoring:

-- Specify input query

DECLARE @query_string nvarchar(max) SET @query_string=' SELECT passenger_count, trip_time_in_secs, trip_distance,

direct_distance

FROM NYCTaxiDirectDistFeatures '

-- Call stored procedure for scoring EXEC [dbo].[PredictTipBatchMode] @inquery =

@query_string;

You also can insert the scored output into a table in the SQL Server (refer to the examples of how a serialized, trained model is inserted into the table nyc_taxi_models).

Evaluating model

Related documents