%INDHD_RUN_MODEL
( INMETANAME=input-filename.SASHDMD , SCOREPGM=model_score_program_ds2_file
<, OUTDATADIR=hdfs-directory-path>
<, OUTMETADIR=hdfs-directory-path>
<, INFILETYPE=type>
<, INPUTFILE=input-file-name>
<, OUTFILEDELIMITER=file-delimiter>
<, OUTTEXTQUALIFIER=text-qualifier>
<, OUTFILETYPE=output-file-type>
<, OUTRECORDFORMAT=output-record-format>
<, FORMATFILE=user-defined-format-filename>
<, FORCEOVERWRITE=TRUE | FALSE>
<, KEEP=variable-keep-list>
<, KEEPFILENAME=keep-list-configuration-filename>
<, TRACE=YES | NO>
);
%INDHD_RUN_MODEL Syntax 83
Arguments
INMETANAME=input-filename.SASHDMD
specifies the HDFS full path of the input metadata file (.sashdmd file).
Requirement The metadata file must already exist or must be generated with PROC HDMD before running the %INDHD_RUN_MODEL macro.
You do not have to create a metadata file for the input data file if the data file is created with a Hadoop LIBNAME statement that contains the HDFS_DATADIR= and HDFS_METADIR= options. In this case, the metadata files are generated automatically.
Interaction This file is read by the MapReduce job.
See “Creating a Metadata File for the Input Data File” on page 88
PROC HDMD in SAS/ACCESS for Relational Databases: Reference
SCOREPGM=model_score_program_ds2_file
specifies the name of the scoring model program file that is executed by the SAS Embedded Process.
OUTDATADIR=hdfs-directory-path
specifies the name of the HDFS directory where the MapReduce job stores the output files.
Interaction The hdfs-directory-path overrides what is specified in the <outputDir> element in the input metadata file (.sashdmd file).
OUTMETADIR=hdfs-directory-path
specifies the name of the HDFS directory where the MapReduce job stores the output file metadata.
Interaction The hdfs-directory-path overrides what is specified in the <metaDir> element in the input file metadata (.sashdmd file).
INFILETYPE=type
specifies the type of input file. type can be one of the following:
DELIMITED
specifies a delimited file.
Note This type maps to the com.sas.access.hadoop.ep.delimited.DelimitedInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
CUSTOM
specifies a custom file.
Note This type maps to the com.sas.access.hadoop.ep.custom.CustomFileInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
CUSTOM_SEQUENCE
specifies a custom sequence file.
84 Chapter 7 • SAS Scoring Accelerator for Hadoop
Note This type maps to the com.sas.access.hadoop.ep.custom.CustomSequenceFileInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
SEQUENCE
specifies a sequence file.
Note This type maps to the com.sas.access.hadoop.ep.sequence.EpSequenceFileInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
BINARY
specifies a fixed record length file.
Alias FIXED
Note This type maps to the com.sas.access.hadoop.ep.binar.FixedRecLenBinaryInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
XML
specifies an XML file.
Note This type maps to the com.sas.access.hadoop.ep.xml.XmlInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
JSON
specifies a JSON file.
Note This type maps to the com.sas.access.hadoop.ep.json.JsonInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
SPD
specifies an SPD file type.
Note This type maps to the com.sas.hadoop.ep.spd.EPSPDInputFormat input format in the <epInputFormat> element in the input file metadata (.sashdmd file).
Interaction The type overrides what is specified in the <epInputFormat> element in the input file metadata (.sashdmd file).
Note If this option is specified, the %INDHD_RUN_MODEL macro automatically matches the type with the correct input format Java class for the SAS Embedded Process. See each type for the mapping that is performed.
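The type-to-class mapping above is a plain lookup table. The following Python sketch is not SAS code and does not describe how the macro is implemented. It only restates the documented list, treats FIXED as an alias of BINARY, and assumes that type names are matched case-insensitively. The helper `input_format_for` is hypothetical. The class names are copied verbatim from the list above.

```python
# Documented INFILETYPE values and the input format Java class that each maps to.
# Class names are copied verbatim from the documentation above.
INPUT_FORMATS = {
    "DELIMITED": "com.sas.access.hadoop.ep.delimited.DelimitedInputFormat",
    "CUSTOM": "com.sas.access.hadoop.ep.custom.CustomFileInputFormat",
    "CUSTOM_SEQUENCE": "com.sas.access.hadoop.ep.custom.CustomSequenceFileInputFormat",
    "SEQUENCE": "com.sas.access.hadoop.ep.sequence.EpSequenceFileInputFormat",
    "BINARY": "com.sas.access.hadoop.ep.binar.FixedRecLenBinaryInputFormat",
    "XML": "com.sas.access.hadoop.ep.xml.XmlInputFormat",
    "JSON": "com.sas.access.hadoop.ep.json.JsonInputFormat",
    "SPD": "com.sas.hadoop.ep.spd.EPSPDInputFormat",
}

# FIXED is documented as an alias for BINARY.
ALIASES = {"FIXED": "BINARY"}


def input_format_for(infiletype: str) -> str:
    """Return the <epInputFormat> class for an INFILETYPE value (hypothetical helper)."""
    key = infiletype.strip().upper()  # case-insensitive matching is an assumption
    key = ALIASES.get(key, key)       # resolve the FIXED alias
    if key not in INPUT_FORMATS:
        raise ValueError(f"unsupported INFILETYPE: {infiletype!r}")
    return INPUT_FORMATS[key]
```

For example, `input_format_for("fixed")` and `input_format_for("BINARY")` return the same class name.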
INPUTFILE=input-filename
specifies an HDFS fully qualified input filename. This file is read by the MapReduce job.
Interaction The input-filename overrides what is specified in the <inputDir> element in the input file metadata (.sashdmd file).
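INPUTFILE=, OUTDATADIR=, and OUTMETADIR= all follow the same precedence rule: a value passed to the macro replaces the corresponding element from the .sashdmd file. The Python sketch below shows only that rule. It assumes that the metadata has already been read into a dictionary keyed by element name. The dictionary layout, the helper `effective_settings`, and the sample paths are hypothetical. They do not describe how the macro actually reads the file.

```python
from typing import Optional

# Macro argument -> .sashdmd element that it overrides (from the argument descriptions).
OVERRIDES = {
    "OUTDATADIR": "outputDir",
    "OUTMETADIR": "metaDir",
    "INPUTFILE": "inputDir",
}


def effective_settings(metadata: dict, **args: Optional[str]) -> dict:
    """Apply macro arguments on top of values read from the metadata file.

    An argument that is omitted or passed as None leaves the metadata value unchanged.
    The input dictionary itself is not modified.
    """
    result = dict(metadata)
    for arg, value in args.items():
        element = OVERRIDES[arg.upper()]
        if value is not None:
            result[element] = value
    return result
```

For example, `effective_settings(meta, OUTDATADIR="/user/sas/out")` changes only `outputDir`. The other entries keep their metadata values.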
OUTFILEDELIMITER=file-delimiter
specifies the delimiter for variables (fields) in the output file. You can specify the delimiter in any of the following ways:
• ','
• '\t'
• ^A
• ^Z
• '09'x
• 32
Default ^A
Range You can specify only a single character in the Unicode range U+0001 to U+007F.
Restriction The value of this option cannot be the same character as the value of OUTTEXTQUALIFIER, and it cannot be a newline ('0a'x).
Requirement This option is valid only for the DELIMITED file type. Other file types do not use it.
Note Valid values are 0–127, a comma (","), or "\t".
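The notations listed above can all be reduced to a single character. This Python sketch is not SAS code and rests on several assumptions. It assumes that ^X denotes the control character Ctrl-X (so ^A is U+0001 and ^Z is U+001A), that 'hh'x is a hexadecimal character code, that a bare integer is a decimal character code, and that \t means a tab. The helper `parse_delimiter` is hypothetical. It also enforces the documented range U+0001 to U+007F and rejects a newline.

```python
def parse_delimiter(spec: str) -> str:
    """Convert a delimiter specification into one character (hypothetical helper)."""
    s = spec.strip()
    if len(s) == 2 and s[0] == "^" and "A" <= s[1].upper() <= "Z":
        # Caret notation (assumed): ^A is 0x01, ..., ^Z is 0x1A.
        ch = chr(ord(s[1].upper()) - ord("A") + 1)
    elif len(s) >= 4 and s[0] == "'" and s[-2:].lower() == "'x":
        # SAS-style hexadecimal literal such as '09'x.
        ch = chr(int(s[1:-2], 16))
    elif s.isdigit():
        # Bare decimal character code such as 32 (a space).
        ch = chr(int(s))
    elif len(s) >= 2 and s[0] == s[-1] and s[0] in "'\"":
        # Quoted literal such as ',' or "\t": strip the quotes.
        ch = s[1:-1]
    else:
        ch = s
    if ch == "\\t":
        ch = "\t"  # the two-character escape \t stands for a tab
    if len(ch) != 1 or not 0x01 <= ord(ch) <= 0x7F:
        raise ValueError(f"delimiter must be one character in U+0001..U+007F: {spec!r}")
    if ch == "\n":
        raise ValueError("delimiter cannot be a newline ('0a'x)")
    return ch
```

Under these assumptions, '09'x and '\t' both resolve to a tab, and 32 resolves to a space.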
OUTTEXTQUALIFIER=text-qualifier
specifies the text qualifier to be used in the output data file.
Default none
Range You can specify only a single character in the Unicode range U+0001 to U+007F.
Restriction The value of this option cannot be the same character as the value of OUTFILEDELIMITER, and it cannot be a newline ('0a'x).
Requirement This option is valid only for the DELIMITED file type. Other file types do not use it.
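The restrictions on OUTFILEDELIMITER and OUTTEXTQUALIFIER depend on each other, so they are easiest to check together. This Python sketch restates the documented rules and is not part of SAS. The helper `check_delimited_options` is hypothetical. It assumes that both values have already been decoded to single characters and uses ^A (U+0001) as the default delimiter.

```python
from typing import Optional


def check_delimited_options(delimiter: str = "\x01",
                            qualifier: Optional[str] = None) -> None:
    """Raise ValueError if the documented delimiter/qualifier rules are violated.

    delimiter defaults to ^A (U+0001), the documented default; the qualifier
    has no default, so None means that no text qualifier is used.
    """
    for name, ch in (("OUTFILEDELIMITER", delimiter), ("OUTTEXTQUALIFIER", qualifier)):
        if ch is None:
            continue
        if len(ch) != 1 or not 0x01 <= ord(ch) <= 0x7F:
            raise ValueError(f"{name} must be one character in U+0001..U+007F")
        if ch == "\n":
            raise ValueError(f"{name} cannot be a newline ('0a'x)")
    if qualifier is not None and qualifier == delimiter:
        raise ValueError("OUTFILEDELIMITER and OUTTEXTQUALIFIER must differ")
```

Calling `check_delimited_options(",", '"')` passes, and `check_delimited_options(",", ",")` raises an error.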
OUTFILETYPE=output-file-type
specifies the output file type. output-file-type can be one of the following values:
DELIMITED
specifies a delimited file.
BINARY
specifies a fixed record length file.
Alias FIXED
SPD
specifies an SPD file.
Default If the input file type is fixed, the output file type is fixed. Otherwise, it is delimited.
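The default rule for OUTFILETYPE is small enough to state as a function. This Python sketch is only an illustration. The helper `resolve_outfiletype` is hypothetical. It assumes that "fixed" means an input type of BINARY or its alias FIXED, and it normalizes an explicit FIXED to BINARY.

```python
from typing import Optional

FIXED_TYPES = {"BINARY", "FIXED"}  # FIXED is the documented alias of BINARY


def resolve_outfiletype(infiletype: str, outfiletype: Optional[str] = None) -> str:
    """Return the output file type, applying the documented default when omitted."""
    if outfiletype is not None:
        value = outfiletype.strip().upper()
        return "BINARY" if value == "FIXED" else value
    # Default: fixed input gives fixed output; anything else gives delimited output.
    return "BINARY" if infiletype.strip().upper() in FIXED_TYPES else "DELIMITED"
```

For example, `resolve_outfiletype("JSON")` falls back to DELIMITED, while `resolve_outfiletype("FIXED")` stays fixed-length.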
OUTRECORDFORMAT=output-record-format
specifies the output record format. output-record-format can be one of the following values:
DELIMITED
specifies a delimited format.
FIXED
specifies a fixed record length format.
Default DELIMITED
FORMATFILE=user-defined-format-filename
specifies the name of the file that contains the user-defined formats that were created by the FORMAT procedure and that are referenced in the DATA step scoring model program.
Interaction This name is the same one that you specified in the %INDHD_PUBLISH_MODEL macro’s FMTCAT argument.
See “FMTCAT=format-catalog-filename | libref.format-catalog-filename” on page 82
FORCEOVERWRITE=TRUE | FALSE
specifies whether the output directory is deleted before the MapReduce job is executed.
Default FALSE
KEEP=variable-keep-list
specifies a list of variables that the SAS score program retains.
Restriction KEEP and KEEPFILENAME are mutually exclusive.
Requirement The variable names in the list must be separated by spaces and must not be enclosed in single or double quotation marks.
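The KEEP= value format and the mutual exclusion with KEEPFILENAME= can be checked before the macro call is built. This Python sketch is not SAS code. The helper `keep_arguments` and its dictionary return value are hypothetical. The sketch rejects names that contain spaces or quotation marks, because the list is documented as space-separated and unquoted.

```python
from typing import Iterable, Optional


def keep_arguments(variables: Optional[Iterable[str]] = None,
                   keep_filename: Optional[str] = None) -> dict:
    """Build either the KEEP= or the KEEPFILENAME= argument (hypothetical helper)."""
    if variables is not None and keep_filename is not None:
        raise ValueError("KEEP and KEEPFILENAME are mutually exclusive")
    if keep_filename is not None:
        return {"KEEPFILENAME": keep_filename}
    if variables is None:
        return {}  # neither option is specified
    names = list(variables)
    for name in names:
        # Names are joined with spaces and must not be quoted.
        if not name or any(c in name for c in " '\""):
            raise ValueError(f"invalid variable name: {name!r}")
    return {"KEEP": " ".join(names)}
```

For example, `keep_arguments(["id", "score"])` yields a KEEP value of `id score`, with no quotation marks.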
KEEPFILENAME=keep-list-configuration-filename
specifies the name of an XML configuration file that contains the list of variables that are passed to the SAS score program.
The keep list configuration file should have the following format:
<configuration>
<property>
<name>sas.ep.ds2.keep.list</name>
<value>var1 var2 var3 var4... varn</value>
</property>
</configuration>
Restriction KEEP and KEEPFILENAME are mutually exclusive.
Requirement You must specify the full path.
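A keep list configuration file in the format shown above can be generated rather than written by hand. This Python sketch uses the standard library's xml.etree.ElementTree module to produce the same element structure. The helper `keep_list_config` is hypothetical. Storing the result at a full path that you then pass as KEEPFILENAME= is left to your environment.

```python
import xml.etree.ElementTree as ET
from typing import List


def keep_list_config(variables: List[str]) -> str:
    """Return a keep list configuration document as a string (hypothetical helper)."""
    root = ET.Element("configuration")
    prop = ET.SubElement(root, "property")
    # Property name taken from the documented format.
    ET.SubElement(prop, "name").text = "sas.ep.ds2.keep.list"
    # Variables are separated by spaces, as in the KEEP= option.
    ET.SubElement(prop, "value").text = " ".join(variables)
    return ET.tostring(root, encoding="unicode")
```

The output is compact, on a single line. Whitespace between elements does not change the element structure shown above.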
TRACE=YES | NO
specifies whether debug messages are displayed.
Default NO