R as an Analysis Engine
RapidNCA, the non-compartmental
analysis workflow tool
Building Apps
• Java
• .NET
• R
Why Build Analytic Applications?
Why Build Analytic Applications on R?
Why Analytics?
• Analytics answer many questions
• I believe there is no company in the world today that cannot benefit from analytics in some way
Who is a good driver? What bonus should I pay? How do we win more games?
Why build Analytic Applications?
3 key reasons we see:
• To deploy analytical tools to decision makers
• To make an analyst’s life more efficient
Deploying Analytics
• Adding analytics into a business process leads to more informed decisions
• Complex analytics shouldn’t be attempted by non-analysts
• Communication between the decision maker and
Deploying Analytics
If we build an application which:
• is easy for the decision maker to use
• contains the correct analysis to apply
• communicates analytical results in a suitable manner
then this leads to some major benefits!
Benefits for the Decision Maker
• No need to wait for information
• Can perform “what if” analysis
• Decision not dependent on analyst availability
Benefits for the Analyst
• Less need to perform often-repetitive tasks
• Comfortable that the “right” analysis is being run
• Can get on with more strategic things?
Analytic App Structure
[Diagram: User Interface, Data, Analytic Outputs, Data Storage, Analytic Code, Code Management, Analytic Engine]
Why build Analytic Applications on R?
We want a programmable engine so that it can be readily extended (i.e. no black boxes please)
R can be extended by the developer as needed
We often want to be able to deploy new algorithms and techniques as they become available
Why build Analytic Applications on R?
Building applications requires installing the analytic engine on desktops, servers, clusters, clouds
R is license free
Building analytic applications involves integrating an analytic engine with other technologies (data sources, UI etc.)
Formal R Development
• Creating sophisticated analytic applications requires a formal development approach
• This mostly means taking standard development practices and applying them to analytics
• Mango’s formal R development procedures and structure have been evolving since the company’s inception (~2004)
[Diagram: Project Management, Requirements, Behaviour Driven, Code Review, Review Board, StatET, testthat, roxygen2, Continuous Integration, Issue Tracking, Quality Manual, Dev Procedures, Coding Standards, Knowledge Management, TestCoverage]
RapidNCA,
the non-compartmental
analysis workflow tool
• Need for RapidNCA
• Using .NET
• RapidNCA Structure
• Code Quality
• Connections with R.NET
Need for RapidNCA
• Customer needed to send monthly reports to dozens of trial centres
• Small team, so time limited
• Predefined non-compartmental analysis
Using .NET
What is .NET?
• Object-oriented environment to develop applications
• Safe execution environment
• Choice of programming languages
• Framework consisting of:
  • runtime
  • class library
Visual Studio
• A graphical programming tool (IDE)
• Visual Studio Express - free version
Choice of languages
• C# is the main one
• F# is a functional language (similar concepts to OCaml)
• XAML (a Microsoft declarative XML language) for interactive graphics
• C++/CLI useful for legacy and bespoke parallel processing (including GPGPU)
Other possibilities...
• VB.Net is very like C# (no advantage over it)
“Ajar Source” Platform
Not exactly open source, but…
• Most third-party CLI languages are open
• C# and VB.Net are not, but many open source projects are based on them
• Microsoft have made F# open source
• Compiler is free
Performance
Performance is very good
• On graphics (millions of data points will plot with ease and zoom smoothly)
• Computation is fast enough in C#, calling R adds little overhead
• Standard Maths library is limited; third parties and MS maths for “drawing” are better
• Data parallel computation is possible on the desktop (GPGPU)
• F# provides further “big data” capabilities
RapidNCA Structure
[Diagram: User Interface, Data, Analytic Outputs, Data Storage, Analytic Code, Code Management, Data Service, Analytic Engine]
RapidNCA Structure
[Diagram: MangoNca analytic code, with boxes labelled Analyse Element, AnalysisDo, Get Analysis, Unit Tests, Data Checks]
Code Quality
Unit Tests
• Ensure product works!
• User/Customer/Payer trust
Run Code, Check Output
• Working Cases
> require(RUnit)
> test1 <- ncaAnalysis(Conc = c(4, 9, 8, 6, 4:1, 1),
+     Time = 0:8, Dose = 100, Dof = 2)
> checkEquals(test1[1, "ROutput_adjr2"], 0.9714937901,
+     tol = 1e-8)
[1] TRUE
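The working-case pattern above (run the code, then compare one output against a stored reference value within a tolerance) can be sketched in base R. This is only an illustration: `aucTrapezoid` is a hypothetical stand-in for the real analysis code, which the slides do not show, and `all.equal` is used in place of RUnit's `checkEquals` so that the sketch runs without extra packages.

```r
# Hypothetical stand-in for an analysis function (not the RapidNCA code):
# area under the concentration-time curve by the linear trapezoidal rule.
aucTrapezoid <- function(Conc, Time) {
  stopifnot(length(Conc) == length(Time))
  sum(diff(Time) * (head(Conc, -1) + tail(Conc, -1)) / 2)
}

# Working case: same concentration and time inputs as the slide example.
result <- aucTrapezoid(Conc = c(4, 9, 8, 6, 4:1, 1), Time = 0:8)

# Reference value worked out by hand from the eight trapezoids.
expected <- 35.5

# Compare within a tolerance rather than with ==.
testPassed <- isTRUE(all.equal(result, expected, tolerance = 1e-8))
print(testPassed)
```

Storing the reference value to many decimal places, as the slide does for `ROutput_adjr2`, means even small numerical changes in the analysis code are caught.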
Error Case Unit Tests
• Use try
• Handled Error Cases
> test7 <- try(AUCLast(Conc = 1:10, Time = 9:0),
+     silent = TRUE)
> checkEquals(test7,
+     "Error in checkOrderedVector(Time, ... ")
[1] TRUE
> test26 <- ncaAnalysis(Conc = c(4, 9, 8, 6, 4:1, 1),
+     Time = 0:8, Dof = 1)
> checkEquals(test26[, "ROutput_Error"],
+     "Error in checkSingleNumeric(Dose, ... ")
[1] TRUE
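An error-case test of this kind can be sketched as follows, assuming a hypothetical function `aucChecked` that rejects unordered time points (the real `AUCLast` is not shown in the slides). The call is wrapped in `try(..., silent = TRUE)`, and the test then checks both that an error occurred and what it said.

```r
# Hypothetical function that rejects unordered time points
# (a stand-in; the real AUCLast is not shown in the slides).
aucChecked <- function(Conc, Time) {
  if (is.unsorted(Time)) {
    stop("Time is not ordered")
  }
  sum(diff(Time) * (head(Conc, -1) + tail(Conc, -1)) / 2)
}

# Error case: decreasing time points should trigger the handled error.
test7 <- try(aucChecked(Conc = 1:10, Time = 9:0), silent = TRUE)

# try() returns an object of class "try-error" when the call fails.
isError <- inherits(test7, "try-error")

# The original condition is stored as an attribute, so the message can be
# compared exactly instead of matching the printed "Error in ..." text.
errorMessage <- conditionMessage(attr(test7, "condition"))

print(isError)
print(identical(errorMessage, "Time is not ordered"))
```

Comparing the condition message rather than the full printed string is a design choice of this sketch: the printed form also contains the deparsed call, which can change when the function signature changes.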
Connections with R.NET
• What will be provided to R?
• What will be returned from R?
Using the R Service
• R.NET allows R calls to be submitted to an R service
• R.NET connects to R down to Expression level
Data Checks
• Function may be passed data outside its anticipated structure
> checkOrderedVector(c(0, 1, 3, 2, 4),
+     description = "Time")
Error in checkOrderedVector(c(0, 1, 3, 2, 4), description = "Time") :
Error: Time is not ordered. Actual value is 0 1 3 2 4
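A data check of this kind might look like the sketch below. It is a hypothetical re-implementation for illustration only: the name `checkOrderedVectorSketch`, the numeric check, and the exact message wording are assumptions modelled on the output shown above, not the RapidNCA source.

```r
# Hypothetical re-implementation for illustration only; the real
# checkOrderedVector in the RapidNCA code is not shown in the slides.
checkOrderedVectorSketch <- function(x, description = "value") {
  if (!is.numeric(x)) {
    stop(description, " is not numeric")
  }
  if (is.unsorted(x)) {
    stop(description, " is not ordered. Actual value is ",
         paste(x, collapse = " "))
  }
  invisible(TRUE)
}

# Ordered input passes silently.
okResult <- checkOrderedVectorSketch(c(0, 1, 2, 3, 4), description = "Time")

# Unordered input stops with a message naming the offending argument.
badResult <- tryCatch(
  checkOrderedVectorSketch(c(0, 1, 3, 2, 4), description = "Time"),
  error = function(e) conditionMessage(e)
)
print(badResult)
```

Passing a `description` argument lets one check function serve every input while still producing a message that tells the user which input was wrong.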
Data Checks
• The tool expects a certain return object
• An error in an R call should be trapped by the communicating function
• Return object passed as normal
• An error checking element of the return object can report information about the error
> check01 <- try(checkOrderedVector(Time,
+     description = "Time"), silent = TRUE)
> if (is(check01, "try-error")) { return(object) }
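The idea of trapping the error and still returning the expected object can be sketched as below. The wrapper `safeAnalysis` and the column `ROutput_AUC` are hypothetical names; only `ROutput_Error` appears in the slides. The point of the sketch is that the caller always receives an object of the same shape, with the error text carried in one of its elements.

```r
# Hypothetical analysis wrapper: always returns a one-row data frame with
# the same columns, so the calling application can rely on its shape.
safeAnalysis <- function(Conc, Time) {
  # Default return object: no result yet, no error recorded.
  object <- data.frame(ROutput_AUC = NA_real_,
                       ROutput_Error = "",
                       stringsAsFactors = FALSE)

  # Trap a failed data check instead of letting the error escape.
  check01 <- try(
    if (is.unsorted(Time)) stop("Time is not ordered"),
    silent = TRUE
  )
  if (inherits(check01, "try-error")) {
    object$ROutput_Error <- conditionMessage(attr(check01, "condition"))
    return(object)
  }

  # Checks passed: fill in the result (linear trapezoidal rule).
  object$ROutput_AUC <- sum(diff(Time) * (head(Conc, -1) + tail(Conc, -1)) / 2)
  object
}

good <- safeAnalysis(Conc = c(4, 2, 1), Time = 0:2)
bad  <- safeAnalysis(Conc = c(4, 2, 1), Time = c(0, 2, 1))
print(good)
print(bad)
```

With this shape, the calling side only needs to inspect the error element after each evaluation, rather than handling a failed R call as a special case.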
Connections with R.NET
_pluginsManager = new RPluginManager(PluginLocation, RLocation);
_pluginsManager.SetActivePlugin();
_session = _pluginsManager.GetSession();
bool sessionOk = _pluginsManager.TryMakeSession(out _session);
R is efficiently accessed, via R.Net (as pictured in Visual Studio) via a
Connections with R.NET
[Diagram: application architecture. Labels: User Interface, Data, Analytic Outputs, Data Storage, Analytic Code, Code Management, Analytic Engine, Data Service, R.NET, Analysis Display, Get PK Params, Dialog Service, App Logger, Status Bar Service, App Config Management, Data Importers, Project Wizard, Validators, Receive R Output, Create R Expressions]
Connections with R.NET
.NET Data Service
Using the framework
_pluginsManager = new RPluginManager(PluginLocation, RLocation);
_pluginsManager.SetActivePlugin();
_session = _pluginsManager.GetSession();
bool sessionOk = _pluginsManager.TryMakeSession(out _session);
_session.SetNumericSymbol("TimePtVector", CheckTimePointData(toAnalyse));
_session.SetNumericSymbol("ConcVector", CheckConcentrationPointData(toAnalyse));
var evalString = string.Format("ncaAnalysis(TimePtVector, ConcVector, …
MathEngineDataRowDto<double> ncaGetBack =
    _session.PerformNumericEvaluation(evalString, "ROutput_Error");
_lastErrors = ncaGetBack.ErrorStrings;
_session.FlushConsole();
Complete & Deploy
RapidNCA
• Can users understand how to use the tool?
• How confident are we in the tool output?
• On-going code review
• Independent test team
• Installation Qualification
• Operational Qualification
• Performance Qualification
Conclusions
• Great graphical interfaces can be built using .NET
• Intuitive interactive features are available
• R.NET allows R analysis to be accessed as a service
• Good coding practice will ensure the application is robust
• Work on a well-engineered framework will be rewarded