Predictive Analytics, Data Mining and Big Data
10.1057/9781137379283 - Predictive Analytics, Data Mining and Big Data, Steven Finlay
Copyright material from www.palgraveconnect.com - licensed to npg - PalgraveConnect - 2016-02-17
Predictive Analytics, Data Mining and Big Data
Myths, Misconceptions and Methods
Steven Finlay
© Steven Finlay 2014
All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.
No portion of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, Saffron House, 6–10 Kirby Street, London EC1N 8TS.
Any person who does any unauthorized act in relation to this publication may be liable to criminal prosecution and civil claims for damages.
The author has asserted his right to be identified as the author of this work in accordance with the Copyright, Designs and Patents Act 1988.
First published 2014 by PALGRAVE MACMILLAN
Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited, registered in England, company number 785998, of Houndmills, Basingstoke, Hampshire RG21 6XS.
Palgrave Macmillan in the US is a division of St Martin’s Press LLC, 175 Fifth Avenue, New York, NY 10010.
Palgrave Macmillan is the global academic imprint of the above companies and has companies and representatives throughout the world.
Palgrave® and Macmillan® are registered trademarks in the United States, the United Kingdom, Europe and other countries.
ISBN 978–1–137–37927–6
This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources. Logging, pulping and manufacturing processes are expected to conform to the environmental regulations of the country of origin.
A catalogue record for this book is available from the British Library. A catalog record for this book is available from the Library of Congress.
Typeset by MPS Limited, Chennai, India.
To Ruby and Samantha
Contents
Figures and Tables
Acknowledgments
1 Introduction
1.1 What are data mining and predictive analytics?
1.2 How good are models at predicting behavior?
1.3 What are the benefits of predictive models?
1.4 Applications of predictive analytics
1.5 Reaping the benefits, avoiding the pitfalls
1.6 What is Big Data?
1.7 How much value does Big Data add?
1.8 The rest of the book
2 Using Predictive Models
2.1 What are your objectives?
2.2 Decision making
2.3 The next challenge
2.4 Discussion
2.5 Override rules (business rules)
3 Analytics, Organization and Culture
3.1 Embedded analytics
3.2 Learning from failure
3.3 A lack of motivation
3.4 A slight misunderstanding
3.5 Predictive, but not precise
3.6 Great expectations
3.7 Understanding cultural resistance to predictive analytics
3.8 The impact of predictive analytics
3.9 Combining model-based predictions and human judgment
4 The Value of Data
4.1 What type of data is predictive of behavior?
4.2 Added value is what’s important
4.3 Where does the data to build predictive models come from?
4.4 The right data at the right time
4.5 How much data do I need to build a predictive model?
5 Ethics and Legislation
5.1 A brief introduction to ethics
5.2 Ethics in practice
5.3 The relevance of ethics in a Big Data world
5.4 Privacy and data ownership
5.5 Data security
5.6 Anonymity
5.7 Decision making
6 Types of Predictive Models
6.1 Linear models
6.2 Decision trees (classification and regression trees)
6.3 (Artificial) neural networks
6.4 Support vector machines (SVMs)
6.5 Clustering
6.6 Expert systems (knowledge-based systems)
6.7 What type of model is best?
6.8 Ensemble (fusion or combination) systems
6.9 How much benefit can I expect to get from using an ensemble?
6.10 The prospects for better types of predictive models in the future
7 The Predictive Analytics Process
7.1 Project initiation
7.2 Project requirements
7.3 Is predictive analytics the right tool for the job?
7.4 Model building and business evaluation
7.5 Implementation
7.6 Monitoring and redevelopment
7.7 How long should a predictive analytics project take?
8 How to Build a Predictive Model
8.1 Exploring the data landscape
8.2 Sampling and shaping the development sample
8.3 Data preparation (data cleaning)
8.4 Creating derived data
8.5 Understanding the data
8.6 Preliminary variable selection (data reduction)
8.7 Pre-processing (data transformation)
8.8 Model construction (modeling)
8.9 Validation
8.10 Selling models into the business
8.11 The rise of the regulator
9 Text Mining and Social Network Analysis
9.1 Text mining
9.2 Using text analytics to create predictor variables
9.3 Within document predictors
9.4 Sentiment analysis
9.5 Across document predictors
9.6 Social network analysis
9.7 Mapping a social network
10 Hardware, Software and All that Jazz
10.1 Relational databases
10.2 Hadoop
10.3 The limitations of Hadoop
10.4 Do I need a Big Data solution to do predictive analytics?
10.5 Software for predictive analytics
Appendix A. Glossary of Terms
Appendix B. Further Sources of Information
Appendix C. Lift Charts and Gain Charts
Notes
Index