
Predictive Analytics, Data Mining and Big Data

10.1057/9781137379283 - Predictive Analytics, Data Mining and Big Data, Steven Finlay

Copyright material from www.palgraveconnect.com - licensed to npg - PalgraveConnect - 2016-02-17


Predictive Analytics, Data Mining and Big Data

Myths, Misconceptions and Methods

Steven Finlay


© Steven Finlay 2014

All rights reserved. No reproduction, copy or transmission of this publication may be made without written permission.

No portion of this publication may be reproduced, copied or transmitted save with written permission or in accordance with the provisions of the Copyright, Designs and Patents Act 1988, or under the terms of any licence permitting limited copying issued by the Copyright Licensing Agency, Saffron House, 6–10 Kirby Street, London EC1N 8TS.

Any person who does any unauthorized act in relation to this publication may be liable to criminal prosecution and civil claims for damages. The author has asserted his right to be identified as the author of this work in accordance with the Copyright, Designs and Patents Act 1988.

First published 2014 by

PALGRAVE MACMILLAN

Palgrave Macmillan in the UK is an imprint of Macmillan Publishers Limited, registered in England, company number 785998, of Houndmills, Basingstoke, Hampshire RG21 6XS.

Palgrave Macmillan in the US is a division of St Martin’s Press LLC, 175 Fifth Avenue, New York, NY 10010.

Palgrave Macmillan is the global academic imprint of the above companies and has companies and representatives throughout the world.

Palgrave® and Macmillan® are registered trademarks in the United States, the United Kingdom, Europe and other countries.

ISBN 978–1–137–37927–6

This book is printed on paper suitable for recycling and made from fully managed and sustained forest sources. Logging, pulping and manufacturing processes are expected to conform to the environmental regulations of the country of origin.

A catalogue record for this book is available from the British Library. A catalog record for this book is available from the Library of Congress.

Typeset by MPS Limited, Chennai, India.


To Ruby and Samantha


Contents

Figures and Tables x
Acknowledgments xii

1 Introduction 1
1.1 What are data mining and predictive analytics? 2
1.2 How good are models at predicting behavior? 6
1.3 What are the benefits of predictive models? 7
1.4 Applications of predictive analytics 9
1.5 Reaping the benefits, avoiding the pitfalls 11
1.6 What is Big Data? 13
1.7 How much value does Big Data add? 16
1.8 The rest of the book 19

2 Using Predictive Models 21
2.1 What are your objectives? 22
2.2 Decision making 23
2.3 The next challenge 31
2.4 Discussion 34
2.5 Override rules (business rules) 36

3 Analytics, Organization and Culture 39
3.1 Embedded analytics 40
3.2 Learning from failure 42
3.3 A lack of motivation 43
3.4 A slight misunderstanding 45
3.5 Predictive, but not precise 50
3.6 Great expectations 52
3.7 Understanding cultural resistance to predictive analytics 54
3.8 The impact of predictive analytics 60


3.9 Combining model-based predictions and human judgment 62

4 The Value of Data 65
4.1 What type of data is predictive of behavior? 66
4.2 Added value is what’s important 70
4.3 Where does the data to build predictive models come from? 73
4.4 The right data at the right time 76
4.5 How much data do I need to build a predictive model? 79

5 Ethics and Legislation 85
5.1 A brief introduction to ethics 86
5.2 Ethics in practice 89
5.3 The relevance of ethics in a Big Data world 90
5.4 Privacy and data ownership 92
5.5 Data security 96
5.6 Anonymity 97
5.7 Decision making 99

6 Types of Predictive Models 104
6.1 Linear models 106
6.2 Decision trees (classification and regression trees) 112
6.3 (Artificial) neural networks 114
6.4 Support vector machines (SVMs) 118
6.5 Clustering 120
6.6 Expert systems (knowledge-based systems) 122
6.7 What type of model is best? 124
6.8 Ensemble (fusion or combination) systems 128
6.9 How much benefit can I expect to get from using an ensemble? 130
6.10 The prospects for better types of predictive models in the future 131

7 The Predictive Analytics Process 134
7.1 Project initiation 135
7.2 Project requirements 138
7.3 Is predictive analytics the right tool for the job? 142
7.4 Model building and business evaluation 143
7.5 Implementation 145


7.6 Monitoring and redevelopment 149
7.7 How long should a predictive analytics project take? 154

8 How to Build a Predictive Model 157
8.1 Exploring the data landscape 158
8.2 Sampling and shaping the development sample 159
8.3 Data preparation (data cleaning) 162
8.4 Creating derived data 163
8.5 Understanding the data 164
8.6 Preliminary variable selection (data reduction) 165
8.7 Pre-processing (data transformation) 166
8.8 Model construction (modeling) 170
8.9 Validation 171
8.10 Selling models into the business 172
8.11 The rise of the regulator 176

9 Text Mining and Social Network Analysis 179
9.1 Text mining 179
9.2 Using text analytics to create predictor variables 181
9.3 Within document predictors 181
9.4 Sentiment analysis 184
9.5 Across document predictors 185
9.6 Social network analysis 186
9.7 Mapping a social network 191

10 Hardware, Software and All that Jazz 194
10.1 Relational databases 197
10.2 Hadoop 200
10.3 The limitations of Hadoop 202
10.4 Do I need a Big Data solution to do predictive analytics? 203
10.5 Software for predictive analytics 206

Appendix A. Glossary of Terms 209
Appendix B. Further Sources of Information 218
Appendix C. Lift Charts and Gain Charts 223
Notes 227
Index 246
