Ethical Issues in Data Mining
Mandana Mir Moftakhari
PhD Student, Department of Information Management, Hacettepe University
Güleda Doğan
PhD Student and Research Assistant, Department of Information Management, Hacettepe University
We will discuss:
Big Data
Knowledge Discovery
Data mining
Big Data!
Data overload is a serious problem that has grown with technical advances.
Human beings have to cope with such overwhelming amounts of data and manage them in order to obtain relevant information.
Organizations have to cope with massive data volumes to take advantage of opportunities for:
better decision-making
Knowledge Discovery in Databases as a Solution
Knowledge discovery in databases (KDD) is “the nontrivial process of identifying
valid,
novel,
potentially useful, and
ultimately understandable patterns in data” (Fayyad et al., 1996).
KDD Involves Different Steps
Selection
Preprocessing
Transformation
Data mining
Interpretation or evaluation
(Fayyad et al., 1996)
What Is Data Mining?
Data mining, the central step of KDD, is “the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.”
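To make the KDD steps concrete, here is a toy end-to-end run. It is a minimal sketch, not the method of Fayyad et al.: the records, the field names `income` and `bought`, the income threshold, and the evaluation cut-off are all hypothetical, and the "mining" step is simple conditional-frequency counting.

```python
# Toy KDD pipeline: selection -> preprocessing -> transformation -> mining -> evaluation.
# All field names, thresholds, and records below are hypothetical.

def select(records, fields):
    """Selection: keep only the attributes relevant to the question being asked."""
    return [{f: r.get(f) for f in fields} for r in records]

def preprocess(records):
    """Preprocessing: drop records that have missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, threshold=50_000):
    """Transformation: discretize income into 'high' and 'low' bands."""
    return [{**r, "income": "high" if r["income"] >= threshold else "low"} for r in records]

def mine(records):
    """Data mining: estimate the purchase rate within each income band."""
    counts = {}
    for r in records:
        total, bought = counts.get(r["income"], (0, 0))
        counts[r["income"]] = (total + 1, bought + int(r["bought"]))
    return {band: bought / total for band, (total, bought) in counts.items()}

def evaluate(patterns, min_rate=0.6):
    """Interpretation/evaluation: keep only patterns strong enough to act on."""
    return {band: rate for band, rate in patterns.items() if rate >= min_rate}

raw = [
    {"id": 1, "age": 34, "income": 62_000, "bought": True},
    {"id": 2, "age": 51, "income": 48_000, "bought": False},
    {"id": 3, "age": 29, "income": None, "bought": True},
    {"id": 4, "age": 45, "income": 75_000, "bought": True},
    {"id": 5, "age": 38, "income": 30_000, "bought": True},
]

patterns = mine(transform(preprocess(select(raw, ["income", "bought"]))))
print(patterns)            # {'high': 1.0, 'low': 0.5}
print(evaluate(patterns))  # {'high': 1.0}
```

The selection step already discards `id` and `age`; keeping only what the task needs also limits how much personal data flows into later steps.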
Application Areas of Data Mining
Customer service support
Prediction
Estimation
Forecasting
Decision support
Data Mining Process
Identifying the target areas
Determining sources of data
Gathering and cleaning the data in a data warehouse
Choosing appropriate analysis tools
Finding new patterns
Preparing reports and implementing the
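The "finding new patterns" step is often done with association rules. The sketch below is one illustrative technique, not the process used by the cited authors: it considers only item pairs, and the transactions and the `min_support` and `min_confidence` thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return rules (lhs, rhs, support, confidence) over item pairs meeting both thresholds."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))          # de-duplicate and fix an order for pairing
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n             # share of transactions containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)

# Hypothetical shopping baskets.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
]
print(pair_rules(transactions))  # [('butter', 'bread', 0.5, 1.0)]
```

Lowering `min_confidence` surfaces weaker rules such as bread → milk; choosing these thresholds is part of the interpretation step.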
Ethical Issues in Data Mining
Individuals not only expect high-quality services but also require a high level of privacy and security for their personal details.
These issues cannot be overlooked because of their consequences for consumers, individuals, and society.
Ethical issues
Privacy
Data accuracy
Database security
Privacy Threats
Many consumers feel that their privacy is violated by information-gathering practices.
Secondary use of personal information
Handling misinformation
Secondary Use of Personal Information
Recent surveys on privacy show great concern about the use of personal data for purposes other than the one for which the data were collected.
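One technical response to secondary use is to store, with each item of personal data, the purposes for which it was collected, and to check every new use against them (often called purpose limitation). A minimal sketch, assuming a hypothetical `PersonalRecord` type and made-up purpose labels:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PersonalRecord:
    subject_id: str
    value: str
    collected_for: frozenset  # purposes stated when the data were collected

def usable_for(records, purpose):
    """Return only records whose stated collection purposes include `purpose`."""
    return [r for r in records if purpose in r.collected_for]

records = [
    PersonalRecord("c1", "alice@example.org", frozenset({"billing"})),
    PersonalRecord("c2", "bob@example.org", frozenset({"billing", "marketing"})),
]

print([r.subject_id for r in usable_for(records, "marketing")])  # ['c2']
```

Any purpose not recorded at collection time, such as a use discovered later by mining, returns no records until the purposes are updated.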
Handling Misinformation
Misinformation can cause serious and long-term damage, so individuals should be able to challenge the correctness of data about themselves.
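A system can support such challenges by letting a data subject flag a value as disputed, keeping it out of analysis until it has been reviewed, and recording what changed. The names `Record`, `challenge`, `resolve`, and `minable` are hypothetical; a real system would also need authentication and a human review step.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    subject_id: str
    attribute: str
    value: str
    disputed: bool = False
    history: list = field(default_factory=list)  # audit trail of disputes and corrections

def challenge(record, reason):
    """The data subject disputes a value: flag it and log why."""
    record.disputed = True
    record.history.append(("disputed", record.value, reason))

def resolve(record, corrected_value=None):
    """After review, apply a correction if one is needed and clear the flag."""
    if corrected_value is not None:
        record.history.append(("corrected", record.value, corrected_value))
        record.value = corrected_value
    record.disputed = False

def minable(records):
    """Only undisputed values are passed on to data mining."""
    return [r for r in records if not r.disputed]

r = Record("c1", "credit_status", "in default")
challenge(r, "debt was repaid")
print(len(minable([r])))  # 0
resolve(r, corrected_value="in good standing")
print(len(minable([r])), r.value)  # 1 in good standing
```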
Granulated Access to Personal Information
Access to personal data should be granted on a need-to-know basis and limited to relevant information only.
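Need-to-know access can be sketched as a per-role list of permitted fields, with everything else removed before data reach the user. The roles and field names here are hypothetical; unknown roles see nothing (default deny).

```python
# Hypothetical mapping from each role to the fields its task requires.
NEED_TO_KNOW = {
    "billing": {"name", "address", "balance"},
    "marketing_analyst": {"age_band", "region"},
}

def view_for(role, record):
    """Return only the fields this role needs; unknown roles get an empty view."""
    allowed = NEED_TO_KNOW.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

customer = {
    "name": "A. Customer",
    "address": "1 Example Street",
    "balance": 120,
    "age_band": "30-39",
    "region": "north",
    "phone": "555-0100",
}

print(view_for("marketing_analyst", customer))  # {'age_band': '30-39', 'region': 'north'}
print(view_for("intern", customer))             # {}
```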
Types of Data
Some types of personal information are seen as being more sensitive than others.
What complicates this issue is that the perceived level of sensitivity varies from one individual to another.
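One way to reflect individual differences is to combine a baseline set of sensitive attributes with attributes each person has flagged, and suppress both before data are shared. This is a sketch under assumptions: the baseline set, the `prefs` structure, and the sample records are hypothetical.

```python
# Assumed baseline: attributes treated as sensitive for everyone.
DEFAULT_SENSITIVE = {"health", "religion"}

def suppress_sensitive(record, personal_prefs):
    """Remove attributes that are sensitive by default or that this person has flagged."""
    flagged = personal_prefs.get(record["id"], set())
    sensitive = DEFAULT_SENSITIVE | flagged
    return {k: v for k, v in record.items() if k not in sensitive}

# Person p2 also considers income sensitive; p1 does not.
prefs = {"p2": {"income"}}
r1 = {"id": "p1", "income": 40_000, "health": "condition A"}
r2 = {"id": "p2", "income": 52_000, "religion": "religion B"}

print(suppress_sensitive(r1, prefs))  # {'id': 'p1', 'income': 40000}
print(suppress_sensitive(r2, prefs))  # {'id': 'p2'}
```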
Database security
Database security inhibits the unauthorized dissemination of personal data.
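Alongside access control, a common safeguard is to pseudonymize direct identifiers before data leave the secure store. The sketch uses Python's standard `hmac` module with SHA-256; the key, field names, and rows are hypothetical. A keyed hash is used because an unkeyed hash of short identifiers can be reversed by guessing. Pseudonymized data are not anonymous: the remaining attributes may still identify people, so access controls are still needed.

```python
import hashlib
import hmac

def pseudonymize(records, key, id_field="customer_id"):
    """Replace the identifier with a keyed HMAC-SHA256 token.

    The same identifier always maps to the same token, so records can still be
    linked for analysis, but the tokens cannot be recomputed without `key`.
    """
    out = []
    for r in records:
        token = hmac.new(key, r[id_field].encode("utf-8"), hashlib.sha256).hexdigest()
        out.append({**r, id_field: token})  # copy; the original rows are not modified
    return out

# Hypothetical key; in practice it would be kept in a secrets store, away from analysts.
key = b"example-secret-key"
rows = [
    {"customer_id": "C-1001", "region": "north"},
    {"customer_id": "C-1001", "region": "south"},
    {"customer_id": "C-2002", "region": "north"},
]
released = pseudonymize(rows, key)
print(released[0]["customer_id"] == released[1]["customer_id"])  # True
print(released[0]["customer_id"] == released[2]["customer_id"])  # False
```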
Data accuracy
Collected data originate from many diverse, possibly external, sources.
They might be noisy, obsolete, inaccurate, or incomplete.
Obsolete data are not recent enough and may no longer reflect the present situation.
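Some of these problems can be detected automatically before mining. This sketch covers only incompleteness and staleness; noise and inaccuracy usually cannot be seen from a single record. The required fields, the `last_verified` field, and the one-year limit are hypothetical choices.

```python
from datetime import date

# Hypothetical required fields and freshness limit.
REQUIRED = ("customer_id", "address", "last_verified")

def quality_issues(record, today, max_age_days=365):
    """List detectable problems: missing required fields and stale verification dates."""
    issues = []
    missing = [f for f in REQUIRED if record.get(f) in (None, "")]
    if missing:
        issues.append("incomplete: " + ", ".join(missing))
    verified = record.get("last_verified")
    if verified is not None and (today - verified).days > max_age_days:
        issues.append(f"obsolete: last verified {(today - verified).days} days ago")
    return issues

today = date(2014, 5, 10)
stale = {"customer_id": "c1", "address": "", "last_verified": date(2012, 1, 1)}
fresh = {"customer_id": "c2", "address": "1 Example Street", "last_verified": date(2014, 1, 15)}
print(quality_issues(stale, today))  # ['incomplete: address', 'obsolete: last verified 860 days ago']
print(quality_issues(fresh, today))  # []
```

Flagged records can then be corrected, re-verified with the individual, or excluded from analysis.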
Consent
The purpose of data mining is to discover new insights and new uses for the information that companies already have.
This makes it nearly impossible to give consumers the right to informed consent for each use of their data, since many of those uses are not known when the data are collected.
Conclusion and Recommendations
Data mining is the process of searching data to discover relationships within data sets and extract useful information. Ethical issues should be considered at every step of the process.
Consider the expectations of the customers
Develop a customer-oriented privacy policy
Research and understand all laws that may govern sensitive data
Control access to data warehouses
Give customers more control over their data
References
American Library Association. (1995). Code of ethics of the American Library Association.
Bhambri, V., & Gagandeep. (2012). Coexistence of data mining and privacy of data. International Journal of Research in IT & Management, 2(2).
Brankovic, L., & Estivill-Castro, V. (1999, July). Privacy issues in knowledge discovery and data mining. In Australian Institute of Computer Ethics Conference (pp. 89–99).
Cary, C., Wen, H. J., & Mahatanankoon, P. (2003). Data mining: Consumer privacy, ethical policy, and systems development practices. Human Systems Management, 22(4), 157–168.
Cavoukian, A. (1998). Data mining: Staking a claim on your privacy.
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., & Uthurusamy, R. (1996). Advances in knowledge discovery and data mining. Cambridge, Menlo Park, Calif.: AAAI Press. Retrieved May 10, 2014, from http://www.amazon.ca/gp/product/0262560976
Fayyad, U. M., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17, 37–54.
Nicholson, S. (2003). The bibliomining process: Data warehousing and data mining for library decision-making. Information Technology and Libraries, 22(4), 146–151.
Nicholson, S., & Stanton, J. (2003). Gaining strategic advantage through bibliomining: Data mining for management decisions in corporate, special, digital, and traditional libraries. In H. Nemati & C. Barko (Eds.), Organizational data mining: Leveraging enterprise data resources for optimal performance (pp. 247–262). Hershey, PA: Idea Group Publishing.
Payne, D., & Trumbach, C. C. (2009). Data mining: Proprietary rights, people and proposals. Business Ethics Quarterly, 18(3).