
Advanced Data Mining Models: Scalability and Biological Application

TIN2007-68084-C02

Jesús S. Aguilar-Ruiz*

U. Pablo de Olavide, de Sevilla

José C. Riquelme**

U. de Sevilla

Abstract

The main objective of this project is to apply the current software technology to large, heterogeneous data bases with the aim of generating applicable knowledge in general, in science, business and industry and, particularly, in genomics. The information is more and more complex, distributed and, above all, massive. Extracting useful, understandable and beneficial knowledge for organizations and companies entails an important challenge for research, encouraged by the recent interest shown by industrial applications and by the exponential growth of biological information.

Keywords: Data Mining, Bioinformatics.

1 Project Objectives

The two groups applying for this project share a research line aimed at the development of Data Mining (DM) techniques and, in general, of heuristics used in real-world DM problems: decision rule extraction, attribute selection, clustering, etc. In recent years, we have developed a set of heuristics capable of dealing with huge amounts of data. In this project we aim to continue in this research direction from different perspectives:

1. [Program TIN, topic 3, Objectives 3.3 and 3.4]. Development of new heuristics that can obtain results in DM areas that are still open, and that are characterized by data of huge dimensionality and cardinality: biclustering, influence networks, feature selection, visualization, result validation, etc.

1.1. Finding relevant patterns with data clustering and biclustering.

1.2. Defining association models that can establish relations among attributes.

1.3. Identifying relevant characteristics using ranking search heuristics.

1.4. Applying optimization techniques for evaluating the quality of the patterns found: numerical, heuristic and multi-objective.

*Email: [email protected]

**Email:


2. [Program TIN, topic 7, objective 7.3]. Sequential computation implies some limitations on the dimension of data that can be handled in DM. Nowadays the development of heuristics based on multiprocessor architectures allows dealing with DM problems that, due to their computational cost, are suitable for being implemented on high performance machines.

2.1. Designing a set of algorithms for optimization heuristics, especially evolutionary algorithms that can be run on multiprocessor architectures.

2.2. Adapting the algorithms previously developed by the group so that they can be parallelized.

3. [Program Biotechnology, topic 4, objective 4.5]. Bioinformatics is a research field that deals with data coming from biology by means of computer science techniques. We aim to make further progress on specific objectives: genetic regulation networks, biclustering, gene selection and automatic validation.

3.1. Adapting the developed DM techniques to the biological environment.

3.2. Designing a platform for the integration of the adapted techniques and the automatic validation.

3.3. Designing an architecture for the automatic validation of results by retrieving data from remote databases (Gene Ontology and Kyoto Encyclopedia of Genes and Genomes).

To date, the research group is composed of:

Subproject 1 [UPO]: 7 PhDs and 5 non-PhDs.

Subproject 2 [US]: 5 PhDs and 8 non-PhDs.

In October 2009, both PIs asked the Ministry for a one-year extension (later approved) due to the uneven degree of achievement of the project objectives. Two out of the three main goals have been achieved at a level of 80%. However, the task related to the adaptation of heuristics to multi-processor architectures is still ongoing, at a level of 30%. The extension will allow us not only to test the new developments, but also to publish the results if they are successful. In short, we are convinced that the extension will contribute to achieving the level of quality initially planned.

2 Project Progress and Achievements

In general terms, we are very satisfied with the project goals achieved so far. The number and quality of publications, the applicability of results, and the collaborative work (between the subprojects and with the research contacts strengthened by the visits) give us enough confidence to reach a high level of success at the end of the project.

The project outcome is divided into two main groups: Applied Data Mining (which includes the new data mining approaches, as well as the application of our own techniques to real-world problems) and Bioinformatics (which describes the research directions at an advanced stage that have already produced published results).

2.1 Applied Data Mining

Electricity

Two main research directions have been addressed: prediction of the demand and price of electric energy, and planning of energy production. The nearest neighbor technique [2] and a new approach based on patterns of frequency similarity were applied to electricity time series [24, 30]. Firstly, the data are labeled by means of clustering, and then the labels are used to search for similar patterns in the past. A variant of this approach [31] is the discovery of frequent episodes in order to estimate the real values in the future, which is of great interest for predicting the demand in the electricity market. Regarding energy production, an evolutionary computation technique was designed to plan the generation of electricity in the short term (24 hours). The equations that define this problem show the non-linear and non-convex features of the system, with a high number of continuous and discrete variables. The evolutionary approach accurately planned the optimum switching on and off of power stations. The application to the Spanish system provides good performance, given the dimensionality and complexity of the problem [6].
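The label-then-match idea described above for forecasting can be illustrated with a minimal sketch. It is not the published algorithm: the function names, the tiny deterministic k-means, the window length and the averaging rule are all assumptions made for illustration. Each day is a vector of values; days are labeled by clustering, the most recent sequence of labels is searched for in the past, and the days that followed each match are averaged.

```python
from statistics import fmean


def kmeans_labels(days, k, iters=20):
    """Label each day with the index of its nearest centroid (toy k-means)."""
    # Assumption: deterministic initialisation with the first k distinct days.
    centroids = []
    for d in days:
        if d not in centroids:
            centroids.append(list(d))
        if len(centroids) == k:
            break
    labels = [0] * len(days)
    for _ in range(iters):
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(d, centroids[c])))
            for d in days
        ]
        for c in range(len(centroids)):
            members = [d for d, lab in zip(days, labels) if lab == c]
            if members:
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels


def forecast_next_day(days, k=2, window=2):
    """Average the days that followed past occurrences of the last label pattern."""
    labels = kmeans_labels(days, k)
    n = len(days)
    # Shrink the window until the recent label pattern is found in the past.
    for w in range(window, 0, -1):
        pattern = labels[n - w:]
        followers = [days[i + w] for i in range(n - w) if labels[i:i + w] == pattern]
        if followers:
            return [fmean(col) for col in zip(*followers)]
    # Fallback (assumption): repeat the last observed day.
    return list(days[-1])
```

With real data, each day would hold the 24 hourly prices or demands, and the number of clusters and the window length would have to be tuned.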

Software Engineering and Databases

Data mining techniques have been used to characterize the reliability of software modules. From data stored in public repositories of software projects (e.g. PROMISE), the aim is to find rules that drive the search for metrics that indicate programming faults with high probability [21, 25]. The main difficulty is the enormous imbalance in the data, as the number of records that contain faults is proportionally low. An improved version of the SMOTE algorithm [38] was implemented to tackle this issue [49].
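To make the imbalance problem more concrete, the following sketch shows the basic idea of SMOTE-style oversampling: new minority-class records are generated by interpolating between a minority record and one of its nearest minority neighbours. This is only the plain technique under simplifying assumptions, not the improved version used in the project; the function name and parameters are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolation (basic SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the chosen sample, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # The synthetic point lies on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

In a defect-prediction setting, `minority` would hold the metric vectors of the faulty modules, and the synthetic records would be added to the training set only.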

In addition, we have developed an indexing technique for fuzzy numerical databases [8] and an access mechanism based on B+-trees to increase retrieval performance of fuzzy databases when querying using necessity-measured flexible conditions [11].

Environmental Data

In collaboration with the Andalusia Environmental Council, we are working on remote sensing and atmospheric pollution. Satellite images and LIDAR (Light Detection and Ranging) sensors provide information about the structure of the terrain, including height and vegetation, which determine the use of land. An approach to obtaining the Digital Terrain Model (DTM) by clustering LIDAR data was presented in [46]. A refined model, obtained by building and selecting relevant attributes from LIDAR data, allows the potential use of land to be classified quickly and efficiently, with no need for multi-spectral images [54]. Later, this technique proved useful for analyzing the deterioration of natural spaces in the face of urban expansion in Andalusia [65]. Finally, the results were improved by jointly analyzing satellite images and LIDAR data [57]. The application to several areas in the regions of Andalusia and Galicia is now under review.

The study of the relationship between climatological variables (temperature, humidity, wind direction and speed, etc.) and polluting agents (sulfur, carbon monoxide, ozone) is important for the prediction and description of the global climate system. An evolutionary algorithm [53] has been designed to discover association rules that relate ranges of climate values to polluting agents. The prediction approach has been accepted for publication [17], and an association rule-based method is currently under review [20].
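As an illustration of what such a rule expresses, the sketch below computes the support and confidence of a single interval rule, for example "temperature in [28, 35] implies ozone in [70, 100]". These two standard measures are commonly used to score candidate rules; the function name, the attribute names and the numeric ranges are hypothetical, and the actual fitness function of the evolutionary algorithm is not described here.

```python
def rule_quality(records, antecedent, consequent):
    """Return (support, confidence) of an interval rule over a list of records.

    antecedent and consequent map attribute names to (low, high) ranges.
    """
    def covers(record, condition):
        return all(low <= record[attr] <= high
                   for attr, (low, high) in condition.items())

    n_antecedent = sum(covers(r, antecedent) for r in records)
    n_both = sum(covers(r, antecedent) and covers(r, consequent) for r in records)
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence
```

A search procedure would adjust the interval bounds so that rules with high support and confidence are preferred.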

2.2 Bioinformatics

In the field of bioinformatics, we have devoted great effort to designing or improving the process of management and analysis of biological information with different targets.


Applied Soft Computing

Soft Computing techniques have been applied to conceptual clustering, introducing multiobjective optimization to retrieve meaningful substructures from structural databases [9]. The analysis of gene expression has been addressed by maximizing accuracy, weighting the sensitivity and specificity functions individually by means of multiobjective optimization [13]. The evaluation of the quality of biclusters is an open problem that we have dealt with by means of evolutionary algorithms, in order to include several aspects to be optimized, such as volume and deviation, and also to provide a new measure named virtual error [18, 24].

Biological Validation

We have developed several applications for searching for patterns in biological data and for validating the results biologically. A program for the analysis of evolutionary patterns of repetitive non-coding sequences, named satDNA Analyzer, was presented in [10]. Another tool, designed for the biological validation of clusters or biclusters, was introduced in [14]. This software tool, called CARGENE, is able to characterize sets of genes obtained by clustering or biclustering techniques by analyzing the coherent participation of genes in metabolic pathways stored in the Kyoto Encyclopedia of Genes and Genomes (KEGG), a huge database of biological information. In addition, an exhaustive review of gene association analysis, including approaches and typology of association rules, is presented in [15].

3 Assessment of Project Goals

Coordination

The coordination between both subprojects is evident from the fact that almost two thirds of the publications are jointly authored by members of the two subprojects. The scientific collaboration is close, mainly due to geographical proximity, but also because good advantage is taken of the background knowledge of the members. This strength is visible throughout the project.

Originality of results

The originality and relevance of the results are shown by the external evaluation of the scientific production. The project has generated to date 25 international journal papers (17 already published, 4 in press, and 4 submitted), 33 international conference papers (6 of them in top conferences), 8 national conference papers, 17 papers in national workshops, 9 book chapters and 2 proceedings editions.

The quality of the journal articles is high (17 out of 21 accepted papers are indexed by the Journal Citation Reports, in journals such as IEEE TEC, IEEE TKDE, Bioinformatics or Briefings in Bioinformatics). Out of the 33 papers in international conferences, 6 are in top conferences, such as IEEE ICDM or FUZZ-IEEE.

Table 1 summarizes the scientific production of the two subprojects, and also includes the effect of the appropriate coordination by separating publications in which both subprojects collaborate.


| Authorship | Int’l Journals | Top Conferences | Int’l Conferences | National Conferences | Nat’l WS | Books & Chapters |
|---|---|---|---|---|---|---|
| Subproject 1 | 8 [7,1,0] | 1 | 5 | 1 | 0 | 3 |
| Subproject 2 | 4 [3,0,1] | 0 | 1 | 2 | 1 | 4 |
| Joint | 13 [7,3,3] | 5 | 21 | 5 | 16 | 4 |
| Total | 25 [17,4,4] | 6 | 27 | 8 | 17 | 11 |

Table 1. Summary of publication record. The column Int’l Journals shows four numbers, “A [B,C,D]”: A is the total number of journal publications; B is the number of published articles in journals indexed by JCR; C is the number of published articles in journals not indexed by JCR; and D is the number of papers currently submitted. The impact factors of the 17 journal articles are included in the References section, at the end of this document.

Training

From the beginning both subprojects have incorporated new members: subproject 1 (D. de Acuña, PhD in Physics; C. Rubio, PhD in Computer Science; and M. Martínez, MSc in Computer Science); subproject 2 (2 MSc in Computer Science will be included shortly: G. Asencio and A. Márquez). The PhD dissertations finished during the period are: Fuzzy Object-Relational Databases: model, architecture and applications (Carlos D. Barranco González) and Pattern Sequence Analysis to Forecast Time Series (Francisco Martínez Alvarez), both with the European distinction.

The senior members of the group strongly encourage junior members to visit outstanding research centers. This aspect is carefully fostered because its impact on education and on the quality of results can be considerable. In this sense, several research visits have been made:

C. D. Barranco. With Sven Helmer at Birkbeck College, London University (7/1/2008 – 9/30/2008). The aim was to study the index structures for advanced fuzzy database management systems.

B. Pontes. With Elena Marchiori at the Department of Information and Knowledge Systems (IRIS), Radboud University, Nijmegen, Netherlands (8/31/2008-3/10/2008 and 6/1/2009-7/31/2009). The aim was to analyze in depth the protein-protein interaction networks by means of clustering techniques that include biological information in the building process.

F. Martínez. With Jean-François Boulicaut at the School of Engineering (INSA) in Lyon, France (6/15/2008-9/15/2008). The goal was to study techniques for mining frequent episodes as a preliminary step for predicting time series.

I. Nepomuceno. With Francisco Azuaje at the Laboratory of Cardiovascular Research of the Public Research Centre for Health (CRP-Santé), Luxembourg (8/1/2009-9/30/2009). The goal was to improve the methodology for the analysis of gene expression data and the extraction of gene networks. As a result, a new method has been developed to predict the risk of high/low ejection fraction in patients with heart attack.

J. García. With Luis Gonçalves Seco, at the University of Porto, Portugal (6/1/2009-9/1/2009). The aim was to analyze the state of the art in full-waveform LIDAR data processing, and to evaluate several intelligent techniques to be applied in the environmental realm.

D. S. Rodriguez, Federico Divina and Jesús S. Aguilar. With Takashi Gojobori, at the National Institute of Genetics, Japan (11/15/2009-12/1/2009). The aim was to start up the research collaboration in metagenomics.


Research Collaborations

Apart from the strong collaborations with the researchers with whom members of the group worked during their visits, there are other national and international groups with which we share common interests.

The National Network of Data Mining (TIN2006-27675-E), led by José C. Riquelme, allowed us to collaborate with several members of the network. The Excellence Project MINDAT-Plus, funded by the Junta de Andalucía, helps us work closely with Andalusian research groups. Also, there is a close relationship with Igor Zwir, University of Granada, and the project entitled “Identification of Complex Information in Biology: from data collections to Knowledge based on Self-Organized Maps”.

In the international realm, the members of the project belong to the network CYTED EUREKA IBEROAMERICA-Iberoamerican Network for the discovering of knowledge, 507RT0325. The group is also involved in an International Action, funded by the Ministry of Science and Innovation, which offers a great opportunity to work with the outstanding research carried out at the DDBJ (DNA Data Base of Japan), at the National Institute of Genetics, Mishima, Japan. This research center is one of the three in the world that manage all the DNA information.

Organization Tasks

The groups are active in organization tasks. The following workshops have been (or are being) organized by members of the projects:

J.C. Riquelme, R. Ruiz and D. Rodríguez. Workshop on Decision-making support systems in Software Engineering, held at the Spanish Conference on Software Engineering and Data Bases (JISBD). Two editions: October 2008 (Gijón) and September 2009 (San Sebastián).

D. Rodríguez, R. Ruiz. Workshop on Software Engineering for Large Scale Computing (SELSC), Int. Conference on Computational Science (ICCS), Kraków (Poland), June 2008.

R. Giráldez, A. Troncoso. Evolutionary Algorithms in Bioinformatics, held in VI Spanish Conference on Metaheuristics, Evolutionary and Bio-Inspired Algorithms (MAEB), Málaga, February 2009.

N. Díaz-Díaz and D.S. Rodríguez-Baena. Spanish Workshop on the Knowledge Discovery and Validation in Biomedical Data Bases (EVABIO), held at the Spanish Conference on Artificial Intelligence (CAEPIA). Two editions: Salamanca, Nov 2007, and Sevilla, Nov 2009.

A. Troncoso and M. Arias. Workshop on Mining Non-Conventional Data (MINCODA), held at the Spanish Conference on Artificial Intelligence (CAEPIA), in Sevilla, November 2009.

R. Giráldez, A. Troncoso, R. Ruiz. II Workshop on Intelligent Systems and Data Mining Techniques for Bioinformatics, held at the 9th International Conference on Intelligent Systems Design and Applications (ISDA), Pisa, December 2009.

F. Divina and R. Giráldez. Evolutionary Computation Track in the 25th ACM Symposium on Applied Computing - SAC 2010, Sierre (Switzerland), March 2010.

A. Troncoso and J.C. Riquelme. Symposium on Theory and Applications of Data Mining (TAMIDA), to be held at the Spanish Conference on Computer Science (CEDI), Valencia, September 2010.


Researchers of the groups have participated as Program Committee members of the main Spanish Computer Science conferences: CAEPIA, JISBD, MAEB, and several international conferences: ICML, PPSN, GECCO, HAIS, IDEAL, ISDA, CEC, etc.

The project coordinator, Jesús S. Aguilar, has become Editor-in-Chief of the BioData Mining Journal (BioMed Central Pub) [22]. Group members have collaborated in reviewing tasks for international journals: IEEE Transactions on Systems, Man and Cybernetics, International Journal of Information Technology & Decision Making, Pattern Recognition, Soft Computing, Information Fusion, International Journal of Computer Mathematics, Fuzzy Sets and Systems, Bioinformatics, BMC Bioinformatics, Functional and Integrative Genomics, Journal of Bioinformatics and Computational Biology, Pattern Recognition Letters, and Data Mining and Knowledge Discovery.

Finally, both principal investigators have participated in several Evaluation Commissions of Research Projects and Fellowships: European Commission (FP7), National Commission for Project Evaluation (TIN), Ramon y Cajal, Juan de la Cierva and Torres Quevedo Programs.

Socioeconomic environment

The group is very concerned about its socioeconomic relationships with the regional environment, and is collaborating with the Andalusia Environmental Council, Junta de Andalucía, on topics related to pollution [3, 17]. Also, the group is collaborating with EGMASA, a public company, on the analysis of the variation of land uses in Andalusia [65].

The group is truly concerned about technology transfer, and has been putting great effort into this aspect since 2008. At the moment, we have signed a contract with TB-Solutions for the project “Transporte Inteligente de Mercancía Intermodal (TIMI)”, which is placed in a framework of strategic development for knowledge transfer to industry. This contract (€39,000) is part of a CENIT project.

The research group became a member of REDIAM, the Network of Environmental Information, in June 2008, which is helping to advance the analysis of environmental data.

Project Management

In our humble opinion, the project is achieving its initial objectives. In general, the management of the project has been straightforward, due to the willingness, reliability and performance of the members. Slight adjustments to tasks or schedule are considered as appropriate during the periodic meetings.

4 References

Journals indexed by JCR

1. [IF=3.73] Aguilar-Ruiz JS, Giráldez R, Riquelme JC, “Natural Encoding for Evolutionary Supervised Learning”, IEEE Transactions on Evolutionary Computation, Vol. 11, n. 4, pp. 466-479, 2007.

2. [IF=1.87] Troncoso A, Riquelme JM et al., “Electricity market price forecasting based on weighted nearest neighbors techniques”, IEEE Transactions on Power Systems, Vol. 22, nº 3, pp.1294-1301, 2007.


3. [IF=1.03] Aroba J, Grande JA, Andujar JM, de la Torre ML, Riquelme JC, “Application of fuzzy logic and data mining techniques as tools for qualitative interpretation of acid mine drainage processes”, Environmental Geology. vol. 53, n. 1, pp. 135-145, 2007.

4. [IF=0.61] Ruiz R, "New heuristics in feature selection for high dimensional data", AI Communications. Nº 20, pp. 129 - 131, 2007

5. [IF=0.20] Gama J, Rodrigues P, Aguilar-Ruiz JS, “An overview on learning from data streams”, New Generation Computing, Vol. 25 (1), pp. 1-4, 2007.

6. [IF=1.63] Troncoso A, Riquelme JC, Aguilar-Ruiz JS, Riquelme JM, “Evolutionary Techniques applied to the optimal short-term scheduling of the electrical energy production”, European J. of Operational Research, Vol. 185, nº 3, 2008.

7. [IF=0.43] Gama J, Aguilar-Ruiz JS, Klinkenberg R, “Knowledge discovery from data streams”, Intelligent Data Analysis, Vol. 12 (3), pp. 251-252, 2008.

8. [IF=1.83] Barranco CD, J.R. Campaña, Juan M. Medina, “A B+-tree based indexing technique for fuzzy numerical data”, Fuzzy Sets and Systems, pp. 1431-1449, 2008.

9. [IF=3.74] Romero-Záliz R, Rubio-Escudero C, J.Perren Cobb, F. Herrera, O. Cordón, I. Zwir. “A multi-objective evolutionary conceptual clustering methodology for gene annotation within structural databases: a case of study on the gene ontology database”. IEEE Transactions on Evolutionary Computation 12(6) 679-701, 2008.

10.[IF=4.33] Navajas-Perez R, C Rubio-Escudero, JL Aznarte, M Ruiz-Rejón and MA Garrido-Ramos. “SatDNA analyzer: a computing tool for satellite-dna evolutionary analysis”. Bioinformatics, 23(6), 767-768, 2008

11.[IF=1.00] Barranco CD, Campaña J.R., Medina JM, “Indexing fuzzy numerical data with a B+ tree for fast retrieval using necessity-measured flexible conditions”, International Journal of Uncertainty, Fuzziness and Knowledge-based Systems, pp. 1-23, 2009.

12.[IF=4.15] Solano E, K von Braun, A. Velasco, D. R. Ciardi, R. Gutiérrez, D. L. McElroy, M. López, M. Abajian, M. García-Torres et al., “The LAEX and NASA Portals for CoRot Public Data”, Astronomy & Astrophysics, Vol. 506, pp. 455-463, 2009.

13.[IF=2.37] Romero-Záliz R, C Rubio-Escudero, I Zwir, C del Val. “Optimization of Multi-classifiers for Computational Biology: Application to gene finding and expression”. Theoretical Chemistry Accounts. Vol. 125, Numbers 3-6, pp. 599-611, 2010.

14.[IF=0.67] Aguilar-Ruiz JS, D. Rodríguez-Baena, N. Diaz-Diaz, I. Nepomuceno, “CARGENE: Characterization of Set of Genes based on Metabolic Pathway Analysis”, International Journal of Data Mining and Bioinformatics. Accepted, 2010.

15.[IF=4.62] Alves R, Rodriguez-Baena DS, Aguilar-Ruiz JS, “Gene association analysis: a survey of frequent pattern mining from gene expression data”, Briefings in Bioinformatics. Accepted, in press, 2010.

16.[IF=2.24] Martínez-Álvarez F, A Troncoso, JC Riquelme, J.S. Aguilar-Ruiz, “Energy time series forecasting based on pattern sequence similarity”, IEEE Transactions on Knowledge and Data Engineering (TKDE). Accepted, in press, 2010.

17.[IF=0.62] Martínez-Ballesteros M, F Martínez, A Troncoso, JC Riquelme, “Evolutionary Computation-Based Mining Quantitative Association Rules and its Application to Atmospheric Pollution”, Integrated Computer-Aided Engineering. Accepted, in press, 2010.

18.Divina F, JS Aguilar-Ruiz, B Pontes, R Giraldez, “An effective measure for assessing the quality of biclusters”, IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics. Submitted.

19.García-Gutiérrez J, L Gonçalves Seco, JC Riquelme, “Automatic environmental quality assessment for


20.Martínez-Ballesteros M, F Martínez-Álvarez, A Troncoso, JC Riquelme, “Discovering Quantitative Association Rules over Time Series Based on Evolutionary Algorithms” Journal of Multiple-Valued Logic and Soft Computing. Submitted.

21.Ruiz R, D Rodríguez, Riquelme JC, JS Aguilar-Ruiz "Searching for Rules to detect Defective Modules". Information Sciences. Submitted.

Other Journals

22.Aguilar-Ruiz JS, JH Moore, MD Ritchie. Filling the gap between biology and computer science, BioData Mining, 1:1, 2008.

23.Ruiz R, Aguilar-Ruiz JS, Riquelme JC, “Best agglomerative ranked subset for feature selection”, Journal Machine Learning Research Workshop and Conference Proceedings, Vol. 4: New challenges for feature selection in data mining and knowledge discovery, pp. 148-162, 2008.

24.Pontes B., F. Divina, R. Giraldez, J. Aguilar-Ruiz, “Improved Biclustering on Expression Data through Overlapping Control”. International Journal of Intelligent Computing and Cybernetics, Vol. 2 nº 3, pp. 477-493, 2009.

25.Riquelme J.C., R. Ruiz, D. Rodríguez, J. S. Aguilar-Ruiz, “Finding Defective Software Modules by Means of Data Mining Techniques”. IEEE Latin America Transactions, Volume: 7, Issue: 3, pp. 377-382, 2009.

Top International Conferences

26.Divina F, Aguilar-Ruiz JS, “A Multi-Objective Approach to Discover Biclusters in Microarray Data”, Proc. of GECCO ’07, pp. 385- 392, 2007.

27.Martínez-Álvarez F, Troncoso A, Riquelme JC and Aguilar-Ruiz JS, “Detection of Microcalcifications in Mammographies Based on Linear Pixel Prediction and Support-Vector Machines”. IEEE Int. Conf. on Computer-Based Medical Systems, pp. 141-147, 2007.

28.Giraldez R, F Divina, B Pontes, JS Aguilar-Ruiz, “Evolutionary Search of Biclusters by Minimal Intrafluctuation”, Proc. IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE), pp. 1751-1756, 2007.

29.Rodríguez D, Ruiz R, Cuadrado-Gallego J, Aguilar-Ruiz J, "Detecting fault modules applying feature selection to classifiers", IEEE International Conference on Information Reuse and Integration (IRI), pp. 667-672, 2007.

30.Martínez-Álvarez F, Troncoso A, Riquelme JC, Aguilar-Ruiz JS, “LBF: A Labeled-Based Forecasting Algorithm and its Application to Electricity Price Time Series”, Proceedings of the Eighth IEEE International Conference on Data Mining, ICDM, pp. 453-461, 2008.

31.Martínez-Álvarez F., A.Troncoso, J.C. Riquelme, “Improving time series forecasting by discovering frequent episodes in sequences”, International Symposium on Intelligent Data Analysis (IDA) LNCS, Vol. 5772, pp. 357-368, 2009.

International Conferences

32.Martínez-Álvarez F, Troncoso A, Riquelme JC and Riquelme JM, “Discovering Patterns in Electricity Price Using Clustering Techniques”. Int. Conf. on Renewable Energy and Power Quality (ICREPQ), pp. 67-68, 2007.

33.Martínez-Álvarez F, Troncoso A, Riquelme JC and Riquelme JM, “Partitioning-clustering techniques applied to electricity price time series”. Proc. of 8th IC Intell. Data Eng. and Autom. Learning (IDEAL), LNCS. 4881, pp. 990-999, Springer, 2007.

34.Nepomuceno JA, Troncoso-Lora A, Aguilar-Ruiz J, Garcia-Gutiérrez J, "Biclusters Evaluation Based on Shifting and Scalling Patterns", Proc. of 8th Int. Conf. on Intell. Data Eng. and Autom. Learning (IDEAL), LNCS. Vol. 4881, pp. 840-849, 2007.


35.Nepomuceno-Chamorro I, JS Aguilar-Ruiz, N. Díaz-Díaz, DS Rodríguez-Baena, J. García, "A Deterministic Model to Infer Gene Networks from Microarray Data", Proc. of 8th Int. Conf. on Intelligent Data Engineering and Automated Learning (IDEAL), LNCS. Vol. 4881, pp. 850-859, Springer-Verlag, 2007.

36.Rodríguez D, Aguilar J, Díaz N, Nepomuceno I "Discovering Alpha-Patterns In Gene Expression Data", Proc. of 8th Int. Conf. on Intell. Data Engin. and Automated Learning (IDEAL), LNCS Vol. 4881, pp. 831-839, Springer-Verlag, 2007.

37.Pontes B, F Divina, R Giráldez, JS. Aguilar-Ruiz, “Virtual Error: A New Measure for Evolutionary Biclustering”, Proc. 5th Eur. Conf. on Evol. Comp. Mach. Learning and Data Mining in Bioinformatics (EvoBio), LNCS vol. 4447, pp. 217-226, 2007.

38.Rodríguez D, Ruiz R, Cuadrado-Gallego J, Aguilar-Ruiz J, Garre M, "Attribute selection in software engineering datasets for detecting fault modules", 33rd EUROMICRO Conf. on Soft. Eng. and Advanced Appl., pp. 418 – 424, 2007.

39.Pontes B, F Divina, R Giráldez, JS Aguilar-Ruiz, “A novel approach for avoiding overlapping among biclusters in expression data”, 8th International Conference on Hybrid Intelligent Systems (HIS), pp. 813-818, 2008.

40.Rubio-Escudero C, Martinez-Alvarez F, Romero-Zaliz R, Zwir, I,"Classification of Gene Expression Profiles: Comparison of K-means and Expectation Maximization Algorithms," Eighth International Conference on Hybrid Intelligent Systems (HIS), pp. 831-836, 2008.

41.Nepomuceno I, Aguilar-Ruiz JS, “DMRT to Infer gene-gene interactions from Microarray Data” European Science Foundation Conference in Biomedicine. System Biology. Poster. 2008.

42.Barranco C.D., J. R. Campaña, J.M. Medina, “A B+-tree based indexing technique for necessity measured flexible conditions on fuzzy numerical data”, IPMU`08 : Information Processing and Management of Uncertainty in Knowledge-based Systems, pp. 1717-1724, 2008.

43.Armañanzas R, Y. Saeys, I. Inza, M. García-Torres, Y. Van de Peer, C. Bielza and P. Larrañaga. Mass spectrometry data analysis: it's all in the preprocessing. Benelux Bioinformatics Conference (BBC), pp. 92, 2008.

44.Ruiz R, Riquelme JC, Aguilar-Ruiz JS, “Best agglomerative ranked subset for feature selection”, WS. Feature Selection in Data Mining ECML-PKDD, pp. 124-135, 2008.

45.Troncoso A, F. Martinez-Alvarez, JC Riquelme, JS Aguilar-Ruiz, “Advanced Techniques Applied To Forecast Energy Time Series”, CLAIO XIV Latin Ibero-American Congress on Operations Research. I EUREKA Workshop on Knowledge Discovery, Knowledge Management and Decision Making, 2008.

46.García-Gutiérrez J, F Martínez-Álvarez, D Laguna-Ruiz, JC Riquelme, “Remote Mining: from clustering to DTM”, Proc. 8th International Conference on LiDAR Applications in Forest Assessment and Inventory, SilviLaser ’08, pp. 389 – 397, 2008.

47.Barranco C.D., Sven Helmer, “Increasing the performance of fuzzy retrieval using impact ordering”, Proc. 13th World Congress Int. Fuzzy Sys. Assoc. and 6th Conf. of the Eur. Soc. for Fuzzy Logic and Tech. (IFSA-EUSFLAT), pp. 957-962, 2009.

48.Barranco CD, Jesús R. Campaña, Juan M. Medina, “Flexible retrieval of x-ray images based on shape descriptors using a fuzzy object-relational database”, Proc. 13th World Congress of the Int Fuzzy Sys. Assoc. and 6th Conf. of the Eur. Soc. for Fuzzy Logic and Tech. (IFSA-EUSFLAT), pp. 903-908, 2009.

49.Rodríguez D., J.C. Riquelme, R. Ruiz, J.S. Aguilar-Ruiz, “Searching for Rules to find Defective Modules in Unbalanced Data Sets”, 1st International Symposium on Search Based Software Engineering, pp. 89-92, 2009.


50.Nepomuceno JA, A. Troncoso, J Aguilar-Ruiz, “A Hybrid Metaheuristic for Biclustering based on Scatter Search and Genetics Algorithms”, Pattern Recognition in Bioinformatics (PRIB), Lecture Notes in Bioinformatics 5780, pp. 199-210, 2009.

51.Nepomuceno JA, A. Troncoso, J Aguilar-Ruiz, “An Overlapping Control–Biclustering Algorithm from Gene Expression Data”. 9th International Conference on Intelligent Systems Design and Applications, (ISDA) pp. 1239–1244, 2009.

52.Nepomuceno I, F. Azuaje, P.V. Nazarov, A. Muller, Y. Devaux, L. Vallar, J.S. Aguilar, D.R. Wagner, “Supervised prediction of heart failure through transcriptional association networks”, Benelux Bioinformatics Conference, BBC ’09.

53.Martínez-Ballesteros M, F Martínez-Álvarez, A. Troncoso, J.C. Riquelme, “Quantitative association rules applied to climatological time series forecasting”, International Conference on Intelligent Data Engineering and Automated Learning (IDEAL'09), Lecture Notes in Computer Science, Vol. 5788, pp. 284-291, 2009.

54.García-Gutiérrez J, Luis Gonçalves Seco, J.C. Riquelme, “Decision trees on LIDAR to classify land uses and covers”, ISPRS Conf. LASERSCANNING, pp. 323 – 328, 2009.

55.Nepomuceno JA, A. Troncoso, J.S. Aguilar-Ruiz, “Evolutionary Metaheuristic for Biclustering based on Linear Correlations among Genes”. Aceptado en 25th ACM Symposium On Applied Computing (SAC ). Marzo 2010.

56.Nepomuceno J.A., A. Troncoso, J.S. Aguilar-Ruiz, “Correlation-Based Scatter Search for Discovering Biclusters from Gene Expression Data”. Accepted at the 8th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EVOBIO), April 2010.

57.García-Gutiérrez J., F. Martínez-Álvarez, J.C. Riquelme, “Using remote data mining on LIDAR and imagery fusion data to develop land cover maps”. 23rd IC on Industrial, Eng. & Applications of Applied Intelligent Systems IEA-AIE 2010.

58.Montero M. A., R. Ruiz, M. García-Torres and L.M. Sarro, "Feature selection applied to data from the Sloan Digital Sky Survey". 23rd IC on Industrial, Eng. & Applications of App. Intell. Systems. IEA-AIE 2010.

National Conferences

59.Hernández-Arauzo A., M. García-Torres, A. Bahamonde, “Ranking attributes using learning of preferences by means of SVM.”, XII Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA), 2007.

60.Ruiz R, Riquelme JC, Aguilar-Ruiz JS, "Mejor subconjunto aglomerativo ordenado para selección de atributos", V Congreso español sobre metaheurística, algoritmos evolutivos y bioinspirados (MAEB). pp. 501 - 508, 2007.

61.Díaz-Díaz N, R Blanquero, E Carrizosa. “Aprendizaje Semisupervisado basado en VNS”, V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), 2007.

62.Divina F., F. Martínez, J. S. Aguilar-Ruiz, “Método basado en algoritmos genéticos para encontrar biclusters significativos”, V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp. 639-646, 2007.

63.Pontes B, R Giráldez, JS Aguilar-Ruiz, “Spade: Algoritmo Evolutivo para Descubrir Patrones de Desplazamiento en Microarrays”, V Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp.633-637, 2007.

64.Riquelme JC, Ruiz R, Rodríguez D, Aguilar-Ruiz JS, “Identificación de Fallos en Módulos Software Utilizando Técnicas de Minería de Datos”, XIII Jornadas de Ingeniería del Software y Bases de Datos (JISBD 2008), pp. 195-204, 2008.

65.García-Gutiérrez J, F Martínez-Álvarez, JC Riquelme, “Aprendizaje automático sobre datos LIDAR para monitorizar el avance urbano en suelo natural”, Actas de la XIII Conf. de la Asoc. Española para la Int. Artificial (CAEPIA), pp. 581-590, 2009.

66.Martínez-Ballesteros M, V. M. Rivas, “EvFuzzySystem: Evolución de Sistemas Difusos para Problemas de Regresión Multi-Dimensionales”, XV Congreso Español sobre tecnologías y Lógica Fuzzy (ESTYLF), pp.109-114, 2010.

National Workshops

67.Rodríguez Baena DS, JS. Aguilar-Ruiz, J. García Gutiérrez, "Análisis de datos de expresión genética para la obtención de patrones alpha", V Taller Nacional de Minería de Datos y Aprendizaje (TAMIDA), pp. 265 - 272, 2007

68.Díaz-Díaz N, JS Aguilar Ruiz, Jorge García Gutiérrez, "Interclus: Clustering basado en la vecindad de la interacción gen-gen", V Taller Nacional de Minería de Datos y Aprendizaje (TAMIDA), pp. 121 - 130 , 2007

69.Martínez-Álvarez F, Troncoso A, Riquelme JC and Riquelme JM. “Técnicas de clustering particionales aplicadas a series temporales de precio del mercado eléctrico”. V Taller Nacional de Minería de Datos y Aprendizaje (TAMIDA), pp. 293-302, 2007.

70.Ruiz R, Aguilar-Ruiz JS, Riquelme J, "Selección de atributos: una revisión", V Taller Nacional de Minería de Datos y Aprendizaje (TAMIDA), pp. 75-93, 2007.

71.Martínez-Álvarez F, Troncoso A, Riquelme JC and Riquelme JM. “Técnicas basadas en vecinos cercanos para la predicción de precios de la energía en el mercado eléctrico”. Simposio de Inteligencia Computacional (SICO), pp. 425-432, 2007.

72.Pontes B, R Giráldez, F Divina, F Martínez-Álvarez, “Evaluación de biclusters en un entorno evolutivo”, Actas del V Taller Nacional de Minería de Datos y Aprendizaje (TAMIDA - CEDI), pp. 1-10, 2007.

73.Pontes B, R Giráldez, F Divina, F Martínez-Álvarez, “Evaluación de biclusters mediante intra-fluctuaciones mínimas: un enfoque multi-objetivo”, Actas de las I Jornadas de Alg. Evolutivos y Metaheurísticas (JAEM - CEDI), pp. 121-128, 2007.

74.Martínez F, Rodríguez DS, Troncoso A and García J. “Técnicas de aprendizaje no supervisado aplicadas a datos de expresión genéticas”. Workshop sobre Extracción y Validación de Conocimiento en BBDD Biomédicas (EvaBio), pp. 55-63, 2007.

75.Nepomuceno JA, Troncoso A, Aguilar-Ruiz J, "Evaluación de Biclusters basada en patrones de desplazamiento y escalado", Proc. de Extracción y Validación de Conocimiento en BBDD Biomédicas (EvaBio - CAEPIA 2007) pp. 85-94, 2007

76.García J, N Díaz-Díaz, DS Rodríguez Baena, F Martínez Álvarez, "Software y técnicas de validación de conocimiento en bioinformática", I Workshop sobre Extracción y Validación de conocimiento en bases de datos biomédicas (EvaBio), pp. 75-84, 2007

77.Nepomuceno I, JS Aguilar-Ruiz, N Díaz-Díaz , “Gene Networks from Data Mining for Microarray Data”, I Workshop Español sobre Extracción y Validación de Conocimiento en Bases de Datos Biomédicas (EvaBio), pp. 35-44, 2007.

78.Chaparro A, B Pontes, R Giráldez, “Estudio Comparativo de Medidas de Calidad para Biclusters en Microarrays”, Actas I Workshop Sobre Extracción y Validación de Conocimiento en Bases de Datos Biomédicas (EvaBio - CAEPIA), pp. 13-24, 2007.

79.Riquelme JC, R Ruiz, D Rodríguez, J Moreno, “Finding Defective Modules from Highly Unbalanced Datasets”, Actas del 8º Taller sobre el Apoyo a la Decisión en Ingeniería del Software (ADIS-JISBD), pp. 67-74, 2008

80.Moreno J., D. Rodríguez, M.A. Sicilia, J.C. Riquelme and R. Ruiz, “SMOTE-I: mejora del algoritmo SMOTE para balanceo de clases minoritarias”. Actas Talleres de las Jornadas de Ingeniería del Software y Bases de Datos, Vol. 3, No. 1, pp.73-80, 2009.

81.Nepomuceno JA, A. Troncoso, J Aguilar-Ruiz, “Un algoritmo de Biclustering basado en Búsqueda Dispersa y Algoritmos Genéticos”. II Workshop sobre Extracción y Validación de conocimiento en BBDD biomédicas (EVABIO-09), pp. 20–29, 2009.

82.Martínez-Álvarez F., A. Troncoso, J.C. Riquelme, “Reconocimiento de patrones aplicado a la predicción de series temporales”, Workshop on Mining of Non-Conventional Data (MINCODA'09), in CAEPIA'09, pp. 66-73.

83.Martínez-Ballesteros M., F. Martínez-Álvarez, A. Troncoso, J.C. Riquelme, “Descubriendo reglas de asociación numéricas en series temporales”, Workshop on Mining of Non-Conventional Data (MINCODA'09), pp. 16-24.

Proceedings Editor

84.Ferrer-Troyano FJ, A. Troncoso, JC Riquelme (Eds), Actas del V Taller de Minería de Datos y Aprendizaje (TAMIDA 07), 353 pages, in II Congreso Español de Informática (CEDI), Thomson.

85.N Díaz-Díaz and S Rodríguez-Baena (Eds), Actas del I Workshop Español de Extracción y Validación de Conocimiento en Bases de Datos Biomédicas (Evabio'07), 94 pages, in CAEPIA 07, Salamanca, November 2007.

Book Chapters

86.Dolado JJ, D. Rodríguez, JC Riquelme, FJ Ferrer-Troyano, JJ Cuadrado, “A Two Stage Zone Regression Method for Global Characterization of a Project Database”, in Advances in Machine Learning Applications in Software Engineering (Zhang D. and Tsai J. Eds), 2007.

87.Ruiz R, Riquelme JC, Aguilar-Ruiz JS, “Efficient Incremental-Ranked Feature Selection in Massive Data”, in Computational Methods of Feature Selection (Huan Liu, Hiroshi Motoda Eds) Chapman & Hall/CRC, 2007.

88.Ferrer F, Giráldez R, Ruiz R, "Técnicas cuantitativas para la gestión en la ingeniería del software", in Técnicas Cuantitativas para la Gestión en la Ingeniería del Software, pp. 245 - 266, Netbiblo, 2007.

89.Aroba J, Ramos I, Riquelme JC, “Estimación y toma de decisiones mediante técnicas tradicionales y lógica borrosa”, in Técnicas Cuantitativas para la Gestión en la Ingeniería del Software, pp. 223-243. Netbiblo, 2007.

90.Álvarez JL, Mata J, Riquelme JC, “Estimación y toma de decisiones mediante algoritmos evolutivos”, in Técnicas Cuantitativas para la Gestión en la Ingeniería del Software, pp. 267-290. Netbiblo, 2007.

91.Barranco CD, Jesús R. Campaña, Juan M. Medina, “Towards a fuzzy object-relational database model”, Handbook of Research on Fuzzy Information Processing in Databases, pp. 435-461, 2008.

92.Harari O, C Rubio-Escudero, P Traverso, M Santos, I Zwir. “Learning Robust Dynamic Networks in Prokaryotes by Gene Expression Networks Iterative Explorer (GENIE)”. Studies in Computational Intelligence. Springer-Verlag, 2008.

93.Barranco CD, JR Campaña, JM Medina, “A real estate management system based on soft computing”, Applications of Soft Computing, pp. 31-40, 2009.

94.Vogt P, Divina F. “Social symbol grounding and language evolution”. In: T. Belpaeme, S. J. Cowley and K. F. MacDorman (Eds.) Symbol Grounding. Benjamins Current Topics, Volume 21, pp. 33-53, 2009.