Automatic Plant Identification: Is Shape the Key Feature?

(1)

Procedia Computer Science 76 ( 2015 ) 436 – 442

Peer-review under responsibility of organizing committee of the 2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS 2015) doi: 10.1016/j.procs.2015.12.287

ScienceDirect

2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS 2015)

Automatic Plant Identification: Is Shape the Key Feature?

Nursuriati Jamil*, Nuril Aslina Che Hussin, Sharifalillah Nordin, Khalil Awang

Digital Image, Audio and Speech Technology Group (DIAST), Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA,

40450 Shah Alam, Selangor, Malaysia

Abstract

Shape is the most popular feature used in plant leaf identification, be it manual or automatic plant identification. In this paper, a study is conducted to investigate the most contributing features among three low-level features for plant leaf identification. Intra- and inter-class identification are conducted using 455 herbal medicinal plant leaves, with 70% allocated for training and 30% for testing dataset. Shape feature is extracted using Scale Invariant Feature Transform (SIFT); colour is represented using colour moments; and Segmentation-Based Fractal Texture Analysis (SFTA) is utilized to describe texture feature. Intra-class analysis showed that fusion of texture and shape surpassed fusion of texture, shape and colour. Single texture feature identification also achieved highest identification rate compared to identification using colour or shape. Inter-class analysis further support texture to be the discriminative feature among the low-level features. Results demonstrate that single texture feature outperformed colour or shape feature achieving 92% identification rate. Furthermore, fusion of all three features accomplished 94% identification rate. © 2015 Nursuriati Jamil, Nuril Aslina Che Hussin, Sharifalillah Nordin, Khalil Awang. Published by Elsevier B.V.

Peer-review under responsibility of organizing committee of the 2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS 2015).

Keywords:plant identification; feature extraction; SIFT; colour moment; fractal texture analysis

1. Introduction

Leaf-based plant identification is favourable against molecular biology techniques as it does not require the expertise of a botanist1_{. Leaves are easier accessible and abundance compared to other plant morphological}

structures such as flowers, barks or fruits. In almost all automatic leaf plant identification, shape of the leaves is the most common feature used for identification2_{as it is claimed to be the most discriminative feature of a plant’s leaf}3_.

* Nursuriati Jamil. Tel.: +60-12-213-1164; fax: +603-5543-5501. E-mail address:[email protected]

Peer-review under responsibility of organizing committee of the 2015 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS 2015)

(2)

As early as 1912, botanist such as Schneider4_{has been using leaf shape as taxonomic keys for text-based plant}

identification. Different species of plants have distinct shape characteristic, thus the shape differences can be obvious even for non-expert. One of the earliest automatic leaf identification that utilized leaf shape parameters to differentiate weed species is done by Petry and Kuhbauch5_{in 1989. Recent similar work such as}6,7,8_{also used leaf’s}

shape as one of the feature for plant identification. Leaf’s shape, however is subject to deformations caused by disease, insects or even human and mechanical damage. Therefore, colour and texture features are further investigated to improve leaf-based plant identification.

Colour is the most obvious morphological feature of a leaf. In spite of that, colour is the least popular low-level feature for plant identification because it is considerably unstable due to its seasonal colour changes. In 9,10,13_{, colour}

moments in RGB colour space comprising mean, standard deviation, skewness and kurtosis are adopted to represent colour features of plant leaves. This colour feature is known for its low dimension and low computational complexity11_{, thus making it convenient for real-time applications. Texture is another feature that can be used in}

plant identification to describe the vein structure or leaf’s surface. Similar to colour feature, texture is considered as an additional feature to better describe properties of the leaves. Kadir et al.9 _{extracted angular second moment,}

contrast, inverse different moment, entropy and correlation from the gray-level occurrence matrix of the leaf. More recent studies utilized Radial Base function12_{, Gabor filters}8_{, Haar wavelet}13_{and multifractal detrended fluctuation}

analysis14_{to improve identification accuracy. A summary of the common features used in selected leaf-based plant}

identification for the previous five years is tabulated in Table 1.

Table 1. Recent work of plant identification.

Authors Co lou r Texture Shap e Feature Descriptors Bama et al., 2011 ¥ ¥ Scale-invariant feature transform,

Log-Gabor filter

Kadir et al., 2011 ¥ ¥ ¥ Colour moment, Zernike moments, Gray-level occurrence matrix Arora et al., 2012 ¥ Complex network, tooth and

morphological features. Kumar et al., 2012 ¥ Curvature-based shape features Chaki et al., 2014 ¥ ¥ Curvelet, Gabor filter, GLCM Ghasab et al., 2015 ¥ ¥ ¥ Geometric features, colour moment,

GLCM

Zhao et al., 2015 ¥ ¥ ¥ Pyramid histograms of oriented gradients, colour moment, Haar wavelet

Fang et al., 2015 ¥ Multifractal detrended fluctuation analysis

As can be seen in Table 1, shape has always been the de facto feature for leaf-based plant identification. Colour and texture features are seen as contributing features to improve the identification accuracy. Other than the most recent work by Fang et al.14_{, previous work has always considered shape to be the upmost feature for plant}

identification. Up to the time of this writing, no study has been done to identify the most contributing features among these three low-level features. Therefore, the aim of this paper is to revisit the performance of the low-level features for the purpose of plant leaf identification.

2. Data acquisition and pre-processing

The plant leaf images for this study are herbal medicinal plants of Malaysia. The leaves are plucked from the plants in their natural habitat, placed on a uniform background in an open space and photographed using a DSLR camera. These images are referred as scan-like photos and few examples are illustrated in Fig. 1. They are acquired

(3)

within a period of 6 months because the herbal plants are not easily accessible. Seven common herb species are collected, comprising 5 different leaves from each species. All the 35 leaves are carefully selected from matured, disease-free and complete leaves as deformations may cause other challenge which is not the main focus of this paper. Other metadata associated with the images are date and time, species scientific and Malay common name, general weather condition and locality.

Chromolaena odorata Ficus Ampelas Piper sarmentosum Ardisia crispa

Fig. 1. Scan-like leaf photos and its scientific names.

All the leaf images are resized to 600 x 500 resolution and unsharp masking are applied to them to enhance the edges. The next stage is to produce geometrically transformed images for sufficient training and testing datasets. Each original image is subjected to 12 geometrical transformations as listed in Table 2 to create various images of different orientation and sizes. After geometrical transformations, each original leaf produced an additional 12 leaves. Thus, each species have a total collection of 65 leaf images (13 transformations x 5 leaves) to be used for intra-class identification. On the other hand, a total of 455 leaves (13 transformation x 7 species x 5 leaves) are collected to be used for inter-class identification. Both inter- and intra-class identification allocated 70% of the dataset for training and 30% for testing.

Table 2. Twelve geometrical transformations.

No Description 1. Image is enlarged by 1.2x

2. Image is enlarged by 1.2x, then rotated at 450

3. Image is enlarged by 1.4x

4. Image is enlarged by 1.4x, then rotated at 900

5. Image is rotated at 100

9. Image is reduced by 0.5x

10. Image is reduced by 0.5x, then rotated at 100

11. Image is reduced by 0.75x

12. Image is reduced by 0.75x, then rotated at 200

3. Methodology

Prior to plant leaf identification, two main processes involved are feature extraction and training of Adaboost classifier.

3.1. Feature extraction

The low-level features extracted from the plant leaves for the purpose of identification are colour, texture and shape. In this paper, colour moment is used to represent the leaf’s colour as it is robust to scaling and rotation9_.

Unlike previous work that implemented colour moment in RGB colour space9_{, LUV}15_{or L*a*b}17_{, we opted for HSV}

(4)

by the mean captured by first-order moment (ȝc); standard deviation (ıc)using second-order moment; and third root

of the skewness derived (șc)from third-order moment.

Texture of the leaf is represented using Segmentation-Based Fractal Texture Analysis (SFTA) descriptor as this method is known to achieve higher precision and accuracy than Gabor and Haralick filter banks for content-based image retrieval and image classification18_{. The two steps approach of SFTA is discussed as follows:}

Step 1: Two-threshold binary decomposition (TTBD). TTBD decomposes the grayscale leaf image into a set of binary images. It uses the graylevel distribution information to compute a set of threshold values, nt that is

defined by the user. In this paper, nt = 8 is chosen. Threshold calculation is done by employing multi-level Otsu

algorithm19_{to minimize the input image intra-class variance. The set of binary images are obtained from the}

grayscale image, Ib(x,y) by applying two threshold segmentation defined as:

(1)

where tl and tu denote lower and upper threshold values, respectively.

Step 2: SFTA extraction algorithm. In this step, SFTA feature vector is constructed consisting of the binary images’ size, mean graylevel and boundaries’ fractal dimension. The fractal dimension, D is computed from each border image using the box counting algorithm. Firstly, the image is divided into a grid composed of squares of size

ȯ x ȯ. The next step consists of counting the number N (ȯ) vs log ȯ-1 curve. Finally, this curve is approximated by a straight line using line fitting method. The fractal dimension, D corresponds to the slope of this line.

The shape feature is extracted by using Scale Invariant Feature Transform (SIFT). SIFT is a method that is robust to geometrical transformation20_{. It is well localized in both the spatial and frequency domains making it also robust}

to noise and illumination16_{. SIFT is accomplished in a 4-steps process and are described in detail in our previous}

paper21_.

3.2. Training of Adaboost classifier

Training is important to build the classification function to identify plant species. Since intra-class and inter-class identification are done, training datasets are separated into two groups. The intra-class training is done using 70% of 65 images, amounting to 45 images. While the inter-class training dataset comprised 70% of the total dataset of 455 leaf images, thus 319 images are selected. Training is done independently for intra- and inter-class identifications. The training of AdaBoost classifier ensued in the steps enlisted as follows:

Step 1: Initialization. All the colour, shape and texture feature vectors extracted from the training datasets are fed into the boosting process. (i.e. 1: true species identification; 0: false species identification).

Step 2. Training and control datasets. Divide data into two sets that is training and control set. Control set is used to evaluate the training set and is selected based on the performance of the previous weak learners.

Step 3.Construction of learners. A decision tree is constructed for the weak learner and is used for boosting. Combination of two or more features become weak learner and multiple weak learners can be combined to generate a more accurate ensemble, known as strong learner.

Step 4.(Boosting of weak learners) The weak learner is boosted using Gentle AdaBoost algorithm to produce

learner and weight.

Step 5. (Classification) Calculate classifier output based on learner and weight values. Step 6. (Determine iteration) Calculate error by comparing with control set.

4. Plant leaf identification experiments

The main aim of this paper is to compare the identification performance of three low-level features individually and at feature fusion level. Seven experiments are conducted to measure the plant leaf identification accuracy and suggest the feature that gives the highest discriminant factor.

All the plant leaf identification experiments are conducted at intra- and inter-class level. Intra-class analysis identifies a leaf against geometrical transformed images within the same class of species. On the other hand,

(5)

class analysis identified leaves among different class of species. At intra-class level, 30% of the species datasets comprising 20 images are used as testing data. On the other hand, inter-class experiments utilized 30% of the total dataset amounting to 136 leaf images for testing. Evaluation of the plant identification is measured using identification rate:

(2)

5. Results and discussions

Results of the plant leaf identification are analysed at intra- and interclass level. Table 3 shows the intra-class identification rate based on species.

Table 3. Intra-class performance comparison of identification rate (%).

Feature(s) Used S1 S2 S3 S4 S5 S6 S7 Average

Shape 65 70 70 70 90 90 85 77.14 Colour 65 70 85 65 75 70 70 71.43 Texture 75 70 90 95 95 85 70 82.86 Colour+Shape 75 70 80 85 95 80 80 80.71 Colour+Texture 70 75 80 90 90 80 85 81.43 Texture+Shape 85 95 95 100 85 75 95 90 Colour+Texture+Shape 75 80 95 100 95 80 80 86.43 Average 72.9 75.7 85 86.4 89.2 80 80.7

From Table 3, it can be seen that fusion of texture and shape achieved the highest identification rate at 90%; even higher than fusion of all shape, texture and colour features. Is shape the most contributing feature? When we analyse the single feature’s performance, texture has the highest identification rate at 82.86% compared to shape and colour features. Fusion of colour and texture features is also higher than fusion of colour and shape, implying that texture is the key feature in plant leaf identification at intra-class level.

When analysing at species level, species 1 and 2 performed the worst. The single feature identification rate does not distinguish any feature to be discriminative at intra-level for these two species. However, the result of shape and texture features significantly improves the identification rate. In this case, even though texture has the highest rate compared to shape and colour, texture alone is not sufficient to differentiate leaves at intra-class level as these two species have similar low-level properties. Fig. 2 illustrates a sample of leaf from species 1 and 2, respectively.

Species 1 Species 2 Fig. 2. Species of similar low-level characteristics.

Table 4 presents the inter-class plant leaf identification rate. Results of inter-class identification strongly suggest that texture is the main contributing feature with 92% rate for single feature identification and the highest rate achieved by fusion of colour texture and shape features. For two feature fusions, when texture feature is excluded, the identification rates significantly drop from 93.4% (i.e. colour+texture fusion) and 91.2% (i.e. texture+shape fusion) to 86% (i.e. colour+shape fusion). Is shape feature important for plant leaf identification? Based on the

(6)

performance of single feature, shape’s identification rate is marginally equal with colour feature. However, when shape is fused with colour and texture, the identification rate performed the highest at 94%.

Table 3. Intra-class performance comparison of identification rate (%).

Feature(s) Used Average Identification Rate

Shape 86.8% Colour 86% Texture 92% Colour+ Shape 86% Colour+Texture 93.4% Texture+Shape 91.2% Colour+Texture+Shape 94%

6. Conclusion and future work

While shape is perceived to be the key feature for automatic plant leaf identification, this preliminary study revealed that texture is the feature that highly influenced the identification rate. However, further investigation need to be done to positively confirmed the claim. Experiments of this paper are limited to an initial small collection of Malaysia medicinal herbal plants as local requirement persists. The proposed methodology should also be conducted on any of the publicly available image dataset such as Flavia Dataset, the SmithSonian Leaf Dataset, Swedish Leaf Dataset, or the ImageCLEF dataset. Other feature extraction method for the low-level features should also be tested to verify consistencies of results. In this paper, a simple identification rate is used for performance evaluation. Further evaluation measurements such as recall-precision curve, F-measure or accuracy should be done before reaching conclusive findings.

Acknowledgements

The authors gratefully acknowledged the funding support received from Universiti Teknologi MARA (UiTM), Malaysia and the Ministry of Higher Education, Malaysia under Research Acculturation Grant Scheme (No. 600-RMI/RAGS 5/3 (18/2012)).

References

1. Arora A, Gupta A, Bagmar N, Mishra S, Bhattacharya A. A plant identification system using shape and morphological features on segmented leaflets. Team IITK, CLEF 2012 Working Notes. In CLEF (Online Working Notes/Labs/Workshop).

2. Babatunde O, Armstrong L, Diepeveen D, Jinsong L. A survey of computer-based vision systems for automatic identification of plant species. Journal of Agricultural Informatics 2015; 6(1):1-10.

3. 3. Cope JS, Corney D, Clark JY, Remagnino P, Wilkin P. Plant species identification using digital morphometrics: A review, Expert Systems with Applications 2012; 39(8): 7562-7573.

4. 4. Schneider C K, Handbuch der Laubholz-kunde. Germany:Jena; 1912.

5. 5. Petry W, Kuhbauch W. Automatisierte unterscheidung von unkrauten nach formparametern mit hilfe der quantitativen bild analyse. J. Agronomy Crop Sci 1989; 163:345–351.

6. 6. Zhao ZQ, Ma LH, Cheung YM, Wu X, Tang Y, Chen CLP. ApLeaf: An efficient android-based plant leaf identification system.

Neurocomputing 2015;151:1112-1119.

7. 7. Kumar N, Belhumeur P, Biswas A, Jacobs D, Kress W, Lopez I, Soares J. Leafsnap: a computer vision system for automatic plant species identification. In Proceedings of the 12th European Conference on Computer Vision, (ECCV'12), Springer Berlin Heidelberg; 2012. p. 502-516.

8. 8. Chaki J, Parekh R, Bhattacharya R. Plant leaf recognition using texture and shape features with neural classifiers. Pattern Recognition Letters 2015; 58: 61-68.

9. 9. Kadir A, Nugroho LE, Susanto A, Santosa PI. Experiments of Zernike moments for leaf identification. Journal of Theoretical and Applied Information Technology 2012; 41(1):83-93.

10. 10. Ghasab MAJ, Khamis S, Mohammad F, Fariman HJ. Feature decision-making ant colony optimization system for an automated recognition of plant species. Expert Systems with Applications 2015; 42(5):2361-2370.

11. 11. Stricker MA, Orengo M. Similarity of color images. IS&T/SPIE's Symposium on Electronic Imaging: Science & Technology; 1995. p. 381–392.

(7)

International Journal of Computer Science & Information Technology 2011; 3(4):93-100.

13. 13. Zhao ZQ, Ma LH, Cheung YM, Wu X, Tang Y, Chen CLP. AppLeaf: An efficient android-based plant leaf identification system. Neurocomputing 2015; 151(3):1112-1119.

14. 14. Wang F, Liao DW, Li JW, Liao GP. Two-dimensional multifractal detrended fluctuation analysis for plant identification. Plant Methods 2015; 11(1):12.

15. 15. Chen MY, Hauptmann A, Li H. Combining motion understanding and key frame image: analysis for broadcast video information extraction. In Proc. SPIE Defense, Security and Sensing 7704, International Society for Optics and Photonics; 2010. p. 77040H-77040H.

16. 16. Wu, SG, Bao FS, Xu EY, Wang YX, Chang YF, Xiang QL. A leaf recognition algorithm for plant classification using probabilistic neural network. IEEE International Symposium Signal Processing and Information Technology; 2007. p.11-16. 17. 17. Yu H, Li M, Zhang HJ, Feng J. Colour texture moment for content-based image retrieval. International Conference Image

Processing; 2002. p. 929- 932.

18. 18. Costa AF, Humpire-Mamani G, Traina, AJM. An efficient algorithm for fractal analysis of textures. 25th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI); 2012, p. 39-46.

19. 19. Liao PS, Chen TS, Chung PC. A fast algorithm for multilevel thresholding. J. Inf. Sci. Eng, 2001;17(5):713–727.

20. 20. Lowe DG. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision 2004; 60 (2) :91-110.

21. 21. Che Hussin NA, Jamil N, Nordin, S, Awang K. Plant species identification by using Scale Invariant Feature Transform (SIFT) and Grid Based Colour Moment (GBCM), 2013 IEEE Conference on Open Systems; 2013. p. 226-230.