Finally, recent research by Kitware and AFRL has found that change detection in WAMI imagery can be greatly enhanced by applying the differencing to a detection response map rather than to the raw image pixels. By running a Histogram of Oriented Gradients (HOG) based vehicle detection Support Vector Machine (SVM), the authors create a heat map of sorts denoting the likelihood that a vehicle is present at each location. When two of these maps are registered and differenced, the pixels with large differences are taken to be potential changes. This process significantly reduces the number of residual bright areas caused by illumination and parallax differences. This method of detecting changes between detection maps rather than raw images was a motivation for the supervised detection methodology presented in this thesis.
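A minimal numpy sketch of this differencing-on-response-maps idea (the HOG/SVM detector is stubbed out with precomputed toy response maps, and the 0.5 threshold is an illustrative assumption, not a value from the paper):

```python
import numpy as np

def change_mask(resp_t0, resp_t1, thresh=0.5):
    """Difference two registered detector-response maps and keep
    pixels whose response changed by more than thresh."""
    return np.abs(resp_t1 - resp_t0) > thresh

# Toy 3x3 response maps: a vehicle response appears at (1, 1) at t1.
m0 = np.zeros((3, 3))
m1 = np.zeros((3, 3))
m1[1, 1] = 0.9
mask = change_mask(m0, m1)
```

Because the subtraction happens in response space, illumination shifts that brighten raw pixels but do not look like vehicles leave both maps near zero and produce no change pixels.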
When it comes to general object datasets, ImageNet and MSCOCO are favored due to their large number of images, many categories and detailed annotations. ImageNet has the largest number of images among all object detection datasets. However, its average number of instances per image is far smaller than in MSCOCO and our DOTA, and it is further limited by its clean backgrounds and carefully selected scenes. Images in DOTA contain an extremely large number of object instances, some with more than 1,000 instances each. The PASCAL VOC dataset is similar to ImageNet in instances per image and scene types, but its inadequate number of images makes it unsuitable for most detection needs. Our DOTA resembles MSCOCO in terms of instance numbers and scene types, but DOTA has fewer categories than MSCOCO because the objects that can be seen clearly in aerial images are quite limited.
Building detection from aerial and satellite images has been a main research issue for decades and is of great interest since it plays a key role in building model generation, map updating, urban planning and reconstruction (Davydova et al., 2016). Various methods have been developed, and different data sources, such as aerial images, digital surface/elevation models, LiDAR data, multi-spectral images and synthetic aperture radar images, have been used for building detection. In this section we briefly review relevant methods in the building detection literature. Decades ago, the initial endeavors in building detection relied on grouping low-level image features such as edge/line segments and/or corners to form building hypotheses (Ok, 2013). For instance, a generic model of building shapes was adopted in (Huertas and Nevatia, 1988), and shadows cast by buildings were used to confirm building hypotheses and to estimate their height. A computational technique for utilizing the relationship between shadows and man-made structures to aid in the automatic extraction of man-made structures from aerial imagery is described in (Irvin and McKeown, 1989). An approach to perceptual grouping for detecting and describing 3-D objects in complex images was proposed in (Mohan and Nevatia, 1989) and was illustrated by applying it to the task of detecting and describing complex buildings in aerial images. The vertical and horizontal lines identified using image orientation information and vanishing point calculation were used in (McGlone and Shufelt, 1994) to constrain the set of possible building hypotheses, and vertical lines extracted at corners were used to estimate structure height and permit the generation of three-dimensional building models from monocular views.
Because performance evaluation had long been neglected in building detection, a comprehensive comparative analysis of four building extraction systems was presented in (Shufelt, 1999), which concluded that none of the developed systems were capable of handling all of
Various techniques have been proposed in the literature to solve the problem of car detection in aerial images and similar related issues. The main challenges are the small size and the large number of objects to detect in aerial views, which may lead to information loss when performing convolution operations, as well as the difficulty of discerning features because of the angle of view. In this scope, Chen et al. applied a technique based on a hybrid deep convolutional neural network (HDNN) and a sliding window search to solve the vehicle detection problem. The maps of particular layers of the CNN are split into blocks of variable field sizes in order to extract features at various scales. They obtained an improved detection rate compared to the traditional deep architectures of the time, but at the expense of a high execution time (7 s per image, using a GPU).
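The sliding-window search that HDNN-style detectors rely on can be sketched as follows (the CNN scorer is replaced by a hypothetical `score_fn` placeholder; window size, stride and threshold are illustrative choices, not values from the cited work):

```python
import numpy as np

def sliding_window_detect(img, win, stride, score_fn, thresh):
    """Slide a window across the image, score each patch with a
    classifier stand-in, and keep windows scoring above thresh."""
    hits = []
    H, W = img.shape
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            score = score_fn(img[y:y + win, x:x + win])
            if score > thresh:
                hits.append((x, y, score))
    return hits

img = np.zeros((8, 8))
img[2:4, 2:4] = 1.0                      # a bright "vehicle" blob
hits = sliding_window_detect(img, win=4, stride=2,
                             score_fn=lambda p: p.mean(), thresh=0.2)
```

The quadratic number of windows per scale is what makes this search expensive, which is consistent with the high per-image execution time reported above.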
Detection and classification of objects on earth were and still are important fields for researchers in different disciplines, including Remote Sensing and Photogrammetry (Rottensteiner et al., 2011). With the emergence of new sensors such as laser scanners, the development of the Photogrammetry field, and the adoption of digital cameras, detection and classification methods have entered a new era. High-resolution, high-spectral digital cameras have led researchers to develop and introduce new indices to detect a variety of objects on earth. LiDAR data give 3D coordinates of points directly, which makes it a simple task to differentiate between smooth and rough surfaces. Smooth surfaces usually designate man-made objects, while rough surfaces mostly designate natural ground. Because LiDAR is an active sensor it has no problem dealing with shadow areas, whereas shadow areas are challenging in aerial images. In high-resolution aerial images the boundaries of objects such as buildings are clearly visible, but in LiDAR data there are some problems in detecting such boundaries. Considering the advantages and disadvantages of both LiDAR and aerial imagery, integrating these two data sources seems to be the best option (Rottensteiner et al., 2008). Tree detection in complex city scenes with high buildings is a more difficult task than tree detection in plains and cities with low buildings. There are many methods and algorithms in the detection and classification field, but it is not possible to compare them directly. This is because of a lack of benchmark datasets; most algorithms and methods have been tested on different datasets. To overcome this problem and make it easier to compare methods, a
Received: 24 August 2020; Accepted: 29 September 2020; Published: 3 October 2020 Abstract: An oriented bounding box (OBB) is preferable over a horizontal bounding box (HBB) for accurate object detection. Most existing works utilize a two-stage detector for locating the HBB and OBB, respectively, which suffers from misaligned horizontal proposals and interference from complex backgrounds. To tackle these issues, region-of-interest transformer and attention models were proposed, yet they are extremely computationally intensive. To this end, we propose a semi-anchor-free detector (SAFDet) for object detection in aerial images, where a rotation-anchor-free branch (RAFB) is used to enhance the foreground features via precisely regressing the OBB. Meanwhile, a center prediction module (CPM) is introduced for enhancing object localization and suppressing background noise. Both RAFB and CPM are deployed only during training, avoiding any increase in the computational cost of inference. By evaluating on the DOTA and HRSC2016 datasets, the efficacy of our approach has been fully validated, showing a good balance between accuracy and computational cost.
The extraction of reliable information from aerial images is a difficult problem, but it has numerous important applications: disaster monitoring, crop monitoring in precision agriculture, border surveillance, traffic monitoring, and so on. Different image processing techniques have been considered. Texture analysis techniques are used to detect and segment regions of interest, particularly roads, from aerial images, but the choice of representative features depends on the specific context of the application that uses them. Other authors consider a supervised learning approach to detect road textures using a neural network. A UAV (multicopter type) is proposed for efficient road detection and tracking. Different road features and information, such as the Stroke Width Transform, colors, and width, are combined to highlight possible road candidates. In order to increase the accuracy and robustness of road detection in a deep Convolutional Neural
Shadows are present in a wide range of aerial images, from forested scenes to urban environments. The presence of shadows degrades the performance of computer vision algorithms in a diverse set of applications such as image registration, object segmentation, object detection and recognition. Therefore, the detection and mitigation of shadows is of paramount importance and can significantly improve the performance of computer vision algorithms in the aforementioned applications. Existing approaches to shadow detection in aerial images include chromaticity methods, texture-based methods, geometric methods, physics-based methods, and approaches using neural networks in machine learning.
This article is devoted to the problem of shadow detection in color aerial images. Hue singularity pixels are first extracted. Candidate shadow and non-shadow regions are constructed based on modified ratio maps, using Otsu's thresholding method and connected component analysis. The intensity property and chromaticity property of the shadow areas, along with the color attenuation relationship derived from Planck's blackbody irradiance law, are used iteratively to segment each candidate region into smaller sub-regions, so that each sub-region can be identified as a true shadow region or not. The extracted hue singularity pixels are then classified based on their neighboring pixels. From the above experimental results, it can be concluded that our proposed shadow detection algorithm achieves the best shadow detection accuracy when compared with Tsai's and Chung et al.'s algorithms. Future work is needed to solve the automatic threshold selection problem.
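Otsu's thresholding step, used above to construct the candidate shadow regions, can be sketched in a few lines of numpy (operating here on a toy grayscale image rather than on the article's modified ratio maps):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximizes between-class variance
    (Otsu's method) for an 8-bit grayscale image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()                     # intensity probabilities
    best_t, best_var = 0, 0.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2      # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# bimodal toy image: dark "shadow" pixels vs. bright background
img = np.array([[20, 20, 200], [20, 200, 200]], dtype=np.uint8)
t = otsu_threshold(img)
shadow_mask = img < t
```

On a bimodal histogram like this one, any threshold between the two modes maximizes the criterion, so the dark pixels are cleanly separated as shadow candidates.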
• The Stanford dataset consists of a large-scale collection of aerial images and videos of a university campus containing various agents (cars, buses, bicycles, golf carts, skateboarders and pedestrians). It was obtained using a 3DR Solo quadcopter (equipped with a 4k camera) that flew over various crowded campus scenes at an altitude of around 80 meters. It is originally composed of eight scenes, but since we are exclusively interested in car detection, we chose only the three scenes that contain the largest percentage of cars: Nexus (in which 29.51% of objects are cars), Gates (1.08%), and DeathCircle (4.71%). All other scenes contain less than 1% cars. We used the first two scenes for training and the third one for testing. In addition, we removed images that contain no cars. We noticed that the ground-truth bounding boxes in some images contain mistakes (bounding boxes containing no objects) and imprecisions (many bounding boxes are much larger than the objects inside them), as can be seen in Figure 8, but we used them as they are in order to assess the impact of annotation errors on detection performance. In fact, the Stanford Drone Dataset was not primarily designed for object detection, but for trajectory forecasting and tracking. Table 3 shows the number of images and instances in the training and testing datasets. The images in the selected scenes have variable sizes, as shown in Table 4, and contain cars of various sizes, as depicted in Figure 7. The average car size (calculated from the ground-truth bounding boxes) is shown in Table 8. The discrepancy observed between the training and testing datasets in terms of car sizes is explained by the fact that we used different scenes for training and testing, as explained above. This discrepancy constitutes an additional challenge for the considered object detection algorithms.
Over the past few years, CNNs have become the mainstream approach in the computer vision field and have achieved breakthrough performance in many tasks, including image classification and object detection. In the literature, CNN-based detection algorithms are divided into two categories: the two-stage region-based framework (R-CNN, Fast R-CNN, and Faster R-CNN) and the one-stage unified approach (OverFeat, SSD, YOLO [139, 27]). Compared to traditional object detection algorithms, R-CNN was among the early attempts to apply CNNs to object detection. The algorithm first applies the selective search approach to generate a large number (around 2,000) of object candidate regions (region proposals). Then, a pre-trained CNN (AlexNet) is utilized to extract a 4096-dimensional feature vector for each region proposal. Following that, a linear support vector machine (SVM) is trained to determine the object categories. To improve the efficiency of generating feature vectors for all region proposals, Fast R-CNN was proposed. It extends the R-CNN approach by sharing computation across region proposals and incorporating the ROI pooling layer to obtain fixed-length feature representations for image regions. To further improve detection speed, Faster R-CNN introduced the region proposal network (RPN), which learns to generate region proposals and completely eliminates the need for pre-defined candidate regions. Specifically, two separate network branches are attached to the base network to generate region proposals and object classification results.
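The ROI pooling idea from Fast R-CNN, collapsing an arbitrary proposal into a fixed-size feature, can be sketched as a max-pooling toy on a single 2-D feature map (the real layer operates per-channel on conv feature maps and uses proposal coordinates projected into feature space):

```python
import numpy as np

def roi_pool(feat, box, out=2):
    """Max-pool an arbitrary rectangular region of a feature map
    into a fixed out x out grid (the ROI pooling idea)."""
    x0, y0, x1, y1 = box
    region = feat[y0:y1, x0:x1]
    h, w = region.shape
    ys = np.linspace(0, h, out + 1).astype(int)   # row bin edges
    xs = np.linspace(0, w, out + 1).astype(int)   # column bin edges
    pooled = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            pooled[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return pooled

feat = np.arange(36).reshape(6, 6)       # toy 6x6 feature map
pooled = roi_pool(feat, (0, 0, 4, 4))    # proposal covering rows/cols 0..3
```

Because every proposal ends up as the same `out x out` grid, the downstream fully connected layers see a fixed-length vector regardless of proposal size, which is what lets the computation be shared across proposals.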
Image classification is one of the primary tasks in remote sensing and is evolving rapidly with the improvement of machine learning techniques. Applications of image classification are mostly found in crop and forestry management (Liakos et al., 2018). Some applications of CNNs in image classification tasks have been discussed in section 2.1. These are a few examples that prove the capability of deep-learning-based approaches, especially CNNs, in the remote sensing context. One common limitation of those methods is the requirement of a large amount of training data to achieve good accuracy. Although Olivares (2019) proposed a method which requires a reduced number of training samples, individual tree identification was not possible. Hence, this thesis proposes a method addressing the gap in previous methods with the use of object detection techniques.
4.6 (a) Original input image of size 900x900. (b)-(e) The detected crowd region using a patch size of 90x90 and the GLCM features of (b) homogeneity, (c) entropy, (d) contrast and (e) energy. Red regions show correctly detected non-crowd areas and green regions correctly detected crowd areas. The accuracy is 100.00% for (b), 89.19% for (c), 97.3% for (d) and 75.68% for (e). However, there is some misclassification in all images.
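The GLCM features referenced in this figure can be sketched as follows (horizontal co-occurrence at distance 1 only; the quantization level and the two features shown are a simplification of the full Haralick set used for the crowd/non-crowd patches):

```python
import numpy as np

def glcm_features(patch, levels=4):
    """Horizontal d=1 gray-level co-occurrence matrix and two
    Haralick-style features (energy, homogeneity) for a patch
    already quantized to `levels` gray levels."""
    glcm = np.zeros((levels, levels))
    for a, b in zip(patch[:, :-1].ravel(), patch[:, 1:].ravel()):
        glcm[a, b] += 1                      # count neighbour pairs
    glcm /= glcm.sum()                       # normalize to probabilities
    i, j = np.indices(glcm.shape)
    energy = (glcm ** 2).sum()
    homogeneity = (glcm / (1.0 + np.abs(i - j))).sum()
    return energy, homogeneity

flat = np.zeros((4, 4), dtype=int)                 # uniform non-crowd patch
busy = np.array([[0, 1, 2, 3], [3, 2, 1, 0]] * 2)  # high-variation "crowd" patch
e_flat, h_flat = glcm_features(flat)
e_busy, h_busy = glcm_features(busy)
```

A uniform patch concentrates all co-occurrence mass on the diagonal (high energy, high homogeneity), while a textured crowd-like patch spreads it out, which is the separation these features exploit.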
Geometrical information is also an important cue. Geometrical properties of buildings can be obtained from a digital surface model (DSM), which is extracted either from stereo images or from laser scan data. Stereo or multiple images give important cues for inferring three-dimensional (3D) structures. Height differences relative to the initial height, volume reduction rate, debris size, and changes of roof structure and inclination can be employed to assess building damage. In this study, we assume that pre- and post-earthquake aerial images, together with the building polygons, are available. The goal is to develop a method to automatically generate a map of damaged buildings. We propose an integrative method fusing both image- and object-space cues to perform more accurate collapse detection and classification.
We present a new approach based on Convolutional Neural Networks (CNNs) for the automatic classification of ships and small Unidentified Floating Objects (UFOs) from optical aerial imagery acquired in the visible spectrum. CNNs have obtained state-of-the-art results in image classification tasks similar to the problem addressed in this work. We propose a CNN architecture adapted to the aerial image classification of ships. CNNs are able to identify the most distinctive features for the task at hand, avoiding the use of hand-engineered features. The proposed method is evaluated with different topologies frequently used for image classification in the literature, and also compared to other conventional machine learning techniques that use well-known features extracted from the images by means of both signal and image processing methods. The proposed method has also been applied to the classification of images acquired from different satellites in order to assess its robustness and generalization. This approach classifies ships in portions (224 × 224 pixels) of satellite images, although it could also be applied to larger images using a sliding window. In this sense, it can be considered to perform detection with the precision of the window size.
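The tiling needed to apply a 224 × 224 classifier to a larger image can be sketched per axis (the stride value and the shift-back of the final window are illustrative choices, not part of the cited method):

```python
def tile_origins(size, win=224, stride=224):
    """Window origins along one image axis (assumes size >= win);
    the final window is shifted back so it stays inside the image."""
    xs = list(range(0, size - win + 1, stride))
    if xs[-1] + win < size:
        xs.append(size - win)   # cover the remainder at the border
    return xs

# a 500-pixel-wide image covered by 224-pixel windows
origins = tile_origins(500)
```

Taking the Cartesian product of the origins along both axes yields the full set of windows to classify; each positive window localizes a ship only to window precision, as noted above.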
Abstract: Clouds and shadows pose severe problems in discerning the scene and identifying objects in aerial photography. The changes in illumination caused by the presence of clouds and their shadows are some of the reasons that lead to ambiguity when carrying out image segmentation for the detection of targeted objects. Conventional methods are efficient in detecting thick clouds against a contrastive background, but perform poorly in the perception of thin clouds, multiple clouds and their shadows. Reference images for the input are needed in most cases, and separate algorithms are pursued to identify clouds and shadows in an image, which might not be feasible in all scenarios. The techniques used in this paper to detect clouds and shadows, obviating the need for reference images, are image enhancement, analysis of the color histogram of input images, automatic thresholding and mathematical morphology on the input image. The proposed algorithm was found to be fast and was tested on various images that contained multiple white cloud clusters of different shapes and thicknesses and their shadows. The algorithm was validated with an accuracy of 94.6% and 87.2% for the identification of clouds and shadows, respectively.
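The mathematical-morphology step can be sketched with a plus-shaped structuring element (a simplification; the paper's actual structuring elements and operation sequence are not specified here):

```python
import numpy as np

def dilate(mask):
    """One binary dilation step with a plus-shaped (4-neighbour)
    structuring element."""
    out = mask.copy()
    out[1:, :] |= mask[:-1, :]
    out[:-1, :] |= mask[1:, :]
    out[:, 1:] |= mask[:, :-1]
    out[:, :-1] |= mask[:, 1:]
    return out

def erode(mask):
    # erosion is the complement of dilating the complement
    return ~dilate(~mask)

def close_mask(mask):
    """Morphological closing (dilate, then erode): fills small
    holes left inside a thresholded cloud mask."""
    return erode(dilate(mask))

cloud = np.ones((5, 5), dtype=bool)
cloud[2, 2] = False                  # a one-pixel hole in the cloud mask
cleaned = close_mask(cloud)
```

Closing repairs pinholes that automatic thresholding leaves inside thin or uneven cloud regions; the dual operation, opening, would instead remove isolated false-positive pixels.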
Localization and detection techniques [4, 57, 36, 5, 16, 71, 66], on the other hand, are designed to learn the bounding boxes around the identified objects in the image to extract location information. Semantic segmentation [51, 56, 26] takes a further step towards a fine-grained localization of objects of interest. It entails grouping parts of images so that each pixel in a group corresponds to the object class of the group as a whole. In a binary-class setting, the dataset contains two distinct classes: a class for the object of interest and another class for the background. A multi-class setting is characterised by more than two classes. In particular, the goal of semantic segmentation is to predict the class of each pixel in the input aerial image. One reason for the increasing popularity of research in this field is the recent advances in remote sensing instruments, which have made it possible to generate more and more types of aerial images with different spatial, spectral and temporal resolutions. Another reason is the advances in image processing algorithms, particularly CNNs. The consistent success of CNNs in achieving super-human results in computer vision tasks has inspired researchers in the field to adopt them as the algorithm of choice for aerial image classification tasks. The availability of powerful machines like GPUs that make it possible to train very deep CNNs for classification tasks further fuels the growing interest.
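At inference time, the per-pixel class prediction that defines semantic segmentation reduces to an argmax over a class-score volume (the scores here are hand-made for illustration, standing in for a CNN's output):

```python
import numpy as np

# Hand-made class scores for a 2-class, 2x3-pixel image: channel 0
# is "background", channel 1 is "object of interest".
scores = np.zeros((2, 2, 3))
scores[1, 0, :] = 1.0            # the object class dominates the top row
seg = scores.argmax(axis=0)      # per-pixel class-label map, shape (2, 3)
```

Each pixel of `seg` carries one class index, so a binary setting yields an object/background mask and a multi-class setting a label map over all classes.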
There have been many innovative techniques proposed in the past to build accurate fire detection systems, broadly based on image processing and computer vision techniques. The state-of-the-art vision-based techniques for fire and smoke detection have been comprehensively evaluated and compared in . Colour analysis techniques have been widely used in the literature to detect and analyse fire in images and videos [2, 13, 16, 20]. On top of colour analysis, many novel methods have been used to extract high-level features from fire images, such as texture analysis , dynamic temporal analysis with pixel-level filtering and spatial analysis with envelope decomposition and object labelling , and fire flicker and irregular fire shape detection with the wavelet transform . These techniques give adequate performance but are outperformed by machine learning techniques. A comparative analysis between colour-based models for the extraction of rules and a machine learning algorithm is done for the fire detection problem in . The machine learning technique used in  is Logistic Regression, which is one of the simplest techniques in machine learning and still outperforms the colour-based algorithms in almost all scenarios. These scenarios consist of images containing different fire pixel colours of different intensities, with and without smoke.
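A representative colour-analysis rule of the kind these methods build on can be written in a few lines (this is an illustrative red-dominance test, not the exact rule from any cited work; the threshold is an assumption):

```python
def is_fire_pixel(r, g, b, r_min=180):
    """An illustrative colour rule: fire pixels are red-dominant,
    with R >= G > B and R above an assumed, tunable threshold."""
    return r >= r_min and r >= g > b

flame = is_fire_pixel(230, 160, 40)   # bright orange pixel
sky = is_fire_pixel(80, 120, 220)     # blue pixel
```

Hand-set thresholds like `r_min` are exactly what a learned classifier such as Logistic Regression replaces with weights fitted to labelled fire and non-fire pixels, which explains why it outperforms the fixed rules.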
Several colour detection strategies have been developed so far. For instance: colour detection for road and traffic signs by taking pictures from an automobile and converting them into the HSV colour space ; vehicle detection utilizing normalized colour and an edge map, which differs from older strategies in that it introduces a new colour transform model to find the vehicle colour ; and tagging and tracking in video with neural network colour detection and spatial filters, employing a rule generated by a neural network from the first frame of a video sequence to detect objects of a selected colour . Our paper describes a newly developed technique using a neural network in a hierarchical data structure and the RGB colour space for colour recognition in images.
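A hue-bucketing classifier of the general kind described above can be sketched with the standard-library `colorsys` module (the bucket boundaries are illustrative assumptions, not the cited papers' rules):

```python
import colorsys

def colour_name(r, g, b):
    """Coarse colour label from HSV hue; boundaries are
    illustrative, not taken from any cited work."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if v < 0.2:
        return "black"           # too dark to judge hue
    if s < 0.2:
        return "white/grey"      # too desaturated to judge hue
    deg = h * 360.0
    if deg < 30 or deg >= 330:
        return "red"
    if deg < 90:
        return "yellow"
    if deg < 150:
        return "green"
    if deg < 270:
        return "blue"
    return "magenta"
```

Working in HSV decouples hue from brightness, which is why the traffic-sign work above converts to HSV before thresholding; a learned approach replaces these fixed bucket boundaries with trained decision surfaces.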
the corresponding parameters of an edge detection process, it is possible to combine region growing and edge detection for image segmentation. The important process in the automated system is brain image classification. The main objective of this step is to differentiate the different abnormal brain images based on the optimal feature set. Though this approach claimed a faster convergence rate, it may not be very useful because of its lower accuracy compared with Artificial Intelligence (AI) techniques. Ahmed Kharrat and Karim Gasmi proposed a hybrid approach for the classification of brain tissues in MRI based on a genetic algorithm . The optimal texture features are extracted from normal and tumor regions by using the spatial gray level dependence method. It is concluded that Gabor filters are poor due to their lack of orthogonality, which results in redundant features at different scales or channels, while the Wavelet Transform is capable of representing textures at the most suitable scale by varying the spatial resolution, and also offers a wide range of choices for the wavelet function. The application of various artificial neural networks to image classification has been analysed by classifying MR brain images into normal, cancerous and non-cancerous brain tumors; in particular, texture feature extraction based on wavelets and co-occurrence matrices, combined with a Probabilistic Neural Network classifier, has been used as a new method of brain tumor classification.