Figure 7: Localization Network IOU Example - The maximum IOU is recorded between the predicted square and green circle
The average IOU of all predictions with their respective ground truths is shown in Table 1. The final column shows the percentage of predictions that match the ground-truth location with an IOU greater than the given threshold. An IOU of 0.5 is the standard used in consumer image detection challenges such as the Pascal Visual Object Challenge; however, in these challenges the target boxes tend to contain many thousands of pixels, so small deviations in bounding boxes do not affect the overall IOU significantly. In contrast, if a 32x32 px bounding box with an IOU of 0.5 is offset by only 4 pixels in both directions, the IOU can decrease to 0.3. As such, multiple IOU thresholds (0.1, 0.3, and 0.5) are tested.
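To make this sensitivity concrete, the sketch below computes IOU for axis-aligned boxes and reproduces the small-box effect described above; the specific offsets are illustrative, not taken from the paper.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (0, 0, 32, 32)               # a 32x32 px ground-truth box
print(iou(gt, (6, 6, 38, 38)))    # offset 6 px in x and y -> IOU ~ 0.49
print(iou(gt, (10, 10, 42, 42)))  # 4 px further -> IOU ~ 0.31
```

For a box this small, four extra pixels of offset drop the IOU from roughly 0.5 to roughly 0.3, which is why the lower thresholds are reported as well.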
Shadows are present in a wide range of aerial images, from forested scenes to urban environments. The presence of shadows degrades the performance of computer vision algorithms in a diverse set of applications such as image registration, object segmentation, and object detection and recognition. Therefore, detection and mitigation of shadows is of paramount importance and can significantly improve the performance of computer vision algorithms in the aforementioned applications. There are several existing approaches to shadow detection in aerial images, including chromaticity methods, texture-based methods, geometric and physics-based methods, and approaches using neural networks in machine learning.
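As a loose illustration of the chromaticity family of methods, the toy sketch below flags pixels that are dark but relatively blue-rich (a common shadow cue: once direct sunlight is blocked, the remaining skylight is predominantly blue). The thresholds and the exact cue are assumptions for illustration only, not any published detector.

```python
import numpy as np

def shadow_mask(rgb, v_thresh=0.35, ratio_thresh=1.05):
    """Toy chromaticity cue: mark pixels that are dark overall (low value)
    but whose blue channel is high relative to the mean intensity."""
    img = rgb.astype(float) / 255.0
    value = img.max(axis=-1)                                  # brightness
    blue_ratio = (img[..., 2] + 1e-6) / (img.mean(axis=-1) + 1e-6)
    return (value < v_thresh) & (blue_ratio > ratio_thresh)

# 2x2 toy image: left column sunlit, right column dark and bluish
patch = np.array([[[200, 200, 200], [40, 45, 70]],
                  [[180, 170, 150], [35, 40, 60]]], dtype=np.uint8)
print(shadow_mask(patch))   # only the dark bluish pixels are flagged
```

Real chromaticity detectors operate in invariant color spaces and combine several cues; this sketch only conveys the basic intuition.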
Methods based on graphical models are quite popular for building detection as well. An MRF model was used in (Krishnamachari and Chellappa, 1996) to group extracted lines to delineate buildings in aerial images. First, straight lines are extracted from the images using an edge detector followed by a line extractor. Then an MRF model with a suitable neighborhood is formulated on these extracted lines. The probabilistic model is chosen to support the properties of building shapes. Finally, the energy function associated with the MRF is minimized, resulting in the grouping of lines. However, no quantitative results were provided for evaluation. Similarly, a stochastic image interpretation model, which combines both 2-D and 3-D contextual information of the imaged scene, was proposed in (Katartzis and Sahli, 2008) for the identification of building rooftops. However, the approach is only applicable to building rooftops with low inclination. A graph-based approach was developed in (Kim and Muller, 1999) for building detection. The whole process is divided into four stages: line extraction, line-relation-graph generation, building hypothesis generation, and building hypothesis verification. This method is still limited to certain building shapes. Another method for building detection based on a graphical model was proposed by (Sirmacek and Unsalan, 2009), where the vertices in the graph are SIFT keypoints. A multiple-subgraph matching method was applied to detect individual buildings by matching graphs corresponding to a template and a test image. Nevertheless, this method is applicable only to urban areas with well-separated buildings. A different method based on graphical modeling of buildings was proposed in (Cui et al., 2012), where the vertices in the graph are the intersections of line segments. The graph was then adapted to the buildings by filtering the edges according to region properties.
Then all cycles are searched by an algorithm, and the most probable cycle is considered the best building candidate. A novel system was developed by (Izadi and Saeedi, 2012) for automatic detection and height estimation of buildings with polygonal rooftops in single satellite images. It employs image primitives such as lines and line intersections, and examines their relationships with each other using a graph-based search to establish a set of rooftop hypotheses. The height of each rooftop hypothesis is estimated using shadows and the acquisition geometry.
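The cycle-search idea can be sketched with a toy graph whose vertices are hypothetical line-segment intersections: each independent cycle is a closed contour and thus a rooftop candidate. This is an illustration of the general idea only, not the cited authors' algorithm.

```python
import networkx as nx

# Hypothetical intersection points of extracted line segments (node -> (x, y))
corners = {0: (0, 0), 1: (10, 0), 2: (10, 8), 3: (0, 8), 4: (20, 0)}

# Edges are line segments joining intersections; 0-1-2-3 closes a rectangle,
# while the 1-4 segment is a dangling edge that belongs to no cycle.
g = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0), (1, 4)])

# Each cycle in the cycle basis is a closed polygon, i.e. a rooftop candidate.
for cycle in nx.cycle_basis(g):
    print([corners[n] for n in cycle])
```

A real system would rank candidate cycles (e.g. by edge support or region properties) to pick the most probable building outline.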
1 M.tech, Dept. of CSE, Acharya Nagarjuna University, India
2 Asst. Professor, Dept. of CSE, Acharya Nagarjuna University, India
Abstract- This paper proposes a system architecture based on a deep convolutional neural network (CNN) for road detection and segmentation from aerial images. The images are acquired by an unmanned aerial vehicle implemented by the authors. The algorithm for image segmentation has two phases: a learning phase and an operating phase. The input images are decomposed and preprocessed in MATLAB and partitioned into 33x33 px patches using a sliding-box algorithm; these patches serve as the input to the deep CNN. The CNN was designed with MatConvNet and consists of four convolutional layers, four pooling layers, one ReLU layer, one fully connected layer, and a Softmax layer. The CNN was implemented in MATLAB on a GPU, and the results are promising.
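The sliding-box partitioning step might look like the following sketch (in Python rather than MATLAB; the stride equal to the patch size is an assumption, as the abstract does not state the overlap):

```python
import numpy as np

def sliding_patches(img, size=33, stride=33):
    """Partition an image into size x size patches with a sliding box.
    Border pixels that do not fill a whole patch are dropped."""
    h, w = img.shape[:2]
    patches = []
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            patches.append(img[y:y + size, x:x + size])
    return np.stack(patches)

img = np.zeros((99, 132, 3), dtype=np.uint8)  # toy aerial frame
print(sliding_patches(img).shape)             # (12, 33, 33, 3)
```

Each 33x33 patch would then be classified by the CNN as road or non-road, and the per-patch labels reassembled into a segmentation map.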
Abstract: Automatic crowd detection in aerial images is certainly a useful source of information to prevent crowd disasters in large complex scenarios of mass events. A number of publications employ regression-based methods for crowd counting and crowd density estimation. However, these methods work only when a correct manual count is available to serve as a reference. Therefore, it is the objective of this paper to detect high-density crowds in aerial images, where counting– or regression–based approaches would fail. We compare two texture–classification methodologies on a dataset of aerial image patches which are grouped into ranges of different crowd density. These methodologies are: (1) a Bag–of–words (BoW) model with two alternative local features encoded as Improved Fisher Vectors and (2) features based on a Gabor filter bank. Our results show that a classifier using either BoW or Gabor features can detect crowded image regions with 97% classification accuracy. In our tests of four classes of different crowd-density ranges, BoW–based features have a 5%–12% better accuracy than Gabor.
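A Gabor filter bank of the kind used in the second methodology can be built directly from the standard Gabor formula; the kernel size and parameters below are illustrative defaults, not the paper's settings.

```python
import numpy as np

def gabor_kernel(ksize=15, sigma=3.0, theta=0.0, lam=6.0, gamma=0.5):
    """Real part of a Gabor filter: a Gaussian envelope modulating a cosine
    carrier of wavelength lam, rotated by theta."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * xr / lam)

# A small bank over 4 orientations; convolving an image patch with each
# kernel and pooling the responses yields a texture feature vector.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
print(len(bank), bank[0].shape)   # 4 (15, 15)
```

Crowded regions produce strong responses across many orientations and frequencies, which is what makes such features usable for crowd/non-crowd classification.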
We have proposed an orientation selective building detection framework for aerial images, introducing orientation as a novel feature for object extraction purposes. The algorithm starts with feature point detection, used as a directional sampling set to compute orientation statistics and to define the dominant directions of the urban area. The orientation information is then applied to create a novel improved edge map, emphasizing edges only in the main directions. By integrating color, shadow and the improved edge features, and using the illumination information, building candidates are localized. To find the remaining candidates with limited feature evidence, an orthogonality check is introduced. The contours of the localized candidates are extracted by the Chan-Vese active contour algorithm, which might result in diverse, yet less accurate contours. To compensate for this, a novel orientation-selective morphological operator is introduced to refine the final outlines. The extensive object- and pixel-level quantitative evaluation and comparison with six state-of-the-art methods confirm and support the superiority of the introduced approach.
Abstract—Detecting vehicles in aerial images provides important information for traffic management and urban planning. Detecting cars in such images is challenging due to the relatively small size of the target objects and the complex background in man-made areas. It is particularly challenging if the goal is near real-time detection - within a few seconds - on large images without any additional information, e.g. a road database or accurate target size. We present a method which can detect vehicles in a 21-MPixel original frame image, without accurate scale information, within seconds on a laptop, single-threaded. Besides the bounding box of the vehicles, we also extract orientation and type (car/truck) information. First we apply a fast binary detector using Integral Channel Features in a Soft Cascade structure. In the next step we apply a multiclass classifier to the output of the binary detector, which gives the orientation and type of the vehicles. We evaluate our method on a challenging dataset of original aerial images over Munich and a dataset captured from a UAV.
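The speed of Integral Channel Features rests on the summed-area table, which turns any rectangular channel sum into four array lookups. A minimal sketch of that building block (not the authors' implementation):

```python
import numpy as np

def integral_image(channel):
    """Summed-area table with a zero-padded first row/column, so that
    ii[y, x] holds the sum of channel[:y, :x]."""
    ii = np.zeros((channel.shape[0] + 1, channel.shape[1] + 1))
    ii[1:, 1:] = channel.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y1, x1, y2, x2):
    """Sum of channel[y1:y2, x1:x2] in O(1), independent of box size."""
    return ii[y2, x2] - ii[y1, x2] - ii[y2, x1] + ii[y1, x1]

ch = np.arange(16, dtype=float).reshape(4, 4)
ii = integral_image(ch)
print(box_sum(ii, 1, 1, 3, 3))   # 5 + 6 + 9 + 10 = 30.0
```

A cascade evaluates thousands of such box features per window, so the constant-time lookup is what makes scanning a 21-MPixel frame in seconds feasible.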
1 Prince Sultan University, Riyadh, Saudi Arabia
2 CISTER Research Centre, ISEP, Polytechnic Institute of Porto, Porto, Portugal
* Correspondence: AA: firstname.lastname@example.org; AK: email@example.com
Abstract: In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles from aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as UAV’s altitude, camera resolution, and object size. The objective of this work is to conduct a robust comparison between these two cutting-edge algorithms. By using a variety of metrics, we show that YOLOv3 yields better performance in most configurations, except that it exhibits a lower recall and less confident detections when object sizes and scales in the testing dataset differ largely from those in the training dataset.
Abstract—In this paper, we address the problem of car detection from aerial images using Convolutional Neural Networks (CNN). This problem presents additional challenges as compared to car (or any object) detection from ground images because features of vehicles from aerial images are more difficult to discern. To investigate this issue, we assess the performance of two state-of-the-art CNN algorithms, namely Faster R-CNN, which is the most popular region-based algorithm, and YOLOv3, which is known to be the fastest detection algorithm. We analyze two datasets with different characteristics to check the impact of various factors, such as UAV’s altitude, camera resolution, and object size. The objective of this work is to conduct a robust comparison between these two cutting-edge algorithms. By using a variety of metrics, we show that neither of the two algorithms outperforms the other in all cases.
Detection and classification of objects on the earth's surface were and still are important fields for researchers in different disciplines, including Remote Sensing and Photogrammetry (Rottensteiner et al., 2011). With the emergence of new sensors such as laser scanners, advances in the field of photogrammetry, and the adoption of digital cameras, detection and classification methods have entered a new era. High-resolution, high-spectral digital cameras have led researchers to develop and introduce new indices to detect a variety of objects on the earth. LiDAR data give the 3D coordinates of points directly, which makes it a simple task to differentiate between smooth and rough surfaces. Smooth surfaces usually designate man-made objects, and rough surfaces mostly designate natural ground. Because LiDAR is an active sensor, it has no problem dealing with shadow areas, while shadow areas are challenging in aerial images. In high-resolution aerial images the boundaries of objects such as buildings are clearly visible, but in LiDAR data there are problems in detecting such boundaries. Considering the advantages and disadvantages of both LiDAR and aerial imagery, integrating these two data sources seems to be the best option (Rottensteiner et al., 2008). Tree detection in complex city scenes with tall buildings is a more difficult task than tree detection in plains and cities with low buildings. There are many methods and algorithms in the detection and classification field, but it is not possible to compare them directly. This is because of the lack of benchmark data sets; moreover, most algorithms and methods have been tested on different data sets. To overcome this problem and make it easier to compare methods, a
Figure 10: Original aerial images and corresponding pseudo-images; panels (d)-(f) show the pseudo-images of test images 1-3.
5. CONCLUSIONS AND FUTURE WORK
In sum, it is not necessary to use the highest-density point clouds, since lower densities yield nearly the same pseudo-image results. This means it is workable to substitute pseudo-images derived from lower-density point clouds for those derived from the highest-density point clouds in change detection analysis, which will increase the efficiency of both the point cloud and image analyses described in this study. For change detection based on point clouds, topographic change can be computed from two epochs of point clouds. Furthermore, with classified point clouds we can acquire the positions and attributes of land-cover change areas automatically, rather than identifying land cover by eye.
delineated. In this case, many techniques from the computer vision and machine learning fields were adopted to identify the geospatial objects, including cars, ships, and building structures.
Next, another challenging issue that adds to the detection difficulties is the intra-class variation of object appearances. For instance, the differences between the spectral signatures of observed targets in hyperspectral images result from unpredictable sensor noise and artifacts, as well as from changing illumination conditions when the hyperspectral images are captured. Similarly, the object variations in high-resolution aerial images usually include, but are not limited to, object orientations, object occlusions, and object scales [8, 9]. In order to take all these variations into account, it is important to resort to some a priori information when designing the detection algorithms. However, it is nontrivial to develop an algorithm with a powerful generalization capability for all types of object variation.
When it comes to general object datasets, ImageNet and MSCOCO are favored due to their large number of images, many categories, and detailed annotations. ImageNet has the largest number of images among all object detection datasets. However, its average number of instances per image is far smaller than in MSCOCO and our DOTA, and it is further limited by its clean backgrounds and carefully selected scenes. Images in DOTA contain an extremely large number of object instances, some of them with more than 1,000 instances. The PASCAL VOC dataset is similar to ImageNet in instances per image and scenes, but its inadequate number of images makes it unsuitable for most detection needs. Our DOTA resembles MSCOCO in terms of instance numbers and scene types, but DOTA has fewer categories than MSCOCO because the objects which can be seen clearly in aerial images are quite limited.
The detection of crowds from surveillance imagery is important for monitoring public places and ensuring public safety. Hence, this work proposes crowd detection from static images captured from an Unmanned Aerial Vehicle. The proposed methodology consists of three steps: FAST feature extraction, Gray Level Co-Occurrence Matrix (GLCM) feature computation, and classification with a Support Vector Machine (SVM). The FAST corner detector is used to obtain regions of interest where crowds may exist. GLCM is applied to extract second-order statistical texture features for texture analysis. The GLCM features are then classified as crowd or non-crowd using the SVM. For evaluation, ten different images taken in various crowd formations, events, and locations were used.
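A minimal hand-rolled GLCM for a single horizontal offset, together with one classic statistic (contrast), illustrates the second step of the pipeline; real implementations typically use several offsets, more gray levels, and additional statistics such as energy and homogeneity.

```python
import numpy as np

def glcm(img, levels=4):
    """Gray Level Co-Occurrence Matrix for the horizontal (0, 1) offset,
    normalized to joint probabilities. img holds integer gray levels."""
    m = np.zeros((levels, levels))
    for i, j in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        m[i, j] += 1
    return m / m.sum()

def contrast(p):
    """Classic GLCM statistic: sum_ij (i - j)^2 * p(i, j)."""
    i, j = np.indices(p.shape)
    return ((i - j) ** 2 * p).sum()

tex = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [2, 2, 3, 3],
                [2, 2, 3, 3]])
# 4 of the 12 horizontal pairs differ by one gray level -> contrast = 1/3
print(contrast(glcm(tex)))
```

Dense crowds tend to produce high-contrast, fine-grained texture, which is what separates them from smooth non-crowd regions in this feature space.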
Remote sensing methods have been used in maritime scenarios for many years with different scopes that can be attributed to maritime security and safety. Passive optical sensors in multi-spectral or hyper-spectral configurations are widely used for monitoring large-scale ecological issues such as algal blooms, coral reef studies, or the analysis of sediment transport in estuaries. The inclusion of thermal infrared allows for additional applications such as monitoring the thermal plumes of warm-water discharges caused by power plants. With the constant improvement of spatial resolution, ship detection is now also possible from satellite-based passive optical systems. Radar, and especially synthetic aperture radar (SAR), has been studied for sea-state monitoring, oil-spill detection, and ship detection, especially exploiting the benefits of a satellite platform regarding the vast area of interest. Satellite-based receivers for the ‘Automatic Identification System’ (AIS) are also under study and in experimental use.
7. CONCLUSIONS
Crop and forestry management often benefits from a vast range of remote sensing applications that support sustainable management. Image classification is one of the important tasks that reveals more information for decision making, and image classification techniques are developing rapidly with the advancement of deep learning. Although a considerable number of studies in this field have used CNNs and proven their capabilities, very few have assessed the performance of the recently developed object detection technique R-CNN in agriculture. This thesis evaluates the performance of Faster R-CNN in identifying palm trees, which may lead to many applications in crop management. It is also an example of applying R-CNN to detect individual trees with a small number of training samples.
After an earthquake, image-based interpretation methods are powerful tools for the detection and classification of damaged buildings. A method based on two kinds of image-extracted features, comparing stereo pairs of aerial images before and after an earthquake, is presented. Comparing pre- and post-event DSMs - generated from the stereo images - could be a solution for detecting the extent of demolished building areas. However, such DSMs are not sufficiently accurate due to image-matching problems. We propose “Regularity indices” to describe the appearance of a building as regular or irregular. The regularity indices were defined by taking account of the line composition with regard to the building footprint. In addition, a normalized value of the average difference between the DSMs (within each building polygon) is added to the classification procedure. Three classification methods, k-NN, naive Bayes, and support vector machine (SVM), are used and compared. Experiments are performed on two datasets of the Kobe and Bam earthquakes including a wide variety of real collapsed buildings. The numerical results achieved for our datasets are very promising for detecting and classifying collapsed buildings automatically.
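The DSM-difference feature can be sketched as a per-polygon normalized height drop; this is an illustration of the general idea, not the paper's exact index or normalization.

```python
import numpy as np

def dsm_change_score(dsm_pre, dsm_post, footprint_mask):
    """Average height loss inside one building footprint, normalized by the
    pre-event height (toy sketch; the paper's normalization may differ)."""
    diff = dsm_pre - dsm_post
    mean_drop = diff[footprint_mask].mean()
    mean_height = dsm_pre[footprint_mask].mean()
    return mean_drop / mean_height

pre = np.full((4, 4), 10.0)       # intact building, 10 m tall
post = np.full((4, 4), 2.0)       # rubble after collapse
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True             # the building footprint polygon
print(dsm_change_score(pre, post, mask))   # 0.8 -> 80% height loss
```

Such a score, computed per building polygon, is the kind of DSM feature that can be fed to k-NN, naive Bayes, or an SVM alongside the regularity indices.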
weather conditions (clear or cloudy). Nevertheless, SAR images have some limitations depending on the type of SAR, its radiometric and spectral resolution (which is usually quite low), along with the materials and the geometry of the objects to be detected.
On the one hand, optical images have a higher spatial resolution than most SARs, and this helps to improve detection and recognition. On the other hand, AIS or VMS can identify a detected ship, and its absence may indicate a ship performing illegal activities or that it is another kind of floating object. Technical advances in visible-spectrum sensors have made it possible to obtain images with more spatial resolution than those obtained with SAR. In contrast to SAR, many satellites are equipped with such sensors, including SPOT, RapidEye, Landsat, QuickBird, and CBERS, among others.
A radio propagation model is a mathematical formulation that characterizes radio-wave conditions as a function of the environment between the antennas in a wireless system. Because each wireless link exists under different conditions, it is difficult to express a mathematical equation that covers all link environments. In this thesis, we used computer vision and image segmentation to estimate the environment from satellite images. As with image classification, convolutional neural networks (CNNs) have shown potential on image segmentation problems [1–3]. Despite their up-convolutional layers, fully convolutional networks (FCNs) produce coarse segmentation maps due to the loss of information during pooling.
The most important and telling part of the thesis is Chapter 4, where we showed that it was indeed possible to build a system that could detect and classify elephants from our data. With a recall and true negative rate above 98% from our best performing model, we reduced the number of proposed regions from 55,507 to 1,875. This removed almost 97% of the manual verification work, while only missing 1% of the elephants detected during the region proposal phase. By revisiting the problem in 2017 we could compare the improvements in the field since the beginning of 2015. We can conclude that we do not have enough data available to train a model from scratch, as these models do not generalise well, even though the neural networks are capable of learning core features that distinguish elephants from their environment. Transfer learning is a powerful tool that allowed us to take advantage of the resources available to large companies such as Google and apply it to our problem. Combining modern architectures such as InceptionV3, cloud infrastructure from Amazon, an experimentation platform like Valohai and open source frameworks like TensorFlow and Keras, we were able to create an affordable and realistic solution to our problem.