Similar Articles
Found 20 similar articles (search time: 31 ms)
1.
Edge detection plays an important role in image processing. With the development of deep learning, the accuracy of edge detection has been greatly improved, and the demands placed on edge detection tasks have grown accordingly. Most edge detection algorithms are binary edge detection methods, but an image usually contains multiple categories of edges. In this paper, we present an accurate multi-category edge detection network, the richer category-aware semantic edge detection network (R-CASENet). To make full use of the convolutional neural network's powerful feature expression capability, we attempt to use more information from the feature maps for edge feature extraction and classification. Using ResNet101 as the backbone, we first merge the building blocks in different composite blocks and down-sample them to obtain the feature maps. We then fuse the feature maps from the different composite blocks to obtain the final fused classifier. Experimental results show that R-CASENet achieves state-of-the-art performance on the large SBD dataset. Furthermore, to obtain precise one-pixel-wide edges, we also propose an edge refinement network (ERN). The proposed scheme is end-to-end, and the ERN reduces redundant points and improves computational efficiency, especially for further image processing.
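For reference, the one-pixel-wide edges that the ERN is designed to produce are obtained in classical pipelines by non-maximum suppression along the gradient direction. The sketch below shows that classical thinning step in numpy, purely as an illustration of the goal; it is not the paper's learned ERN, and the step-edge test image is an assumption of this sketch.

```python
import numpy as np

def thin_edges(mag, gx, gy):
    """Non-maximum suppression: keep a pixel only if its gradient magnitude
    is a local maximum along the (quantized) gradient direction. This is the
    textbook way to thin thick edge responses to ~1-pixel width."""
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = angle[y, x]
            if a < 22.5 or a >= 157.5:        # ~horizontal gradient
                n1, n2 = mag[y, x - 1], mag[y, x + 1]
            elif a < 67.5:                     # ~45 degrees
                n1, n2 = mag[y - 1, x + 1], mag[y + 1, x - 1]
            elif a < 112.5:                    # ~vertical gradient
                n1, n2 = mag[y - 1, x], mag[y + 1, x]
            else:                              # ~135 degrees
                n1, n2 = mag[y - 1, x - 1], mag[y + 1, x + 1]
            if mag[y, x] >= n1 and mag[y, x] >= n2:
                out[y, x] = mag[y, x]
    return out

# A soft vertical step: the thick gradient ridge collapses to one column.
img = np.zeros((8, 8))
img[:, 3] = 0.5
img[:, 4:] = 1.0
gy, gx = np.gradient(img)
thin = thin_edges(np.hypot(gx, gy), gx, gy)
print(sorted(set(np.nonzero(thin)[1].tolist())))  # [3]
```

The loop form is deliberately naive for readability; a vectorized version would shift the magnitude array instead of indexing per pixel.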

2.
Multiresolution (or pyramid) approaches to computer vision provide the capability of rapidly detecting and extracting global structures (features, regions, patterns, etc.) from an image. The human visual system also is able to spontaneously (or preattentively) perceive various types of global structure in visual input; this process is sometimes called perceptual organization. This paper describes a set of pyramid-based algorithms that can detect and extract these types of structure; included are algorithms for inferring three-dimensional information from images and for processing time sequences of images. If implemented in parallel on cellular pyramid hardware, these algorithms require processing times on the order of the logarithm of the image diameter.

3.
Video cameras provide a simple, noninvasive method for monitoring a subject’s eye movements. An important concept is that of the resolution of the system, which is the smallest eye movement that can be reliably detected. While hardware systems are available that estimate direction of gaze in real time from a video image of the pupil, such systems must limit image processing to attain real-time performance and are limited to a resolution of about 10 arc minutes. Two ways to improve resolution are discussed. The first is to improve the image processing algorithms that are used to derive an estimate. Offline analysis of the data can improve resolution by at least one order of magnitude for images of the pupil. A second avenue by which to improve resolution is to increase the optical gain of the imaging setup (i.e., the amount of image motion produced by a given eye rotation). Ophthalmoscopic imaging of retinal blood vessels provides increased optical gain and improved immunity to small head movements but requires a highly sensitive camera. The large number of images involved in a typical experiment imposes great demands on the storage, handling, and processing of data. A major bottleneck had been the real-time digitization and storage of large amounts of video imagery, but recent developments in video compression hardware have made this problem tractable at a reasonable cost. Images of both the retina and the pupil can be analyzed successfully using a basic toolbox of image-processing routines (filtering, correlation, thresholding, etc.), which are, for the most part, well suited to implementation on vectorizing supercomputers.

4.
Deep convolutional neural networks (CNNs) are the dominant technology in computer vision today. Much of the recent computer vision literature can be read as a competition to find the best architecture for vision within the deep convolutional framework. Despite all the effort invested in developing sophisticated convolutional architectures, however, it is not clear how different from each other the best CNNs really are. This paper measures the similarity between two well-known CNNs, Inception and ResNet, in terms of the properties they extract from images. We find that the properties extracted by Inception are very similar to those extracted by ResNet, in the sense that either feature set can be well approximated by an affine transformation of the other. In particular, we find evidence that the information extracted from images by ResNet is also extracted by Inception, and in some cases may be extracted more robustly by Inception. In the other direction, most but not all of the information extracted by Inception is also extracted by ResNet. The similarity between Inception and ResNet features is surprising. Convolutional neural networks learn complex non-linear features of images, and the architectural differences between the systems suggest that these non-linear functions should take different forms. Nonetheless, Inception and ResNet were trained on the same data set and seem to have learned to extract similar properties from images. In essence, their training algorithms hill-climb in totally different spaces, yet find similar solutions. This suggests that for CNNs, the selection of the training set may be more important than the selection of the convolutional architecture. Keywords: ResNet, Inception, CNN, feature evaluation, feature mapping.
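The paper's criterion, that either feature set can be well approximated by an affine transformation of the other, can be checked with ordinary least squares. Below is a minimal numpy sketch: the fitting procedure and the R² score are standard, and the toy data are an assumption of this sketch, not the paper's actual Inception/ResNet features.

```python
import numpy as np

def affine_fit_r2(X, Y):
    """Fit Y ~ X @ A + b by least squares and return the R^2 of the fit.

    X, Y: (n_samples, d_x) and (n_samples, d_y) feature matrices.
    """
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    W, *_ = np.linalg.lstsq(Xa, Y, rcond=None)     # solves for A and b jointly
    resid = Y - Xa @ W
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy check: a Y that truly is an affine map of X should give R^2 ~ 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
A = rng.normal(size=(8, 5))
Y = X @ A + 3.0
print(round(affine_fit_r2(X, Y), 6))  # 1.0
```

An R² near 1 in both directions (X onto Y and Y onto X) would indicate the two feature sets carry essentially the same information up to an affine change of basis.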

5.
The idea of efficient coding in the visual brain allows for predictions for the processing of various types of images, including certain artworks, natural images and uncomfortable images. Efficient processing is thought to result in lower responses compared to less efficient processing. The efficiency of the processing is suggested to depend on the architecture of the visual system and the properties of the input image. In this study, neural responses were estimated using EEG across the categories of a set of five images of abstract artworks, a set of five photographs of natural images and a set of five computer-generated uncomfortable images. EEG responses to contrast-matched images were found to be lower for the set of five abstract artworks used in the study compared to the set of photographs of natural images, lending preliminary support for the idea that certain abstract artworks, for example the work of Pollock, may be processed efficiently.

6.
Images that are presented with targets of an unrelated detection task are better remembered than images that are presented with distractors (the attentional boost effect). The likelihood that any of three mechanisms, attentional cuing, prediction-based reinforcement learning, and perceptual grouping, underlies this effect depends in part on how it is modulated by the relative timing of the target and image. Three experiments demonstrated that targets and images must overlap in time for the enhancement to occur; targets that appear 100 ms before or 100 ms after the image without temporally overlapping with it do not enhance memory of the image. However, targets and images need not be synchronized. A fourth experiment showed that temporal overlap of the image and target is not sufficient, as detecting targets did not enhance the processing of task-irrelevant images. These experiments challenge several simple accounts of the attentional boost effect based on attentional cuing, reinforcement learning, and perceptual grouping.

7.
In image processing, image enhancement is a vital task: it improves image quality by removing blur or noise. Enhancement techniques are used in many applications, such as medical, satellite, agricultural and oceanographic imaging. This paper focuses on IoT satellite applications. High-resolution satellite images are essential, but low-resolution images are strongly affected by absorption, scattering, and limited spatial and spectral resolution. To improve the resolution of such images, Discrete Wavelet Transform (DWT) based interpolation, a combination of DWT and the stationary wavelet transform (SWT), and bicubic interpolation have been used. However, the combined DWT and SWT method fails to avoid distortion in the resulting images; bicubic interpolation is quite complex and cannot produce a clear image; and DWT-based interpolation loses linear features, introduces unwanted oscillations, and loses edge information. Therefore, a combined DWT and Gabor wavelet transform (GWT) technique is proposed to overcome these issues: the image is decomposed into multiple sub-bands by the DWT, and the GWT is employed to minimize the loss of information in the wavelet domain. The advantages of the GWT are lower complexity, noise removal, and image sharpening. The PSNR and MSE of the proposed method are compared with those of existing methods on different satellite images.
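The abstract evaluates enhancement quality by PSNR and MSE. These metrics have standard definitions, sketched below in numpy; the toy images are illustrative assumptions, and this is not the paper's code.

```python
import numpy as np

def mse(ref, test):
    """Mean squared error between two images of the same shape."""
    ref = ref.astype(np.float64)
    test = test.astype(np.float64)
    return np.mean((ref - test) ** 2)

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB; 'peak' is the maximum pixel value."""
    m = mse(ref, test)
    if m == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / m)

ref = np.zeros((4, 4), dtype=np.uint8)
noisy = ref.copy()
noisy[0, 0] = 16                 # one corrupted pixel
print(mse(ref, noisy))           # 16*16 / 16 pixels = 16.0
print(round(psnr(ref, noisy), 2))
```

Higher PSNR (equivalently, lower MSE) against a reference image indicates better reconstruction, which is how the comparison across satellite images would typically be read.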

8.
In person identification, recognition failure due to variations of illumination is common. In this study, we employed image‐processing techniques to tackle this problem. Participants performed recognition and matching tasks where the face stimuli were either original images or computer‐processed images in which shading was weakened via a number of image‐processing techniques. The results show that whereas recognition accuracy in a memory task was unaffected, some of the techniques significantly improved the identification performance in a face‐matching task. We conclude that relative to long‐term face memory, face matching is more susceptible to discrepancy of shading in different images of a face. Reducing the discrepancy by certain preprocessing techniques can facilitate person identification when original face images contain large illumination differences. Copyright © 2013 John Wiley & Sons, Ltd.

9.
Computational algorithms of image processing were developed and evaluated to select, by motion detection, images of resting artificial pigs and to segment the pigs (mixture of black and white pigs) from their background. Motion detection of the pigs was implemented by detecting interframe differences of postural behavioral images. This algorithm combines the advantages of likelihood ratio method and shading model method and shows a stable performance under noisy and dynamic illumination conditions. Segmentation of the pigs from their background was implemented by employing multilevel thresholding and background reference techniques. The algorithm automatically determines the number of thresholds needed and produces satisfactory segmentation when both black and white pigs with different image intensities are present at the same time (the most complicated situation). The reference background image is updated so that temporal changes in illumination and/or spatial changes of the pen condition have little effect on the performance of image segmentation. The algorithm employs statistical models of the pigs and background and Bayes hypothesis testing to obtain and update the exposed portion of the reference background. Linear filters were used in this process for updating the parameters. These algorithms will serve as essential components for a novel, behavior-based, interactive approach to assess and control thermal comfort of group-housed pigs, which is expected to result in enhanced animal health and well-being.
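The motion-detection step, flagging inter-frame differences and treating a scene as "resting" when few pixels change, can be sketched minimally as follows. The thresholds and synthetic frames are illustrative assumptions of this sketch, not values from the paper, which additionally combines likelihood-ratio and shading-model methods.

```python
import numpy as np

def motion_mask(frame_a, frame_b, thresh=25):
    """Flag pixels whose absolute inter-frame difference exceeds a threshold."""
    diff = np.abs(frame_a.astype(np.int16) - frame_b.astype(np.int16))
    return diff > thresh

def is_resting(frame_a, frame_b, max_moving_fraction=0.01, thresh=25):
    """Treat the scene as 'resting' if almost no pixels changed between frames."""
    mask = motion_mask(frame_a, frame_b, thresh)
    return bool(mask.mean() <= max_moving_fraction)

a = np.full((100, 100), 120, dtype=np.uint8)   # uniform "pen" frame
b = a.copy()
b[10:30, 10:30] = 200                          # a 20x20 region changed
print(is_resting(a, a))   # True: identical frames
print(is_resting(a, b))   # False: 4% of pixels changed
```

Casting to a signed type before subtracting avoids uint8 wrap-around, which is a common bug in naive frame-differencing code.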

10.
The popularity of deep learning has influenced the field of surveillance and human safety. We adopt deep learning techniques to recognize potentially harmful objects inside living rooms, offices, and dining rooms during earthquakes. In this study, we propose an educational system that teaches earthquake risks using indoor object recognition based on deep learning algorithms. The system is based on the You Only Look Once (YOLO) detector deployed on our cloud-based server, the Earthquake Situation Learning System (ESLS), for the detection of harmful objects annotated with risk tags. ESLS is trained on our own indoor image dataset. The user interacts with the ESLS server through video or image files, and the YOLO-based object detection algorithm recognizes indoor objects together with their associated risk tags. Results show that the average service time of ESLS is 0.8 s, including processing and communication times, which is low enough for interactive use. Furthermore, the accuracy of harmful object detection is 96% under general indoor lighting. These results show that the proposed ESLS is applicable as a real service for teaching earthquake disaster avoidance.

11.
Colour has been shown to facilitate the recognition of scene images, but only when these images contain natural scenes, for which colour is 'diagnostic'. Here we investigate whether colour can also facilitate memory for scene images, and whether this would hold for natural scenes in particular. In the first experiment participants first studied a set of colour and greyscale natural and man-made scene images. Next, the same images were presented, randomly mixed with a different set. Participants were asked to indicate whether they had seen the images during the study phase. Surprisingly, performance was better for greyscale than for coloured images, and this difference was due to the higher false alarm rate for both natural and man-made coloured scenes. We hypothesized that this increase in false alarm rate was due to a shift from scrutinizing details of the image to recognition of the gist of the (coloured) image. A second experiment, utilizing images without a nameable gist, confirmed this hypothesis as participants now performed equally well on greyscale and coloured images. In the final experiment we specifically targeted the more detail-based perception and recognition for greyscale images versus the more gist-based perception and recognition for coloured images with a change detection paradigm. The results show that changes to images are detected faster when image-pairs were presented in greyscale than in colour. This counterintuitive result held for both natural and man-made scenes (but not for scenes without nameable gist) and thus corroborates the shift from more detailed processing of images in greyscale to more gist-based processing of coloured images.

12.
Accurate glioma detection using magnetic resonance imaging (MRI) is a complicated job. In this research, a deep learning model is presented for glioma and stroke lesion detection. The proposed architecture consists of 14 layers. The input layer is followed by three convolutional layers, while the 5th, 6th and 7th layers correspond to batch normalization, followed by three rectified linear unit (ReLU) layers. The eleventh layer is a 2D average pooling layer, followed by fully connected (FC), softmax and classification layers respectively. The presented method is verified on six MICCAI databases, namely multimodal brain tumor segmentation (BRATS) 2013, 2014, 2015, 2016, 2017 and sub-acute ischemic stroke lesion segmentation (SISS-ISLES) 2015. The computational time is also measured on each benchmark dataset: 53 s on BRATS 2013, 26 s on BRATS 2014, 41 s on BRATS 2015, 36 s on BRATS 2016, 38 s on BRATS 2017 and 4.13 s on ISLES 2015, showing that the proposed technique has low processing time. The proposed method achieved 0.9943 ACC, 1.00 SP, 0.9839 SE on BRATS 2013; 0.9538 ACC, 0.9991 SP, 0.7196 SE on BRATS 2014; 0.9978 ACC, 1.00 SP, 0.9919 SE on BRATS 2015; 0.9569 ACC, 0.9491 SP, 0.9755 SE on BRATS 2016; 0.9778 ACC, 0.9770 SP, 0.9789 SE on BRATS 2017; and 0.9227 ACC, 1.00 SP, 0.8814 SE on ISLES 2015.

13.
Invariant recognition of natural objects in the presence of shadows   (Cited by: 2; self-citations: 0; citations from others: 2)
Braje WL  Legge GE  Kersten D 《Perception》2000,29(4):383-398
Shadows are frequently present when we recognize natural objects, but it is unclear whether they help or hinder recognition. Shadows could improve recognition by providing information about illumination and 3-D surface shape, or impair recognition by introducing spurious contours that are confused with object boundaries. In three experiments, we explored the effect of shadows on recognition of natural objects. The stimuli were digitized photographs of fruits and vegetables displayed with or without shadows. In experiment 1, we evaluated the effects of shadows, color, and image resolution on naming latency and accuracy. Performance was not affected by the presence of shadows, even for gray-scale, blurry images, where shadows are difficult to identify. In experiment 2, we explored recognition of two-tone images of the same objects. In these images, shadow edges are difficult to distinguish from object and surface edges because all edges are defined by a luminance boundary. Shadows impaired performance, but only in the early trials. In experiment 3, we examined whether shadows have a stronger impact when exposure time is limited, allowing little time for processing shadows; no effect of shadows was found. These studies show that recognition of natural objects is highly invariant to the complex luminance patterns caused by shadows.

14.
McSorley E  Findlay JM 《Perception》1999,28(8):1031-1050
Three pairs of experiments were conducted to investigate the importance of spatial-frequency-processing delays imposed by the visual system on the efficacy of their subsequent integration. In the first pair of experiments filtered versions of a natural image were found to be easily discriminable from the full-bandwidth image. These images were then placed into triplets and presented sequentially. Subjects were required to detect the presence of a full-bandwidth target image within the sequence. As with previous results a low-to-high progression of spatial-frequency information did increase the number of errors of full-bandwidth-image detection within the triplets where it was not present. However, this was not found across all conditions and was suggested to be due to the increased discriminability of the constituent images which form the image triplets. The second experiment was a repetition of the first with more confusable component images. Again the weak order effect on target detection was found. It was suggested that this may be due to the nature of the stimulus used. The third pair of experiments repeated the design of the first pair of experiments but with Gabor patches which provide localised spatial-frequency information. It was found that a low-to-high progression of spatial frequencies resulted in more false detections of the target. The results are interpreted as supporting an integration mechanism which operates with greater efficacy when the presentation of spatial-frequency information mirrors that naturally imposed by the neural delays involved in the processing of spatial frequencies.

15.
In the attentional boost effect, participants encode images into memory as they perform an unrelated target-detection task. Later memory is better for images that coincided with a target rather than a distractor. This advantage could reflect a broad processing enhancement triggered by target detection, but it could also reflect inhibitory processes triggered by distractor rejection. To test these possibilities, in four experiments we acquired a baseline measure of image memory when neither a target nor a distractor was presented. Participants memorized faces presented in a continuous series (500- or 100-ms duration). At the same time, participants monitored a stream of squares. Some faces appeared on their own, and others coincided with squares in either a target or a nontarget color. Because the processes associated with both target detection and distractor rejection were minimized when faces appeared on their own, this condition served as a baseline measure of face encoding. The data showed that long-term memory for faces coinciding with a target square was enhanced relative to faces in both the baseline and distractor conditions. We concluded that detecting a behaviorally relevant event boosts memory for concurrently presented images in dual-task situations.

16.
Retinal blood vessel segmentation is required for continuous monitoring of the blood vessels in the diagnosis of most retinal diseases. Deep learning approaches are accepted as promising techniques for biomedical image segmentation. In this paper, an Encoder enhanced Atrous architecture is proposed for retinal blood vessel segmentation. The encoder section is enhanced by improving the depth concatenation process with addition layers. The proposed architecture is evaluated on the publicly available databases DRIVE, STARE, CHASE_DB1 and HRF using metrics such as accuracy, sensitivity, specificity, Dice coefficient, and Matthews correlation coefficient. The proposed architecture outperforms the conventional Unet architecture in accuracy by 0.35% and 0.83% on DRIVE and STARE respectively. In terms of specificity and Dice score, the proposed architecture also shows improved results compared to the Unet architecture.
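Two of the reported metrics, the Dice coefficient and the Matthews correlation coefficient, have standard definitions for binary segmentation masks. A small numpy sketch follows; the toy masks are assumptions for illustration, not data from the paper.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient: 2|A intersect B| / (|A| + |B|) for binary masks."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    inter = np.logical_and(pred, truth).sum()
    return 2.0 * inter / (pred.sum() + truth.sum())

def mcc(pred, truth):
    """Matthews correlation coefficient from the binary confusion counts."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return (tp * tn - fp * fn) / denom if denom else 0.0

truth = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)  # 3 vessel pixels
pred  = np.array([1, 1, 0, 0, 0, 0, 0, 1], dtype=bool)  # 1 miss, 1 false hit
print(round(dice(pred, truth), 3))  # 2*2 / (3+3) = 0.667
```

Unlike accuracy, both metrics stay informative when the vessel class is a small fraction of the image, which is the usual situation in retinal images.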

17.
Similar to an auditory chimaera (Smith et al., 2002, Nature 416, 87-90), a visual chimaera can be defined as a synthetic image which has the fine spatial structure of one natural image and the envelope of another image in each spatial frequency band. Visual chimaeras constructed in this way could be useful to vision scientists interested in the study of interactions between first-order and second-order visual processing. Although it is almost trivial to generate 1-D chimaeras by means of the Hilbert transform and the analytic signal, problems arise in multidimensional signals like images given that the partial directional Hilbert transform and current 2-D demodulation algorithms are anisotropic or orientation-variant procedures. Here, we present a computational procedure to synthesise visual chimaeras by means of the Riesz transform--an isotropic generalisation of the Hilbert transform for multidimensional signals--and the associated monogenic signal--the vector-valued function counterpart of the analytic signal in which the Riesz transform replaces the Hilbert transform. Examples of visual chimaeras are shown for same/different category images.
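A minimal frequency-domain implementation of the Riesz transform and the monogenic envelope is sketched below, using the standard transfer functions -i·u/|q| and -i·v/|q|. This is a generic sketch of the transform the abstract names, not the authors' chimaera-synthesis code; the periodic test sinusoid is an assumption of the sketch.

```python
import numpy as np

def riesz(img):
    """Riesz transform of a 2-D image computed in the frequency domain.

    Returns (r1, r2); the monogenic signal is (img, r1, r2), and for a
    zero-mean (band-passed) image the local envelope is
    sqrt(img^2 + r1^2 + r2^2)."""
    img = img.astype(np.float64)
    u = np.fft.fftfreq(img.shape[0])[:, None]   # vertical frequencies
    v = np.fft.fftfreq(img.shape[1])[None, :]   # horizontal frequencies
    q = np.sqrt(u ** 2 + v ** 2)
    q[0, 0] = 1.0                               # avoid division by zero at DC
    F = np.fft.fft2(img)
    r1 = np.real(np.fft.ifft2(-1j * u / q * F))
    r2 = np.real(np.fft.ifft2(-1j * v / q * F))
    return r1, r2

def envelope(img):
    r1, r2 = riesz(img)
    return np.sqrt(img.astype(np.float64) ** 2 + r1 ** 2 + r2 ** 2)

# A periodic horizontal sinusoid: the envelope of the oscillating carrier
# should be (near-)constant at 1, unlike the carrier itself.
img = np.sin(2 * np.pi * 4 * np.arange(128) / 128)[None, :].repeat(128, axis=0)
env = envelope(img)
print(round(float(env.mean()), 6))  # 1.0
```

Swapping the envelope of one band-passed image onto the carrier of another, band by band, is the chimaera construction the abstract describes; the sketch stops at the envelope itself.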

18.
A novel three-dimensional eye tracker is described and its performance evaluated. In contrast to previous devices based on conventional video standards, the present eye tracker is based on programmable CMOS image sensors, interfaced directly to digital processing circuitry to permit real-time image acquisition and processing. This architecture provides a number of important advantages, including image sampling rates of up to 400/sec, direct pixel addressing for preprocessing and acquisition, and hard-disk storage of relevant image data. The reconfigurable digital processing circuitry also facilitates in-line optimization of the front-end, time-critical processes. The primary acquisition algorithm for tracking the pupil and other eye features is designed around the generalized Hough transform. The tracker permits comprehensive measurement of eye movement (three degrees of freedom) and head movement (six degrees of freedom), and thus provides the basis for many types of vestibulo-oculomotor and visual research. The device has been qualified by the German Space Agency (DLR) and NASA for deployment on the International Space Station. It is foreseen that the device will be used together with appropriate stimulus generators as a general-purpose facility for visual and vestibular experiments. Initial verification studies with an artificial eye demonstrate a measurement resolution of better than 0.1° in all three components (i.e., system noise for each of the components measured as 0.006° H, 0.005° V, and 0.016° T). Over a range of ±20° eye rotation, linearity was found to be <0.5% (H), <0.5% (V), and <2.0% (T). A comparison with the scleral search coil technique yielded near-equivalent values for the system noise and the thickness of Listing’s plane.

19.
This article describes an implemented architecture for intermediate vision. By integrating a variety of intermediate visual mechanisms and putting them to use in support of concrete activity, the implementation demonstrates their utility. The system, SIVS, models psychophysical discoveries about visual attention and search. It is designed to be efficiently implementable in slow, massively parallel, locally connected hardware, such as that of the brain. SIVS addresses five fundamental problems. Visual attention is required to restrict processing to task-relevant locations in the image. Visual search finds such locations. Visual routines are a means for nonuniform processing based on task demands. Intermediate objects keep track of intermediate results of this processing. Visual operators are a set of relatively abstract, general-purpose primitives for spatial analysis, out of which visual routines are assembled.

20.
The color information of a diseased leaf is the main basis for leaf-based plant disease recognition. To make use of color information, a novel three-channel convolutional neural network (TCCNN) model is constructed by combining three color components for vegetable leaf disease recognition. In the model, each channel of the TCCNN is fed one of the three color components of the RGB diseased-leaf image; the convolutional features in each CNN are learned and passed to the next convolutional and pooling layers in turn, and the features are then fused through a fully connected fusion layer to obtain a deep-level disease recognition feature vector. Finally, a softmax layer uses the feature vector to classify the input images into the predefined classes. The proposed method can automatically learn representative features from complex diseased-leaf images and effectively recognize vegetable diseases. The experimental results validate that the proposed method outperforms state-of-the-art methods for vegetable leaf disease recognition.
