Similar Documents
1.
Deep convolutional neural networks (CNNs) are the dominant technology in computer vision today. Much of the recent computer vision literature can be thought of as a competition to find the best architecture for vision within the deep convolutional framework. Despite all the effort invested in developing sophisticated convolutional architectures, however, it’s not clear how different from each other the best CNNs really are. This paper measures the similarity between two well-known CNNs, Inception and ResNet, in terms of the properties they extract from images. We find that the properties extracted by Inception are very similar to the properties extracted by ResNet, in the sense that either feature set can be well approximated by an affine transformation of the other. In particular, we find evidence that the information extracted from images by ResNet is also extracted by Inception, and in some cases may be more robustly extracted by Inception. In the other direction, most but not all of the information extracted by Inception is also extracted by ResNet. The similarity between Inception and ResNet features is surprising. Convolutional neural networks learn complex non-linear features of images, and the architectural differences between systems suggest that these non-linear functions should take different forms. Nonetheless, Inception and ResNet were trained on the same data set and seem to have learned to extract similar properties from images. In essence, their training algorithms hill-climb in totally different spaces, but find similar solutions. This suggests that for CNNs, the selection of the training set may be more important than the selection of the convolutional architecture. Keywords: ResNet, Inception, CNN, Feature Evaluation, Feature Mapping.
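The paper's central test, whether either network's features can be approximated by an affine transformation of the other's, can be sketched as a least-squares fit. The snippet below is a minimal, hypothetical illustration: the random arrays `resnet_feats` and `inception_feats` merely stand in for activations taken from the penultimate layers of the two pretrained networks, and the mean R² is one simple way (not necessarily the paper's exact metric) to score the fit.

```python
import numpy as np

# Hypothetical stand-ins for per-image feature matrices; in practice these would be
# activations from the penultimate layers of pretrained ResNet and Inception models.
rng = np.random.default_rng(0)
n_images, d_feat = 2000, 512
resnet_feats = rng.normal(size=(n_images, d_feat))
inception_feats = rng.normal(size=(n_images, d_feat))

def affine_fit_r2(X, Y):
    """Fit Y ~ X @ W + b by least squares; return mean R^2 across target dimensions."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    W, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    Y_hat = X1 @ W
    ss_res = ((Y - Y_hat) ** 2).sum(axis=0)
    ss_tot = ((Y - Y.mean(axis=0)) ** 2).sum(axis=0)
    return float(np.mean(1.0 - ss_res / np.maximum(ss_tot, 1e-12)))

# How well can each feature set be approximated by an affine map of the other?
print("ResNet -> Inception R^2:", round(affine_fit_r2(resnet_feats, inception_feats), 3))
print("Inception -> ResNet R^2:", round(affine_fit_r2(inception_feats, resnet_feats), 3))
```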

2.
Semantic image segmentation is one of the most challenging tasks in computer vision. In this paper, we propose a highly fused convolutional network, which consists of three parts: downsampling, fused upsampling, and multiple predictions. We adopt a VGG-net-based downsampling structure, followed by multiple steps of upsampling. Feature maps in each pair of corresponding pooling and unpooling layers are combined. We also produce multiple pre-outputs, each generated from an unpooling layer by a one-step upsampling operation. Finally, we concatenate these pre-outputs to obtain the final output. As a result, the proposed network makes extensive use of feature information by fusing and reusing features from low layers. In addition, when training the model, we add multiple soft cost functions on the pre-outputs and the final output, so that the loss signal is less attenuated as it propagates back through the network. We evaluate our model on three public segmentation datasets: CamVid, PASCAL VOC, and ADE20K. We achieve considerable segmentation performance on the PASCAL VOC and ADE20K datasets, and state-of-the-art performance on the CamVid dataset.
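As a rough illustration of the architecture described (not the authors' code), the sketch below builds a tiny two-stage version in PyTorch: pooling indices are reused for unpooling, each unpooled map is fused with its corresponding encoder map (here by simple addition), each decoder stage emits a pre-output that is upsampled to full resolution in one step, and the pre-outputs are concatenated into the final prediction. All layer sizes and the fusion-by-addition choice are assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighlyFusedSegNet(nn.Module):
    """Toy two-stage sketch: pooling indices reused for unpooling, encoder maps fused
    into the decoder, per-stage pre-outputs concatenated into the final prediction."""
    def __init__(self, n_classes=12):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(2)
        self.dec2 = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.pre2 = nn.Conv2d(32, n_classes, 1)   # pre-output from the deeper stage
        self.pre1 = nn.Conv2d(32, n_classes, 1)   # pre-output at full resolution
        self.head = nn.Conv2d(2 * n_classes, n_classes, 1)

    def forward(self, x):
        f1 = self.enc1(x); p1, i1 = self.pool(f1)            # H/2
        f2 = self.enc2(p1); p2, i2 = self.pool(f2)           # H/4
        d2 = self.dec2(self.unpool(p2, i2) + f2)             # fuse encoder map (H/2)
        d1 = self.dec1(self.unpool(d2, i1) + f1)             # fuse encoder map (H)
        pre2 = F.interpolate(self.pre2(d2), size=x.shape[2:],
                             mode="bilinear", align_corners=False)  # one-step upsampling
        pre1 = self.pre1(d1)
        out = self.head(torch.cat([pre1, pre2], dim=1))      # concatenate pre-outputs
        return out, [pre1, pre2]

# Training sketch: soft (auxiliary) cost functions on the pre-outputs and the final output.
net = HighlyFusedSegNet()
img, target = torch.randn(2, 3, 64, 64), torch.randint(0, 12, (2, 64, 64))
out, pres = net(img)
loss = F.cross_entropy(out, target) + sum(F.cross_entropy(p, target) for p in pres)
loss.backward()
```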

3.
Cervical cancer is the second most common cancer in women globally. We propose a computer-aided cervical disease diagnosis system that can relieve the workload of medical experts and reduce costs. To make the approach practical for real-world diagnosis, a multi-modal framework is designed for diagnosing three kinds of cervical disease; it integrates uterine cervix images, the ThinPrep cytology test, the human papillomavirus test, and patient age. However, too many features increase memory and computational costs and hinder the deployment of the system in under-resourced areas. We therefore perform feature selection within the multi-modal framework: it not only eliminates redundant or irrelevant features but also identifies the factors that most influence the disease. The method proceeds as follows: first, an efficient image segmentation algorithm is developed based on representative colors; then, from three different types of segmented images, color and texture features are extracted to characterize the uterine cervix images; next, the Boruta algorithm is applied for feature selection; finally, the performance of a random forest trained on the selected features is evaluated for cervical disease diagnosis. In experiments, the proposed multi-modal diagnostic approach gives the final diagnosis for three kinds of cervical disease with 83.1% accuracy, significantly outperforming methods that use any single source of information alone. A validation cohort confirms the efficiency of the method: the random forest trained on only 1.2% of the features performs comparably to, or even better than, one trained on all features.
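The selection-plus-classification stage lends itself to a short sketch. The code below is illustrative only: `make_classification` stands in for the real multi-modal feature table (cervix-image color/texture features plus the cytology, HPV, and age variables), and `boruta_style_mask` is a simplified, hand-rolled shadow-feature filter in the spirit of Boruta rather than the reference implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the multi-modal feature table described in the abstract.
X, y = make_classification(n_samples=600, n_features=200, n_informative=10,
                           n_classes=3, random_state=0)

def boruta_style_mask(X, y, n_rounds=10, seed=0):
    """Simplified Boruta-style selection: keep features whose random-forest importance
    beats the best importance among column-shuffled 'shadow' copies in most rounds."""
    rng = np.random.default_rng(seed)
    hits = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        shadow = rng.permuted(X, axis=0)               # each column shuffled independently
        rf = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
        rf.fit(np.hstack([X, shadow]), y)
        real = rf.feature_importances_[:X.shape[1]]
        sh = rf.feature_importances_[X.shape[1]:]
        hits += real > sh.max()
    return hits > n_rounds // 2

mask = boruta_style_mask(X, y)
if not mask.any():
    mask[:] = True                                     # fall back to all features
rf = RandomForestClassifier(n_estimators=300, random_state=0)
print(f"selected {mask.sum()} of {X.shape[1]} features")
print("CV accuracy on selected features:", cross_val_score(rf, X[:, mask], y, cv=5).mean().round(3))
```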

4.
In three experiments, a dual-process approach to face recognition memory is examined, with a specific focus on the idea that a recollection process can be used to retrieve configural information of a studied face. Subjects could avoid, with confidence, a recognition error to conjunction lure faces (each a reconfiguration of features from separate studied faces) or feature lure faces (each based on a set of old features and a set of new features) by recalling a studied configuration. In Experiment 1, study repetition (one vs. eight presentations) was manipulated, and in Experiments 2 and 3, retention interval over a short number of trials (0–20) was manipulated. Different measures converged on the conclusion that subjects were unable to use a recollection process to retrieve configural information in an effort to temper recognition errors for conjunction or feature lure faces. A single process, familiarity, appears to be the sole process underlying recognition of conjunction and feature faces, and familiarity contributes, perhaps in whole, to discrimination of old from conjunction faces.

5.
Adults searched for a goal in images of a rectangular environment. The goal's position was constant relative to featural and geometric cues, but the absolute position changed across trials. Participants easily learned to use the featural cues to find the target, but learning to use only geometric information was difficult. Transformation tests revealed that participants used the color and shape of distinct features to encode the goal's position. When the features at the correct and geometrically equivalent corners were removed, participants could use distant features to locate the goal. Accuracy remained above chance when a single distant feature was present, but the feature farthest from the goal yielded lower accuracy than one closer. Participants trained with features spontaneously encoded the geometric information. However, this representation did not withstand orientation transformations.

6.
Intact memory for complex events requires not only memory for particular features (e.g., item, location, color, size), but also intact cognitive processes for binding the features together. Binding provides the memorial experience that certain features belong together. The experiments presented here were designed to explicate these as potentially separable sources of age-associated changes in complex memory—namely, to investigate the possibility that age-related changes in memory for complex events arise from deficits in (1) memory for the kinds of information that comprise complex memories, (2) the processes necessary for binding this information into complex memories, or (3) both of these components. Young and older adults were presented with colored items located within an array. Relative to young adults, older adults had a specific and disproportionate deficit in recognition memory for location, but not for item or for color. Also, older adults consistently demonstrated poorer recognition memory for bound information, especially when all features were acquired intentionally. These feature and binding deficits separately contribute to what have been described as older adults’ context and source memory impairments.

7.
The development of explicit memory for basic perceptual features
In three experiments with 164 individuals between 4 and 80 years old, we examined age-related changes in explicit memory for three perceptual features: item identity, color, and location. In Experiments 1-2, feature recognition was assessed in an incidental-learning, gamelike task resembling the game Concentration. In Experiment 3, feature recognition was assessed using a pencil-and-paper task after intentional learning instructions. The form of the explicit memory function across the life span varied with the particular perceptual feature tested and the type of task. Item recognition was excellent at all ages but was significantly poorer for older adults than children, color recognition peaked in late childhood on the gamelike task, and location recognition peaked in early adulthood on the pencil-and-paper task. These findings indicate that performance on explicit memory tests is not a consistent inverted U-shaped function of age across various features. Explicit memory performance depends on what is measured and how. Because explicit memory typically reflects a composite of different features, age-related changes in explicit memory will not necessarily correspond to the function for any single one.

8.
9.
The speed and accuracy of perceptual recognition of a briefly presented picture of an object is facilitated by its prior presentation. Picture priming tasks were used to assess whether the facilitation is a function of the repetition of: (a) the object's image features (viz., vertices and edges), (b) the object model (e.g., that it is a grand piano), or (c) a representation intermediate between (a) and (b) consisting of convex or singly concave components of the object, roughly corresponding to the object's parts. Subjects viewed pictures with half their contour removed by deleting either (a) every other image feature from each part, or (b) half the components. On a second (primed) block of trials, subjects saw: (a) the identical image that they viewed on the first block, (b) the complement which had the missing contours, or (c) a same name-different exemplar of the object class (e.g., a grand piano when an upright piano had been shown on the first block). With deletion of features, speed and accuracy of naming identical and complementary images were equivalent, indicating that none of the priming could be attributed to the features actually present in the image. Performance with both types of image enjoyed an advantage over that with the different exemplars, establishing that the priming was visual rather than verbal or conceptual. With deletion of the components, performance with identical images was much better than that with their complements. The latter were equivalent to the different exemplars, indicating that all the visual priming of an image of an object is through the activation of a representation of its components in specified relations. In terms of a recent neural net implementation of object recognition (Hummel & Biederman, in press), the results suggest that the locus of object priming may be at changes in the weight matrix for a geon assembly layer, where units have self-organized to represent combinations of convex or singly concave components (or geons) and their attributes (e.g., aspect ratio, orientation, and relations with other geons such as TOP-OF). The results of these experiments provide evidence for the psychological reality of intermediate representations in real-time visual object recognition.

10.
This report compares three feature list sets for capital letters, previously proposed by different investigators, on the ability of each to predict empirical confusion matrices. Toward this end, several variants of assumed information processes in recognition were also compared. The best model incorporated: (1) variable feature retrieval probabilities, (2) a goodness-of-match lower threshold below which guessing determines response, and (3) response bias on guessing trials. This model, when combined with one particular proposed feature list set, produced stress values of less than 9% in comparisons to empirical matrices for each of three different Ss. The feature retrieval probability vectors associated with these minimum-stress predictions were highly correlated (\(\bar r = .83\)), suggesting considerable generality of process and feature sets between Ss.
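To make the assumed recognition process concrete, here is a toy Monte Carlo version: features are retrieved with variable probabilities, a goodness-of-match score is computed against each candidate letter's feature list, and when the best match falls below the threshold the response is a biased guess. The three letters, their binary feature lists, and all parameter values are invented for illustration; they are not the feature sets or estimates compared in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
letters = ["A", "E", "F"]                       # hypothetical letter set
feats = np.array([[1, 0, 1, 1],                 # invented binary feature lists
                  [1, 1, 0, 1],
                  [1, 1, 0, 0]], dtype=float)
p_retrieve = np.array([0.9, 0.7, 0.6, 0.8])     # variable feature retrieval probabilities
threshold = 0.5                                  # goodness-of-match lower threshold
bias = np.array([0.5, 0.3, 0.2])                 # response bias on guessing trials

def predicted_confusions(n_trials=5000):
    """Monte Carlo confusion matrix: retrieval -> match -> threshold -> biased guess."""
    conf = np.zeros((len(letters), len(letters)))
    for s in range(len(letters)):
        for _ in range(n_trials):
            retrieved = rng.random(feats.shape[1]) < p_retrieve
            agree = 1 - np.abs(feats - feats[s])          # per-feature agreement
            if retrieved.any():
                score = agree[:, retrieved].mean(axis=1)  # goodness of match
            else:
                score = np.zeros(len(letters))
            if score.max() >= threshold:
                conf[s, int(score.argmax())] += 1
            else:
                conf[s, rng.choice(len(letters), p=bias)] += 1
    return conf / n_trials

print(np.round(predicted_confusions(), 3))
```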

11.
Feature extraction via KPCA for classification of gait patterns
Wu J, Wang J, Liu L. Human Movement Science, 2007, 26(3): 393-411.
Automated recognition of gait pattern change is important in medical diagnostics as well as in the early identification of at-risk gait in the elderly. We evaluated the use of kernel principal component analysis (KPCA) to extract more gait features (i.e., to capture more of the information in human movement) and thus to improve the classification of gait patterns. 3D gait data of 24 young and 24 elderly participants were acquired with an OPTOTRAK 3020 motion analysis system during normal walking, and a total of 36 spatio-temporal and kinematic gait variables were extracted from the recorded data. KPCA was first used for nonlinear feature extraction; its effect on subsequent classification was then evaluated in combination with learning algorithms such as support vector machines (SVMs). Cross-validation results indicated that the proposed technique spreads the information about the gait's kinematic structure across more nonlinear principal components, providing additional discriminatory information that improves gait classification performance. The feature extraction ability of KPCA was only slightly affected by the choice of kernel function (polynomial vs. radial basis function). The combination of KPCA and SVM identified young versus elderly gait patterns with 91% accuracy, a markedly better performance than the combination of PCA and SVM. These results suggest that nonlinear feature extraction by KPCA improves the classification of young-elderly gait patterns and holds considerable potential for future applications in dimensionality reduction and the interpretation of multiple gait signals.
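A compact version of the KPCA-then-SVM pipeline, here with scikit-learn: synthetic data stand in for the 48 × 36 table of gait variables, and the component count, kernels, and SVM settings are arbitrary placeholders rather than the study's tuned values.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 48 x 36 table of gait variables (24 young + 24 elderly).
X, y = make_classification(n_samples=48, n_features=36, n_informative=8, random_state=0)

# Nonlinear feature extraction with KPCA (RBF or polynomial kernel), then an SVM.
for kernel in ("rbf", "poly"):
    clf = make_pipeline(StandardScaler(),
                        KernelPCA(n_components=10, kernel=kernel),
                        SVC(kernel="rbf", C=1.0))
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"KPCA({kernel}) + SVM cross-validated accuracy: {acc:.2f}")
```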

12.
Pigeons (Columba livia) searched for a hidden target area in images showing a schematic rectangular environment. The absolute position of the goal varied across trials but was constant relative to distinctive featural cues and geometric properties of the environment. Pigeons learned to use both of these properties to locate the goal. Transformation tests showed that pigeons could use either the color or shape of the features, but performance was better with color cues present. Pigeons could also use a single featural cue at an incorrect corner to distinguish between the correct corner and the geometrically equivalent corner; this indicates that they did not simply use the feature at the correct corner as a beacon. Interestingly, pigeons that were trained with features spontaneously encoded geometry. The encoded geometric information withstood vertical translations but not orientation transformations.

13.
Models inspired by the visual systems of living creatures (e.g., humans and other mammals) have been very successful in object recognition tasks. For example, the Hierarchical Model And X (HMAX) effectively recognizes different objects by modeling the V1, V4, and IT regions of the human visual system. Although HMAX is one of the stronger models in the field of object recognition, its adoption has been limited by several disadvantages, such as the unrepeatability of the process under constant conditions, extreme redundancy, high computational load, and long run times. In this paper, we revise the HMAX approach by adding a model of the secondary visual region (V2) of the human visual system, which removes the drawbacks of standard HMAX mentioned above. The added layer selects repeatable and more informative features, which increases the accuracy of the proposed method by avoiding the redundancy present in conventional approaches. Furthermore, this feature selection strategy considerably reduces the computational load. Another contribution of our model is that it copes efficiently with the case where only a small number of training images is available. We evaluate the proposed approach on the Caltech5 and GRAZ-02 databases, two well-known benchmarks for object recognition. The results are compared with standard HMAX, validating and highlighting the efficiency of the proposed method.
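The abstract does not spell out the new V2-style layer, so the sketch below only covers the standard HMAX front end that it builds on: S1 Gabor filtering at several orientations followed by C1 local max pooling. In the proposed model, the added V2-like stage would sit after C1 and keep only repeatable, informative patches before the usual S2/C2 prototype matching; all filter sizes and parameters here are arbitrary.

```python
import numpy as np
from scipy.ndimage import maximum_filter
from scipy.signal import fftconvolve

def gabor(size=11, wavelength=6.0, theta=0.0, sigma=3.0, gamma=0.5):
    """Zero-mean Gabor filter: the standard S1 unit of HMAX-style models."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / wavelength)
    return g - g.mean()

def s1_c1(image, thetas=(0, np.pi / 4, np.pi / 2, 3 * np.pi / 4), pool=8):
    """S1: rectified Gabor responses at several orientations.
    C1: local max pooling, giving position tolerance before S2/C2 prototype matching."""
    s1 = np.stack([np.abs(fftconvolve(image, gabor(theta=t), mode="same")) for t in thetas])
    c1 = maximum_filter(s1, size=(1, pool, pool))[:, ::pool, ::pool]
    return c1

image = np.random.default_rng(0).random((64, 64))   # stand-in for an input image
print(s1_c1(image).shape)                            # (orientations, 8, 8) C1 maps
```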

14.
Collecting samples is a challenging task in face recognition, especially in real-world applications such as law enforcement and ID card identification, where there is usually only a single sample per person (SSPP) available to train a face recognition system. To extract discriminative features from such limited samples, in this paper we propose generating virtual samples via bidirectional feature selection with global and local structure preservation (VS-BFS-GL) to augment the number of training samples. In VS-BFS-GL, a bidirectional feature selection scheme is developed that introduces the L2,1 norm to explore face variations in both the horizontal and vertical directions. Further, to include more variations in the virtual images, the global structure information and sample-specific local structure information of the SSPP training set are considered. By integrating bidirectional feature selection with global and local structure, the limited training samples are fully utilized and more knowledge is mined. To further improve the effectiveness of VS-BFS-GL, an auxiliary database containing different face variations can be used to explore local structure information. We extensively evaluated the proposed approach on the AR and FERET databases. The promising recognition results demonstrate that VS-BFS-GL is robust to expression, pose, and partial-occlusion variations in faces.
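For reference, the L2,1 norm that drives the feature selection is the sum of the Euclidean norms of the rows of a matrix; when minimized as a regularizer it pushes entire rows of W to zero and thereby selects features. Applying it along both matrix directions (i.e., to W and to its transpose) is presumably what gives the "bidirectional" selection, although the abstract does not give the exact objective.

```latex
\|W\|_{2,1} \;=\; \sum_{i=1}^{m} \sqrt{\sum_{j=1}^{n} w_{ij}^{2}}
           \;=\; \sum_{i=1}^{m} \lVert \mathbf{w}_{i} \rVert_{2},
\qquad \mathbf{w}_{i} \text{ the } i\text{-th row of } W .
```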

15.
A new model of recognition memory is reported. This model is placed within, and introduces, a more elaborate theory that is being developed to predict the phenomena of explicit and implicit, and episodic and generic, memory. The recognition model is applied to basic findings, including phenomena that pose problems for extant models: the list-strength effect (e.g., Ratcliff, Clark, & Shiffrin, 1990), the mirror effect (e.g., Glanzer & Adams, 1990), and the normal-ROC slope effect (e.g., Ratcliff, McKoon, & Tindall, 1994). The model assumes storage of separate episodic images for different words, each image consisting of a vector of feature values. Each image is an incomplete and error prone copy of the studied vector. For the simplest case, it is possible to calculate the probability that a test item is "old," and it is assumed that a default "old" response is given if this probability is greater than .5. It is demonstrated that this model and its more complete and realistic versions produce excellent qualitative predictions.
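A stripped-down simulation in the spirit of the model (feature-vector images stored incompletely and with errors, a likelihood-based probability that the probe is old, and a default "old" response when that probability exceeds .5) is sketched below. The geometric feature distribution, the likelihood-ratio form, and all parameter values are standard REM-style assumptions chosen for illustration, not the paper's fitted values.

```python
import numpy as np

rng = np.random.default_rng(0)
g, u, c = 0.4, 0.75, 0.8      # feature base rate, storage probability, copying accuracy
n_feat, n_study = 20, 40

def make_items(n):
    return rng.geometric(g, size=(n, n_feat))             # feature values >= 1

def store(item):
    """Incomplete, error-prone episodic image of a studied vector (0 = not stored)."""
    stored = rng.random(n_feat) < u
    correct = rng.random(n_feat) < c
    noise = rng.geometric(g, size=n_feat)
    return np.where(stored, np.where(correct, item, noise), 0)

def p_old(probe, images):
    """Probability the probe is old, from likelihood ratios against each stored image."""
    lam = []
    for img in images:
        nonzero = img > 0
        v = img[nonzero & (img == probe)]                  # matching stored values
        n_mismatch = int((nonzero & (img != probe)).sum())
        lam.append((1 - c) ** n_mismatch *
                   np.prod((c + (1 - c) * g * (1 - g) ** (v - 1)) /
                           (g * (1 - g) ** (v - 1))))
    odds = float(np.mean(lam))
    return odds / (1 + odds)

study = make_items(n_study)
images = np.array([store(item) for item in study])
hits = np.mean([p_old(p, images) > 0.5 for p in study])                # old probes
fas = np.mean([p_old(p, images) > 0.5 for p in make_items(n_study)])   # new probes
print(f"hit rate {hits:.2f}, false-alarm rate {fas:.2f}")
```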

16.
Tactile (touchable) two-dimensional images are an important way for visually impaired people to obtain pictorial information. At present, most tactile 2D images are touchable line drawings converted directly from visual 2D images. In visual 2D images, visual principles such as perspective and viewpoint are typically used to convert three-dimensional spatial relations into two-dimensional planar relations, and through extensive long-term perceptual learning the visual system has acquired this 2D-to-3D mapping. However, how the haptic system establishes the mapping between the 2D plane and 3D space when recognizing 2D images by touch remains to be studied. The visual factors that affect the 2D-to-3D transformation of spatial information in haptic recognition of 2D images mainly include perspective, viewpoint, occlusion, texture gradient, and hollowing (cut-outs); when visual 2D images are converted directly into tactile 2D images, these visual factors usually interfere with haptic recognition. Drawing on existing research, we propose a "dual-imagery processing model" to explain the cognitive mechanism of the 2D-to-3D transformation of spatial information during haptic exploration of 2D images. The model holds that haptic recognition of 2D images depends on the integration of two imagery systems: an object imagery system (concerning the object's size, shape, and texture) and a spatial imagery system (concerning the object's spatial relations, perspective, and viewpoint). The information from the two systems is ultimately integrated, and on the basis of a successful match between object imagery and spatial imagery, a mapping between the 2D image and 3D space is established, giving access to 3D object representations in long-term memory. The dual-imagery processing model will help deepen our understanding of the cognitive mechanisms of haptic perception and provide a theoretical basis for the design of tactile 2D images.

17.
The authors used a recognition memory paradigm to assess the influence of color information on visual memory for images of natural scenes. Subjects performed 5%-10% better for colored than for black-and-white images independent of exposure duration. Experiment 2 indicated little influence of contrast once the images were suprathreshold, and Experiment 3 revealed that performance worsened when images were presented in color and tested in black and white, or vice versa, leading to the conclusion that the surface property color is part of the memory representation. Experiments 4 and 5 exclude the possibility that the superior recognition memory for colored images results solely from attentional factors or saliency. Finally, the recognition memory advantage disappears for falsely colored images of natural scenes: The improvement in recognition memory depends on the color congruence of presented images with learned knowledge about the color gamut found within natural scenes. The results can be accounted for within a multiple memory systems framework.

18.
Subjects searched sets of items for targets defined by conjunctions of color and form, color and orientation, or color and size. Set size was varied and reaction times (RT) were measured. For many unpracticed subjects, the slopes of the resulting RT × Set Size functions are too shallow to be consistent with Treisman's feature integration model, which proposes serial, self-terminating search for conjunctions. Searches for triple conjunctions (Color × Size × Form) are easier than searches for standard conjunctions and can be independent of set size. A guided search model similar to Hoffman's (1979) two-stage model can account for these data. In the model, parallel processes use information about simple features to guide attention in the search for conjunctions. Triple conjunctions are found more efficiently than standard conjunctions because three parallel processes can guide attention more effectively than two.

19.
Models of human visual processing start with an initial stage of parallel, independent processing of different physical attributes or features (e.g. color, orientation, motion). A second stage in these models is a temporally serial mechanism (visual attention) that combines or binds information across feature dimensions. Evidence for this serial mechanism is based on experimental results for visual search. I conducted a study of visual search accuracy that carefully controlled for low-level effects: physical similarity of target and distractor, element eccentricity, and eye movements. The larger set-size effects in visual search accuracy for briefly flashed conjunction displays, compared with feature displays, are quantitatively predicted by a simple model in which each feature dimension is processed independently with inherent neural noise and information is combined linearly across feature dimensions. The data are not predicted by a temporally serial mechanism or by a hybrid model with temporally serial and noisy processing. The results do not support the idea that a temporally serial mechanism, visual attention, binds information across feature dimensions and show that the conjunction-feature dichotomy is due to the noisy independent processing of features in the human visual system.
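The quantitative claim (that noisy, independent feature channels combined linearly reproduce the larger set-size effect for conjunctions without any serial stage) can be illustrated with a small signal-detection simulation. In the sketch below, accuracy is defined as picking the target among the display elements; the per-dimension separation `d`, the equal split of distractor types, and the localization (rather than yes/no) task are simplifying assumptions, not the paper's exact design.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2.5            # per-dimension target-distractor separation (arbitrary)
n_trials = 20000

def localization_accuracy(set_size, conjunction):
    """P(picking the target) when the most target-like element is chosen.
    Each dimension is an independent noisy channel; conjunction decisions sum
    the two channels linearly, with no serial attention stage anywhere."""
    if conjunction:
        # each distractor shares exactly one target feature (e.g. red/horizontal vs. green/vertical)
        means = np.zeros((set_size, 2))
        means[0] = (d, d)                              # target carries both features
        means[1:, 0][::2] = d                          # half the distractors share feature 1
        means[1:, 1][1::2] = d                         # the other half share feature 2
        samples = means + rng.normal(size=(n_trials, set_size, 2))
        decision = samples.sum(axis=2)                 # linear combination across dimensions
    else:
        means = np.zeros(set_size)
        means[0] = d                                   # target differs on a single feature
        decision = means + rng.normal(size=(n_trials, set_size))
    return np.mean(decision.argmax(axis=1) == 0)

for n in (4, 8, 16):
    print(n, round(localization_accuracy(n, False), 3), round(localization_accuracy(n, True), 3))
```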

20.
We present three experiments to identify the specific information sources that skilled participants use to make recognition judgements when presented with dynamic, structured stimuli. A group of less skilled participants acted as controls. In all experiments, participants were presented with filmed stimuli containing structured action sequences. In a subsequent recognition phase, participants were presented with new and previously seen stimuli and were required to make judgements as to whether or not each sequence had been presented earlier (or were edited versions of earlier sequences). In Experiment 1, skilled participants demonstrated superior sensitivity in recognition when viewing dynamic clips compared with static images and clips where the frames were presented in a nonsequential, randomized manner, implicating the importance of motion information when identifying familiar or unfamiliar sequences. In Experiment 2, we presented normal and mirror-reversed sequences in order to distort access to absolute motion information. Skilled participants demonstrated superior recognition sensitivity, but no significant differences were observed across viewing conditions, leading to the suggestion that skilled participants are more likely to extract relative rather than absolute motion when making such judgements. In Experiment 3, we manipulated relative motion information by occluding several display features for the duration of each film sequence. A significant decrement in performance was reported when centrally located features were occluded compared to those located in more peripheral positions. Findings indicate that skilled participants are particularly sensitive to relative motion information when attempting to identify familiarity in dynamic, visual displays involving interaction between numerous features.
