摘要:
Various approaches enable a user to capture image information (e.g., still images or video) about an object of interest such as the sole of a shoe or other piece of footwear (e.g., a sandal) and receive information about items that are determined to match footwear based at least in part on the image information. For example, an image analyze service or other similar service can analyze the images to determine a type of shoe included within the images based at least in part on patterns of other distinguishing features of the sole of the shoe. The image analysis service can aggregate the results and can provide information about the results as a set of matches or results to be displayed to a user in response to a visual search query. The information can include, for example, descriptions, contact information, availability, location data, pricing information, and other such information.
摘要:
Systems and approaches for searching a content collection corresponding to query content are provided. In particular, false positive match rates between the query content and the content collection may be reduced with a minimum content region test and/or a minimum features per scale test. For example, by correlating content descriptors of a content piece in the content collection with query descriptors of the query content, the content piece can be determined to match the query content when a particular region of the content piece and/or a particular region of a query descriptor have a proportionate size meeting or exceeding a specified minimum. Alternatively, or in addition, the false positive match rate between query content and a content piece can be reduced by comparing content descriptors and query descriptors of features at a plurality of scales. A content piece can be determined to match the query content according to descriptor proportion quotas for the plurality of scales.
摘要:
Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined to a single result, wherein the single result is detected text.
摘要:
Tasks such as object classification from image data can take advantage of a deep learning process using convolutional neural networks. These networks can include a convolutional layer followed by an activation layer, or activation unit, among other potential layers. Improved accuracy can be obtained by using a generalized linear unit (GLU) as an activation unit in such a network, where a GLU is linear for both positive and negative inputs, and is defined by a positive slope, a negative slope, and a bias. These parameters can be learned for each channel or a block of channels, and stacking those types of activation units can further improve accuracy.
摘要:
Tasks such as object classification from image data can take advantage of a deep learning process using convolutional neural networks. These networks can include a convolutional layer followed by an activation layer, or activation unit, among other potential layers. Improved accuracy can be obtained by using a generalized linear unit (GLU) as an activation unit in such a network, where a GLU is linear for both positive and negative inputs, and is defined by a positive slope, a negative slope, and a bias. These parameters can be learned for each channel or a block of channels, and stacking those types of activation units can further improve accuracy.
摘要:
A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.
摘要:
Present invention relates to a method and system for automatic searching for information on a network in response to an image query sent by a user. The image query includes an image that is captured by using a mobile communications device with a camera. The image is processed to detect the text present in it. The detected text is then recognized using an OCR. Subsequently, the text is searched for matches in the corresponding domain database, selected from the various domain databases present in the network. Thereafter, selected matches and additional related information is sent to the user.
摘要:
Various approaches enable a user to capture image information (e.g., still images or video) about an object of interest such as the sole of a shoe or other piece of footwear (e.g., a sandal) and receive information about items that are determined to match footwear based at least in part on the image information. For example, an image analyze service or other similar service can analyze the images to determine a type of shoe included within the images based at least in part on patterns of other distinguishing features of the sole of the shoe. The image analysis service can aggregate the results and can provide information about the results as a set of matches or results to be displayed to a user in response to a visual search query. The information can include, for example, descriptions, contact information, availability, location data, pricing information, and other such information.
摘要:
A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.
摘要:
A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.