Abstract:
A multiple bitrate (MBR) video encoding management tool utilizes available processing units for parallel MBR video encoding. For example, instead of focusing only on multi-threading of encoding tasks for a single picture or group of pictures (GOP), the management tool parallelizes the encoding of multiple GOPs between different processing units and/or different computing systems. With this parallel MBR video encoding architecture, different GOPs can be encoded in parallel. To facilitate such parallel encoding, data dependencies between GOPs are removed. The management tool can adjust the number of GOPs to encode in parallel on a computing system so as to favor parallelism of encoding for different GOPs at the expense of parallelism of encoding inside a GOP, or vice versa, and thereby set a suitable balance between encoding latency and throughput.
Abstract:
Image processing in mobile devices is optimized by combining at least two of the color conversion, rotation, and scaling operations. Received images, such as still images or frames of video stream, are subjected to a combined transformation after decoding, where each pixel is color converted (e.g. from YUV to RGB), rotated, and scaled as needed. By combining two or three of the processes into one, read/write operations consuming significant processing and memory resources are reduced enabling processing of higher resolution images and/or power and processing resource savings.
Abstract:
A semantic object tracking method tracks general semantic objects with multiple non-rigid motion, disconnected components and multiple colors throughout a vector image sequence. The method accurately tracks these general semantic objects by spatially segmenting image regions from a current frame and then classifying these regions as to which semantic object they originated from in the previous frame. To classify each region, the method performs a region based motion estimation between each spatially segmented region and the previous frame to compute the position of a predicted region in the previous frame. The method then classifies each region in the current frame as being part of a semantic object based on which semantic object in the previous frame contains the most overlapping points of the predicted region. Using this method, each region in the current image is tracked to one semantic object from the previous frame, with no gaps or overlaps. The method propagates few or no errors because it projects regions into a frame where the semantic object boundaries are previously computed rather than trying to project and adjust a boundary in a frame where the object's boundary is unknown.
Abstract:
A human speech detection method detects pure-speech signals in an audio signal containing a mixture of pure-speech and non-speech or mixed-speech signals. The method accurately detects the pure-speech signals by computing a novel Valley Percentage feature from the audio signal and then classifying the audio signals into pure-speech and non-speech (or mixed-speech) classifications. The Valley Percentage is a measurement of the low energy parts of the audio signal (the valley) in comparison to the high energy parts of the audio signal (the mountain). To classify the audio signal, the method performs a threshold decision on the value of the Valley Percentage. Using a binary mask, a high Valley Percentage is classified as pure-speech and a low Valley Percentage is classified as non-speech (or mixed-speech). The method further employs morphological filters to improve the accuracy of human speech detection. Before detection, a morphological closing filter may be employed to eliminate unwanted noise from the audio signal. After detection, a combination of morphological closing and opening filters may be employed to remove aberrant pure-speech and non-speech classifications from the binary mask resulting from impulsive audio signals in order to more accurately detect the boundaries between the pure-speech and non-speech portions of the audio signal. A number of parameters may be employed by the method to further improve the accuracy of human speech detection. For implementation in supervised digital audio signal applications, these parameters may be optimized by training the application a priori. For implementation in an unsupervised environment, adaptive determination of these parameters is also possible.
Abstract:
A sprite generation method used in video coding generates a sprite from the video objects in the frames of a video sequence. The method estimates the motion between a video object in a current frame and a sprite constructed from video objects for previous frames. Specifically, the method computes motion coefficients of a 2D transform that minimizes the intensity errors between pixels in the video object and corresponding pixels inside the sprite. The method uses the motion coefficients from the previous frame as a starting point to minimizing the intensity errors. After estimating the motion parameters for an object in the current frame, the method transforms the video object to the coordinate system of the sprite. The method blends the warped pixels of the video object with the pixels at corresponding positions in the sprite using rounding average such that each video object in the video sequence provides substantially the same contribution to the sprite.
Abstract:
A video compression error signal in a video compression scheme is affected by random and high frequency impulse noise. An error signal suppressor containing two filters is applied to the video compression error signal. The first filter reduces or eliminates random noise. The second filter eliminates high frequency impulse noise. Random and high frequency noise is reduced or eliminated from frequencies that are unimportant to human visual perception. The error signal suppressor reduces the overall video compression bitrate by up to between 10% and 20% which provides corresponding increases in video compression and transmission efficiency. The error signal suppressor is used in video compression encoding schemes such as MPEG to reduce random and high frequency noise.