五月婷中文字幕,久久精品国产亚州麻豆,自拍亚洲精品探花久

前言

cuda對(duì)圖像處理過程進(jìn)行加速是很常見的操作，而且圖像處理算法中sobel算子又是最為簡單的算法，本篇博客就是使用cuda opencv c++實(shí)現(xiàn)對(duì)圖像進(jìn)行sobel邊緣提取加速操作，cpu和gpu處理速度進(jìn)行一個(gè)對(duì)比試驗(yàn)

具體流程

首先貼上代碼

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cuda.h>
#include <device_functions.h>
#include <opencv2\opencv.hpp>
#include <iostream>
using namespace std;
using namespace cv;

//Sobel算子邊緣檢測(cè)核函數(shù)
__global__ void sobelInCuda(unsigned char* dataIn, unsigned char* dataOut, int imgHeight, int imgWidth)
{
    int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
    int yIndex = threadIdx.y + blockIdx.y * blockDim.y;
    int index = yIndex * imgWidth + xIndex;
    //printf("xIndex : %d,  yIndex: %d\n", xIndex, yIndex);
    //printf("index : %d\n", index);
    int Gx = 0;
    int Gy = 0;

    if (xIndex > 0 && xIndex < imgWidth - 1 && yIndex > 0 && yIndex < imgHeight - 1)
    {
        Gx = dataIn[(yIndex - 1) * imgWidth + xIndex + 1] + 2 * dataIn[yIndex * imgWidth + xIndex + 1] + dataIn[(yIndex + 1) * imgWidth + xIndex + 1]
            - (dataIn[(yIndex - 1) * imgWidth + xIndex - 1] + 2 * dataIn[yIndex * imgWidth + xIndex - 1] + dataIn[(yIndex + 1) * imgWidth + xIndex - 1]);
        Gy = dataIn[(yIndex - 1) * imgWidth + xIndex - 1] + 2 * dataIn[(yIndex - 1) * imgWidth + xIndex] + dataIn[(yIndex - 1) * imgWidth + xIndex + 1]
            - (dataIn[(yIndex + 1) * imgWidth + xIndex - 1] + 2 * dataIn[(yIndex + 1) * imgWidth + xIndex] + dataIn[(yIndex + 1) * imgWidth + xIndex + 1]);
        dataOut[index] = (abs(Gx) + abs(Gy)) / 2;
    }
}

//Sobel算子邊緣檢測(cè)CPU函數(shù)
void sobel(Mat srcImg, Mat dstImg, int imgHeight, int imgWidth)
{
    int Gx = 0;
    int Gy = 0;
    for (int i = 1; i < imgHeight - 1; i++)
    {
        uchar* dataUp = srcImg.ptr<uchar>(i - 1);
        uchar* data = srcImg.ptr<uchar>(i);
        uchar* dataDown = srcImg.ptr<uchar>(i + 1);
        uchar* out = dstImg.ptr<uchar>(i);
        for (int j = 1; j < imgWidth - 1; j++)
        {
            Gx = (dataUp[j + 1] + 2 * data[j + 1] + dataDown[j + 1]) - (dataUp[j - 1] + 2 * data[j - 1] + dataDown[j - 1]);
            Gy = (dataUp[j - 1] + 2 * dataUp[j] + dataUp[j + 1]) - (dataDown[j - 1] + 2 * dataDown[j] + dataDown[j + 1]);
            out[j] = (abs(Gx) + abs(Gy)) / 2;
        }
    }
}

int main()
{
    Mat grayImg = imread("lena.jpg", 0);

    int imgHeight = grayImg.rows;
    int imgWidth = grayImg.cols;

    Mat gaussImg;
    //高斯濾波
    GaussianBlur(grayImg, gaussImg, Size(3, 3), 0, 0, BORDER_DEFAULT);

    //Sobel算子CPU實(shí)現(xiàn)
    Mat dst(imgHeight, imgWidth, CV_8UC1, Scalar(0));
    double time1 = static_cast<double>(cv::getTickCount());
    sobel(gaussImg, dst, imgHeight, imgWidth);
    double time2 = static_cast<double>(cv::getTickCount());
    std::cout << "cpu Time use: " << 1000 * (time2 - time1) / cv::getTickFrequency() << "ms" << std::endl;//輸出運(yùn)行時(shí)間
    cv::imwrite("dst_cpu.jpg", dst);
    //CUDA實(shí)現(xiàn)后的傳回的圖像
    Mat dstImg(imgHeight, imgWidth, CV_8UC1, Scalar(0));

    //創(chuàng)建GPU內(nèi)存
    unsigned char* d_in;
    unsigned char* d_out;

    cudaMalloc((void**)&d_in, imgHeight * imgWidth * sizeof(unsigned char));
    cudaMalloc((void**)&d_out, imgHeight * imgWidth * sizeof(unsigned char));

    //將高斯濾波后的圖像從CPU傳入GPU
    cudaMemcpy(d_in, gaussImg.data, imgHeight * imgWidth * sizeof(unsigned char), cudaMemcpyHostToDevice);

    dim3 threadsPerBlock(32, 32);
    dim3 blocksPerGrid((imgWidth + threadsPerBlock.x - 1) / threadsPerBlock.x, (imgHeight + threadsPerBlock.y - 1) / threadsPerBlock.y);
       
    //記錄起始時(shí)間 
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start, 0);
    //調(diào)用核函數(shù)

    sobelInCuda << <blocksPerGrid, threadsPerBlock >> > (d_in, d_out, imgHeight, imgWidth);  

    //將圖像傳回GPU
    cudaMemcpy(dstImg.data, d_out, imgHeight * imgWidth * sizeof(unsigned char), cudaMemcpyDeviceToHost);

    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float elapsedTime;
    cudaEventElapsedTime(&elapsedTime, start, stop);
    printf("gpu time cost: %3.5f ms", elapsedTime);
    cv::imwrite("dst_gpu.jpg", dstImg);
    //釋放GPU內(nèi)存
    cudaFree(d_in);
    cudaFree(d_out);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
   
    return 0;
}

當(dāng)然，這里的代碼是參考別人。下面著重講講對(duì)這部分代碼的理解吧
首先，Sobel算子：是離散微分算子（discrete differentiation operator），用來計(jì)算圖像灰度的近似梯度，梯度越大越有可能是邊緣。sobel邊緣提取就是使用sobel算子對(duì)圖像進(jìn)行卷積操作，這里sobel算子選用的是一個(gè)3*3的矩陣

sobel算子.jpg

sobel算子cpu的實(shí)現(xiàn)，這樣沒有太多可說的，基本上就是對(duì)一個(gè)二維矩陣進(jìn)行矩陣乘積操作，然后再將矩陣乘積結(jié)果回填到目標(biāo)像素位置

而sobel算子的gpu實(shí)現(xiàn)，則相對(duì)來說比較復(fù)雜一些。首先創(chuàng)建了block和grid的尺寸，即代碼中的threadsPerBlock和blocksPerGrid，使用cudaMalloc分配圖像尺寸大小的內(nèi)存空間，用來作為核函數(shù)的輸入變量和輸出變量，最后調(diào)用核函數(shù)完成并行操作，即sobel邊緣提取

再看看核函數(shù)內(nèi)部的實(shí)現(xiàn)

    int xIndex = threadIdx.x + blockIdx.x * blockDim.x;
    int yIndex = threadIdx.y + blockIdx.y * blockDim.y;

這兩行代碼主要是獲取線程塊內(nèi)的坐標(biāo)位置，這個(gè)位置和圖像矩陣位置是一一對(duì)應(yīng)的

int index = yIndex * imgWidth + xIndex;

而這行代碼，則是實(shí)現(xiàn)圖像矩陣坐標(biāo)和data_in， data_out這兩個(gè)指針里面存儲(chǔ)的一維數(shù)組的映射關(guān)系，(xIndex, yIndex)該坐標(biāo)像素值即為每次線程運(yùn)行時(shí)，做sobel運(yùn)算時(shí)的中心坐標(biāo)，而實(shí)際求Gx和Gy時(shí)，還需要獲取該中心坐標(biāo)周圍的8個(gè)坐標(biāo)值來進(jìn)行運(yùn)算，所以是這樣進(jìn)行書寫的代碼

 Gx = dataIn[(yIndex - 1) * imgWidth + xIndex + 1] + 2 * dataIn[yIndex * imgWidth + xIndex + 1] + dataIn[(yIndex + 1) * imgWidth + xIndex + 1]
            - (dataIn[(yIndex - 1) * imgWidth + xIndex - 1] + 2 * dataIn[yIndex * imgWidth + xIndex - 1] + dataIn[(yIndex + 1) * imgWidth + xIndex - 1]);
        Gy = dataIn[(yIndex - 1) * imgWidth + xIndex - 1] + 2 * dataIn[(yIndex - 1) * imgWidth + xIndex] + dataIn[(yIndex - 1) * imgWidth + xIndex + 1]
            - (dataIn[(yIndex + 1) * imgWidth + xIndex - 1] + 2 * dataIn[(yIndex + 1) * imgWidth + xIndex] + dataIn[(yIndex + 1) * imgWidth + xIndex + 1]);

運(yùn)算的結(jié)果存回到dataOut

整個(gè)測(cè)試中輸入圖片是女神lena

lena.jpg

cpu和gpu輸出的結(jié)果為

dst_cpu.jpg

dst_gpu.jpg

可以看出，gpu和cpu的結(jié)果圖是一模一樣的

cpu和gpu的時(shí)間測(cè)試

上一部分?jǐn)⑹隽薱pu和gpu版本的sobel算子邊緣提取，現(xiàn)在來測(cè)試一下兩個(gè)版本算法的運(yùn)行時(shí)間

cpu的版本使用的時(shí)間測(cè)試工具為opencv中自帶的計(jì)時(shí)工具，我一直在使用，比較靠譜。測(cè)試方式也比較簡單，就是在運(yùn)行sobel函數(shù)前后加上 static_cast<double>(cv::getTickCount())函數(shù)，然后前后兩次count數(shù)相減再除于count的頻次就可以了，這里我將count乘以1000，這樣測(cè)試得到的時(shí)間基本單位就是ms

gpu版本使用的時(shí)間測(cè)試工具是cuda自帶的工具cudaEventElapsedTime，在計(jì)算時(shí)延之前還需要進(jìn)行cudaEventRecord操作，避免沒有g(shù)pu線程沒有完全執(zhí)行結(jié)束

最終測(cè)試的結(jié)果如下：

cpu和gpu計(jì)時(shí).jpg

可以看出，對(duì)于同一個(gè)輸入圖片，gpu的運(yùn)行時(shí)間是cpu大概1/5的樣子

這里我對(duì)gpu進(jìn)行計(jì)時(shí)的時(shí)候?qū)D像從gpu傳到cpu的操作也進(jìn)行了計(jì)時(shí)，如果將這部分除去，gpu的時(shí)間會(huì)更短

cpu與gpu計(jì)時(shí)2.jpg

可以看出，這次sobel算子邊緣提取，真正在gpu上進(jìn)行運(yùn)算的時(shí)間僅為0.02ms，絕大部分時(shí)間是將數(shù)據(jù)從gpu拷貝到cpu的時(shí)間，這個(gè)時(shí)間會(huì)占據(jù)整個(gè)流程的85%左右

總結(jié)

整個(gè)探索過程還是比較有意義，既熟悉了sobel算子的原理，又熟悉了gpu并行運(yùn)算的整體流程，還做cpu與gpu運(yùn)算版本時(shí)間對(duì)比。

參考博客

譚升的博客
 CUDA精進(jìn)之路（四）：圖像處理——Sobel算子邊緣檢測(cè)

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av

CUDA opencv sobel算子加速

CUDA opencv sobel算子加速

前言

具體流程

cpu和gpu的時(shí)間測(cè)試

總結(jié)

參考博客

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九 欧美,1769亚洲,黄色成人av

CUDA opencv sobel算子加速

前言

具體流程

cpu和gpu的時(shí)間測(cè)試

總結(jié)

參考博客

相關(guān)閱讀更多精彩內(nèi)容

友情鏈接更多精彩內(nèi)容

色偷偷精品伊人,欧洲久久精品,欧美综合婷婷骚逼,国产AV主播,国产最新探花在线,九色在线视频一区,伊人大交九欧美,1769亚洲,黄色成人av