Binding Multiple Threads to Multiple Cores

Date: 2010-11-02  Source: renjiangyong

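The era of multicore, heterogeneous computing has arrived. How do we exploit the power of multicore and heterogeneous platforms? We need extensions such as MPI and OpenMP to use the CPU cores, and CUDA or OpenCL to use accelerator cards. If we only use the CPU cores, MPI alone meets our needs; if we only use the accelerator card, CUDA or OpenCL alone is enough.
If each process only needs a single core alongside the accelerator, MPI+CUDA (or OpenCL) does the job. To exploit the multiple cores and the accelerator card at the same time, we need MPI+OpenMP+CUDA, MPI+pthread+CUDA, or MPI+OpenCL.
Let us look at the MPI+pthread approach. Here, within one multicore CPU, we cannot rely on multiple processes to use the whole CPU. Instead, the main thread uses the GPU together with one core, and the remaining cores of the CPU do their computation in additional threads (see http://tech.it168.com/a2010/0723/1081/000001081479_all.shtml). In our multithreaded tests we found that binding threads to cores improves program efficiency (see http://www.ibm.com/developerworks/cn/linux/l-cn-optimization/index.html). So, following the first article on binding processes to CPUs, here is this article on binding threads to CPUs. If you want to use your accelerator card, add that work to the main thread.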

The code is adapted from http://www.chinaunix.net/jh/4/904906.html
For details on using pthreads, see: https://computing.llnl.gov/tutorials/pthreads/#PthreadsAPI
For the CPU affinity API, see: http://www.kernel.org/doc/man-pages/online/pages/man3/CPU_SET.3.html
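
As a quick orientation on the CPU_SET(3) macros referenced above, here is a minimal sketch of how a core set is built and inspected. The helper names single_core_mask and print_mask are our own, chosen for illustration; they are not part of glibc. The sketch assumes Linux with glibc, where _GNU_SOURCE must be defined before any include.

```c
#define _GNU_SOURCE   /* must precede all includes to expose the CPU_* macros */
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Build a set that contains exactly one core.
   (single_core_mask is a hypothetical helper name.) */
static cpu_set_t single_core_mask(int core)
{
    cpu_set_t mask;
    assert(core >= 0 && core < CPU_SETSIZE);
    CPU_ZERO(&mask);       /* start from an empty set */
    CPU_SET(core, &mask);  /* add the requested core */
    return mask;
}

/* Print every core index present in the set and return how many there are.
   (print_mask is a hypothetical helper name.) */
static int print_mask(const cpu_set_t *mask)
{
    for (int i = 0; i < CPU_SETSIZE; i++)
    {
        if (CPU_ISSET(i, mask))
        {
            printf("core %d is in the set\n", i);
        }
    }
    return CPU_COUNT(mask);
}
```

A set built this way is what sched_setaffinity expects as its third argument; the set itself does nothing until it is passed to such a call.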

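#define _GNU_SOURCE // must come before every include so that <sched.h> exposes cpu_set_t and the CPU_* macros
#include<stdlib.h>
#include<stdio.h>
#include<sys/types.h>
#include<sys/sysinfo.h>
#include<unistd.h>
#include<pthread.h> // pthread library

#include<sched.h>
#include<ctype.h>
#include<string.h>
#define THREAD_MAX_NUM 100 // maximum number of threads this program creates

int num = 0; // number of cores in the CPU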
void* threadFun(void* arg) // arg carries the thread index (user-defined)
{
    cpu_set_t mask; // set of cores the thread should be bound to
    cpu_set_t get;  // set of cores the thread is currently bound to
    int *a = (int *)arg;
    printf("the a is:%d\n",*a); // show which thread this is
    CPU_ZERO(&mask);    // clear the set
    CPU_SET(*a,&mask);   // add core *a to the set
    if (sched_setaffinity(0, sizeof(mask), &mask) == -1) // set CPU affinity; 0 means the calling thread
    {
       printf("warning: could not set CPU affinity, continuing...\n");
    }
    while (1)
    {
       CPU_ZERO(&get);
       if (sched_getaffinity(0, sizeof(get), &get) == -1) // read back the thread's CPU affinity
       {
            printf("warning: could not get thread affinity, continuing...\n");
       }
       for (int i = 0; i < num; i++)
       {
           if (CPU_ISSET(i, &get)) // check which core the thread has affinity with
           {
                printf("thread %d is running on processor : %d\n", *a, i);
           }
       }
    }
    return NULL;
}
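int main(int argc, char* argv[])
{
    num = sysconf(_SC_NPROCESSORS_CONF); // get the number of configured cores
    pthread_t thread[THREAD_MAX_NUM];
    printf("system has %i processor(s). \n", num);
    if (num > THREAD_MAX_NUM)
    {
        num = THREAD_MAX_NUM; // never index past the end of the arrays
    }
    int tid[THREAD_MAX_NUM];
    for(int i=0;i<num;i++)
    {
        tid[i] = i; // each thread needs its own tid[i] because it keeps the pointer
        if (pthread_create(&thread[i],NULL,threadFun,(void*)&tid[i]) != 0)
        {
            printf("error: could not create thread %d\n", i);
            return 1;
        }
    }
    for(int i=0; i< num; i++)
    {
        pthread_join(thread[i],NULL); // wait for all threads; they loop forever, so stop with CTRL+C
    }
    return 0;
}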
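
The listing above binds each thread with sched_setaffinity and pid 0. As an alternative, glibc also offers pthread_setaffinity_np, which takes a pthread_t directly. The following is only a sketch of that variant, not the method of the original code: the names pin_self_to_core, first_allowed_core, worker_arg, worker and run_pinned_worker are hypothetical, and the worker exits after one check instead of looping forever. It assumes Linux with glibc and compilation with -pthread.

```c
#define _GNU_SOURCE   /* exposes pthread_setaffinity_np and the CPU_* macros */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to one core. Returns 0 on success,
   otherwise an error number (pthread functions do not set errno). */
static int pin_self_to_core(int core)
{
    cpu_set_t mask;
    assert(core >= 0 && core < CPU_SETSIZE);
    CPU_ZERO(&mask);
    CPU_SET(core, &mask);
    return pthread_setaffinity_np(pthread_self(), sizeof(mask), &mask);
}

/* Return the lowest core the calling thread may run on, or -1 on error.
   Useful when the process itself is restricted to a subset of cores. */
static int first_allowed_core(void)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (pthread_getaffinity_np(pthread_self(), sizeof(mask), &mask) != 0)
        return -1;
    for (int i = 0; i < CPU_SETSIZE; i++)
        if (CPU_ISSET(i, &mask))
            return i;
    return -1;
}

struct worker_arg { int core; int pinned_ok; };

/* Worker: pin itself, then read the affinity back to confirm that
   exactly the requested core is in the set. */
static void *worker(void *p)
{
    struct worker_arg *w = (struct worker_arg *)p;
    cpu_set_t now;
    CPU_ZERO(&now);
    w->pinned_ok = pin_self_to_core(w->core) == 0
        && pthread_getaffinity_np(pthread_self(), sizeof(now), &now) == 0
        && CPU_COUNT(&now) == 1
        && CPU_ISSET(w->core, &now);
    return NULL;
}

/* Start one worker bound to `core` and wait for it.
   Returns 1 if the binding took effect, 0 otherwise. */
static int run_pinned_worker(int core)
{
    struct worker_arg w = { core, 0 };
    pthread_t t;
    if (pthread_create(&t, NULL, worker, &w) != 0)
        return 0;
    pthread_join(t, NULL);
    printf("worker on core %d: %s\n", core, w.pinned_ok ? "pinned" : "not pinned");
    return w.pinned_ok;
}
```

Calling run_pinned_worker(first_allowed_core()) from main exercises the whole path. Reading the affinity back after setting it is a cheap way to confirm the binding, because the set call can fail when the process is confined to a subset of cores.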

To inspect the threads, run top -H in another window while the program is running.

To see the load on each core, run top and press the "1" key.
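
Besides watching top, a thread can also ask which core it is executing on at the moment. The sketch below uses sched_getcpu from glibc for that; current_core and report_core are hypothetical helper names, and the code assumes Linux with _GNU_SOURCE defined before the includes.

```c
#define _GNU_SOURCE   /* exposes sched_getcpu */
#include <assert.h>
#include <sched.h>
#include <stdio.h>

/* Return the core the calling thread is executing on right now,
   or -1 on error. Without a binding this value can change at any time. */
static int current_core(void)
{
    return sched_getcpu();
}

/* Format a one-line report into buf and return the core number.
   (report_core is a hypothetical helper name.) */
static int report_core(const char *label, char *buf, size_t len)
{
    int core = current_core();
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s is running on core %d", label, core);
    return core;
}
```

Called from inside a bound thread, the reported core should match the core the thread was bound to; called from an unbound thread, it only shows where the scheduler happened to place the thread at that instant.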