|
Page 1 of 1 |
|
Posted: Fri, 28th Nov 2014 18:13 Post subject: Generating some normal distributed random numbers in c++ |
|
 |
Hallo.
I am solving some equations and need a lot of normal distributed random numbers. They don't have to be super duper random since the equations are supposed to be mostly driven by a deterministic flow. Writing in c++ I thought std::normal_distribution might be a nice try. I seem to be making a error when using it since it is very slow. Heres is my code.
Code: | #include <stdlib.h>
#include <iostream>
#include <time.h>
#include <random>
//
using namespace std;
int main()
{
minstd_rand0 eng(time(0));
normal_distribution<double> normal(0, 1);
int N=1024;
long long int nvol=N*N;
double *q1 = new double[nvol+2*N];
double *q2 = new double[nvol+2*N];
//
int i;
int j;
long starttime=time(0);
for(int k=0; k<1000; k++){
for(i=0; i<nvol; i++)
{
q1[i]=normal(eng);
q2[i]=normal(eng);
}
if(k%100==0){
cout << time(0)-starttime << endl;
}
}
return 0;
}
|
It takes approximately 40 sec for 100 k when I run this code. To compare matlab on my laptop seems to be able to generate 1000 k in under 40 sec. How do I reach these speeds?
|
|
Back to top |
|
 |
[mrt]
[Admin] Code Monkey
Posts: 1342
|
Posted: Fri, 28th Nov 2014 19:50 Post subject: |
|
 |
Hi,
Well let me ask you the obvious question. Are you running the Release build or the Debug build? I'm asking because my i5-2500 generates 100k values in 9 seconds on a single core.
Here is the output:
Code: |
0
9
18
28
37
46
56
65
74
83
Press any key to continue . . .
|
Another thing you could do is see if you can distribute the calculation to multiple cores and considerably speed it up, or even further use SIMD instructions.
Good luck!
teey
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 19:58 Post subject: |
|
 |
In C#, if I use a manual way to calculate the normal distribution for 10 k it required 36 seconds (Laptop with mobile i5)
Code: |
Stopwatch watch = new Stopwatch(); ;
List<Double> rndm = new List<Double>();
watch.Start();
for (int k = 0; k< 100; k++){
for (int i = 0; i < 1024*1024; i++)
{
Random rand = new Random();
double x = rand.NextDouble(), y = rand.NextDouble();
rndm.Add(Math.Sqrt(-2.0 * Math.Log(x)) * Math.Sin(2.0 * Math.PI * y));
}
Console.WriteLine(k+": "+watch.ElapsedMilliseconds);
}
Console.ReadLine();
|
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 20:29 Post subject: |
|
 |
[mrt] wrote: | Hi,
Well let me ask you the obvious question. Are you running the Release build or the Debug build? I'm asking because my i5-2500 generates 100k values in 9 seconds on a single core.
Here is the output:
Code: |
0
9
18
28
37
46
56
65
74
83
Press any key to continue . . .
|
Another thing you could do is see if you can distribute the calculation to multiple cores and considerably speed it up, or even further use SIMD instructions.
Good luck! |
I am sort of new to all this since until recently I managed with matlab so bare with me. g++ -v gives me.
gcc version 4.9.1 20140922 (Red Hat 4.9.1-10) (GCC)
I am not sure if that is what your asking about? Is there any tricks I should know about when compiling?
About parallelizing: A stack exchange post told me that generating random numbers should in principle be quite fast such that it shouldn't be necessary for only 2 mill numbers. Further more I couldn't figure out how to do it with openmp :/.
PumpAction wrote: | In C#, if I use a manual way to calculate the normal distribution for 10 k it required 36 seconds (Laptop with mobile i5)
Code: |
Stopwatch watch = new Stopwatch(); ;
List<Double> rndm = new List<Double>();
watch.Start();
for (int k = 0; k< 100; k++){
for (int i = 0; i < 1024*1024; i++)
{
Random rand = new Random();
double x = rand.NextDouble(), y = rand.NextDouble();
rndm.Add(Math.Sqrt(-2.0 * Math.Log(x)) * Math.Sin(2.0 * Math.PI * y));
}
Console.WriteLine(k+": "+watch.ElapsedMilliseconds);
}
Console.ReadLine();
|
|
So that is in the wrong direction?
Edit: Windows Task Manager is reporting Matlab to be using ~30 in the cpu tab on a i7-3520M. I suppose that is 1 core and still it is quite a lot faster than even mrt. hmm
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 20:40 Post subject: |
|
 |
2 million numbers?
100 * 1.024*1.024 = 100 * 1.048.576 = 104.857.600
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 20:43 Post subject: |
|
 |
PumpAction wrote: | 2 million numbers?
100 * 1.024*1.024 = 100 * 1.048.576 = 104.857.600 |
Sorry. It' s is because k is a marching time parameter. After each k i do stuff with some different vectors and the random numbers, and then have to generate some new. So per time step I need 2*1024^2 random numbers.
Last edited by Atropa on Fri, 28th Nov 2014 20:44; edited 1 time in total
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 20:43 Post subject: |
|
 |
@Atropa
If you are compiling from shell use -O3 optimization flag.

Last edited by WhiteBarbarian on Fri, 28th Nov 2014 20:47; edited 1 time in total
|
|
Back to top |
|
 |
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 21:00 Post subject: |
|
 |
The rest is assembler magic 
|
|
Back to top |
|
 |
|
Posted: Fri, 28th Nov 2014 21:13 Post subject: |
|
 |
Matlab is really a high level all purpose language. Isn't a bit weird it is so much faster?
|
|
Back to top |
|
 |
[mrt]
[Admin] Code Monkey
Posts: 1342
|
|
Back to top |
|
 |
|
|
Back to top |
|
 |
|
Posted: Sun, 30th Nov 2014 17:52 Post subject: |
|
 |
I pulled my random generator from my old raytracer project, back in the day I used it for generating all kinds of sampling data for rendering. I don't remember how good it is or are there any limitations, so you should probably check the random data before going forward with it. It generates floats and not doubles like your example but you said precision wasn't important and you can convert to double later.
It takes under 2 seconds to generate 1000 k. If your CPU has AVX or better extensions you can update the code and probably make it run 2-4x faster (theoretically).
This works on MSVC, will probably not work on other compilers as I assume they have differently named intrinsic functions for dealing with vector instructions.
Code: |
#include "stdafx.h"
#include <stdlib.h>
#include <iostream>
#include <time.h>
#include <random>
//
using namespace std;
__declspec(align(16)) __m128i CurrentSeed;
#define RAND_MAX_SSE 2147483647
void RTSRand_SSE(unsigned int Seed)
{
CurrentSeed = _mm_set_epi32(Seed, Seed + 1, Seed, Seed + 1);
}
// Returns a set of 4 random float numbers between 0.0 and 1.0
void RTRand_SSE(__m128& Result)
{
__declspec(align(16)) __m128i cur_seed_split;
__declspec(align(16)) __m128i multiplier;
__declspec(align(16)) __m128i adder;
__declspec(align(16)) __m128i mod_mask;
__declspec(align(16)) static const unsigned int mult[4] = { 214013, 17405, 214013, 69069 };
__declspec(align(16)) static const unsigned int gadd[4] = { 2531011, 10395331, 13737667, 1 };
__declspec(align(16)) static const unsigned int mask[4] = { 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF, 0x7FFFFFFF };
adder = _mm_load_si128((__m128i*) gadd);
multiplier = _mm_load_si128((__m128i*) mult);
mod_mask = _mm_load_si128((__m128i*) mask);
cur_seed_split = _mm_shuffle_epi32(CurrentSeed, _MM_SHUFFLE(2, 3, 0, 1));
CurrentSeed = _mm_mul_epu32(CurrentSeed, multiplier);
multiplier = _mm_shuffle_epi32(multiplier, _MM_SHUFFLE(2, 3, 0, 1));
cur_seed_split = _mm_mul_epu32(cur_seed_split, multiplier);
cur_seed_split = _mm_and_si128(cur_seed_split, mod_mask);
cur_seed_split = _mm_shuffle_epi32(cur_seed_split, _MM_SHUFFLE(2, 3, 0, 1));
CurrentSeed = _mm_or_si128(CurrentSeed, cur_seed_split);
CurrentSeed = _mm_add_epi32(CurrentSeed, adder);
CurrentSeed = _mm_and_si128(CurrentSeed, mod_mask);
Result = _mm_div_ps(_mm_cvtepi32_ps(CurrentSeed), _mm_set_ps1((float)RAND_MAX_SSE));
}
int _tmain(int argc, _TCHAR* argv[])
{
RTSRand_SSE(345662);
int N = 1024;
long long int nvol = N*N;
float *q1 = (float*)_aligned_malloc(sizeof(float) * nvol + 2 * N, 16);
float *q2 = (float*)_aligned_malloc(sizeof(float) * nvol + 2 * N, 16);
//
int i;
int j;
long starttime = time(0);
for (int k = 0; k < 1000; k++){
for (i = 0; i < nvol; i += 4)
{
__m128 output;
RTRand_SSE(output);
_mm_store_ps(&q1[i], output);
RTRand_SSE(output);
_mm_store_ps(&q2[i], output);
}
if (k % 100 == 0){
cout << time(0) - starttime << endl;
}
}
return 0;
}
|
|
|
Back to top |
|
 |
|
Posted: Sun, 30th Nov 2014 18:02 Post subject: |
|
 |
(Didn't see you needed normal distribution, mine isn't it, sorry. You could apply a transform on top of the generated data, although you're on your own for that one.)
|
|
Back to top |
|
 |
[mrt]
[Admin] Code Monkey
Posts: 1342
|
|
Back to top |
|
 |
|
|
Back to top |
|
 |
|
Posted: Fri, 12th Dec 2014 23:12 Post subject: |
|
 |
Anybody on Yosemite who uses Brew are using Clang indirectly.
|
|
Back to top |
|
 |
LeoNatan
☢ NFOHump Despot ☢
Posts: 73238
Location: Ramat HaSharon, Israel 🇮🇱
|
Posted: Sat, 13th Dec 2014 00:15 Post subject: |
|
 |
tainted4ever wrote: | No one here uses Clang?  |

|
|
Back to top |
|
 |
|
Posted: Sun, 14th Dec 2014 09:51 Post subject: |
|
 |
|
|
Back to top |
|
 |
[mrt]
[Admin] Code Monkey
Posts: 1342
|
|
Back to top |
|
 |
Page 1 of 1 |
All times are GMT + 1 Hour |
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB 2.0.8 © 2001, 2002 phpBB Group
|
|
 |
|