Bulk data generation











I need to generate nearly 1 billion records of unique integers. I tried with awk, but it is not generating more than 5 million records. Below is what I have tried so far:



awk -v loop=10000000000 -v range=10000000000 'BEGIN{
    srand()
    do {
        numb = 1 + int(rand() * range)
        if (!(numb in prev)) {
            print numb
            prev[numb] = 1
            count++
        }
    } while (count < loop)
}'


But it is not generating more than 599160237 records, and then the process got killed automatically.










awk regular-expression

asked Dec 26 '15 at 14:43 by anurag; edited Nov 21 at 21:34 by Rui F Ribeiro
  • You need to provide us with what you have done so far, so that we might spot why it's not working. Post the code in your question, at least the relevant parts.
    – zagrimsan
    Dec 26 '15 at 14:46










  • awk -v loop=10000000000 -v range=10000000000 'BEGIN{ srand() do { numb = 1 + int(rand() * range) if (!(numb in prev)) { print numb prev[numb] = 1 count++ } } while (count<loop) }' tried with the above, but only 599160237 records were generated and after that process got killed. :(
    – anurag
    Dec 26 '15 at 14:47












  • Please edit your question and put the relevant information there so that people see it right away without having to wade through comments (it's also easier to format text in question).
    – zagrimsan
    Dec 26 '15 at 14:48










  • Python may be a more suitable choice for this task; see this related question: stackoverflow.com/questions/2076838/…
    – Ijaz Ahmad Khan
    Dec 26 '15 at 15:07










  • Unfortunately I don't know Python :(
    – anurag
    Dec 26 '15 at 15:10















4 Answers

















You could use GNU seq + sort: first generate a list of 1B unique integers (in sequential order) with seq, then shuffle them randomly with sort -R.
While this is not CPU-efficient, it is memory-agnostic, as sort will use as much memory as is available and then fall back to temporary files.



This will take several minutes (depending on your machine's CPU/RAM/disk):



$ seq 1000000000 > 1B.txt

$ ls -lhog 1B.txt
-rw-rw-r-- 1 9.3G Dec 26 17:31 1B.txt

$ sort -R 1B.txt > 1B.random.txt


If you have access to a machine with enough RAM you can use GNU shuf:



$ shuf -i 1-1000000000 > 1B.random.txt


Empirically, shuf needed ~8GB of free RAM and ~6 minutes of runtime on my machine.
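
As a hedged variation (assuming GNU coreutils; the file names are illustrative), the two steps can be combined into a single pipeline, and — per zagrimsan's comment below — a range larger than the count can be handled by keeping only the first billion shuffled lines:

# Sketch: shuffle the full 1..10^10 range, keep the first 10^9 lines.
# sort -R spills to temporary files, so plan for substantial free disk space.
seq 10000000000 | sort -R | head -n 1000000000 > 1B.random.txt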






answered Dec 26 '15 at 22:46 by A. Gordon
  • I think this is a very good solution, and it can easily also support a requirement where the range of the numbers is different (larger) than the count of generated numbers. Just make the argument for seq the range, and after shuffling the numbers, use e.g. head -n 10000000 1B.random.txt > 10M.random.txt to take the first 10 million numbers from the set (the question has range = 10 billion and count = 1 billion).
    – zagrimsan
    Dec 27 '15 at 7:26










  • Worked for me... Thanks a ton, buddy :)
    – anurag
    Dec 27 '15 at 7:44


















It would be better to use a program that does not allocate much memory to complete the task. However, there is a problem with random number generation: if you need completely random numbers, then you need to use a "good" random number source like /dev/urandom.



I think this C program can help you with this task. It generates the numbers on the fly, and you specify three arguments: the start int, the end int, and the count of numbers to generate. So to generate 100 ints in the range (0..200), you do:



./mkrnd 0 200 100


You will probably want to redirect the output to a file, so do:



./mkrnd 0 200 100 >randomints.txt


Compiling is simple: just do gcc mkrnd.c -o mkrnd (or I can compile it for you).



It should be fast enough, but I think it will still take hours to run. For me, on an Athlon64 5000+:



% time null ./mkrnd 0 1000000000 10000000                                                          

real 0m33.471s
user 0m0.000s
sys 0m0.000s


Remove #if 0 ... #endif to make it grab random integers from /dev/urandom (maybe slower).



As for memory requirements: it takes only 4K RSS on a musl system during its entire runtime.



EDIT: Replacing gettimeofday with clock_gettime doubles the speed.






answered Dec 27 '15 at 1:03 by user140866; edited Dec 27 '15 at 6:18
In Python 3.4 you can generate and play with huge numbers like this:



#!/bin/python3.4
import random
print(random.sample(range(1, 1000000000000), 1000000000))


This will print one billion unique numbers.



If there is a memory problem with allocating such a huge sample, then one can use the range and print the numbers in a loop, but they will be in sequence, not random:



x = range(1, 1000000000000)
for i in x:
    print(i)  # or process i, whatever the operation is
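
If shell tools are an option, a hedged alternative sketch for the memory problem (assuming GNU shuf; whether it avoids materializing the whole range in memory depends on the coreutils version, and the output file name is illustrative):

# Sketch: sample 10^9 distinct values from 1..10^12 without building a Python list.
# shuf -i with -n emits a random subset of the range, so values are unique.
shuf -i 1-1000000000000 -n 1000000000 > 1B.random.txt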





answered Dec 26 '15 at 15:57 by Ijaz Ahmad Khan; edited Dec 26 '15 at 17:12
  • I think you got the scale wrong: max 10 billion, not 1000 billion... It produces a MemoryError here, but that's what I'd expect from that approach unless given huge resources.
      – zagrimsan
      Dec 26 '15 at 16:13










  • range(1, 1000000000000) will not result in a memory error because it doesn't allocate memory all at once; the number after the comma is the sample size, and that will result in a memory error if it's too huge. The other approach would be to use range(1, 1000000000000) and get the numbers one by one in a loop.
      – Ijaz Ahmad Khan
      Dec 26 '15 at 17:08










  • With the second approach, you're just iterating over x; that would just print numbers between 1 and 1000000000000 in order, nothing random about it.
      – iruvar
      Dec 28 '15 at 4:35










  • Yes, it will not be random, but instead of printing you may add randomness along the way and use it the way you want.
      – Ijaz Ahmad Khan
      Dec 28 '15 at 11:01


















The reason for the process getting killed might be that awk has a bug or limitation in arrays that your code is hitting, or your code is just so space-consuming that it hits some per-process limit.



I mean, you're trying to build an array with a maximum index of 10 billion (based on the range) with 1 billion defined values. So awk potentially needs to reserve space for 10 billion variables. I'm not familiar enough to tell how much space that would mean, but 10 billion 16-bit integers would mean 18.5 GB, and even if awk is clever in building such a sparse array, it would require over 1.8 GB just to store the numbers you're generating.



    To be able to keep the results unique, you will need to have all the previous values somewhere, so it will necessarily be heavy on space requirements, but it might be that some other language would allow the algorithm to finish.



    How to escape from the huge memory requirements, then?



A. Gordon presents one option, relying on a sequence and just shuffling it for randomness. That works well when there is a requirement that the results should truly be numbers and you want them to be from a given range. If the range is more complex than one to N, you could generate the sequence with awk and then pass it to sort -R, as in the sketch below. Also see my comment on that answer for how to make the range and the count of produced numbers differ.
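
A minimal sketch of that awk-plus-sort idea (assuming GNU sort; the bounds and the even-numbers-only range are made up for the example):

# Sketch: generate a non-trivial sequence (here, only even numbers) with awk,
# then shuffle it with GNU sort -R.
awk 'BEGIN { for (i = 2; i <= 2000000000; i += 2) print i }' | sort -R > even.random.txt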



One option could be to use a cryptographic (hash) function to produce the random numbers, but in that case you can't define the range to be 1 to N, since those functions usually produce N-bit output and you can't mangle the results without risking a collision (a duplicate number in the set). Such functions, however, would be guaranteed to easily produce 1 billion unique outputs (as those hash functions are designed not to produce the same output twice even over an extremely large number of repeated calls). Depending on the implementation, their output might not be numbers but strings, and one could possibly convert the string output to numbers, but since their output size is typically quite large, the range of the numbers resulting from the conversion would be really huge. You could start from this Stack Overflow question if you're interested in exploring this option.



If you can risk the chance of a collision, even if it is rather unlikely, you could try using a good source of randomness (/dev/urandom is one option) to generate the 1 billion numbers. I don't know how likely it is that you would get 1 billion unique numbers from it, but it would surely be worth a try. There is no memory-efficient way of telling whether there is a duplicate in the result set, though, since that would require having all the numbers in memory for comparison.
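
A rough sketch of that /dev/urandom idea (assuming GNU od; the byte count and file name are illustrative, the values span 0..2^32-1 rather than the question's range, and duplicates are not removed):

# Sketch: read raw bytes from /dev/urandom and print them as unsigned 32-bit
# integers (4 bytes per number); grep . drops the empty fields od's spacing leaves.
head -c 4000000000 /dev/urandom | od -An -tu4 -v | tr -s ' ' '\n' | grep . > 1B.urandom.txt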






answered Dec 26 '15 at 15:32 by zagrimsan; edited May 23 '17 at 11:33
  • Thanks for the explanation... can you please provide a piece of code to do the same task in some other language... maybe Python... I don't know Python, unfortunately :(
      – anurag
      Dec 26 '15 at 15:36










  • I think A. Gordon's approach is very good, as it gets rid of the memory requirement altogether.
      – zagrimsan
      Dec 27 '15 at 7:20










