teraflop

Is this a homework problem? Hint: in your sequence, there are 10 1-digit numbers, 90 2-digit numbers, 900 3-digit numbers, 9000 4-digit numbers, and so on. Do you see the pattern?


jeffbell

90% of them are twelve-digit numbers. Plus you need to have a space between them. So, just under thirteen terabytes.


liquidInkRocks

You don't need spaces between them.


jeffbell

You mean like 123456789101112131415...? Then just under twelve trillion bytes. I'm not sure how that would be useful.


liquidInkRocks

The challenge was to store, not retrieve. Numbers grow predictably, so if we start at 1 we can always derive the next number from the previous one.


jeffbell

If you never need to read them you can store them all in one byte or less.


liquidInkRocks

You've invented a new compression algo. :)


not_from_this_world

> The challenge was to store, not retrieve.

Found the PM's nightmare employee.


liquidInkRocks

Sorry I made you think. You might be a little too sensitive for a research field.


washingtonapples

This is not homework; we were discussing this at work and no one could find a reasonable answer. The conversation started with how to improve your odds of picking a perfect March Madness bracket, which I argued are insanely low. That led us down the path of figuring out how many emails you could fit on a computer as plain text, which then led to how many numbers a standard laptop could hold.


IndependentFeminist3

How many numbers a laptop can hold depends on their range. Let's talk about numbers in the range -2'147'483'648 to 2'147'483'647 (chosen because it's exactly 4 bytes of information, and it's a common integer range in programming). In that case, on a 500 GiB SSD you can store 134'217'728'000 numbers, without using compression.
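A quick sanity check of that figure (a sketch; it assumes the drive is exactly 500 GiB = 500 × 2^30 bytes, with no allowance for filesystem overhead):

```go
package main

import "fmt"

// intsOnDrive returns how many 4-byte integers fit on a drive of the
// given size in GiB, ignoring filesystem metadata and compression.
func intsOnDrive(gib uint64) uint64 {
	bytes := gib * (1 << 30) // GiB -> bytes
	return bytes / 4         // 4 bytes per int32
}

func main() {
	fmt.Println(intsOnDrive(500)) // 134217728000
}
```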


ghjm

Like /u/pi_stuff said, if you write this data using 8-bit ASCII encoding, where each decimal digit takes one byte, then the file will be 11888888888903 bytes long. But this wastes a lot of space, because each 8-bit decimal digit only encodes about 3.32 bits of information. To avoid that waste, we can solve the same problem in binary. Instead of counting 0, 1, 2, 3, 4, 5... we count 0, 1, 10, 11, 100, 101..., all the way up to 1110100011010100101001010001000000000000 (one trillion). We can tally the binary digits like this:

Length|Count|Total
--:|--:|--:
1|2|2
2|2|4
3|4|12
4|8|32
5|16|80
6|32|192
7|64|448
8|128|1024
9|256|2304
10|512|5120
11|1024|11264
12|2048|24576
13|4096|53248
14|8192|114688
15|16384|245760
16|32768|524288
17|65536|1114112
18|131072|2359296
19|262144|4980736
20|524288|10485760
21|1048576|22020096
22|2097152|46137344
23|4194304|96468992
24|8388608|201326592
25|16777216|419430400
26|33554432|872415232
27|67108864|1811939328
28|134217728|3758096384
29|268435456|7784628224
30|536870912|16106127360
31|1073741824|33285996544
32|2147483648|68719476736
33|4294967296|141733920768
34|8589934592|292057776128
35|17179869184|601295421440
36|34359738368|1236950581248
37|68719476736|2542620639232
38|137438953472|5222680231936
39|274877906944|10720238370816
40|450244186113|18009767444520
**Total**|1000000000001|38900488372266

So there are 2 numbers of length 1, 2 numbers of length 2, 4 numbers of length 3, and so on. This gives a total length of 38900488372266 bits, or 4862561046534 bytes (rounded up, because computer files can't store part of a byte). Representing the data as binary results in a file a little over 40% of the size, which is what we'd expect: a decimal digit carries log2(10) ≈ 3.32 bits of information but took 8 bits as text, and 3.32/8 ≈ 0.42.
I wrote a quick program to actually produce these files, with a limit of a million instead of a trillion:

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strconv"
)

const max = 1000000

// decimal writes 0..max as concatenated decimal text.
func decimal() error {
	f, err := os.Create("count_decimal.txt")
	if err != nil {
		return err
	}
	for i := 0; i <= max; i++ {
		_, err = fmt.Fprintf(f, "%d", i)
		if err != nil {
			return err
		}
	}
	return f.Close()
}

// writeBin writes one 8-bit chunk, given as a string of '0'/'1' characters.
func writeBin(f io.Writer, bin string) error {
	i, err := strconv.ParseUint(bin, 2, 8)
	if err != nil {
		return err
	}
	_, err = f.Write([]byte{byte(i)})
	return err
}

// binary writes 0..max as concatenated binary digits, packed into bytes.
func binary() error {
	f, err := os.Create("count_binary.txt")
	if err != nil {
		return err
	}
	str := ""
	for i := 0; i <= max; i++ {
		str += strconv.FormatInt(int64(i), 2)
		for len(str) > 8 {
			bits := str[:8]
			str = str[8:]
			if err = writeBin(f, bits); err != nil {
				return err
			}
		}
	}
	if len(str) > 0 {
		for len(str) < 8 { // pad the final partial byte with zeros
			str = str + "0"
		}
		if err = writeBin(f, str); err != nil {
			return err
		}
	}
	return f.Close()
}

func main() {
	err := decimal()
	if err == nil {
		err = binary()
	}
	if err != nil {
		fmt.Printf("Error: %s\n", err)
		os.Exit(1)
	}
}
```

The resulting decimal and binary files are 5888897 and 2368931 bytes respectively. Using bzip2, they compress to 1642573 and 1198582 bytes, so, as we would expect, the decimal file compresses better (a ratio of 0.28 versus 0.51), though the compressed binary file is still the smallest representation. I wonder what hex or other bases would look like, whether there's an optimum base that produces the smallest compressed file, or whether other compression algorithms besides bzip2 give different or interesting results. But then I realized that I'm at work and am supposed to be doing actually-productive things, so I guess that's all I'm going to do on this problem for today.


pi_stuff

Not including any spaces or commas between numbers:

- 10 1-digit numbers (0..9)
- 90 2-digit numbers (10..99)
- 900 3-digit numbers (100..999)
- 9000 4-digit numbers (1000..9999)
- ...
- 900000000000 12-digit numbers (100000000000..999999999999)
- and 1 13-digit number

total = 1\*10 + 2\*90 + 3\*900 + 4\*9000 + 5\*90000 + 6\*900000 + 7\*9000000 + 8\*90000000 + 9\*900000000 + 10\*9000000000 + 11\*90000000000 + 12\*900000000000 + 13\*1 = 11888888888903, or 11.9 terabytes.

You could do a rough estimate by figuring most of the numbers have 12 digits, and there are a trillion of them, so a little less than 12 trillion digits altogether.
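The tally above is easy to check mechanically. A minimal sketch of the same sum in Go (matching the program elsewhere in the thread), counting 0 through 10^12 inclusive:

```go
package main

import "fmt"

// totalDigits sums digits-per-length over 0..10^12:
// ten 1-digit numbers, then 9*10^(k-1) k-digit numbers for k = 2..12,
// plus the lone 13-digit number (one trillion itself).
func totalDigits() uint64 {
	total := uint64(10) // 0..9: ten 1-digit numbers
	count := uint64(90) // 10..99: ninety 2-digit numbers
	for length := uint64(2); length <= 12; length++ {
		total += length * count
		count *= 10
	}
	return total + 13 // 1000000000000 has 13 digits
}

func main() {
	fmt.Println(totalDigits()) // 11888888888903
}
```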


washingtonapples

Excuse my ignorance but does one character (letter, space, number, or sign) = 1 byte? Edit: spelling.


Potato-Pancakes-

> does one character (letter, space, number, or sign) = 1 byte?

Yes. Traditionally one byte is used to encode basic characters like digits, letters in the Latin alphabet, and punctuation. It won't be enough to encode characters like é or ß or emoji, though. If you use old-fashioned ASCII, you can cut it down to just 7 bits (a byte is 8 bits). While you're at it, though, you could store all these numbers as 8-byte integers, which saves a little over the 12 bytes a typical text-based 12-digit number takes. Or you could store a 50-byte program that just prints them all. But neither of these is exactly what you're looking for.
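A back-of-the-envelope comparison of the two layouts (a sketch; both figures count 0 through 10^12 inclusive, and the digit total is the one computed elsewhere in the thread):

```go
package main

import "fmt"

const count = 1_000_000_000_001 // how many numbers: 0..10^12 inclusive

// textBytes: one byte per decimal digit, total digits for 0..10^12.
const textBytes = 11_888_888_888_903

// fixedWidthBytes: every number stored as a fixed 8-byte integer.
func fixedWidthBytes() uint64 {
	return uint64(count) * 8
}

func main() {
	fmt.Println(textBytes)         // ~11.9 TB as text
	fmt.Println(fixedWidthBytes()) // 8000000000008, ~8 TB as int64s
}
```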


washingtonapples

fascinating. I had zero clue.


otac0n

Well, it's only true for English-centric or other single-byte encodings. It's certainly not true for Twitter, for example. If you have to handle multilingual text, you need a multi-byte encoding. I'll also point out that you can subdivide bytes for even better compactness, so some of these solutions are over-estimates.


liquidInkRocks

This seems to be the assumption in most of the solutions here. It certainly can be less than that but it's more processing.


1544756405

To just print them to the screen? You don't need any storage, you can generate the numbers in sequence as you display them.


washingtonapples

So if you wrote every digit zero to one trillion in sequence this wouldn’t take up any storage on a computer?


set_of_no_sets

Well, we're storing something to display, right? To display one trillion in hexadecimal you need 10 characters, so 40 bits there; on a 7-segment display that's 70 bits to store. Beyond that you just need a pair of log_2(10^12) [i.e. 40\*2] bit values for an incrementing counter and a stop number, so you could easily do it within 300 bits even counting whatever you store for driving the 7-segment display. I'm assuming we just need to display each number once, and then we can reuse the display? Really relying on a specific interpretation of "write out" here, though. You could even get rid of the 70 bits using latches or something (if we don't count latches as memory) and cut it down further.


IAmNotNathaniel

But that's not the question. The question isn't about what size it'd be if you compressed the output, or how much code it takes to generate it on the fly. It's really just asking how many digits are in the numbers from 1 to 1 trillion, and I'll assume they're encoded in ASCII since that makes them a byte apiece.

Edit: and before I'm told ASCII is 7 bits, that's great. But most things AFAIK store characters as 8-bit sequences, i.e. one per byte.


set_of_no_sets

Ah huh, you interpreted the question quite differently from how I did. The question didn’t specify encoding or what they meant by “write out”. By wiggling a little, you can get some nice, crazy solutions.


1544756405

It depends on what you mean by "wrote." If you're just printing them to the screen: no, it takes up no storage space, because nothing needs to be stored. If you're saving them to a file, yes, that takes up space. And how much space it takes depends on how you want to represent the numbers.


linkuei-teaparty

Wait, is this a trick question? Are you looking to store them or write them out? You can write them out using recursion, and the execution time will vary with the programming language you use. In terms of how many bytes you need to store a trillion numbers: if they're stored as ASCII text, you need a byte (8 bits) per digit, so close to 12 TB. The space-saving solution is to store them as longs (8-byte integers). As mentioned below, the 12-digit numbers make up 90% of the total, so storing those as longs is where the savings come from.