Question About Storing Data More Efficiently

Posted by Forgeio
I've been working on a survival game project off and on for about 8 months now, and I've gotten to the point in the project where I'm going to start saving to BDBs instead of my test ini files.

I'm curious about storing large amounts of information efficiently, and I was curious if this idea would work theoretically or if there's a better way/if I'm better off using the normal methods.

In theory, I'm thinking about using a section of a BDB to store all of the values for one chunk of the in game world. A chunk would be something like 32x32 blocks in size. I'd use the bdb_u16 data type, and store the block IDs as numbers separated by zeros. The IDs would have to work around this, as none of the numbers could have zeros when read back into the game since they tell the program that it is the end of a number. So, block IDs would go something like: 1 2 3 4 5 6 7 8 9 11, skipping 10 because it contains a zero. The block IDs would be stored like this within the BDB section: 1 0 2 0 1 0 5 0 18 0 19 0 21 0.
This would allow full worlds to be stored within a single BDB, and then read back into gamemaker by different string functions that separate out each value. Would this be efficient? Am I missing something that wouldn't work here? Huge amounts of data could be stored within a very small amount of storage, but I'm not sure that loading this data would be fast. Maybe the chunk sizes could be smaller? Then the game would only have to translate a few at a time from within the loaded BDB.

Replies (11)

Last message on 26 Jun 2021

Size43 (Administrator) on 18 May 2021, 22:39:04
Apologies for the delay. I don't quite follow the need for the zeros. You can just store the block IDs as u16s without putting any separators inbetween. In fact, this is basically what the 'Blocks' example in the download is doing (though I believe I used u8 there).

I would advise against using any string functions. Building strings, then splitting them and then converting them back to numbers again is not very good for performance.
Forgeio (Topicstarter) on 23 May 2021, 05:57:39
The point of this would be to expand the amount of storage used because one u16 could hold a large amount of data. If I store each individual block as a u16, even with many BDBs I wouldn't be able to store a very large amount of worlds. If one chunk is 64x64 blocks large (4,096 blocks total) one BDB would only store one or two chunks worth of blocks. If there were only a few worlds and the players had explored a lot in them generating many chunks, the game would run out of space to store the block information in worlds very quickly. If I read the article on binary data blocks correctly, one BDB can hold 8,192 u16s, and if I have the max of 1024 BDBs in my game that only allows for around 2048 chunks total across every world. Ideally, each world would have the ability to save a huge number of chunks so that there aren't any real limits to player exploration and the size of worlds.
Size43 (Administrator) on 30 May 2021, 17:17:46
Ah, I think I understand now. Essentially, you want to store multiple blocks in a single u16, so that you can store more blocks in the BDB?

How many different block IDs do you have? This will determine how much you'll be able to save. Assuming a uniformly distributed chance for each block to appear somewhere in a chunk, the minimum number of bytes required for storage is determined by log2(N)/8. So, for example, if you have 16 different block IDs, you will need at least log2(N)/8 = log2(16)/8 = 4/8 = 0.5 bytes to store one block ID. One BDB is 16KiB = 16 * 1024 bytes. So you will be able to store around 32 thousand block IDs per BDB. That's around 32 chunks of 32*32 block IDs.

It is possible to reduce this further by using compression techniques. However, compression will make each chunk a different size -- that will make saving and loading the BDBs much more complex.
Size43 (Administrator) on 30 May 2021, 17:27:19
Your scheme also has this "variable-length" complexity. For example, "101010" encodes three block IDs (1, 1, 1) and can be stored in a u16. However, "1201501" also encodes three block IDs (12, 15, 1) but cannot be stored in a u16. The resulting number will always be outside the range of the u16.

Of course it's up to you to decide how complex you want to make your storage format. However, there are easier ways to get pretty close to the compression ratio that your format will get you -- if you're willing to accept potentially slightly larger chunks.

For example, by storing each block ID in a u8 you can store around 16 thousand block IDs per BDB.

By storing three 5-bit (so 0 up to 31) block IDs in a u16, you can store around 24 thousand block IDs per BDB.

By storing two 4-bit (so 0 up to 16) block IDs in a u8 (or four 4-bit IDs in a u16), you can store around 32 thousand block IDs per BDB.

All these formats have a fixed size per block ID or chunk, which is much easier to work with.
Size43 (Administrator) on 30 May 2021, 17:33:59
I can't provide you a conclusive answer, because I don't know all the requirements for your game.

Strings in GameMaker are notoriously slow, and especially if you're targetting lower-end platforms or mobile devices it might become taxing if you need to load a lot of chunks. So I personally would not attempt to use them for this purpose, but that's of course up to you.

I'd be happy to expand on some of the other options if you're interested.
Forgeio (Topicstarter) on 31 May 2021, 03:50:10
I'd love to hear more options for this sort of thing. Thank you so much for such detailed and thoughtful answers!
I'm not planning for my game to have very many players. At most, there will probably be something like twenty of my friends playing if things work out as I plan. I do want to make sure the game can hold a lot of information though, allowing for many player worlds and seamless exploration within those worlds. This compression of data within BDBs was the best I could think of because I have limited experience with data storage. If there are better ways I'd absolutely like to know about them and implement one of them.
Size43 (Administrator) on 26 Jun 2021, 16:36:29
Apologies for the delay.

To squeeze as much data into a BDB as possible it helps to think of a BDB as a storage of bits, rather than bytes (seehttps://en.wikipedia.org/wiki/Binary_number if you're unfamiliar with this). One byte (a u8) contains 8 bits. That's also where the other names (u16, u32, etc.) come from. For example, a u32 stores a number in 32 bits or 32/8 = 4 bytes. One BDB can hold 16KiB * 8 bits = 131072 bits.

As you might have noticed, all sizes for the numbers (u8, u16, u32, ...) are multiples of 8. This is fine if you don't mind wasting a few bits for each number that you store. For example if you have 30 block IDs you only need 5 bits to store each ID, but using an u8 will take up 8 bits. That means you're wasting 3 bits every time you store a block ID.

To improve on this, you'll need to know some bitwise logic for this (seehttps://en.wikipedia.org/wiki/Logical_shift ).
Size43 (Administrator) on 26 Jun 2021, 16:36:53
As an example: you could store 3 5-bit IDs in a u16, so you use 15 bits out of 16. This wastes on average only 0.33 bit per ID. To do this, you need to use bitshifts. To decompose a u16 into three 5-bit block IDs, you could do something like:

var from_bdb = gms_bdb_read(handle, bdb_u16);
var id1 = from_bdb & 0x1F;
var id2 = (from_bdb >> 5) & 0x1F;
var id3 = (from_bdb >> 10) & 0x1F;
// do something with the 3 block IDs


And to combine three 5-bit block IDs into a single u16 you could do something like:

var id1 = ..., id2 = ..., id3 = ...;
var to_bdb = id1 | (id2 << 5) | (id3 << 10);
gms_bdb_write(handle, bdb_u16, to_bdb);
Size43 (Administrator) on 26 Jun 2021, 16:37:12
Now, I think that the optimization I described here is the only one really worth doing. You can easily get 10-40% smaller sizes depending on how many block IDs you actually have. This optimization also already gets you pretty close to the best size you will be able to consistently achieve.

However, if you really want to go overkill and reduce data even more you could look into compression techniques like:https://en.wikipedia.org/wiki/Huffman_coding
A fixed Huffman tree based on which blocks you expect to appear most often would work well here, I think. But be warned: this is a lot of work. If you've never done something like this before, you should expect to spend at least 40-80 hours on this alone. At that point it might be easier to just pay for a few more BDBs.
Forgeio (Topicstarter) on 31 May 2021, 03:57:35
Would there be a way to do something similar without using strings since they are so slow in Game Maker? Could the use of INIs to store unused chunks and BDBs to store "active" chunks (chunks that are seen and editable by players) work?
Size43 (Administrator) on 26 Jun 2021, 16:37:48
Storing the active chunks in the GameINI can certainly work. If you don't expect all chunks to be edited by players, the world can be much bigger since you don't actually need to store anything until someone edits a chunk.