Joined 1Y ago
Cake day: Jun 11, 2023


This reminds me of when I had to roll my own dynamic memory allocator for an obscure platform. (Something I never want to do again!) I stuck metadata in the negative space just before the returned pointer like you say. In my case, it was complicated by the fact that you had to worry about the memory alignment of the returned pointer to make sure it works with SIMD and all that. Ugh. But I guess with strings (or at least 8-bit-encoded strings), alignment should not be an issue.
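For anyone curious what "metadata in the negative space" can look like, here is a rough sketch in C, not my original allocator. It sits on top of `malloc`, and the names `hdr_alloc`, `hdr_size`, `hdr_free` and the `block_header` layout are made up for illustration. The header is copied in and out with `memcpy` so the sketch does not depend on the header itself being aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical metadata stored in the bytes just before the returned pointer. */
typedef struct {
    void  *raw;   /* what malloc actually returned; needed to free the block */
    size_t size;  /* number of usable bytes the caller asked for */
} block_header;

/* Allocate `size` bytes aligned to `align` (must be a power of two). */
void *hdr_alloc(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (size > SIZE_MAX - sizeof(block_header) - align)
        return NULL;  /* the padded request would overflow size_t */

    /* Over-allocate: room for the header plus worst-case alignment padding. */
    unsigned char *raw = malloc(size + sizeof(block_header) + align - 1);
    if (raw == NULL)
        return NULL;

    /* First address that leaves room for the header, rounded up to `align`. */
    uintptr_t start = (uintptr_t)raw + sizeof(block_header);
    uintptr_t user  = (start + align - 1) & ~((uintptr_t)align - 1);

    block_header h = { raw, size };
    memcpy((void *)(user - sizeof h), &h, sizeof h);
    return (void *)user;
}

/* Read the header back from just before the user pointer. */
static block_header hdr_read(const void *p)
{
    block_header h;
    memcpy(&h, (const unsigned char *)p - sizeof h, sizeof h);
    return h;
}

size_t hdr_size(const void *p) { return hdr_read(p).size; }

void hdr_free(void *p)
{
    if (p != NULL)
        free(hdr_read(p).raw);
}
```

The cost is the padding: every block carries up to `align - 1` wasted bytes plus the header, which is why this hurts more for SIMD-sized alignments than for plain byte strings.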


Oh, so you’re talking about text representation in an editor or something along those lines? That’s kind of a separate problem, isn’t it?

At the lowest level though, I suppose you still need to consider whether to use null-terminated segments. I think I’d still be going length + data, though I wouldn’t worry about packing down the length representation like with serialization formats. Your code will need to be highly cognizant of the length of strings and managing dynamic memory allocation all over the place, so it’s good to have those lengths quickly accessible at all times.
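As a rough sketch of what I mean by length + data in C (the `lstr` type and function names are hypothetical, just for illustration), the length lives in a full `size_t` next to the bytes, so things like concatenation never have to scan for a terminator:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Length-prefixed string segment: an unpacked length followed by the bytes. */
typedef struct {
    size_t len;     /* number of bytes in data; no terminator is stored */
    char   data[];  /* C99 flexible array member */
} lstr;

/* Copy `len` bytes into a freshly allocated segment. Returns NULL on failure. */
lstr *lstr_new(const char *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    if (len > SIZE_MAX - sizeof(lstr))
        return NULL;  /* allocation size would overflow */

    lstr *s = malloc(sizeof(lstr) + len);
    if (s == NULL)
        return NULL;
    s->len = len;
    if (len > 0)
        memcpy(s->data, bytes, len);
    return s;
}

/* Concatenate two segments; both lengths are available in O(1). */
lstr *lstr_concat(const lstr *a, const lstr *b)
{
    assert(a != NULL && b != NULL);
    if (a->len > SIZE_MAX - sizeof(lstr) - b->len)
        return NULL;  /* combined length would overflow */

    lstr *s = malloc(sizeof(lstr) + a->len + b->len);
    if (s == NULL)
        return NULL;
    s->len = a->len + b->len;
    if (a->len > 0)
        memcpy(s->data, a->data, a->len);
    if (b->len > 0)
        memcpy(s->data + a->len, b->data, b->len);
    return s;
}
```

A side effect is that embedded zero bytes are fine, since nothing treats `'\0'` as special.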


Better in what sense? I put some thought into this when designing an object serialization library modelled like a binary JSON.

When it got to string encoding, I had to decide whether to go null-terminated or length + data. The former is very space-efficient, particularly when you have a huge number of short strings. And let’s face it, that’s a common enough scenario. But it’s nice to have the length beforehand when you are parsing the string out of a stream.

What I did in the end was come up with a variable-length integer encoding that somewhat resembles what they do in UTF-8. It means that for strings shorter than 128 characters, the length is a single byte. Longer than that and more bytes get used as necessary.
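To give a flavour of it, here is a sketch of one common scheme with that property. To be clear, this is the LEB128-style layout (7 payload bits per byte, high bit set when more bytes follow), not necessarily the exact format my library used, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode `value` into `out` (at least 10 bytes). Returns bytes written. */
size_t varint_encode(uint64_t value, unsigned char *out)
{
    assert(out != NULL);
    size_t n = 0;
    while (value >= 0x80) {
        /* Low 7 bits, with the high bit set to mean "more bytes follow". */
        out[n++] = (unsigned char)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    out[n++] = (unsigned char)value;  /* final byte has the high bit clear */
    return n;
}

/* Decode from `in`, reading at most `avail` bytes.
 * Returns bytes consumed, or 0 if the input is truncated or too long. */
size_t varint_decode(const unsigned char *in, size_t avail, uint64_t *value)
{
    assert(in != NULL && value != NULL);
    uint64_t result = 0;
    for (size_t i = 0; i < avail && i < 10; i++) {
        result |= (uint64_t)(in[i] & 0x7F) << (7 * i);
        if ((in[i] & 0x80) == 0) {
            *value = result;
            return i + 1;
        }
    }
    return 0;
}
```

So a length of 127 is the single byte `0x7F`, while 128 becomes the two bytes `0x80 0x01`. When parsing from a stream you read the length first, then know exactly how many bytes of string data to pull.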


I don’t really have an answer for you, but I can say that when recompiling older codebases (some in C and some in C++) using a modern C++ compiler, type errors are among the most common I have to address. In particular, compilers seem to insist more on explicit casts for type narrowing, which is a good thing. But I don’t know about modern C itself. It wouldn’t surprise me if the language has become stricter.