String tables in flash memory on Arduino

tl;dr: Storing data in flash memory is a nice way to save dynamic memory on Arduino, but it gets cumbersome quickly in a larger project. This post touches briefly on the problem and then presents a solution that does all of it automatically.

A recurring problem when creating user interfaces on AVR platforms, particularly the ATmega chips used on Arduino devices, is running out of RAM for program strings. By default, string constants are stored in program memory and then copied to SRAM at startup, so they take up space in both. The F() macro helps alleviate this problem, but it is only suitable for constants defined inline at the point of use. [1] For globally defined strings, the PROGMEM macro (which expands to a variable attribute) does the job nicely. [2]

What if we want to create a table of strings in flash memory? Well, the GNU AVR Toolchain [3] documentation presents such a solution, using PROGMEM. It goes as such:

// recent avr-gcc versions require PROGMEM data to be declared const
const char string_1[] PROGMEM = "String 1";
const char string_2[] PROGMEM = "String 2";
const char string_3[] PROGMEM = "String 3";
const char string_4[] PROGMEM = "String 4";
const char string_5[] PROGMEM = "String 5";

PGM_P const string_table[] PROGMEM =
{
    string_1,
    string_2,
    string_3,
    string_4,
    string_5
};

Well, that does not look nice, does it? My main gripe with this scheme is that every string in the table has to be explicitly declared. And no, according to [3] and from experience, you cannot declare the strings inline in the table. That is because PROGMEM only applies to the declaration it is used in, so you’d be placing the array of pointers in program space, but not the strings themselves. On top of that, you have to keep track of the variable names, and using templates to conditionally declare one set of strings instead of another becomes a huge chore. It’s just ugly, especially if you want multiple versions of the table for things like multiple-language support.

Let’s fix it!

There has to be a better way to do it, right? Of course! For this purpose, I have created a Python 3 script that automates the generation of the program space table, together with a list of indices to the table, also in program space. You supply it with a text file, and for every line in the file, a string is appended to the table. Then, .h and .c files are generated automatically. Some features I needed were added as well:

  • Blank lines are ignored (watch out for trailing whitespace, though!);

  • Lines beginning with # are treated as comments and also ignored;

  • The character ¬ denotes that a byte in hex (for example, ‘¬4a’) will follow, and this byte is written to the table in raw form;

    • Useful for inserting special characters for simple text-only LCDs (LiquidCrystal library); a table is available here;

  • A line beginning with __IDENTIFIER can be used to define the prefix used to name the tables and other definitions;

  • And finally, there is a nice macro for accessing the contents of the table, for fun and profit.
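To make the ¬ escape rule concrete, here is a minimal sketch in C of how a single input line could be turned into raw table bytes. It is not the script's actual code (the script is written in Python); `encode_line` and `hex_value` are hypothetical names, the input is assumed to be UTF-8 (where ¬ is the byte pair 0xC2 0xAC), and passing a malformed escape through unchanged is my own assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if it is not one. */
static int hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = (unsigned char)tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/*
 * Encode one UTF-8 input line into raw table bytes.
 * The sequence 0xC2 0xAC (the character U+00AC) followed by two hex digits
 * is replaced by the single byte those digits spell out.
 * Assumption: anything else, including a malformed escape, is copied as-is.
 * Returns the number of bytes written, or -1 if `out` is too small.
 * The terminating \0 of the table entry is not written here.
 */
long encode_line(const char *line, unsigned char *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)line;
    size_t n = 0;

    while (*p != '\0') {
        unsigned char byte = *p;
        size_t step = 1;

        if (p[0] == 0xC2 && p[1] == 0xAC) {
            int hi = hex_value(p[2]);
            /* Only look at p[3] if p[2] was a digit, so we never read past the \0. */
            int lo = (hi >= 0) ? hex_value(p[3]) : -1;
            if (hi >= 0 && lo >= 0) {
                byte = (unsigned char)(hi * 16 + lo);
                step = 4;   /* two bytes of marker plus two hex digits */
            }
        }
        if (n >= cap) return -1;
        out[n++] = byte;
        p += step;
    }
    return (long)n;
}
```

With this rule, a line such as "I ¬9D NY!" becomes seven bytes, the third of which is 0x9D.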

Let’s check out an example. Suppose I have the following string file:

# strings_en.txt
__IDENTIFIER str_en
Welcome to the Menu
I ¬9D NY!
¬01¬02¬ad¬0f¬ff

Then I can generate the files by running the script. For example: ./str2pgmspace strings_en.txt str_en
The first argument is the text file to be processed, and the second one is the base name of the output files. In this case, the files str_en.c and str_en.h will be generated. Let’s take a look at the result (comments added for clarity):

//
// str_en.h 
//

#pragma once

#include <avr/pgmspace.h>  // needed for program-space constants

// 'str_en.h': generated by str2progmem from 'strings_en.txt' at 2019-08-29 21:31:04.014976
static const int STR_EN_COUNT = 3;               // how many strings are there in the table
static const int STR_EN_BLOB_SZ = 35;            // how large is the entire string table
extern const unsigned short str_en_offsets[];    // external reference to the offset table
extern const char str_en_blob[];                 // external reference to the binary string content

// Macro for obtaining the byte offset of a string, given its index
#define STR_EN_GET_OFFSET(I) pgm_read_word(&(str_en_offsets[(I)]))

// Macro for obtaining the program-space address of a string, argument is the string index (from 0 to STR_EN_COUNT-1)
#define STR_EN_GET(I) ( ((const char*) &str_en_blob) + STR_EN_GET_OFFSET(I) )

// This casts the pointer type to something that the Print-like Arduino stuff can understand as being in program space
#define PGMSTR(x) ((const __FlashStringHelper*)(x))


//
// str_en.c
//

#include "./str_en.h"

// definition of string table content
// look at how strings are delimited by null bytes (\x00), and how the last string was transliterated from binary data
const char str_en_blob[] PROGMEM = 
"\x57\x65\x6c\x63\x6f\x6d\x65\x20\x74\x6f\x20\x74\x68\x65\x20\x4d"
"\x65\x6e\x75\x0a\x00\x49\x20\x9d\x20\x4e\x59\x21\x00\x01\x02\xad"
"\x0f\xff\x00"
;

// the byte offset for every string in the table
const unsigned short str_en_offsets[] PROGMEM = {
    0, 21, 29};
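The relationship between the blob and the offset table can be checked on a PC with a few lines of C. This is only a sketch for illustration, not code emitted by the script, and `find_string_offsets` is a hypothetical name: it walks a blob of null-terminated strings and records where each one starts.

```c
#include <stddef.h>

/*
 * Walk a blob of \0-terminated strings and record the byte offset at which
 * each string starts. Returns the number of strings found; only the first
 * max_strings offsets are stored.
 */
size_t find_string_offsets(const char *blob, size_t blob_size,
                           unsigned short *offsets, size_t max_strings)
{
    size_t count = 0;
    size_t start = 0;

    for (size_t i = 0; i < blob_size; i++) {
        if (blob[i] == '\0') {
            if (count < max_strings) {
                offsets[count] = (unsigned short)start;
            }
            count++;
            start = i + 1;   /* the next string begins right after the \0 */
        }
    }
    return count;
}
```

Fed the 35 bytes listed above (STR_EN_BLOB_SZ, not sizeof, since the C string literal carries one extra implicit null byte), it finds 3 strings starting at offsets 0, 21 and 29, matching the generated table.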

Usage

It should be simple, with all the normal caveats that come with data in program space. Most importantly, remember that data in program space cannot be addressed directly. If you try, the CPU will dereference the address in SRAM instead, which can return glitchy garbage or crash your application. That said, let’s print a string from our table:
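To illustrate the caveat about direct addressing, here is a small sketch in C that measures a table string by going through pgm_read_word and pgm_read_byte instead of dereferencing the pointer. `table_strlen` is a hypothetical helper, and the `#else` branch is an assumption that only exists so the same code also builds on a PC, where there is no separate program space.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host-only stand-ins (assumption): flash and RAM share one address space. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#define pgm_read_word(addr) (*(const unsigned short *)(addr))
#endif

/*
 * Length of string `index` in a blob/offset pair laid out like the
 * generated tables (strings separated by \0, 16-bit offsets).
 */
size_t table_strlen(const char *blob, const unsigned short *offsets, size_t index)
{
    /* The offset table is in program space too, so it also needs pgm_read_word. */
    const char *p = blob + pgm_read_word(&offsets[index]);
    size_t len = 0;

    /* Wrong on AVR: `while (p[len] != '\0')` would read SRAM at that address. */
    while (pgm_read_byte(p + len) != '\0') {
        len++;
    }
    return len;
}
```

On a real AVR target, avr-libc's strlen_P does the same job for a single program-space string; the loop is spelled out here only to show where the special reads go.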

// On serial port
Serial.println( PGMSTR(STR_EN_GET(0)) );

// On an LCD
lcd.println( PGMSTR(STR_EN_GET(1)) );

That’s more like it! If you need to do more than this with the strings, try building a String object. Just beware of heap fragmentation.
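If heap fragmentation is a concern, one alternative is to copy the string into a fixed-size buffer in RAM. The following is a sketch of that idea, not part of the generated files: `table_copy` is a hypothetical helper, and the `#else` branch is an assumption that lets the code build on a PC as well.

```c
#include <stddef.h>

#ifdef __AVR__
#include <avr/pgmspace.h>
#else
/* Host-only stand-ins (assumption): flash and RAM share one address space. */
#define PROGMEM
#define pgm_read_byte(addr) (*(const unsigned char *)(addr))
#define pgm_read_word(addr) (*(const unsigned short *)(addr))
#endif

/*
 * Copy string `index` from a program-space blob into a RAM buffer of `cap`
 * bytes, truncating if needed. The result is always \0-terminated when
 * cap > 0. Returns the number of characters copied, not counting the \0.
 */
size_t table_copy(char *dst, size_t cap,
                  const char *blob, const unsigned short *offsets, size_t index)
{
    const char *src = blob + pgm_read_word(&offsets[index]);
    size_t n = 0;

    if (cap == 0) {
        return 0;
    }
    while (n + 1 < cap) {               /* keep one byte for the terminator */
        char c = (char)pgm_read_byte(src + n);
        if (c == '\0') {
            break;
        }
        dst[n++] = c;
    }
    dst[n] = '\0';
    return n;
}
```

With the tables from the example, a call could look like `char line[17]; table_copy(line, sizeof line, str_en_blob, str_en_offsets, 0);`, after which `line` is an ordinary RAM string.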

The script is available here (also on GitHub).

Note

If you are working on a single-file sketch, or find using separate .h and .c files cumbersome, just copying the contents of both files into the sketch should do the trick. Paste the contents of the .h before the contents of the .c, drop the #include of the generated header, and don’t forget to remove the extern declarations, since, well, they won’t be external anymore 🙂

References