2021-11-08 - Indirect Strings (IST) API


1. Background
-------------

When parsing traffic, most of the standard C string functions are unusable
since they rely on a trailing zero. In addition, for the rare ones that support
a length, we have to constantly maintain both the pointer and the length. But
then, it's easy to end up with complex length and offset calculations all over
the place, rendering the code hard to read and bugs hard to avoid or spot.

IST provides a solution to this by defining a structure made of exactly two
word-size elements, which most C ABIs know how to pass in registers when it is
used as a function argument or a function's return value. The functions are
inlined to leave a maximum set of opportunities to the compiler for
optimization and expression reduction, and as a result they are often
inexpensive to use. It is important however to keep in mind that all of these
are designed for minimal code size when dealing with short strings (e.g.
parsing tokens in protocols), and they are not optimal for processing large
blocks.


2. API description
------------------

IST are defined like this:

  struct ist {
      char  *ptr;    // pointer to the string's first byte
      size_t len;    // number of valid bytes starting from ptr
  };

A string is not set if its "ptr" member is NULL. In this case "len" is
undefined, and it is recommended to set it to zero.

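To make the difference between an unset string and an empty one concrete, here
is a minimal sketch. The structure is the one shown above; ist2(), isttest()
and istlen() are simplified stand-ins written for this example (not the actual
inline definitions), and ist_state() is a hypothetical helper that is not part
of the API.

```c
#include <stddef.h>

struct ist {
    char  *ptr;    /* pointer to the string's first byte */
    size_t len;    /* number of valid bytes starting from ptr */
};

/* Simplified stand-ins, for illustration only. */
static inline struct ist ist2(const void *ptr, size_t len)
{
    struct ist i = { (char *)ptr, len };
    return i;
}

static inline int isttest(struct ist i)
{
    return i.ptr != NULL;
}

static inline size_t istlen(struct ist i)
{
    return i.len;
}

/* Hypothetical helper: returns 0 for an unset IST, 1 for a set but empty
 * one, and 2 for a set, non-empty one.
 */
static inline int ist_state(struct ist i)
{
    if (!isttest(i))
        return 0;              /* ptr is NULL: the string is not set */
    return istlen(i) ? 2 : 1;  /* set, possibly of zero length */
}
```

An unset IST and an empty one both have a zero length here, so only the
pointer distinguishes them, which is why the test is made on "ptr".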
Declaring a function returning an IST:

  struct ist produce_ist(int ok)
  {
      return ok ? IST("OK") : IST("KO");
  }

Declaring a function consuming an IST:

  void say_ist(struct ist i)
  {
      write(1, istptr(i), istlen(i));
  }

Chaining the two:

  void say_ok(int ok)
  {
      say_ist(produce_ist(ok));
  }

Notes:
  - the arguments are passed by value, not by reference, so there's no need
    for any "const" in their declaration (except to catch coding mistakes).
    Pointers to ist may benefit from being marked "const" however.

  - similarly for the return value, there's no point in marking it "const" as
    this would protect the pointer and length, not the data.

  - use ist0() to append a trailing zero to a variable string for use with
    printf()'s "%s" format, or for use with functions that work on
    NUL-terminated strings, but take care never to do this with constants.

  - the API provides a starting pointer and current length, but does not
    provide an allocated size. It remains up to the caller to know how large
    the allocated area is when adding data, though most functions make this
    easy.

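The last two notes can be illustrated with a small sketch. ist0() writes
into the area the IST points to, so instead of terminating a string in place,
the token can be copied into a buffer whose size the caller knows. The
function ist_to_cstr() below is a hypothetical helper written for this
example; it is not part of the API, and the structure is redeclared only to
keep the example self-contained.

```c
#include <stddef.h>
#include <string.h>

struct ist {
    char  *ptr;
    size_t len;
};

/* Hypothetical helper (not part of the API): copy <src> into <dst>, whose
 * allocated size <size> is known only to the caller, then NUL-terminate it.
 * Returns <dst>, or NULL if <src> is unset or if the string and its trailing
 * zero do not fit.
 */
static inline char *ist_to_cstr(char *dst, size_t size, struct ist src)
{
    if (!src.ptr || src.len >= size)
        return NULL;
    memcpy(dst, src.ptr, src.len);
    dst[src.len] = 0;
    return dst;
}
```

The size check belongs to the caller's side of the contract: the IST itself
only carries the current length, never the room left in the destination.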
The following macros and functions are defined. Those whose name starts with
underscores require special care and must not be used without being certain
they are properly used (they are typically subject to buffer overflows if
misused). Note that most functions were added over time depending on immediate
needs, and some are very close to each other. Many useful functions are still
missing and would deserve being added.

Below, arguments "i1" and "i2" are of type "ist". Arguments "s" are
NUL-terminated strings of type "char*", and "cs" are of type "const char *".
Arguments "c" are of type "char", and "n" are of type "size_t".

  IST(cs):ist             make constant IST from a NUL-terminated const string
  IST_NULL:ist            return an unset IST = ist2(NULL,0)
  __istappend(i1,c):ist   append character <c> at the end of ist <i1>
  ist(s):ist              return an IST from a NUL-terminated string
  ist0(i1):char*          write a \0 at the end of an IST, return the string
  ist2(cs,l):ist          return a variable IST from a const string and length
  ist2bin(s,i1):ist       copy IST into a buffer, return the result
  ist2bin_lc(s,i1):ist    like ist2bin() but turning to lower case
  ist2bin_uc(s,i1):ist    like ist2bin() but turning to upper case
  ist2str(s,i1):ist       copy IST into a buffer, add NUL and return the result
  ist2str_lc(s,i1):ist    like ist2str() but turning to lower case
  ist2str_uc(s,i1):ist    like ist2str() but turning to upper case
  ist_find(i1,c):ist      return first occurrence of char <c> in <i1>
  ist_find_ctl(i1):char*  return pointer to first CTL char in <i1> or NULL
  ist_skip(i1,c):ist      return first occurrence of char not <c> in <i1>
  istadv(i1,n):ist        advance the string by <n> characters
  istalloc(n):ist         return allocated string of zero initial length
  istcat(d,s,n):ssize_t   copy <s> after <d> for <n> chars max, return len or -1
  istchr(i1,c):char*      return pointer to first occurrence of <c> in <i1>
  istclear(i1*):size_t    return previous size and set size to zero
  istcpy(d,s,n):ssize_t   copy <s> over <d> for <n> chars max, return len or -1
  istdiff(i1,i2):int      return the ordinal difference, like strcmp()
  istdup(i1):ist          allocate new ist and copy original one into it
  istend(i1):char*        return pointer to first character after the IST
  isteq(i1,i2):int        return non-zero if strings are equal
  isteqi(i1,i2):int       like isteq() but case-insensitive
  istfree(i1*)            free allocated <i1>/IST_NULL and set it to IST_NULL
  istissame(i1,i2):int    return true if pointers and lengths are equal
  istist(i1,i2):ist       return first occurrence of <i2> in <i1>
  istlen(i1):size_t       return the length of the IST (number of characters)
  istmatch(i1,i2):int     return non-zero if i1 starts like i2 (empty OK)
  istmatchi(i1,i2):int    like istmatch() but case-insensitive
  istneq(i1,i2,n):int     like isteq() but limited to the first <n> chars
  istnext(i1):ist         return the IST advanced by one character
  istnmatch(i1,i2,n):int  like istmatch() but limited to the first <n> chars
  istpad(s,i1):ist        copy IST into a buffer, add a NUL, return the result
  istptr(i1):char*        return the starting pointer of the IST
  istscat(d,s,n):ssize_t  same as istcat() but always place a NUL at the end
  istscpy(d,s,n):ssize_t  same as istcpy() but always place a NUL at the end
  istshift(i1*):char      return the first character and advance the IST by one
  istsplit(i1*,c):ist     return part before <c>, make ist start from <c>
  iststop(i1,c):ist       truncate ist before first occurrence of <c>
  isttest(i1):int         return true if ist is not NULL, false otherwise
  isttrim(i1,n):ist       return ist trimmed to no more than <n> characters
  istzero(i1,n):ist       trim to <n> chars, trailing zero included.


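As an illustration of how a few of these functions combine, the sketch below
splits a "name:value" token in two without any explicit offset arithmetic in
the caller. The bodies of ist2(), ist(), istadv(), iststop() and isteq() are
simplified stand-ins modelled on the one-line descriptions above, not the
actual implementations; in particular, clamping in istadv() is an assumption.
split_header() is a hypothetical helper written for this example.

```c
#include <stddef.h>
#include <string.h>

struct ist {
    char  *ptr;
    size_t len;
};

/* Simplified stand-ins, for illustration only. */
static inline struct ist ist2(const void *ptr, size_t len)
{
    struct ist i = { (char *)ptr, len };
    return i;
}

static inline struct ist ist(const char *s)
{
    return ist2(s, strlen(s));
}

/* advance by <n> characters; assumption: never step past the end */
static inline struct ist istadv(struct ist i, size_t n)
{
    if (n > i.len)
        n = i.len;
    return ist2(i.ptr + n, i.len - n);
}

/* truncate before the first occurrence of <c>, or keep everything */
static inline struct ist iststop(struct ist i, char c)
{
    size_t n = 0;

    while (n < i.len && i.ptr[n] != c)
        n++;
    return ist2(i.ptr, n);
}

static inline int isteq(struct ist a, struct ist b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* Hypothetical helper: split "name:value" into <name> and <value>.
 * Returns 1 on success, 0 if <line> contains no colon.
 */
static inline int split_header(struct ist line, struct ist *name, struct ist *value)
{
    *name = iststop(line, ':');
    if (name->len == line.len)
        return 0;                          /* no delimiter found */
    *value = istadv(line, name->len + 1);  /* skip the name and the colon */
    return 1;
}
```

Both results point into the original buffer; nothing is copied and no
trailing zero is needed.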
3. Quick index by typical C construct or function
-------------------------------------------------

|
Some common C constructs may be adjusted to use ist instead. The mapping is not
|
|
always one-to-one, but usually the computations on the length part tends to
|
|
disappear in the refactoring, allowing to directly chain function calls. The
|
|
entries below are hints to figure what function to look for in order to rewrite
|
|
some common use cases.
|
|
|
|
  char*                 IST equivalent

  strchr()              istchr(), ist_find(), iststop()
  strstr()              istist()
  strcpy()              istcpy()
  strscpy()             istscpy()
  strlcpy()             istscpy()
  strcat()              istcat()
  strscat()             istscat()
  strlcat()             istscat()
  strcmp()              istdiff()
  strdup()              istdup()
  !strcmp()             isteq()
  !strncmp()            istneq(), istmatch(), istnmatch()
  !strcasecmp()         isteqi()
  !strncasecmp()        istneqi(), istmatchi()
  strtok()              istsplit()
  return NULL           return IST_NULL
  s = malloc()          s = istalloc()
  free(s); s = NULL     istfree(&s)
  p != NULL             isttest(p)
  c = *(p++)            c = istshift(p)
  *(p++) = c            __istappend(p, c)
  p += n                istadv(p, n)
  p + strlen(p)         istend(p)
  p[max] = 0            isttrim(p, max)
  p[max+1] = 0          istzero(p, max)
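As a hedged example of such a refactoring, the two functions below perform the
same prefix check, once on NUL-terminated strings and once on IST. The bodies
of ist2(), ist(), istadv() and istmatch() are simplified stand-ins based on
the descriptions in section 2, not the actual implementations, and both
skip_prefix_*() functions are hypothetical helpers written for this example.

```c
#include <stddef.h>
#include <string.h>

struct ist {
    char  *ptr;
    size_t len;
};

/* Simplified stand-ins, for illustration only. */
static inline struct ist ist2(const void *ptr, size_t len)
{
    struct ist i = { (char *)ptr, len };
    return i;
}

static inline struct ist ist(const char *s)
{
    return ist2(s, strlen(s));
}

/* advance by <n> characters; assumption: never step past the end */
static inline struct ist istadv(struct ist i, size_t n)
{
    if (n > i.len)
        n = i.len;
    return ist2(i.ptr + n, i.len - n);
}

/* non-zero if <a> starts like <b>; an empty <b> always matches */
static inline int istmatch(struct ist a, struct ist b)
{
    return a.len >= b.len && (b.len == 0 || memcmp(a.ptr, b.ptr, b.len) == 0);
}

/* char* version: the length is computed and carried around by hand */
static inline const char *skip_prefix_cstr(const char *s, const char *pfx)
{
    size_t n = strlen(pfx);

    return strncmp(s, pfx, n) == 0 ? s + n : NULL;
}

/* IST version: "!strncmp()" becomes istmatch(), "p += n" becomes istadv(),
 * and "return NULL" becomes an unset IST.
 */
static inline struct ist skip_prefix_ist(struct ist s, struct ist pfx)
{
    return istmatch(s, pfx) ? istadv(s, pfx.len) : ist2(NULL, 0);
}
```

The IST version also works on input that is not NUL-terminated, which the
char* version cannot do safely.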