Talk:c/string/byte

From cppreference.com

Conversions to numeric formats

Should "C99" appear after the entries atoll, strtoll, strtoull, strtof, and strtold? Newatthis (talk) 04:39, 9 June 2015 (PDT)

yes; updated --Cubbi (talk) 06:14, 9 June 2015 (PDT)

Display function signatures

Would it not be great to have function signatures displayed instead of just function names in the list? It would be much more informative and would save clicking the link and opening a new tab to check the signature. --Ottoshmidt (talk) 07:57, 2 November 2017 (PDT)

that would be really hard to read. Can you imagine that instead of just

(function)

it would say

char *strtok( char *str, const char *delim );
(until C99)
char *strtok( char *restrict str, const char *restrict delim );
(since C99)
char *strtok_s(char *restrict str, rsize_t *restrict strmax,
     const char *restrict delim, char **restrict ptr);
(2) (since C11)

and then the same for every other function? --Cubbi (talk) 13:11, 2 November 2017 (PDT)

Damn, if it weren't for the overloads with the restrict qualifier, they would actually be easier to read, I think. --Ottoshmidt (talk) 08:21, 3 November 2017 (PDT)
*Overloads*? They are both C functions; there is no overloading. The first version was replaced by the second one in C99, so either (1) or (2) is available, not both, and (3) is another function. I prefer the current style, too. Yaossg (talk) 08:49, 3 November 2017 (PDT)

Change type in examples for ctype.h functions

I think the examples for the <ctype.h> functions would benefit from using int instead of unsigned char.

I understand that the intention is to avoid undefined behavior, but I think it just moves a potential bug elsewhere. These functions are intended to work with EOF, and by casting everything to unsigned char, EOF is converted to an unspecified positive value (although EOF is commonly implemented as -1, the standard only specifies that it must be negative).

Using unsigned char also triggers an implicit conversion warning when it is assigned a multibyte character constant (with the -Wextra flag, and I'd argue that no C code should be compiled without it).

I'll make the necessary edits if you agree this is an acceptable change. --Elarz (talk) 05:42, 30 March 2022 (PDT)

I'm confused, I looked through all the examples and there's no casting anywhere. Taking the current isalnum example for example:
#include <stdio.h>
#include <ctype.h>
#include <locale.h>
 
int main(void)
{
    unsigned char c = '\xdf'; // German letter ß in ISO-8859-1
 
    printf("isalnum('\\xdf') in default C locale returned %d\n", !!isalnum(c));
 
    if(setlocale(LC_CTYPE, "de_DE.iso88591"))
        printf("isalnum('\\xdf') in ISO-8859-1 locale returned %d\n", !!isalnum(c));
}
There's no bug here and no cast, and the other examples are similar. I also see no warning --Ybab321 (talk) 07:14, 30 March 2022 (PDT)
You can get the warning with -Wconversion (both gcc and clang), on initializing the unsigned char with an int (the type of C's character literals). Not sure I agree with always using ints to store chars in C though - if we know the range of values can't include EOF, it shouldn't be int (and if it can, then it should be, like when reading files) --Cubbi (talk) 07:48, 30 March 2022 (PDT)
I wasn't making the point that int should always be used, though, only that it makes more sense than unsigned char in these particular examples because it matches the type of the function parameter and doesn't produce warnings (but I must admit that the list of warnings I enable when writing C could be considered "paranoid" by some people). I agree with you about the range. And testing it should be the way to prevent bugs, not blindly casting things to unsigned char. I also think saving a few bytes isn't a good reason not to use int here; examples should be about clarity, not size optimization. --Elarz (talk) 07:55, 30 March 2022 (PDT)
My bad, the warning appears with -Wsign-conversion which is actually not enabled by -Wextra. In the example you used the warning is: prog.c:7:23: warning: unsigned conversion from 'int' to 'unsigned char' changes value from '-33' to '223' [-Wsign-conversion].
I also think I didn't express myself correctly. I meant that these functions take an int, so there is an implicit cast when we pass them variables of type unsigned char. There is also an implicit cast with unsigned char c = '\xdf'; because the type of multibyte character constants is int too.
My remark about bugs was a general statement about people looking at these examples and thinking that using unsigned char is a way to prevent bugs. It will only prevent bugs that are sometimes caused by undefined behavior, and it will actually introduce bugs at other times (with EOF, as mentioned). --Elarz (talk) 07:51, 30 March 2022 (PDT)
I see I missed the implicit casts. If we were to generalise the example such that c is an int, with the intent of demonstrating how to use the ctype functions where it's not known a priori that it's in the unsigned char range, then we'd write an explicit range check on that int, right? I'm not necessarily against adding error checking to cppreference examples, but it doesn't seem like it would be that valuable for demoing the ctype functions. I think the point at which you would normally check that a character is valid (as in representable in the unsigned char range) would be immediately after reading the character, rather than at the point where you call a ctype function. It would also be reasonable to then store that character in an unsigned char so that you know you don't need to make that check ever again, so it seems reasonable to me to expect unsigned char here.
Regarding warnings, it would arguably look a little silly, but I would be completely fine with making the cast explicit for initialising c, although adding noise into character variable initialisations just because of C's questionable choice of data type for character literals seems somewhat unworthwhile. Compiler doesn't seem to care about the cast when calling isalnum even with -Weverything specified, so that's good at least. --Ybab321 (talk) 13:33, 30 March 2022 (PDT)
I agree with you about the explicit cast, that would be adding noise.
"[...] then we'd write an explicit range check on that int right?" Exactly! So for a little bit of back story, I'm writing string manipulation code and I need it to be safe. Although the descriptions are nice, I found the multiple examples confusing and unhelpful as a whole, and thought I could update them with what I, some guy with an average knowledge of C, wish I had found here. But I understand my personal experience is only a data point and I'm not willing to die on that hill. And since I asked for your opinion and two of you are reluctant to see these changes, that's good enough for me to close the case. Thanks to you both for the feedback. --Elarz (talk) 15:45, 30 March 2022 (PDT)
Part of my confusion is that some examples do use int. Should I edit these to use unsigned char to harmonize the examples? --Elarz (talk) 08:05, 1 April 2022 (PDT)

Maybe. One page I noticed that uses int is isxdigit, which makes sense there because of the loop condition: an unsigned char can obviously never be greater than UCHAR_MAX (although pedantically speaking, char and int could be the same size...), so the int notably shouldn't be unsigned char there --Ybab321 (talk) 08:39, 1 April 2022 (PDT)

Right, and the examples for tolower and toupper use the same kind of loop but with an unsigned char and a < UCHAR_MAX condition instead of <= UCHAR_MAX to prevent an infinite loop. --Elarz (talk) 08:56, 1 April 2022 (PDT)