How to test a string for letters only
First, using std::cin >> name will fail if the user enters John Smith, because >> splits input on whitespace characters and so reads only John. You should use std::getline() to get the name:
std::getline(std::cin, name);
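A minimal sketch of the difference, assuming the input comes from any std::istream. read_name and read_word are hypothetical helper names, and a std::istringstream stands in for std::cin so the behavior can be checked without typing:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Reads one whole line, spaces included.
std::string read_name(std::istream& in) {
    std::string name;
    std::getline(in, name);
    return name;
}

// For contrast: operator>> stops at the first whitespace character.
std::string read_word(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}
```

With the input "John Smith", read_name returns the full name while read_word returns only "John", leaving " Smith" unread in the stream.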
Here we go…
There are a number of ways to check that a string contains only alphabetic characters. The simplest is probably s.find_first_not_of(t), which returns the index of the first character in s that is not in t:
bool contains_non_alpha
= name.find_first_not_of("abcdefghijklmnopqrstuvwxyz") != std::string::npos;
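Because find_first_not_of returns an index rather than just a yes/no answer, the same call can also report where the first rejected character sits, for example to build an error message. A small sketch; first_bad_char is a hypothetical name and, like the line above, it accepts only lowercase ASCII letters:

```cpp
#include <string>

// Index of the first character that is not a lowercase ASCII letter,
// or std::string::npos if every character is in the set.
std::string::size_type first_bad_char(const std::string& name) {
    return name.find_first_not_of("abcdefghijklmnopqrstuvwxyz");
}
```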
That rapidly becomes cumbersome, however. To also match uppercase alphabetic characters, you’d have to add 26 more characters to that string! Instead, you may want to use a combination of find_if from the <algorithm> header and std::isalpha from <cctype>:
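#include <algorithm>
#include <cctype>
#include <string>
// Predicate that is true for any character that is not a letter.
struct non_alpha {
    bool operator()(char c) const {
        // Cast first: passing a negative char to std::isalpha is undefined behavior.
        return !std::isalpha(static_cast<unsigned char>(c));
    }
};
bool contains_non_alpha
    = std::find_if(name.begin(), name.end(), non_alpha()) != name.end();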
find_if searches a range for a value that matches a predicate, in this case a functor non_alpha that returns whether its argument is a non-alphabetic character. If find_if(name.begin(), name.end(), ...) returns name.end(), then no match was found.
But there’s more!
To do this as a one-liner, you can use the adaptors from the <functional> header:
#include <algorithm>
#include <cctype>
#include <functional>
// Pre-C++17 only: std::ptr_fun was removed in C++17 and std::not1 in C++20.
bool contains_non_alpha
    = std::find_if(name.begin(), name.end(),
        std::not1(std::ptr_fun((int(*)(int))std::isalpha))) != name.end();
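std::not1 produces a function object that returns the logical inverse of its input; by wrapping a function pointer with std::ptr_fun(...), we can tell std::not1 to produce the logical inverse of std::isalpha. The cast (int(*)(int)) is there to select the overload of std::isalpha which takes an int (treated as a character) and returns an int (treated as a Boolean), rather than the two-argument template declared in <locale>. Note that std::ptr_fun was deprecated in C++11 and removed in C++17, and std::not1 was deprecated in C++17 and removed in C++20, so this form only compiles under older standards.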
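Since std::ptr_fun and std::not1 are gone from current standards, here is a sketch of the closest C++17 equivalent, which swaps in std::not_fn (a different adaptor from the one used above). The hypothetical helper wraps std::isalpha in a lambda instead of taking its address, which C++20 no longer guarantees to work for standard library functions; this also removes the need for the overload-selecting cast:

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>

// C++17 or later. std::not_fn negates any callable, so no ptr_fun is needed.
bool contains_non_alpha_not_fn(const std::string& name) {
    // Each char converts to unsigned char on the call, keeping std::isalpha's
    // argument in the valid range even for bytes above 0x7F.
    auto is_alpha = [](unsigned char c) { return std::isalpha(c) != 0; };
    return std::find_if(name.begin(), name.end(), std::not_fn(is_alpha)) != name.end();
}
```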
Or, if you can use a C++11 compiler, using a lambda cleans this up a lot:
#include <algorithm>
#include <cctype>
bool contains_non_alpha
    = std::find_if(name.begin(), name.end(),
        [](char c) { return !std::isalpha(static_cast<unsigned char>(c)); }) != name.end();
[](char c) -> bool { ... } denotes an anonymous function (a lambda) that accepts a character and returns a bool. In our case we can omit the -> bool return type because the function body consists of only a return statement. This works just the same as the previous examples, except that the function object can be specified much more succinctly.
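C++11 also added std::all_of, which states the intent (every character is a letter) directly instead of comparing against name.end(). A sketch with a hypothetical is_all_alpha helper; note that all_of returns true for an empty range, so an empty name passes, which matches the find_if versions reporting no non-letters:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// True when every character is a letter in the current C locale
// (an empty string counts as "all letters").
bool is_all_alpha(const std::string& s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isalpha(c) != 0; });
}
```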
And (almost) finally…
In C++11 you can also use a regular expression to perform the match:
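#include <regex>
#include <string>
// regex_match must match the whole string, so no ^/$ anchors are needed.
// '*' (rather than '+') treats the empty string like the find_if versions do.
bool contains_non_alpha
    = !std::regex_match(name, std::regex("[A-Za-z]*"));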
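Constructing a std::regex is comparatively expensive, so if the check runs often it may be worth building the pattern once. A sketch, assuming a hypothetical contains_non_alpha_regex function; the pattern uses * so that an empty string is not reported as containing non-letters:

```cpp
#include <regex>
#include <string>

bool contains_non_alpha_regex(const std::string& name) {
    // Compiled on the first call only; initialization of a function-local
    // static is thread-safe since C++11.
    static const std::regex letters("[A-Za-z]*");
    // regex_match succeeds only if the whole string matches the pattern.
    return !std::regex_match(name, letters);
}
```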
But of course…
None of these solutions addresses the issue of locale or character encoding! For a locale-aware version of isalpha(), you’d need to use the C++ header <locale>:
#include <locale>
bool isalpha(char c) {
    std::locale locale; // Copy of the current global C++ locale.
    return std::use_facet<std::ctype<char> >(locale).is(std::ctype<char>::alpha, c);
}
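If you would rather pass the locale explicitly than rely on the global one, <locale> also declares a two-argument std::isalpha(c, loc) that does the facet lookup for you. A sketch with a hypothetical helper; the check below uses std::locale::classic() because named locales such as "en_US.UTF-8" may not be installed, and constructing a missing one throws std::runtime_error:

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Classifies each char with the ctype<char> facet of the given locale.
bool contains_non_alpha_in(const std::string& name, const std::locale& loc) {
    return std::find_if(name.begin(), name.end(),
                        [&loc](char c) { return !std::isalpha(c, loc); }) != name.end();
}
```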
Ideally we would use char32_t, but the standard library is only required to provide ctype facets for char and wchar_t, so we’re stuck with char. Luckily, we can dance around the issue of locale entirely, because you’re probably only interested in English letters. There’s a handy header-only library called UTF8-CPP that will let us do what we need in a more encoding-safe way. First we define our version of isalpha() that uses UTF-32 code points:
#include <cstdint>
// True only for the ASCII letters A-Z (0x41-0x5A) and a-z (0x61-0x7A).
bool isalpha(uint32_t c) {
    return (c >= 0x0041 && c <= 0x005A)
        || (c >= 0x0061 && c <= 0x007A);
}
Then we can use the utf8::iterator adaptor to adapt the basic_string::iterator from octets into UTF-32 code points:
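#include <algorithm>
#include <string>
#include <utf8.h>
typedef utf8::iterator<std::string::iterator> utf8_iterator;
// The checked iterator needs the whole range to validate each sequence.
utf8_iterator first(name.begin(), name.begin(), name.end());
utf8_iterator last(name.end(), name.begin(), name.end());
// Compare against the adapted end iterator, not name.end(): the types differ.
bool contains_non_alpha
    = std::find_if(first, last, [](uint32_t c) { return !isalpha(c); }) != last;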
For slightly better performance at the cost of safety, you can use utf8::unchecked::iterator:
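#include <algorithm>
#include <string>
#include <utf8.h>
typedef utf8::unchecked::iterator<std::string::iterator> unchecked_iterator;
unchecked_iterator first(name.begin()), last(name.end());
bool contains_non_alpha
    = std::find_if(first, last, [](uint32_t c) { return !isalpha(c); }) != last;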
This performs no validation, so invalid UTF-8 input can produce wrong code points or even read past the end of the string.
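For readers who cannot add a dependency, here is a rough sketch of what the iterator adaptors do: decode each UTF-8 sequence into a code point, then test it. Everything here is hypothetical (next_code_point and is_ascii_letter are not UTF8-CPP functions), and like utf8::unchecked::iterator it assumes well-formed input:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes the code point starting at s[i] and advances i
// past it. No validation; it only stops early at the end of the string
// instead of reading past it.
std::uint32_t next_code_point(const std::string& s, std::size_t& i) {
    unsigned char lead = static_cast<unsigned char>(s[i++]);
    int extra;            // number of continuation bytes that follow
    std::uint32_t cp;
    if (lead < 0x80)      { extra = 0; cp = lead; }
    else if (lead < 0xE0) { extra = 1; cp = lead & 0x1F; }
    else if (lead < 0xF0) { extra = 2; cp = lead & 0x0F; }
    else                  { extra = 3; cp = lead & 0x07; }
    for (; extra > 0 && i < s.size(); --extra)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
    return cp;
}

// Same ASCII-only test as the isalpha(uint32_t) above.
bool is_ascii_letter(std::uint32_t c) {
    return (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A);
}

bool contains_non_alpha_utf8(const std::string& name) {
    for (std::size_t i = 0; i < name.size(); )
        if (!is_ascii_letter(next_code_point(name, i)))
            return true;
    return false;
}
```

Here the two bytes C3 A9 decode to the single code point U+00E9 (é), which the ASCII-only test rejects, instead of two separate bytes being classified individually.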
Using UTF8-CPP in this way assumes that the host encoding is UTF-8, or a compatible encoding such as ASCII. In theory this is still an imperfect solution, but in practice it will work on the vast majority of platforms.
I hope this answer is finally complete!
STL way:
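#include <algorithm>
#include <iostream>
#include <string>
// Remembers whether every character seen so far is an ASCII letter.
struct TestFunctor
{
    bool stringIsCorrect;
    TestFunctor()
        : stringIsCorrect(true)
    {}
    void operator() (char ch)
    {
        if (stringIsCorrect && !((ch <= 'z' && ch >= 'a') || (ch <= 'Z' && ch >= 'A')))
            stringIsCorrect = false;
    }
};
// std::for_each takes the functor by value and returns the updated copy;
// checking the original object afterwards would always print "Yay".
TestFunctor functor = std::for_each(name.begin(), name.end(), TestFunctor());
if (functor.stringIsCorrect)
    std::cout << "Yay";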
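The result of std::for_each has to be used because the algorithm receives its functor by value and hands back the final copy; the object you pass in never sees the updates. A small sketch that makes this visible, using a hypothetical AsciiLetterCheck struct with the same logic as TestFunctor:

```cpp
#include <algorithm>
#include <string>
#include <utility>

struct AsciiLetterCheck {
    bool ok = true;
    void operator()(char ch) {
        if (!((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')))
            ok = false;
    }
};

// Returns {state of the caller's object, state of the copy for_each returns}.
std::pair<bool, bool> demo(const std::string& s) {
    AsciiLetterCheck check;
    AsciiLetterCheck result = std::for_each(s.begin(), s.end(), check);
    return {check.ok, result.ok};
}
```

For "a1", check.ok is still true while result.ok is false, which is exactly the trap in testing the original functor object.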
If you use Boost, you can use the boost::algorithm::is_alpha predicate to perform this check. Here is how to use it:
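#include <boost/algorithm/string.hpp>  // boost::algorithm::all, is_alpha
const char* text = "hello world";
// all() applies the per-character is_alpha() predicate to every element.
bool isAlpha = boost::algorithm::all(text, boost::algorithm::is_alpha());  // false: ' ' is not a letter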
Update: As the documentation states, "all() checks all elements of a container to satisfy a condition specified by a predicate". The call to all() is needed here, since is_alpha() actually operates on characters.
Hope I helped.