Valid Number - By DFA
Introduction to the Problem ##
The question asks the programmer to validate whether a string is a valid representation of a number. After some trials, we find that the question accepts a few formats:
- pure integer
- real number (decimal representation), including omitted zero, for example “.5”, “12.”
- scientific number, that looks like “{Real Number}e{Integer}”
- integer and real number can be signed
- ignore any surrounding white spaces
In order to solve the problem in linear time, most solutions set a few flags. This is simple, and quite efficient in both time and space. In fact, I don’t really consider this problem qualified for hard.
Just to add some fun, this problem can be solved using a very textbook DFA. The code is elegant, less space efficient (but only for a constant amount) than the flag algorithm. In fact, the latter is just a compact, specialized DFA in essence. The trade-off is more variables and branching.
The Algorithm ##
The first step, we trim the string. There is probably not efficient in bringing white space processing into the DFA.
static inline string trim(string s)
{
if (s.size() == 0) return s;
char *c = &(s[0]), // starting pointer
*d = c + s.length() - 1; // ending pointer
while (*c != '\0' && isspace(*c)) c++;
while (d != c && isspace(*d)) d--;
*(d + 1) = '\0';
return string(c);
}
DFA - States ###
Any accepted string can be divided into a few parts:
- before decimal, the left part of a real number
- decimal
- after decimal, the right part of a real number
- e character
- the exponent, the right to ’e'
Any part could potentially empty. The (1) and (5) could potentially be preceded with a sign (’+’ or ‘-’).
Then, we can collect all states based on the grammar:
state | meaning |
---|---|
START |
starting state |
REALLEFT |
before encountering any decimal |
DOT |
encounter a regular decimal |
E |
encounter an ’e’ |
REALRIGHT |
have encountered a decimal |
DOT_E |
have encountered a decimal whose left is omitted |
ERIGHT |
have encountered ’e’, and is therefore part of the exponent |
SIGN1 |
the sign on the left of the ’e' |
SIGN2 |
the sign on the right of the ’e' |
FAULT |
the faulty state |
The starting state is START
.
FAULT
is a special state, that the DFA will halt whenever
it meets FAULT
, so that we do not have to process the
rest of the string.
DFA - Transition ###
The transition function is a matrix that maps (state, input char) to state.
states | DIGIT | SIGN | DOT | E | NDIGIT |
---|---|---|---|---|---|
START | REALLEFT | SIGN1 | DOT_E | FAULT | FAULT |
REALLEFT | REALLEFT | FAULT | DOT | E | FAULT |
DOT | REALRIGHT | FAULT | FAULT | E | FAULT |
E | ERIGHT | SIGN2 | FAULT | FAULT | FAULT |
REALRIGHT | REALRIGHT | FAULT | FAULT | E | FAULT |
DOT_E | REALRIGHT | FAULT | FAULT | FAULT | FAULT |
ERIGHT | ERIGHT | FAULT | FAULT | FAULT | FAULT |
SIGN1 | REALLEFT | FAULT | DOT_E | FAULT | FAULT |
SIGN2 | ERIGHT | FAULT | FAULT | FAULT | FAULT |
DFA - Termination ###
The DFA will terminate when
- input is depleted
- state is
FAULT
The string will be accepted if the termination state is one of following:
REALLEFT
REALRIGHT
ERIGHT
DOT
In all other cases, the format is somewhat faulty. For example, if the
DFA ended at E
, then the string looks like
"{some number}e"
, which is not acceptable.
The running of the DFA is easy. Simply iterate over the input, and let the transition matrix do its magic.
Complete Code: ##
class Solution {
static inline string trim(string s)
{
if (s.size() == 0) return s;
char *c = &(s[0]),
*d = c + s.length() - 1;
while (*c != '\0' && isspace(*c))
c++;
while (d != c && isspace(*d))
d--;
*(d + 1) = '\0';
return string(c);
}
int START = 0;
int REALLEFT = 1;
int DOT = 2;
int E = 3;
int REALRIGHT = 4;
int DOT_E = 5;
int ERIGHT = 6;
int SIGN1 = 7;
int SIGN2 = 8;
int FAULT = -1;
int DIGIT = 0;
int NDIGIT = 4;
int SIGN = 1;
int G[45] =
{// DIGIT SIGN DOT E NDIGIT
REALLEFT, SIGN1, DOT_E, FAULT, FAULT,
REALLEFT, FAULT, DOT, E, FAULT,
REALRIGHT, FAULT, FAULT, E, FAULT,
ERIGHT, SIGN2, FAULT, FAULT, FAULT,
REALRIGHT, FAULT, FAULT, E, FAULT,
REALRIGHT, FAULT, FAULT, FAULT, FAULT,
ERIGHT, FAULT, FAULT, FAULT, FAULT,
REALLEFT, FAULT, DOT_E, FAULT, FAULT,
ERIGHT, FAULT, FAULT, FAULT, FAULT
};
public:
bool isNumber(string s) {
s = trim(s);
if (s.size() == 0) return false;
char prev = '\0';
int state = START;
for (auto c : s)
{
if (state == FAULT) return false;
int ch = (isdigit(c)) ? DIGIT
: (c == '.') ? DOT
: (c == 'e') ? E
: (c == '+') ? SIGN
: (c == '-') ? SIGN
: NDIGIT;
state = G[state * 5 + ch];
}
if (state == E || state == DOT_E || state == SIGN1 || state == SIGN2 || state == START || state == FAULT)
return false;
return true;
}
};