# Valid Number - By DFA

## Introduction to the Problem

The question asks the programmer to validate whether a string is a valid representation of a number. After some trials, we find that the question accepts a few formats:

- pure integer
- real number (decimal representation), including omitted zero, for example “.5”, “12.”
- scientific number, that looks like “{Real Number}e{Integer}”
- integer and real number can be signed
- ignore any surrounding white spaces

In order to solve the problem in linear time, most solutions set a few flags. This is smple, and quite efficient in both time and space. In fact, I don’t really consider this problem qualified for *hard*.

Just to add some fun, this problem can be solved using a very textbook DFA. The code is elegant, less space efficient (but only for a constant amount) than the flag algorithm. In fact, the latter is just a compact, specialized DFA in essense. The trade off is more variables and branching.

## The Algorithm

The first step, we trim the string. There is probably not efficient in bringing white space processing into the DFA.

```
static inline string trim(string s)
{
if (s.size() == 0) return s;
char *c = &(s[0]), // starting pointer
*d = c + s.length() - 1; // ending pointer
while (*c != '\0' && isspace(*c)) c++;
while (d != c && isspace(*d)) d--;
*(d + 1) = '\0';
return string(c);
}
```

### DFA - States

Any accepted string can be divided into a few parts:

- before decimal, the left part of a real number
- decimal
- after decimal, the right part of a real number
- e character
- the exponent, the right to ‘e’

Any part could potentially empty. The (1) and (5) could potentially be preceded with a sign (‘+’ or ‘-‘).

Then, we can collect all states based on the grammar:

state | meaning |
---|---|

`START` | starting state |

`REALLEFT` | before encountering any decimal |

`DOT` | encounter a regular decimal |

`E` | encounter an ‘e’ |

`REALRIGHT` | have encountered a decimal |

`DOT_E` | have encountered a decimal whose left is omitted |

`ERIGHT` | have encountered ‘e’, and is therefore part of the exponent |

`SIGN1` | the sign on the left of the ‘e’ |

`SIGN2` | the sign on the right of the ‘e’ |

`FAULT` | the faulty state |

The starting state is `START`

.

`FAULT`

is a special state, that the DFA will halt whenever it meets `FAULT`

, so that we do not have to process the rest of the string.

### DFA - Transition

The transition function is a matrix that maps (state, input char) to state.

states | DIGIT | SIGN | DOT | E | NDIGIT |
---|---|---|---|---|---|

START | REALLEFT | SIGN1 | DOT_E | FAULT | FAULT |

REALLEFT | REALLEFT | FAULT | DOT | E | FAULT |

DOT | REALRIGHT | FAULT | FAULT | E | FAULT |

E | ERIGHT | SIGN2 | FAULT | FAULT | FAULT |

REALRIGHT | REALRIGHT | FAULT | FAULT | E | FAULT |

DOT_E | REALRIGHT | FAULT | FAULT | FAULT | FAULT |

ERIGHT | ERIGHT | FAULT | FAULT | FAULT | FAULT |

SIGN1 | REALLEFT | FAULT | DOT_E | FAULT | FAULT |

SIGN2 | ERIGHT | FAULT | FAULT | FAULT | FAULT |

### DFA - Termination

The DFA will terminate when

- input is deplete
- state is
`FAULT`

The string will be accepted if the termination state is one of following:

`REALLEFT`

`REALRIGHT`

`ERIGHT`

`DOT`

In all other cases, the format is somewhat faulty. For example, if the DFA ended at `E`

, then the string looks like `"{some number}e"`

, which is not acceptable.

The running of the DFA is easy. Simply iterate over the input, and let the transition matrix do its magic.

## Complete Code:

```
class Solution {
static inline string trim(string s)
{
if (s.size() == 0) return s;
char *c = &(s[0]),
*d = c + s.length() - 1;
while (*c != '\0' && isspace(*c))
c++;
while (d != c && isspace(*d))
d--;
*(d + 1) = '\0';
return string(c);
}
int START = 0;
int REALLEFT = 1;
int DOT = 2;
int E = 3;
int REALRIGHT = 4;
int DOT_E = 5;
int ERIGHT = 6;
int SIGN1 = 7;
int SIGN2 = 8;
int FAULT = -1;
int DIGIT = 0;
int NDIGIT = 4;
int SIGN = 1;
int G[45] =
{// DIGIT SIGN DOT E NDIGIT
REALLEFT, SIGN1, DOT_E, FAULT, FAULT,
REALLEFT, FAULT, DOT, E, FAULT,
REALRIGHT, FAULT, FAULT, E, FAULT,
ERIGHT, SIGN2, FAULT, FAULT, FAULT,
REALRIGHT, FAULT, FAULT, E, FAULT,
REALRIGHT, FAULT, FAULT, FAULT, FAULT,
ERIGHT, FAULT, FAULT, FAULT, FAULT,
REALLEFT, FAULT, DOT_E, FAULT, FAULT,
ERIGHT, FAULT, FAULT, FAULT, FAULT
};
public:
bool isNumber(string s) {
s = trim(s);
if (s.size() == 0) return false;
char prev = '\0';
int state = START;
for (auto c : s)
{
if (state == FAULT) return false;
int ch = (isdigit(c)) ? DIGIT
: (c == '.') ? DOT
: (c == 'e') ? E
: (c == '+') ? SIGN
: (c == '-') ? SIGN
: NDIGIT;
state = G[state * 5 + ch];
}
if (state == E || state == DOT_E || state == SIGN1 || state == SIGN2 || state == START || state == FAULT)
return false;
return true;
}
};
```

Power Snail COMSCI

computer science algorithm leetcode C++