@audibleid Sorry if you took offense at any of my remarks. None was intended. None whatsoever. When I say "you" I am not addressing "you" personally, but all future readers of this thread. I write a lot of stuff and I guess the style is probably at the terse end of things. I have commented on many of the RegEx patterns over at RegEx library and have offered to moderate over there to weed out the worst of the flagrantly incorrect stuff. I am quite happy to help you improve your RegEx patterns here, so I'll look back from time to time with that in mind. I wasn't aware that you'd already referred to the stuff I wrote earlier. I'll take a look at the earlier posts in the thread.
^[0-9]{0,5}[ ]{0,1}[0-9]{0,6}$
This allows 0 to 5 digits, an optional space, 0 to 6 digits.
As each part can have 0 length, this accepts blank input!
The [ ]{0,1} simplifies to [ ]? here (or to \s? if any whitespace character, not just a space, is acceptable).
The pattern will not match UK 2+8 or 3+7 format numbers but will match numbers with missing digits.
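A quick way to see this for yourself is a rough Python check like the one below. The sample strings are my own, picked to show one blank input, one number with missing digits, and one each of the 2+8 and 3+7 formats written with the usual two spaces.

```python
import re

# The pattern under discussion: every one of its three parts may be empty.
LOOSE = re.compile(r"^[0-9]{0,5}[ ]{0,1}[0-9]{0,6}$")

# Illustrative inputs (not taken from the thread).
samples = ["", "123", "01750 345678", "020 3555 7788", "0121 355 7788"]

results = {s: bool(LOOSE.match(s)) for s in samples}
for s, ok in results.items():
    print(repr(s), "accepted" if ok else "rejected")
```

The blank string and "123" get through, while the two-space 2+8 and 3+7 layouts do not.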
Off the top of my head, a simple pattern such as
^\(?0\d{2}(\)?[\s-]?\d){7,8}$
will allow only the correct number of digits while being flexible with punctuation.
The pattern matches an optional opening parenthesis, a literal 0, two digits (the shortest area code), followed by 7 or 8 groups that each consist of "an optional closing parenthesis, an optional space or hyphen, and a mandatory digit".
This allows 020 3555 7788, (020) 3555 7888, and 016977 3555, (016977) 3555 etc.
It also allows for people to write their number in the wrong format and still be accepted, e.g. 02035 557 788 and this is a good thing.
It also allows stuff like (01750) 3) 4) 5) 6) 7) 8 but that's not much of an issue as the number does contain the right number of digits.
It's the digits that are the important thing.
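Here is a small Python sketch that runs the simple pattern against the examples above. The rejected samples are my own additions, chosen to have too few or too many digits.

```python
import re

# 0, two digits, then 7 or 8 groups of ")? separator? digit".
SIMPLE = re.compile(r"^\(?0\d{2}(\)?[\s-]?\d){7,8}$")

accepted = [
    "020 3555 7788",
    "(020) 3555 7888",
    "016977 3555",
    "(016977) 3555",
    "02035 557 788",             # wrong grouping, right digit count
    "(01750) 3) 4) 5) 6) 7) 8",  # ugly, but the digit count is right
]
rejected = [
    "",                # blank input
    "020 3555",        # far too few digits
    "020 3555 77889",  # one digit too many
]

for s in accepted + rejected:
    print(repr(s), "accepted" if SIMPLE.match(s) else "rejected")
```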
The spaces and punctuation should be dumped before checking that the number is in a valid range. The number should then be stored only as unformatted digits with country code, e.g. 441750345678, and reformatted as +44 1750 345678 or as (01750) 345678 for display.
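As a minimal sketch of that store-then-reformat idea in Python: the function names and the area_len parameter are hypothetical, and the sketch assumes the input starts with either +44 or the 0 trunk code. In practice the area code length would have to come from a lookup of the numbering plan, which is not shown here.

```python
import re

def to_storage_form(raw: str) -> str:
    """Strip punctuation and return bare digits led by the 44 country code."""
    digits = re.sub(r"\D", "", raw)        # drop spaces, brackets, hyphens, '+'
    if raw.lstrip(" (").startswith("+44"):
        return digits                      # already carries the country code
    if digits.startswith("0"):
        return "44" + digits[1:]           # swap the 0 trunk code for 44
    raise ValueError(f"unrecognised prefix: {raw!r}")

def for_display(stored: str, area_len: int, international: bool = False) -> str:
    """Reformat stored digits; area_len (hypothetical) is the area code length."""
    nsn = stored[2:]                       # digits after the 44
    area, subscriber = nsn[:area_len], nsn[area_len:]
    if international:
        return f"+44 {area} {subscriber}"
    return f"(0{area}) {subscriber}"

stored = to_storage_form("(01750) 345-678")
print(stored)
print(for_display(stored, 4, international=True))
print(for_display(stored, 4))
```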
Modifying that RegEx to also allow +44, with optional parentheses and spaces, in place of 0 yields this:
^\(?(\+44\)?\s?\(?|0)\d{2}(\)?[\s-]?\d){7,8}$
If you want to also allow 00 44 and 011 44, with optional parentheses and spaces, then it becomes:
^\(?((0(0|11)\)?\s?\(?|\+)44\)?\s?\(?|0)\d{2}(\)?[\s-]?\d){7,8}$
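A rough Python check of the prefix handling follows. The sample numbers are my own. The 0114 entry is there to show that an ordinary number beginning 0114 still matches through the plain 0 branch rather than being mistaken for the 011 44 prefix.

```python
import re

# The pattern allowing +44, 00 44 and 011 44 in place of the leading 0.
WITH_PREFIXES = re.compile(
    r"^\(?((0(0|11)\)?\s?\(?|\+)44\)?\s?\(?|0)\d{2}(\)?[\s-]?\d){7,8}$"
)

accepted = [
    "01750 345678",
    "+44 1750 345678",
    "(+44) 1750 345678",
    "00 44 1750 345678",
    "011 44 1750 345678",
    "0114 496 0123",      # national number, matched via the plain 0 branch
]
rejected = [
    "+33 1 23 45 67 89",  # wrong country code
    "44 1750 345678",     # country code with no + or dialling prefix
]

for s in accepted + rejected:
    print(repr(s), "accepted" if WITH_PREFIXES.match(s) else "rejected")
```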
The (01750) 3) 4) 5) 6) 7) 8 problem is partially solved using:
^\(?((0(0|11)\)?\s?\(?|\+)44\)?\s?\(?|0)\d{2}(\)?[\s-]?\d){4}([\s-]?\d){3,4}$
or
^\(?((0(0|11)\)?\s?\(?|\+)44\)?\s?\(?|0)[1-357-9]\d(\)?[\s-]?\d){4}([\s-]?\d){3,4}$
limiting the optional closing brackets to only after the 2nd to 5th digits of the NSN.
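Below is a sketch of the bracket-limited variant in Python, with the tail written as ([\s-]?\d){3,4} so that the last three or four digits take only an optional space or hyphen. The samples are mine. Note the last accepted entry: stray brackets after the 2nd to 5th digits still get through, which is why this is only a partial fix.

```python
import re

# Closing brackets are allowed only in the four groups after the first
# two digits of the NSN; the final 3 or 4 digits take a separator only.
LIMITED = re.compile(
    r"^\(?((0(0|11)\)?\s?\(?|\+)44\)?\s?\(?|0)"
    r"[1-357-9]\d(\)?[\s-]?\d){4}([\s-]?\d){3,4}$"
)

accepted = [
    "(01750) 345678",
    "(020) 3555 7788",
    "(016977) 3555",
    "+44 1750 345678",
    "(01750) 3) 4 5 6 7 8",      # still slips through
]
rejected = [
    "(01750) 3) 4) 5) 6) 7) 8",  # brackets too late in the number
    "04123 456789",              # first NSN digit 4 is excluded
]

for s in accepted + rejected:
    print(repr(s), "accepted" if LIMITED.match(s) else "rejected")
```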
The big problem with GB numbers is that the area code length varies from 2 to 5 digits around the country, and the subscriber number can be 8, 7, 6, 5, or 4 digits for a total length of 9 or 10 digits (not including the 0 trunk code). There's a LOT of history as to how it came to be this way.
The valid combinations are 2+8 (in 5 areas), 3+7 (in 12 areas), 3+6 (all 0500 and some 0800 numbers), 4+6 (in ~480 areas), 4+5 (in 40 areas: these 40 areas have a mix of 4+6 and 4+5 numbers), 5+5 (in 12 areas), 5+4 (in one area, and this area has a mix of 5+5 and 5+4 numbers).
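The combinations listed above fit in a small lookup. This Python sketch only checks the two lengths, with the area code given without its leading 0. It does not know which areas use which split, and the function name is hypothetical.

```python
# (area code digits, subscriber digits), area code counted without the 0.
VALID_SPLITS = {(2, 8), (3, 7), (3, 6), (4, 6), (4, 5), (5, 5), (5, 4)}

def is_valid_split(area_code: str, subscriber: str) -> bool:
    """Length-only check; says nothing about whether the area exists."""
    return (len(area_code), len(subscriber)) in VALID_SPLITS

# Every valid split adds up to a 9 or 10 digit number.
totals = sorted({a + s for a, s in VALID_SPLITS})
print(totals)

print(is_valid_split("20", "35557788"))   # 2+8
print(is_valid_split("1750", "345678"))   # 4+6
print(is_valid_split("1750", "3456"))     # 4+4 is not a valid split
```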
Someone else has explained it far better over here:
www.aa-asterisk.org.uk/index.php/Number_format
www.aa-asterisk.org.uk/index.php/01_numbers
www.aa-asterisk.org.uk/index.php/Mixed_areas
www.aa-asterisk.org.uk/index.php/(0)
and other pages.
Regular expressions that attempt to validate and format in one go are doomed to failure. Splitting the task into matching and extraction, cleaning, validation and formatting makes for simpler patterns at each stage and easier management (and fewer bugs!).
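To show what the staged approach might look like, here is a minimal Python sketch. All the function names are hypothetical, it handles only the national 0 form, and the validation stage checks nothing beyond the digit count and the first NSN digit used in the pattern earlier. Each stage stays trivial because it does one job.

```python
import re
from typing import Optional

def clean(raw: str) -> str:
    """Cleaning stage: keep digits only."""
    return re.sub(r"\D", "", raw)

def validate(digits: str) -> bool:
    """Validation stage: 0 trunk code plus a 9 or 10 digit NSN."""
    if not digits.startswith("0"):
        return False
    nsn = digits[1:]
    # First NSN digit limited to 1-3, 5, 7-9, as in the [1-357-9] class.
    return len(nsn) in (9, 10) and nsn[0] in "1235789"

def to_international(digits: str) -> str:
    """Formatting stage: unspaced +44 form (no area code lookup here)."""
    return "+44" + digits[1:]

def process(raw: str) -> Optional[str]:
    digits = clean(raw)
    return to_international(digits) if validate(digits) else None

for raw in ["(01750) 345-678", "01750 3456", "04123 456789"]:
    print(repr(raw), "->", process(raw))
```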