I've been tasked with writing a program that reads a string's format followed by the actual string from a file, tests whether the string adheres to the format, and attempts to reformat the string if possible. The syntax for the format is such that [(a-z),(0-9)@(A-Z),(0-9)] means that the string consists of lowercase letters between a and z and digits between 0 and 9, followed by an @ sign, followed by capital letters between A and Z and digits between 0 and 9.
I'm aware of the policy concerning homework questions and this is indeed one of those, but I'm completely stumped and would appreciate a nod in the right direction. Thank you in advance.
I should clarify that I am aware of how to format a string given most criteria; my problem is interpreting the correct format from the format string. Also, while the format syntax resembles that of a regular expression, I am only to use the default string library.
That's relatively simple. The lowercase letters are in the ASCII range 97-122, the digits are in the range 48-57, and the capital letters are in the range 65-90. The @ is 64. See http://www.asciitable.com/ for reference.
You iterate over the string in two loops. In the first, you check that each character is a lowercase letter or a digit. When you hit an @, you quit the loop and enter the second one. In that one, you check that each character is a capital letter or a digit. If in the first loop you hit the end of the string before finding an @, the string doesn't abide by the format. Similarly, it fails if you find anything other than lowercase letters or digits before the @, or anything other than capital letters or digits after it.
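A minimal sketch of that two-loop check, for the one fixed format from the question only. The function name matchesFixedFormat is made up for illustration, and the sketch assumes that an empty run before or after the @ is acceptable, which the assignment may not allow.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: checks the single fixed format
// [(a-z),(0-9)@(A-Z),(0-9)], treating letters and digits on each
// side of the '@' as freely mixed.
bool matchesFixedFormat(const std::string& s) {
    std::size_t i = 0;

    // First loop: lowercase letters or digits, up to the '@'.
    while (i < s.size() && s[i] != '@') {
        char c = s[i];
        bool ok = (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
        if (!ok) return false;
        ++i;
    }

    // Reached the end without finding an '@': format not met.
    if (i == s.size()) return false;
    ++i; // skip the '@'

    // Second loop: capital letters or digits until the end.
    for (; i < s.size(); ++i) {
        char c = s[i];
        bool ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
        if (!ok) return false;
    }
    return true;
}
```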
As to the correction: if it's a letter, you can just convert it to lowercase or uppercase as appropriate. Other characters probably can't be converted unless you've been given further specifications for such conversions.
Hanst99, thanks for the reply, it is much appreciated, but knowing how to check and format the string wasn't really my concern. My problem is having the program read the format string and then determine what the actual string should look like. What you outlined are the steps that should be taken after the program has made that determination, based on the example that I gave.
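For the case conversion, something along these lines might do, again only for the fixed example format. The name fixCase is hypothetical, and using std::tolower and std::toupper from <cctype> is an assumption about what the assignment permits. Digits and other non-letters pass through unchanged.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercases letters before the first '@' and
// uppercases letters after it. Non-letters are left as they are.
std::string fixCase(std::string s) {
    bool afterAt = false;
    for (char& c : s) {
        if (c == '@' && !afterAt) {
            afterAt = true;
            continue;
        }
        // Cast to unsigned char first: passing a negative char value
        // to the <cctype> functions is undefined behaviour.
        unsigned char uc = static_cast<unsigned char>(c);
        c = static_cast<char>(afterAt ? std::toupper(uc) : std::tolower(uc));
    }
    return s;
}
```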
The lowercase letters are in the ASCII range 97-122, the digits are in the range 48-57, and the capital letters are in the range 65-90
I would put it this way, which I think is more readable: lowercase letters are in the range 'a' to 'z', digits are in the range '0' to '9', and uppercase letters are in the range 'A' to 'Z'. This way, it doesn't matter what the underlying ASCII code is, as it's just an implementation detail. And you get nice, readable code, instead of magic numbers.
That said, why not use islower(), isdigit() and isupper()?
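As a rough illustration, the two character tests could be wrapped like this. The helper names are made up, and the sketch assumes the default "C" locale, where these functions classify plain ASCII letters and digits.

```cpp
#include <cctype>

// Hypothetical helpers built on the <cctype> classification functions.
// The cast to unsigned char avoids undefined behaviour for negative
// char values.
bool isLowerOrDigit(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return std::islower(uc) || std::isdigit(uc);
}

bool isUpperOrDigit(char c) {
    unsigned char uc = static_cast<unsigned char>(c);
    return std::isupper(uc) || std::isdigit(uc);
}
```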
@Amakusa66: does (a-z) mean any number of characters from a to z? From the format, it doesn't look like it, but your explanation says "letters" and "digits".
Yes, it means any number of characters as long as they fall within the range a to z. Just to reiterate, if I only had to format a string based on a fixed format, then I'd have no problem. The problem lies in having the program interpret the format criteria and then format the string based on those criteria.
I'd start like this: write a function that takes a format string and a string to be checked as arguments. Then parse the format string.
The first character must be a left bracket.
Once we're inside, we generalize:
- If the next character we find is a left paren, what follows needs to be a character x, a '-', another character y (y > x) and a right paren. As you check, store those two characters. Then you can initialize an iterator to the string to be checked and keep incrementing it until the character it points to isn't in that range. If the first character you check isn't in the range, signal an error and get out. Back in the format string, we check for the next character. If it's a comma, it must be followed by a left paren or the closing bracket. Of course, if it's a left paren we repeat what we just did, and if it's anything else we signal an error.
- If the next character is not a left paren or a closing bracket, (I'm assuming the format doesn't only work with '@'), our iterator must point to a character holding the same value.
Repeat until it's done.
That's the general idea. You probably have parsed text before, since you got this assignment, so you must have an idea of how to accomplish what I described. You can add the attempt to reformat the string after you get the checking to work.
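To make that concrete, here is a rough sketch of the checking part, not a finished solution. The function name matchesFormat is made up. It assumes every range group must match at least one character, and it treats comma-separated groups as runs that follow one another in order. If the assignment means letters and digits may be mixed freely, the ranges between literals would have to be collected and tested together instead.

```cpp
#include <cstddef>
#include <string>

// Hypothetical function: returns true if `text` matches `format`,
// where `format` looks like "[(a-z),(0-9)@(A-Z),(0-9)]".
bool matchesFormat(const std::string& format, const std::string& text) {
    // The format must be wrapped in square brackets.
    if (format.size() < 2 || format.front() != '[' || format.back() != ']')
        return false;

    const std::size_t end = format.size() - 1; // index of the closing ']'
    std::size_t f = 1;                         // position in the format
    std::size_t t = 0;                         // position in the text

    while (f < end) {
        char c = format[f];
        if (c == '(') {
            // Expect exactly "(x-y)" with y > x.
            if (f + 4 >= end || format[f + 2] != '-' || format[f + 4] != ')')
                return false;
            char lo = format[f + 1];
            char hi = format[f + 3];
            if (!(lo < hi)) return false;

            // Consume a run of characters inside [lo, hi].
            std::size_t start = t;
            while (t < text.size() && text[t] >= lo && text[t] <= hi) ++t;
            if (t == start) return false; // first character not in range

            f += 5;
            // A comma must be followed by '(' or the closing bracket.
            if (f < end && format[f] == ',') {
                ++f;
                if (f < end && format[f] != '(') return false;
            }
        } else {
            // Any other character is a literal that must appear as-is.
            if (t >= text.size() || text[t] != c) return false;
            ++t;
            ++f;
        }
    }
    // The whole text must have been consumed.
    return t == text.size();
}
```

Under these assumptions, matchesFormat("[(a-z),(0-9)@(A-Z),(0-9)]", "abc123@XYZ789") returns true, while "abc123XYZ789" fails on the missing '@'.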
Wait, so your assignment was to write a parser for regular expressions (this IS a regular expression, just with a syntax different from the usual Perl-like one)? Sorry, completely misread it lol.
@hanst99, No problem, the fault is mine actually, my post was a bit convoluted because my programming lingo isn't yet up to scratch.
@filipe, Thanks a lot for the advice :). Oh, and no, we weren't taught anything close to the level of what is needed for the assignment, but alas, that's what textbooks and internet forums are for lol.