Parsing a COBOL word

Started by Steven White on 6-Jul-2018/14:54:45-7:00
I am reviewing some old code in light of some new understanding of parsing. I have a brute-force function that checks a string to see if it is a valid COBOL word, that is, starts with a letter, contains only letters, digits, or the hyphen, and is no more than 30 characters long. I am trying to perform that check using parse, and I think I have got it, except for the length restriction of 30 characters. I am wondering if a parse rule can check the parsed data for its length, or if that check is best done elsewhere. In other words, check for the maximum length, and THEN parse it for valid characters. Thank you. Code sample follows.

    R E B O L [Title: "Test for COBOL word"]

    LETTER: charset [#"A" - #"Z"]
    DIGIT: charset [#"0" - #"9"]
    VALIDCHARACTER: [some LETTER | some DIGIT | #"-"]
    COBOLWORD: [1 LETTER some VALIDCHARACTER]

    print parse "123456" COBOLWORD ;; should be false; starts with number
    print parse "ABCDEF" COBOLWORD ;; should be true; all letters
    print parse "A-1-STEAK-SAUCE" COBOLWORD ;; should be true; starts with letter
    print parse "4runner" COBOLWORD ;; should be false; starts with number
    print parse "$ average" COBOLWORD ;; should be false; invalid character
    print parse "A----BCDE" COBOLWORD ;; should be true; multiple - allowed

    halt
Try this:

    COBOLWORD: [1 LETTER and [not 30 VALIDCHARACTER] some VALIDCHARACTER ]
When you say `<integerA> <integerB> rule` that means "between A and B matches of the rule".

    COBOLWORD: [ 1 LETTER 1 29 [LETTER | DIGIT | "-"] ]

I would skip the separate definition of VALIDCHARACTER; it doesn't seem necessary. (The name is confusing: it's valid, but not at the beginning, so it should be called VALIDNOTFIRSTCHARACTER or something. Not naming it seems best.)

Because I think one of the big points of the language is aesthetics over micro-optimization, I think parse rules should use string literals where possible. Character literals are ugly. Matching a single-character string is slightly slower, but there's no very compelling reason why it should be to any great extent. YMMV.
That would have to become

    COBOLWORD: [ 1 LETTER 0 29 [LETTER | DIGIT | "-"] ]

because a single letter is also perfectly valid as a variable name.
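For anyone who wants to try both approaches from the original question, here is a minimal sketch, not a definitive implementation. It assumes REBOL 2, where parse/all is needed so that spaces are not skipped. The function names cobol-word? and cobol-word-by-length? are made up for illustration, and the test words are kept upper-case because LETTER only covers A to Z. The first function puts the length limit inside the rule using the range count from the replies above; the second checks the length first and THEN parses for valid characters.

```rebol
REBOL [Title: "COBOL word check (sketch)"]

LETTER: charset [#"A" - #"Z"]
DIGIT:  charset [#"0" - #"9"]

; One leading letter, then 0 to 29 further valid characters,
; so at most 30 characters in total.
COBOLWORD: [LETTER 0 29 [LETTER | DIGIT | "-"]]

; Hypothetical helper: the length limit lives inside the parse rule.
cobol-word?: func [word [string!]] [
    parse/all word COBOLWORD
]

; Hypothetical helper: check the length first, THEN parse the characters.
cobol-word-by-length?: func [word [string!]] [
    to-logic all [
        30 >= length? word
        parse/all word [LETTER any [LETTER | DIGIT | "-"]]
    ]
]

print cobol-word? "A"                ;; should be true; a single letter is valid
print cobol-word? "123456"           ;; should be false; starts with number
print cobol-word? "A-1-STEAK-SAUCE"  ;; should be true; starts with letter
print cobol-word? "$ average"        ;; should be false; invalid character

; Build a 30-character word: "A" followed by 29 "B" characters.
long: copy "A"
insert/dup tail long #"B" 29
print cobol-word? long               ;; should be true; exactly 30 characters

append long #"B"
print cobol-word? long               ;; should be false; 31 characters
print cobol-word-by-length? long     ;; should be false; 31 characters
```

Either form should give the same answers; keeping the limit in the rule means the whole definition of a COBOL word sits in one place.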
