Parsing with to

Started by Andyh on 25-Jan-2012/23:46:28-8:00
I'm building a web scraping application. When modeled like this, it works as I expect:

    p: " stuff <h1> a header </h1> <p> some words </p> <p> more words </p> whatever "
    chars: charset [ #"a" - #"z" ]
    h1: [ <h1> copy title any chars (print title) </h1> ]
    para: [ <p> copy ptext any chars (print ptext) </p> ]
    r: parse p [ any chars h1 some para any chars end ]
    print r

Output is:

    a header
    some words
    more words
    true

If, instead, I do something similar with "to", I get an error:

    p: " stuff <h1> a header </h1> <p> some words </p> <p> more words </p> whatever "
    header: [ <h1> copy title to </h1> (print title) ]
    para: [ <p> copy ptext to </p> (print ptext) ]
    r: parse p [ to header [some para ] to end ]
    print r

    ** Script Error: Invalid argument: <h1> copy title to </h1> print title
    ** Near: r: parse p [to header [some para] to end] print

Can someone explain why "to" behaves so differently than "any chars"? I'm sure that I'm missing something. Thanks for your help.
This is because string parsing and block parsing are totally different. Take a look at the Core manual for details. In string parsing the input is made of characters; in block parsing it is made of REBOL values (words, numbers, blocks).
TO cannot accept a rule as an argument. You could use this instead:

    r: parse p [ to <h1> header [some para ] to end ]
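For anyone following along, here is a minimal sketch of a complete "to"-based version, assuming REBOL 2 string parsing. It is not from the posts above. It relies on the fact that "to" stops just before its target, so each closing tag still has to be matched explicitly after the copy. The /all refinement, the leading "to <p>" inside para, and the extra closing-tag matches are my own additions for illustration.

```rebol
REBOL []

p: " stuff <h1> a header </h1> <p> some words </p> <p> more words </p> whatever "

; "to </h1>" leaves the input position AT the closing tag,
; so the tag is matched right after the copy to step past it.
header: [<h1> copy title to </h1> </h1> (print title)]

; "to <p>" skips whatever sits between the previous closing tag
; and the next paragraph (assumption: only the paragraphs matter here).
para: [to <p> <p> copy ptext to </p> </p> (print ptext)]

; "to" is given a literal tag, never a rule block.
; /all makes PARSE treat whitespace as ordinary characters.
r: parse/all p [to <h1> header some para to end]
print r
```

Because of /all, the copied strings keep the spaces around the text; wrapping them as "print trim title" would strip those.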
Thanks a bunch Doc. I think I've got it. Now I'll try some real web pages!
Happy scraping! That is what made me want to learn REBOL in the first place, twelve years ago. ;-)
