Scraping earnings.com

I am trying to scrape a page for earnings details. The target page may have two sections, "Earnings Releases-Confirmed" and "-Proposed". "Confirmed" comes first, and I have a pattern which finds that.
Further down the page you hit the "...Proposed" section. At that point I want to change my CONFPROP variable to "Proposed" and keep extracting to the end, so that every detail line goes into the database with the CONFPROP value of its section.

I have scrapeable files for the CONFPROP and for the details, but I need them to work together for the session so I wind up with a database like this:

Confirmed, .....data from detail lines ....
. .......................................................
.
..........................................................
Proposed, ....data from detail lines ...
...........................................................
.
..........................................................
End

Scraping earnings.com

Thanks. I got the thing working by using two patterns, one to get the stuff between "Confirmed" and "Proposed" and another to get the stuff from "Proposed" to the end of the page, plus a couple of one-liners to set the session variable, CONFPROP. Yes, I had to change to the Pro version to be able to get to the data objects. Other than minor pattern tweaks, it's ready to roll. Now I can look at automating it and having it dump the stuff into my db. Let's hope I find a few odds-improving markers for earnings response.
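
For anyone who wants to try the same two-pattern idea outside the scraping tool, here is a rough sketch in plain Python. It is not the extractor patterns I actually used. The heading text in the regexes, the function names split_sections and rows, and the detail pattern you pass in are all assumptions for illustration.

```python
import re

# Assumed heading text; adjust to whatever the real page contains.
CONFIRMED_RE = re.compile(
    r"Earnings Releases\s*-\s*Confirmed(.*?)(?=Earnings Releases\s*-\s*Proposed|\Z)",
    re.S,
)
PROPOSED_RE = re.compile(r"Earnings Releases\s*-\s*Proposed(.*)\Z", re.S)


def split_sections(page):
    """Return {"Confirmed": text, "Proposed": text}; a missing section maps to ""."""
    sections = {}
    for name, pattern in (("Confirmed", CONFIRMED_RE), ("Proposed", PROPOSED_RE)):
        m = pattern.search(page)
        sections[name] = m.group(1) if m else ""
    return sections


def rows(page, detail_re):
    """Run the detail pattern inside each section and prefix each hit with CONFPROP."""
    out = []
    for confprop, body in split_sections(page).items():
        for m in detail_re.finditer(body):
            out.append((confprop,) + m.groups())
    return out
```

Each tuple returned by rows starts with "Confirmed" or "Proposed", which is the layout wanted for the database. Because every section is matched separately, the detail pattern itself never needs to know which section it is in.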

rg

Scraping earnings.com

Thanks for the tips. I was just reading the API docs and wondering if I could make a script to run the pattern. Do you know of any properties that tell where the match is made (line number or other position info)? If I have that, I can identify the boundaries for Confirmed and Proposed.
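
To show the kind of position info I mean, this is how it looks in plain Python with the standard re module. I am not claiming the scraper's API exposes the same thing. The helper name locate is made up for the example.

```python
import re


def locate(pattern, text):
    """Return (offset, line_number) of the first match, or None if there is no match."""
    m = re.search(pattern, text)
    if m is None:
        return None
    offset = m.start()  # character offset of the match in the page text
    line_nr = text.count("\n", 0, offset) + 1  # 1-based line number
    return offset, line_nr
```

With the offset or line number of the "Proposed" heading in hand, anything matched before it belongs to Confirmed and anything after it to Proposed.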

Scraping earnings.com

Now I can get detail lines and write them out successfully. The next problem is applying the correct CONFPROP value for the section where the match happens. A match can fall in either the "Confirmed" or the "Proposed" section, and I don't know how to restrict matching to the first (Confirmed) section or how to capture the location (line number) of the second (Proposed) tag.
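
In plain Python terms, what I am after would look roughly like the sketch below: find the offset of the "Proposed" tag once, then label each detail match by which side of that offset it falls on. The function name tag_details and the default marker string are assumptions, and this does not use the scraper's own API.

```python
import re


def tag_details(page, detail_re, proposed_marker="Proposed"):
    """Label every detail match as Confirmed or Proposed by its position on the page."""
    boundary = page.find(proposed_marker)
    if boundary == -1:
        # No Proposed section on this page: everything counts as Confirmed.
        boundary = len(page)
    tagged = []
    for m in detail_re.finditer(page):
        confprop = "Confirmed" if m.start() < boundary else "Proposed"
        tagged.append((confprop, m.group(0)))
    return tagged
```

This keeps a single detail pattern running over the whole page, so the only extra piece needed is the position of the second heading.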

If someone is interested, I can post the extractor patterns, etc.