TL;DR - It could easily be another week before I have this working and handling everything again.
I started work on the new scraping code today. I'm at best halfway done with it and am taking the rest of the day off from this. The curious can check the 'xf2' branch in the GitHub repo (which I linked somewhere in a previous post).
Small differences in the data available in the activity list make me grateful I was given access to the forum's API (although it doesn't contain an equivalent to the per-user activity list, so that bit is still HTML scraping). Re-creating exactly the same experience might be a little tricky, particularly when it comes to the details of post content that includes quotes. I'm going to have to investigate Parse::BBCode to turn the BBCode version of posts that the API spits out into appropriate HTML for the RSS feed and search interface.
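To make the quote problem concrete, here is a minimal sketch of turning BBCode quote tags into HTML. It is written in Python purely as an illustration and is not the Parse::BBCode approach itself. The function name quotes_to_html, the choice of blockquote/cite markup, and the assumed attribute shape (name first, then comma-separated extras) are all assumptions on my part, not something the API is confirmed to produce.

```python
import html
import re

# Matches [quote], [quote=Name], [quote="Name, post: 1"] and [/quote].
# The attribute layout (name first, comma-separated extras) is an assumption.
QUOTE_TAG = re.compile(r'\[(/?)quote(?:=([^\]]*))?\]', re.IGNORECASE)


def quotes_to_html(bbcode: str) -> str:
    """Convert (possibly nested) quote tags to <blockquote>, escaping all other text."""
    out = []
    depth = 0  # number of currently open quotes
    pos = 0    # end of the last tag handled
    for m in QUOTE_TAG.finditer(bbcode):
        # Plain text between tags is HTML-escaped so it cannot inject markup.
        out.append(html.escape(bbcode[pos:m.start()]))
        pos = m.end()
        closing, attr = m.group(1), m.group(2)
        if closing:
            if depth == 0:
                # Stray closing tag: keep it as literal text.
                out.append(html.escape(m.group(0)))
            else:
                depth -= 1
                out.append('</blockquote>')
        else:
            depth += 1
            out.append('<blockquote>')
            if attr:
                author = attr.strip('"').split(',')[0].strip()
                if author:
                    out.append(f'<cite>{html.escape(author)}</cite>')
    out.append(html.escape(bbcode[pos:]))
    # Close anything the post left open so the feed HTML stays well-formed.
    out.append('</blockquote>' * depth)
    return ''.join(out)
```

Only quote tags are handled here; every other BBCode tag passes through as escaped text, which is exactly the gap a full parser is meant to fill.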
I also found a whole bunch of duplicate posts in my data, caused by various past forum shenanigans with changing the format of URLs. I think I have all that cleaned up now, but I still need to go back and make sure my 'guid_url' data is something that will work for the thread-starting posts of old threads.
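For anyone wondering how duplicates like that can be spotted, here is a rough Python sketch of the idea: reduce every stored URL to a post ID and group on it. The three URL shapes in POST_ID_PATTERNS are hypothetical examples I made up for illustration, and post_key and find_duplicates are made-up names too; the forum's actual old and new formats would have to be substituted.

```python
import re
from collections import defaultdict

# Hypothetical URL shapes, for illustration only.
POST_ID_PATTERNS = [
    re.compile(r'/posts/(\d+)/?$'),   # e.g. /posts/12345/
    re.compile(r'[?&]p=(\d+)'),       # e.g. showthread.php?p=12345
    re.compile(r'#post-?(\d+)$'),     # e.g. /threads/title.9/#post-12345
]


def post_key(url):
    """Return the numeric post ID found in the URL, or None if no pattern matches."""
    for pattern in POST_ID_PATTERNS:
        match = pattern.search(url)
        if match:
            return int(match.group(1))
    return None


def find_duplicates(urls):
    """Group URLs by post ID and keep only the IDs that appear more than once."""
    groups = defaultdict(list)
    for url in urls:
        key = post_key(url)
        if key is not None:
            groups[key].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

URLs that match none of the patterns are simply skipped, so anything in an unexpected format would still need a manual look.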