I've been beavering away on this fulltext thing. The scraper has been collecting the fulltext, in addition to what it already collected, for around a week now. I've also run a script to backfill all the still-reachable posts I had archived. If a new post is found but the 'click through' to the fulltext fails, the post is stored without the fulltext. The alternative was to not store the post at all, and if the same failure kept happening until the post had dropped off the relevant member's activity list, the post would be lost entirely. This does mean that the fulltext RSS feed below might sometimes contain posts with only precis text. I'll know when this has happened and will re-run the backfill script to ensure the archive has the full text.
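For anyone curious, the store-anyway-and-backfill-later logic can be sketched roughly as below. This is only an illustration, not the scraper's actual code: the names `archive_post`, `needs_backfill` and `fetch_fulltext`, and the dict-based archive, are all hypothetical stand-ins.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical in-memory archive: post id -> stored fields.
Post = Dict[str, Optional[str]]


def archive_post(
    archive: Dict[str, Post],
    post_id: str,
    title: str,
    precis: str,
    fetch_fulltext: Callable[[str], str],
) -> None:
    """Store a post, keeping it even if the fulltext fetch fails."""
    try:
        fulltext: Optional[str] = fetch_fulltext(post_id)
    except OSError:
        # Assumption: a failed 'click through' surfaces as a network-style
        # error. The post is still stored, with fulltext left empty.
        fulltext = None
    archive[post_id] = {"title": title, "precis": precis, "fulltext": fulltext}


def needs_backfill(archive: Dict[str, Post]) -> List[str]:
    """Return the ids of stored posts that still lack fulltext."""
    return sorted(pid for pid, post in archive.items() if post["fulltext"] is None)
```

A backfill run would then just retry the fetch for every id that `needs_backfill` returns.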
I've just tweaked things to now generate an extra .rss file that uses the fulltext. This output comes with the same HTML class names/ids as in the forum HTML, but most readers will strip all the attributes out, so I've not attempted to style things as per the forums. Instead I change the existing blockquote element into a div so that I can then change the divs that contain post quotes into blockquotes. I'm also stripping out the inline images used for emoticons, mostly because they carry an attribute I'd otherwise have to strip from them to get the RSS to validate. I may re-visit that.
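To make that tag shuffling concrete, here's a minimal sketch using only Python's standard `html.parser`. It is not the code the feed generator actually runs. The class names `bbcode_quote` and `inlineimg` are assumptions standing in for whatever the forum HTML really uses, and the sketch assumes the input tags are properly nested.

```python
from html import escape
from html.parser import HTMLParser

QUOTE_CLASS = "bbcode_quote"   # hypothetical class on divs holding post quotes
EMOTICON_CLASS = "inlineimg"   # hypothetical class on emoticon images
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class FeedRewriter(HTMLParser):
    """Swap blockquote/div roles and drop emoticon images."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.open_tags = []  # rewritten names of currently open tags

    def _emit_start(self, tag, attrs, self_closing=False):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + (" />" if self_closing else ">"))

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag in VOID_TAGS:
            # Drop emoticon images; keep every other void element.
            if not (tag == "img" and EMOTICON_CLASS in classes):
                self._emit_start(tag, attrs, self_closing=True)
            return
        if tag == "blockquote":
            new_tag = "div"            # the forum's own blockquote becomes a div
        elif tag == "div" and QUOTE_CLASS in classes:
            new_tag = "blockquote"     # quoted posts become real blockquotes
        else:
            new_tag = tag
        self.open_tags.append(new_tag)
        self._emit_start(new_tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.open_tags:
            return
        # Close with whatever name the matching start tag was rewritten to.
        self.out.append(f"</{self.open_tags.pop()}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def rewrite_post_html(html: str) -> str:
    parser = FeedRewriter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Tracking the rewritten tag names on a stack is what lets each closing tag match its renamed opening tag, so the blockquote-to-div and div-to-blockquote swaps can happen in a single pass.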
Anyway, if anyone wants to test with the new fulltext RSS feed the URL is: https://miggy.org/games/elite-dangerous/devtracker/ed-dev-posts-fulltext.rss
I've switched over my personal Tiny Tiny RSS instance to using this now and will be monitoring it. I still have a few things on my TODO list for it, like making sure that any forum code elements are rendered in a sane manner (but that's likely just changing an enclosing div into a pre instead, due to not being able to apply specific styling).
The old feed URL remains unchanged and will continue to contain only the 'precis' text that's available from the forum member Activity Lists.
Next up will be updating the search UI/code to allow for searching in the fulltext, not just the precis and title.