Thursday, April 18, 2024

How to fix a CURL call importing an RSS feed from a site blocking CURL calls



There is a third-party service provider that my organization uses called Bibliocommons.  They have these nice book carousels.  However, the carousels are not very customizable and are only available for specific lists made by Bibliocommons.


So I wrote a PHP RSS reader that takes the RSS feed produced by the page list and builds a carousel that is more customizable and updates itself once a day (see below).
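The once-a-day update can be handled in several ways (a cron job is one). As a rough sketch of another option, the check below treats the saved XML file as stale once it is older than 24 hours. The function name cacheIsStale and its parameters are hypothetical and are not taken from my original script.

```php
<?php
// Hypothetical helper: returns true when the cached feed file is missing
// or older than $maxAgeSeconds (default: one day).
function cacheIsStale(string $path, int $maxAgeSeconds = 86400, ?int $now = null): bool
{
    if (!is_file($path)) {
        return true; // nothing cached yet, so a fetch is needed
    }
    $now = $now ?? time();
    return ($now - filemtime($path)) >= $maxAgeSeconds;
}

// Usage idea (assumes $XMLDATAFILE is defined as in the fetch script):
// if (cacheIsStale("$XMLDATAFILE.xml")) { /* run the CURL fetch */ }
```

Checking the file's modification time means the feed is only requested when the cached copy has expired, which also keeps the number of hits on the vendor's server low.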

The carousel I wrote has all the art at the same height, and it puts the name of the book under the art, which makes it more accessible.  It still links to the book in the catalogue just like the other carousel.  It had been working fine for 6 years, but after a major DDoS attack on the vendor they started using CloudFront to stop scripts from hitting the site with CURL, and that broke my carousel (see below).


Even after I asked the vendor to whitelist the web server, the issue was not getting resolved, so in comes Sublime Text (my all-time favorite coding tool).  I wanted to see what exactly was going on, since I could access the RSS feed just fine in my web browser but my script was receiving 301 to 307 redirect status codes (whatever the server felt like throwing).

While troubleshooting, I found two issues.  One was a full URL in a node value that was not being escaped.  The other was USER_AGENT detection, which was preventing the script from accessing the RSS feed.  That seems silly, because I would think you would want users to access RSS feeds.  Since my browser was able to access the RSS feed, I determined that they must be doing some sort of detection, which they were.
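One way to confirm this kind of user agent detection is to request the same URL with and without a browser-like user agent and compare the status codes. The sketch below is only an illustration: fetchStatus and describeStatus are hypothetical helper names, and the status groupings are the standard HTTP classes rather than anything specific to the vendor.

```php
<?php
// Hypothetical helper: map an HTTP status code to a rough category.
function describeStatus(int $code): string
{
    if ($code >= 200 && $code < 300) {
        return 'ok';
    }
    if ($code >= 300 && $code < 400) {
        return 'redirect';
    }
    if ($code >= 400 && $code < 500) {
        return 'client error';
    }
    if ($code >= 500) {
        return 'server error';
    }
    return 'no response';
}

// Hypothetical helper: fetch a URL (without following redirects) and
// return the status code, optionally sending a custom user agent.
function fetchStatus(string $url, ?string $userAgent = null): int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    if ($userAgent !== null) {
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    }
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    return $code;
}

// Usage idea (assumes $RSSURL holds the feed address):
// echo describeStatus(fetchStatus($RSSURL)), "\n";
// echo describeStatus(fetchStatus($RSSURL, 'Mozilla/5.0 ...')), "\n";
```

If the two calls land in different categories, the server is most likely treating requests differently based on the user agent.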

CODE

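<?php
// $RSSURL (feed address) and $XMLDATAFILE (output file name without
// extension) are set elsewhere in the script.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $RSSURL);
// Send a browser-like user agent so the request is not rejected as a script.
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, false);         // leave headers out of the output
curl_setopt($ch, CURLOPT_NOBODY, false);         // fetch the response body
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow the 3xx redirects
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$response = curl_exec($ch);
curl_close($ch);
// Only overwrite the saved XML when the request actually succeeded.
if ($response !== false) {
    file_put_contents("$XMLDATAFILE.xml", $response);
}
?>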

So the key was adding CURLOPT_USERAGENT to get access to the RSS feed.  I chose the user agent arbitrarily, but you could pick one at random from a list stored in a variable so the same user agent isn't hitting the server every time.
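A minimal sketch of that randomizing idea is below. The function name pickUserAgent is hypothetical, and apart from the first entry (the one used in my script) the user agent strings are only illustrative examples that you would replace with current ones.

```php
<?php
// Illustrative pool of browser user agent strings (assumed values).
$userAgents = [
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:103.0) Gecko/20100101 Firefox/103.0',
];

// Hypothetical helper: return one entry from a non-empty list at random.
function pickUserAgent(array $agents): string
{
    return $agents[array_rand($agents)];
}

$userAgent = pickUserAgent($userAgents);

// Then pass it to CURL in place of the hard-coded string:
// curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
```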

You can get some sample user agents from deviceatlas.com.  Once the user agent was added to my CURL options, my carousel started to work again, and the XML errors were corrected by using file_put_contents to write the response into a separate XML file and reading that XML file.
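Reading the saved file back could look something like the sketch below. It assumes the feed follows the usual RSS 2.0 layout (channel, then item elements with title and link), and readFeedItems is a hypothetical name rather than the function from my carousel.

```php
<?php
// Hypothetical helper: parse RSS 2.0 text and return each item's title
// and link. Returns an empty array if the XML cannot be parsed.
function readFeedItems(string $xml): array
{
    libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel)) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => trim((string) $item->title),
            'link'  => trim((string) $item->link),
        ];
    }
    return $items;
}

// Usage idea (assumes $XMLDATAFILE is defined as in the fetch script):
// $items = readFeedItems(file_get_contents("$XMLDATAFILE.xml"));
```

Returning an empty array on a parse failure lets the carousel fall back gracefully instead of breaking the page when the saved file is not valid XML.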

