Laravel web scraper goutte library

  • 14 July, 2018
  • Renish Khunt
  • Laravel

A web scraper is useful when we want to pull data from another site, for example recent news or articles. In this post we use the Goutte library to fetch that data, and we will scrape a WordPress site as the example. Let’s see how to fetch recent posts using the Laravel web scraper Goutte library.

I assume you already have Laravel installed. If not, follow this tutorial first.

Next, install the Goutte package for Laravel by following the steps below.

Laravel Web scraper Goutte Library

composer require weidner/goutte

Wait until the package finishes installing. Once it is installed, we need to register the service provider and the facade. Open the “config/app.php” file and add them like this:

'providers' => [
     Weidner\Goutte\GoutteServiceProvider::class,
],

'aliases' => [
     'Goutte' => Weidner\Goutte\GoutteFacade::class,
],

The Goutte package is now ready to scrape another site. Let’s see an example that fetches the recent posts from a WordPress site.

Route::get('/web/crawler/thecodingstuff', function() {
    $crawler = Goutte::request('GET', 'https://thecodingstuff.com');
    $crawler->filter('h2.blog-entry-title a')->each(function ($node) {
        dump($node->text());
    });
});

This fetches the recent post titles from “thecodingstuff.com”. We can also access an HTML tag’s attribute values using the Goutte package, like this:

Route::get('/web/crawler/thecodingstuff', function() {
    $crawler = Goutte::request('GET', 'https://thecodingstuff.com');
    $crawler->filter('h2.blog-entry-title a')->each(function ($node) {
        // Escape scraped values with Laravel's e() helper before printing them as HTML.
        print "<a href='".e($node->attr('href'))."'>".e($node->text())."</a><br/>";
    });
});

Using the “attr” method, we can access any attribute of the selected HTML element.

The output looks something like this:

Javascript validate Youtube video URL
Add Zero before US Zipcode/Pincode
Laravel 5 get recent posts WordPress using REST API
Laravel 5 CURL request using ixudra/curl package
WordPress inject Javascript to footer using action
JAVASCRIPT get location from Zip code using Ziptastic
WordPress create custom post type
Change placeholder color CSS
Programmatically publish tweet on the Twitter using PHP
Laravel – #1071 Specified key was too long; max key length is 767 bytes
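
The selector and the attribute lookup used in the routes above can also be tried offline, without requesting a live site. The following is only a minimal sketch and it does not use Goutte: it relies on PHP’s built-in DOM extension, with an XPath query standing in for the CSS selector “h2.blog-entry-title a”. The sample HTML, the example.com URLs and the function name extractLinks are assumptions made up for illustration.

```php
<?php
// Hypothetical sample markup standing in for a fetched page.
$html = <<<HTML
<html><body>
  <h2 class="blog-entry-title"><a href="https://example.com/post-1">First post</a></h2>
  <h2 class="blog-entry-title"><a href="https://example.com/post-2">Second post</a></h2>
  <h2 class="other"><a href="https://example.com/skip">Not a post title</a></h2>
</body></html>
HTML;

/**
 * Return title/url pairs for every link inside an h2 with class "blog-entry-title".
 * (Illustrative helper, not part of Goutte or Laravel.)
 */
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings that loosely formed real-world HTML tends to trigger.
    @$doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    // XPath stand-in for the CSS selector "h2.blog-entry-title a".
    $query = "//h2[contains(concat(' ', normalize-space(@class), ' '), ' blog-entry-title ')]//a";

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = [
            'title' => trim($a->textContent),      // plays the role of $node->text()
            'url'   => $a->getAttribute('href'),   // plays the role of $node->attr('href')
        ];
    }

    return $links;
}

// Print each matched link as "title => url".
foreach (extractLinks($html) as $link) {
    echo $link['title'], ' => ', $link['url'], PHP_EOL;
}
```

Returning an array instead of printing inside the loop makes it easier to reuse the scraped values later, for example to pass them to a view or store them.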

Web scraping in Laravel is easy with the Goutte package. As the example above shows, we can pull many kinds of information out of another site’s HTML.

If you have any questions, write a comment below.