Yahoo Pipes 101 For Internet Marketers: Scraping Google Hot Trends

by imrat on September 14, 2010

Yahoo Pipes is a website that allows you to extract, amend, and combine data from a large range of sites and services by dragging and dropping basic modules.

The best way to get a basic understanding is to watch this Yahoo Pipes introductory video.

Here is an overview of the Yahoo Pipes editor screen:

At this point you will need some basic knowledge of HTML.

To extract data from a website, all you need to do is use the Fetch Page module and plug in the URL of the site (using the URL Builder module if needed). As an example, I will show you how to scrape the keywords from Google Hot Trends.

You want to identify the HTML code before and after the list of data you are looking for. I just use the browser's view-source function.

Before: <td width="25%">
After: <script type="text/javascript">

Add these two snippets to the Fetch Page module.
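If it helps to see the idea outside the Pipes editor, here is a rough Python sketch of cutting a page down to the part between a "before" and an "after" marker. The function name `cut_between` and the sample page are made up for illustration, and what happens when a marker is missing is my own choice, not documented Fetch Page behaviour.

```python
def cut_between(html: str, before: str, after: str) -> str:
    """Return the text between the first `before` marker and the next `after` marker."""
    start = html.find(before)
    if start == -1:
        # Assumption: return nothing if the "before" marker is not on the page.
        return ""
    start += len(before)
    end = html.find(after, start)
    if end == -1:
        # Assumption: keep everything to the end if the "after" marker is missing.
        return html[start:]
    return html[start:end]


# Made-up page fragment for illustration, not the real Hot Trends markup.
sample_page = (
    '<table><tr><td width="25%">'
    '<a href="/trends?q=one">one</a><a href="/trends?q=two">two</a>'
    '<script type="text/javascript">var x = 1;</script></td></tr></table>'
)

block = cut_between(sample_page, '<td width="25%">', '<script type="text/javascript">')
print(block)
```

The result is one big chunk of HTML that still has to be split into individual items.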

Then select the module and press refresh on the Debugger window. If you have picked the right HTML code, you should see the data you want to scrape by expanding the 0 just underneath “time taken”, as per the screenshot below:

Then identify the HTML tag that separates each unit of data by clicking on source and looking at the source code. Often these are list items, i.e. <li>…</li>, or URL anchors, <a>…</a>. After some trial and error I found that using </a> as the delimiter to split the items gave the best result:
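For readers who think in code, a rough Python equivalent of the split step might look like the sketch below. The helper name `split_items` and the sample block are hypothetical. Python's `str.split` throws the delimiter away, so I put it back on each piece to mimic items that still end in </a>; that is my guess at how the items come out, not confirmed Pipes behaviour.

```python
def split_items(block: str, delimiter: str = "</a>") -> list:
    """Split a block of HTML on `delimiter`, keeping the delimiter on each item."""
    pieces = block.split(delimiter)
    # Every piece except the last one was followed by the delimiter in the
    # original text; the last piece is whatever trails the final delimiter
    # and is dropped here.
    items = [piece.strip() + delimiter for piece in pieces[:-1]]
    # Skip items that contain nothing but the delimiter itself.
    return [item for item in items if item != delimiter]


# Made-up block for illustration.
block = '<a href="/trends?q=one">one</a> <a href="/trends?q=two">two</a>'
items = split_items(block)
for item in items:
    print(item)
```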

Although in most instances you're done at this point, in this particular case I am after the exact keyword and none of the surrounding HTML. If you click on the ‘source’ link above the item, you can see the problem. There is a lot of noise surrounding the piece I want:

<a rel="nofollow" target="_blank" href="http://www.google.com/trends/hottrends?q=grandparents+day+2010&date=2010-9-12&sa=X">grandparents day 2010</a>

You therefore need a second module to clean the data. For that use Operators > Regex. This will cycle through each of the items from the previous module and apply a regex to the item content.

Don't forget to connect the output of the Fetch Page module to the Regex module.

Regex is quite advanced stuff and I struggle at times. Check this site for a good online tool with explanations.

To extract the keyword here, the easiest approach for a novice like me is to just delete the parts of the content that you don't want. In this case all items are consistent, so it's fairly easy.

So with the first regex rule, select the content up to …sa=X">, and replace it with nothing. To delete the end, select all text from </a… and replace that with nothing. Make sure the options are set as indicated in the screenshot below:
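The same two "replace with nothing" rules can be sketched in Python with the standard `re` module. The patterns below are my reading of the two rules described above, and `re.DOTALL` is my own choice; the exact options from the screenshot are not reproduced here.

```python
import re

# The sample item from the post, written with plain straight quotes.
item = (
    '<a rel="nofollow" target="_blank" '
    'href="http://www.google.com/trends/hottrends?q=grandparents+day+2010&date=2010-9-12&sa=X">'
    'grandparents day 2010</a>'
)

# Each rule is (pattern, replacement), applied in order.
rules = [
    (r'^.*sa=X">', ""),  # rule 1: delete everything up to and including sa=X">
    (r"</a.*$", ""),     # rule 2: delete everything from </a to the end
]


def clean(content: str) -> str:
    """Apply every rule in turn and return what is left."""
    for pattern, replacement in rules:
        content = re.sub(pattern, replacement, content, flags=re.DOTALL)
    return content


print(clean(item))  # grandparents day 2010
```

This only works because every item has the same shape. If the links ever stop ending in sa=X">, the first rule will simply not match and the noise stays in.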

As a final step to get the list of keywords to show up when you run the module, you need to copy the item.content to item.title.

I also want the titles to link through to the Google search for the keyword. To do this you need to add the Google search URL to each item, followed by the keyword. I used Advanced Search because Instant Search was creating some URL problems in Yahoo Pipes.

To do this you create a Loop module and insert a URL Builder module into it. Paste the full Google Advanced Search URL (the one you get after you press enter) into the Base field and press Tab. This should split the URL into all its fields. Quickly remove all the ones that have no value by clicking the – button.
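Splitting a URL into fields and throwing away the empty ones is easy to picture in Python using `urllib.parse`. The URL below is a made-up stand-in, since the post does not show the real Advanced Search URL; only the `as_q` field name comes from the post.

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical pasted URL, for illustration only.
pasted = (
    "http://www.google.com/search"
    "?as_q=example&hl=en&num=10&as_epq=&as_oq=&as_sitesearch="
)

parts = urlsplit(pasted)

# The "Base" is everything before the question mark.
base = f"{parts.scheme}://{parts.netloc}{parts.path}"

# keep_blank_values=True lists the empty fields too, so we can see
# what is being removed.
fields = parse_qsl(parts.query, keep_blank_values=True)

# Same idea as clicking the minus button next to each empty field.
kept = [(name, value) for name, value in fields if value != ""]

print(base)
print(kept)
```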

Then change the value of the as_q field by clicking the blue arrow next to the value field and selecting item.title. To ensure the link variable is created, type item.link in the assign-results-to field.
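In Python terms, the Loop plus URL Builder combination boils down to something like the sketch below: for each item, build a search URL from its title and store it as the item's link. `BASE` and `FIXED_PARAMS` are assumed values for illustration, and `add_links` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE = "http://www.google.com/search"  # assumed base URL
FIXED_PARAMS = {"hl": "en"}            # assumed leftover fields from the pasted URL


def add_links(items: list) -> list:
    """Give every item a `link` that searches for its `title`."""
    for item in items:
        # as_q takes its value from item.title, as in the URL Builder.
        params = {"as_q": item["title"], **FIXED_PARAMS}
        # urlencode escapes the keyword, turning spaces into plus signs.
        item["link"] = BASE + "?" + urlencode(params)
    return items


items = [{"title": "grandparents day 2010"}]
add_links(items)
print(items[0]["link"])
```

Letting `urlencode` do the escaping means keywords with spaces or odd characters still produce a valid URL.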

You're almost done. All you need to do now is connect the output of the RegEx module to the Pipe Output.

To save, just click the Save button in the right hand corner and add a Pipe name.

Then…just press Run Pipe…

It will look something like this:

You can test it yourself here.

From here you can expand the pipe in a number of ways, for example by adding in some Alexa data, Google Trends data, or Google related terms. Alternatively, you can use this pipe as an input to another pipe that, for example, checks for domain availability.

That's it for now.

I am working on a related guest post at the moment that will walk you through the process of creating your own URL scraper for scaling your PPV campaigns.

If you liked this post, or have any questions, leave a comment below…

  • Pingback: Scraping Google Instant Search | Internet Marketing Rat

  • Joseph

    Another amazing post! Quick question…

    When you said “And here is the pipe embedded in this post…”? What do you mean “embedded”? You mean “link”? And did I miss why it *should* be embedded? And where? I poked around in the source of this page but couldn’t find any clues.

    I feel like I’m missing something important here and feel a little silly.

  • http://imrat.com imrat

    Hi Joseph – there is an iframe below the statement in the post with the output from the yahoo pipe, but I noticed it is actually not showing the results. Not sure why that is but you can see the pipe here: http://pipes.yahoo.com/imrat/googlehottrends. I’ll update the post

  • Joseph

    Ah, I guess I’m still missing how you’re using it to make your million$. :-) Am I an idiot here? Wait, don’t answer that…

  • http://imrat.com imrat

    Yahoo Pipes in itself won't make you millions, but it will make life a lot easier and will automate scaling campaigns to hit that $3000 / day. It's a simple timesaver basically, automating tasks that you could do manually or outsource.

  • Joseph

    Ah, ok… Seriously, I have about 6,000 ideas right now… I just wanted to make sure I wasn't missing the *main point* of the post.

    You never know what you don’t know… you know?

    P.S. ppcbz is RT’ing you…! Damn, bitch. That bro walks on water! I’d do *anything* for a 15 minute conversation with him. I’m going to stalk him at the next Affiliate Summit in Vegas and try to tempt him with newly legal (hopefully) “presents” from California! Shhh…

  • Alexandra

    That’s incredibly cool! Great post… how do you find this stuff???

  • http://imrat.com imrat

    who knows…. call me sad or a geek. grew up with programming, went into business for my grown up job, worked my way up, and am now bored and going back to the good old days of finding new ways of solving a common problem. Noted that many people in PPV do stuff by hand, copy and paste. I am too impatient, so developed a couple of ways to automate this stuff. I don't share everything…. aim is to show enough to allow you to learn and find your own way ;)
