SORRY THIS NO LONGER WORKS. Import.io REMOVED THEIR FREE PLAN AND I JUST COULDN’T PAY TO KEEP THIS RUNNING.
Amazon maintains a Best Sellers listing for the Kindle Store, and the listing is divided into two types: Paid and Free. That is, there's a Top 100 list of best-selling Kindle books that are free, and another Top 100 list of Kindle books that are paid. And according to Amazon, the lists change every hour.
Q: How do you remain updated with content that is changing regularly?
A: You subscribe to its feed.
Just as I'm sure most of you have subscribed to the feed of this website, you can subscribe to the feed of any other page or site and be notified whenever it changes. To make it easy for its users to keep track of changes in the best-selling books, Amazon provides an RSS feed as well. You can use any feed reader and subscribe to the feed.
What is amusing, however, is that the RSS feed is available only for the Top 100 Paid Best Sellers. Amazon does not provide an RSS feed for the Top 100 Free Best Sellers in the Kindle Store.
- Go to the best sellers listing.
- Scroll to the bottom of the page until you see RSS Feed.
- Now, click on Subscribe to: Best Sellers > Kindle Store and you’ll get an RSS feed that consists of only the paid books.
What I tried to do was create an RSS feed for the Top 100 Free Best Sellers as well.
And the link to the RSS feed is: bit.ly
- Trained an Import.io Connector to scrape the content of Amazon’s listing page.
- For each page of the listing (5 pages, since it's a Top 100 list with about 20 entries per page), I created a separate spreadsheet. So, in total, I had 5 spreadsheets, one for each page of Amazon's listing.
- Each of these spreadsheets called the connector's API with its respective page number, so that only the content of that particular page would be scraped. This was done to stay within Google's runtime limit for a script: a script may run for at most 6 minutes, and fetching all 5 pages in a single script takes longer than that.
- The connector's API would return the scraped content of each listing page, which the Google script then filled into the respective spreadsheet.
- Another script finally combines these 5 spreadsheets into one RSS feed. To do this, the script requests the contents of each of the 5 spreadsheets in RSS format, parses them, and creates a combined RSS feed out of them.
- Now, one can just subscribe to the feed using Feedly or any other RSS feed reader and get notified when there's a new free best seller in town.
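To illustrate the final combining step, here is a minimal sketch in plain JavaScript (the actual project used Google Apps Script and parsed the per-spreadsheet feeds; the item fields and function names here are illustrative) of flattening several per-page item lists into one RSS 2.0 document:

```javascript
// Sketch: merge per-page lists of scraped entries into a single RSS 2.0
// feed. In the real script, each page's items came from a spreadsheet.

function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function buildCombinedRss(title, pages) {
  // Flatten the per-page arrays (5 pages x ~20 entries) into one list.
  const items = [].concat.apply([], pages);
  const body = items
    .map(it =>
      "  <item>\n" +
      "    <title>" + escapeXml(it.title) + "</title>\n" +
      "    <link>" + escapeXml(it.link) + "</link>\n" +
      "  </item>"
    )
    .join("\n");
  return '<?xml version="1.0" encoding="UTF-8"?>\n' +
    '<rss version="2.0">\n<channel>\n' +
    "  <title>" + escapeXml(title) + "</title>\n" +
    body + "\n</channel>\n</rss>";
}

// Example with two pages of hypothetical entries:
const feed = buildCombinedRss("Top 100 Free Kindle Best Sellers", [
  [{ title: "Book A", link: "https://example.com/a" },
   { title: "Book B", link: "https://example.com/b" }],
  [{ title: "Book C", link: "https://example.com/c" }],
]);
```

A feed reader polling this output would then pick up any newly appeared free best seller on its next fetch.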
Import.io is an intelligent web-scraping tool: you can scrape content from pages easily without needing to write any code at all. A Connector is one of the scraping techniques Import.io provides; it records your queries and replays them to make requests, and it can even be used on pages behind authentication. See more about it.
The code for it is available on GitHub. I have removed the parts that identify my connector's API and the individual spreadsheets, just to prevent misuse. You do have access to everything you need to recreate this on your end (if you want to), however.
Since we are asking the Import.io connector to scrape the page, we are dependent on Import.io's servers. Sometimes the server is under heavy load, and the script therefore runs for more than 6 minutes, reaches Google's limit, and is automatically terminated. In that case, the RSS feed does not get updated with the latest data from the listing page. However, since the Google script runs every hour, the feed should not lag by much.
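The runtime limit above is the reason the fetching was split one page per script. As a rough sketch (in plain JavaScript; the limit and safety margin values are illustrative, and the real script would call the connector API where the comment indicates), a guard against the cap might look like:

```javascript
// Sketch: stop fetching further pages before hitting Apps Script's
// ~6-minute execution cap, leaving headroom to write results out.

const LIMIT_MS = 6 * 60 * 1000;  // Google's per-execution limit
const SAFETY_MS = 60 * 1000;     // illustrative headroom

function pagesWithinBudget(pageCount, startMs, nowFn) {
  const done = [];
  for (let page = 1; page <= pageCount; page++) {
    // Out of budget: bail out and let the next hourly run continue.
    if (nowFn() - startMs > LIMIT_MS - SAFETY_MS) break;
    done.push(page); // in the real script: call the connector API here
  }
  return done;
}

// With a fast clock all 5 pages fit; with a slow server they don't.
const allPages = pagesWithinBudget(5, 0, () => 0);
let calls = 0;
const fewerPages = pagesWithinBudget(5, 0, () => ++calls * 2 * 60 * 1000);
```

Splitting the work across spreadsheets sidesteps the need for this guard entirely, since each script only ever fetches one page.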