OK. I want a script to scrape Indeed.com, a job-search website.
Basically, I will do a manual search first from the advanced search area of [login to view URL]
[login to view URL]
I will enter whatever criteria I want. Let's say I enter "unix", select 'employer web sites only' from the 'show jobs from' dropdown, select 'Display [50] results per page' from the other dropdown, and click 'find jobs'. The search will run and generate this link:
[login to view URL]
This link is all you care about. I will pass this link to the script you write, and the script will build a list of all the pages to visit, e.g.
[login to view URL]
[login to view URL]
[login to view URL]
etc., down to the end of the page, and then it will follow the "next" link:
[login to view URL]
and continue collecting the links on the next page. It will keep collecting links until there are no more pages.
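Since the actual search URLs are hidden above, here is only a minimal sketch of the page-list step, assuming Indeed's historical pagination style, where each results page is addressed with a "start" query parameter (start=0, start=50, ... when showing 50 results per page). The function name and parameters are illustrative, not from this posting.

```python
from urllib.parse import urlencode, urlparse, parse_qsl, urlunparse

def build_page_urls(search_url, total_results, per_page=50):
    """Return one URL per results page for the given search URL.

    Assumes pagination is driven by a "start" query parameter; the
    real script would confirm this against the link generated by the
    manual search described above.
    """
    parts = urlparse(search_url)
    query = dict(parse_qsl(parts.query))
    urls = []
    for start in range(0, total_results, per_page):
        query["start"] = str(start)
        urls.append(urlunparse(parts._replace(query=urlencode(query))))
    return urls
```

For example, `build_page_urls("https://www.indeed.com/jobs?q=unix", 120, 50)` would yield three page URLs with start=0, start=50, and start=100.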
Then the pages will be fetched from the list just collected, and I can control the wait time between each page fetch with a command-line argument, e.g. -w 5 (wait 5 seconds between each fetch).
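The -w and -f arguments and the throttled fetch loop could be sketched as below. This is one possible shape under the stated requirements, not a finished implementation; `fetch_page` is a placeholder for whatever HTTP call the final script uses, and all names are illustrative.

```python
import argparse
import time
import urllib.request

def parse_args(argv=None):
    """Parse the command-line options described in the posting."""
    parser = argparse.ArgumentParser(description="Indeed link collector")
    parser.add_argument("-w", "--wait", type=int, default=0,
                        help="seconds to wait between page fetches")
    parser.add_argument("-f", "--file",
                        help="output filename for collected URLs")
    parser.add_argument("url", nargs="?",
                        help="search-results URL produced by the manual search")
    return parser.parse_args(argv)

def fetch_all(urls, wait_seconds, fetch_page=None):
    """Fetch each URL in turn, sleeping wait_seconds between fetches."""
    if fetch_page is None:
        fetch_page = lambda u: urllib.request.urlopen(u).read()
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(wait_seconds)  # honor the -w throttle
        pages.append(fetch_page(url))
    return pages
```

Injecting `fetch_page` keeps the wait logic testable without hitting the live site.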
The first link:
[login to view URL]
Goes to:
[login to view URL]
I want this link captured and output to a file. The filename will be another command-line argument: -f [login to view URL]
The script will then go through the list it captured and output the URLs to the filename I specify, waiting the number of seconds I specify between fetches.
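The final step, resolving each job link to the employer page it lands on and writing one URL per line to the -f file, might look like this. It relies on urllib following redirects by default and exposing the final address on the response's `.url` attribute; the function names are illustrative.

```python
import urllib.request

def resolve_final_url(link):
    """Follow redirects and return the URL the link actually lands on."""
    with urllib.request.urlopen(link) as response:
        return response.url

def write_urls(urls, filename):
    """Write one URL per line to the output file named by -f."""
    with open(filename, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
```

`resolve_final_url` needs network access, but `write_urls` can be exercised locally against any list of captured links.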
Please let me know if this was a clear description and if you have any questions.
Thanks a lot.