Recently I worked on a project that required me to scrape the names of the top 10 movies from the IMDb box office chart, along with the names of the characters in each movie.
I used the lxml package because it supports XPath. If you don't know XPath, you can easily learn it from here. You can use other packages too. The basic idea of this post is to keep the approach as simple as possible.
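Here is a minimal sketch of how the box office step could look with lxml and XPath. The chart URL, the XPath expression, and the helper name `parse_box_office` are my assumptions for illustration. IMDb changes its markup from time to time, so check the live page before relying on them. The parsing is kept separate from the download so it can be tried offline on a small hand-written page.

```python
from urllib.parse import urljoin
from urllib.request import Request, urlopen

from lxml import html

# Assumptions: the chart URL and this XPath are illustrative only.
# Inspect the live page and adjust them to the current markup.
CHART_URL = "http://www.imdb.com/chart/boxoffice"
TITLE_LINK_XPATH = '//td[@class="titleColumn"]/a'


def parse_box_office(page_source, limit, base_url="http://www.imdb.com"):
    """Return (names, urls) for the first `limit` title links in the page."""
    tree = html.fromstring(page_source)
    links = tree.xpath(TITLE_LINK_XPATH)[:limit]
    names = [link.text_content().strip() for link in links]
    # Drop any query string and turn the relative href into a full URL.
    urls = [urljoin(base_url, link.get("href").split("?")[0]) for link in links]
    return names, urls


def box_office(limit):
    """Download the chart page and parse it (needs network access)."""
    request = Request(CHART_URL, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return parse_box_office(response.read(), limit)


# Offline check on a tiny hand-written page with the assumed structure.
SAMPLE_CHART = """
<table>
  <tr><td class="titleColumn"><a href="/title/tt0369610/?ref_=x">Jurassic World</a></td></tr>
  <tr><td class="titleColumn"><a href="/title/tt2096673/?ref_=x">Inside Out</a></td></tr>
</table>
"""
names, urls = parse_box_office(SAMPLE_CHART, 2)
print(names)
print(urls)
```

Calling `box_office(5)` would fetch the real page and return the same pair of lists for the first five entries, provided the assumed XPath still matches.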
The above code will return two lists for box_office(5):
Movie names: ['Jurassic World', 'Inside Out', 'Terminator Genisys', 'Magic Mike XXL', 'Ted 2']
Movie URLs: ['http://www.imdb.com/title/tt0369610/', 'http://www.imdb.com/title/tt2096673/', 'http://www.imdb.com/title/tt1340138/', 'http://www.imdb.com/title/tt2268016/', 'http://www.imdb.com/title/tt2637276/']
Next, scrape the cast names from the URLs above.
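The cast step can be sketched the same way. The XPath and the helper names `parse_cast` and `cast_names` are assumptions for illustration: the sketch expects each name to sit in a `span` with `itemprop="name"` inside a `td` with `itemprop="actor"`, which may not match the current IMDb page.

```python
from urllib.request import Request, urlopen

from lxml import html

# Assumption: this XPath is illustrative only; verify it against the live page.
CAST_NAME_XPATH = '//td[@itemprop="actor"]//span[@itemprop="name"]/text()'


def parse_cast(page_source):
    """Return the cast names found in a movie page, in page order."""
    tree = html.fromstring(page_source)
    # Strip surrounding whitespace and skip empty text nodes.
    return [name.strip() for name in tree.xpath(CAST_NAME_XPATH) if name.strip()]


def cast_names(movie_url):
    """Download one movie page and parse its cast (needs network access)."""
    request = Request(movie_url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request) as response:
        return parse_cast(response.read())


# Offline check on a tiny hand-written page with the assumed structure.
SAMPLE_MOVIE = """
<table class="cast_list">
  <tr><td itemprop="actor"><a href="#"><span itemprop="name"> Chris Pratt </span></a></td></tr>
  <tr><td itemprop="actor"><a href="#"><span itemprop="name">Bryce Dallas Howard</span></a></td></tr>
</table>
"""
cast = parse_cast(SAMPLE_MOVIE)
print(cast)
```

To cover every movie, you could loop over the list of movie URLs and call `cast_names(url)` for each one.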
The above code will return a list of the cast names for the movie.