IMDB REST Webservice update
Update 2009-12-16
Looking for help in hosting
If you have a server that hosts PHP and you want to support the Scraper service please contact me (info at thumnet dot com).
Update 2009-12-12
Service currently down due to too many requests to IMDb; working on a fix!
Update 2009-12-09
- Added Picture in imdb-name request
- Added result count limitation param (see the guide)
- Some small bugfixes:
- missing tt and nm prefix in ImdbID property in imdb-name-search request,
- missing ImdbID for Writers and Directors in imdb-title request,
- changed Season and Episode in imdb-episode request to SeasonNR and EpisodeNR
- added result type to the summary element, to identify the result data
Update 2009-12-02
- Added Name search functionality, with imdb-name-search url param (imdb-search is now imdb-title-search)
- Added Name details, with imdb-name url param
Update 2009-12-01
- Fixed the Plot and Tagline for IMDb titles, see comments below.
It’s been some time since my post about the IMDB webservice but I’m proud to tell you readers there is a new version available.
Some important changes include:
- Restructured output (sorry to you guys who have to update their software)
- Output available in XML, JSON and debug (other output formats can be added on request)
- Automatic support for gzipping the output
- Summary information, containing:
- data source
- timestamp of the data
- time taken in ms
- scraper info
- error code (0 for no error!) and error description
- Easily extendable scrape framework, so in the future more sites can be scraped!
- Admin interface to review the data you guys produce
Still interested or just curious?
Well the new url is: http://scraper.thumnet.com
The old version (imdb.thumnet.com) will only be available until 1 December 2009.
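For readers wiring the service into their own code: the request URLs that appear in this post and in the comments below seem to follow the pattern base URL / output format / request type / argument. Here is a minimal Python sketch that builds such URLs, assuming the pattern holds for every request type. The helper name `build_scraper_url` is hypothetical, and the `json` and `debug` path segments are an assumption based on the list of output formats above (only `xml` URLs are actually shown here).

```python
from urllib.parse import quote_plus

BASE_URL = "http://scraper.thumnet.com"

# Output formats named in this post. Only "xml" appears in an actual URL here;
# "json" and "debug" as path segments are an assumption.
FORMATS = {"xml", "json", "debug"}

# Request types named in this post; the service may support others.
REQUESTS = {
    "imdb-title",
    "imdb-title-search",
    "imdb-name",
    "imdb-name-search",
    "imdb-episode",
}


def build_scraper_url(fmt: str, request: str, argument: str) -> str:
    """Build a URL of the form <base>/<format>/<request>/<argument>."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown output format: {fmt!r}")
    if request not in REQUESTS:
        raise ValueError(f"unknown request type: {request!r}")
    # quote_plus turns spaces into "+", as in .../imdb-name-search/brad+pitt
    return f"{BASE_URL}/{fmt}/{request}/{quote_plus(argument)}"


if __name__ == "__main__":
    print(build_scraper_url("xml", "imdb-title", "tt1176724"))
    print(build_scraper_url("xml", "imdb-name-search", "brad pitt"))
```

Pass search terms with plain spaces (for example `"brad pitt"`); the helper does the URL encoding itself.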
Hi ,
The ID below does not work the same way as it did with the old scraper.
http://scraper.thumnet.com/xml/imdb-title/tt1176724
Br
Jimmy
@Jimmy
I’m sorry for that, I didn’t test the XML output.
The error occurs because some of the characters aren’t encoded and others are. I built in a check that tests for characters that need encoding and encodes them if IMDB didn’t.
The problem should be fixed now.
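To illustrate the kind of check described above (this is not the service's actual code, just a minimal Python sketch of the idea): escape the characters that would break XML, but leave alone anything that already looks like an encoded entity, so nothing gets encoded twice. The function name `escape_if_needed` is hypothetical.

```python
import re

# Matches an "&" that does NOT already start an entity such as
# &amp;, &#229; or &#xE5; (those are left untouched).
_BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)")


def escape_if_needed(text: str) -> str:
    """Escape bare &, < and > for XML without double-encoding entities."""
    text = _BARE_AMP.sub("&amp;", text)
    return text.replace("<", "&lt;").replace(">", "&gt;")


if __name__ == "__main__":
    print(escape_if_needed("Tom & Jerry &amp; friends"))
    print(escape_if_needed("1 < 2, and &#229; stays as it is"))
```

One caveat: XML itself only predefines `&amp;`, `&lt;`, `&gt;`, `&apos;` and `&quot;`, so named HTML entities that pass through this check unchanged may still need converting to numeric references.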
Hi, it seems to be working, but I still have some issues with the encoding of special chars like the Swedish “Å Ä Ö”.
If you compare the old one with the new you see what I mean.
http://imdb.thumnet.com/xml/title/tt1521870
and
http://scraper.thumnet.com/xml/imdb-title/tt1521870
@Jimmy
Damn, stupid encoding.
Try it now…
Hi , wow that was a fast response
but ÅÖÄ is still funky
http://scraper.thumnet.com/xml/imdb-title/tt1307466
“Vi hade i alla fall tur med vädret – Igen”
but it should read
“Vi hade i alla fall tur med vädret – Igen”
/Jimmy
Hi
Is this API free to use? Or do we have to get a license from IMDB?
@Jimmy
Thnx,
Maybe for these characters you can do the decoding yourself.
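One way to do that decoding on the client side, sketched in Python. This assumes the “funky” characters are UTF-8 bytes that were read as Latin-1 somewhere along the way, which is a common cause of this symptom but is not confirmed in this thread. The function name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(text: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up; return the text unchanged if it doesn't apply."""
    try:
        # Turn the characters back into the original bytes, then decode them properly.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not garbled in this particular way, so leave it alone.
        return text


if __name__ == "__main__":
    # Simulate the assumed garbling, then repair it.
    garbled = "Vi hade i alla fall tur med vädret".encode("utf-8").decode("latin-1")
    print(garbled)
    print(repair_mojibake(garbled))
```

Text that is already correct falls through the `except` branch and comes back unchanged, so it should be safe to run on every field.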
@Imthiaz
Hello, officially you would need a license from IMDB.
Great service, thanks for your efforts.
However, recently several data elements, such as Plot and Tagline, are missing from the returned XML. Previously these data elements were present.
@Tyler
Hi Tyler,
Thnx for your comment.
Elements no longer showing in the output means that IMDB changed their HTML. I’ll look into the problem and make a post here when it’s updated.
A note to other users, please tell me when parts of the XML aren’t there or contain false data, so I can update the service.
It would be great if we can see the original (US) release dates of movies in the XML… Any chance of this being included?
Also, would it be possible for you to open-source your scraper?
Bara
Great service indeed! I hope you get the missing data elements fixed soon – I didn’t even know the Plot and Tagline etc. were supposed to be there until I saw the comments.
Also – any chance for searching by person name? Like for an actor, director etc. (http://www.imdb.com/find?s=all&q=brad+pitt) ? And then displaying the details for the person?
I’m working on a little project that uses your IMDb scraper. Is there some way of contacting you other than these comments? I would like to mention the scraper as a source etc. and possibly encourage users to support your site.
@Tyler
@Markus
Plot and tagline are fixed now.
@Markus
Before the end of this week I will release functionality to search people (names) and get information about them using their nm ID.
@ThumNet
Sweet!!
Hi again
Solved my decoding problems and everything is working great now.
On the functionality side, I would love to see support for release dates and alternative titles, as seen here for example:
http://www.imdb.com/title/tt1216487/releaseinfo#akas
Thrilled about the new name search feature! Any chance there might be a ‘Name details’ option with a list of relevant movie titles/IDs coming up?
A couple of questions:
#1 Any idea why searching for ‘heroes’ or ‘2012’ doesn’t really return what one would expect?
Try http://scraper.thumnet.com/xml/imdb-title-search/heroes or http://scraper.thumnet.com/xml/imdb-title-search/2012
and compare with http://www.imdb.com/find?s=all&q=heroes
The scraper’s XML has popular hits with only a node for ‘Picture’
#2 Similar with name search:
http://scraper.thumnet.com/xml/imdb-name-search/brad+pitt
It returns only ‘Popular results’ with only one node -> ‘Picture’
Thanks for all the work!
@Markus
Fixed the problems you posted.
Also added name details
@ThumNet In the words of director Burns – eeexcellent!
Sweet, man. Eager to implement this in the old app I worked on (Hippo).
I think I found a couple of things that need fixing/adding.
#1 Actors/producers etc. in the person details XML are missing the ‘nm’ and ‘tt’ letters from their IMDb IDs -> http://scraper.thumnet.com/xml/imdb-name/nm0000093
#2 Writers/directors/producers in the title details xml are missing their IMDb id nodes. -> http://scraper.thumnet.com/xml/imdb-title/tt0133093
#3 Any chance of getting a picture for the person details xml also?
#4 The episodes feed now has the structure ‘Episodes’ > ‘Episode’ > ‘Episode’. Could this be ‘Episodes’ > ‘Episode’ > ‘Episodenumber’? It would make the parsing simpler if the node names inside the parent node were unique. Simpler for me, that is, as I’m not a ‘proper’ coder.
#5 A possibility to define the max number of hits, to limit the size of the XML, would be cool.
#6 Also helpful would be a node in the XML ‘root’ that simply defines the type of the feed, like ‘resultperson’ (‘resulttitle’, ‘detailperson’, ‘detailtitle’).
@Markus
When I have some time to spare I’ll look into them! Most likely tomorrow evening/night.
Ever so excellent, thanks!
@Markus
See update above!
Any update maybe on when the website’s fixed?
Other than that, I’m very very happy with your service, and your episode imdb adds exactly what I felt was missing. Keep up the good work!
@Joris Kommeren
Currently I’m looking for support in running the hosting script.
So if anyone could help, please contact me. (info at thumnet dot com)
Hi Thumnet,
Is this legal to use in a freeware app that I create?
The IMDb website says we can’t use their data, but when accessing it through your website, how does the legal thing work?
thanks
k
How the legal thing works exactly I really don’t know, but maybe someone else can point this out…?
@kkr
For sure you cannot ask money for whatever you are doing with the data. In my project I’ve made sure to link to IMDb.com whenever possible and branded it as an unofficial IMDb client, so as not to claim to be the owner of the data. All in all it will just generate traffic towards IMDb.com, and hopefully they will see it as a good thing…
Hi !
Been using your excellent service since the start to get metadata for my movie collection, and most of the titles work fine, but now and then there is a title that gives me problems.
For example
http://scraper.thumnet.com/xml/imdb-title/tt0153922/
and
http://scraper.thumnet.com/xml/imdb-title/tt1186830/
where the tag contains a lot more than just the title. Is this something that could be fixed?
Br
Jimmy
Hi Jimmy, the problem can indeed occur and it is something that will be fixed in the future. But I can’t tell you when the fix will be available, because my private life is hectic at the moment.