IMDB REST Webservice update

Update 2009-12-16

Looking for hosting help

If you have a server that can host PHP and you want to support the Scraper service, please contact me (info at thumnet dot com).


Update 2009-12-12

The service is currently down because of too many requests to IMDb; I’m working on a fix!

Update 2009-12-09

  • Added Picture to the imdb-name request
  • Added a parameter to limit the result count (see the guide)
  • Some small bugfixes:
    • fixed the missing tt and nm prefixes in the ImdbID property of the imdb-name-search request
    • fixed the missing ImdbID for Writers and Directors in the imdb-title request
    • renamed Season and Episode in the imdb-episode request to SeasonNR and EpisodeNR
    • added a result type to the summary element, to identify the kind of result data
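If a client may still see IDs without their prefix (for example from responses cached before this fix), a small normalization step keeps them consistent. This is a minimal sketch, not part of the service: the function name is hypothetical, and the "seven or more digits" rule is an assumption based on IDs like tt0133093 and nm0000093 quoted in the comments below.

```python
import re

# Hypothetical client-side helper; prefixes as used by the ImdbID property.
PREFIXES = {"title": "tt", "name": "nm"}


def normalize_imdb_id(raw: str, kind: str) -> str:
    """Return an IMDb ID with its tt/nm prefix, adding the prefix if missing."""
    prefix = PREFIXES[kind]
    value = raw.strip()
    if not value.startswith(prefix):
        value = prefix + value
    # Assumption: a valid ID is the prefix followed by at least seven digits.
    if re.fullmatch(prefix + r"\d{7,}", value) is None:
        raise ValueError(f"not a valid IMDb {kind} ID: {raw!r}")
    return value
```

For example, normalize_imdb_id("0133093", "title") gives "tt0133093", while an already prefixed ID passes through unchanged.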

Update 2009-12-02

  • Added name search, with the imdb-name-search URL parameter (imdb-search is now imdb-title-search)
  • Added name details, with the imdb-name URL parameter

Update 2009-12-01

  • Fixed the Plot and Tagline for IMDb titles; see the comments below.

It’s been some time since my post about the IMDB webservice, but I’m proud to tell you readers that a new version is available.

Some important changes include:

  • Restructured output (sorry to those of you who have to update your software)
  • Output available in XML, JSON and debug (other output formats can be added on request)
  • Automatic support for gzipping the output
  • Summary information, containing:
    • data source
    • timestamp of the data
    • time taken in ms
    • scraper info
    • error code (0 for no error!) and error description
  • Easily extendable scraping framework, so more sites can be scraped in the future!
  • Admin interface to review the data you guys produce
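A client could handle the gzipped output and the summary error code roughly like this. It is a sketch assuming the JSON output format: the key names Summary, ErrorCode and ErrorDescription are hypothetical placeholders (check the real output for the actual names), and only the "0 means no error" rule comes from the list above. A client that sends Accept-Encoding: gzip itself, for example via urllib, has to decompress the body by hand.

```python
import gzip
import json
from typing import Optional


def decode_response(body: bytes, content_encoding: Optional[str]) -> dict:
    """Decompress the body if it was gzipped, then parse it as UTF-8 JSON."""
    if content_encoding and content_encoding.strip().lower() == "gzip":
        body = gzip.decompress(body)
    return json.loads(body.decode("utf-8"))


def check_summary(doc: dict) -> dict:
    """Return the summary, or raise if it reports an error.

    "Summary", "ErrorCode" and "ErrorDescription" are hypothetical key names.
    """
    summary = doc["Summary"]
    code = int(summary.get("ErrorCode", 0))
    if code != 0:  # error code 0 means no error
        raise RuntimeError(f"scraper error {code}: {summary.get('ErrorDescription', '')}")
    return summary
```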

Still interested or just curious?

Well, the new URL is: http://scraper.thumnet.com

The old version (imdb.thumnet.com) will only be available until 1 December 2009.
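The request URLs quoted in the comments follow a /format/request/argument pattern, so a client could build them with a small helper. In this sketch the helper name is hypothetical, the json and debug path segments are assumed by analogy with the xml URLs, and the shape of the imdb-episode argument is not covered. quote_plus turns spaces into +, as in the brad+pitt example.

```python
from urllib.parse import quote_plus

BASE_URL = "http://scraper.thumnet.com"

# Request types mentioned in this post. "xml" comes from the quoted URLs;
# "json" and "debug" as path segments are an assumption.
REQUEST_TYPES = ("imdb-title", "imdb-title-search", "imdb-name",
                 "imdb-name-search", "imdb-episode")
FORMATS = ("xml", "json", "debug")


def build_scraper_url(fmt: str, request: str, argument: str) -> str:
    """Build a /format/request/argument URL (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if request not in REQUEST_TYPES:
        raise ValueError(f"unknown request type: {request!r}")
    # quote_plus encodes spaces as "+", matching .../imdb-name-search/brad+pitt
    return f"{BASE_URL}/{fmt}/{request}/{quote_plus(argument)}"
```

For example, build_scraper_url("xml", "imdb-name-search", "brad pitt") produces the name search URL used in comment 18.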

34 comments

2 pings

  1. Jimmy says:

    Hi,

    The ID below doesn’t work the same as it did with the old scraper.

    http://scraper.thumnet.com/xml/imdb-title/tt1176724

    Br
    Jimmy

  2. ThumNet says:

    @Jimmy
    I’m sorry for that, I didn’t test the XML output.

    The error occurs because some of the characters are encoded and others aren’t. I built in a check that looks for characters that need encoding and encodes them if IMDB didn’t.

    The problem should be fixed now.

  3. Jimmy says:

    Hi, it seems to be working, but I still have some issues with the encoding of special characters like the Swedish “Å Ä Ö”.

    If you compare the old one with the new one, you’ll see what I mean.
    http://imdb.thumnet.com/xml/title/tt1521870
    and
    http://scraper.thumnet.com/xml/imdb-title/tt1521870

  4. ThumNet says:

    @Jimmy
    Damn, stupid encoding.

    Try it now…

  5. Jimmy says:

    Hi, wow, that was a fast response ;)

    but ÅÖÄ is still funky:
    http://scraper.thumnet.com/xml/imdb-title/tt1307466

    “Vi hade i alla fall tur med vädret – Igen”

    but it should read
    “Vi hade i alla fall tur med vädret – Igen”

    /Jimmy

  6. Imthiaz says:

    Hi

    Is this API free to use? Or do we have to get a license from IMDB?

  7. ThumNet says:

    @Jimmy
    Thanks :D

    Maybe for these characters you can do the decoding yourself.

  8. ThumNet says:

    @Imthiaz
    Hello, officially you would need a license from IMDB.

  9. Tyler says:

    Great service, thanks for your efforts.
    However, recently several data elements, such as Plot and Tagline, are missing from the returned XML. Previously these data elements were present.

  10. ThumNet says:

    @Tyler
    Hi Tyler,

    Thanks for your comment.

    When elements no longer show up in the output, it means that IMDB changed their HTML. I’ll look into the problem and post here when it’s updated.

    A note to other users: please tell me when parts of the XML are missing or contain incorrect data, so I can update the service.

  11. Bara says:

    It would be great if we could see the original (US) release dates of movies in the XML… Any chance of this being included?

    Also, would it be possible for you to open-source your scraper?

    Bara

  12. Markus says:

    Great service indeed! I hope you get the missing data elements fixed soon. I didn’t even know the Plot and Tagline etc. were supposed to be there until I saw the comments.

    Also, any chance of searching by person name, like for an actor, director, etc. (http://www.imdb.com/find?s=all&q=brad+pitt)? And then displaying the details for that person?

  13. Markus says:

    I’m working on a little project that uses your IMDb scraper. Is there some way of contacting you other than these comments? I would like to mention the scraper as a source etc. and possibly encourage users to support your site.

  14. ThumNet says:

    @Tyler

    @Markus

    Plot and tagline are fixed now.

  15. ThumNet says:

    @Markus

    Before the end of this week I will release functionality to search people (names) and get information about them using their nm ID.

  16. Markus says:

    @ThumNet
    Sweet!!

  17. Jimmy says:

    Hi again :)

    Solved my decoding problems and everything is working great now.

    On the functionality side, I would love to see support for release dates and alternative titles, as seen here for example:

    http://www.imdb.com/title/tt1216487/releaseinfo#akas

  18. Markus says:

    Thrilled about the new name search feature! Any chance there might be a ‘Name details’ option with a list of relevant movie titles/IDs coming up?

    A couple of questions:
    #1 Any idea why searching for ‘heroes’ or ‘2012’ doesn’t return what one would expect?

    Try http://scraper.thumnet.com/xml/imdb-title-search/heroes or http://scraper.thumnet.com/xml/imdb-title-search/2012
    and compare with http://www.imdb.com/find?s=all&q=heroes
    The scraper’s XML has popular hits with only a node for ‘Picture’.

    #2 Similar with name search:
    http://scraper.thumnet.com/xml/imdb-name-search/brad+pitt
    It returns only ‘Popular results’, with only one node: ‘Picture’.

    Thanks for all the work!

  19. ThumNet says:

    @Markus

    Fixed the problems you posted.

    Also added name details :D

  20. Markus says:

    @ThumNet In the words of director Burns – eeexcellent!

  21. Johan says:

    Sweet, man. Eager to implement this in the old app I worked on (Hippo).

  22. Markus says:

    I think I found a couple of things that need fixing/adding.

    #1 Actors/producers etc. in the person details XML are missing the ‘nm’ and ‘tt’ letters from their IMDb IDs: http://scraper.thumnet.com/xml/imdb-name/nm0000093

    #2 Writers/directors/producers in the title details XML are missing their IMDb ID nodes: http://scraper.thumnet.com/xml/imdb-title/tt0133093

    #3 Any chance of getting a picture in the person details XML too?

    #4 The episodes feed now has the structure ‘Episodes’ > ‘Episode’ > ‘Episode’; could this be ‘Episodes’ > ‘Episode’ > ‘Episodenumber’? It would make the parsing simpler if the node names inside the parent node were unique. Simpler for me, that is, as I’m not a ‘proper’ coder :)

    #5 A way to define the maximum number of hits, to limit the size of the XML, would be cool.

    #6 Also helpful would be a node in the XML ‘root’ that simply defines the type of the feed, like ‘resultperson’ (‘resulttitle’, ‘detailperson’, ‘detailtitle’).

  23. ThumNet says:

    @Markus
    When I have some time to spare I’ll look into them! Most likely tomorrow evening/night.

  24. Markus says:

    Ever so excellent, thanks!

  25. ThumNet says:

    @Markus
    See update above!

  26. Joris Kommeren says:

    Any update on when the website will be fixed?

    Other than that, I’m very, very happy with your service, and your imdb-episode request adds exactly what I felt was missing. Keep up the good work!

  27. ThumNet says:

    @Joris Kommeren

    Currently I’m looking for help with hosting the script.

    So if anyone could help, please contact me. (info at thumnet dot com)

  28. kkr says:

    Hi Thumnet,

    is it legal to use this in a freeware app that I’m creating?
    The IMDb website says we can’t use their data, but how does the legal side work when accessing it through your website?

    thanks
    k

  29. ThumNet says:

    How the legal thing works exactly I really don’t know, but maybe someone else can point this out…?

  30. Markus says:

    @kkr
    For sure you cannot charge money for whatever you are doing with the data. In my project I’ve made sure to link to IMDb.com whenever possible and branded it as an UnOfficial IMDb client, so as not to claim to be the owner of the data. All in all, it will just generate traffic towards IMDb.com, and hopefully they will see that as a good thing…

  31. Jimmy says:

    Hi !

    I’ve been using your excellent service since the start to get metadata for my movie collection, and most of the titles work fine, but now and then there is a title that gives me problems.

    For example
    http://scraper.thumnet.com/xml/imdb-title/tt0153922/
    and
    http://scraper.thumnet.com/xml/imdb-title/tt1186830/

    where the tag contains a lot more than just the title. Is this something that could be fixed?

    Br
    Jimmy

  32. ThumNet says:

    Hi Jimmy, the problem can indeed occur, and it will be fixed in the future, but I can’t tell you when the fix will be available because my private life is hectic at the moment.

  33. Thomas says:

    Hi
    I just came across your script and wanted to ask if you had thought about making it open source. Your server load would be reduced, and maybe you’d even find some people to help you perfect it.

  34. ThumNet says:

    Hi Thomas, maybe in the future I’ll put the source on GitHub or Google Code.
    I’ll write a new blogpost when the sources are available.
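Comments 2 to 7 and 17 revolve around encoding. Below is a minimal sketch of both sides, assuming the problem is either HTML entities mixed with raw characters or UTF-8 text decoded as Latin-1. normalize_for_xml decodes all entities and escapes once (a decode-then-encode approach, not necessarily the check ThumNet built), and repair_text is a heuristic client-side fix. Both function names are hypothetical.

```python
import html
from xml.sax.saxutils import escape


def normalize_for_xml(text: str) -> str:
    """Server side: make text with mixed encoded/raw characters XML-safe.

    Decoding every HTML entity first and escaping once afterwards avoids
    double encoding ("&amp;amp;") and removes HTML-only entities such as
    "&auml;", which XML does not predefine.
    """
    return escape(html.unescape(text))


def repair_text(text: str) -> str:
    """Client side: best-effort fix for leftover entities and mojibake.

    If the text looks like UTF-8 that was decoded as Latin-1 ("vÃ¤dret"),
    it is decoded again as UTF-8; otherwise it is returned unchanged.
    """
    text = html.unescape(text)
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text
```

The Latin-1 round trip is only a heuristic: text that already contains characters outside Latin-1 (such as an en dash) is left untouched, so Windows-1252 mix-ups are not repaired.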

  1. REST-like webservice/api for imdb.com with XML or JSON output | ThumNet says:

    [...] New version available! Read my update post. [...]

  2. Moubail.com applications – status update | moubail.com says:

    [...] contains HTML code and such. This has it’s roots in the way the IMDb data is collected (see here). Whenever the HTML structure changes at IMDb.com it affects the data collecting (a.k.a. scraping) [...]
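The pingback sums up why fields go missing or come back with HTML in them. Here is a minimal sketch of that failure mode, assuming a regex-based scraper: the plot pattern and its markup are invented for illustration and are not IMDb’s actual HTML, and all function names are hypothetical. When the markup changes, the pattern stops matching and the field silently drops out of the output; when a pattern captures too much, strip_tags at least removes the markup, though it cannot decide which text really belongs to the field.

```python
import re
from html.parser import HTMLParser
from typing import Optional

# Illustrative only: a hypothetical pattern, not IMDb's real markup.
PLOT_PATTERN = re.compile(r'<div class="plot">(.*?)</div>', re.S)


class _TextCollector(HTMLParser):
    """Collects text content and ignores tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decodes &amp; etc.
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_tags(fragment: str) -> str:
    """Remove tags, decode entities and collapse whitespace."""
    collector = _TextCollector()
    collector.feed(fragment)
    collector.close()  # flushes any trailing text
    return " ".join("".join(collector.parts).split())


def extract_field(page_html: str, pattern: re.Pattern) -> Optional[str]:
    """Return the cleaned field text, or None if the pattern no longer matches."""
    match = pattern.search(page_html)
    if match is None:
        # The markup changed (or the field is absent): nothing to report.
        return None
    return strip_tags(match.group(1))


def scrape_title(page_html: str) -> dict:
    fields = {"Plot": extract_field(page_html, PLOT_PATTERN)}
    # Fields whose pattern failed simply disappear from the output.
    return {name: value for name, value in fields.items() if value is not None}
```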
