Is there a Wikipedia API just to retrieve the content summary?

I just need to retrieve the first paragraph of a Wikipedia page.

The content must be HTML-formatted, ready to be displayed on my website (so no BBCode or special Wikipedia markup!)


There's a way to get the entire "introduction section" without any HTML parsing! Similar to AnthonyS's answer, but with the additional explaintext parameter, you can get the introduction section as plain text.

Query

Getting Stack Overflow's introduction in plain text:

Using the page title:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&titles=Stack%20Overflow

Or using pageids:

https://en.wikipedia.org/w/api.php?format=json&action=query&prop=extracts&exintro&explaintext&redirects=1&pageids=21721040

JSON Response

(warnings stripped)

{
    "query": {
        "pages": {
            "21721040": {
                "pageid": 21721040,
                "ns": 0,
                "title": "Stack Overflow",
                "extract": "Stack Overflow is a privately held website, the flagship site of the Stack Exchange Network, created in 2008 by Jeff Atwood and Joel Spolsky, as a more open alternative to earlier Q&A sites such as Experts Exchange. The name for the website was chosen by voting in April 2008 by readers of Coding Horror, Atwood's popular programming blog.\nIt features questions and answers on a wide range of topics in computer programming. The website serves as a platform for users to ask and answer questions, and, through membership and active participation, to vote questions and answers up or down and edit questions and answers in a fashion similar to a wiki or Digg. Users of Stack Overflow can earn reputation points and \"badges\"; for example, a person is awarded 10 reputation points for receiving an \"up\" vote on an answer given to a question, and can receive badges for their valued contributions, which represents a kind of gamification of the traditional Q&A site or forum. All user-generated content is licensed under a Creative Commons Attribute-ShareAlike license. Questions are closed in order to allow low quality questions to improve. Jeff Atwood stated in 2010 that duplicate questions are not seen as a problem but rather they constitute an advantage if such additional questions drive extra traffic to the site by multiplying relevant keyword hits in search engines.\nAs of April 2014, Stack Overflow has over 2,700,000 registered users and more than 7,100,000 questions. Based on the type of tags assigned to questions, the top eight most discussed topics on the site are: Java, JavaScript, C#, PHP, Android, jQuery, Python and HTML."
            }
        }
    }
}

Documentation: API: query/prop=extracts
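
If you want to call this from code, here is a minimal PHP sketch using the query above (the User-Agent string is a placeholder; substitute your own app name and contact information):

// Fetch the plain-text introduction via prop=extracts (query from above).
$url = 'https://en.wikipedia.org/w/api.php?format=json&action=query'
     . '&prop=extracts&exintro&explaintext&redirects=1&titles=Stack%20Overflow';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'MyApp/1.0 (contact@example.com)'); // placeholder; use your own contact info
$response = curl_exec($ch);
curl_close($ch);

$data = json_decode($response, true);
foreach ($data['query']['pages'] as $page) { // keyed by pageid
    echo $page['extract'];                   // the plain-text introduction
}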


There is actually a very nice prop called extracts, designed specifically for this purpose, that can be used with action=query.

Extracts allow you to get article extracts (truncated article text). There is a parameter called exintro that can be used to retrieve the text in the zeroth section (no additional assets like images or infoboxes). You can also retrieve extracts with finer granularity such as by a certain number of characters (exchars) or by a certain number of sentences (exsentences).

Here is a sample query:

https://en.wikipedia.org/w/api.php?action=query&prop=extracts&format=json&exintro=&titles=Stack%20Overflow

And here is the API sandbox, to experiment more with this query:

https://en.wikipedia.org/wiki/Special:ApiSandbox#action=query&prop=extracts&format=json&exintro=&titles=Stack%20Overflow

Please note that if you want the first paragraph specifically, you will still need to do some additional parsing, as suggested in the accepted answer. The difference here is that the response returned by this query is shorter than with some of the other suggested API queries, because there are no additional assets such as images in the response to parse.
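
For instance, if you combine this with the explaintext parameter from the answer above, the paragraphs of the extract are separated by "\n" (visible in the JSON response above), so picking out the first paragraph is a simple split. A sketch:

// Suppose $extract holds the plain-text intro returned with explaintext;
// paragraphs are separated by "\n", as in the JSON response above.
$extract = "First paragraph of the intro.\nSecond paragraph of the intro.";

$firstParagraph = explode("\n", trim($extract))[0];
echo $firstParagraph; // "First paragraph of the intro."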


Since 2017, Wikipedia provides a REST API with better caching. In the documentation you can find the following endpoint, which fits your use case perfectly (it is used by the new Page Previews feature).

https://en.wikipedia.org/api/rest_v1/page/summary/Stack_Overflow returns the following data, which can be used to display a summary with a small thumbnail:

{
  "type": "standard",
  "title": "Stack Overflow",
  "displaytitle": "Stack Overflow",
  "extract": "Stack Overflow is a question and answer site for professional and enthusiast programmers. It is a privately held website, the flagship site of the Stack Exchange Network, created in 2008 by Jeff Atwood and Joel Spolsky. It features questions and answers on a wide range of topics in computer programming. It was created to be a more open alternative to earlier question and answer sites such as Experts-Exchange. The name for the website was chosen by voting in April 2008 by readers of Coding Horror, Atwood's popular programming blog.",
  "extract_html": "<p><b>Stack Overflow</b> is a question and answer site for professional and enthusiast programmers. It is a privately held website, the flagship site of the Stack Exchange Network, created in 2008 by Jeff Atwood and Joel Spolsky. It features questions and answers on a wide range of topics in computer programming. It was created to be a more open alternative to earlier question and answer sites such as Experts-Exchange. The name for the website was chosen by voting in April 2008 by readers of <i>Coding Horror</i>, Atwood's popular programming blog.</p>",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q549037",
  "titles": {
    "canonical": "Stack_Overflow",
    "normalized": "Stack Overflow",
    "display": "Stack Overflow"
  },
  "pageid": 21721040,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/en/thumb/f/fa/Stack_Overflow_homepage%2C_Feb_2017.png/320px-Stack_Overflow_homepage%2C_Feb_2017.png",
    "width": 320,
    "height": 149
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/en/f/fa/Stack_Overflow_homepage%2C_Feb_2017.png",
    "width": 462,
    "height": 215
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "902900099",
  "tid": "1a9cdbc0-949b-11e9-bf92-7cc0de1b4f72",
  "timestamp": "2019-06-22T03:09:01Z",
  "description": "website hosting questions and answers on a wide range of topics in computer programming",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Stack_Overflow",
      "revisions": "https://en.wikipedia.org/wiki/Stack_Overflow?action=history",
      "edit": "https://en.wikipedia.org/wiki/Stack_Overflow?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Stack_Overflow"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Stack_Overflow",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Stack_Overflow",
      "edit": "https://en.m.wikipedia.org/wiki/Stack_Overflow?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Stack_Overflow"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Stack_Overflow",
    "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Stack_Overflow",
    "references": "https://en.wikipedia.org/api/rest_v1/page/references/Stack_Overflow",
    "media": "https://en.wikipedia.org/api/rest_v1/page/media/Stack_Overflow",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Stack_Overflow",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Stack_Overflow"
  }
}

By default, it follows redirects (so that /api/rest_v1/page/summary/StackOverflow also works), but this can be disabled with ?redirect=false.

If you need to access the API from another domain you can set the CORS header with &origin= (e.g., &origin=*).

As of 2019, the API returns more useful information about the page.
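
A minimal PHP sketch for this endpoint (extract_html is already valid HTML, which matches the requirement in the question; the User-Agent string is a placeholder, so use your own contact information):

// Fetch the REST summary; extract_html is display-ready HTML.
$title = rawurlencode('Stack_Overflow');

$ch = curl_init("https://en.wikipedia.org/api/rest_v1/page/summary/$title");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'MyApp/1.0 (contact@example.com)'); // placeholder; use your own contact info
$summary = json_decode(curl_exec($ch), true);
curl_close($ch);

echo $summary['extract_html']; // HTML summary, ready to embed
// $summary['thumbnail']['source'] holds the thumbnail URL, if present.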


This code allows you to retrieve the content of the first paragraph of the page in plain text.

Parts of this answer are adapted from other answers to this question. See the MediaWiki API documentation for more information.

// action=parse: get parsed text
// page=Baseball: from the page Baseball
// format=json: in JSON format
// prop=text: return the text content of the article
// section=0: only the top (lead) section of the page

$url = 'https://en.wikipedia.org/w/api.php?format=json&action=parse&page=Baseball&prop=text&section=0';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'TestScript'); // required by the wikipedia.org servers; use YOUR user agent with YOUR contact information (otherwise your IP might get blocked)
$c = curl_exec($ch);
curl_close($ch);

$json = json_decode($c);

$content = $json->{'parse'}->{'text'}->{'*'}; // Get the main text content of the query (it's parsed HTML)

// Pattern for the first match of a paragraph
$pattern = '#<p>(.*)</p>#Us'; // http://www.phpbuilder.com/board/showthread.php?t=10352690
if (preg_match($pattern, $content, $matches))
{
    // print $matches[0]; // Content of the first paragraph (including the wrapping <p> tag)
    print strip_tags($matches[1]); // Content of the first paragraph without the HTML tags
}

Yes, there is. For example, if you wanted to get the content of the first section of the article Stack Overflow, use a query like this:

https://en.wikipedia.org/w/api.php?format=xml&action=query&prop=revisions&titles=Stack%20Overflow&rvprop=content&rvsection=0&rvparse

The parts mean this:

  • format=xml: Return the result formatted as XML. Other options (like JSON) are available. This does not affect the format of the page content itself, only the enclosing data format.

  • action=query&prop=revisions: Get information about the revisions of the page. Since we don't specify which revision, the latest one is used.

  • titles=Stack%20Overflow: Get information about the page Stack Overflow. It's possible to get the text of several pages in one go if you separate their names with |.

  • rvprop=content: Return the content (or text) of the revision.

  • rvsection=0: Return only content from section 0.

  • rvparse: Return the content parsed as HTML.

Keep in mind that this returns the whole first section including things like hatnotes (“For other uses …”), infoboxes or images.
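
Here is a minimal PHP sketch of the same query. It uses format=json instead of format=xml purely to keep the parsing short, and it assumes the legacy JSON format where the revision text lives under the '*' key (as in the parse example above); the User-Agent string is a placeholder:

// Same query as above, but with format=json for easier parsing in PHP.
$url = 'https://en.wikipedia.org/w/api.php?format=json&action=query'
     . '&prop=revisions&titles=Stack%20Overflow&rvprop=content&rvsection=0&rvparse';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'MyApp/1.0 (contact@example.com)'); // placeholder; use your own contact info
$data = json_decode(curl_exec($ch), true);
curl_close($ch);

foreach ($data['query']['pages'] as $page) { // keyed by pageid
    echo $page['revisions'][0]['*'];         // parsed HTML of section 0
}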

There are several libraries available for various languages that make working with the API easier; you may be better off using one of them.