Run Requests Asynchronously with the Batch Endpoint

At a glance

The Batch endpoint lets you scale more efficiently by leveraging Mailchimp’s infrastructure to enqueue and monitor longer-running requests. Batch operations run in the background on Mailchimp’s servers, and are particularly useful in three contexts:

  1. Requests to the Marketing API time out at 120 seconds, so if you’re making a long-running request that won’t finish in that time, you may need to use the Batch endpoint to complete the request.

  2. The Marketing API has a limit of 10 concurrent connections for your API key. If you’re expecting to make a higher volume of requests from your application, Batch requests will let you work around this limitation.

  3. Depending on your server language or architecture, you may not want a request to the Marketing API to block other threads. Batch requests are fast and won’t block while long operations take place.

You can use the Batch endpoint to do things like: 

  • Update audience members in bulk from your external CRM or databases

  • Update your databases with Mailchimp audience member data

  • Retrieve reports or report data in bulk

In this guide, we’re going to use the Batch endpoint to do a big job: adding 1,000 new contacts from an external database into a Mailchimp audience. We collected those names and email addresses through the sign-up form on our application; now, we’re launching a marketing newsletter, so it’s time to break out the Batch endpoint. 

(Don’t worry: We were optimistic when we built our application — we didn’t have a marketing budget, but we hoped we’d be able to afford marketing someday — so we included language in our sign-up process that ensured we’d be in compliance with data privacy laws.)

Using the Batch endpoint, we’ll walk through making the batch request, checking its status, and getting the results—and we’ll also walk through how to set up notifications for results using batch webhooks.

What you’ll need

Make a batch operations request

We’re ready to turn our 1,000 names and email addresses into 1,000 Mailchimp contacts. Rather than make 1,000 separate API calls, we’ll use the Batch endpoint to add 1,000 contacts in a single call. With that single call, the Batch endpoint will call other endpoints on our behalf; here, we’ll be calling the Members endpoint at scale.

When you POST to the Batch endpoint, it expects a JSON object with a single key: operations, which is an array of objects that describe the API calls you want to make. To add a single member to our audience, the request body will look like this:

Request body

JSON
{
  "operations": [{
    "method": "POST", # The HTTP verb for the operation
    "path": "/lists/{list_id}/members", # The relative path of the operation (relative to /api/3.0)
    "operation_id": "my-id", # A string you provide that identifies the operation
    "body": {  # Optional: The JSON payload for PUT, POST, or PATCH requests
      "email_address": "freddie@example.com",
      "status": "subscribed"
    },
    "params": {...} # Optional: A JSON representation of URL query params, only used for GET requests
  }]
}

Of course, we’re batch-adding 1,000 contacts, so to finish the job, our operations array will include 1,000 objects like the one above, one for each contact we want to add.

The final code might look like this:

Make a batch operations request

#!/bin/bash
set -euo pipefail

server="YOUR_SERVER_PREFIX"
apikey="YOUR_API_KEY"
listid="YOUR_LIST_ID"

declare -A subscriber1=(
	[email]="user1@example.com"
	[id]=1
	[status]="subscribed"
)

declare -A subscriber2=(
	[email]="user2@example.com"
	[id]=2
	[status]="subscribed"
)

curl -sS --request POST \
  "https://${server}.api.mailchimp.com/3.0/batches" \
  --user "anystring:${apikey}" \
  --data @- \
<<EOF | jq 'del(._links)'
{
	"operations":[
    	{
        	"method": "POST",
        	"path": "/lists/${listid}/members",
        	"operation_id": "${subscriber1[id]}",
        	"body": "{\"email_address\":\"${subscriber1[email]}\",\"status\":\"${subscriber1[status]}\"}"
    	},
    	{
        	"method": "POST",
        	"path": "/lists/${listid}/members",
        	"operation_id": "${subscriber2[id]}",
        	"body": "{\"email_address\":\"${subscriber2[email]}\",\"status\":\"${subscriber2[status]}\"}"
    	}
	]
}
EOF
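
The script above writes out two operations by hand. For the full 1,000 contacts, we would more likely generate the operations array in code. The following is a minimal sketch of that step, not a definitive implementation: the function name buildAddMemberOperations and the shape of the contacts array (objects with id and email) are assumptions made for illustration. It serializes each body as a JSON string, mirroring the curl example above, and uses our internal user ID as the operation_id.

```javascript
// Build the `operations` array for a batch request that adds contacts to an audience.
// `contacts` is assumed to be an array of { id, email } objects from our own database.
function buildAddMemberOperations(listId, contacts) {
  return contacts.map(contact => ({
    method: "POST",
    path: `/lists/${listId}/members`,
    // Our internal user ID, so each result can be matched back to a user later.
    operation_id: String(contact.id),
    // Serialized as a JSON string, mirroring the curl example in this guide.
    body: JSON.stringify({ email_address: contact.email, status: "subscribed" }),
  }));
}

// Two illustrative contacts; a real run would load all 1,000 from the database.
const contacts = [
  { id: 1, email: "user1@example.com" },
  { id: 2, email: "user2@example.com" },
];

// This object is what we would send as the body of the POST to /batches.
const payload = { operations: buildAddMemberOperations("YOUR_LIST_ID", contacts) };
console.log(JSON.stringify(payload, null, 2));
```

Using JSON.stringify for each body also avoids the manual quote escaping that the shell heredoc requires.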

The response will look something like this:

Response

JSON
{
  "id": "123abc", # Unique id of the batch call
  "status": "pending", # Status for the whole call
                       # pending, preprocessing, started, finalizing, or finished
  "total_operations": 1000, # Number of operations in the batch
  "finished_operations": 1, # Number of finished operations
  "errored_operations": 0, # Number of errored operations
  "submitted_at": "...", # Datetime the call was made
  "completed_at": "...", # Datetime when all the operations completed
  "response_body_url": "..." # URL to use to retrieve results
}

Things to keep in mind when making a call to the Batch endpoint:

  • Operations in a request are not guaranteed to run in order.

  • GET requests that do not include a count parameter automatically page through the entire collection. 

  • You can have at most 500 batch requests with a status of pending at any one time.

Note: Each operation you define in a batch request can include an optional operation_id parameter, which is a string. The operation_id you supply is returned with the results of that operation, allowing you to match each result to a specific operation in the original request. We recommend using a unique and meaningful value, but the Batch endpoint does not enforce uniqueness; operation_id is purely for your use.

Check the status of a batch operation

Our operations are queued, but to find out if they’re finished, we need to check the status. We can retrieve that status using the id returned in the response when we created the batch request:

Check the status of a batch operation

#!/bin/bash
set -euo pipefail

server="YOUR_SERVER_PREFIX"
apikey="YOUR_API_KEY"
batchid="YOUR_BATCH_ID"

curl -sS \
  "https://${server}.api.mailchimp.com/3.0/batches/${batchid}" \
  --user "anystring:${apikey}" | jq '.status'

The status check response will look like the response from our initial call to the Batch endpoint. For now, we’re concerned with the status of our operation, which can be in any of the following states:

  • pending: Processing on the batch operation has not started.

  • preprocessing: The batch request is being broken up into smaller operations to speed up processing.

  • started: Processing has started.

  • finalizing: Processing is complete, but the results are being compiled and saved.

  • finished: Processing is done. You can now retrieve the results from the URL in response_body_url.

Other useful information included in the payload:

  • total_operations: The total number of operations to be processed.

  • finished_operations: The number of operations that have been processed.

  • errored_operations: The number of operations that returned a non-200 response.

  • response_body_url: The URL where you can download the gzipped archive of the results of all the operations.
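
To show how these fields fit together, here is a hedged sketch of a polling helper. The names summarizeBatch and waitForBatch, the fetchStatus callback, and the interval and attempt defaults are all assumptions for illustration; fetchStatus stands in for whatever code makes the status request and returns the parsed JSON payload.

```javascript
// Summarize a batch status payload using the fields described above.
function summarizeBatch(batch) {
  const total = batch.total_operations;
  const percent = total > 0 ? Math.round((batch.finished_operations / total) * 100) : 0;
  return {
    done: batch.status === "finished",
    percent,
    errored: batch.errored_operations,
  };
}

// Poll until the batch reaches the "finished" status.
// `fetchStatus` is a hypothetical async function supplied by the caller that
// returns the parsed status payload for one batch.
async function waitForBatch(fetchStatus, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const batch = await fetchStatus();
    if (summarizeBatch(batch).done) {
      return batch;
    }
    // Wait before asking again so we don't hammer the API.
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Batch not finished after ${maxAttempts} attempts`);
}

// Example with an illustrative in-progress payload.
const summary = summarizeBatch({
  status: "started",
  total_operations: 1000,
  finished_operations: 250,
  errored_operations: 3,
});
console.log(summary);
```

Keeping the summary logic separate from the polling loop makes it easy to reuse the same check when a status payload arrives some other way.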

Note: You can retrieve a list of all requested batch operations from the last 7 days with the List batches endpoint.

Get the results of a batch operation

When the status of our job is finished, we can retrieve the results. In this case, we want to save the web_id of each contact in our application so we can link directly to that contact in the Mailchimp web application from our tools. 

A GET request to the response_body_url returns a gzipped tar archive of JSON files. You can expect a single file per operation, unless one of your operations contains paged data; in that case, those responses may also be split across multiple files. The JSON results of each operation will be returned in the following format:

Response

JSON
[
  {
      "status_code": 200,
      "operation_id": "my-id",
      "response": "{...}"
  },...
]

This array of results contains the HTTP status (if everything went well, a 200), the operation_id we set when we created the batch request, and the response body from the actual API call. Since we used our application’s internal user IDs as the operation_id in each operation, we can process the results of our batch request and map the results to the users we just processed. 

Here, we’re going to save the Mailchimp web_id to our database:

Get the results of a batch operation

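const fetch = require("node-fetch");

const responseBodyUrl = "RESPONSE_BODY_URL_FROM_PREVIOUS_STEPS";

async function run() {
  const response = await fetch(responseBodyUrl);
  if (!response.ok) {
    throw new Error(`Failed to download results: HTTP ${response.status}`);
  }

  // Extract data from gzipped archive, return as JSON array
  // Implementation details not included
  const results = await processBatchArchive(response);
  results.forEach(result => {
    // Each result's response is a JSON-encoded string, so parse it first
    const member = JSON.parse(result.response);
    // operation_id is the internal user ID we supplied in the batch request
    const user = fakeDb.findUser(result.operation_id);
    fakeDb.updateUser(user, {
      mailchimpWebId: member.web_id
    });
  });
}

run().catch(console.error);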
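
The example above assumes every operation succeeded. In practice, some operations may come back with a non-200 status_code, so it can help to separate successes from failures before touching the database. The following is a sketch under the assumption that each entry has the shape shown earlier (status_code, operation_id, and a response that is a JSON-encoded string); partitionResults and the sample data are hypothetical.

```javascript
// Split batch results into successes and failures.
// Each entry is assumed to look like { status_code, operation_id, response },
// where `response` is a JSON-encoded string.
function partitionResults(results) {
  const succeeded = [];
  const failed = [];
  for (const result of results) {
    const entry = {
      operationId: result.operation_id,
      statusCode: result.status_code,
      body: JSON.parse(result.response),
    };
    // Anything other than a 200 is treated as an errored operation.
    (result.status_code === 200 ? succeeded : failed).push(entry);
  }
  return { succeeded, failed };
}

// Illustrative data only; real entries come from the downloaded archive.
const sampleResults = [
  { status_code: 200, operation_id: "1", response: '{"web_id": 42}' },
  { status_code: 400, operation_id: "2", response: '{"detail": "example error"}' },
];

const { succeeded, failed } = partitionResults(sampleResults);
console.log(`${succeeded.length} succeeded, ${failed.length} failed`);
```

The failed entries keep their operation_id, so we can look up the affected users and decide whether to retry them.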

Note: The results of your batch operation are available to download for 7 days after you make the request. For security reasons, however, any response_body_url is only valid for 10 minutes after it’s generated. You can always generate a new response_body_url by making another Batch status call.

Batch webhooks

In one-off usage like the contact sync in our example, periodically checking the status of batch operations works well enough. But if you’re regularly making batch requests in your application — for example, if you’ve set up an internal dashboard for generating reports on-demand — setting up a batch webhook may be a better option.

A batch webhook lets Mailchimp tell your app when all the operations enqueued by a batch request are complete. You only need to set up a batch webhook once — Mailchimp will POST all completed batch requests to the webhook you create.

Now that our 1,000 contacts are in Mailchimp, let’s say we’ve also built an internal admin tool for generating Mailchimp reports on demand: it makes sense to create a batch webhook to notify us when our reports are ready rather than periodically polling for them.

First, we need to specify the URL that Mailchimp should send a POST request to; that request will include information about your completed process, including the response_body_url, which you can use to retrieve the actual results.

Note: On creation, Mailchimp’s servers will validate your webhook URL by making a GET request to the provided address to ensure that it is valid, so the webhook URL should be able to handle both GET and POST requests.
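
As a rough illustration of an endpoint that accepts both request types, here is a minimal sketch using Node’s built-in http module. The function names statusForMethod and createWebhookServer, the onPayload callback, and the choice to answer other methods with 405 are assumptions for this example, not requirements stated by Mailchimp.

```javascript
const http = require("http");

// Decide how to answer a request to the webhook URL:
// GET (validation) and POST (batch notification) both succeed.
function statusForMethod(method) {
  return method === "GET" || method === "POST" ? 200 : 405;
}

// Create (but do not start) a server for the webhook URL.
// `onPayload` is a hypothetical callback that receives the raw POST body.
function createWebhookServer(onPayload) {
  return http.createServer((req, res) => {
    const status = statusForMethod(req.method);
    if (req.method !== "POST") {
      // Covers the validation GET as well as any unsupported method.
      res.writeHead(status);
      res.end();
      return;
    }
    // Collect the raw, still-encoded body before handing it off.
    let body = "";
    req.setEncoding("utf8");
    req.on("data", chunk => {
      body += chunk;
    });
    req.on("end", () => {
      res.writeHead(status);
      res.end();
      onPayload(body);
    });
  });
}

// To run it, call listen with a port of your choosing, for example:
// createWebhookServer(body => console.log(body)).listen(3000);
```

The server responds to the POST before calling onPayload so that slow processing on our side does not delay the acknowledgement.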

To create a batch webhook, use the Batch Webhooks API endpoint:

Batch webhooks

#!/bin/bash
set -euo pipefail

server="YOUR_SERVER_PREFIX"
apikey="YOUR_API_KEY"

url="https://example.com/your-webhook-url"

curl -sS --request POST \
  "https://${server}.api.mailchimp.com/3.0/batch-webhooks" \
  --user "anystring:${apikey}" \
  --data @- \
<<EOF | jq '.'
{
	"url": "${url}"
}
EOF

Now that we’ve created the batch webhook, Mailchimp will send information about completed batch operations to our webhook URL. The body of the POST request will contain a URL-encoded, plain-text string of key/value pairs that will closely resemble a URL query string. A truncated version might look like this:

data%5Bresponse_body_url%5D=https://mailchimp-api-batch.s3.amazonaws.com/1234ab56cd-response.tar.gz?AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&Expires=1486739377&Signature=xxxxxxxxxxxxxxxxxxxxxxxxxxxx%253D&type=batch_operation_completed

We’ll need to decode the query string to access specific values we may want. Most often, the response_body_url will be the parameter we’re interested in, since it’s what we can use to download the gzipped results of our operation. 

Note: This is the same response_body_url described in the Get the results of a batch operation section above. You can use it to download the gzipped tar archive as normal, but keep in mind that the same 10-minute expiration period applies. After 10 minutes, you can generate another response_body_url by making a call to the Batch status endpoint.

Accessing that value might look like this:

Batch endpoint status

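async function handleWebhook(req) {
  // req.body must be the raw, still-encoded request body string.
  // URLSearchParams percent-decodes keys and values itself; decoding the
  // whole body first would corrupt values that contain "&", "=", or "%".
  const params = new URLSearchParams(req.body);
  const responseBodyUrl = params.get("data[response_body_url]");
  console.log(`You can fetch the gzipped response with ${responseBodyUrl}.`);
}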

The full payload, parsed and represented as JSON, would look like this:

Response

JSON
{
"data[_links][0][href]": "https://usX.api.mailchimp.com/3.0/batches",
"data[_links][0][method]": "GET",
"data[_links][0][rel]":  "parent",
"data[_links][0][schema]": "https://usX.api.mailchimp.com/schema/3.0/CollectionLinks/Batches.json",
"data[_links][0][targetSchema]": "https://usX.api.mailchimp.com/schema/3.0/Definitions/Batches/CollectionResponse.json",
"data[_links][1][href]": "https://usX.api.mailchimp.com/3.0/batches/1234ab56cd",
"data[_links][1][method]": "GET",
"data[_links][1][rel]":  "self",
"data[_links][1][targetSchema]": "https://usX.api.mailchimp.com/schema/3.0/Definitions/Batches/Response.json",
"data[_links][2][href]": "https://usX.api.mailchimp.com/3.0/batches/1234ab56cd",
"data[_links][2][method]": "DELETE",
"data[_links][2][rel]":  "delete",
"data[completed_at]":  "2017-02-10T14:44:22+00:00",
"data[errored_operations]":  "0",
"data[finished_operations]": "1",
"data[id]":  "1234ab56cd",
"data[response_body_url]": "https://mailchimp-api-batch.s3.amazonaws.com/1234ab56cd-response.tar.gz?AWSAccessKeyId=XXXXXXXXXXXXXXXXXXXX&Expires=1486739377&Signature=xxxxxxxxxxxxxxxxxxxxxxxxxxxx%3D",
"data[status]":  "finished",
"data[submitted_at]":  "2017-02-10T14:44:14+00:00",
"data[total_operations]":  "1",
"fired_at":  "2017-02-10 14:59:37",
"type":  "batch_operation_completed"
}

Keep in mind that the payload is not delivered as JSON, but as a URL-encoded query string. Your code is responsible for parsing that string for the values you need.
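
One way to get from the raw string to the flat key/value object shown above is sketched below. It assumes the raw, undecoded POST body is available as a string; parseBatchWebhook and the sample values are hypothetical and exist only to make the example self-contained.

```javascript
// Parse the raw webhook POST body into a flat object of key/value pairs.
// `rawBody` is assumed to be the undecoded request body string.
function parseBatchWebhook(rawBody) {
  // URLSearchParams handles the percent-decoding of keys and values.
  return Object.fromEntries(new URLSearchParams(rawBody));
}

// Illustrative body built from made-up values, encoded the way a form post would be.
const sampleBody = new URLSearchParams({
  type: "batch_operation_completed",
  "data[id]": "1234ab56cd",
  "data[status]": "finished",
  "data[total_operations]": "1",
  "data[response_body_url]": "https://example.com/1234ab56cd-response.tar.gz?Expires=1486739377",
}).toString();

const webhookPayload = parseBatchWebhook(sampleBody);
console.log(webhookPayload["data[status]"]);
// Every value arrives as a string, so convert counts explicitly.
console.log(Number(webhookPayload["data[total_operations]"]));
```

Note that the keys stay in their bracketed form, such as data[status], so the lookup strings match the keys in the parsed payload shown above.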