retry: My Favorite API

Posted on Feb 21, 2025

I like an API that captures a problem and expresses it concisely. I imagine this is how mathematicians feel when they create a beautiful proof. I’ve always enjoyed a good API but it wasn’t until I worked with one of my best friends, who has a background in type systems, that I really started to appreciate what it means to create a good API.

I have a bunch of APIs I like but probably my favorite is retry, which I’ve implemented in several languages.

The Ocaml type definition of retry is:

val retry :
  f:(unit -> 'a Fut.t) ->
  while_:('a -> bool) ->
  betwixt:('a -> unit Fut.t) ->
  'a Fut.t

You can ignore the Fut.t stuff, that is because the function asynchronous.

retry runs a function f, calls while_ with the result, if while_ returns true, it calls betwixt with the result, then it calls f again.

  1. f - The function we want to execute.

  2. while_ - Testing the result.

  3. betwixt - Some work to do between retries.

A few helper functions:

val finite_tries : int -> ('a -> bool) -> 'a -> bool

val series :
  start:'a ->
  step:('a -> 'a) ->
  ('a -> 'b -> 'c Fut.t) ->
  'b ->
  'c Fut.t

The finite_tries function takes a number of tries and a test function. It calls the function, and returns its value. On every call, it subtracts one from the tries. When the number of tries hits 0, it returns false.

The series function takes a start value, a step function, and a function which gets called with the stepped value and another value and returns a new value. On every call, it performs the step function on the existing value.

With all this, we can implement retries with a cap on how many times it will try and with dynamic back-off.

Here is an example of a function that performs a GitHub call a maximum of tries times (default is 3), and retries if there is an error performing the request, or the HTTP response is greater than or equal to 500, or there is a rate limit blocking the call.

Between each call it waits, starting at, 1.5 seconds and multiplying by 1.5 on each retry. However, the GitHub API includes when the call can be performed, in which case if it is a rate limit being hit, it waits until the time the API says to wait.

retry
  ~f:(fun () -> Githubc2_abb.call t req)
  ~while_:
    (finite_tries tries (function
      | Error _ -> true
      | Ok resp -> Openapi.Response.status resp >= 500
                   || is_secondary_rate_limit_error resp))
  ~betwixt:
    (series ~start:1.5 ~step:(( *. ) 1.5) (fun n resp ->
         match resp with
         | Error (`Missing_response resp) ->
             let open Abb.Future.Infix_monad in
             retry_wait n resp >>= Abb.Sys.sleep
         | Ok resp ->
             let open Abb.Future.Infix_monad in
             retry_wait n resp >>= Abb.Sys.sleep
         | Error _ -> Abb.Sys.sleep n))

But we can do more than just retry something failing. The following code reads from a socket until a specific number of bytes are read or the connection is closed. It also guarantees the scheduler doesn’t get starved by putting a zero-tick sleep between each operation. This last part is more of an idiosyncratic element of Abb but the important part is that it guarantees fairness across all scheduled tasks:

retry
  ~f:(fun () ->
    let open Abbs_future_combinators.Infix_result_monad in
    Abbs_io_buffered.read
      conn.r
      ~buf
      ~pos:0
      ~len:(Bytes.length buf)
    >>= fun n ->
    Buffer.add_subbytes b buf 0 n;
    needed_bytes := !needed_bytes - n;
    Abb.Future.return (Ok n))
  ~while_:(function
    | Ok 0 | Error _ -> false
    | Ok _ -> !needed_bytes > 0)
  ~betwixt:(fun _ ->
    (* Force a scheduler tick so we don't starve the system *)
    Abb.Sys.sleep 0.0)

I’ve also implemented retry in Python, and the implementation is pretty straight forward. The code can be found here.

An example usage in Python which we use to wrap calls using requests:

def _wrap(f):
    (success, res) = retry.run(
        lambda: _wrap_call(f),
        retry.finite_tries(TRIES, _test_success),
        retry.betwixt_sleep_with_backoff(INITIAL_SLEEP, BACKOFF))

    if not success:
        raise res

    return res

retry isn’t going to change your life but it’s nice to just appreciate an API that really captures its purpose.