diff --git a/CLAUDE.md b/CLAUDE.md
index 3d18892..9a493b5 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -14,7 +14,7 @@ cd /home/katajisto/player
 ./build/player
 ```
 
-`build.jai` is a metaprogram. It points the import path at `./modules` (vendored: `Jaison`, `stb_image`) and falls back to the standard Jai modules at `/home/katajisto/bin/jai/modules`.
+`build.jai` is a metaprogram. It points the import path at `./modules` (vendored: `Jaison`) and falls back to the standard Jai modules at `/home/katajisto/bin/jai/modules`.
 
 ## Code layout
 
@@ -36,7 +36,8 @@ Each folder has an `index.jai` that `#load`s every file in that folder. To add a
 - **Sound_Player** (Jai stdlib) — audio playback (ALSA on linux, CoreAudio on mac)
 - **Curl** (Jai stdlib) — HTTP to Jellyfin
 - **Jaison** (vendored) — JSON
-- **stb_image** (vendored) — single-header PNG/JPG decode for artist images
+- **stb_image** (Jai stdlib) — single-header PNG/JPG decode for artist images
+- **stb_vorbis** (Jai stdlib) — OGG decode for the (only) audio path
 
 Cross-platform target: linux + macOS.
 
diff --git a/ai/decisions.md b/ai/decisions.md
index 7c7a6c2..1ac5a00 100644
--- a/ai/decisions.md
+++ b/ai/decisions.md
@@ -2,6 +2,30 @@
 
 Append-only. Most recent first. Keep entries short — one paragraph max. Reference files with paths.
 
+## 2026-05-01 — Single audio pipeline: OGG-only via stdlib stb_vorbis
+
+Jellyfin transcodes everything to OGG Vorbis (server is beefy, transcode cost is irrelevant). On our side, `decode_ogg` (`src/audio/decoders.jai`, ~30 lines) calls `stb_vorbis_decode_memory` to produce one s16 PCM buffer that we hand to Sound_Player as `LINEAR_SAMPLE_ARRAY`. The visualizer reads from the *same* `sd.samples` around `play_cursor` (frames, scaled by `nchannels` to index the interleaved buffer) — no parallel stb_vorbis decoder, no `analysis_vorbis`/`analysis_ogg` state, no cursor desync between bars and audio. Killed: the `Audio_Format` enum, `detect_audio_format`, the dr_mp3/dr_flac branches, the `Sound.load_audio_data` OGG_COMPRESSED path, the `current_format` field on App, the `[%]` format suffix in the now-playing label.
+
+## 2026-05-01 — Drop vendored audio_decoders + stb_image; use Jai stdlib
+
+The dr_mp3/dr_flac C wrapper module (`modules/audio_decoders/`) and its build hook (`ensure_audio_decoders_built` in `build.jai` → `source/build.sh` → `decoders.a`) existed only to support MP3/FLAC. With OGG-only playback that's all dead. Deleted. Vendored `modules/stb_image/` was older than the stdlib copy (missing Android/NN_SWITCH2 OS dispatch) — also deleted; `Stb_Image :: #import "stb_image"` now resolves to the stdlib. Only thing still vendored is `Jaison` (no stdlib JSON parser). Result: no C build step, no static archive to manage, smaller binary (4.92 MB → 4.85 MB), `core/imports.jai` lost a line.
+
+## 2026-05-01 — DeviceId is per-install, generated once and persisted
+
+Jellyfin permits exactly one active access token per `DeviceId`. A hardcoded constant (`"player-dev-device"`) meant any second instance, second machine, or re-login flow silently revoked the prior saved token — root cause of the 401-on-launch bug we hit. A 32-char hex `device_id` is now generated on first run via `random_get()` (Random seeded from `current_time_monotonic()` in `app_init`), stored alongside the token in `config.json`, and reused forever. `ensure_device_id()` runs every launch so existing configs migrate in. `jellyfin_force_logout` (the on-401 path) deliberately keeps the device_id — burning it would orphan the device entry server-side.
+
+## 2026-05-01 — `Authorization: MediaBrowser ...`, not `X-Emby-Authorization`
+
+The legacy `X-Emby-Authorization` header is being removed in Jellyfin 12.0; admins on 10.11+ can already disable it via `EnableLegacyAuthorization=false`. Switched both `client.jai` (sync path) and `async.jai` (worker path) to the modern `Authorization` header. Same value format. Client name is `"Jellyfin Celica Music Player"` and Device is `"Celica"` so the device shows up sensibly in Dashboard → Devices.
+
+## 2026-05-01 — Validate saved token at startup; auto-drop to login on 401
+
+After `config_load()` succeeds, `app_init` fires `jellyfin_validate_session_async()` (a `GET /System/Info` probe) instead of jumping straight to library. On 200 we proceed; on 401 `jellyfin_force_logout()` clears the token, persists, and switches view to `.LOGIN` (preserving server_url/username/device_id for one-click re-auth). The library callbacks also check `status_code == 401` so a mid-session revoke from the dashboard doesn't leave the user on an empty library. On non-401 failures (network blip, server down) we keep the saved token and just route to login — never punish a transient error with a token wipe. Background: Jellyfin AccessTokens have no time-based expiry; they only die from same-DeviceId re-auth, password change, or admin revoke.
+
+## 2026-05-01 — Quick Connect as a second login path
+
+`src/jellyfin/quick_connect.jai` implements the three-step flow: `POST /QuickConnect/Initiate` → display 6-char code → poll `GET /QuickConnect/Connect?secret=…` every 3s → on `Authenticated:true`, `POST /Users/AuthenticateWithQuickConnect` with `{Secret}` to claim an `AccessToken`. Polling is driven by per-frame `jellyfin_quick_connect_pump()` in the main loop (no sleeping thread). State (`qc_state`, `qc_code`, `qc_secret`, `qc_poll_at`) lives on `Jellyfin_Client`; `draw_login_view` swaps between the password form and an "enter this code" panel based on `qc_state`. Lets the user log in without typing a password into the player. Field name `Code` shadows a Jai primitive, so the parse struct uses `UserCode: string; @JsonName(Code)` for Jaison's rename note.
+
 ## 2026-04-28 — Image cache with capped concurrency; on-main-thread texture upload
 
 `gfx/images.jai` is a small async image cache. UI calls `image_request(item_id, size)` from the draw loop; it always returns an `*Image` immediately whose `.loaded` flips true once the bytes arrive. Concurrency is capped at `MAX_CONCURRENT_IMAGE_FETCHES = 4` via a pending queue (`image_pending`) drained by `image_pump()` once per frame — without this, a 500-album library would spawn 500 worker threads at first scroll. Decode + texture upload runs in the http callback (which is already main-thread per `jellyfin/async.jai`), satisfying OpenGL's no-cross-thread rule. Two cache buckets per item, keyed `t:<id>` (128 px thumb) and `l:<id>` (512 px). Sizing is requested from Jellyfin via `?fillHeight=N&fillWidth=N&quality=80` so we get a tight payload instead of full-resolution originals.
diff --git a/build.jai b/build.jai
index b5262d4..8dafe34 100644
--- a/build.jai
+++ b/build.jai
@@ -3,8 +3,8 @@
 //
 // Run with:   jai build.jai
 //
-// Adds ./modules to the import path so we can vendor Jaison and stb_image
-// alongside the standard Jai modules.
+// Adds ./modules to the import path so we can vendor Jaison alongside the
+// standard Jai modules.
 //
 
 #run build();
@@ -35,9 +35,6 @@ build :: () {
     array_add(*extra_linker, tprint("-L%lib", #filepath));
     options.additional_linker_arguments = extra_linker;
 
-    // Build the dr_mp3/dr_flac wrapper if the static lib is missing or stale.
-    ensure_audio_decoders_built();
-
     set_build_options(options, w);
 
     make_directory_if_it_does_not_exist("build");
@@ -47,27 +44,7 @@ build :: () {
     compiler_end_intercept(w);
 }
 
-ensure_audio_decoders_built :: () {
-    #if OS == .LINUX  out := tprint("%modules/audio_decoders/linux/decoders.a", #filepath);
-    #if OS == .MACOS  out := tprint("%modules/audio_decoders/macos/decoders.a", #filepath);
-    src := tprint("%modules/audio_decoders/source/decoders.c",       #filepath);
-    script := tprint("%modules/audio_decoders/source/build.sh",      #filepath);
-
-    if file_exists(out) {
-        out_modtime, _, out_ok := file_modtime_and_size(out);
-        src_modtime, _, src_ok := file_modtime_and_size(src);
-        if out_ok && src_ok && compare_apollo_times(out_modtime, src_modtime) >= 0  return;
-        log("audio_decoders: source newer than %, rebuilding\n", out);
-    }
-
-    result, output := run_command("/bin/sh", script, capture_and_return_output=true);
-    if result.exit_code != 0 {
-        compiler_report(tprint("audio_decoders build failed (exit=%):\n%", result.exit_code, output));
-    }
-}
-
 #import "Basic";
 #import "Compiler";
 #import "File";
 #import "File_Utilities";
-#import "Process";
diff --git a/build/player b/build/player
index c12033d..54b53ff 100755
Binary files a/build/player and b/build/player differ
diff --git a/modules/audio_decoders/linux/decoders.a b/modules/audio_decoders/linux/decoders.a
deleted file mode 100644
index a829861..0000000
Binary files a/modules/audio_decoders/linux/decoders.a and /dev/null differ
diff --git a/modules/audio_decoders/macos/decoders.a b/modules/audio_decoders/macos/decoders.a
deleted file mode 100644
index e6d2e9a..0000000
Binary files a/modules/audio_decoders/macos/decoders.a and /dev/null differ
diff --git a/modules/audio_decoders/module.jai b/modules/audio_decoders/module.jai
deleted file mode 100644
index bd9ac31..0000000
--- a/modules/audio_decoders/module.jai
+++ /dev/null
@@ -1,32 +0,0 @@
-//
-// Bindings for the dr_mp3 + dr_flac wrapper compiled in source/build.sh.
-//
-// Each decode_* call hands back a heap-allocated s16 PCM buffer; the caller
-// owns it and must release it with `decoder_free()`. The buffer is
-// interleaved (L,R,L,R,…) for stereo, or mono for 1ch.
-//
-
-#scope_module
-
-#if OS == .LINUX  decoders :: #library,no_dll "linux/decoders";
-#if OS == .MACOS  decoders :: #library,no_dll "macos/decoders";
-
-#scope_export
-
-decode_mp3 :: (
-    data:                   *void,
-    data_size:              u64,
-    out_channels:           *u32,
-    out_sample_rate:        *u32,
-    out_total_pcm_frames:   *u64,
-) -> *s16 #foreign decoders "player_decode_mp3";
-
-decode_flac :: (
-    data:                   *void,
-    data_size:              u64,
-    out_channels:           *u32,
-    out_sample_rate:        *u32,
-    out_total_pcm_frames:   *u64,
-) -> *s16 #foreign decoders "player_decode_flac";
-
-decoder_free :: (p: *void) #foreign decoders "player_decoder_free";
diff --git a/modules/audio_decoders/source/build.sh b/modules/audio_decoders/source/build.sh
deleted file mode 100755
index 87cb4e7..0000000
--- a/modules/audio_decoders/source/build.sh
+++ /dev/null
@@ -1,27 +0,0 @@
-#!/usr/bin/env bash
-#
-# Compile dr_mp3 + dr_flac wrapper into a static library.
-#
-# Output: ../linux/decoders.a (or ../macos/decoders.a)
-#
-# Run from anywhere — this script cd's to its own directory first.
-#
-
-set -euo pipefail
-cd "$(dirname "$0")"
-
-CC="${CC:-clang}"
-case "$(uname -s)" in
-    Linux*)   OUT_DIR=../linux ;;
-    Darwin*)  OUT_DIR=../macos ;;
-    *)        echo "unsupported platform"; exit 1 ;;
-esac
-
-mkdir -p "$OUT_DIR"
-
-echo "compiling decoders.c..."
-"$CC" -O2 -fPIC -Wno-everything -c -o decoders.o decoders.c
-ar rcs "$OUT_DIR/decoders.a" decoders.o
-rm decoders.o
-
-echo "wrote $OUT_DIR/decoders.a"
diff --git a/modules/audio_decoders/source/decoders.c b/modules/audio_decoders/source/decoders.c
deleted file mode 100644
index 8c6dcb3..0000000
--- a/modules/audio_decoders/source/decoders.c
+++ /dev/null
@@ -1,74 +0,0 @@
-//
-// Tiny C wrapper around dr_mp3 + dr_flac. Each function decodes a buffer of
-// MP3/FLAC bytes to interleaved s16 PCM. The caller owns the returned
-// pointer and must release it with player_decoder_free().
-//
-// Compiled to a static lib (decoders.a) by lib/build_decoders.sh. The
-// metaprogram in build.jai re-runs the script if the .a is missing or the
-// sources are newer.
-//
-
-#define DR_MP3_IMPLEMENTATION
-#define DR_MP3_NO_STDIO
-#include "dr_mp3.h"
-
-#define DR_FLAC_IMPLEMENTATION
-#define DR_FLAC_NO_STDIO
-#include "dr_flac.h"
-
-#include <stdint.h>
-#include <stdlib.h>
-
-int16_t* player_decode_mp3(
-    const void* data, size_t data_size,
-    uint32_t* out_channels,
-    uint32_t* out_sample_rate,
-    uint64_t* out_total_pcm_frames
-) {
-    drmp3_config config;
-    config.channels   = 0;
-    config.sampleRate = 0;
-    drmp3_uint64 total_frames = 0;
-
-    drmp3_int16* pcm = drmp3_open_memory_and_read_pcm_frames_s16(
-        data, data_size,
-        &config,
-        &total_frames,
-        NULL  // default allocator
-    );
-    if (!pcm) return NULL;
-
-    *out_channels         = config.channels;
-    *out_sample_rate      = config.sampleRate;
-    *out_total_pcm_frames = total_frames;
-    return pcm;
-}
-
-int16_t* player_decode_flac(
-    const void* data, size_t data_size,
-    uint32_t* out_channels,
-    uint32_t* out_sample_rate,
-    uint64_t* out_total_pcm_frames
-) {
-    unsigned int channels    = 0;
-    unsigned int sample_rate = 0;
-    drflac_uint64 total_frames = 0;
-
-    drflac_int16* pcm = drflac_open_memory_and_read_pcm_frames_s16(
-        data, data_size,
-        &channels,
-        &sample_rate,
-        &total_frames,
-        NULL
-    );
-    if (!pcm) return NULL;
-
-    *out_channels         = channels;
-    *out_sample_rate      = sample_rate;
-    *out_total_pcm_frames = total_frames;
-    return pcm;
-}
-
-void player_decoder_free(void* p) {
-    free(p);
-}
diff --git a/modules/audio_decoders/source/dr_flac.h b/modules/audio_decoders/source/dr_flac.h
deleted file mode 100644
index 14324cf..0000000
--- a/modules/audio_decoders/source/dr_flac.h
+++ /dev/null
@@ -1,12536 +0,0 @@
-/*
-FLAC audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
-dr_flac - v0.12.42 - 2023-11-02
-
-David Reid - mackron@gmail.com
-
-GitHub: https://github.com/mackron/dr_libs
-*/
-
-/*
-RELEASE NOTES - v0.12.0
-=======================
-Version 0.12.0 has breaking API changes including changes to the existing API and the removal of deprecated APIs.
-
-
-Improved Client-Defined Memory Allocation
------------------------------------------
-The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
-existing system of DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE are still in place and will be used by default when no custom
-allocation callbacks are specified.
-
-To use the new system, you pass in a pointer to a drflac_allocation_callbacks object to drflac_open() and family, like this:
-
-    void* my_malloc(size_t sz, void* pUserData)
-    {
-        return malloc(sz);
-    }
-    void* my_realloc(void* p, size_t sz, void* pUserData)
-    {
-        return realloc(p, sz);
-    }
-    void my_free(void* p, void* pUserData)
-    {
-        free(p);
-    }
-
-    ...
-
-    drflac_allocation_callbacks allocationCallbacks;
-    allocationCallbacks.pUserData = &myData;
-    allocationCallbacks.onMalloc  = my_malloc;
-    allocationCallbacks.onRealloc = my_realloc;
-    allocationCallbacks.onFree    = my_free;
-    drflac* pFlac = drflac_open_file("my_file.flac", &allocationCallbacks);
-
-The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
-
-Passing in null for the allocation callbacks object will cause dr_flac to use defaults which is the same as DRFLAC_MALLOC,
-DRFLAC_REALLOC and DRFLAC_FREE and the equivalent of how it worked in previous versions.
-
-Every API that opens a drflac object now takes this extra parameter. These include the following:
-
-    drflac_open()
-    drflac_open_relaxed()
-    drflac_open_with_metadata()
-    drflac_open_with_metadata_relaxed()
-    drflac_open_file()
-    drflac_open_file_with_metadata()
-    drflac_open_memory()
-    drflac_open_memory_with_metadata()
-    drflac_open_and_read_pcm_frames_s32()
-    drflac_open_and_read_pcm_frames_s16()
-    drflac_open_and_read_pcm_frames_f32()
-    drflac_open_file_and_read_pcm_frames_s32()
-    drflac_open_file_and_read_pcm_frames_s16()
-    drflac_open_file_and_read_pcm_frames_f32()
-    drflac_open_memory_and_read_pcm_frames_s32()
-    drflac_open_memory_and_read_pcm_frames_s16()
-    drflac_open_memory_and_read_pcm_frames_f32()
-
-
-
-Optimizations
--------------
-Seeking performance has been greatly improved. A new binary search based seeking algorithm has been introduced which significantly
-improves performance over the brute force method which was used when no seek table was present. Seek table based seeking also takes
-advantage of the new binary search seeking system to further improve performance there as well. Note that this depends on CRC which
-means it will be disabled when DR_FLAC_NO_CRC is used.
-
-The SSE4.1 pipeline has been cleaned up and optimized. You should see some improvements with decoding speed of 24-bit files in
-particular. 16-bit streams should also see some improvement.
-
-drflac_read_pcm_frames_s16() has been optimized. Previously this sat on top of drflac_read_pcm_frames_s32() and performed it's s32
-to s16 conversion in a second pass. This is now all done in a single pass. This includes SSE2 and ARM NEON optimized paths.
-
-A minor optimization has been implemented for drflac_read_pcm_frames_s32(). This will now use an SSE2 optimized pipeline for stereo
-channel reconstruction which is the last part of the decoding process.
-
-The ARM build has seen a few improvements. The CLZ (count leading zeroes) and REV (byte swap) instructions are now used when
-compiling with GCC and Clang which is achieved using inline assembly. The CLZ instruction requires ARM architecture version 5 at
-compile time and the REV instruction requires ARM architecture version 6.
-
-An ARM NEON optimized pipeline has been implemented. To enable this you'll need to add -mfpu=neon to the command line when compiling.
-
-
-Removed APIs
-------------
-The following APIs were deprecated in version 0.11.0 and have been completely removed in version 0.12.0:
-
-    drflac_read_s32()                   -> drflac_read_pcm_frames_s32()
-    drflac_read_s16()                   -> drflac_read_pcm_frames_s16()
-    drflac_read_f32()                   -> drflac_read_pcm_frames_f32()
-    drflac_seek_to_sample()             -> drflac_seek_to_pcm_frame()
-    drflac_open_and_decode_s32()        -> drflac_open_and_read_pcm_frames_s32()
-    drflac_open_and_decode_s16()        -> drflac_open_and_read_pcm_frames_s16()
-    drflac_open_and_decode_f32()        -> drflac_open_and_read_pcm_frames_f32()
-    drflac_open_and_decode_file_s32()   -> drflac_open_file_and_read_pcm_frames_s32()
-    drflac_open_and_decode_file_s16()   -> drflac_open_file_and_read_pcm_frames_s16()
-    drflac_open_and_decode_file_f32()   -> drflac_open_file_and_read_pcm_frames_f32()
-    drflac_open_and_decode_memory_s32() -> drflac_open_memory_and_read_pcm_frames_s32()
-    drflac_open_and_decode_memory_s16() -> drflac_open_memory_and_read_pcm_frames_s16()
-    drflac_open_and_decode_memory_f32() -> drflac_open_memroy_and_read_pcm_frames_f32()
-
-Prior versions of dr_flac operated on a per-sample basis whereas now it operates on PCM frames. The removed APIs all relate
-to the old per-sample APIs. You now need to use the "pcm_frame" versions.
-*/
-
-
-/*
-Introduction
-============
-dr_flac is a single file library. To use it, do something like the following in one .c file.
-
-    ```c
-    #define DR_FLAC_IMPLEMENTATION
-    #include "dr_flac.h"
-    ```
-
-You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
-
-    ```c
-    drflac* pFlac = drflac_open_file("MySong.flac", NULL);
-    if (pFlac == NULL) {
-        // Failed to open FLAC file
-    }
-
-    drflac_int32* pSamples = malloc(pFlac->totalPCMFrameCount * pFlac->channels * sizeof(drflac_int32));
-    drflac_uint64 numberOfInterleavedSamplesActuallyRead = drflac_read_pcm_frames_s32(pFlac, pFlac->totalPCMFrameCount, pSamples);
-    ```
-
-The drflac object represents the decoder. It is a transparent type so all the information you need, such as the number of channels and the bits per sample,
-should be directly accessible - just make sure you don't change their values. Samples are always output as interleaved signed 32-bit PCM. In the example above
-a native FLAC stream was opened, however dr_flac has seamless support for Ogg encapsulated FLAC streams as well.
-
-You do not need to decode the entire stream in one go - you just specify how many samples you'd like at any given time and the decoder will give you as many
-samples as it can, up to the amount requested. Later on when you need the next batch of samples, just call it again. Example:
-
-    ```c
-    while (drflac_read_pcm_frames_s32(pFlac, chunkSizeInPCMFrames, pChunkSamples) > 0) {
-        do_something();
-    }
-    ```
-
-You can seek to a specific PCM frame with `drflac_seek_to_pcm_frame()`.
-
-If you just want to quickly decode an entire FLAC file in one go you can do something like this:
-
-    ```c
-    unsigned int channels;
-    unsigned int sampleRate;
-    drflac_uint64 totalPCMFrameCount;
-    drflac_int32* pSampleData = drflac_open_file_and_read_pcm_frames_s32("MySong.flac", &channels, &sampleRate, &totalPCMFrameCount, NULL);
-    if (pSampleData == NULL) {
-        // Failed to open and decode FLAC file.
-    }
-
-    ...
-
-    drflac_free(pSampleData, NULL);
-    ```
-
-You can read samples as signed 16-bit integer and 32-bit floating-point PCM with the *_s16() and *_f32() family of APIs respectively, but note that these
-should be considered lossy.
-
-
-If you need access to metadata (album art, etc.), use `drflac_open_with_metadata()`, `drflac_open_file_with_metdata()` or `drflac_open_memory_with_metadata()`.
-The rationale for keeping these APIs separate is that they're slightly slower than the normal versions and also just a little bit harder to use. dr_flac
-reports metadata to the application through the use of a callback, and every metadata block is reported before `drflac_open_with_metdata()` returns.
-
-The main opening APIs (`drflac_open()`, etc.) will fail if the header is not present. The presents a problem in certain scenarios such as broadcast style
-streams or internet radio where the header may not be present because the user has started playback mid-stream. To handle this, use the relaxed APIs:
-    
-    `drflac_open_relaxed()`
-    `drflac_open_with_metadata_relaxed()`
-
-It is not recommended to use these APIs for file based streams because a missing header would usually indicate a corrupt or perverse file. In addition, these
-APIs can take a long time to initialize because they may need to spend a lot of time finding the first frame.
-
-
-
-Build Options
-=============
-#define these options before including this file.
-
-#define DR_FLAC_NO_STDIO
-  Disable `drflac_open_file()` and family.
-
-#define DR_FLAC_NO_OGG
-  Disables support for Ogg/FLAC streams.
-
-#define DR_FLAC_BUFFER_SIZE <number>
-  Defines the size of the internal buffer to store data from onRead(). This buffer is used to reduce the number of calls back to the client for more data.
-  Larger values means more memory, but better performance. My tests show diminishing returns after about 4KB (which is the default). Consider reducing this if
-  you have a very efficient implementation of onRead(), or increase it if it's very inefficient. Must be a multiple of 8.
-
-#define DR_FLAC_NO_CRC
-  Disables CRC checks. This will offer a performance boost when CRC is unnecessary. This will disable binary search seeking. When seeking, the seek table will
-  be used if available. Otherwise the seek will be performed using brute force.
-
-#define DR_FLAC_NO_SIMD
-  Disables SIMD optimizations (SSE on x86/x64 architectures, NEON on ARM architectures). Use this if you are having compatibility issues with your compiler.
-
-#define DR_FLAC_NO_WCHAR
-  Disables all functions ending with `_w`. Use this if your compiler does not provide wchar.h. Not required if DR_FLAC_NO_STDIO is also defined.
-
-
-
-Notes
-=====
-- dr_flac does not support changing the sample rate nor channel count mid stream.
-- dr_flac is not thread-safe, but its APIs can be called from any thread so long as you do your own synchronization.
-- When using Ogg encapsulation, a corrupted metadata block will result in `drflac_open_with_metadata()` and `drflac_open()` returning inconsistent samples due
-  to differences in corrupted stream recorvery logic between the two APIs.
-*/
-
-#ifndef dr_flac_h
-#define dr_flac_h
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define DRFLAC_STRINGIFY(x)      #x
-#define DRFLAC_XSTRINGIFY(x)     DRFLAC_STRINGIFY(x)
-
-#define DRFLAC_VERSION_MAJOR     0
-#define DRFLAC_VERSION_MINOR     12
-#define DRFLAC_VERSION_REVISION  42
-#define DRFLAC_VERSION_STRING    DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MAJOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_MINOR) "." DRFLAC_XSTRINGIFY(DRFLAC_VERSION_REVISION)
-
-#include <stddef.h> /* For size_t. */
-
-/* Sized Types */
-typedef   signed char           drflac_int8;
-typedef unsigned char           drflac_uint8;
-typedef   signed short          drflac_int16;
-typedef unsigned short          drflac_uint16;
-typedef   signed int            drflac_int32;
-typedef unsigned int            drflac_uint32;
-#if defined(_MSC_VER) && !defined(__clang__)
-    typedef   signed __int64    drflac_int64;
-    typedef unsigned __int64    drflac_uint64;
-#else
-    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
-        #pragma GCC diagnostic push
-        #pragma GCC diagnostic ignored "-Wlong-long"
-        #if defined(__clang__)
-            #pragma GCC diagnostic ignored "-Wc++11-long-long"
-        #endif
-    #endif
-    typedef   signed long long  drflac_int64;
-    typedef unsigned long long  drflac_uint64;
-    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
-        #pragma GCC diagnostic pop
-    #endif
-#endif
-#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined(_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
-    typedef drflac_uint64       drflac_uintptr;
-#else
-    typedef drflac_uint32       drflac_uintptr;
-#endif
-typedef drflac_uint8            drflac_bool8;
-typedef drflac_uint32           drflac_bool32;
-#define DRFLAC_TRUE             1
-#define DRFLAC_FALSE            0
-/* End Sized Types */
-
-/* Decorations */
-#if !defined(DRFLAC_API)
-    #if defined(DRFLAC_DLL)
-        #if defined(_WIN32)
-            #define DRFLAC_DLL_IMPORT  __declspec(dllimport)
-            #define DRFLAC_DLL_EXPORT  __declspec(dllexport)
-            #define DRFLAC_DLL_PRIVATE static
-        #else
-            #if defined(__GNUC__) && __GNUC__ >= 4
-                #define DRFLAC_DLL_IMPORT  __attribute__((visibility("default")))
-                #define DRFLAC_DLL_EXPORT  __attribute__((visibility("default")))
-                #define DRFLAC_DLL_PRIVATE __attribute__((visibility("hidden")))
-            #else
-                #define DRFLAC_DLL_IMPORT
-                #define DRFLAC_DLL_EXPORT
-                #define DRFLAC_DLL_PRIVATE static
-            #endif
-        #endif
-
-        #if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
-            #define DRFLAC_API  DRFLAC_DLL_EXPORT
-        #else
-            #define DRFLAC_API  DRFLAC_DLL_IMPORT
-        #endif
-        #define DRFLAC_PRIVATE DRFLAC_DLL_PRIVATE
-    #else
-        #define DRFLAC_API extern
-        #define DRFLAC_PRIVATE static
-    #endif
-#endif
-/* End Decorations */
-
-#if defined(_MSC_VER) && _MSC_VER >= 1700   /* Visual Studio 2012 */
-    #define DRFLAC_DEPRECATED       __declspec(deprecated)
-#elif (defined(__GNUC__) && __GNUC__ >= 4)  /* GCC 4 */
-    #define DRFLAC_DEPRECATED       __attribute__((deprecated))
-#elif defined(__has_feature)                /* Clang */
-    #if __has_feature(attribute_deprecated)
-        #define DRFLAC_DEPRECATED   __attribute__((deprecated))
-    #else
-        #define DRFLAC_DEPRECATED
-    #endif
-#else
-    #define DRFLAC_DEPRECATED
-#endif
-
-DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision);
-DRFLAC_API const char* drflac_version_string(void);
-
-/* Allocation Callbacks */
-typedef struct
-{
-    void* pUserData;
-    void* (* onMalloc)(size_t sz, void* pUserData);
-    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
-    void  (* onFree)(void* p, void* pUserData);
-} drflac_allocation_callbacks;
-/* End Allocation Callbacks */
-
-/*
-As data is read from the client it is placed into an internal buffer for fast access. This controls the size of that buffer. Larger values means more speed,
-but also more memory. In my testing there is diminishing returns after about 4KB, but you can fiddle with this to suit your own needs. Must be a multiple of 8.
-*/
-#ifndef DR_FLAC_BUFFER_SIZE
-#define DR_FLAC_BUFFER_SIZE   4096
-#endif
-
-
-/* Architecture Detection */
-#if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
-#define DRFLAC_64BIT
-#endif
-
-#if defined(__x86_64__) || defined(_M_X64)
-    #define DRFLAC_X64
-#elif defined(__i386) || defined(_M_IX86)
-    #define DRFLAC_X86
-#elif defined(__arm__) || defined(_M_ARM) || defined(__arm64) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM64)
-    #define DRFLAC_ARM
-#endif
-/* End Architecture Detection */
-
-
-#ifdef DRFLAC_64BIT
-typedef drflac_uint64 drflac_cache_t;
-#else
-typedef drflac_uint32 drflac_cache_t;
-#endif
-
-/* The various metadata block types. */
-#define DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO       0
-#define DRFLAC_METADATA_BLOCK_TYPE_PADDING          1
-#define DRFLAC_METADATA_BLOCK_TYPE_APPLICATION      2
-#define DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE        3
-#define DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT   4
-#define DRFLAC_METADATA_BLOCK_TYPE_CUESHEET         5
-#define DRFLAC_METADATA_BLOCK_TYPE_PICTURE          6
-#define DRFLAC_METADATA_BLOCK_TYPE_INVALID          127
-
-/* The various picture types specified in the PICTURE block. */
-#define DRFLAC_PICTURE_TYPE_OTHER                   0
-#define DRFLAC_PICTURE_TYPE_FILE_ICON               1
-#define DRFLAC_PICTURE_TYPE_OTHER_FILE_ICON         2
-#define DRFLAC_PICTURE_TYPE_COVER_FRONT             3
-#define DRFLAC_PICTURE_TYPE_COVER_BACK              4
-#define DRFLAC_PICTURE_TYPE_LEAFLET_PAGE            5
-#define DRFLAC_PICTURE_TYPE_MEDIA                   6
-#define DRFLAC_PICTURE_TYPE_LEAD_ARTIST             7
-#define DRFLAC_PICTURE_TYPE_ARTIST                  8
-#define DRFLAC_PICTURE_TYPE_CONDUCTOR               9
-#define DRFLAC_PICTURE_TYPE_BAND                    10
-#define DRFLAC_PICTURE_TYPE_COMPOSER                11
-#define DRFLAC_PICTURE_TYPE_LYRICIST                12
-#define DRFLAC_PICTURE_TYPE_RECORDING_LOCATION      13
-#define DRFLAC_PICTURE_TYPE_DURING_RECORDING        14
-#define DRFLAC_PICTURE_TYPE_DURING_PERFORMANCE      15
-#define DRFLAC_PICTURE_TYPE_SCREEN_CAPTURE          16
-#define DRFLAC_PICTURE_TYPE_BRIGHT_COLORED_FISH     17
-#define DRFLAC_PICTURE_TYPE_ILLUSTRATION            18
-#define DRFLAC_PICTURE_TYPE_BAND_LOGOTYPE           19
-#define DRFLAC_PICTURE_TYPE_PUBLISHER_LOGOTYPE      20
-
-typedef enum
-{
-    drflac_container_native,
-    drflac_container_ogg,
-    drflac_container_unknown
-} drflac_container;
-
-typedef enum
-{
-    drflac_seek_origin_start,
-    drflac_seek_origin_current
-} drflac_seek_origin;
-
-/* The order of members in this structure is important because we map this directly to the raw data within the SEEKTABLE metadata block. */
-typedef struct
-{
-    drflac_uint64 firstPCMFrame;
-    drflac_uint64 flacFrameOffset;   /* The offset from the first byte of the header of the first frame. */
-    drflac_uint16 pcmFrameCount;
-} drflac_seekpoint;
-
-typedef struct
-{
-    drflac_uint16 minBlockSizeInPCMFrames;
-    drflac_uint16 maxBlockSizeInPCMFrames;
-    drflac_uint32 minFrameSizeInPCMFrames;
-    drflac_uint32 maxFrameSizeInPCMFrames;
-    drflac_uint32 sampleRate;
-    drflac_uint8  channels;
-    drflac_uint8  bitsPerSample;
-    drflac_uint64 totalPCMFrameCount;
-    drflac_uint8  md5[16];
-} drflac_streaminfo;
-
-typedef struct
-{
-    /*
-    The metadata type. Use this to know how to interpret the data below. Will be set to one of the
-    DRFLAC_METADATA_BLOCK_TYPE_* tokens.
-    */
-    drflac_uint32 type;
-
-    /*
-    A pointer to the raw data. This points to a temporary buffer so don't hold on to it. It's best to
-    not modify the contents of this buffer. Use the structures below for more meaningful and structured
-    information about the metadata. It's possible for this to be null.
-    */
-    const void* pRawData;
-
-    /* The size in bytes of the block and the buffer pointed to by pRawData if it's non-NULL. */
-    drflac_uint32 rawDataSize;
-
-    union
-    {
-        drflac_streaminfo streaminfo;
-
-        struct
-        {
-            int unused;
-        } padding;
-
-        struct
-        {
-            drflac_uint32 id;
-            const void* pData;
-            drflac_uint32 dataSize;
-        } application;
-
-        struct
-        {
-            drflac_uint32 seekpointCount;
-            const drflac_seekpoint* pSeekpoints;
-        } seektable;
-
-        struct
-        {
-            drflac_uint32 vendorLength;
-            const char* vendor;
-            drflac_uint32 commentCount;
-            const void* pComments;
-        } vorbis_comment;
-
-        struct
-        {
-            char catalog[128];
-            drflac_uint64 leadInSampleCount;
-            drflac_bool32 isCD;
-            drflac_uint8 trackCount;
-            const void* pTrackData;
-        } cuesheet;
-
-        struct
-        {
-            drflac_uint32 type;
-            drflac_uint32 mimeLength;
-            const char* mime;
-            drflac_uint32 descriptionLength;
-            const char* description;
-            drflac_uint32 width;
-            drflac_uint32 height;
-            drflac_uint32 colorDepth;
-            drflac_uint32 indexColorCount;
-            drflac_uint32 pictureDataSize;
-            const drflac_uint8* pPictureData;
-        } picture;
-    } data;
-} drflac_metadata;
-
-
-/*
-Callback for when data needs to be read from the client.
-
-
-Parameters
-----------
-pUserData (in)
-    The user data that was passed to drflac_open() and family.
-
-pBufferOut (out)
-    The output buffer.
-
-bytesToRead (in)
-    The number of bytes to read.
-
-
-Return Value
-------------
-The number of bytes actually read.
-
-
-Remarks
--------
-A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until either the entire bytesToRead is filled or
-you have reached the end of the stream.
-*/
-typedef size_t (* drflac_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
-
-/*
-Callback for when data needs to be seeked.
-
-
-Parameters
-----------
-pUserData (in)
-    The user data that was passed to drflac_open() and family.
-
-offset (in)
-    The number of bytes to move, relative to the origin. Will never be negative.
-
-origin (in)
-    The origin of the seek - the current position or the start of the stream.
-
-
-Return Value
-------------
-Whether or not the seek was successful.
-
-
-Remarks
--------
-The offset will never be negative. Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which will be
-either drflac_seek_origin_start or drflac_seek_origin_current.
-
-When seeking to a PCM frame using drflac_seek_to_pcm_frame(), dr_flac may call this with an offset beyond the end of the FLAC stream. This needs to be detected
-and handled by returning DRFLAC_FALSE.
-*/
-typedef drflac_bool32 (* drflac_seek_proc)(void* pUserData, int offset, drflac_seek_origin origin);
-
-/*
-Callback for when a metadata block is read.
-
-
-Parameters
-----------
-pUserData (in)
-    The user data that was passed to drflac_open() and family.
-
-pMetadata (in)
-    A pointer to a structure containing the data of the metadata block.
-
-
-Remarks
--------
-Use pMetadata->type to determine which metadata block is being handled and how to read the data. This
-will be set to one of the DRFLAC_METADATA_BLOCK_TYPE_* tokens.
-*/
-typedef void (* drflac_meta_proc)(void* pUserData, drflac_metadata* pMetadata);
-
-
-/* Structure for internal use. Only used for decoders opened with drflac_open_memory. */
-typedef struct
-{
-    const drflac_uint8* data;
-    size_t dataSize;
-    size_t currentReadPos;
-} drflac__memory_stream;
-
-/* Structure for internal use. Used for bit streaming. */
-typedef struct
-{
-    /* The function to call when more data needs to be read. */
-    drflac_read_proc onRead;
-
-    /* The function to call when the current read position needs to be moved. */
-    drflac_seek_proc onSeek;
-
-    /* The user data to pass around to onRead and onSeek. */
-    void* pUserData;
-
-
-    /*
-    The number of unaligned bytes in the L2 cache. This will always be 0 until the end of the stream is hit. At the end of the
-    stream there will be a number of bytes that don't cleanly fit in an L1 cache line, so we use this variable to know whether
-    or not the bistreamer needs to run on a slower path to read those last bytes. This will never be more than sizeof(drflac_cache_t).
-    */
-    size_t unalignedByteCount;
-
-    /* The content of the unaligned bytes. */
-    drflac_cache_t unalignedCache;
-
-    /* The index of the next valid cache line in the "L2" cache. */
-    drflac_uint32 nextL2Line;
-
-    /* The number of bits that have been consumed by the cache. This is used to determine how many valid bits are remaining. */
-    drflac_uint32 consumedBits;
-
-    /*
-    The cached data which was most recently read from the client. There are two levels of cache. Data flows as such:
-    Client -> L2 -> L1. The L2 -> L1 movement is aligned and runs on a fast path in just a few instructions.
-    */
-    drflac_cache_t cacheL2[DR_FLAC_BUFFER_SIZE/sizeof(drflac_cache_t)];
-    drflac_cache_t cache;
-
-    /*
-    CRC-16. This is updated whenever bits are read from the bit stream. Manually set this to 0 to reset the CRC. For FLAC, this
-    is reset to 0 at the beginning of each frame.
-    */
-    drflac_uint16 crc16;
-    drflac_cache_t crc16Cache;              /* A cache for optimizing CRC calculations. This is filled when when the L1 cache is reloaded. */
-    drflac_uint32 crc16CacheIgnoredBytes;   /* The number of bytes to ignore when updating the CRC-16 from the CRC-16 cache. */
-} drflac_bs;
-
-typedef struct
-{
-    /* The type of the subframe: SUBFRAME_CONSTANT, SUBFRAME_VERBATIM, SUBFRAME_FIXED or SUBFRAME_LPC. */
-    drflac_uint8 subframeType;
-
-    /* The number of wasted bits per sample as specified by the sub-frame header. */
-    drflac_uint8 wastedBitsPerSample;
-
-    /* The order to use for the prediction stage for SUBFRAME_FIXED and SUBFRAME_LPC. */
-    drflac_uint8 lpcOrder;
-
-    /* A pointer to the buffer containing the decoded samples in the subframe. This pointer is an offset from drflac::pExtraData. */
-    drflac_int32* pSamplesS32;
-} drflac_subframe;
-
-typedef struct
-{
-    /*
-    If the stream uses variable block sizes, this will be set to the index of the first PCM frame. If fixed block sizes are used, this will
-    always be set to 0. This is 64-bit because the decoded PCM frame number will be 36 bits.
-    */
-    drflac_uint64 pcmFrameNumber;
-
-    /*
-    If the stream uses fixed block sizes, this will be set to the frame number. If variable block sizes are used, this will always be 0. This
-    is 32-bit because in fixed block sizes, the maximum frame number will be 31 bits.
-    */
-    drflac_uint32 flacFrameNumber;
-
-    /* The sample rate of this frame. */
-    drflac_uint32 sampleRate;
-
-    /* The number of PCM frames in each sub-frame within this frame. */
-    drflac_uint16 blockSizeInPCMFrames;
-
-    /*
-    The channel assignment of this frame. This is not always set to the channel count. If interchannel decorrelation is being used this
-    will be set to DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE, DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE or DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE.
-    */
-    drflac_uint8 channelAssignment;
-
-    /* The number of bits per sample within this frame. */
-    drflac_uint8 bitsPerSample;
-
-    /* The frame's CRC. */
-    drflac_uint8 crc8;
-} drflac_frame_header;
-
-typedef struct
-{
-    /* The header. */
-    drflac_frame_header header;
-
-    /*
-    The number of PCM frames left to be read in this FLAC frame. This is initially set to the block size. As PCM frames are read,
-    this will be decremented. When it reaches 0, the decoder will see this frame as fully consumed and load the next frame.
-    */
-    drflac_uint32 pcmFramesRemaining;
-
-    /* The list of sub-frames within the frame. There is one sub-frame for each channel, and there's a maximum of 8 channels. */
-    drflac_subframe subframes[8];
-} drflac_frame;
-
-typedef struct
-{
-    /* The function to call when a metadata block is read. */
-    drflac_meta_proc onMeta;
-
-    /* The user data posted to the metadata callback function. */
-    void* pUserDataMD;
-
-    /* Memory allocation callbacks. */
-    drflac_allocation_callbacks allocationCallbacks;
-
-
-    /* The sample rate. Will be set to something like 44100. */
-    drflac_uint32 sampleRate;
-
-    /*
-    The number of channels. This will be set to 1 for monaural streams, 2 for stereo, etc. Maximum 8. This is set based on the
-    value specified in the STREAMINFO block.
-    */
-    drflac_uint8 channels;
-
-    /* The bits per sample. Will be set to something like 16, 24, etc. */
-    drflac_uint8 bitsPerSample;
-
-    /* The maximum block size, in samples. This number represents the number of samples in each channel (not combined). */
-    drflac_uint16 maxBlockSizeInPCMFrames;
-
-    /*
-    The total number of PCM Frames making up the stream. Can be 0 in which case it's still a valid stream, but just means
-    the total PCM frame count is unknown. Likely the case with streams like internet radio.
-    */
-    drflac_uint64 totalPCMFrameCount;
-
-
-    /* The container type. This is set based on whether or not the decoder was opened from a native or Ogg stream. */
-    drflac_container container;
-
-    /* The number of seekpoints in the seektable. */
-    drflac_uint32 seekpointCount;
-
-
-    /* Information about the frame the decoder is currently sitting on. */
-    drflac_frame currentFLACFrame;
-
-
-    /* The index of the PCM frame the decoder is currently sitting on. This is only used for seeking. */
-    drflac_uint64 currentPCMFrame;
-
-    /* The position of the first FLAC frame in the stream. This is only ever used for seeking. */
-    drflac_uint64 firstFLACFramePosInBytes;
-
-
-    /* A hack to avoid a malloc() when opening a decoder with drflac_open_memory(). */
-    drflac__memory_stream memoryStream;
-
-
-    /* A pointer to the decoded sample data. This is an offset of pExtraData. */
-    drflac_int32* pDecodedSamples;
-
-    /* A pointer to the seek table. This is an offset of pExtraData, or NULL if there is no seek table. */
-    drflac_seekpoint* pSeekpoints;
-
-    /* Internal use only. Only used with Ogg containers. Points to a drflac_oggbs object. This is an offset of pExtraData. */
-    void* _oggbs;
-
-    /* Internal use only. Used for profiling and testing different seeking modes. */
-    drflac_bool32 _noSeekTableSeek    : 1;
-    drflac_bool32 _noBinarySearchSeek : 1;
-    drflac_bool32 _noBruteForceSeek   : 1;
-
-    /* The bit streamer. The raw FLAC data is fed through this object. */
-    drflac_bs bs;
-
-    /* Variable length extra data. We attach this to the end of the object so we can avoid unnecessary mallocs. */
-    drflac_uint8 pExtraData[1];
-} drflac;
-
-
-/*
-Opens a FLAC decoder.
-
-
-Parameters
-----------
-onRead (in)
-    The function to call when data needs to be read from the client.
-
-onSeek (in)
-    The function to call when the read position of the client data needs to move.
-
-pUserData (in, optional)
-    A pointer to application defined data that will be passed to onRead and onSeek.
-
-pAllocationCallbacks (in, optional)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Return Value
-------------
-Returns a pointer to an object representing the decoder.
-
-
-Remarks
--------
-Close the decoder with `drflac_close()`.
-
-`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
-
-This function will automatically detect whether or not you are attempting to open a native or Ogg encapsulated FLAC, both of which should work seamlessly
-without any manual intervention. Ogg encapsulation also works with multiplexed streams which basically means it can play FLAC encoded audio tracks in videos.
-
-This is the lowest level function for opening a FLAC stream. You can also use `drflac_open_file()` and `drflac_open_memory()` to open the stream from a file or
-from a block of memory respectively.
-
-The STREAMINFO block must be present for this to succeed. Use `drflac_open_relaxed()` to open a FLAC stream where the header may not be present.
-
-Use `drflac_open_with_metadata()` if you need access to metadata.
-
-
-Seek Also
----------
-drflac_open_file()
-drflac_open_memory()
-drflac_open_with_metadata()
-drflac_close()
-*/
-DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Opens a FLAC stream with relaxed validation of the header block.
-
-
-Parameters
-----------
-onRead (in)
-    The function to call when data needs to be read from the client.
-
-onSeek (in)
-    The function to call when the read position of the client data needs to move.
-
-container (in)
-    Whether or not the FLAC stream is encapsulated using standard FLAC encapsulation or Ogg encapsulation.
-
-pUserData (in, optional)
-    A pointer to application defined data that will be passed to onRead and onSeek.
-
-pAllocationCallbacks (in, optional)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Return Value
-------------
-A pointer to an object representing the decoder.
-
-
-Remarks
--------
-The same as drflac_open(), except attempts to open the stream even when a header block is not present.
-
-Because the header is not necessarily available, the caller must explicitly define the container (Native or Ogg). Do not set this to `drflac_container_unknown`
-as that is for internal use only.
-
-Opening in relaxed mode will continue reading data from onRead until it finds a valid frame. If a frame is never found it will continue forever. To abort,
-force your `onRead` callback to return 0, which dr_flac will use as an indicator that the end of the stream was found.
-
-Use `drflac_open_with_metadata_relaxed()` if you need access to metadata.
-*/
-DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Opens a FLAC decoder and notifies the caller of the metadata chunks (album art, etc.).
-
-
-Parameters
-----------
-onRead (in)
-    The function to call when data needs to be read from the client.
-
-onSeek (in)
-    The function to call when the read position of the client data needs to move.
-
-onMeta (in)
-    The function to call for every metadata block.
-
-pUserData (in, optional)
-    A pointer to application defined data that will be passed to onRead, onSeek and onMeta.
-
-pAllocationCallbacks (in, optional)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Return Value
-------------
-A pointer to an object representing the decoder.
-
-
-Remarks
--------
-Close the decoder with `drflac_close()`.
-
-`pAllocationCallbacks` can be NULL in which case it will use `DRFLAC_MALLOC`, `DRFLAC_REALLOC` and `DRFLAC_FREE`.
-
-This is slower than `drflac_open()`, so avoid this one if you don't need metadata. Internally, this will allocate and free memory on the heap for every
-metadata block except for STREAMINFO and PADDING blocks.
-
-The caller is notified of the metadata via the `onMeta` callback. All metadata blocks will be handled before the function returns. This callback takes a
-pointer to a `drflac_metadata` object which is a union containing the data of all relevant metadata blocks. Use the `type` member to discriminate against
-the different metadata types.
-
-The STREAMINFO block must be present for this to succeed. Use `drflac_open_with_metadata_relaxed()` to open a FLAC stream where the header may not be present.
-
-Note that this will behave inconsistently with `drflac_open()` if the stream is an Ogg encapsulated stream and a metadata block is corrupted. This is due to
-the way the Ogg stream recovers from corrupted pages. When `drflac_open_with_metadata()` is being used, the open routine will try to read the contents of the
-metadata block, whereas `drflac_open()` will simply seek past it (for the sake of efficiency). This inconsistency can result in different samples being
-returned depending on whether or not the stream is being opened with metadata.
-
-
-Seek Also
----------
-drflac_open_file_with_metadata()
-drflac_open_memory_with_metadata()
-drflac_open()
-drflac_close()
-*/
-DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-The same as drflac_open_with_metadata(), except attempts to open the stream even when a header block is not present.
-
-See Also
---------
-drflac_open_with_metadata()
-drflac_open_relaxed()
-*/
-DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Closes the given FLAC decoder.
-
-
-Parameters
-----------
-pFlac (in)
-    The decoder to close.
-
-
-Remarks
--------
-This will destroy the decoder object.
-
-
-See Also
---------
-drflac_open()
-drflac_open_with_metadata()
-drflac_open_file()
-drflac_open_file_w()
-drflac_open_file_with_metadata()
-drflac_open_file_with_metadata_w()
-drflac_open_memory()
-drflac_open_memory_with_metadata()
-*/
-DRFLAC_API void drflac_close(drflac* pFlac);
-
-
-/*
-Reads sample data from the given FLAC decoder, output as interleaved signed 32-bit PCM.
-
-
-Parameters
-----------
-pFlac (in)
-    The decoder.
-
-framesToRead (in)
-    The number of PCM frames to read.
-
-pBufferOut (out, optional)
-    A pointer to the buffer that will receive the decoded samples.
-
-
-Return Value
-------------
-Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
-
-
-Remarks
--------
-pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
-*/
-DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut);
-
-
-/*
-Reads sample data from the given FLAC decoder, output as interleaved signed 16-bit PCM.
-
-
-Parameters
-----------
-pFlac (in)
-    The decoder.
-
-framesToRead (in)
-    The number of PCM frames to read.
-
-pBufferOut (out, optional)
-    A pointer to the buffer that will receive the decoded samples.
-
-
-Return Value
-------------
-Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
-
-
-Remarks
--------
-pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
-
-Note that this is lossy for streams where the bits per sample is larger than 16.
-*/
-DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut);
-
-/*
-Reads sample data from the given FLAC decoder, output as interleaved 32-bit floating point PCM.
-
-
-Parameters
-----------
-pFlac (in)
-    The decoder.
-
-framesToRead (in)
-    The number of PCM frames to read.
-
-pBufferOut (out, optional)
-    A pointer to the buffer that will receive the decoded samples.
-
-
-Return Value
-------------
-Returns the number of PCM frames actually read. If the return value is less than `framesToRead` it has reached the end.
-
-
-Remarks
--------
-pBufferOut can be null, in which case the call will act as a seek, and the return value will be the number of frames seeked.
-
-Note that this should be considered lossy due to the nature of floating point numbers not being able to exactly represent every possible number.
-*/
-DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut);
-
-/*
-Seeks to the PCM frame at the given index.
-
-
-Parameters
-----------
-pFlac (in)
-    The decoder.
-
-pcmFrameIndex (in)
-    The index of the PCM frame to seek to. See notes below.
-
-
-Return Value
--------------
-`DRFLAC_TRUE` if successful; `DRFLAC_FALSE` otherwise.
-*/
-DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex);
-
-
-
-#ifndef DR_FLAC_NO_STDIO
-/*
-Opens a FLAC decoder from the file at the given path.
-
-
-Parameters
-----------
-pFileName (in)
-    The path of the file to open, either absolute or relative to the current directory.
-
-pAllocationCallbacks (in, optional)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Return Value
-------------
-A pointer to an object representing the decoder.
-
-
-Remarks
--------
-Close the decoder with drflac_close().
-
-
-Remarks
--------
-This will hold a handle to the file until the decoder is closed with drflac_close(). Some platforms will restrict the number of files a process can have open
-at any given time, so keep this mind if you have many decoders open at the same time.
-
-
-See Also
---------
-drflac_open_file_with_metadata()
-drflac_open()
-drflac_close()
-*/
-DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
-DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Opens a FLAC decoder from the file at the given path and notifies the caller of the metadata chunks (album art, etc.)
-
-
-Parameters
-----------
-pFileName (in)
-    The path of the file to open, either absolute or relative to the current directory.
-
-pAllocationCallbacks (in, optional)
-    A pointer to application defined callbacks for managing memory allocations.
-
-onMeta (in)
-    The callback to fire for each metadata block.
-
-pUserData (in)
-    A pointer to the user data to pass to the metadata callback.
-
-pAllocationCallbacks (in)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Remarks
--------
-Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
-
-
-See Also
---------
-drflac_open_with_metadata()
-drflac_open()
-drflac_close()
-*/
-DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-#endif
-
-/*
-Opens a FLAC decoder from a pre-allocated block of memory
-
-
-Parameters
-----------
-pData (in)
-    A pointer to the raw encoded FLAC data.
-
-dataSize (in)
-    The size in bytes of `data`.
-
-pAllocationCallbacks (in)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Return Value
-------------
-A pointer to an object representing the decoder.
-
-
-Remarks
--------
-This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for the lifetime of the decoder.
-
-
-See Also
---------
-drflac_open()
-drflac_close()
-*/
-DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Opens a FLAC decoder from a pre-allocated block of memory and notifies the caller of the metadata chunks (album art, etc.)
-
-
-Parameters
-----------
-pData (in)
-    A pointer to the raw encoded FLAC data.
-
-dataSize (in)
-    The size in bytes of `data`.
-
-onMeta (in)
-    The callback to fire for each metadata block.
-
-pUserData (in)
-    A pointer to the user data to pass to the metadata callback.
-
-pAllocationCallbacks (in)
-    A pointer to application defined callbacks for managing memory allocations.
-
-
-Remarks
--------
-Look at the documentation for drflac_open_with_metadata() for more information on how metadata is handled.
-
-
-See Also
--------
-drflac_open_with_metadata()
-drflac_open()
-drflac_close()
-*/
-DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-
-
-/* High Level APIs */
-
-/*
-Opens a FLAC stream from the given callbacks and fully decodes it in a single operation. The return value is a
-pointer to the sample data as interleaved signed 32-bit PCM. The returned data must be freed with drflac_free().
-
-You can pass in custom memory allocation callbacks via the pAllocationCallbacks parameter. This can be NULL in which
-case it will use DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
-
-Sometimes a FLAC file won't keep track of the total sample count. In this situation the function will continuously
-read samples into a dynamically sized buffer on the heap until no samples are left.
-
-Do not call this function on a broadcast type of stream (like internet radio streams and whatnot).
-*/
-DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/* Same as drflac_open_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
-DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/* Same as drflac_open_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
-DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-#ifndef DR_FLAC_NO_STDIO
-/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a file. */
-DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
-DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/* Same as drflac_open_file_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
-DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-#endif
-
-/* Same as drflac_open_and_read_pcm_frames_s32() except opens the decoder from a block of memory. */
-DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns signed 16-bit integer samples. */
-DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/* Same as drflac_open_memory_and_read_pcm_frames_s32(), except returns 32-bit floating-point samples. */
-DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Frees memory that was allocated internally by dr_flac.
-
-Set pAllocationCallbacks to the same object that was passed to drflac_open_*_and_read_pcm_frames_*(). If you originally passed in NULL, pass in NULL for this.
-*/
-DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks);
-
-
-/* Structure representing an iterator for vorbis comments in a VORBIS_COMMENT metadata block. */
-typedef struct
-{
-    drflac_uint32 countRemaining;
-    const char* pRunningData;
-} drflac_vorbis_comment_iterator;
-
-/*
-Initializes a vorbis comment iterator. This can be used for iterating over the vorbis comments in a VORBIS_COMMENT
-metadata block.
-*/
-DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments);
-
-/*
-Goes to the next vorbis comment in the given iterator. If null is returned it means there are no more comments. The
-returned string is NOT null terminated.
-*/
-DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut);
-
-
-/* Structure representing an iterator for cuesheet tracks in a CUESHEET metadata block. */
-typedef struct
-{
-    drflac_uint32 countRemaining;
-    const char* pRunningData;
-} drflac_cuesheet_track_iterator;
-
-/* The order of members here is important because we map this directly to the raw data within the CUESHEET metadata block. */
-typedef struct
-{
-    drflac_uint64 offset;
-    drflac_uint8 index;
-    drflac_uint8 reserved[3];
-} drflac_cuesheet_track_index;
-
-typedef struct
-{
-    drflac_uint64 offset;
-    drflac_uint8 trackNumber;
-    char ISRC[12];
-    drflac_bool8 isAudio;
-    drflac_bool8 preEmphasis;
-    drflac_uint8 indexCount;
-    const drflac_cuesheet_track_index* pIndexPoints;
-} drflac_cuesheet_track;
-
-/*
-Initializes a cuesheet track iterator. This can be used for iterating over the cuesheet tracks in a CUESHEET metadata
-block.
-*/
-DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData);
-
-/* Goes to the next cuesheet track in the given iterator. If DRFLAC_FALSE is returned it means there are no more comments. */
-DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack);
-
-
-#ifdef __cplusplus
-}
-#endif
-#endif  /* dr_flac_h */
-
-
-/************************************************************************************************************************************************************
- ************************************************************************************************************************************************************
-
- IMPLEMENTATION
-
- ************************************************************************************************************************************************************
- ************************************************************************************************************************************************************/
-#if defined(DR_FLAC_IMPLEMENTATION) || defined(DRFLAC_IMPLEMENTATION)
-#ifndef dr_flac_c
-#define dr_flac_c
-
-/* Disable some annoying warnings. */
-#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
-    #pragma GCC diagnostic push
-    #if __GNUC__ >= 7
-    #pragma GCC diagnostic ignored "-Wimplicit-fallthrough"
-    #endif
-#endif
-
-#ifdef __linux__
-    #ifndef _BSD_SOURCE
-        #define _BSD_SOURCE
-    #endif
-    #ifndef _DEFAULT_SOURCE
-        #define _DEFAULT_SOURCE
-    #endif
-    #ifndef __USE_BSD
-        #define __USE_BSD
-    #endif
-    #include <endian.h>
-#endif
-
-#include <stdlib.h>
-#include <string.h>
-
-/* Inline */
-#ifdef _MSC_VER
-    #define DRFLAC_INLINE __forceinline
-#elif defined(__GNUC__)
-    /*
-    I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
-    the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
-    case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
-    command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
-    I am using "__inline__" only when we're compiling in strict ANSI mode.
-    */
-    #if defined(__STRICT_ANSI__)
-        #define DRFLAC_GNUC_INLINE_HINT __inline__
-    #else
-        #define DRFLAC_GNUC_INLINE_HINT inline
-    #endif
-
-    #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
-        #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT __attribute__((always_inline))
-    #else
-        #define DRFLAC_INLINE DRFLAC_GNUC_INLINE_HINT
-    #endif
-#elif defined(__WATCOMC__)
-    #define DRFLAC_INLINE __inline
-#else
-    #define DRFLAC_INLINE
-#endif
-/* End Inline */
-
-/*
-Intrinsics Support
-
-There's a bug in GCC 4.2.x which results in an incorrect compilation error when using _mm_slli_epi32() where it complains with
-
-    "error: shift must be an immediate"
-
-Unfortunately dr_flac depends on this for a few things so we're just going to disable SSE on GCC 4.2 and below.
-*/
-#if !defined(DR_FLAC_NO_SIMD)
-    #if defined(DRFLAC_X64) || defined(DRFLAC_X86)
-        #if defined(_MSC_VER) && !defined(__clang__)
-            /* MSVC. */
-            #if _MSC_VER >= 1400 && !defined(DRFLAC_NO_SSE2)    /* 2005 */
-                #define DRFLAC_SUPPORT_SSE2
-            #endif
-            #if _MSC_VER >= 1600 && !defined(DRFLAC_NO_SSE41)   /* 2010 */
-                #define DRFLAC_SUPPORT_SSE41
-            #endif
-        #elif defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3)))
-            /* Assume GNUC-style. */
-            #if defined(__SSE2__) && !defined(DRFLAC_NO_SSE2)
-                #define DRFLAC_SUPPORT_SSE2
-            #endif
-            #if defined(__SSE4_1__) && !defined(DRFLAC_NO_SSE41)
-                #define DRFLAC_SUPPORT_SSE41
-            #endif
-        #endif
-
-        /* If at this point we still haven't determined compiler support for the intrinsics just fall back to __has_include. */
-        #if !defined(__GNUC__) && !defined(__clang__) && defined(__has_include)
-            #if !defined(DRFLAC_SUPPORT_SSE2) && !defined(DRFLAC_NO_SSE2) && __has_include(<emmintrin.h>)
-                #define DRFLAC_SUPPORT_SSE2
-            #endif
-            #if !defined(DRFLAC_SUPPORT_SSE41) && !defined(DRFLAC_NO_SSE41) && __has_include(<smmintrin.h>)
-                #define DRFLAC_SUPPORT_SSE41
-            #endif
-        #endif
-
-        #if defined(DRFLAC_SUPPORT_SSE41)
-            #include <smmintrin.h>
-        #elif defined(DRFLAC_SUPPORT_SSE2)
-            #include <emmintrin.h>
-        #endif
-    #endif
-
-    #if defined(DRFLAC_ARM)
-        #if !defined(DRFLAC_NO_NEON) && (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
-            #define DRFLAC_SUPPORT_NEON
-            #include <arm_neon.h>
-        #endif
-    #endif
-#endif
-
-/* Compile-time CPU feature support. */
-#if !defined(DR_FLAC_NO_SIMD) && (defined(DRFLAC_X86) || defined(DRFLAC_X64))
-    #if defined(_MSC_VER) && !defined(__clang__)
-        #if _MSC_VER >= 1400
-            #include <intrin.h>
-            static void drflac__cpuid(int info[4], int fid)
-            {
-                __cpuid(info, fid);
-            }
-        #else
-            #define DRFLAC_NO_CPUID
-        #endif
-    #else
-        #if defined(__GNUC__) || defined(__clang__)
-            static void drflac__cpuid(int info[4], int fid)
-            {
-                /*
-                It looks like the -fPIC option uses the ebx register which GCC complains about. We can work around this by just using a different register, the
-                specific register of which I'm letting the compiler decide on. The "k" prefix is used to specify a 32-bit register. The {...} syntax is for
-                supporting different assembly dialects.
-
-                What's basically happening is that we're saving and restoring the ebx register manually.
-                */
-                #if defined(DRFLAC_X86) && defined(__PIC__)
-                    __asm__ __volatile__ (
-                        "xchg{l} {%%}ebx, %k1;"
-                        "cpuid;"
-                        "xchg{l} {%%}ebx, %k1;"
-                        : "=a"(info[0]), "=&r"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
-                    );
-                #else
-                    __asm__ __volatile__ (
-                        "cpuid" : "=a"(info[0]), "=b"(info[1]), "=c"(info[2]), "=d"(info[3]) : "a"(fid), "c"(0)
-                    );
-                #endif
-            }
-        #else
-            #define DRFLAC_NO_CPUID
-        #endif
-    #endif
-#else
-    #define DRFLAC_NO_CPUID
-#endif
-
-static DRFLAC_INLINE drflac_bool32 drflac_has_sse2(void)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE2)
-        #if defined(DRFLAC_X64)
-            return DRFLAC_TRUE;    /* 64-bit targets always support SSE2. */
-        #elif (defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)
-            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE2 code we can assume support. */
-        #else
-            #if defined(DRFLAC_NO_CPUID)
-                return DRFLAC_FALSE;
-            #else
-                int info[4];
-                drflac__cpuid(info, 1);
-                return (info[3] & (1 << 26)) != 0;
-            #endif
-        #endif
-    #else
-        return DRFLAC_FALSE;       /* SSE2 is only supported on x86 and x64 architectures. */
-    #endif
-#else
-    return DRFLAC_FALSE;           /* No compiler support. */
-#endif
-}
-
-static DRFLAC_INLINE drflac_bool32 drflac_has_sse41(void)
-{
-#if defined(DRFLAC_SUPPORT_SSE41)
-    #if (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(DRFLAC_NO_SSE41)
-        #if defined(__SSE4_1__) || defined(__AVX__)
-            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate SSE41 code we can assume support. */
-        #else
-            #if defined(DRFLAC_NO_CPUID)
-                return DRFLAC_FALSE;
-            #else
-                int info[4];
-                drflac__cpuid(info, 1);
-                return (info[2] & (1 << 19)) != 0;
-            #endif
-        #endif
-    #else
-        return DRFLAC_FALSE;       /* SSE41 is only supported on x86 and x64 architectures. */
-    #endif
-#else
-    return DRFLAC_FALSE;           /* No compiler support. */
-#endif
-}
-
-
-#if defined(_MSC_VER) && _MSC_VER >= 1500 && (defined(DRFLAC_X86) || defined(DRFLAC_X64)) && !defined(__clang__)
-    #define DRFLAC_HAS_LZCNT_INTRINSIC
-#elif (defined(__GNUC__) && ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 7)))
-    #define DRFLAC_HAS_LZCNT_INTRINSIC
-#elif defined(__clang__)
-    #if defined(__has_builtin)
-        #if __has_builtin(__builtin_clzll) || __has_builtin(__builtin_clzl)
-            #define DRFLAC_HAS_LZCNT_INTRINSIC
-        #endif
-    #endif
-#endif
-
-#if defined(_MSC_VER) && _MSC_VER >= 1400 && !defined(__clang__)
-    #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
-    #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
-    #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
-#elif defined(__clang__)
-    #if defined(__has_builtin)
-        #if __has_builtin(__builtin_bswap16)
-            #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
-        #endif
-        #if __has_builtin(__builtin_bswap32)
-            #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
-        #endif
-        #if __has_builtin(__builtin_bswap64)
-            #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
-        #endif
-    #endif
-#elif defined(__GNUC__)
-    #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
-        #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
-        #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
-    #endif
-    #if ((__GNUC__ > 4) || (__GNUC__ == 4 && __GNUC_MINOR__ >= 8))
-        #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
-    #endif
-#elif defined(__WATCOMC__) && defined(__386__)
-    #define DRFLAC_HAS_BYTESWAP16_INTRINSIC
-    #define DRFLAC_HAS_BYTESWAP32_INTRINSIC
-    #define DRFLAC_HAS_BYTESWAP64_INTRINSIC
-    extern __inline drflac_uint16 _watcom_bswap16(drflac_uint16);
-    extern __inline drflac_uint32 _watcom_bswap32(drflac_uint32);
-    extern __inline drflac_uint64 _watcom_bswap64(drflac_uint64);
-#pragma aux _watcom_bswap16 = \
-    "xchg al, ah" \
-    parm  [ax]    \
-    value [ax]    \
-    modify nomemory;
-#pragma aux _watcom_bswap32 = \
-    "bswap eax" \
-    parm  [eax] \
-    value [eax] \
-    modify nomemory;
-#pragma aux _watcom_bswap64 = \
-    "bswap eax"     \
-    "bswap edx"     \
-    "xchg eax,edx"  \
-    parm [eax edx]  \
-    value [eax edx] \
-    modify nomemory;
-#endif
-
-
-/* Standard library stuff. */
-#ifndef DRFLAC_ASSERT
-#include <assert.h>
-#define DRFLAC_ASSERT(expression)           assert(expression)
-#endif
-#ifndef DRFLAC_MALLOC
-#define DRFLAC_MALLOC(sz)                   malloc((sz))
-#endif
-#ifndef DRFLAC_REALLOC
-#define DRFLAC_REALLOC(p, sz)               realloc((p), (sz))
-#endif
-#ifndef DRFLAC_FREE
-#define DRFLAC_FREE(p)                      free((p))
-#endif
-#ifndef DRFLAC_COPY_MEMORY
-#define DRFLAC_COPY_MEMORY(dst, src, sz)    memcpy((dst), (src), (sz))
-#endif
-#ifndef DRFLAC_ZERO_MEMORY
-#define DRFLAC_ZERO_MEMORY(p, sz)           memset((p), 0, (sz))
-#endif
-#ifndef DRFLAC_ZERO_OBJECT
-#define DRFLAC_ZERO_OBJECT(p)               DRFLAC_ZERO_MEMORY((p), sizeof(*(p)))
-#endif
-
-#define DRFLAC_MAX_SIMD_VECTOR_SIZE                     64  /* 64 for AVX-512 in the future. */
-
-/* Result Codes */
-typedef drflac_int32 drflac_result;
-#define DRFLAC_SUCCESS                                   0
-#define DRFLAC_ERROR                                    -1   /* A generic error. */
-#define DRFLAC_INVALID_ARGS                             -2
-#define DRFLAC_INVALID_OPERATION                        -3
-#define DRFLAC_OUT_OF_MEMORY                            -4
-#define DRFLAC_OUT_OF_RANGE                             -5
-#define DRFLAC_ACCESS_DENIED                            -6
-#define DRFLAC_DOES_NOT_EXIST                           -7
-#define DRFLAC_ALREADY_EXISTS                           -8
-#define DRFLAC_TOO_MANY_OPEN_FILES                      -9
-#define DRFLAC_INVALID_FILE                             -10
-#define DRFLAC_TOO_BIG                                  -11
-#define DRFLAC_PATH_TOO_LONG                            -12
-#define DRFLAC_NAME_TOO_LONG                            -13
-#define DRFLAC_NOT_DIRECTORY                            -14
-#define DRFLAC_IS_DIRECTORY                             -15
-#define DRFLAC_DIRECTORY_NOT_EMPTY                      -16
-#define DRFLAC_END_OF_FILE                              -17
-#define DRFLAC_NO_SPACE                                 -18
-#define DRFLAC_BUSY                                     -19
-#define DRFLAC_IO_ERROR                                 -20
-#define DRFLAC_INTERRUPT                                -21
-#define DRFLAC_UNAVAILABLE                              -22
-#define DRFLAC_ALREADY_IN_USE                           -23
-#define DRFLAC_BAD_ADDRESS                              -24
-#define DRFLAC_BAD_SEEK                                 -25
-#define DRFLAC_BAD_PIPE                                 -26
-#define DRFLAC_DEADLOCK                                 -27
-#define DRFLAC_TOO_MANY_LINKS                           -28
-#define DRFLAC_NOT_IMPLEMENTED                          -29
-#define DRFLAC_NO_MESSAGE                               -30
-#define DRFLAC_BAD_MESSAGE                              -31
-#define DRFLAC_NO_DATA_AVAILABLE                        -32
-#define DRFLAC_INVALID_DATA                             -33
-#define DRFLAC_TIMEOUT                                  -34
-#define DRFLAC_NO_NETWORK                               -35
-#define DRFLAC_NOT_UNIQUE                               -36
-#define DRFLAC_NOT_SOCKET                               -37
-#define DRFLAC_NO_ADDRESS                               -38
-#define DRFLAC_BAD_PROTOCOL                             -39
-#define DRFLAC_PROTOCOL_UNAVAILABLE                     -40
-#define DRFLAC_PROTOCOL_NOT_SUPPORTED                   -41
-#define DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED            -42
-#define DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED             -43
-#define DRFLAC_SOCKET_NOT_SUPPORTED                     -44
-#define DRFLAC_CONNECTION_RESET                         -45
-#define DRFLAC_ALREADY_CONNECTED                        -46
-#define DRFLAC_NOT_CONNECTED                            -47
-#define DRFLAC_CONNECTION_REFUSED                       -48
-#define DRFLAC_NO_HOST                                  -49
-#define DRFLAC_IN_PROGRESS                              -50
-#define DRFLAC_CANCELLED                                -51
-#define DRFLAC_MEMORY_ALREADY_MAPPED                    -52
-#define DRFLAC_AT_END                                   -53
-
-#define DRFLAC_CRC_MISMATCH                             -100
-/* End Result Codes */
-
-
-#define DRFLAC_SUBFRAME_CONSTANT                        0
-#define DRFLAC_SUBFRAME_VERBATIM                        1
-#define DRFLAC_SUBFRAME_FIXED                           8
-#define DRFLAC_SUBFRAME_LPC                             32
-#define DRFLAC_SUBFRAME_RESERVED                        255
-
-#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE  0
-#define DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2 1
-
-#define DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT           0
-#define DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE             8
-#define DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE            9
-#define DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE              10
-
-#define DRFLAC_SEEKPOINT_SIZE_IN_BYTES                  18
-#define DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES             36
-#define DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES       12
-
-#define drflac_align(x, a)                              ((((x) + (a) - 1) / (a)) * (a))
-
-
-DRFLAC_API void drflac_version(drflac_uint32* pMajor, drflac_uint32* pMinor, drflac_uint32* pRevision)
-{
-    if (pMajor) {
-        *pMajor = DRFLAC_VERSION_MAJOR;
-    }
-
-    if (pMinor) {
-        *pMinor = DRFLAC_VERSION_MINOR;
-    }
-
-    if (pRevision) {
-        *pRevision = DRFLAC_VERSION_REVISION;
-    }
-}
-
-DRFLAC_API const char* drflac_version_string(void)
-{
-    return DRFLAC_VERSION_STRING;
-}
-
-
-/* CPU caps. */
-#if defined(__has_feature)
-    #if __has_feature(thread_sanitizer)
-        #define DRFLAC_NO_THREAD_SANITIZE __attribute__((no_sanitize("thread")))
-    #else
-        #define DRFLAC_NO_THREAD_SANITIZE
-    #endif
-#else
-    #define DRFLAC_NO_THREAD_SANITIZE
-#endif
-
-#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
-static drflac_bool32 drflac__gIsLZCNTSupported = DRFLAC_FALSE;
-#endif
-
-#ifndef DRFLAC_NO_CPUID
-static drflac_bool32 drflac__gIsSSE2Supported  = DRFLAC_FALSE;
-static drflac_bool32 drflac__gIsSSE41Supported = DRFLAC_FALSE;
-
-/*
-I've had a bug report that Clang's ThreadSanitizer presents a warning in this function. Having reviewed this, this does
-actually make sense. However, since CPU caps should never differ for a running process, I don't think the trade-off of
-complicating internal APIs by passing around CPU caps versus just disabling the warnings is worthwhile. I'm therefore
-just going to disable these warnings. They are disabled via the DRFLAC_NO_THREAD_SANITIZE attribute.
-*/
-DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
-{
-    static drflac_bool32 isCPUCapsInitialized = DRFLAC_FALSE;
-
-    if (!isCPUCapsInitialized) {
-        /* LZCNT */
-#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
-        int info[4] = {0};
-        drflac__cpuid(info, 0x80000001);
-        drflac__gIsLZCNTSupported = (info[2] & (1 << 5)) != 0;
-#endif
-
-        /* SSE2 */
-        drflac__gIsSSE2Supported = drflac_has_sse2();
-
-        /* SSE4.1 */
-        drflac__gIsSSE41Supported = drflac_has_sse41();
-
-        /* Initialized. */
-        isCPUCapsInitialized = DRFLAC_TRUE;
-    }
-}
-#else
-static drflac_bool32 drflac__gIsNEONSupported  = DRFLAC_FALSE;
-
-static DRFLAC_INLINE drflac_bool32 drflac__has_neon(void)
-{
-#if defined(DRFLAC_SUPPORT_NEON)
-    #if defined(DRFLAC_ARM) && !defined(DRFLAC_NO_NEON)
-        #if (defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64))
-            return DRFLAC_TRUE;    /* If the compiler is allowed to freely generate NEON code we can assume support. */
-        #else
-            /* TODO: Runtime check. */
-            return DRFLAC_FALSE;
-        #endif
-    #else
-        return DRFLAC_FALSE;       /* NEON is only supported on ARM architectures. */
-    #endif
-#else
-    return DRFLAC_FALSE;           /* No compiler support. */
-#endif
-}
-
-DRFLAC_NO_THREAD_SANITIZE static void drflac__init_cpu_caps(void)
-{
-    drflac__gIsNEONSupported = drflac__has_neon();
-
-#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
-    drflac__gIsLZCNTSupported = DRFLAC_TRUE;
-#endif
-}
-#endif
-
-
-/* Endian Management */
-static DRFLAC_INLINE drflac_bool32 drflac__is_little_endian(void)
-{
-#if defined(DRFLAC_X86) || defined(DRFLAC_X64)
-    return DRFLAC_TRUE;
-#elif defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && __BYTE_ORDER == __LITTLE_ENDIAN
-    return DRFLAC_TRUE;
-#else
-    int n = 1;
-    return (*(char*)&n) == 1;
-#endif
-}
-
-static DRFLAC_INLINE drflac_uint16 drflac__swap_endian_uint16(drflac_uint16 n)
-{
-#ifdef DRFLAC_HAS_BYTESWAP16_INTRINSIC
-    #if defined(_MSC_VER) && !defined(__clang__)
-        return _byteswap_ushort(n);
-    #elif defined(__GNUC__) || defined(__clang__)
-        return __builtin_bswap16(n);
-    #elif defined(__WATCOMC__) && defined(__386__)
-        return _watcom_bswap16(n);
-    #else
-        #error "This compiler does not support the byte swap intrinsic."
-    #endif
-#else
-    return ((n & 0xFF00) >> 8) |
-           ((n & 0x00FF) << 8);
-#endif
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac__swap_endian_uint32(drflac_uint32 n)
-{
-#ifdef DRFLAC_HAS_BYTESWAP32_INTRINSIC
-    #if defined(_MSC_VER) && !defined(__clang__)
-        return _byteswap_ulong(n);
-    #elif defined(__GNUC__) || defined(__clang__)
-        #if defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 6) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- 64-bit inline assembly has not been tested, so disabling for now. */
-            /* Inline assembly optimized implementation for ARM. In my testing, GCC does not generate optimized code with __builtin_bswap32(). */
-            drflac_uint32 r;
-            __asm__ __volatile__ (
-            #if defined(DRFLAC_64BIT)
-                "rev %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(n)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
-            #else
-                "rev %[out], %[in]" : [out]"=r"(r) : [in]"r"(n)
-            #endif
-            );
-            return r;
-        #else
-            return __builtin_bswap32(n);
-        #endif
-    #elif defined(__WATCOMC__) && defined(__386__)
-        return _watcom_bswap32(n);
-    #else
-        #error "This compiler does not support the byte swap intrinsic."
-    #endif
-#else
-    return ((n & 0xFF000000) >> 24) |
-           ((n & 0x00FF0000) >>  8) |
-           ((n & 0x0000FF00) <<  8) |
-           ((n & 0x000000FF) << 24);
-#endif
-}
-
-static DRFLAC_INLINE drflac_uint64 drflac__swap_endian_uint64(drflac_uint64 n)
-{
-#ifdef DRFLAC_HAS_BYTESWAP64_INTRINSIC
-    #if defined(_MSC_VER) && !defined(__clang__)
-        return _byteswap_uint64(n);
-    #elif defined(__GNUC__) || defined(__clang__)
-        return __builtin_bswap64(n);
-    #elif defined(__WATCOMC__) && defined(__386__)
-        return _watcom_bswap64(n);
-    #else
-        #error "This compiler does not support the byte swap intrinsic."
-    #endif
-#else
-    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
-    return ((n & ((drflac_uint64)0xFF000000 << 32)) >> 56) |
-           ((n & ((drflac_uint64)0x00FF0000 << 32)) >> 40) |
-           ((n & ((drflac_uint64)0x0000FF00 << 32)) >> 24) |
-           ((n & ((drflac_uint64)0x000000FF << 32)) >>  8) |
-           ((n & ((drflac_uint64)0xFF000000      )) <<  8) |
-           ((n & ((drflac_uint64)0x00FF0000      )) << 24) |
-           ((n & ((drflac_uint64)0x0000FF00      )) << 40) |
-           ((n & ((drflac_uint64)0x000000FF      )) << 56);
-#endif
-}
-
-
-static DRFLAC_INLINE drflac_uint16 drflac__be2host_16(drflac_uint16 n)
-{
-    if (drflac__is_little_endian()) {
-        return drflac__swap_endian_uint16(n);
-    }
-
-    return n;
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac__be2host_32(drflac_uint32 n)
-{
-    if (drflac__is_little_endian()) {
-        return drflac__swap_endian_uint32(n);
-    }
-
-    return n;
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac__be2host_32_ptr_unaligned(const void* pData)
-{
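-    const drflac_uint8* pNum = (const drflac_uint8*)pData;
-    /* Cast each byte to 32-bit unsigned before shifting so that "<< 24" never shifts into the sign bit of a promoted int. */
-    return (drflac_uint32)*(pNum) << 24 | (drflac_uint32)*(pNum+1) << 16 | (drflac_uint32)*(pNum+2) << 8 | (drflac_uint32)*(pNum+3);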
-}
-
-static DRFLAC_INLINE drflac_uint64 drflac__be2host_64(drflac_uint64 n)
-{
-    if (drflac__is_little_endian()) {
-        return drflac__swap_endian_uint64(n);
-    }
-
-    return n;
-}
-
-
-static DRFLAC_INLINE drflac_uint32 drflac__le2host_32(drflac_uint32 n)
-{
-    if (!drflac__is_little_endian()) {
-        return drflac__swap_endian_uint32(n);
-    }
-
-    return n;
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac__le2host_32_ptr_unaligned(const void* pData)
-{
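-    const drflac_uint8* pNum = (const drflac_uint8*)pData;
-    /* Cast each byte to 32-bit unsigned before shifting so that "<< 24" never shifts into the sign bit of a promoted int. */
-    return (drflac_uint32)*pNum | (drflac_uint32)*(pNum+1) << 8 | (drflac_uint32)*(pNum+2) << 16 | (drflac_uint32)*(pNum+3) << 24;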
-}
-
-
-static DRFLAC_INLINE drflac_uint32 drflac__unsynchsafe_32(drflac_uint32 n)
-{
-    drflac_uint32 result = 0;
-    result |= (n & 0x7F000000) >> 3;
-    result |= (n & 0x007F0000) >> 2;
-    result |= (n & 0x00007F00) >> 1;
-    result |= (n & 0x0000007F) >> 0;
-
-    return result;
-}
-
-
-
-/* The CRC code below is based on this document: http://zlib.net/crc_v3.txt */
-static drflac_uint8 drflac__crc8_table[] = {
-    0x00, 0x07, 0x0E, 0x09, 0x1C, 0x1B, 0x12, 0x15, 0x38, 0x3F, 0x36, 0x31, 0x24, 0x23, 0x2A, 0x2D,
-    0x70, 0x77, 0x7E, 0x79, 0x6C, 0x6B, 0x62, 0x65, 0x48, 0x4F, 0x46, 0x41, 0x54, 0x53, 0x5A, 0x5D,
-    0xE0, 0xE7, 0xEE, 0xE9, 0xFC, 0xFB, 0xF2, 0xF5, 0xD8, 0xDF, 0xD6, 0xD1, 0xC4, 0xC3, 0xCA, 0xCD,
-    0x90, 0x97, 0x9E, 0x99, 0x8C, 0x8B, 0x82, 0x85, 0xA8, 0xAF, 0xA6, 0xA1, 0xB4, 0xB3, 0xBA, 0xBD,
-    0xC7, 0xC0, 0xC9, 0xCE, 0xDB, 0xDC, 0xD5, 0xD2, 0xFF, 0xF8, 0xF1, 0xF6, 0xE3, 0xE4, 0xED, 0xEA,
-    0xB7, 0xB0, 0xB9, 0xBE, 0xAB, 0xAC, 0xA5, 0xA2, 0x8F, 0x88, 0x81, 0x86, 0x93, 0x94, 0x9D, 0x9A,
-    0x27, 0x20, 0x29, 0x2E, 0x3B, 0x3C, 0x35, 0x32, 0x1F, 0x18, 0x11, 0x16, 0x03, 0x04, 0x0D, 0x0A,
-    0x57, 0x50, 0x59, 0x5E, 0x4B, 0x4C, 0x45, 0x42, 0x6F, 0x68, 0x61, 0x66, 0x73, 0x74, 0x7D, 0x7A,
-    0x89, 0x8E, 0x87, 0x80, 0x95, 0x92, 0x9B, 0x9C, 0xB1, 0xB6, 0xBF, 0xB8, 0xAD, 0xAA, 0xA3, 0xA4,
-    0xF9, 0xFE, 0xF7, 0xF0, 0xE5, 0xE2, 0xEB, 0xEC, 0xC1, 0xC6, 0xCF, 0xC8, 0xDD, 0xDA, 0xD3, 0xD4,
-    0x69, 0x6E, 0x67, 0x60, 0x75, 0x72, 0x7B, 0x7C, 0x51, 0x56, 0x5F, 0x58, 0x4D, 0x4A, 0x43, 0x44,
-    0x19, 0x1E, 0x17, 0x10, 0x05, 0x02, 0x0B, 0x0C, 0x21, 0x26, 0x2F, 0x28, 0x3D, 0x3A, 0x33, 0x34,
-    0x4E, 0x49, 0x40, 0x47, 0x52, 0x55, 0x5C, 0x5B, 0x76, 0x71, 0x78, 0x7F, 0x6A, 0x6D, 0x64, 0x63,
-    0x3E, 0x39, 0x30, 0x37, 0x22, 0x25, 0x2C, 0x2B, 0x06, 0x01, 0x08, 0x0F, 0x1A, 0x1D, 0x14, 0x13,
-    0xAE, 0xA9, 0xA0, 0xA7, 0xB2, 0xB5, 0xBC, 0xBB, 0x96, 0x91, 0x98, 0x9F, 0x8A, 0x8D, 0x84, 0x83,
-    0xDE, 0xD9, 0xD0, 0xD7, 0xC2, 0xC5, 0xCC, 0xCB, 0xE6, 0xE1, 0xE8, 0xEF, 0xFA, 0xFD, 0xF4, 0xF3
-};
-
-static drflac_uint16 drflac__crc16_table[] = {
-    0x0000, 0x8005, 0x800F, 0x000A, 0x801B, 0x001E, 0x0014, 0x8011,
-    0x8033, 0x0036, 0x003C, 0x8039, 0x0028, 0x802D, 0x8027, 0x0022,
-    0x8063, 0x0066, 0x006C, 0x8069, 0x0078, 0x807D, 0x8077, 0x0072,
-    0x0050, 0x8055, 0x805F, 0x005A, 0x804B, 0x004E, 0x0044, 0x8041,
-    0x80C3, 0x00C6, 0x00CC, 0x80C9, 0x00D8, 0x80DD, 0x80D7, 0x00D2,
-    0x00F0, 0x80F5, 0x80FF, 0x00FA, 0x80EB, 0x00EE, 0x00E4, 0x80E1,
-    0x00A0, 0x80A5, 0x80AF, 0x00AA, 0x80BB, 0x00BE, 0x00B4, 0x80B1,
-    0x8093, 0x0096, 0x009C, 0x8099, 0x0088, 0x808D, 0x8087, 0x0082,
-    0x8183, 0x0186, 0x018C, 0x8189, 0x0198, 0x819D, 0x8197, 0x0192,
-    0x01B0, 0x81B5, 0x81BF, 0x01BA, 0x81AB, 0x01AE, 0x01A4, 0x81A1,
-    0x01E0, 0x81E5, 0x81EF, 0x01EA, 0x81FB, 0x01FE, 0x01F4, 0x81F1,
-    0x81D3, 0x01D6, 0x01DC, 0x81D9, 0x01C8, 0x81CD, 0x81C7, 0x01C2,
-    0x0140, 0x8145, 0x814F, 0x014A, 0x815B, 0x015E, 0x0154, 0x8151,
-    0x8173, 0x0176, 0x017C, 0x8179, 0x0168, 0x816D, 0x8167, 0x0162,
-    0x8123, 0x0126, 0x012C, 0x8129, 0x0138, 0x813D, 0x8137, 0x0132,
-    0x0110, 0x8115, 0x811F, 0x011A, 0x810B, 0x010E, 0x0104, 0x8101,
-    0x8303, 0x0306, 0x030C, 0x8309, 0x0318, 0x831D, 0x8317, 0x0312,
-    0x0330, 0x8335, 0x833F, 0x033A, 0x832B, 0x032E, 0x0324, 0x8321,
-    0x0360, 0x8365, 0x836F, 0x036A, 0x837B, 0x037E, 0x0374, 0x8371,
-    0x8353, 0x0356, 0x035C, 0x8359, 0x0348, 0x834D, 0x8347, 0x0342,
-    0x03C0, 0x83C5, 0x83CF, 0x03CA, 0x83DB, 0x03DE, 0x03D4, 0x83D1,
-    0x83F3, 0x03F6, 0x03FC, 0x83F9, 0x03E8, 0x83ED, 0x83E7, 0x03E2,
-    0x83A3, 0x03A6, 0x03AC, 0x83A9, 0x03B8, 0x83BD, 0x83B7, 0x03B2,
-    0x0390, 0x8395, 0x839F, 0x039A, 0x838B, 0x038E, 0x0384, 0x8381,
-    0x0280, 0x8285, 0x828F, 0x028A, 0x829B, 0x029E, 0x0294, 0x8291,
-    0x82B3, 0x02B6, 0x02BC, 0x82B9, 0x02A8, 0x82AD, 0x82A7, 0x02A2,
-    0x82E3, 0x02E6, 0x02EC, 0x82E9, 0x02F8, 0x82FD, 0x82F7, 0x02F2,
-    0x02D0, 0x82D5, 0x82DF, 0x02DA, 0x82CB, 0x02CE, 0x02C4, 0x82C1,
-    0x8243, 0x0246, 0x024C, 0x8249, 0x0258, 0x825D, 0x8257, 0x0252,
-    0x0270, 0x8275, 0x827F, 0x027A, 0x826B, 0x026E, 0x0264, 0x8261,
-    0x0220, 0x8225, 0x822F, 0x022A, 0x823B, 0x023E, 0x0234, 0x8231,
-    0x8213, 0x0216, 0x021C, 0x8219, 0x0208, 0x820D, 0x8207, 0x0202
-};
-
-static DRFLAC_INLINE drflac_uint8 drflac_crc8_byte(drflac_uint8 crc, drflac_uint8 data)
-{
-    return drflac__crc8_table[crc ^ data];
-}
-
-static DRFLAC_INLINE drflac_uint8 drflac_crc8(drflac_uint8 crc, drflac_uint32 data, drflac_uint32 count)
-{
-#ifdef DR_FLAC_NO_CRC
-    (void)crc;
-    (void)data;
-    (void)count;
-    return 0;
-#else
-#if 0
-    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc8(crc, 0, 8);") */
-    drflac_uint8 p = 0x07;
-    for (int i = count-1; i >= 0; --i) {
-        drflac_uint8 bit = (data & (1 << i)) >> i;
-        if (crc & 0x80) {
-            crc = ((crc << 1) | bit) ^ p;
-        } else {
-            crc = ((crc << 1) | bit);
-        }
-    }
-    return crc;
-#else
-    drflac_uint32 wholeBytes;
-    drflac_uint32 leftoverBits;
-    drflac_uint64 leftoverDataMask;
-
-    static drflac_uint64 leftoverDataMaskTable[8] = {
-        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
-    };
-
-    DRFLAC_ASSERT(count <= 32);
-
-    wholeBytes = count >> 3;
-    leftoverBits = count - (wholeBytes*8);
-    leftoverDataMask = leftoverDataMaskTable[leftoverBits];
-
-    switch (wholeBytes) {
-        case 4: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
-        case 3: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
-        case 2: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
-        case 1: crc = drflac_crc8_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
-        case 0: if (leftoverBits > 0) crc = (drflac_uint8)((crc << leftoverBits) ^ drflac__crc8_table[(crc >> (8 - leftoverBits)) ^ (data & leftoverDataMask)]);
-    }
-    return crc;
-#endif
-#endif
-}
-
-static DRFLAC_INLINE drflac_uint16 drflac_crc16_byte(drflac_uint16 crc, drflac_uint8 data)
-{
-    return (crc << 8) ^ drflac__crc16_table[(drflac_uint8)(crc >> 8) ^ data];
-}
-
-static DRFLAC_INLINE drflac_uint16 drflac_crc16_cache(drflac_uint16 crc, drflac_cache_t data)
-{
-#ifdef DRFLAC_64BIT
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
-#endif
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
-    crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
-
-    return crc;
-}
-
-static DRFLAC_INLINE drflac_uint16 drflac_crc16_bytes(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 byteCount)
-{
-    switch (byteCount)
-    {
-#ifdef DRFLAC_64BIT
-    case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 56) & 0xFF));
-    case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 48) & 0xFF));
-    case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 40) & 0xFF));
-    case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 32) & 0xFF));
-#endif
-    case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 24) & 0xFF));
-    case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >> 16) & 0xFF));
-    case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  8) & 0xFF));
-    case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data >>  0) & 0xFF));
-    }
-
-    return crc;
-}
-
-#if 0
-static DRFLAC_INLINE drflac_uint16 drflac_crc16__32bit(drflac_uint16 crc, drflac_uint32 data, drflac_uint32 count)
-{
-#ifdef DR_FLAC_NO_CRC
-    (void)crc;
-    (void)data;
-    (void)count;
-    return 0;
-#else
-#if 0
-    /* REFERENCE (use of this implementation requires an explicit flush by doing "drflac_crc16(crc, 0, 16);") */
-    drflac_uint16 p = 0x8005;
-    for (int i = count-1; i >= 0; --i) {
-        drflac_uint16 bit = (data & (1ULL << i)) >> i;
-        if (r & 0x8000) {
-            r = ((r << 1) | bit) ^ p;
-        } else {
-            r = ((r << 1) | bit);
-        }
-    }
-
-    return crc;
-#else
-    drflac_uint32 wholeBytes;
-    drflac_uint32 leftoverBits;
-    drflac_uint64 leftoverDataMask;
-
-    static drflac_uint64 leftoverDataMaskTable[8] = {
-        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
-    };
-
-    DRFLAC_ASSERT(count <= 64);
-
-    wholeBytes = count >> 3;
-    leftoverBits = count & 7;
-    leftoverDataMask = leftoverDataMaskTable[leftoverBits];
-
-    switch (wholeBytes) {
-        default:
-        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0xFF000000UL << leftoverBits)) >> (24 + leftoverBits)));
-        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x00FF0000UL << leftoverBits)) >> (16 + leftoverBits)));
-        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x0000FF00UL << leftoverBits)) >> ( 8 + leftoverBits)));
-        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (0x000000FFUL << leftoverBits)) >> ( 0 + leftoverBits)));
-        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
-    }
-    return crc;
-#endif
-#endif
-}
-
-static DRFLAC_INLINE drflac_uint16 drflac_crc16__64bit(drflac_uint16 crc, drflac_uint64 data, drflac_uint32 count)
-{
-#ifdef DR_FLAC_NO_CRC
-    (void)crc;
-    (void)data;
-    (void)count;
-    return 0;
-#else
-    drflac_uint32 wholeBytes;
-    drflac_uint32 leftoverBits;
-    drflac_uint64 leftoverDataMask;
-
-    static drflac_uint64 leftoverDataMaskTable[8] = {
-        0x00, 0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3F, 0x7F
-    };
-
-    DRFLAC_ASSERT(count <= 64);
-
-    wholeBytes = count >> 3;
-    leftoverBits = count & 7;
-    leftoverDataMask = leftoverDataMaskTable[leftoverBits];
-
-    switch (wholeBytes) {
-        default:
-        case 8: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000 << 32) << leftoverBits)) >> (56 + leftoverBits)));    /* Weird "<< 32" bitshift is required for C89 because it doesn't support 64-bit constants. Should be optimized out by a good compiler. */
-        case 7: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000 << 32) << leftoverBits)) >> (48 + leftoverBits)));
-        case 6: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00 << 32) << leftoverBits)) >> (40 + leftoverBits)));
-        case 5: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF << 32) << leftoverBits)) >> (32 + leftoverBits)));
-        case 4: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0xFF000000      ) << leftoverBits)) >> (24 + leftoverBits)));
-        case 3: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x00FF0000      ) << leftoverBits)) >> (16 + leftoverBits)));
-        case 2: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x0000FF00      ) << leftoverBits)) >> ( 8 + leftoverBits)));
-        case 1: crc = drflac_crc16_byte(crc, (drflac_uint8)((data & (((drflac_uint64)0x000000FF      ) << leftoverBits)) >> ( 0 + leftoverBits)));
-        case 0: if (leftoverBits > 0) crc = (crc << leftoverBits) ^ drflac__crc16_table[(crc >> (16 - leftoverBits)) ^ (data & leftoverDataMask)];
-    }
-    return crc;
-#endif
-}
-
-
-static DRFLAC_INLINE drflac_uint16 drflac_crc16(drflac_uint16 crc, drflac_cache_t data, drflac_uint32 count)
-{
-#ifdef DRFLAC_64BIT
-    return drflac_crc16__64bit(crc, data, count);
-#else
-    return drflac_crc16__32bit(crc, data, count);
-#endif
-}
-#endif
-
-
-#ifdef DRFLAC_64BIT
-#define drflac__be2host__cache_line drflac__be2host_64
-#else
-#define drflac__be2host__cache_line drflac__be2host_32
-#endif
-
-/*
-BIT READING ATTEMPT #2
-
-This uses a 32- or 64-bit bit-shifted cache - as bits are read, the cache is shifted such that the first valid bit is sitting
-on the most significant bit. It uses the notion of an L1 and L2 cache (borrowed from CPU architecture), where the L1 cache
-is a 32- or 64-bit unsigned integer (depending on whether or not a 32- or 64-bit build is being compiled) and the L2 is an
-array of "cache lines", with each cache line being the same size as the L1. The L2 is a buffer of about 4KB and is where data
-from onRead() is read into.
-*/
-#define DRFLAC_CACHE_L1_SIZE_BYTES(bs)                      (sizeof((bs)->cache))
-#define DRFLAC_CACHE_L1_SIZE_BITS(bs)                       (sizeof((bs)->cache)*8)
-#define DRFLAC_CACHE_L1_BITS_REMAINING(bs)                  (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (bs)->consumedBits)
-#define DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount)           (~((~(drflac_cache_t)0) >> (_bitCount)))
-#define DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, _bitCount)      (DRFLAC_CACHE_L1_SIZE_BITS(bs) - (_bitCount))
-#define DRFLAC_CACHE_L1_SELECT(bs, _bitCount)               (((bs)->cache) & DRFLAC_CACHE_L1_SELECTION_MASK(_bitCount))
-#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, _bitCount)     (DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >>  DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)))
-#define DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, _bitCount)(DRFLAC_CACHE_L1_SELECT((bs), (_bitCount)) >> (DRFLAC_CACHE_L1_SELECTION_SHIFT((bs), (_bitCount)) & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1)))
-#define DRFLAC_CACHE_L2_SIZE_BYTES(bs)                      (sizeof((bs)->cacheL2))
-#define DRFLAC_CACHE_L2_LINE_COUNT(bs)                      (DRFLAC_CACHE_L2_SIZE_BYTES(bs) / sizeof((bs)->cacheL2[0]))
-#define DRFLAC_CACHE_L2_LINES_REMAINING(bs)                 (DRFLAC_CACHE_L2_LINE_COUNT(bs) - (bs)->nextL2Line)
-
-
-#ifndef DR_FLAC_NO_CRC
-static DRFLAC_INLINE void drflac__reset_crc16(drflac_bs* bs)
-{
-    bs->crc16 = 0;
-    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
-}
-
-static DRFLAC_INLINE void drflac__update_crc16(drflac_bs* bs)
-{
-    if (bs->crc16CacheIgnoredBytes == 0) {
-        bs->crc16 = drflac_crc16_cache(bs->crc16, bs->crc16Cache);
-    } else {
-        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache, DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bs->crc16CacheIgnoredBytes);
-        bs->crc16CacheIgnoredBytes = 0;
-    }
-}
-
-static DRFLAC_INLINE drflac_uint16 drflac__flush_crc16(drflac_bs* bs)
-{
-    /* We should never be flushing in a situation where we are not aligned on a byte boundary. */
-    DRFLAC_ASSERT((DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7) == 0);
-
-    /*
-    The bits that were read from the L1 cache need to be accumulated. The number of bytes needing to be accumulated is determined
-    by the number of bits that have been consumed.
-    */
-    if (DRFLAC_CACHE_L1_BITS_REMAINING(bs) == 0) {
-        drflac__update_crc16(bs);
-    } else {
-        /* We only accumulate the consumed bits. */
-        bs->crc16 = drflac_crc16_bytes(bs->crc16, bs->crc16Cache >> DRFLAC_CACHE_L1_BITS_REMAINING(bs), (bs->consumedBits >> 3) - bs->crc16CacheIgnoredBytes);
-
-        /*
-        The bits that we just accumulated should never be accumulated again. We need to keep track of how many bytes were accumulated
-        so we can handle that later.
-        */
-        bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
-    }
-
-    return bs->crc16;
-}
-#endif
-
-static DRFLAC_INLINE drflac_bool32 drflac__reload_l1_cache_from_l2(drflac_bs* bs)
-{
-    size_t bytesRead;
-    size_t alignedL1LineCount;
-
-    /* Fast path. Try loading straight from L2. */
-    if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
-        bs->cache = bs->cacheL2[bs->nextL2Line++];
-        return DRFLAC_TRUE;
-    }
-
-    /*
-    If we get here it means we've run out of data in the L2 cache. We'll need to fetch more from the client, if there's
-    any left.
-    */
-    if (bs->unalignedByteCount > 0) {
-        return DRFLAC_FALSE;   /* If we have any unaligned bytes it means there's no more aligned bytes left in the client. */
-    }
-
-    bytesRead = bs->onRead(bs->pUserData, bs->cacheL2, DRFLAC_CACHE_L2_SIZE_BYTES(bs));
-
-    bs->nextL2Line = 0;
-    if (bytesRead == DRFLAC_CACHE_L2_SIZE_BYTES(bs)) {
-        bs->cache = bs->cacheL2[bs->nextL2Line++];
-        return DRFLAC_TRUE;
-    }
-
-
-    /*
-    If we get here it means we were unable to retrieve enough data to fill the entire L2 cache. It probably
-    means we've just reached the end of the file. We need to move the valid data down to the end of the buffer
-    and adjust the index of the next line accordingly. Also keep in mind that the L2 cache must be aligned to
-    the size of the L1 so we'll need to seek backwards by any misaligned bytes.
-    */
-    alignedL1LineCount = bytesRead / DRFLAC_CACHE_L1_SIZE_BYTES(bs);
-
-    /* We need to keep track of any unaligned bytes for later use. */
-    bs->unalignedByteCount = bytesRead - (alignedL1LineCount * DRFLAC_CACHE_L1_SIZE_BYTES(bs));
-    if (bs->unalignedByteCount > 0) {
-        bs->unalignedCache = bs->cacheL2[alignedL1LineCount];
-    }
-
-    if (alignedL1LineCount > 0) {
-        size_t offset = DRFLAC_CACHE_L2_LINE_COUNT(bs) - alignedL1LineCount;
-        size_t i;
-        for (i = alignedL1LineCount; i > 0; --i) {
-            bs->cacheL2[i-1 + offset] = bs->cacheL2[i-1];
-        }
-
-        bs->nextL2Line = (drflac_uint32)offset;
-        bs->cache = bs->cacheL2[bs->nextL2Line++];
-        return DRFLAC_TRUE;
-    } else {
-        /* If we get into this branch it means we weren't able to load any L1-aligned data. */
-        bs->nextL2Line = DRFLAC_CACHE_L2_LINE_COUNT(bs);
-        return DRFLAC_FALSE;
-    }
-}
-
-static drflac_bool32 drflac__reload_cache(drflac_bs* bs)
-{
-    size_t bytesRead;
-
-#ifndef DR_FLAC_NO_CRC
-    drflac__update_crc16(bs);
-#endif
-
-    /* Fast path. Try just moving the next value in the L2 cache to the L1 cache. */
-    if (drflac__reload_l1_cache_from_l2(bs)) {
-        bs->cache = drflac__be2host__cache_line(bs->cache);
-        bs->consumedBits = 0;
-#ifndef DR_FLAC_NO_CRC
-        bs->crc16Cache = bs->cache;
-#endif
-        return DRFLAC_TRUE;
-    }
-
-    /* Slow path. */
-
-    /*
-    If we get here it means we have failed to load the L1 cache from the L2. Likely we've just reached the end of the stream and the last
-    few bytes did not meet the alignment requirements for the L2 cache. In this case we need to fall back to a slower path and read the
-    data from the unaligned cache.
-    */
-    bytesRead = bs->unalignedByteCount;
-    if (bytesRead == 0) {
-        bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- The stream has been exhausted, so mark the bits as consumed. */
-        return DRFLAC_FALSE;
-    }
-
-    DRFLAC_ASSERT(bytesRead < DRFLAC_CACHE_L1_SIZE_BYTES(bs));
-    bs->consumedBits = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BYTES(bs) - bytesRead) * 8;
-
-    bs->cache = drflac__be2host__cache_line(bs->unalignedCache);
-    bs->cache &= DRFLAC_CACHE_L1_SELECTION_MASK(DRFLAC_CACHE_L1_BITS_REMAINING(bs));    /* <-- Make sure the consumed bits are always set to zero. Other parts of the library depend on this property. */
-    bs->unalignedByteCount = 0;     /* <-- At this point the unaligned bytes have been moved into the cache and we thus have no more unaligned bytes. */
-
-#ifndef DR_FLAC_NO_CRC
-    bs->crc16Cache = bs->cache >> bs->consumedBits;
-    bs->crc16CacheIgnoredBytes = bs->consumedBits >> 3;
-#endif
-    return DRFLAC_TRUE;
-}
-
-static void drflac__reset_cache(drflac_bs* bs)
-{
-    bs->nextL2Line   = DRFLAC_CACHE_L2_LINE_COUNT(bs);  /* <-- This clears the L2 cache. */
-    bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);   /* <-- This clears the L1 cache. */
-    bs->cache = 0;
-    bs->unalignedByteCount = 0;                         /* <-- This clears the trailing unaligned bytes. */
-    bs->unalignedCache = 0;
-
-#ifndef DR_FLAC_NO_CRC
-    bs->crc16Cache = 0;
-    bs->crc16CacheIgnoredBytes = 0;
-#endif
-}
-
-
-static DRFLAC_INLINE drflac_bool32 drflac__read_uint32(drflac_bs* bs, unsigned int bitCount, drflac_uint32* pResultOut)
-{
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pResultOut != NULL);
-    DRFLAC_ASSERT(bitCount > 0);
-    DRFLAC_ASSERT(bitCount <= 32);
-
-    if (bs->consumedBits == DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
-        if (!drflac__reload_cache(bs)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    if (bitCount <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-        /*
-        If we want to load all 32-bits from a 32-bit cache we need to do it slightly differently because we can't do
-        a 32-bit shift on a 32-bit integer. This will never be the case on 64-bit caches, so we can have a slightly
-        more optimal solution for this.
-        */
-#ifdef DRFLAC_64BIT
-        *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
-        bs->consumedBits += bitCount;
-        bs->cache <<= bitCount;
-#else
-        if (bitCount < DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
-            *pResultOut = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCount);
-            bs->consumedBits += bitCount;
-            bs->cache <<= bitCount;
-        } else {
-            /* Cannot shift by 32-bits, so need to do it differently. */
-            *pResultOut = (drflac_uint32)bs->cache;
-            bs->consumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs);
-            bs->cache = 0;
-        }
-#endif
-
-        return DRFLAC_TRUE;
-    } else {
-        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
-        drflac_uint32 bitCountHi = DRFLAC_CACHE_L1_BITS_REMAINING(bs);
-        drflac_uint32 bitCountLo = bitCount - bitCountHi;
-        drflac_uint32 resultHi;
-
-        DRFLAC_ASSERT(bitCountHi > 0);
-        DRFLAC_ASSERT(bitCountHi < 32);
-        resultHi = (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountHi);
-
-        if (!drflac__reload_cache(bs)) {
-            return DRFLAC_FALSE;
-        }
-        if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-            /* This happens when we get to end of stream */
-            return DRFLAC_FALSE;
-        }
-
-        *pResultOut = (resultHi << bitCountLo) | (drflac_uint32)DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, bitCountLo);
-        bs->consumedBits += bitCountLo;
-        bs->cache <<= bitCountLo;
-        return DRFLAC_TRUE;
-    }
-}
-
-static drflac_bool32 drflac__read_int32(drflac_bs* bs, unsigned int bitCount, drflac_int32* pResult)
-{
-    drflac_uint32 result;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pResult != NULL);
-    DRFLAC_ASSERT(bitCount > 0);
-    DRFLAC_ASSERT(bitCount <= 32);
-
-    if (!drflac__read_uint32(bs, bitCount, &result)) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Do not attempt to shift by 32 as it's undefined. */
-    if (bitCount < 32) {
-        drflac_uint32 signbit;
-        signbit = ((result >> (bitCount-1)) & 0x01);
-        result |= (~signbit + 1) << bitCount;
-    }
-
-    *pResult = (drflac_int32)result;
-    return DRFLAC_TRUE;
-}
-
-#ifdef DRFLAC_64BIT
-static drflac_bool32 drflac__read_uint64(drflac_bs* bs, unsigned int bitCount, drflac_uint64* pResultOut)
-{
-    drflac_uint32 resultHi;
-    drflac_uint32 resultLo;
-
-    DRFLAC_ASSERT(bitCount <= 64);
-    DRFLAC_ASSERT(bitCount >  32);
-
-    if (!drflac__read_uint32(bs, bitCount - 32, &resultHi)) {
-        return DRFLAC_FALSE;
-    }
-
-    if (!drflac__read_uint32(bs, 32, &resultLo)) {
-        return DRFLAC_FALSE;
-    }
-
-    *pResultOut = (((drflac_uint64)resultHi) << 32) | ((drflac_uint64)resultLo);
-    return DRFLAC_TRUE;
-}
-#endif
-
-/* Function below is unused, but leaving it here in case I need to quickly add it again. */
-#if 0
-static drflac_bool32 drflac__read_int64(drflac_bs* bs, unsigned int bitCount, drflac_int64* pResultOut)
-{
-    drflac_uint64 result;
-    drflac_uint64 signbit;
-
-    DRFLAC_ASSERT(bitCount <= 64);
-
-    if (!drflac__read_uint64(bs, bitCount, &result)) {
-        return DRFLAC_FALSE;
-    }
-
-    signbit = ((result >> (bitCount-1)) & 0x01);
-    result |= (~signbit + 1) << bitCount;
-
-    *pResultOut = (drflac_int64)result;
-    return DRFLAC_TRUE;
-}
-#endif
-
-static drflac_bool32 drflac__read_uint16(drflac_bs* bs, unsigned int bitCount, drflac_uint16* pResult)
-{
-    drflac_uint32 result;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pResult != NULL);
-    DRFLAC_ASSERT(bitCount > 0);
-    DRFLAC_ASSERT(bitCount <= 16);
-
-    if (!drflac__read_uint32(bs, bitCount, &result)) {
-        return DRFLAC_FALSE;
-    }
-
-    *pResult = (drflac_uint16)result;
-    return DRFLAC_TRUE;
-}
-
-#if 0
-static drflac_bool32 drflac__read_int16(drflac_bs* bs, unsigned int bitCount, drflac_int16* pResult)
-{
-    drflac_int32 result;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pResult != NULL);
-    DRFLAC_ASSERT(bitCount > 0);
-    DRFLAC_ASSERT(bitCount <= 16);
-
-    if (!drflac__read_int32(bs, bitCount, &result)) {
-        return DRFLAC_FALSE;
-    }
-
-    *pResult = (drflac_int16)result;
-    return DRFLAC_TRUE;
-}
-#endif
-
-static drflac_bool32 drflac__read_uint8(drflac_bs* bs, unsigned int bitCount, drflac_uint8* pResult)
-{
-    drflac_uint32 result;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pResult != NULL);
-    DRFLAC_ASSERT(bitCount > 0);
-    DRFLAC_ASSERT(bitCount <= 8);
-
-    if (!drflac__read_uint32(bs, bitCount, &result)) {
-        return DRFLAC_FALSE;
-    }
-
-    *pResult = (drflac_uint8)result;
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__read_int8(drflac_bs* bs, unsigned int bitCount, drflac_int8* pResult)
-{
-    drflac_int32 result;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pResult != NULL);
-    DRFLAC_ASSERT(bitCount > 0);
-    DRFLAC_ASSERT(bitCount <= 8);
-
-    if (!drflac__read_int32(bs, bitCount, &result)) {
-        return DRFLAC_FALSE;
-    }
-
-    *pResult = (drflac_int8)result;
-    return DRFLAC_TRUE;
-}
-
-
-static drflac_bool32 drflac__seek_bits(drflac_bs* bs, size_t bitsToSeek)
-{
-    if (bitsToSeek <= DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-        bs->consumedBits += (drflac_uint32)bitsToSeek;
-        bs->cache <<= bitsToSeek;
-        return DRFLAC_TRUE;
-    } else {
-        /* It straddles the cached data. This function isn't called too frequently so I'm favouring simplicity here. */
-        bitsToSeek       -= DRFLAC_CACHE_L1_BITS_REMAINING(bs);
-        bs->consumedBits += DRFLAC_CACHE_L1_BITS_REMAINING(bs);
-        bs->cache         = 0;
-
-        /* Simple case. Seek in groups of the same number as bits that fit within a cache line. */
-#ifdef DRFLAC_64BIT
-        while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
-            drflac_uint64 bin;
-            if (!drflac__read_uint64(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
-                return DRFLAC_FALSE;
-            }
-            bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
-        }
-#else
-        while (bitsToSeek >= DRFLAC_CACHE_L1_SIZE_BITS(bs)) {
-            drflac_uint32 bin;
-            if (!drflac__read_uint32(bs, DRFLAC_CACHE_L1_SIZE_BITS(bs), &bin)) {
-                return DRFLAC_FALSE;
-            }
-            bitsToSeek -= DRFLAC_CACHE_L1_SIZE_BITS(bs);
-        }
-#endif
-
-        /* Whole leftover bytes. */
-        while (bitsToSeek >= 8) {
-            drflac_uint8 bin;
-            if (!drflac__read_uint8(bs, 8, &bin)) {
-                return DRFLAC_FALSE;
-            }
-            bitsToSeek -= 8;
-        }
-
-        /* Leftover bits. */
-        if (bitsToSeek > 0) {
-            drflac_uint8 bin;
-            if (!drflac__read_uint8(bs, (drflac_uint32)bitsToSeek, &bin)) {
-                return DRFLAC_FALSE;
-            }
-            bitsToSeek = 0; /* <-- Necessary for the assert below. */
-        }
-
-        DRFLAC_ASSERT(bitsToSeek == 0);
-        return DRFLAC_TRUE;
-    }
-}
-
-
-/* This function moves the bit streamer to the first bit after the sync code (bit 15 of the frame header). It will also update the CRC-16. */
-static drflac_bool32 drflac__find_and_seek_to_next_sync_code(drflac_bs* bs)
-{
-    DRFLAC_ASSERT(bs != NULL);
-
-    /*
-    The sync code is always aligned to 8 bits. This is convenient for us because it means we can do byte-aligned movements. The first
-    thing to do is align to the next byte.
-    */
-    if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
-        return DRFLAC_FALSE;
-    }
-
-    for (;;) {
-        drflac_uint8 hi;
-
-#ifndef DR_FLAC_NO_CRC
-        drflac__reset_crc16(bs);
-#endif
-
-        if (!drflac__read_uint8(bs, 8, &hi)) {
-            return DRFLAC_FALSE;
-        }
-
-        if (hi == 0xFF) {
-            drflac_uint8 lo;
-            if (!drflac__read_uint8(bs, 6, &lo)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (lo == 0x3E) {
-                return DRFLAC_TRUE;
-            } else {
-                if (!drflac__seek_bits(bs, DRFLAC_CACHE_L1_BITS_REMAINING(bs) & 7)) {
-                    return DRFLAC_FALSE;
-                }
-            }
-        }
-    }
-
-    /* Should never get here. */
-    /*return DRFLAC_FALSE;*/
-}
-
-
-#if defined(DRFLAC_HAS_LZCNT_INTRINSIC)
-#define DRFLAC_IMPLEMENT_CLZ_LZCNT
-#endif
-#if  defined(_MSC_VER) && _MSC_VER >= 1400 && (defined(DRFLAC_X64) || defined(DRFLAC_X86)) && !defined(__clang__)
-#define DRFLAC_IMPLEMENT_CLZ_MSVC
-#endif
-#if  defined(__WATCOMC__) && defined(__386__)
-#define DRFLAC_IMPLEMENT_CLZ_WATCOM
-#endif
-#ifdef __MRC__
-#include <intrinsics.h>
-#define DRFLAC_IMPLEMENT_CLZ_MRC
-#endif
-
-static DRFLAC_INLINE drflac_uint32 drflac__clz_software(drflac_cache_t x)
-{
-    drflac_uint32 n;
-    static drflac_uint32 clz_table_4[] = {
-        0,
-        4,
-        3, 3,
-        2, 2, 2, 2,
-        1, 1, 1, 1, 1, 1, 1, 1
-    };
-
-    if (x == 0) {
-        return sizeof(x)*8;
-    }
-
-    n = clz_table_4[x >> (sizeof(x)*8 - 4)];
-    if (n == 0) {
-#ifdef DRFLAC_64BIT
-        if ((x & ((drflac_uint64)0xFFFFFFFF << 32)) == 0) { n  = 32; x <<= 32; }
-        if ((x & ((drflac_uint64)0xFFFF0000 << 32)) == 0) { n += 16; x <<= 16; }
-        if ((x & ((drflac_uint64)0xFF000000 << 32)) == 0) { n += 8;  x <<= 8;  }
-        if ((x & ((drflac_uint64)0xF0000000 << 32)) == 0) { n += 4;  x <<= 4;  }
-#else
-        if ((x & 0xFFFF0000) == 0) { n  = 16; x <<= 16; }
-        if ((x & 0xFF000000) == 0) { n += 8;  x <<= 8;  }
-        if ((x & 0xF0000000) == 0) { n += 4;  x <<= 4;  }
-#endif
-        n += clz_table_4[x >> (sizeof(x)*8 - 4)];
-    }
-
-    return n - 1;
-}
-
-#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
-static DRFLAC_INLINE drflac_bool32 drflac__is_lzcnt_supported(void)
-{
-    /* Fast compile time check for ARM. */
-#if defined(DRFLAC_HAS_LZCNT_INTRINSIC) && defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5)
-    return DRFLAC_TRUE;
-#elif defined(__MRC__)
-    return DRFLAC_TRUE;
-#else
-    /* If the compiler itself does not support the intrinsic then we'll need to return false. */
-    #ifdef DRFLAC_HAS_LZCNT_INTRINSIC
-        return drflac__gIsLZCNTSupported;
-    #else
-        return DRFLAC_FALSE;
-    #endif
-#endif
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac__clz_lzcnt(drflac_cache_t x)
-{
-    /*
-    It's critical for competitive decoding performance that this function be highly optimal. With MSVC we can use the __lzcnt64() and __lzcnt() intrinsics
-    to achieve good performance, however on GCC and Clang it's a little bit more annoying. The __builtin_clzl() and __builtin_clzll() intrinsics leave
-    it undefined as to the return value when `x` is 0. We need this to be well defined as returning 32 or 64, depending on whether or not it's a 32- or
-    64-bit build. To work around this we would need to add a conditional to check for the x = 0 case, but this creates unnecessary inefficiency. To work
-    around this problem I have written some inline assembly to emit the LZCNT (x86) or CLZ (ARM) instruction directly which removes the need to include
-    the conditional. This has worked well in the past, but for some reason Clang's MSVC compatible driver, clang-cl, does not seem to be handling this
-    in the same way as the normal Clang driver. It seems that `clang-cl` is just outputting the wrong results sometimes, maybe due to some register
-    getting clobbered?
-
-    I'm not sure if this is a bug with dr_flac's inlined assembly (most likely), a bug in `clang-cl` or just a misunderstanding on my part with inline
-    assembly rules for `clang-cl`. If somebody can identify an error in dr_flac's inlined assembly I'm happy to get that fixed.
-
-    Fortunately there is an easy workaround for this. Clang implements MSVC-specific intrinsics for compatibility. It also defines _MSC_VER for extra
-    compatibility. We can therefore just check for _MSC_VER and use the MSVC intrinsic which, fortunately for us, Clang supports. It would still be nice
-    to know how to fix the inlined assembly for correctness' sake, however.
-    */
-
-#if defined(_MSC_VER) /*&& !defined(__clang__)*/    /* <-- Intentionally wanting Clang to use the MSVC __lzcnt64/__lzcnt intrinsics due to above ^. */
-    #ifdef DRFLAC_64BIT
-        return (drflac_uint32)__lzcnt64(x);
-    #else
-        return (drflac_uint32)__lzcnt(x);
-    #endif
-#else
-    #if defined(__GNUC__) || defined(__clang__)
-        #if defined(DRFLAC_X64)
-            {
-                drflac_uint64 r;
-                __asm__ __volatile__ (
-                    "lzcnt{ %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
-                );
-
-                return (drflac_uint32)r;
-            }
-        #elif defined(DRFLAC_X86)
-            {
-                drflac_uint32 r;
-                __asm__ __volatile__ (
-                    "lzcnt{l %1, %0| %0, %1}" : "=r"(r) : "r"(x) : "cc"
-                );
-
-                return r;
-            }
-        #elif defined(DRFLAC_ARM) && (defined(__ARM_ARCH) && __ARM_ARCH >= 5) && !defined(__ARM_ARCH_6M__) && !defined(DRFLAC_64BIT)   /* <-- I haven't tested 64-bit inline assembly, so only enabling this for the 32-bit build for now. */
-            {
-                unsigned int r;
-                __asm__ __volatile__ (
-                #if defined(DRFLAC_64BIT)
-                    "clz %w[out], %w[in]" : [out]"=r"(r) : [in]"r"(x)   /* <-- This is untested. If someone in the community could test this, that would be appreciated! */
-                #else
-                    "clz %[out], %[in]" : [out]"=r"(r) : [in]"r"(x)
-                #endif
-                );
-
-                return r;
-            }
-        #else
-            if (x == 0) {
-                return sizeof(x)*8;
-            }
-            #ifdef DRFLAC_64BIT
-                return (drflac_uint32)__builtin_clzll((drflac_uint64)x);
-            #else
-                return (drflac_uint32)__builtin_clzl((drflac_uint32)x);
-            #endif
-        #endif
-    #else
-        /* Unsupported compiler. */
-        #error "This compiler does not support the lzcnt intrinsic."
-    #endif
-#endif
-}
-#endif
-
-#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
-#include <intrin.h> /* For BitScanReverse(). */
-
-static DRFLAC_INLINE drflac_uint32 drflac__clz_msvc(drflac_cache_t x)
-{
-    drflac_uint32 n;
-
-    if (x == 0) {
-        return sizeof(x)*8;
-    }
-
-#ifdef DRFLAC_64BIT
-    _BitScanReverse64((unsigned long*)&n, x);
-#else
-    _BitScanReverse((unsigned long*)&n, x);
-#endif
-    return sizeof(x)*8 - n - 1;
-}
-#endif
-
-#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM
-static __inline drflac_uint32 drflac__clz_watcom (drflac_uint32);
-#ifdef DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT
-/* Use the LZCNT instruction (only available on some processors since the 2010s). */
-#pragma aux drflac__clz_watcom_lzcnt = \
-    "db 0F3h, 0Fh, 0BDh, 0C0h" /* lzcnt eax, eax */ \
-    parm [eax] \
-    value [eax] \
-    modify nomemory;
-#else
-/* Use the 386+-compatible implementation. */
-#pragma aux drflac__clz_watcom = \
-    "bsr eax, eax" \
-    "xor eax, 31" \
-    parm [eax] nomemory \
-    value [eax] \
-    modify exact [eax] nomemory;
-#endif
-#endif
-
-static DRFLAC_INLINE drflac_uint32 drflac__clz(drflac_cache_t x)
-{
-#ifdef DRFLAC_IMPLEMENT_CLZ_LZCNT
-    if (drflac__is_lzcnt_supported()) {
-        return drflac__clz_lzcnt(x);
-    } else
-#endif
-    {
-#ifdef DRFLAC_IMPLEMENT_CLZ_MSVC
-        return drflac__clz_msvc(x);
-#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM_LZCNT)
-        return drflac__clz_watcom_lzcnt(x);
-#elif defined(DRFLAC_IMPLEMENT_CLZ_WATCOM)
-        return (x == 0) ? sizeof(x)*8 : drflac__clz_watcom(x);
-#elif defined(__MRC__)
-        return __cntlzw(x);
-#else
-        return drflac__clz_software(x);
-#endif
-    }
-}
-
-
-static DRFLAC_INLINE drflac_bool32 drflac__seek_past_next_set_bit(drflac_bs* bs, unsigned int* pOffsetOut)
-{
-    drflac_uint32 zeroCounter = 0;
-    drflac_uint32 setBitOffsetPlus1;
-
-    while (bs->cache == 0) {
-        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
-        if (!drflac__reload_cache(bs)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    if (bs->cache == 1) {
-        /* Not catching this would lead to undefined behaviour: a shift of a 32-bit number by 32 or more is undefined */
-        *pOffsetOut = zeroCounter + (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs) - 1;
-        if (!drflac__reload_cache(bs)) {
-            return DRFLAC_FALSE;
-        }
-
-        return DRFLAC_TRUE;
-    }
-
-    setBitOffsetPlus1 = drflac__clz(bs->cache);
-    setBitOffsetPlus1 += 1;
-
-    if (setBitOffsetPlus1 > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-        /* This happens when we get to end of stream */
-        return DRFLAC_FALSE;
-    }
-
-    bs->consumedBits += setBitOffsetPlus1;
-    bs->cache <<= setBitOffsetPlus1;
-
-    *pOffsetOut = zeroCounter + setBitOffsetPlus1 - 1;
-    return DRFLAC_TRUE;
-}
-
-
-
-static drflac_bool32 drflac__seek_to_byte(drflac_bs* bs, drflac_uint64 offsetFromStart)
-{
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(offsetFromStart > 0);
-
-    /*
-    Seeking from the start is not quite as trivial as it sounds because the onSeek callback takes a signed 32-bit integer (which
-    is intentional because it simplifies the implementation of the onSeek callbacks), however offsetFromStart is unsigned 64-bit.
-    To resolve we just need to do an initial seek from the start, and then a series of offset seeks to make up the remainder.
-    */
-    if (offsetFromStart > 0x7FFFFFFF) {
-        drflac_uint64 bytesRemaining = offsetFromStart;
-        if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
-            return DRFLAC_FALSE;
-        }
-        bytesRemaining -= 0x7FFFFFFF;
-
-        while (bytesRemaining > 0x7FFFFFFF) {
-            if (!bs->onSeek(bs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
-                return DRFLAC_FALSE;
-            }
-            bytesRemaining -= 0x7FFFFFFF;
-        }
-
-        if (bytesRemaining > 0) {
-            if (!bs->onSeek(bs->pUserData, (int)bytesRemaining, drflac_seek_origin_current)) {
-                return DRFLAC_FALSE;
-            }
-        }
-    } else {
-        if (!bs->onSeek(bs->pUserData, (int)offsetFromStart, drflac_seek_origin_start)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    /* The cache should be reset to force a reload of fresh data from the client. */
-    drflac__reset_cache(bs);
-    return DRFLAC_TRUE;
-}
-
-
-static drflac_result drflac__read_utf8_coded_number(drflac_bs* bs, drflac_uint64* pNumberOut, drflac_uint8* pCRCOut)
-{
-    drflac_uint8 crc;
-    drflac_uint64 result;
-    drflac_uint8 utf8[7] = {0};
-    int byteCount;
-    int i;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pNumberOut != NULL);
-    DRFLAC_ASSERT(pCRCOut != NULL);
-
-    crc = *pCRCOut;
-
-    if (!drflac__read_uint8(bs, 8, utf8)) {
-        *pNumberOut = 0;
-        return DRFLAC_AT_END;
-    }
-    crc = drflac_crc8(crc, utf8[0], 8);
-
-    if ((utf8[0] & 0x80) == 0) {
-        *pNumberOut = utf8[0];
-        *pCRCOut = crc;
-        return DRFLAC_SUCCESS;
-    }
-
-    /*byteCount = 1;*/
-    if ((utf8[0] & 0xE0) == 0xC0) {
-        byteCount = 2;
-    } else if ((utf8[0] & 0xF0) == 0xE0) {
-        byteCount = 3;
-    } else if ((utf8[0] & 0xF8) == 0xF0) {
-        byteCount = 4;
-    } else if ((utf8[0] & 0xFC) == 0xF8) {
-        byteCount = 5;
-    } else if ((utf8[0] & 0xFE) == 0xFC) {
-        byteCount = 6;
-    } else if ((utf8[0] & 0xFF) == 0xFE) {
-        byteCount = 7;
-    } else {
-        *pNumberOut = 0;
-        return DRFLAC_CRC_MISMATCH;     /* Bad UTF-8 encoding. */
-    }
-
-    /* Read extra bytes. */
-    DRFLAC_ASSERT(byteCount > 1);
-
-    result = (drflac_uint64)(utf8[0] & (0xFF >> (byteCount + 1)));
-    for (i = 1; i < byteCount; ++i) {
-        if (!drflac__read_uint8(bs, 8, utf8 + i)) {
-            *pNumberOut = 0;
-            return DRFLAC_AT_END;
-        }
-        crc = drflac_crc8(crc, utf8[i], 8);
-
-        result = (result << 6) | (utf8[i] & 0x3F);
-    }
-
-    *pNumberOut = result;
-    *pCRCOut = crc;
-    return DRFLAC_SUCCESS;
-}
-
-
-static DRFLAC_INLINE drflac_uint32 drflac__ilog2_u32(drflac_uint32 x)
-{
-#if 1   /* Needs optimizing. */
-    drflac_uint32 result = 0;
-    while (x > 0) {
-        result += 1;
-        x >>= 1;
-    }
-
-    return result;
-#endif
-}
-
-static DRFLAC_INLINE drflac_bool32 drflac__use_64_bit_prediction(drflac_uint32 bitsPerSample, drflac_uint32 order, drflac_uint32 precision)
-{
-    /* https://web.archive.org/web/20220205005724/https://github.com/ietf-wg-cellar/flac-specification/blob/37a49aa48ba4ba12e8757badfc59c0df35435fec/rfc_backmatter.md */
-    return bitsPerSample + precision + drflac__ilog2_u32(order) > 32;
-}
-
-
-/*
-The next two functions are responsible for calculating the prediction.
-
-When the bits per sample is >16 we need to use 64-bit integer arithmetic because otherwise we'll run out of precision. It's
-safe to assume this will be slower on 32-bit platforms so we use a more optimal solution when the bits per sample is <=16.
-*/
-#if defined(__clang__)
-__attribute__((no_sanitize("signed-integer-overflow")))
-#endif
-static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_32(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
-{
-    drflac_int32 prediction = 0;
-
-    DRFLAC_ASSERT(order <= 32);
-
-    /* 32-bit version. */
-
-    /* VC++ optimizes this to a single jmp. I've not yet verified this for other compilers. */
-    switch (order)
-    {
-    case 32: prediction += coefficients[31] * pDecodedSamples[-32];
-    case 31: prediction += coefficients[30] * pDecodedSamples[-31];
-    case 30: prediction += coefficients[29] * pDecodedSamples[-30];
-    case 29: prediction += coefficients[28] * pDecodedSamples[-29];
-    case 28: prediction += coefficients[27] * pDecodedSamples[-28];
-    case 27: prediction += coefficients[26] * pDecodedSamples[-27];
-    case 26: prediction += coefficients[25] * pDecodedSamples[-26];
-    case 25: prediction += coefficients[24] * pDecodedSamples[-25];
-    case 24: prediction += coefficients[23] * pDecodedSamples[-24];
-    case 23: prediction += coefficients[22] * pDecodedSamples[-23];
-    case 22: prediction += coefficients[21] * pDecodedSamples[-22];
-    case 21: prediction += coefficients[20] * pDecodedSamples[-21];
-    case 20: prediction += coefficients[19] * pDecodedSamples[-20];
-    case 19: prediction += coefficients[18] * pDecodedSamples[-19];
-    case 18: prediction += coefficients[17] * pDecodedSamples[-18];
-    case 17: prediction += coefficients[16] * pDecodedSamples[-17];
-    case 16: prediction += coefficients[15] * pDecodedSamples[-16];
-    case 15: prediction += coefficients[14] * pDecodedSamples[-15];
-    case 14: prediction += coefficients[13] * pDecodedSamples[-14];
-    case 13: prediction += coefficients[12] * pDecodedSamples[-13];
-    case 12: prediction += coefficients[11] * pDecodedSamples[-12];
-    case 11: prediction += coefficients[10] * pDecodedSamples[-11];
-    case 10: prediction += coefficients[ 9] * pDecodedSamples[-10];
-    case  9: prediction += coefficients[ 8] * pDecodedSamples[- 9];
-    case  8: prediction += coefficients[ 7] * pDecodedSamples[- 8];
-    case  7: prediction += coefficients[ 6] * pDecodedSamples[- 7];
-    case  6: prediction += coefficients[ 5] * pDecodedSamples[- 6];
-    case  5: prediction += coefficients[ 4] * pDecodedSamples[- 5];
-    case  4: prediction += coefficients[ 3] * pDecodedSamples[- 4];
-    case  3: prediction += coefficients[ 2] * pDecodedSamples[- 3];
-    case  2: prediction += coefficients[ 1] * pDecodedSamples[- 2];
-    case  1: prediction += coefficients[ 0] * pDecodedSamples[- 1];
-    }
-
-    return (drflac_int32)(prediction >> shift);
-}
-
-static DRFLAC_INLINE drflac_int32 drflac__calculate_prediction_64(drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
-{
-    drflac_int64 prediction;
-
-    DRFLAC_ASSERT(order <= 32);
-
-    /* 64-bit version. */
-
-    /* This method is faster on the 32-bit build when compiling with VC++. See note below. */
-#ifndef DRFLAC_64BIT
-    if (order == 8)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
-        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
-        prediction += coefficients[7] * (drflac_int64)pDecodedSamples[-8];
-    }
-    else if (order == 7)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
-        prediction += coefficients[6] * (drflac_int64)pDecodedSamples[-7];
-    }
-    else if (order == 3)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
-    }
-    else if (order == 6)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5] * (drflac_int64)pDecodedSamples[-6];
-    }
-    else if (order == 5)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4] * (drflac_int64)pDecodedSamples[-5];
-    }
-    else if (order == 4)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2] * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3] * (drflac_int64)pDecodedSamples[-4];
-    }
-    else if (order == 12)
-    {
-        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
-        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
-        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
-        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
-        prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
-        prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
-        prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
-    }
-    else if (order == 2)
-    {
-        prediction  = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1] * (drflac_int64)pDecodedSamples[-2];
-    }
-    else if (order == 1)
-    {
-        prediction = coefficients[0] * (drflac_int64)pDecodedSamples[-1];
-    }
-    else if (order == 10)
-    {
-        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
-        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
-        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
-        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
-        prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
-    }
-    else if (order == 9)
-    {
-        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
-        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
-        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
-        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
-    }
-    else if (order == 11)
-    {
-        prediction  = coefficients[0]  * (drflac_int64)pDecodedSamples[-1];
-        prediction += coefficients[1]  * (drflac_int64)pDecodedSamples[-2];
-        prediction += coefficients[2]  * (drflac_int64)pDecodedSamples[-3];
-        prediction += coefficients[3]  * (drflac_int64)pDecodedSamples[-4];
-        prediction += coefficients[4]  * (drflac_int64)pDecodedSamples[-5];
-        prediction += coefficients[5]  * (drflac_int64)pDecodedSamples[-6];
-        prediction += coefficients[6]  * (drflac_int64)pDecodedSamples[-7];
-        prediction += coefficients[7]  * (drflac_int64)pDecodedSamples[-8];
-        prediction += coefficients[8]  * (drflac_int64)pDecodedSamples[-9];
-        prediction += coefficients[9]  * (drflac_int64)pDecodedSamples[-10];
-        prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
-    }
-    else
-    {
-        int j;
-
-        prediction = 0;
-        for (j = 0; j < (int)order; ++j) {
-            prediction += coefficients[j] * (drflac_int64)pDecodedSamples[-j-1];
-        }
-    }
-#endif
-
-    /*
-    VC++ optimizes this to a single jmp instruction, but only the 64-bit build. The 32-bit build generates less efficient code for some
-    reason. The ugly version above is faster so we'll just switch between the two depending on the target platform.
-    */
-#ifdef DRFLAC_64BIT
-    prediction = 0;
-    switch (order)
-    {
-    case 32: prediction += coefficients[31] * (drflac_int64)pDecodedSamples[-32];
-    case 31: prediction += coefficients[30] * (drflac_int64)pDecodedSamples[-31];
-    case 30: prediction += coefficients[29] * (drflac_int64)pDecodedSamples[-30];
-    case 29: prediction += coefficients[28] * (drflac_int64)pDecodedSamples[-29];
-    case 28: prediction += coefficients[27] * (drflac_int64)pDecodedSamples[-28];
-    case 27: prediction += coefficients[26] * (drflac_int64)pDecodedSamples[-27];
-    case 26: prediction += coefficients[25] * (drflac_int64)pDecodedSamples[-26];
-    case 25: prediction += coefficients[24] * (drflac_int64)pDecodedSamples[-25];
-    case 24: prediction += coefficients[23] * (drflac_int64)pDecodedSamples[-24];
-    case 23: prediction += coefficients[22] * (drflac_int64)pDecodedSamples[-23];
-    case 22: prediction += coefficients[21] * (drflac_int64)pDecodedSamples[-22];
-    case 21: prediction += coefficients[20] * (drflac_int64)pDecodedSamples[-21];
-    case 20: prediction += coefficients[19] * (drflac_int64)pDecodedSamples[-20];
-    case 19: prediction += coefficients[18] * (drflac_int64)pDecodedSamples[-19];
-    case 18: prediction += coefficients[17] * (drflac_int64)pDecodedSamples[-18];
-    case 17: prediction += coefficients[16] * (drflac_int64)pDecodedSamples[-17];
-    case 16: prediction += coefficients[15] * (drflac_int64)pDecodedSamples[-16];
-    case 15: prediction += coefficients[14] * (drflac_int64)pDecodedSamples[-15];
-    case 14: prediction += coefficients[13] * (drflac_int64)pDecodedSamples[-14];
-    case 13: prediction += coefficients[12] * (drflac_int64)pDecodedSamples[-13];
-    case 12: prediction += coefficients[11] * (drflac_int64)pDecodedSamples[-12];
-    case 11: prediction += coefficients[10] * (drflac_int64)pDecodedSamples[-11];
-    case 10: prediction += coefficients[ 9] * (drflac_int64)pDecodedSamples[-10];
-    case  9: prediction += coefficients[ 8] * (drflac_int64)pDecodedSamples[- 9];
-    case  8: prediction += coefficients[ 7] * (drflac_int64)pDecodedSamples[- 8];
-    case  7: prediction += coefficients[ 6] * (drflac_int64)pDecodedSamples[- 7];
-    case  6: prediction += coefficients[ 5] * (drflac_int64)pDecodedSamples[- 6];
-    case  5: prediction += coefficients[ 4] * (drflac_int64)pDecodedSamples[- 5];
-    case  4: prediction += coefficients[ 3] * (drflac_int64)pDecodedSamples[- 4];
-    case  3: prediction += coefficients[ 2] * (drflac_int64)pDecodedSamples[- 3];
-    case  2: prediction += coefficients[ 1] * (drflac_int64)pDecodedSamples[- 2];
-    case  1: prediction += coefficients[ 0] * (drflac_int64)pDecodedSamples[- 1];
-    }
-#endif
-
-    return (drflac_int32)(prediction >> shift);
-}
-
-
-#if 0
-/*
-Reference implementation for reading and decoding samples with residual. This is intentionally left unoptimized for the
-sake of readability and should only be used as a reference.
-*/
-static drflac_bool32 drflac__decode_samples_with_residual__rice__reference(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    drflac_uint32 i;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pSamplesOut != NULL);
-
-    for (i = 0; i < count; ++i) {
-        drflac_uint32 zeroCounter = 0;
-        for (;;) {
-            drflac_uint8 bit;
-            if (!drflac__read_uint8(bs, 1, &bit)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (bit == 0) {
-                zeroCounter += 1;
-            } else {
-                break;
-            }
-        }
-
-        drflac_uint32 decodedRice;
-        if (riceParam > 0) {
-            if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
-                return DRFLAC_FALSE;
-            }
-        } else {
-            decodedRice = 0;
-        }
-
-        decodedRice |= (zeroCounter << riceParam);
-        if ((decodedRice & 0x01)) {
-            decodedRice = ~(decodedRice >> 1);
-        } else {
-            decodedRice =  (decodedRice >> 1);
-        }
-
-
-        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
-            pSamplesOut[i] = decodedRice + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
-        } else {
-            pSamplesOut[i] = decodedRice + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
-        }
-    }
-
-    return DRFLAC_TRUE;
-}
-#endif
-
-#if 0
-static drflac_bool32 drflac__read_rice_parts__reference(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
-{
-    drflac_uint32 zeroCounter = 0;
-    drflac_uint32 decodedRice;
-
-    for (;;) {
-        drflac_uint8 bit;
-        if (!drflac__read_uint8(bs, 1, &bit)) {
-            return DRFLAC_FALSE;
-        }
-
-        if (bit == 0) {
-            zeroCounter += 1;
-        } else {
-            break;
-        }
-    }
-
-    if (riceParam > 0) {
-        if (!drflac__read_uint32(bs, riceParam, &decodedRice)) {
-            return DRFLAC_FALSE;
-        }
-    } else {
-        decodedRice = 0;
-    }
-
-    *pZeroCounterOut = zeroCounter;
-    *pRiceParamPartOut = decodedRice;
-    return DRFLAC_TRUE;
-}
-#endif
-
-#if 0
-static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
-{
-    drflac_cache_t riceParamMask;
-    drflac_uint32 zeroCounter;
-    drflac_uint32 setBitOffsetPlus1;
-    drflac_uint32 riceParamPart;
-    drflac_uint32 riceLength;
-
-    DRFLAC_ASSERT(riceParam > 0);   /* <-- riceParam should never be 0. drflac__read_rice_parts__param_equals_zero() should be used instead for this case. */
-
-    riceParamMask = DRFLAC_CACHE_L1_SELECTION_MASK(riceParam);
-
-    zeroCounter = 0;
-    while (bs->cache == 0) {
-        zeroCounter += (drflac_uint32)DRFLAC_CACHE_L1_BITS_REMAINING(bs);
-        if (!drflac__reload_cache(bs)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    setBitOffsetPlus1 = drflac__clz(bs->cache);
-    zeroCounter += setBitOffsetPlus1;
-    setBitOffsetPlus1 += 1;
-
-    riceLength = setBitOffsetPlus1 + riceParam;
-    if (riceLength < DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-        riceParamPart = (drflac_uint32)((bs->cache & (riceParamMask >> setBitOffsetPlus1)) >> DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceLength));
-
-        bs->consumedBits += riceLength;
-        bs->cache <<= riceLength;
-    } else {
-        drflac_uint32 bitCountLo;
-        drflac_cache_t resultHi;
-
-        bs->consumedBits += riceLength;
-        bs->cache <<= setBitOffsetPlus1 & (DRFLAC_CACHE_L1_SIZE_BITS(bs)-1);    /* <-- Equivalent to "if (setBitOffsetPlus1 < DRFLAC_CACHE_L1_SIZE_BITS(bs)) { bs->cache <<= setBitOffsetPlus1; }" */
-
-        /* It straddles the cached data. It will never cover more than the next chunk. We just read the number in two parts and combine them. */
-        bitCountLo = bs->consumedBits - DRFLAC_CACHE_L1_SIZE_BITS(bs);
-        resultHi = DRFLAC_CACHE_L1_SELECT_AND_SHIFT(bs, riceParam);  /* <-- Use DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE() if ever this function allows riceParam=0. */
-
-        if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
-#ifndef DR_FLAC_NO_CRC
-            drflac__update_crc16(bs);
-#endif
-            bs->cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
-            bs->consumedBits = 0;
-#ifndef DR_FLAC_NO_CRC
-            bs->crc16Cache = bs->cache;
-#endif
-        } else {
-            /* Slow path. We need to fetch more data from the client. */
-            if (!drflac__reload_cache(bs)) {
-                return DRFLAC_FALSE;
-            }
-            if (bitCountLo > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-                /* This happens when we get to end of stream */
-                return DRFLAC_FALSE;
-            }
-        }
-
-        riceParamPart = (drflac_uint32)(resultHi | DRFLAC_CACHE_L1_SELECT_AND_SHIFT_SAFE(bs, bitCountLo));
-
-        bs->consumedBits += bitCountLo;
-        bs->cache <<= bitCountLo;
-    }
-
-    pZeroCounterOut[0] = zeroCounter;
-    pRiceParamPartOut[0] = riceParamPart;
-
-    return DRFLAC_TRUE;
-}
-#endif
-
-static DRFLAC_INLINE drflac_bool32 drflac__read_rice_parts_x1(drflac_bs* bs, drflac_uint8 riceParam, drflac_uint32* pZeroCounterOut, drflac_uint32* pRiceParamPartOut)
-{
-    drflac_uint32  riceParamPlus1 = riceParam + 1;
-    /*drflac_cache_t riceParamPlus1Mask  = DRFLAC_CACHE_L1_SELECTION_MASK(riceParamPlus1);*/
-    drflac_uint32  riceParamPlus1Shift = DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPlus1);
-    drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
-
-    /*
-    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
-    no idea how this will work in practice...
-    */
-    drflac_cache_t bs_cache = bs->cache;
-    drflac_uint32  bs_consumedBits = bs->consumedBits;
-
-    /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
-    drflac_uint32  lzcount = drflac__clz(bs_cache);
-    if (lzcount < sizeof(bs_cache)*8) {
-        pZeroCounterOut[0] = lzcount;
-
-        /*
-        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
-        this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
-        outside of this function at a higher level.
-        */
-    extract_rice_param_part:
-        bs_cache       <<= lzcount;
-        bs_consumedBits += lzcount;
-
-        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
-            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
-            pRiceParamPartOut[0] = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
-            bs_cache       <<= riceParamPlus1;
-            bs_consumedBits += riceParamPlus1;
-        } else {
-            drflac_uint32 riceParamPartHi;
-            drflac_uint32 riceParamPartLo;
-            drflac_uint32 riceParamPartLoBitCount;
-
-            /*
-            Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
-            line, reload the cache, and then combine it with the head of the next cache line.
-            */
-
-            /* Grab the high part of the rice parameter part. */
-            riceParamPartHi = (drflac_uint32)(bs_cache >> riceParamPlus1Shift);
-
-            /* Before reloading the cache we need to grab the size in bits of the low part. */
-            riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
-            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
-
-            /* Now reload the cache. */
-            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
-            #ifndef DR_FLAC_NO_CRC
-                drflac__update_crc16(bs);
-            #endif
-                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
-                bs_consumedBits = riceParamPartLoBitCount;
-            #ifndef DR_FLAC_NO_CRC
-                bs->crc16Cache = bs_cache;
-            #endif
-            } else {
-                /* Slow path. We need to fetch more data from the client. */
-                if (!drflac__reload_cache(bs)) {
-                    return DRFLAC_FALSE;
-                }
-                if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-                    /* This happens when we get to end of stream */
-                    return DRFLAC_FALSE;
-                }
-
-                bs_cache = bs->cache;
-                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
-            }
-
-            /* We should now have enough information to construct the rice parameter part. */
-            riceParamPartLo = (drflac_uint32)(bs_cache >> (DRFLAC_CACHE_L1_SELECTION_SHIFT(bs, riceParamPartLoBitCount)));
-            pRiceParamPartOut[0] = riceParamPartHi | riceParamPartLo;
-
-            bs_cache <<= riceParamPartLoBitCount;
-        }
-    } else {
-        /*
-        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
-        to drflac__clz() and we need to reload the cache.
-        */
-        drflac_uint32 zeroCounter = (drflac_uint32)(DRFLAC_CACHE_L1_SIZE_BITS(bs) - bs_consumedBits);
-        for (;;) {
-            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
-            #ifndef DR_FLAC_NO_CRC
-                drflac__update_crc16(bs);
-            #endif
-                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
-                bs_consumedBits = 0;
-            #ifndef DR_FLAC_NO_CRC
-                bs->crc16Cache = bs_cache;
-            #endif
-            } else {
-                /* Slow path. We need to fetch more data from the client. */
-                if (!drflac__reload_cache(bs)) {
-                    return DRFLAC_FALSE;
-                }
-
-                bs_cache = bs->cache;
-                bs_consumedBits = bs->consumedBits;
-            }
-
-            lzcount = drflac__clz(bs_cache);
-            zeroCounter += lzcount;
-
-            if (lzcount < sizeof(bs_cache)*8) {
-                break;
-            }
-        }
-
-        pZeroCounterOut[0] = zeroCounter;
-        goto extract_rice_param_part;
-    }
-
-    /* Make sure the cache is restored at the end of it all. */
-    bs->cache = bs_cache;
-    bs->consumedBits = bs_consumedBits;
-
-    return DRFLAC_TRUE;
-}
-
-static DRFLAC_INLINE drflac_bool32 drflac__seek_rice_parts(drflac_bs* bs, drflac_uint8 riceParam)
-{
-    drflac_uint32  riceParamPlus1 = riceParam + 1;
-    drflac_uint32  riceParamPlus1MaxConsumedBits = DRFLAC_CACHE_L1_SIZE_BITS(bs) - riceParamPlus1;
-
-    /*
-    The idea here is to use local variables for the cache in an attempt to encourage the compiler to store them in registers. I have
-    no idea how this will work in practice...
-    */
-    drflac_cache_t bs_cache = bs->cache;
-    drflac_uint32  bs_consumedBits = bs->consumedBits;
-
-    /* The first thing to do is find the first unset bit. Most likely a bit will be set in the current cache line. */
-    drflac_uint32  lzcount = drflac__clz(bs_cache);
-    if (lzcount < sizeof(bs_cache)*8) {
-        /*
-        It is most likely that the riceParam part (which comes after the zero counter) is also on this cache line. When extracting
-        this, we include the set bit from the unary coded part because it simplifies cache management. This bit will be handled
-        outside of this function at a higher level.
-        */
-    extract_rice_param_part:
-        bs_cache       <<= lzcount;
-        bs_consumedBits += lzcount;
-
-        if (bs_consumedBits <= riceParamPlus1MaxConsumedBits) {
-            /* Getting here means the rice parameter part is wholly contained within the current cache line. */
-            bs_cache       <<= riceParamPlus1;
-            bs_consumedBits += riceParamPlus1;
-        } else {
-            /*
-            Getting here means the rice parameter part straddles the cache line. We need to read from the tail of the current cache
-            line, reload the cache, and then combine it with the head of the next cache line.
-            */
-
-            /* Before reloading the cache we need to grab the size in bits of the low part. */
-            drflac_uint32 riceParamPartLoBitCount = bs_consumedBits - riceParamPlus1MaxConsumedBits;
-            DRFLAC_ASSERT(riceParamPartLoBitCount > 0 && riceParamPartLoBitCount < 32);
-
-            /* Now reload the cache. */
-            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
-            #ifndef DR_FLAC_NO_CRC
-                drflac__update_crc16(bs);
-            #endif
-                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
-                bs_consumedBits = riceParamPartLoBitCount;
-            #ifndef DR_FLAC_NO_CRC
-                bs->crc16Cache = bs_cache;
-            #endif
-            } else {
-                /* Slow path. We need to fetch more data from the client. */
-                if (!drflac__reload_cache(bs)) {
-                    return DRFLAC_FALSE;
-                }
-
-                if (riceParamPartLoBitCount > DRFLAC_CACHE_L1_BITS_REMAINING(bs)) {
-                    /* This happens when we get to end of stream */
-                    return DRFLAC_FALSE;
-                }
-
-                bs_cache = bs->cache;
-                bs_consumedBits = bs->consumedBits + riceParamPartLoBitCount;
-            }
-
-            bs_cache <<= riceParamPartLoBitCount;
-        }
-    } else {
-        /*
-        Getting here means there are no bits set on the cache line. This is a less optimal case because we just wasted a call
-        to drflac__clz() and we need to reload the cache.
-        */
-        for (;;) {
-            if (bs->nextL2Line < DRFLAC_CACHE_L2_LINE_COUNT(bs)) {
-            #ifndef DR_FLAC_NO_CRC
-                drflac__update_crc16(bs);
-            #endif
-                bs_cache = drflac__be2host__cache_line(bs->cacheL2[bs->nextL2Line++]);
-                bs_consumedBits = 0;
-            #ifndef DR_FLAC_NO_CRC
-                bs->crc16Cache = bs_cache;
-            #endif
-            } else {
-                /* Slow path. We need to fetch more data from the client. */
-                if (!drflac__reload_cache(bs)) {
-                    return DRFLAC_FALSE;
-                }
-
-                bs_cache = bs->cache;
-                bs_consumedBits = bs->consumedBits;
-            }
-
-            lzcount = drflac__clz(bs_cache);
-            if (lzcount < sizeof(bs_cache)*8) {
-                break;
-            }
-        }
-
-        goto extract_rice_param_part;
-    }
-
-    /* Make sure the cache is restored at the end of it all. */
-    bs->cache = bs_cache;
-    bs->consumedBits = bs_consumedBits;
-
-    return DRFLAC_TRUE;
-}
-
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar_zeroorder(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
-    drflac_uint32 zeroCountPart0;
-    drflac_uint32 riceParamPart0;
-    drflac_uint32 riceParamMask;
-    drflac_uint32 i;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pSamplesOut != NULL);
-
-    (void)bitsPerSample;
-    (void)order;
-    (void)shift;
-    (void)coefficients;
-
-    riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
-
-    i = 0;
-    while (i < count) {
-        /* Rice extraction. */
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Rice reconstruction. */
-        riceParamPart0 &= riceParamMask;
-        riceParamPart0 |= (zeroCountPart0 << riceParam);
-        riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
-
-        pSamplesOut[i] = riceParamPart0;
-
-        i += 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__scalar(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
-    drflac_uint32 zeroCountPart0 = 0;
-    drflac_uint32 zeroCountPart1 = 0;
-    drflac_uint32 zeroCountPart2 = 0;
-    drflac_uint32 zeroCountPart3 = 0;
-    drflac_uint32 riceParamPart0 = 0;
-    drflac_uint32 riceParamPart1 = 0;
-    drflac_uint32 riceParamPart2 = 0;
-    drflac_uint32 riceParamPart3 = 0;
-    drflac_uint32 riceParamMask;
-    const drflac_int32* pSamplesOutEnd;
-    drflac_uint32 i;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pSamplesOut != NULL);
-
-    if (lpcOrder == 0) {
-        return drflac__decode_samples_with_residual__rice__scalar_zeroorder(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
-    }
-
-    riceParamMask  = (drflac_uint32)~((~0UL) << riceParam);
-    pSamplesOutEnd = pSamplesOut + (count & ~3);
-
-    if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
-        while (pSamplesOut < pSamplesOutEnd) {
-            /*
-            Rice extraction. It's faster to do this one at a time against local variables than it is to use the x4 version
-            against an array. Not sure why, but perhaps it's making more efficient use of registers?
-            */
-            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
-                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
-                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
-                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
-                return DRFLAC_FALSE;
-            }
-
-            riceParamPart0 &= riceParamMask;
-            riceParamPart1 &= riceParamMask;
-            riceParamPart2 &= riceParamMask;
-            riceParamPart3 &= riceParamMask;
-
-            riceParamPart0 |= (zeroCountPart0 << riceParam);
-            riceParamPart1 |= (zeroCountPart1 << riceParam);
-            riceParamPart2 |= (zeroCountPart2 << riceParam);
-            riceParamPart3 |= (zeroCountPart3 << riceParam);
-
-            riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
-            riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
-            riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
-            riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
-
-            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
-            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
-            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
-            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
-
-            pSamplesOut += 4;
-        }
-    } else {
-        while (pSamplesOut < pSamplesOutEnd) {
-            if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0) ||
-                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart1, &riceParamPart1) ||
-                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart2, &riceParamPart2) ||
-                !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart3, &riceParamPart3)) {
-                return DRFLAC_FALSE;
-            }
-
-            riceParamPart0 &= riceParamMask;
-            riceParamPart1 &= riceParamMask;
-            riceParamPart2 &= riceParamMask;
-            riceParamPart3 &= riceParamMask;
-
-            riceParamPart0 |= (zeroCountPart0 << riceParam);
-            riceParamPart1 |= (zeroCountPart1 << riceParam);
-            riceParamPart2 |= (zeroCountPart2 << riceParam);
-            riceParamPart3 |= (zeroCountPart3 << riceParam);
-
-            riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
-            riceParamPart1  = (riceParamPart1 >> 1) ^ t[riceParamPart1 & 0x01];
-            riceParamPart2  = (riceParamPart2 >> 1) ^ t[riceParamPart2 & 0x01];
-            riceParamPart3  = (riceParamPart3 >> 1) ^ t[riceParamPart3 & 0x01];
-
-            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
-            pSamplesOut[1] = riceParamPart1 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 1);
-            pSamplesOut[2] = riceParamPart2 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 2);
-            pSamplesOut[3] = riceParamPart3 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 3);
-
-            pSamplesOut += 4;
-        }
-    }
-
-    i = (count & ~3);
-    while (i < count) {
-        /* Rice extraction. */
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountPart0, &riceParamPart0)) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Rice reconstruction. */
-        riceParamPart0 &= riceParamMask;
-        riceParamPart0 |= (zeroCountPart0 << riceParam);
-        riceParamPart0  = (riceParamPart0 >> 1) ^ t[riceParamPart0 & 0x01];
-        /*riceParamPart0  = (riceParamPart0 >> 1) ^ (~(riceParamPart0 & 0x01) + 1);*/
-
-        /* Sample reconstruction. */
-        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
-            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
-        } else {
-            pSamplesOut[0] = riceParamPart0 + drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + 0);
-        }
-
-        i += 1;
-        pSamplesOut += 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE __m128i drflac__mm_packs_interleaved_epi32(__m128i a, __m128i b)
-{
-    __m128i r;
-
-    /* Pack. */
-    r = _mm_packs_epi32(a, b);
-
-    /* a3a2 a1a0 b3b2 b1b0 -> a3a2 b3b2 a1a0 b1b0 */
-    r = _mm_shuffle_epi32(r, _MM_SHUFFLE(3, 1, 2, 0));
-
-    /* a3a2 b3b2 a1a0 b1b0 -> a3b3 a2b2 a1b1 a0b0 */
-    r = _mm_shufflehi_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
-    r = _mm_shufflelo_epi16(r, _MM_SHUFFLE(3, 1, 2, 0));
-
-    return r;
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_SSE41)
-static DRFLAC_INLINE __m128i drflac__mm_not_si128(__m128i a)
-{
-    return _mm_xor_si128(a, _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128()));
-}
-
-static DRFLAC_INLINE __m128i drflac__mm_hadd_epi32(__m128i x)
-{
-    __m128i x64 = _mm_add_epi32(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
-    __m128i x32 = _mm_shufflelo_epi16(x64, _MM_SHUFFLE(1, 0, 3, 2));
-    return _mm_add_epi32(x64, x32);
-}
-
-static DRFLAC_INLINE __m128i drflac__mm_hadd_epi64(__m128i x)
-{
-    return _mm_add_epi64(x, _mm_shuffle_epi32(x, _MM_SHUFFLE(1, 0, 3, 2)));
-}
-
-static DRFLAC_INLINE __m128i drflac__mm_srai_epi64(__m128i x, int count)
-{
-    /*
-    To simplify this we are assuming count < 32. This restriction allows us to work on a low side and a high side. The low side
-    is shifted with zero bits, whereas the right side is shifted with sign bits.
-    */
-    __m128i lo = _mm_srli_epi64(x, count);
-    __m128i hi = _mm_srai_epi32(x, count);
-
-    hi = _mm_and_si128(hi, _mm_set_epi32(0xFFFFFFFF, 0, 0xFFFFFFFF, 0));    /* The high part needs to have the low part cleared. */
-
-    return _mm_or_si128(lo, hi);
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    int i;
-    drflac_uint32 riceParamMask;
-    drflac_int32* pDecodedSamples    = pSamplesOut;
-    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
-    drflac_uint32 zeroCountParts0 = 0;
-    drflac_uint32 zeroCountParts1 = 0;
-    drflac_uint32 zeroCountParts2 = 0;
-    drflac_uint32 zeroCountParts3 = 0;
-    drflac_uint32 riceParamParts0 = 0;
-    drflac_uint32 riceParamParts1 = 0;
-    drflac_uint32 riceParamParts2 = 0;
-    drflac_uint32 riceParamParts3 = 0;
-    __m128i coefficients128_0;
-    __m128i coefficients128_4;
-    __m128i coefficients128_8;
-    __m128i samples128_0;
-    __m128i samples128_4;
-    __m128i samples128_8;
-    __m128i riceParamMask128;
-
-    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
-
-    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
-    riceParamMask128 = _mm_set1_epi32(riceParamMask);
-
-    /* Pre-load. */
-    coefficients128_0 = _mm_setzero_si128();
-    coefficients128_4 = _mm_setzero_si128();
-    coefficients128_8 = _mm_setzero_si128();
-
-    samples128_0 = _mm_setzero_si128();
-    samples128_4 = _mm_setzero_si128();
-    samples128_8 = _mm_setzero_si128();
-
-    /*
-    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
-    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
-    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
-    so I think there's opportunity for this to be simplified.
-    */
-#if 1
-    {
-        int runningOrder = order;
-
-        /* 0 - 3. */
-        if (runningOrder >= 4) {
-            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
-            samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
-                case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
-                case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
-            }
-            runningOrder = 0;
-        }
-
-        /* 4 - 7 */
-        if (runningOrder >= 4) {
-            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
-            samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
-                case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
-                case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
-            }
-            runningOrder = 0;
-        }
-
-        /* 8 - 11 */
-        if (runningOrder == 4) {
-            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
-            samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
-                case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
-                case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
-            }
-            runningOrder = 0;
-        }
-
-        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
-        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
-        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
-        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
-    }
-#else
-    /* This causes strict-aliasing warnings with GCC. */
-    switch (order)
-    {
-    case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
-    case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
-    case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
-    case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
-    case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
-    case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
-    case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
-    case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
-    case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
-    case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
-    case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
-    case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
-    }
-#endif
-
-    /* For this version we are doing one sample at a time. */
-    while (pDecodedSamples < pDecodedSamplesEnd) {
-        __m128i prediction128;
-        __m128i zeroCountPart128;
-        __m128i riceParamPart128;
-
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
-            return DRFLAC_FALSE;
-        }
-
-        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
-        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
-
-        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
-        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
-        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01))), _mm_set1_epi32(0x01)));  /* <-- SSE2 compatible */
-        /*riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_mullo_epi32(_mm_and_si128(riceParamPart128, _mm_set1_epi32(0x01)), _mm_set1_epi32(0xFFFFFFFF)));*/   /* <-- Only supported from SSE4.1 and is slower in my testing... */
-
-        if (order <= 4) {
-            for (i = 0; i < 4; i += 1) {
-                prediction128 = _mm_mullo_epi32(coefficients128_0, samples128_0);
-
-                /* Horizontal add and shift. */
-                prediction128 = drflac__mm_hadd_epi32(prediction128);
-                prediction128 = _mm_srai_epi32(prediction128, shift);
-                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
-
-                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
-                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
-            }
-        } else if (order <= 8) {
-            for (i = 0; i < 4; i += 1) {
-                prediction128 =                              _mm_mullo_epi32(coefficients128_4, samples128_4);
-                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
-
-                /* Horizontal add and shift. */
-                prediction128 = drflac__mm_hadd_epi32(prediction128);
-                prediction128 = _mm_srai_epi32(prediction128, shift);
-                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
-
-                samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
-                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
-                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
-            }
-        } else {
-            for (i = 0; i < 4; i += 1) {
-                prediction128 =                              _mm_mullo_epi32(coefficients128_8, samples128_8);
-                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_4, samples128_4));
-                prediction128 = _mm_add_epi32(prediction128, _mm_mullo_epi32(coefficients128_0, samples128_0));
-
-                /* Horizontal add and shift. */
-                prediction128 = drflac__mm_hadd_epi32(prediction128);
-                prediction128 = _mm_srai_epi32(prediction128, shift);
-                prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
-
-                samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
-                samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
-                samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
-                riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
-            }
-        }
-
-        /* We store samples in groups of 4. */
-        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
-        pDecodedSamples += 4;
-    }
-
-    /* Make sure we process the last few samples. */
-    i = (count & ~3);
-    while (i < (int)count) {
-        /* Rice extraction. */
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Rice reconstruction. */
-        riceParamParts0 &= riceParamMask;
-        riceParamParts0 |= (zeroCountParts0 << riceParam);
-        riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
-
-        /* Sample reconstruction. */
-        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
-
-        i += 1;
-        pDecodedSamples += 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    int i;
-    drflac_uint32 riceParamMask;
-    drflac_int32* pDecodedSamples    = pSamplesOut;
-    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
-    drflac_uint32 zeroCountParts0 = 0;
-    drflac_uint32 zeroCountParts1 = 0;
-    drflac_uint32 zeroCountParts2 = 0;
-    drflac_uint32 zeroCountParts3 = 0;
-    drflac_uint32 riceParamParts0 = 0;
-    drflac_uint32 riceParamParts1 = 0;
-    drflac_uint32 riceParamParts2 = 0;
-    drflac_uint32 riceParamParts3 = 0;
-    __m128i coefficients128_0;
-    __m128i coefficients128_4;
-    __m128i coefficients128_8;
-    __m128i samples128_0;
-    __m128i samples128_4;
-    __m128i samples128_8;
-    __m128i prediction128;
-    __m128i riceParamMask128;
-
-    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
-
-    DRFLAC_ASSERT(order <= 12);
-
-    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
-    riceParamMask128 = _mm_set1_epi32(riceParamMask);
-
-    prediction128 = _mm_setzero_si128();
-
-    /* Pre-load. */
-    coefficients128_0  = _mm_setzero_si128();
-    coefficients128_4  = _mm_setzero_si128();
-    coefficients128_8  = _mm_setzero_si128();
-
-    samples128_0  = _mm_setzero_si128();
-    samples128_4  = _mm_setzero_si128();
-    samples128_8  = _mm_setzero_si128();
-
-#if 1
-    {
-        int runningOrder = order;
-
-        /* 0 - 3. */
-        if (runningOrder >= 4) {
-            coefficients128_0 = _mm_loadu_si128((const __m128i*)(coefficients + 0));
-            samples128_0      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 4));
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: coefficients128_0 = _mm_set_epi32(0, coefficients[2], coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], pSamplesOut[-3], 0); break;
-                case 2: coefficients128_0 = _mm_set_epi32(0, 0,               coefficients[1], coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], pSamplesOut[-2], 0,               0); break;
-                case 1: coefficients128_0 = _mm_set_epi32(0, 0,               0,               coefficients[0]); samples128_0 = _mm_set_epi32(pSamplesOut[-1], 0,               0,               0); break;
-            }
-            runningOrder = 0;
-        }
-
-        /* 4 - 7 */
-        if (runningOrder >= 4) {
-            coefficients128_4 = _mm_loadu_si128((const __m128i*)(coefficients + 4));
-            samples128_4      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 8));
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: coefficients128_4 = _mm_set_epi32(0, coefficients[6], coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], pSamplesOut[-7], 0); break;
-                case 2: coefficients128_4 = _mm_set_epi32(0, 0,               coefficients[5], coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], pSamplesOut[-6], 0,               0); break;
-                case 1: coefficients128_4 = _mm_set_epi32(0, 0,               0,               coefficients[4]); samples128_4 = _mm_set_epi32(pSamplesOut[-5], 0,               0,               0); break;
-            }
-            runningOrder = 0;
-        }
-
-        /* 8 - 11 */
-        if (runningOrder == 4) {
-            coefficients128_8 = _mm_loadu_si128((const __m128i*)(coefficients + 8));
-            samples128_8      = _mm_loadu_si128((const __m128i*)(pSamplesOut  - 12));
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: coefficients128_8 = _mm_set_epi32(0, coefficients[10], coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], pSamplesOut[-11], 0); break;
-                case 2: coefficients128_8 = _mm_set_epi32(0, 0,                coefficients[9], coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], pSamplesOut[-10], 0,                0); break;
-                case 1: coefficients128_8 = _mm_set_epi32(0, 0,                0,               coefficients[8]); samples128_8 = _mm_set_epi32(pSamplesOut[-9], 0,                0,                0); break;
-            }
-            runningOrder = 0;
-        }
-
-        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
-        coefficients128_0 = _mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(0, 1, 2, 3));
-        coefficients128_4 = _mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(0, 1, 2, 3));
-        coefficients128_8 = _mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(0, 1, 2, 3));
-    }
-#else
-    switch (order)
-    {
-    case 12: ((drflac_int32*)&coefficients128_8)[0] = coefficients[11]; ((drflac_int32*)&samples128_8)[0] = pDecodedSamples[-12];
-    case 11: ((drflac_int32*)&coefficients128_8)[1] = coefficients[10]; ((drflac_int32*)&samples128_8)[1] = pDecodedSamples[-11];
-    case 10: ((drflac_int32*)&coefficients128_8)[2] = coefficients[ 9]; ((drflac_int32*)&samples128_8)[2] = pDecodedSamples[-10];
-    case 9:  ((drflac_int32*)&coefficients128_8)[3] = coefficients[ 8]; ((drflac_int32*)&samples128_8)[3] = pDecodedSamples[- 9];
-    case 8:  ((drflac_int32*)&coefficients128_4)[0] = coefficients[ 7]; ((drflac_int32*)&samples128_4)[0] = pDecodedSamples[- 8];
-    case 7:  ((drflac_int32*)&coefficients128_4)[1] = coefficients[ 6]; ((drflac_int32*)&samples128_4)[1] = pDecodedSamples[- 7];
-    case 6:  ((drflac_int32*)&coefficients128_4)[2] = coefficients[ 5]; ((drflac_int32*)&samples128_4)[2] = pDecodedSamples[- 6];
-    case 5:  ((drflac_int32*)&coefficients128_4)[3] = coefficients[ 4]; ((drflac_int32*)&samples128_4)[3] = pDecodedSamples[- 5];
-    case 4:  ((drflac_int32*)&coefficients128_0)[0] = coefficients[ 3]; ((drflac_int32*)&samples128_0)[0] = pDecodedSamples[- 4];
-    case 3:  ((drflac_int32*)&coefficients128_0)[1] = coefficients[ 2]; ((drflac_int32*)&samples128_0)[1] = pDecodedSamples[- 3];
-    case 2:  ((drflac_int32*)&coefficients128_0)[2] = coefficients[ 1]; ((drflac_int32*)&samples128_0)[2] = pDecodedSamples[- 2];
-    case 1:  ((drflac_int32*)&coefficients128_0)[3] = coefficients[ 0]; ((drflac_int32*)&samples128_0)[3] = pDecodedSamples[- 1];
-    }
-#endif
-
-    /* For this version we are doing one sample at a time. */
-    while (pDecodedSamples < pDecodedSamplesEnd) {
-        __m128i zeroCountPart128;
-        __m128i riceParamPart128;
-
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts1, &riceParamParts1) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts2, &riceParamParts2) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts3, &riceParamParts3)) {
-            return DRFLAC_FALSE;
-        }
-
-        zeroCountPart128 = _mm_set_epi32(zeroCountParts3, zeroCountParts2, zeroCountParts1, zeroCountParts0);
-        riceParamPart128 = _mm_set_epi32(riceParamParts3, riceParamParts2, riceParamParts1, riceParamParts0);
-
-        riceParamPart128 = _mm_and_si128(riceParamPart128, riceParamMask128);
-        riceParamPart128 = _mm_or_si128(riceParamPart128, _mm_slli_epi32(zeroCountPart128, riceParam));
-        riceParamPart128 = _mm_xor_si128(_mm_srli_epi32(riceParamPart128, 1), _mm_add_epi32(drflac__mm_not_si128(_mm_and_si128(riceParamPart128, _mm_set1_epi32(1))), _mm_set1_epi32(1)));
-
-        for (i = 0; i < 4; i += 1) {
-            prediction128 = _mm_xor_si128(prediction128, prediction128);    /* Reset to 0. */
-
-            switch (order)
-            {
-            case 12:
-            case 11: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(1, 1, 0, 0))));
-            case 10:
-            case  9: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_8, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_8, _MM_SHUFFLE(3, 3, 2, 2))));
-            case  8:
-            case  7: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(1, 1, 0, 0))));
-            case  6:
-            case  5: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_4, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_4, _MM_SHUFFLE(3, 3, 2, 2))));
-            case  4:
-            case  3: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(1, 1, 0, 0)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(1, 1, 0, 0))));
-            case  2:
-            case  1: prediction128 = _mm_add_epi64(prediction128, _mm_mul_epi32(_mm_shuffle_epi32(coefficients128_0, _MM_SHUFFLE(3, 3, 2, 2)), _mm_shuffle_epi32(samples128_0, _MM_SHUFFLE(3, 3, 2, 2))));
-            }
-
-            /* Horizontal add and shift. */
-            prediction128 = drflac__mm_hadd_epi64(prediction128);
-            prediction128 = drflac__mm_srai_epi64(prediction128, shift);
-            prediction128 = _mm_add_epi32(riceParamPart128, prediction128);
-
-            /* Our value should be sitting in prediction128[0]. We need to combine this with our SSE samples. */
-            samples128_8 = _mm_alignr_epi8(samples128_4,  samples128_8, 4);
-            samples128_4 = _mm_alignr_epi8(samples128_0,  samples128_4, 4);
-            samples128_0 = _mm_alignr_epi8(prediction128, samples128_0, 4);
-
-            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
-            riceParamPart128 = _mm_alignr_epi8(_mm_setzero_si128(), riceParamPart128, 4);
-        }
-
-        /* We store samples in groups of 4. */
-        _mm_storeu_si128((__m128i*)pDecodedSamples, samples128_0);
-        pDecodedSamples += 4;
-    }
-
-    /* Make sure we process the last few samples. */
-    i = (count & ~3);
-    while (i < (int)count) {
-        /* Rice extraction. */
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts0, &riceParamParts0)) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Rice reconstruction. */
-        riceParamParts0 &= riceParamMask;
-        riceParamParts0 |= (zeroCountParts0 << riceParam);
-        riceParamParts0  = (riceParamParts0 >> 1) ^ t[riceParamParts0 & 0x01];
-
-        /* Sample reconstruction. */
-        pDecodedSamples[0] = riceParamParts0 + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
-
-        i += 1;
-        pDecodedSamples += 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__sse41(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pSamplesOut != NULL);
-
-    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the SSE implementation by only handling order <= 12. */
-    if (lpcOrder > 0 && lpcOrder <= 12) {
-        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
-            return drflac__decode_samples_with_residual__rice__sse41_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
-        } else {
-            return drflac__decode_samples_with_residual__rice__sse41_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
-        }
-    } else {
-        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac__vst2q_s32(drflac_int32* p, int32x4x2_t x)
-{
-    vst1q_s32(p+0, x.val[0]);
-    vst1q_s32(p+4, x.val[1]);
-}
-
-static DRFLAC_INLINE void drflac__vst2q_u32(drflac_uint32* p, uint32x4x2_t x)
-{
-    vst1q_u32(p+0, x.val[0]);
-    vst1q_u32(p+4, x.val[1]);
-}
-
-static DRFLAC_INLINE void drflac__vst2q_f32(float* p, float32x4x2_t x)
-{
-    vst1q_f32(p+0, x.val[0]);
-    vst1q_f32(p+4, x.val[1]);
-}
-
-static DRFLAC_INLINE void drflac__vst2q_s16(drflac_int16* p, int16x4x2_t x)
-{
-    vst1q_s16(p, vcombine_s16(x.val[0], x.val[1]));
-}
-
-static DRFLAC_INLINE void drflac__vst2q_u16(drflac_uint16* p, uint16x4x2_t x)
-{
-    vst1q_u16(p, vcombine_u16(x.val[0], x.val[1]));
-}
-
-static DRFLAC_INLINE int32x4_t drflac__vdupq_n_s32x4(drflac_int32 x3, drflac_int32 x2, drflac_int32 x1, drflac_int32 x0)
-{
-    drflac_int32 x[4];
-    x[3] = x3;
-    x[2] = x2;
-    x[1] = x1;
-    x[0] = x0;
-    return vld1q_s32(x);
-}
-
-static DRFLAC_INLINE int32x4_t drflac__valignrq_s32_1(int32x4_t a, int32x4_t b)
-{
-    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
-
-    /* Reference */
-    /*return drflac__vdupq_n_s32x4(
-        vgetq_lane_s32(a, 0),
-        vgetq_lane_s32(b, 3),
-        vgetq_lane_s32(b, 2),
-        vgetq_lane_s32(b, 1)
-    );*/
-
-    return vextq_s32(b, a, 1);
-}
-
-static DRFLAC_INLINE uint32x4_t drflac__valignrq_u32_1(uint32x4_t a, uint32x4_t b)
-{
-    /* Equivalent to SSE's _mm_alignr_epi8(a, b, 4) */
-
-    /* Reference */
-    /*return drflac__vdupq_n_s32x4(
-        vgetq_lane_s32(a, 0),
-        vgetq_lane_s32(b, 3),
-        vgetq_lane_s32(b, 2),
-        vgetq_lane_s32(b, 1)
-    );*/
-
-    return vextq_u32(b, a, 1);
-}
-
-static DRFLAC_INLINE int32x2_t drflac__vhaddq_s32(int32x4_t x)
-{
-    /* The sum must end up in position 0. */
-
-    /* Reference */
-    /*return vdupq_n_s32(
-        vgetq_lane_s32(x, 3) +
-        vgetq_lane_s32(x, 2) +
-        vgetq_lane_s32(x, 1) +
-        vgetq_lane_s32(x, 0)
-    );*/
-
-    int32x2_t r = vadd_s32(vget_high_s32(x), vget_low_s32(x));
-    return vpadd_s32(r, r);
-}
-
-static DRFLAC_INLINE int64x1_t drflac__vhaddq_s64(int64x2_t x)
-{
-    return vadd_s64(vget_high_s64(x), vget_low_s64(x));
-}
-
-static DRFLAC_INLINE int32x4_t drflac__vrevq_s32(int32x4_t x)
-{
-    /* Reference */
-    /*return drflac__vdupq_n_s32x4(
-        vgetq_lane_s32(x, 0),
-        vgetq_lane_s32(x, 1),
-        vgetq_lane_s32(x, 2),
-        vgetq_lane_s32(x, 3)
-    );*/
-
-    return vrev64q_s32(vcombine_s32(vget_high_s32(x), vget_low_s32(x)));
-}
-
-static DRFLAC_INLINE int32x4_t drflac__vnotq_s32(int32x4_t x)
-{
-    return veorq_s32(x, vdupq_n_s32(0xFFFFFFFF));
-}
-
-static DRFLAC_INLINE uint32x4_t drflac__vnotq_u32(uint32x4_t x)
-{
-    return veorq_u32(x, vdupq_n_u32(0xFFFFFFFF));
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_32(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    int i;
-    drflac_uint32 riceParamMask;
-    drflac_int32* pDecodedSamples    = pSamplesOut;
-    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
-    drflac_uint32 zeroCountParts[4];
-    drflac_uint32 riceParamParts[4];
-    int32x4_t coefficients128_0;
-    int32x4_t coefficients128_4;
-    int32x4_t coefficients128_8;
-    int32x4_t samples128_0;
-    int32x4_t samples128_4;
-    int32x4_t samples128_8;
-    uint32x4_t riceParamMask128;
-    int32x4_t riceParam128;
-    int32x2_t shift64;
-    uint32x4_t one128;
-
-    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
-
-    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
-    riceParamMask128 = vdupq_n_u32(riceParamMask);
-
-    riceParam128 = vdupq_n_s32(riceParam);
-    shift64 = vdup_n_s32(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s32(). */
-    one128 = vdupq_n_u32(1);
-
-    /*
-    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
-    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
-    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
-    so I think there's opportunity for this to be simplified.
-    */
-    {
-        int runningOrder = order;
-        drflac_int32 tempC[4] = {0, 0, 0, 0};
-        drflac_int32 tempS[4] = {0, 0, 0, 0};
-
-        /* 0 - 3. */
-        if (runningOrder >= 4) {
-            coefficients128_0 = vld1q_s32(coefficients + 0);
-            samples128_0      = vld1q_s32(pSamplesOut  - 4);
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
-                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
-                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
-            }
-
-            coefficients128_0 = vld1q_s32(tempC);
-            samples128_0      = vld1q_s32(tempS);
-            runningOrder = 0;
-        }
-
-        /* 4 - 7 */
-        if (runningOrder >= 4) {
-            coefficients128_4 = vld1q_s32(coefficients + 4);
-            samples128_4      = vld1q_s32(pSamplesOut  - 8);
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
-                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
-                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
-            }
-
-            coefficients128_4 = vld1q_s32(tempC);
-            samples128_4      = vld1q_s32(tempS);
-            runningOrder = 0;
-        }
-
-        /* 8 - 11 */
-        if (runningOrder == 4) {
-            coefficients128_8 = vld1q_s32(coefficients + 8);
-            samples128_8      = vld1q_s32(pSamplesOut  - 12);
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
-                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
-                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
-            }
-
-            coefficients128_8 = vld1q_s32(tempC);
-            samples128_8      = vld1q_s32(tempS);
-            runningOrder = 0;
-        }
-
-        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
-        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
-        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
-        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
-    }
-
-    /* For this version we are doing one sample at a time. */
-    while (pDecodedSamples < pDecodedSamplesEnd) {
-        int32x4_t prediction128;
-        int32x2_t prediction64;
-        uint32x4_t zeroCountPart128;
-        uint32x4_t riceParamPart128;
-
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
-            return DRFLAC_FALSE;
-        }
-
-        zeroCountPart128 = vld1q_u32(zeroCountParts);
-        riceParamPart128 = vld1q_u32(riceParamParts);
-
-        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
-        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
-        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
-
-        if (order <= 4) {
-            for (i = 0; i < 4; i += 1) {
-                prediction128 = vmulq_s32(coefficients128_0, samples128_0);
-
-                /* Horizontal add and shift. */
-                prediction64 = drflac__vhaddq_s32(prediction128);
-                prediction64 = vshl_s32(prediction64, shift64);
-                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
-
-                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
-                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
-            }
-        } else if (order <= 8) {
-            for (i = 0; i < 4; i += 1) {
-                prediction128 =                vmulq_s32(coefficients128_4, samples128_4);
-                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
-
-                /* Horizontal add and shift. */
-                prediction64 = drflac__vhaddq_s32(prediction128);
-                prediction64 = vshl_s32(prediction64, shift64);
-                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
-
-                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
-                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
-                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
-            }
-        } else {
-            for (i = 0; i < 4; i += 1) {
-                prediction128 =                vmulq_s32(coefficients128_8, samples128_8);
-                prediction128 = vmlaq_s32(prediction128, coefficients128_4, samples128_4);
-                prediction128 = vmlaq_s32(prediction128, coefficients128_0, samples128_0);
-
-                /* Horizontal add and shift. */
-                prediction64 = drflac__vhaddq_s32(prediction128);
-                prediction64 = vshl_s32(prediction64, shift64);
-                prediction64 = vadd_s32(prediction64, vget_low_s32(vreinterpretq_s32_u32(riceParamPart128)));
-
-                samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
-                samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
-                samples128_0 = drflac__valignrq_s32_1(vcombine_s32(prediction64, vdup_n_s32(0)), samples128_0);
-                riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
-            }
-        }
-
-        /* We store samples in groups of 4. */
-        vst1q_s32(pDecodedSamples, samples128_0);
-        pDecodedSamples += 4;
-    }
-
-    /* Make sure we process the last few samples. */
-    i = (count & ~3);
-    while (i < (int)count) {
-        /* Rice extraction. */
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Rice reconstruction. */
-        riceParamParts[0] &= riceParamMask;
-        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
-        riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
-
-        /* Sample reconstruction. */
-        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_32(order, shift, coefficients, pDecodedSamples);
-
-        i += 1;
-        pDecodedSamples += 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__neon_64(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 order, drflac_int32 shift, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    int i;
-    drflac_uint32 riceParamMask;
-    drflac_int32* pDecodedSamples    = pSamplesOut;
-    drflac_int32* pDecodedSamplesEnd = pSamplesOut + (count & ~3);
-    drflac_uint32 zeroCountParts[4];
-    drflac_uint32 riceParamParts[4];
-    int32x4_t coefficients128_0;
-    int32x4_t coefficients128_4;
-    int32x4_t coefficients128_8;
-    int32x4_t samples128_0;
-    int32x4_t samples128_4;
-    int32x4_t samples128_8;
-    uint32x4_t riceParamMask128;
-    int32x4_t riceParam128;
-    int64x1_t shift64;
-    uint32x4_t one128;
-    int64x2_t prediction128 = { 0 };
-    uint32x4_t zeroCountPart128;
-    uint32x4_t riceParamPart128;
-
-    const drflac_uint32 t[2] = {0x00000000, 0xFFFFFFFF};
-
-    riceParamMask    = (drflac_uint32)~((~0UL) << riceParam);
-    riceParamMask128 = vdupq_n_u32(riceParamMask);
-
-    riceParam128 = vdupq_n_s32(riceParam);
-    shift64 = vdup_n_s64(-shift); /* Negate the shift because we'll be doing a variable shift using vshl_s64(). */
-    one128 = vdupq_n_u32(1);
-
-    /*
-    Pre-loading the coefficients and prior samples is annoying because we need to ensure we don't try reading more than
-    what's available in the input buffers. It would be convenient to use a fall-through switch to do this, but this results
-    in strict aliasing warnings with GCC. To work around this I'm just doing something hacky. This feels a bit convoluted
-    so I think there's opportunity for this to be simplified.
-    */
-    {
-        int runningOrder = order;
-        drflac_int32 tempC[4] = {0, 0, 0, 0};
-        drflac_int32 tempS[4] = {0, 0, 0, 0};
-
-        /* 0 - 3. */
-        if (runningOrder >= 4) {
-            coefficients128_0 = vld1q_s32(coefficients + 0);
-            samples128_0      = vld1q_s32(pSamplesOut  - 4);
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: tempC[2] = coefficients[2]; tempS[1] = pSamplesOut[-3]; /* fallthrough */
-                case 2: tempC[1] = coefficients[1]; tempS[2] = pSamplesOut[-2]; /* fallthrough */
-                case 1: tempC[0] = coefficients[0]; tempS[3] = pSamplesOut[-1]; /* fallthrough */
-            }
-
-            coefficients128_0 = vld1q_s32(tempC);
-            samples128_0      = vld1q_s32(tempS);
-            runningOrder = 0;
-        }
-
-        /* 4 - 7 */
-        if (runningOrder >= 4) {
-            coefficients128_4 = vld1q_s32(coefficients + 4);
-            samples128_4      = vld1q_s32(pSamplesOut  - 8);
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: tempC[2] = coefficients[6]; tempS[1] = pSamplesOut[-7]; /* fallthrough */
-                case 2: tempC[1] = coefficients[5]; tempS[2] = pSamplesOut[-6]; /* fallthrough */
-                case 1: tempC[0] = coefficients[4]; tempS[3] = pSamplesOut[-5]; /* fallthrough */
-            }
-
-            coefficients128_4 = vld1q_s32(tempC);
-            samples128_4      = vld1q_s32(tempS);
-            runningOrder = 0;
-        }
-
-        /* 8 - 11 */
-        if (runningOrder == 4) {
-            coefficients128_8 = vld1q_s32(coefficients + 8);
-            samples128_8      = vld1q_s32(pSamplesOut  - 12);
-            runningOrder -= 4;
-        } else {
-            switch (runningOrder) {
-                case 3: tempC[2] = coefficients[10]; tempS[1] = pSamplesOut[-11]; /* fallthrough */
-                case 2: tempC[1] = coefficients[ 9]; tempS[2] = pSamplesOut[-10]; /* fallthrough */
-                case 1: tempC[0] = coefficients[ 8]; tempS[3] = pSamplesOut[- 9]; /* fallthrough */
-            }
-
-            coefficients128_8 = vld1q_s32(tempC);
-            samples128_8      = vld1q_s32(tempS);
-            runningOrder = 0;
-        }
-
-        /* Coefficients need to be shuffled for our streaming algorithm below to work. Samples are already in the correct order from the loading routine above. */
-        coefficients128_0 = drflac__vrevq_s32(coefficients128_0);
-        coefficients128_4 = drflac__vrevq_s32(coefficients128_4);
-        coefficients128_8 = drflac__vrevq_s32(coefficients128_8);
-    }
-
-    /* For this version we are doing one sample at a time. */
-    while (pDecodedSamples < pDecodedSamplesEnd) {
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0]) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[1], &riceParamParts[1]) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[2], &riceParamParts[2]) ||
-            !drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[3], &riceParamParts[3])) {
-            return DRFLAC_FALSE;
-        }
-
-        zeroCountPart128 = vld1q_u32(zeroCountParts);
-        riceParamPart128 = vld1q_u32(riceParamParts);
-
-        riceParamPart128 = vandq_u32(riceParamPart128, riceParamMask128);
-        riceParamPart128 = vorrq_u32(riceParamPart128, vshlq_u32(zeroCountPart128, riceParam128));
-        riceParamPart128 = veorq_u32(vshrq_n_u32(riceParamPart128, 1), vaddq_u32(drflac__vnotq_u32(vandq_u32(riceParamPart128, one128)), one128));
-
-        for (i = 0; i < 4; i += 1) {
-            int64x1_t prediction64;
-
-            prediction128 = veorq_s64(prediction128, prediction128);    /* Reset to 0. */
-            switch (order)
-            {
-            case 12:
-            case 11: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_8), vget_low_s32(samples128_8)));
-            case 10:
-            case  9: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_8), vget_high_s32(samples128_8)));
-            case  8:
-            case  7: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_4), vget_low_s32(samples128_4)));
-            case  6:
-            case  5: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_4), vget_high_s32(samples128_4)));
-            case  4:
-            case  3: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_low_s32(coefficients128_0), vget_low_s32(samples128_0)));
-            case  2:
-            case  1: prediction128 = vaddq_s64(prediction128, vmull_s32(vget_high_s32(coefficients128_0), vget_high_s32(samples128_0)));
-            }
-
-            /* Horizontal add and shift. */
-            prediction64 = drflac__vhaddq_s64(prediction128);
-            prediction64 = vshl_s64(prediction64, shift64);
-            prediction64 = vadd_s64(prediction64, vdup_n_s64(vgetq_lane_u32(riceParamPart128, 0)));
-
-            /* Our value should be sitting in prediction64[0]. We need to combine this with our NEON samples. */
-            samples128_8 = drflac__valignrq_s32_1(samples128_4, samples128_8);
-            samples128_4 = drflac__valignrq_s32_1(samples128_0, samples128_4);
-            samples128_0 = drflac__valignrq_s32_1(vcombine_s32(vreinterpret_s32_s64(prediction64), vdup_n_s32(0)), samples128_0);
-
-            /* Slide our rice parameter down so that the value in position 0 contains the next one to process. */
-            riceParamPart128 = drflac__valignrq_u32_1(vdupq_n_u32(0), riceParamPart128);
-        }
-
-        /* We store samples in groups of 4. */
-        vst1q_s32(pDecodedSamples, samples128_0);
-        pDecodedSamples += 4;
-    }
-
-    /* Make sure we process the last few samples. */
-    i = (count & ~3);
-    while (i < (int)count) {
-        /* Rice extraction. */
-        if (!drflac__read_rice_parts_x1(bs, riceParam, &zeroCountParts[0], &riceParamParts[0])) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Rice reconstruction. */
-        riceParamParts[0] &= riceParamMask;
-        riceParamParts[0] |= (zeroCountParts[0] << riceParam);
-        riceParamParts[0]  = (riceParamParts[0] >> 1) ^ t[riceParamParts[0] & 0x01];
-
-        /* Sample reconstruction. */
-        pDecodedSamples[0] = riceParamParts[0] + drflac__calculate_prediction_64(order, shift, coefficients, pDecodedSamples);
-
-        i += 1;
-        pDecodedSamples += 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice__neon(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(pSamplesOut != NULL);
-
-    /* In my testing the order is rarely > 12, so in this case I'm going to simplify the NEON implementation by only handling order <= 12. */
-    if (lpcOrder > 0 && lpcOrder <= 12) {
-        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
-            return drflac__decode_samples_with_residual__rice__neon_64(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
-        } else {
-            return drflac__decode_samples_with_residual__rice__neon_32(bs, count, riceParam, lpcOrder, lpcShift, coefficients, pSamplesOut);
-        }
-    } else {
-        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
-    }
-}
-#endif
-
-static drflac_bool32 drflac__decode_samples_with_residual__rice(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 riceParam, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-#if defined(DRFLAC_SUPPORT_SSE41)
-    if (drflac__gIsSSE41Supported) {
-        return drflac__decode_samples_with_residual__rice__sse41(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported) {
-        return drflac__decode_samples_with_residual__rice__neon(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-    #if 0
-        return drflac__decode_samples_with_residual__rice__reference(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
-    #else
-        return drflac__decode_samples_with_residual__rice__scalar(bs, bitsPerSample, count, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pSamplesOut);
-    #endif
-    }
-}
-
-/* Reads and seeks past a string of residual values as Rice codes. The decoder should be sitting on the first bit of the Rice codes. */
-static drflac_bool32 drflac__read_and_seek_residual__rice(drflac_bs* bs, drflac_uint32 count, drflac_uint8 riceParam)
-{
-    drflac_uint32 i;
-
-    DRFLAC_ASSERT(bs != NULL);
-
-    for (i = 0; i < count; ++i) {
-        if (!drflac__seek_rice_parts(bs, riceParam)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    return DRFLAC_TRUE;
-}
-
-#if defined(__clang__)
-__attribute__((no_sanitize("signed-integer-overflow")))
-#endif
-static drflac_bool32 drflac__decode_samples_with_residual__unencoded(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 count, drflac_uint8 unencodedBitsPerSample, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pSamplesOut)
-{
-    drflac_uint32 i;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(unencodedBitsPerSample <= 31);    /* <-- unencodedBitsPerSample is a 5 bit number, so cannot exceed 31. */
-    DRFLAC_ASSERT(pSamplesOut != NULL);
-
-    for (i = 0; i < count; ++i) {
-        if (unencodedBitsPerSample > 0) {
-            if (!drflac__read_int32(bs, unencodedBitsPerSample, pSamplesOut + i)) {
-                return DRFLAC_FALSE;
-            }
-        } else {
-            pSamplesOut[i] = 0;
-        }
-
-        if (drflac__use_64_bit_prediction(bitsPerSample, lpcOrder, lpcPrecision)) {
-            pSamplesOut[i] += drflac__calculate_prediction_64(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
-        } else {
-            pSamplesOut[i] += drflac__calculate_prediction_32(lpcOrder, lpcShift, coefficients, pSamplesOut + i);
-        }
-    }
-
-    return DRFLAC_TRUE;
-}
-
-
-/*
-Reads and decodes the residual for the sub-frame the decoder is currently sitting on. This function should be called
-when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be ignored. The
-<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
-*/
-static drflac_bool32 drflac__decode_samples_with_residual(drflac_bs* bs, drflac_uint32 bitsPerSample, drflac_uint32 blockSize, drflac_uint32 lpcOrder, drflac_int32 lpcShift, drflac_uint32 lpcPrecision, const drflac_int32* coefficients, drflac_int32* pDecodedSamples)
-{
-    drflac_uint8 residualMethod;
-    drflac_uint8 partitionOrder;
-    drflac_uint32 samplesInPartition;
-    drflac_uint32 partitionsRemaining;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(blockSize != 0);
-    DRFLAC_ASSERT(pDecodedSamples != NULL);       /* <-- Should we allow NULL, in which case we just seek past the residual rather than do a full decode? */
-
-    if (!drflac__read_uint8(bs, 2, &residualMethod)) {
-        return DRFLAC_FALSE;
-    }
-
-    if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
-        return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
-    }
-
-    /* Ignore the first <order> values. */
-    pDecodedSamples += lpcOrder;
-
-    if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
-        return DRFLAC_FALSE;
-    }
-
-    /*
-    From the FLAC spec:
-      The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
-    */
-    if (partitionOrder > 8) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Validation check. */
-    if ((blockSize / (1 << partitionOrder)) < lpcOrder) {
-        return DRFLAC_FALSE;
-    }
-
-    samplesInPartition = (blockSize / (1 << partitionOrder)) - lpcOrder;
-    partitionsRemaining = (1 << partitionOrder);
-    for (;;) {
-        drflac_uint8 riceParam = 0;
-        if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
-            if (!drflac__read_uint8(bs, 4, &riceParam)) {
-                return DRFLAC_FALSE;
-            }
-            if (riceParam == 15) {
-                riceParam = 0xFF;
-            }
-        } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
-            if (!drflac__read_uint8(bs, 5, &riceParam)) {
-                return DRFLAC_FALSE;
-            }
-            if (riceParam == 31) {
-                riceParam = 0xFF;
-            }
-        }
-
-        if (riceParam != 0xFF) {
-            if (!drflac__decode_samples_with_residual__rice(bs, bitsPerSample, samplesInPartition, riceParam, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
-                return DRFLAC_FALSE;
-            }
-        } else {
-            drflac_uint8 unencodedBitsPerSample = 0;
-            if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (!drflac__decode_samples_with_residual__unencoded(bs, bitsPerSample, samplesInPartition, unencodedBitsPerSample, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
-                return DRFLAC_FALSE;
-            }
-        }
-
-        pDecodedSamples += samplesInPartition;
-
-        if (partitionsRemaining == 1) {
-            break;
-        }
-
-        partitionsRemaining -= 1;
-
-        if (partitionOrder != 0) {
-            samplesInPartition = blockSize / (1 << partitionOrder);
-        }
-    }
-
-    return DRFLAC_TRUE;
-}
-
-/*
-Reads and seeks past the residual for the sub-frame the decoder is currently sitting on. This function should be called
-when the decoder is sitting at the very start of the RESIDUAL block. The first <order> residuals will be set to 0. The
-<blockSize> and <order> parameters are used to determine how many residual values need to be decoded.
-*/
-static drflac_bool32 drflac__read_and_seek_residual(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 order)
-{
-    drflac_uint8 residualMethod;
-    drflac_uint8 partitionOrder;
-    drflac_uint32 samplesInPartition;
-    drflac_uint32 partitionsRemaining;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(blockSize != 0);
-
-    if (!drflac__read_uint8(bs, 2, &residualMethod)) {
-        return DRFLAC_FALSE;
-    }
-
-    if (residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE && residualMethod != DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
-        return DRFLAC_FALSE;    /* Unknown or unsupported residual coding method. */
-    }
-
-    if (!drflac__read_uint8(bs, 4, &partitionOrder)) {
-        return DRFLAC_FALSE;
-    }
-
-    /*
-    From the FLAC spec:
-      The Rice partition order in a Rice-coded residual section must be less than or equal to 8.
-    */
-    if (partitionOrder > 8) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Validation check. */
-    if ((blockSize / (1 << partitionOrder)) <= order) {
-        return DRFLAC_FALSE;
-    }
-
-    samplesInPartition = (blockSize / (1 << partitionOrder)) - order;
-    partitionsRemaining = (1 << partitionOrder);
-    for (;;)
-    {
-        drflac_uint8 riceParam = 0;
-        if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE) {
-            if (!drflac__read_uint8(bs, 4, &riceParam)) {
-                return DRFLAC_FALSE;
-            }
-            if (riceParam == 15) {
-                riceParam = 0xFF;
-            }
-        } else if (residualMethod == DRFLAC_RESIDUAL_CODING_METHOD_PARTITIONED_RICE2) {
-            if (!drflac__read_uint8(bs, 5, &riceParam)) {
-                return DRFLAC_FALSE;
-            }
-            if (riceParam == 31) {
-                riceParam = 0xFF;
-            }
-        }
-
-        if (riceParam != 0xFF) {
-            if (!drflac__read_and_seek_residual__rice(bs, samplesInPartition, riceParam)) {
-                return DRFLAC_FALSE;
-            }
-        } else {
-            drflac_uint8 unencodedBitsPerSample = 0;
-            if (!drflac__read_uint8(bs, 5, &unencodedBitsPerSample)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (!drflac__seek_bits(bs, unencodedBitsPerSample * samplesInPartition)) {
-                return DRFLAC_FALSE;
-            }
-        }
-
-
-        if (partitionsRemaining == 1) {
-            break;
-        }
-
-        partitionsRemaining -= 1;
-        samplesInPartition = blockSize / (1 << partitionOrder);
-    }
-
-    return DRFLAC_TRUE;
-}
-
-
-static drflac_bool32 drflac__decode_samples__constant(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
-{
-    drflac_uint32 i;
-
-    /* Only a single sample needs to be decoded here. */
-    drflac_int32 sample;
-    if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
-        return DRFLAC_FALSE;
-    }
-
-    /*
-    We don't really need to expand this, but it does simplify the process of reading samples. If this becomes a performance issue (unlikely)
-    we'll want to look at a more efficient way.
-    */
-    for (i = 0; i < blockSize; ++i) {
-        pDecodedSamples[i] = sample;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples__verbatim(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_int32* pDecodedSamples)
-{
-    drflac_uint32 i;
-
-    for (i = 0; i < blockSize; ++i) {
-        drflac_int32 sample;
-        if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
-            return DRFLAC_FALSE;
-        }
-
-        pDecodedSamples[i] = sample;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples__fixed(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 subframeBitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
-{
-    drflac_uint32 i;
-
-    static drflac_int32 lpcCoefficientsTable[5][4] = {
-        {0,  0, 0,  0},
-        {1,  0, 0,  0},
-        {2, -1, 0,  0},
-        {3, -3, 1,  0},
-        {4, -6, 4, -1}
-    };
-
-    /* Warm up samples and coefficients. */
-    for (i = 0; i < lpcOrder; ++i) {
-        drflac_int32 sample;
-        if (!drflac__read_int32(bs, subframeBitsPerSample, &sample)) {
-            return DRFLAC_FALSE;
-        }
-
-        pDecodedSamples[i] = sample;
-    }
-
-    if (!drflac__decode_samples_with_residual(bs, subframeBitsPerSample, blockSize, lpcOrder, 0, 4, lpcCoefficientsTable[lpcOrder], pDecodedSamples)) {
-        return DRFLAC_FALSE;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_samples__lpc(drflac_bs* bs, drflac_uint32 blockSize, drflac_uint32 bitsPerSample, drflac_uint8 lpcOrder, drflac_int32* pDecodedSamples)
-{
-    drflac_uint8 i;
-    drflac_uint8 lpcPrecision;
-    drflac_int8 lpcShift;
-    drflac_int32 coefficients[32];
-
-    /* Warm up samples. */
-    for (i = 0; i < lpcOrder; ++i) {
-        drflac_int32 sample;
-        if (!drflac__read_int32(bs, bitsPerSample, &sample)) {
-            return DRFLAC_FALSE;
-        }
-
-        pDecodedSamples[i] = sample;
-    }
-
-    if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
-        return DRFLAC_FALSE;
-    }
-    if (lpcPrecision == 15) {
-        return DRFLAC_FALSE;    /* Invalid. */
-    }
-    lpcPrecision += 1;
-
-    if (!drflac__read_int8(bs, 5, &lpcShift)) {
-        return DRFLAC_FALSE;
-    }
-
-    /*
-    From the FLAC specification:
-
-        Quantized linear predictor coefficient shift needed in bits (NOTE: this number is signed two's-complement)
-
-    Emphasis on the "signed two's-complement". In practice there do not seem to be any encoders or decoders supporting negative shifts. For now dr_flac is
-    not going to support negative shifts as I don't have any reference files. However, when a reference file comes through I will consider adding support.
-    */
-    if (lpcShift < 0) {
-        return DRFLAC_FALSE;
-    }
-
-    DRFLAC_ZERO_MEMORY(coefficients, sizeof(coefficients));
-    for (i = 0; i < lpcOrder; ++i) {
-        if (!drflac__read_int32(bs, lpcPrecision, coefficients + i)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    if (!drflac__decode_samples_with_residual(bs, bitsPerSample, blockSize, lpcOrder, lpcShift, lpcPrecision, coefficients, pDecodedSamples)) {
-        return DRFLAC_FALSE;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-
-static drflac_bool32 drflac__read_next_flac_frame_header(drflac_bs* bs, drflac_uint8 streaminfoBitsPerSample, drflac_frame_header* header)
-{
-    const drflac_uint32 sampleRateTable[12]  = {0, 88200, 176400, 192000, 8000, 16000, 22050, 24000, 32000, 44100, 48000, 96000};
-    const drflac_uint8 bitsPerSampleTable[8] = {0, 8, 12, (drflac_uint8)-1, 16, 20, 24, (drflac_uint8)-1};   /* -1 = reserved. */
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(header != NULL);
-
-    /* Keep looping until we find a valid sync code. */
-    for (;;) {
-        drflac_uint8 crc8 = 0xCE; /* 0xCE = drflac_crc8(0, 0x3FFE, 14); */
-        drflac_uint8 reserved = 0;
-        drflac_uint8 blockingStrategy = 0;
-        drflac_uint8 blockSize = 0;
-        drflac_uint8 sampleRate = 0;
-        drflac_uint8 channelAssignment = 0;
-        drflac_uint8 bitsPerSample = 0;
-        drflac_bool32 isVariableBlockSize;
-
-        if (!drflac__find_and_seek_to_next_sync_code(bs)) {
-            return DRFLAC_FALSE;
-        }
-
-        if (!drflac__read_uint8(bs, 1, &reserved)) {
-            return DRFLAC_FALSE;
-        }
-        if (reserved == 1) {
-            continue;
-        }
-        crc8 = drflac_crc8(crc8, reserved, 1);
-
-        if (!drflac__read_uint8(bs, 1, &blockingStrategy)) {
-            return DRFLAC_FALSE;
-        }
-        crc8 = drflac_crc8(crc8, blockingStrategy, 1);
-
-        if (!drflac__read_uint8(bs, 4, &blockSize)) {
-            return DRFLAC_FALSE;
-        }
-        if (blockSize == 0) {
-            continue;
-        }
-        crc8 = drflac_crc8(crc8, blockSize, 4);
-
-        if (!drflac__read_uint8(bs, 4, &sampleRate)) {
-            return DRFLAC_FALSE;
-        }
-        crc8 = drflac_crc8(crc8, sampleRate, 4);
-
-        if (!drflac__read_uint8(bs, 4, &channelAssignment)) {
-            return DRFLAC_FALSE;
-        }
-        if (channelAssignment > 10) {
-            continue;
-        }
-        crc8 = drflac_crc8(crc8, channelAssignment, 4);
-
-        if (!drflac__read_uint8(bs, 3, &bitsPerSample)) {
-            return DRFLAC_FALSE;
-        }
-        if (bitsPerSample == 3 || bitsPerSample == 7) {
-            continue;
-        }
-        crc8 = drflac_crc8(crc8, bitsPerSample, 3);
-
-
-        if (!drflac__read_uint8(bs, 1, &reserved)) {
-            return DRFLAC_FALSE;
-        }
-        if (reserved == 1) {
-            continue;
-        }
-        crc8 = drflac_crc8(crc8, reserved, 1);
-
-
-        isVariableBlockSize = blockingStrategy == 1;
-        if (isVariableBlockSize) {
-            drflac_uint64 pcmFrameNumber;
-            drflac_result result = drflac__read_utf8_coded_number(bs, &pcmFrameNumber, &crc8);
-            if (result != DRFLAC_SUCCESS) {
-                if (result == DRFLAC_AT_END) {
-                    return DRFLAC_FALSE;
-                } else {
-                    continue;
-                }
-            }
-            header->flacFrameNumber  = 0;
-            header->pcmFrameNumber = pcmFrameNumber;
-        } else {
-            drflac_uint64 flacFrameNumber = 0;
-            drflac_result result = drflac__read_utf8_coded_number(bs, &flacFrameNumber, &crc8);
-            if (result != DRFLAC_SUCCESS) {
-                if (result == DRFLAC_AT_END) {
-                    return DRFLAC_FALSE;
-                } else {
-                    continue;
-                }
-            }
-            header->flacFrameNumber  = (drflac_uint32)flacFrameNumber;   /* <-- Safe cast. */
-            header->pcmFrameNumber = 0;
-        }
-
-
-        DRFLAC_ASSERT(blockSize > 0);
-        if (blockSize == 1) {
-            header->blockSizeInPCMFrames = 192;
-        } else if (blockSize <= 5) {
-            DRFLAC_ASSERT(blockSize >= 2);
-            header->blockSizeInPCMFrames = 576 * (1 << (blockSize - 2));
-        } else if (blockSize == 6) {
-            if (!drflac__read_uint16(bs, 8, &header->blockSizeInPCMFrames)) {
-                return DRFLAC_FALSE;
-            }
-            crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 8);
-            header->blockSizeInPCMFrames += 1;
-        } else if (blockSize == 7) {
-            if (!drflac__read_uint16(bs, 16, &header->blockSizeInPCMFrames)) {
-                return DRFLAC_FALSE;
-            }
-            crc8 = drflac_crc8(crc8, header->blockSizeInPCMFrames, 16);
-            if (header->blockSizeInPCMFrames == 0xFFFF) {
-                return DRFLAC_FALSE;    /* Frame is too big. This is the size of the frame minus 1. The STREAMINFO block defines the max block size which is 16-bits. Adding one will make it 17 bits and therefore too big. */
-            }
-            header->blockSizeInPCMFrames += 1;
-        } else {
-            DRFLAC_ASSERT(blockSize >= 8);
-            header->blockSizeInPCMFrames = 256 * (1 << (blockSize - 8));
-        }
-
-
-        if (sampleRate <= 11) {
-            header->sampleRate = sampleRateTable[sampleRate];
-        } else if (sampleRate == 12) {
-            if (!drflac__read_uint32(bs, 8, &header->sampleRate)) {
-                return DRFLAC_FALSE;
-            }
-            crc8 = drflac_crc8(crc8, header->sampleRate, 8);
-            header->sampleRate *= 1000;
-        } else if (sampleRate == 13) {
-            if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
-                return DRFLAC_FALSE;
-            }
-            crc8 = drflac_crc8(crc8, header->sampleRate, 16);
-        } else if (sampleRate == 14) {
-            if (!drflac__read_uint32(bs, 16, &header->sampleRate)) {
-                return DRFLAC_FALSE;
-            }
-            crc8 = drflac_crc8(crc8, header->sampleRate, 16);
-            header->sampleRate *= 10;
-        } else {
-            continue;  /* Invalid. Assume an invalid block. */
-        }
-
-
-        header->channelAssignment = channelAssignment;
-
-        header->bitsPerSample = bitsPerSampleTable[bitsPerSample];
-        if (header->bitsPerSample == 0) {
-            header->bitsPerSample = streaminfoBitsPerSample;
-        }
-
-        if (header->bitsPerSample != streaminfoBitsPerSample) {
-            /* If this subframe has a different bitsPerSample than streaminfo or the first frame, reject it */
-            return DRFLAC_FALSE;
-        }
-
-        if (!drflac__read_uint8(bs, 8, &header->crc8)) {
-            return DRFLAC_FALSE;
-        }
-
-#ifndef DR_FLAC_NO_CRC
-        if (header->crc8 != crc8) {
-            continue;    /* CRC mismatch. Loop back to the top and find the next sync code. */
-        }
-#endif
-        return DRFLAC_TRUE;
-    }
-}
-
-static drflac_bool32 drflac__read_subframe_header(drflac_bs* bs, drflac_subframe* pSubframe)
-{
-    drflac_uint8 header;
-    int type;
-
-    if (!drflac__read_uint8(bs, 8, &header)) {
-        return DRFLAC_FALSE;
-    }
-
-    /* First bit should always be 0. */
-    if ((header & 0x80) != 0) {
-        return DRFLAC_FALSE;
-    }
-
-    type = (header & 0x7E) >> 1;
-    if (type == 0) {
-        pSubframe->subframeType = DRFLAC_SUBFRAME_CONSTANT;
-    } else if (type == 1) {
-        pSubframe->subframeType = DRFLAC_SUBFRAME_VERBATIM;
-    } else {
-        if ((type & 0x20) != 0) {
-            pSubframe->subframeType = DRFLAC_SUBFRAME_LPC;
-            pSubframe->lpcOrder = (drflac_uint8)(type & 0x1F) + 1;
-        } else if ((type & 0x08) != 0) {
-            pSubframe->subframeType = DRFLAC_SUBFRAME_FIXED;
-            pSubframe->lpcOrder = (drflac_uint8)(type & 0x07);
-            if (pSubframe->lpcOrder > 4) {
-                pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
-                pSubframe->lpcOrder = 0;
-            }
-        } else {
-            pSubframe->subframeType = DRFLAC_SUBFRAME_RESERVED;
-        }
-    }
-
-    if (pSubframe->subframeType == DRFLAC_SUBFRAME_RESERVED) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Wasted bits per sample. */
-    pSubframe->wastedBitsPerSample = 0;
-    if ((header & 0x01) == 1) {
-        unsigned int wastedBitsPerSample;
-        if (!drflac__seek_past_next_set_bit(bs, &wastedBitsPerSample)) {
-            return DRFLAC_FALSE;
-        }
-        pSubframe->wastedBitsPerSample = (drflac_uint8)wastedBitsPerSample + 1;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex, drflac_int32* pDecodedSamplesOut)
-{
-    drflac_subframe* pSubframe;
-    drflac_uint32 subframeBitsPerSample;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(frame != NULL);
-
-    pSubframe = frame->subframes + subframeIndex;
-    if (!drflac__read_subframe_header(bs, pSubframe)) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Side channels require an extra bit per sample. Took a while to figure that one out... */
-    subframeBitsPerSample = frame->header.bitsPerSample;
-    if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
-        subframeBitsPerSample += 1;
-    } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
-        subframeBitsPerSample += 1;
-    }
-
-    if (subframeBitsPerSample > 32) {
-        /* libFLAC and ffmpeg reject 33-bit subframes as well */
-        return DRFLAC_FALSE;
-    }
-
-    /* Need to handle wasted bits per sample. */
-    if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
-        return DRFLAC_FALSE;
-    }
-    subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
-
-    pSubframe->pSamplesS32 = pDecodedSamplesOut;
-
-    switch (pSubframe->subframeType)
-    {
-        case DRFLAC_SUBFRAME_CONSTANT:
-        {
-            drflac__decode_samples__constant(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
-        } break;
-
-        case DRFLAC_SUBFRAME_VERBATIM:
-        {
-            drflac__decode_samples__verbatim(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->pSamplesS32);
-        } break;
-
-        case DRFLAC_SUBFRAME_FIXED:
-        {
-            drflac__decode_samples__fixed(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
-        } break;
-
-        case DRFLAC_SUBFRAME_LPC:
-        {
-            drflac__decode_samples__lpc(bs, frame->header.blockSizeInPCMFrames, subframeBitsPerSample, pSubframe->lpcOrder, pSubframe->pSamplesS32);
-        } break;
-
-        default: return DRFLAC_FALSE;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__seek_subframe(drflac_bs* bs, drflac_frame* frame, int subframeIndex)
-{
-    drflac_subframe* pSubframe;
-    drflac_uint32 subframeBitsPerSample;
-
-    DRFLAC_ASSERT(bs != NULL);
-    DRFLAC_ASSERT(frame != NULL);
-
-    pSubframe = frame->subframes + subframeIndex;
-    if (!drflac__read_subframe_header(bs, pSubframe)) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Side channels require an extra bit per sample. Took a while to figure that one out... */
-    subframeBitsPerSample = frame->header.bitsPerSample;
-    if ((frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE || frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE) && subframeIndex == 1) {
-        subframeBitsPerSample += 1;
-    } else if (frame->header.channelAssignment == DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE && subframeIndex == 0) {
-        subframeBitsPerSample += 1;
-    }
-
-    /* Need to handle wasted bits per sample. */
-    if (pSubframe->wastedBitsPerSample >= subframeBitsPerSample) {
-        return DRFLAC_FALSE;
-    }
-    subframeBitsPerSample -= pSubframe->wastedBitsPerSample;
-
-    pSubframe->pSamplesS32 = NULL;
-
-    switch (pSubframe->subframeType)
-    {
-        case DRFLAC_SUBFRAME_CONSTANT:
-        {
-            if (!drflac__seek_bits(bs, subframeBitsPerSample)) {
-                return DRFLAC_FALSE;
-            }
-        } break;
-
-        case DRFLAC_SUBFRAME_VERBATIM:
-        {
-            unsigned int bitsToSeek = frame->header.blockSizeInPCMFrames * subframeBitsPerSample;
-            if (!drflac__seek_bits(bs, bitsToSeek)) {
-                return DRFLAC_FALSE;
-            }
-        } break;
-
-        case DRFLAC_SUBFRAME_FIXED:
-        {
-            unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
-            if (!drflac__seek_bits(bs, bitsToSeek)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
-                return DRFLAC_FALSE;
-            }
-        } break;
-
-        case DRFLAC_SUBFRAME_LPC:
-        {
-            drflac_uint8 lpcPrecision;
-
-            unsigned int bitsToSeek = pSubframe->lpcOrder * subframeBitsPerSample;
-            if (!drflac__seek_bits(bs, bitsToSeek)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (!drflac__read_uint8(bs, 4, &lpcPrecision)) {
-                return DRFLAC_FALSE;
-            }
-            if (lpcPrecision == 15) {
-                return DRFLAC_FALSE;    /* Invalid. */
-            }
-            lpcPrecision += 1;
-
-
-            bitsToSeek = (pSubframe->lpcOrder * lpcPrecision) + 5;    /* +5 for shift. */
-            if (!drflac__seek_bits(bs, bitsToSeek)) {
-                return DRFLAC_FALSE;
-            }
-
-            if (!drflac__read_and_seek_residual(bs, frame->header.blockSizeInPCMFrames, pSubframe->lpcOrder)) {
-                return DRFLAC_FALSE;
-            }
-        } break;
-
-        default: return DRFLAC_FALSE;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-
-static DRFLAC_INLINE drflac_uint8 drflac__get_channel_count_from_channel_assignment(drflac_int8 channelAssignment)
-{
-    drflac_uint8 lookup[] = {1, 2, 3, 4, 5, 6, 7, 8, 2, 2, 2};
-
-    DRFLAC_ASSERT(channelAssignment <= 10);
-    return lookup[channelAssignment];
-}
-
-static drflac_result drflac__decode_flac_frame(drflac* pFlac)
-{
-    int channelCount;
-    int i;
-    drflac_uint8 paddingSizeInBits;
-    drflac_uint16 desiredCRC16;
-#ifndef DR_FLAC_NO_CRC
-    drflac_uint16 actualCRC16;
-#endif
-
-    /* This function should be called while the stream is sitting on the first byte after the frame header. */
-    DRFLAC_ZERO_MEMORY(pFlac->currentFLACFrame.subframes, sizeof(pFlac->currentFLACFrame.subframes));
-
-    /* The frame block size must never be larger than the maximum block size defined by the FLAC stream. */
-    if (pFlac->currentFLACFrame.header.blockSizeInPCMFrames > pFlac->maxBlockSizeInPCMFrames) {
-        return DRFLAC_ERROR;
-    }
-
-    /* The number of channels in the frame must match the channel count from the STREAMINFO block. */
-    channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
-    if (channelCount != (int)pFlac->channels) {
-        return DRFLAC_ERROR;
-    }
-
-    for (i = 0; i < channelCount; ++i) {
-        if (!drflac__decode_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i, pFlac->pDecodedSamples + (pFlac->currentFLACFrame.header.blockSizeInPCMFrames * i))) {
-            return DRFLAC_ERROR;
-        }
-    }
-
-    paddingSizeInBits = (drflac_uint8)(DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7);
-    if (paddingSizeInBits > 0) {
-        drflac_uint8 padding = 0;
-        if (!drflac__read_uint8(&pFlac->bs, paddingSizeInBits, &padding)) {
-            return DRFLAC_AT_END;
-        }
-    }
-
-#ifndef DR_FLAC_NO_CRC
-    actualCRC16 = drflac__flush_crc16(&pFlac->bs);
-#endif
-    if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
-        return DRFLAC_AT_END;
-    }
-
-#ifndef DR_FLAC_NO_CRC
-    if (actualCRC16 != desiredCRC16) {
-        return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
-    }
-#endif
-
-    pFlac->currentFLACFrame.pcmFramesRemaining = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
-
-    return DRFLAC_SUCCESS;
-}
-
-static drflac_result drflac__seek_flac_frame(drflac* pFlac)
-{
-    int channelCount;
-    int i;
-    drflac_uint16 desiredCRC16;
-#ifndef DR_FLAC_NO_CRC
-    drflac_uint16 actualCRC16;
-#endif
-
-    channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
-    for (i = 0; i < channelCount; ++i) {
-        if (!drflac__seek_subframe(&pFlac->bs, &pFlac->currentFLACFrame, i)) {
-            return DRFLAC_ERROR;
-        }
-    }
-
-    /* Padding. */
-    if (!drflac__seek_bits(&pFlac->bs, DRFLAC_CACHE_L1_BITS_REMAINING(&pFlac->bs) & 7)) {
-        return DRFLAC_ERROR;
-    }
-
-    /* CRC. */
-#ifndef DR_FLAC_NO_CRC
-    actualCRC16 = drflac__flush_crc16(&pFlac->bs);
-#endif
-    if (!drflac__read_uint16(&pFlac->bs, 16, &desiredCRC16)) {
-        return DRFLAC_AT_END;
-    }
-
-#ifndef DR_FLAC_NO_CRC
-    if (actualCRC16 != desiredCRC16) {
-        return DRFLAC_CRC_MISMATCH;    /* CRC mismatch. */
-    }
-#endif
-
-    return DRFLAC_SUCCESS;
-}
-
-static drflac_bool32 drflac__read_and_decode_next_flac_frame(drflac* pFlac)
-{
-    DRFLAC_ASSERT(pFlac != NULL);
-
-    for (;;) {
-        drflac_result result;
-
-        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-            return DRFLAC_FALSE;
-        }
-
-        result = drflac__decode_flac_frame(pFlac);
-        if (result != DRFLAC_SUCCESS) {
-            if (result == DRFLAC_CRC_MISMATCH) {
-                continue;   /* CRC mismatch. Skip to the next frame. */
-            } else {
-                return DRFLAC_FALSE;
-            }
-        }
-
-        return DRFLAC_TRUE;
-    }
-}
-
-static void drflac__get_pcm_frame_range_of_current_flac_frame(drflac* pFlac, drflac_uint64* pFirstPCMFrame, drflac_uint64* pLastPCMFrame)
-{
-    drflac_uint64 firstPCMFrame;
-    drflac_uint64 lastPCMFrame;
-
-    DRFLAC_ASSERT(pFlac != NULL);
-
-    firstPCMFrame = pFlac->currentFLACFrame.header.pcmFrameNumber;
-    if (firstPCMFrame == 0) {
-        firstPCMFrame = ((drflac_uint64)pFlac->currentFLACFrame.header.flacFrameNumber) * pFlac->maxBlockSizeInPCMFrames;
-    }
-
-    lastPCMFrame = firstPCMFrame + pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
-    if (lastPCMFrame > 0) {
-        lastPCMFrame -= 1; /* Needs to be zero based. */
-    }
-
-    if (pFirstPCMFrame) {
-        *pFirstPCMFrame = firstPCMFrame;
-    }
-    if (pLastPCMFrame) {
-        *pLastPCMFrame = lastPCMFrame;
-    }
-}
-
-static drflac_bool32 drflac__seek_to_first_frame(drflac* pFlac)
-{
-    drflac_bool32 result;
-
-    DRFLAC_ASSERT(pFlac != NULL);
-
-    result = drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes);
-
-    DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
-    pFlac->currentPCMFrame = 0;
-
-    return result;
-}
-
-static DRFLAC_INLINE drflac_result drflac__seek_to_next_flac_frame(drflac* pFlac)
-{
-    /* This function should only ever be called while the decoder is sitting on the first byte past the FRAME_HEADER section. */
-    DRFLAC_ASSERT(pFlac != NULL);
-    return drflac__seek_flac_frame(pFlac);
-}
-
-
-static drflac_uint64 drflac__seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 pcmFramesToSeek)
-{
-    drflac_uint64 pcmFramesRead = 0;
-    while (pcmFramesToSeek > 0) {
-        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
-            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
-                break;  /* Couldn't read the next frame, so just break from the loop and return. */
-            }
-        } else {
-            if (pFlac->currentFLACFrame.pcmFramesRemaining > pcmFramesToSeek) {
-                pcmFramesRead   += pcmFramesToSeek;
-                pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)pcmFramesToSeek;   /* <-- Safe cast. Will always be < currentFrame.pcmFramesRemaining < 65536. */
-                pcmFramesToSeek  = 0;
-            } else {
-                pcmFramesRead   += pFlac->currentFLACFrame.pcmFramesRemaining;
-                pcmFramesToSeek -= pFlac->currentFLACFrame.pcmFramesRemaining;
-                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
-            }
-        }
-    }
-
-    pFlac->currentPCMFrame += pcmFramesRead;
-    return pcmFramesRead;
-}
-
-
-static drflac_bool32 drflac__seek_to_pcm_frame__brute_force(drflac* pFlac, drflac_uint64 pcmFrameIndex)
-{
-    drflac_bool32 isMidFrame = DRFLAC_FALSE;
-    drflac_uint64 runningPCMFrameCount;
-
-    DRFLAC_ASSERT(pFlac != NULL);
-
-    /* If we are seeking forward we start from the current position. Otherwise we need to start all the way from the start of the file. */
-    if (pcmFrameIndex >= pFlac->currentPCMFrame) {
-        /* Seeking forward. Need to seek from the current position. */
-        runningPCMFrameCount = pFlac->currentPCMFrame;
-
-        /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
-        if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
-            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-                return DRFLAC_FALSE;
-            }
-        } else {
-            isMidFrame = DRFLAC_TRUE;
-        }
-    } else {
-        /* Seeking backwards. Need to seek from the start of the file. */
-        runningPCMFrameCount = 0;
-
-        /* Move back to the start. */
-        if (!drflac__seek_to_first_frame(pFlac)) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Decode the first frame in preparation for sample-exact seeking below. */
-        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    /*
-    We need to as quickly as possible find the frame that contains the target sample. To do this, we iterate over each frame and inspect its
-    header. If based on the header we can determine that the frame contains the sample, we do a full decode of that frame.
-    */
-    for (;;) {
-        drflac_uint64 pcmFrameCountInThisFLACFrame;
-        drflac_uint64 firstPCMFrameInFLACFrame = 0;
-        drflac_uint64 lastPCMFrameInFLACFrame = 0;
-
-        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
-
-        pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
-        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
-            /*
-            The sample should be in this frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
-            it never existed and keep iterating.
-            */
-            drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
-
-            if (!isMidFrame) {
-                drflac_result result = drflac__decode_flac_frame(pFlac);
-                if (result == DRFLAC_SUCCESS) {
-                    /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
-                    return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
-                } else {
-                    if (result == DRFLAC_CRC_MISMATCH) {
-                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
-                    } else {
-                        return DRFLAC_FALSE;
-                    }
-                }
-            } else {
-                /* We started seeking mid-frame which means we need to skip the frame decoding part. */
-                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
-            }
-        } else {
-            /*
-            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
-            frame never existed and leave the running sample count untouched.
-            */
-            if (!isMidFrame) {
-                drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
-                if (result == DRFLAC_SUCCESS) {
-                    runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
-                } else {
-                    if (result == DRFLAC_CRC_MISMATCH) {
-                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
-                    } else {
-                        return DRFLAC_FALSE;
-                    }
-                }
-            } else {
-                /*
-                We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
-                drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
-                */
-                runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
-                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
-                isMidFrame = DRFLAC_FALSE;
-            }
-
-            /* If we are seeking to the end of the file and we've just hit it, we're done. */
-            if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
-                return DRFLAC_TRUE;
-            }
-        }
-
-    next_iteration:
-        /* Grab the next frame in preparation for the next iteration. */
-        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-            return DRFLAC_FALSE;
-        }
-    }
-}
-
-
-#if !defined(DR_FLAC_NO_CRC)
-/*
-We use an average compression ratio to determine our approximate start location. FLAC files are generally about 50%-70% the size of their
-uncompressed counterparts so we'll use this as a basis. I'm going to split the middle and use a factor of 0.6 to determine the starting
-location.
-*/
-#define DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO 0.6f
-
-static drflac_bool32 drflac__seek_to_approximate_flac_frame_to_byte(drflac* pFlac, drflac_uint64 targetByte, drflac_uint64 rangeLo, drflac_uint64 rangeHi, drflac_uint64* pLastSuccessfulSeekOffset)
-{
-    DRFLAC_ASSERT(pFlac != NULL);
-    DRFLAC_ASSERT(pLastSuccessfulSeekOffset != NULL);
-    DRFLAC_ASSERT(targetByte >= rangeLo);
-    DRFLAC_ASSERT(targetByte <= rangeHi);
-
-    *pLastSuccessfulSeekOffset = pFlac->firstFLACFramePosInBytes;
-
-    for (;;) {
-        /* After rangeLo == rangeHi == targetByte fails, we need to break out. */
-        drflac_uint64 lastTargetByte = targetByte;
-
-        /* When seeking to a byte, failure probably means we've attempted to seek beyond the end of the stream. To counter this we just halve it each attempt. */
-        if (!drflac__seek_to_byte(&pFlac->bs, targetByte)) {
-            /* If we couldn't even seek to the first byte in the stream we have a problem. Just abandon the whole thing. */
-            if (targetByte == 0) {
-                drflac__seek_to_first_frame(pFlac); /* Try to recover. */
-                return DRFLAC_FALSE;
-            }
-
-            /* Halve the byte location and continue. */
-            targetByte = rangeLo + ((rangeHi - rangeLo)/2);
-            rangeHi = targetByte;
-        } else {
-            /* Getting here should mean that we have seeked to an appropriate byte. */
-
-            /* Clear the details of the FLAC frame so we don't misreport data. */
-            DRFLAC_ZERO_MEMORY(&pFlac->currentFLACFrame, sizeof(pFlac->currentFLACFrame));
-
-            /*
-            Now seek to the next FLAC frame. We need to decode the entire frame (not just the header) because it's possible for the header to incorrectly pass the
-            CRC check and return bad data. We need to decode the entire frame to be more certain. Although this seems unlikely, this has happened to me in testing
-            so it needs to stay this way for now.
-            */
-#if 1
-            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
-                /* Halve the byte location and continue. */
-                targetByte = rangeLo + ((rangeHi - rangeLo)/2);
-                rangeHi = targetByte;
-            } else {
-                break;
-            }
-#else
-            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-                /* Halve the byte location and continue. */
-                targetByte = rangeLo + ((rangeHi - rangeLo)/2);
-                rangeHi = targetByte;
-            } else {
-                break;
-            }
-#endif
-        }
-
-        /* We already tried this byte and there are no more to try, break out. */
-        if(targetByte == lastTargetByte) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    /* The current PCM frame needs to be updated based on the frame we just seeked to. */
-    drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
-
-    DRFLAC_ASSERT(targetByte <= rangeHi);
-
-    *pLastSuccessfulSeekOffset = targetByte;
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(drflac* pFlac, drflac_uint64 offset)
-{
-    /* This section of code would be used if we were only decoding the FLAC frame header when calling drflac__seek_to_approximate_flac_frame_to_byte(). */
-#if 0
-    if (drflac__decode_flac_frame(pFlac) != DRFLAC_SUCCESS) {
-        /* We failed to decode this frame which may be due to it being corrupt. We'll just use the next valid FLAC frame. */
-        if (drflac__read_and_decode_next_flac_frame(pFlac) == DRFLAC_FALSE) {
-            return DRFLAC_FALSE;
-        }
-    }
-#endif
-
-    return drflac__seek_forward_by_pcm_frames(pFlac, offset) == offset;
-}
-
-
-static drflac_bool32 drflac__seek_to_pcm_frame__binary_search_internal(drflac* pFlac, drflac_uint64 pcmFrameIndex, drflac_uint64 byteRangeLo, drflac_uint64 byteRangeHi)
-{
-    /* This assumes pFlac->currentPCMFrame is sitting on byteRangeLo upon entry. */
-
-    drflac_uint64 targetByte;
-    drflac_uint64 pcmRangeLo = pFlac->totalPCMFrameCount;
-    drflac_uint64 pcmRangeHi = 0;
-    drflac_uint64 lastSuccessfulSeekOffset = (drflac_uint64)-1;
-    drflac_uint64 closestSeekOffsetBeforeTargetPCMFrame = byteRangeLo;
-    drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
-
-    targetByte = byteRangeLo + (drflac_uint64)(((drflac_int64)((pcmFrameIndex - pFlac->currentPCMFrame) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * DRFLAC_BINARY_SEARCH_APPROX_COMPRESSION_RATIO);
-    if (targetByte > byteRangeHi) {
-        targetByte = byteRangeHi;
-    }
-
-    for (;;) {
-        if (drflac__seek_to_approximate_flac_frame_to_byte(pFlac, targetByte, byteRangeLo, byteRangeHi, &lastSuccessfulSeekOffset)) {
-            /* We found a FLAC frame. We need to check if it contains the sample we're looking for. */
-            drflac_uint64 newPCMRangeLo;
-            drflac_uint64 newPCMRangeHi;
-            drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &newPCMRangeLo, &newPCMRangeHi);
-
-            /* If we selected the same frame, it means we should be pretty close. Just decode the rest. */
-            if (pcmRangeLo == newPCMRangeLo) {
-                if (!drflac__seek_to_approximate_flac_frame_to_byte(pFlac, closestSeekOffsetBeforeTargetPCMFrame, closestSeekOffsetBeforeTargetPCMFrame, byteRangeHi, &lastSuccessfulSeekOffset)) {
-                    break;  /* Failed to seek to closest frame. */
-                }
-
-                if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
-                    return DRFLAC_TRUE;
-                } else {
-                    break;  /* Failed to seek forward. */
-                }
-            }
-
-            pcmRangeLo = newPCMRangeLo;
-            pcmRangeHi = newPCMRangeHi;
-
-            if (pcmRangeLo <= pcmFrameIndex && pcmRangeHi >= pcmFrameIndex) {
-                /* The target PCM frame is in this FLAC frame. */
-                if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame) ) {
-                    return DRFLAC_TRUE;
-                } else {
-                    break;  /* Failed to seek to FLAC frame. */
-                }
-            } else {
-                const float approxCompressionRatio = (drflac_int64)(lastSuccessfulSeekOffset - pFlac->firstFLACFramePosInBytes) / ((drflac_int64)(pcmRangeLo * pFlac->channels * pFlac->bitsPerSample)/8.0f);
-
-                if (pcmRangeLo > pcmFrameIndex) {
-                    /* We seeked too far forward. We need to move our target byte backward and try again. */
-                    byteRangeHi = lastSuccessfulSeekOffset;
-                    if (byteRangeLo > byteRangeHi) {
-                        byteRangeLo = byteRangeHi;
-                    }
-
-                    targetByte = byteRangeLo + ((byteRangeHi - byteRangeLo) / 2);
-                    if (targetByte < byteRangeLo) {
-                        targetByte = byteRangeLo;
-                    }
-                } else /*if (pcmRangeHi < pcmFrameIndex)*/ {
-                    /* We didn't seek far enough. We need to move our target byte forward and try again. */
-
-                    /* If we're close enough we can just seek forward. */
-                    if ((pcmFrameIndex - pcmRangeLo) < seekForwardThreshold) {
-                        if (drflac__decode_flac_frame_and_seek_forward_by_pcm_frames(pFlac, pcmFrameIndex - pFlac->currentPCMFrame)) {
-                            return DRFLAC_TRUE;
-                        } else {
-                            break;  /* Failed to seek to FLAC frame. */
-                        }
-                    } else {
-                        byteRangeLo = lastSuccessfulSeekOffset;
-                        if (byteRangeHi < byteRangeLo) {
-                            byteRangeHi = byteRangeLo;
-                        }
-
-                        targetByte = lastSuccessfulSeekOffset + (drflac_uint64)(((drflac_int64)((pcmFrameIndex-pcmRangeLo) * pFlac->channels * pFlac->bitsPerSample)/8.0f) * approxCompressionRatio);
-                        if (targetByte > byteRangeHi) {
-                            targetByte = byteRangeHi;
-                        }
-
-                        if (closestSeekOffsetBeforeTargetPCMFrame < lastSuccessfulSeekOffset) {
-                            closestSeekOffsetBeforeTargetPCMFrame = lastSuccessfulSeekOffset;
-                        }
-                    }
-                }
-            }
-        } else {
-            /* Getting here is really bad. We just recover as best we can, but moving to the first frame in the stream, and then abort. */
-            break;
-        }
-    }
-
-    drflac__seek_to_first_frame(pFlac); /* <-- Try to recover. */
-    return DRFLAC_FALSE;
-}
-
-static drflac_bool32 drflac__seek_to_pcm_frame__binary_search(drflac* pFlac, drflac_uint64 pcmFrameIndex)
-{
-    drflac_uint64 byteRangeLo;
-    drflac_uint64 byteRangeHi;
-    drflac_uint32 seekForwardThreshold = (pFlac->maxBlockSizeInPCMFrames != 0) ? pFlac->maxBlockSizeInPCMFrames*2 : 4096;
-
-    /* Our algorithm currently assumes the FLAC stream is currently sitting at the start. */
-    if (drflac__seek_to_first_frame(pFlac) == DRFLAC_FALSE) {
-        return DRFLAC_FALSE;
-    }
-
-    /* If we're close enough to the start, just move to the start and seek forward. */
-    if (pcmFrameIndex < seekForwardThreshold) {
-        return drflac__seek_forward_by_pcm_frames(pFlac, pcmFrameIndex) == pcmFrameIndex;
-    }
-
-    /*
-    Our starting byte range is the byte position of the first FLAC frame and the approximate end of the file as if it were completely uncompressed. This ensures
-    the entire file is included, even though most of the time it'll exceed the end of the actual stream. This is OK as the frame searching logic will handle it.
-    */
-    byteRangeLo = pFlac->firstFLACFramePosInBytes;
-    byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
-
-    return drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi);
-}
-#endif  /* !DR_FLAC_NO_CRC */
-
-static drflac_bool32 drflac__seek_to_pcm_frame__seek_table(drflac* pFlac, drflac_uint64 pcmFrameIndex)
-{
-    drflac_uint32 iClosestSeekpoint = 0;
-    drflac_bool32 isMidFrame = DRFLAC_FALSE;
-    drflac_uint64 runningPCMFrameCount;
-    drflac_uint32 iSeekpoint;
-
-
-    DRFLAC_ASSERT(pFlac != NULL);
-
-    if (pFlac->pSeekpoints == NULL || pFlac->seekpointCount == 0) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Do not use the seektable if pcmFramIndex is not coverd by it. */
-    if (pFlac->pSeekpoints[0].firstPCMFrame > pcmFrameIndex) {
-        return DRFLAC_FALSE;
-    }
-
-    for (iSeekpoint = 0; iSeekpoint < pFlac->seekpointCount; ++iSeekpoint) {
-        if (pFlac->pSeekpoints[iSeekpoint].firstPCMFrame >= pcmFrameIndex) {
-            break;
-        }
-
-        iClosestSeekpoint = iSeekpoint;
-    }
-
-    /* There's been cases where the seek table contains only zeros. We need to do some basic validation on the closest seekpoint. */
-    if (pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount == 0 || pFlac->pSeekpoints[iClosestSeekpoint].pcmFrameCount > pFlac->maxBlockSizeInPCMFrames) {
-        return DRFLAC_FALSE;
-    }
-    if (pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame > pFlac->totalPCMFrameCount && pFlac->totalPCMFrameCount > 0) {
-        return DRFLAC_FALSE;
-    }
-
-#if !defined(DR_FLAC_NO_CRC)
-    /* At this point we should know the closest seek point. We can use a binary search for this. We need to know the total sample count for this. */
-    if (pFlac->totalPCMFrameCount > 0) {
-        drflac_uint64 byteRangeLo;
-        drflac_uint64 byteRangeHi;
-
-        byteRangeHi = pFlac->firstFLACFramePosInBytes + (drflac_uint64)((drflac_int64)(pFlac->totalPCMFrameCount * pFlac->channels * pFlac->bitsPerSample)/8.0f);
-        byteRangeLo = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset;
-
-        /*
-        If our closest seek point is not the last one, we only need to search between it and the next one. The section below calculates an appropriate starting
-        value for byteRangeHi which will clamp it appropriately.
-
-        Note that the next seekpoint must have an offset greater than the closest seekpoint because otherwise our binary search algorithm will break down. There
-        have been cases where a seektable consists of seek points where every byte offset is set to 0 which causes problems. If this happens we need to abort.
-        */
-        if (iClosestSeekpoint < pFlac->seekpointCount-1) {
-            drflac_uint32 iNextSeekpoint = iClosestSeekpoint + 1;
-
-            /* Basic validation on the seekpoints to ensure they're usable. */
-            if (pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset >= pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset || pFlac->pSeekpoints[iNextSeekpoint].pcmFrameCount == 0) {
-                return DRFLAC_FALSE;    /* The next seekpoint doesn't look right. The seek table cannot be trusted from here. Abort. */
-            }
-
-            if (pFlac->pSeekpoints[iNextSeekpoint].firstPCMFrame != (((drflac_uint64)0xFFFFFFFF << 32) | 0xFFFFFFFF)) { /* Make sure it's not a placeholder seekpoint. */
-                byteRangeHi = pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iNextSeekpoint].flacFrameOffset - 1; /* byteRangeHi must be zero based. */
-            }
-        }
-
-        if (drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
-            if (drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-                drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &pFlac->currentPCMFrame, NULL);
-
-                if (drflac__seek_to_pcm_frame__binary_search_internal(pFlac, pcmFrameIndex, byteRangeLo, byteRangeHi)) {
-                    return DRFLAC_TRUE;
-                }
-            }
-        }
-    }
-#endif  /* !DR_FLAC_NO_CRC */
-
-    /* Getting here means we need to use a slower algorithm because the binary search method failed or cannot be used. */
-
-    /*
-    If we are seeking forward and the closest seekpoint is _before_ the current sample, we just seek forward from where we are. Otherwise we start seeking
-    from the seekpoint's first sample.
-    */
-    if (pcmFrameIndex >= pFlac->currentPCMFrame && pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame <= pFlac->currentPCMFrame) {
-        /* Optimized case. Just seek forward from where we are. */
-        runningPCMFrameCount = pFlac->currentPCMFrame;
-
-        /* The frame header for the first frame may not yet have been read. We need to do that if necessary. */
-        if (pFlac->currentPCMFrame == 0 && pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
-            if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-                return DRFLAC_FALSE;
-            }
-        } else {
-            isMidFrame = DRFLAC_TRUE;
-        }
-    } else {
-        /* Slower case. Seek to the start of the seekpoint and then seek forward from there. */
-        runningPCMFrameCount = pFlac->pSeekpoints[iClosestSeekpoint].firstPCMFrame;
-
-        if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes + pFlac->pSeekpoints[iClosestSeekpoint].flacFrameOffset)) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Grab the frame the seekpoint is sitting on in preparation for the sample-exact seeking below. */
-        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-            return DRFLAC_FALSE;
-        }
-    }
-
-    for (;;) {
-        drflac_uint64 pcmFrameCountInThisFLACFrame;
-        drflac_uint64 firstPCMFrameInFLACFrame = 0;
-        drflac_uint64 lastPCMFrameInFLACFrame = 0;
-
-        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
-
-        pcmFrameCountInThisFLACFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
-        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFLACFrame)) {
-            /*
-            The sample should be in this frame. We need to fully decode it, but if it's an invalid frame (a CRC mismatch) we need to pretend
-            it never existed and keep iterating.
-            */
-            drflac_uint64 pcmFramesToDecode = pcmFrameIndex - runningPCMFrameCount;
-
-            if (!isMidFrame) {
-                drflac_result result = drflac__decode_flac_frame(pFlac);
-                if (result == DRFLAC_SUCCESS) {
-                    /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
-                    return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
-                } else {
-                    if (result == DRFLAC_CRC_MISMATCH) {
-                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
-                    } else {
-                        return DRFLAC_FALSE;
-                    }
-                }
-            } else {
-                /* We started seeking mid-frame which means we need to skip the frame decoding part. */
-                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;
-            }
-        } else {
-            /*
-            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
-            frame never existed and leave the running sample count untouched.
-            */
-            if (!isMidFrame) {
-                drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
-                if (result == DRFLAC_SUCCESS) {
-                    runningPCMFrameCount += pcmFrameCountInThisFLACFrame;
-                } else {
-                    if (result == DRFLAC_CRC_MISMATCH) {
-                        goto next_iteration;   /* CRC mismatch. Pretend this frame never existed. */
-                    } else {
-                        return DRFLAC_FALSE;
-                    }
-                }
-            } else {
-                /*
-                We started seeking mid-frame which means we need to seek by reading to the end of the frame instead of with
-                drflac__seek_to_next_flac_frame() which only works if the decoder is sitting on the byte just after the frame header.
-                */
-                runningPCMFrameCount += pFlac->currentFLACFrame.pcmFramesRemaining;
-                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
-                isMidFrame = DRFLAC_FALSE;
-            }
-
-            /* If we are seeking to the end of the file and we've just hit it, we're done. */
-            if (pcmFrameIndex == pFlac->totalPCMFrameCount && runningPCMFrameCount == pFlac->totalPCMFrameCount) {
-                return DRFLAC_TRUE;
-            }
-        }
-
-    next_iteration:
-        /* Grab the next frame in preparation for the next iteration. */
-        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-            return DRFLAC_FALSE;
-        }
-    }
-}
-
-
-#ifndef DR_FLAC_NO_OGG
-typedef struct
-{
-    drflac_uint8 capturePattern[4];  /* Should be "OggS" */
-    drflac_uint8 structureVersion;   /* Always 0. */
-    drflac_uint8 headerType;
-    drflac_uint64 granulePosition;
-    drflac_uint32 serialNumber;
-    drflac_uint32 sequenceNumber;
-    drflac_uint32 checksum;
-    drflac_uint8 segmentCount;
-    drflac_uint8 segmentTable[255];
-} drflac_ogg_page_header;
-#endif
-
-typedef struct
-{
-    drflac_read_proc onRead;
-    drflac_seek_proc onSeek;
-    drflac_meta_proc onMeta;
-    drflac_container container;
-    void* pUserData;
-    void* pUserDataMD;
-    drflac_uint32 sampleRate;
-    drflac_uint8  channels;
-    drflac_uint8  bitsPerSample;
-    drflac_uint64 totalPCMFrameCount;
-    drflac_uint16 maxBlockSizeInPCMFrames;
-    drflac_uint64 runningFilePos;
-    drflac_bool32 hasStreamInfoBlock;
-    drflac_bool32 hasMetadataBlocks;
-    drflac_bs bs;                           /* <-- A bit streamer is required for loading data during initialization. */
-    drflac_frame_header firstFrameHeader;   /* <-- The header of the first frame that was read during relaxed initalization. Only set if there is no STREAMINFO block. */
-
-#ifndef DR_FLAC_NO_OGG
-    drflac_uint32 oggSerial;
-    drflac_uint64 oggFirstBytePos;
-    drflac_ogg_page_header oggBosHeader;
-#endif
-} drflac_init_info;
-
-static DRFLAC_INLINE void drflac__decode_block_header(drflac_uint32 blockHeader, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
-{
-    blockHeader = drflac__be2host_32(blockHeader);
-    *isLastBlock = (drflac_uint8)((blockHeader & 0x80000000UL) >> 31);
-    *blockType   = (drflac_uint8)((blockHeader & 0x7F000000UL) >> 24);
-    *blockSize   =                (blockHeader & 0x00FFFFFFUL);
-}
-
-static DRFLAC_INLINE drflac_bool32 drflac__read_and_decode_block_header(drflac_read_proc onRead, void* pUserData, drflac_uint8* isLastBlock, drflac_uint8* blockType, drflac_uint32* blockSize)
-{
-    drflac_uint32 blockHeader;
-
-    *blockSize = 0;
-    if (onRead(pUserData, &blockHeader, 4) != 4) {
-        return DRFLAC_FALSE;
-    }
-
-    drflac__decode_block_header(blockHeader, isLastBlock, blockType, blockSize);
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__read_streaminfo(drflac_read_proc onRead, void* pUserData, drflac_streaminfo* pStreamInfo)
-{
-    drflac_uint32 blockSizes;
-    drflac_uint64 frameSizes = 0;
-    drflac_uint64 importantProps;
-    drflac_uint8 md5[16];
-
-    /* min/max block size. */
-    if (onRead(pUserData, &blockSizes, 4) != 4) {
-        return DRFLAC_FALSE;
-    }
-
-    /* min/max frame size. */
-    if (onRead(pUserData, &frameSizes, 6) != 6) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Sample rate, channels, bits per sample and total sample count. */
-    if (onRead(pUserData, &importantProps, 8) != 8) {
-        return DRFLAC_FALSE;
-    }
-
-    /* MD5 */
-    if (onRead(pUserData, md5, sizeof(md5)) != sizeof(md5)) {
-        return DRFLAC_FALSE;
-    }
-
-    blockSizes     = drflac__be2host_32(blockSizes);
-    frameSizes     = drflac__be2host_64(frameSizes);
-    importantProps = drflac__be2host_64(importantProps);
-
-    pStreamInfo->minBlockSizeInPCMFrames = (drflac_uint16)((blockSizes & 0xFFFF0000) >> 16);
-    pStreamInfo->maxBlockSizeInPCMFrames = (drflac_uint16) (blockSizes & 0x0000FFFF);
-    pStreamInfo->minFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) << 24)) >> 40);
-    pStreamInfo->maxFrameSizeInPCMFrames = (drflac_uint32)((frameSizes     &  (((drflac_uint64)0x00FFFFFF << 16) <<  0)) >> 16);
-    pStreamInfo->sampleRate              = (drflac_uint32)((importantProps &  (((drflac_uint64)0x000FFFFF << 16) << 28)) >> 44);
-    pStreamInfo->channels                = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000000E << 16) << 24)) >> 41) + 1;
-    pStreamInfo->bitsPerSample           = (drflac_uint8 )((importantProps &  (((drflac_uint64)0x0000001F << 16) << 20)) >> 36) + 1;
-    pStreamInfo->totalPCMFrameCount      =                ((importantProps & ((((drflac_uint64)0x0000000F << 16) << 16) | 0xFFFFFFFF)));
-    DRFLAC_COPY_MEMORY(pStreamInfo->md5, md5, sizeof(md5));
-
-    return DRFLAC_TRUE;
-}
-
-
-static void* drflac__malloc_default(size_t sz, void* pUserData)
-{
-    (void)pUserData;
-    return DRFLAC_MALLOC(sz);
-}
-
-static void* drflac__realloc_default(void* p, size_t sz, void* pUserData)
-{
-    (void)pUserData;
-    return DRFLAC_REALLOC(p, sz);
-}
-
-static void drflac__free_default(void* p, void* pUserData)
-{
-    (void)pUserData;
-    DRFLAC_FREE(p);
-}
-
-
-static void* drflac__malloc_from_callbacks(size_t sz, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks == NULL) {
-        return NULL;
-    }
-
-    if (pAllocationCallbacks->onMalloc != NULL) {
-        return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
-    }
-
-    /* Try using realloc(). */
-    if (pAllocationCallbacks->onRealloc != NULL) {
-        return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
-    }
-
-    return NULL;
-}
-
-static void* drflac__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks == NULL) {
-        return NULL;
-    }
-
-    if (pAllocationCallbacks->onRealloc != NULL) {
-        return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
-    }
-
-    /* Try emulating realloc() in terms of malloc()/free(). */
-    if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
-        void* p2;
-
-        p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
-        if (p2 == NULL) {
-            return NULL;
-        }
-
-        if (p != NULL) {
-            DRFLAC_COPY_MEMORY(p2, p, szOld);
-            pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
-        }
-
-        return p2;
-    }
-
-    return NULL;
-}
-
-static void drflac__free_from_callbacks(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    if (p == NULL || pAllocationCallbacks == NULL) {
-        return;
-    }
-
-    if (pAllocationCallbacks->onFree != NULL) {
-        pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
-    }
-}
-
-
-static drflac_bool32 drflac__read_and_decode_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_uint64* pFirstFramePos, drflac_uint64* pSeektablePos, drflac_uint32* pSeekpointCount, drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    /*
-    We want to keep track of the byte position in the stream of the seektable. At the time of calling this function we know that
-    we'll be sitting on byte 42.
-    */
-    drflac_uint64 runningFilePos = 42;
-    drflac_uint64 seektablePos   = 0;
-    drflac_uint32 seektableSize  = 0;
-
-    for (;;) {
-        drflac_metadata metadata;
-        drflac_uint8 isLastBlock = 0;
-        drflac_uint8 blockType = 0;
-        drflac_uint32 blockSize;
-        if (drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize) == DRFLAC_FALSE) {
-            return DRFLAC_FALSE;
-        }
-        runningFilePos += 4;
-
-        metadata.type = blockType;
-        metadata.pRawData = NULL;
-        metadata.rawDataSize = 0;
-
-        switch (blockType)
-        {
-            case DRFLAC_METADATA_BLOCK_TYPE_APPLICATION:
-            {
-                if (blockSize < 4) {
-                    return DRFLAC_FALSE;
-                }
-
-                if (onMeta) {
-                    void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
-                    if (pRawData == NULL) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-
-                    metadata.pRawData = pRawData;
-                    metadata.rawDataSize = blockSize;
-                    metadata.data.application.id       = drflac__be2host_32(*(drflac_uint32*)pRawData);
-                    metadata.data.application.pData    = (const void*)((drflac_uint8*)pRawData + sizeof(drflac_uint32));
-                    metadata.data.application.dataSize = blockSize - sizeof(drflac_uint32);
-                    onMeta(pUserDataMD, &metadata);
-
-                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                }
-            } break;
-
-            case DRFLAC_METADATA_BLOCK_TYPE_SEEKTABLE:
-            {
-                seektablePos  = runningFilePos;
-                seektableSize = blockSize;
-
-                if (onMeta) {
-                    drflac_uint32 seekpointCount;
-                    drflac_uint32 iSeekpoint;
-                    void* pRawData;
-
-                    seekpointCount = blockSize/DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
-
-                    pRawData = drflac__malloc_from_callbacks(seekpointCount * sizeof(drflac_seekpoint), pAllocationCallbacks);
-                    if (pRawData == NULL) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    /* We need to read seekpoint by seekpoint and do some processing. */
-                    for (iSeekpoint = 0; iSeekpoint < seekpointCount; ++iSeekpoint) {
-                        drflac_seekpoint* pSeekpoint = (drflac_seekpoint*)pRawData + iSeekpoint;
-
-                        if (onRead(pUserData, pSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) != DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
-                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                            return DRFLAC_FALSE;
-                        }
-
-                        /* Endian swap. */
-                        pSeekpoint->firstPCMFrame   = drflac__be2host_64(pSeekpoint->firstPCMFrame);
-                        pSeekpoint->flacFrameOffset = drflac__be2host_64(pSeekpoint->flacFrameOffset);
-                        pSeekpoint->pcmFrameCount   = drflac__be2host_16(pSeekpoint->pcmFrameCount);
-                    }
-
-                    metadata.pRawData = pRawData;
-                    metadata.rawDataSize = blockSize;
-                    metadata.data.seektable.seekpointCount = seekpointCount;
-                    metadata.data.seektable.pSeekpoints = (const drflac_seekpoint*)pRawData;
-
-                    onMeta(pUserDataMD, &metadata);
-
-                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                }
-            } break;
-
-            case DRFLAC_METADATA_BLOCK_TYPE_VORBIS_COMMENT:
-            {
-                if (blockSize < 8) {
-                    return DRFLAC_FALSE;
-                }
-
-                if (onMeta) {
-                    void* pRawData;
-                    const char* pRunningData;
-                    const char* pRunningDataEnd;
-                    drflac_uint32 i;
-
-                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
-                    if (pRawData == NULL) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-
-                    metadata.pRawData = pRawData;
-                    metadata.rawDataSize = blockSize;
-
-                    pRunningData    = (const char*)pRawData;
-                    pRunningDataEnd = (const char*)pRawData + blockSize;
-
-                    metadata.data.vorbis_comment.vendorLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-
-                    /* Need space for the rest of the block */
-                    if ((pRunningDataEnd - pRunningData) - 4 < (drflac_int64)metadata.data.vorbis_comment.vendorLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-                    metadata.data.vorbis_comment.vendor       = pRunningData;                                            pRunningData += metadata.data.vorbis_comment.vendorLength;
-                    metadata.data.vorbis_comment.commentCount = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-
-                    /* Need space for 'commentCount' comments after the block, which at minimum is a drflac_uint32 per comment */
-                    if ((pRunningDataEnd - pRunningData) / sizeof(drflac_uint32) < metadata.data.vorbis_comment.commentCount) { /* <-- Note the order of operations to avoid overflow to a valid value */
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-                    metadata.data.vorbis_comment.pComments    = pRunningData;
-
-                    /* Check that the comments section is valid before passing it to the callback */
-                    for (i = 0; i < metadata.data.vorbis_comment.commentCount; ++i) {
-                        drflac_uint32 commentLength;
-
-                        if (pRunningDataEnd - pRunningData < 4) {
-                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                            return DRFLAC_FALSE;
-                        }
-
-                        commentLength = drflac__le2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                        if (pRunningDataEnd - pRunningData < (drflac_int64)commentLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
-                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                            return DRFLAC_FALSE;
-                        }
-                        pRunningData += commentLength;
-                    }
-
-                    onMeta(pUserDataMD, &metadata);
-
-                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                }
-            } break;
-
-            case DRFLAC_METADATA_BLOCK_TYPE_CUESHEET:
-            {
-                if (blockSize < 396) {
-                    return DRFLAC_FALSE;
-                }
-
-                if (onMeta) {
-                    void* pRawData;
-                    const char* pRunningData;
-                    const char* pRunningDataEnd;
-                    size_t bufferSize;
-                    drflac_uint8 iTrack;
-                    drflac_uint8 iIndex;
-                    void* pTrackData;
-
-                    /*
-                    This needs to be loaded in two passes. The first pass is used to calculate the size of the memory allocation
-                    we need for storing the necessary data. The second pass will fill that buffer with usable data.
-                    */
-                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
-                    if (pRawData == NULL) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-
-                    metadata.pRawData = pRawData;
-                    metadata.rawDataSize = blockSize;
-
-                    pRunningData    = (const char*)pRawData;
-                    pRunningDataEnd = (const char*)pRawData + blockSize;
-
-                    DRFLAC_COPY_MEMORY(metadata.data.cuesheet.catalog, pRunningData, 128);                              pRunningData += 128;
-                    metadata.data.cuesheet.leadInSampleCount = drflac__be2host_64(*(const drflac_uint64*)pRunningData); pRunningData += 8;
-                    metadata.data.cuesheet.isCD              = (pRunningData[0] & 0x80) != 0;                           pRunningData += 259;
-                    metadata.data.cuesheet.trackCount        = pRunningData[0];                                         pRunningData += 1;
-                    metadata.data.cuesheet.pTrackData        = NULL;    /* Will be filled later. */
-
-                    /* Pass 1: Calculate the size of the buffer for the track data. */
-                    {
-                        const char* pRunningDataSaved = pRunningData;   /* Will be restored at the end in preparation for the second pass. */
-
-                        bufferSize = metadata.data.cuesheet.trackCount * DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES;
-
-                        for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
-                            drflac_uint8 indexCount;
-                            drflac_uint32 indexPointSize;
-
-                            if (pRunningDataEnd - pRunningData < DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES) {
-                                drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                                return DRFLAC_FALSE;
-                            }
-
-                            /* Skip to the index point count */
-                            pRunningData += 35;
-                            
-                            indexCount = pRunningData[0];
-                            pRunningData += 1;
-                            
-                            bufferSize += indexCount * sizeof(drflac_cuesheet_track_index);
-
-                            /* Quick validation check. */
-                            indexPointSize = indexCount * DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
-                            if (pRunningDataEnd - pRunningData < (drflac_int64)indexPointSize) {
-                                drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                                return DRFLAC_FALSE;
-                            }
-
-                            pRunningData += indexPointSize;
-                        }
-
-                        pRunningData = pRunningDataSaved;
-                    }
-
-                    /* Pass 2: Allocate a buffer and fill the data. Validation was done in the step above so can be skipped. */
-                    {
-                        char* pRunningTrackData;
-
-                        pTrackData = drflac__malloc_from_callbacks(bufferSize, pAllocationCallbacks);
-                        if (pTrackData == NULL) {
-                            drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                            return DRFLAC_FALSE;
-                        }
-
-                        pRunningTrackData = (char*)pTrackData;
-
-                        for (iTrack = 0; iTrack < metadata.data.cuesheet.trackCount; ++iTrack) {
-                            drflac_uint8 indexCount;
-
-                            DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES);
-                            pRunningData      += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1; /* Skip forward, but not beyond the last byte in the CUESHEET_TRACK block which is the index count. */
-                            pRunningTrackData += DRFLAC_CUESHEET_TRACK_SIZE_IN_BYTES-1;
-
-                            /* Grab the index count for the next part. */
-                            indexCount = pRunningData[0];
-                            pRunningData      += 1;
-                            pRunningTrackData += 1;
-
-                            /* Extract each track index. */
-                            for (iIndex = 0; iIndex < indexCount; ++iIndex) {
-                                drflac_cuesheet_track_index* pTrackIndex = (drflac_cuesheet_track_index*)pRunningTrackData;
-
-                                DRFLAC_COPY_MEMORY(pRunningTrackData, pRunningData, DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES);
-                                pRunningData      += DRFLAC_CUESHEET_TRACK_INDEX_SIZE_IN_BYTES;
-                                pRunningTrackData += sizeof(drflac_cuesheet_track_index);
-
-                                pTrackIndex->offset = drflac__be2host_64(pTrackIndex->offset);
-                            }
-                        }
-
-                        metadata.data.cuesheet.pTrackData = pTrackData;
-                    }
-
-                    /* The original data is no longer needed. */
-                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                    pRawData = NULL;
-
-                    onMeta(pUserDataMD, &metadata);
-
-                    drflac__free_from_callbacks(pTrackData, pAllocationCallbacks);
-                    pTrackData = NULL;
-                }
-            } break;
-
-            case DRFLAC_METADATA_BLOCK_TYPE_PICTURE:
-            {
-                if (blockSize < 32) {
-                    return DRFLAC_FALSE;
-                }
-
-                if (onMeta) {
-                    void* pRawData;
-                    const char* pRunningData;
-                    const char* pRunningDataEnd;
-
-                    pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
-                    if (pRawData == NULL) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-
-                    metadata.pRawData = pRawData;
-                    metadata.rawDataSize = blockSize;
-
-                    pRunningData    = (const char*)pRawData;
-                    pRunningDataEnd = (const char*)pRawData + blockSize;
-
-                    metadata.data.picture.type       = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                    metadata.data.picture.mimeLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-
-                    /* Need space for the rest of the block */
-                    if ((pRunningDataEnd - pRunningData) - 24 < (drflac_int64)metadata.data.picture.mimeLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-                    metadata.data.picture.mime              = pRunningData;                                   pRunningData += metadata.data.picture.mimeLength;
-                    metadata.data.picture.descriptionLength = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-
-                    /* Need space for the rest of the block */
-                    if ((pRunningDataEnd - pRunningData) - 20 < (drflac_int64)metadata.data.picture.descriptionLength) { /* <-- Note the order of operations to avoid overflow to a valid value */
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-                    metadata.data.picture.description     = pRunningData;                                   pRunningData += metadata.data.picture.descriptionLength;
-                    metadata.data.picture.width           = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                    metadata.data.picture.height          = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                    metadata.data.picture.colorDepth      = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                    metadata.data.picture.indexColorCount = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                    metadata.data.picture.pictureDataSize = drflac__be2host_32_ptr_unaligned(pRunningData); pRunningData += 4;
-                    metadata.data.picture.pPictureData    = (const drflac_uint8*)pRunningData;
-
-                    /* Need space for the picture after the block */
-                    if (pRunningDataEnd - pRunningData < (drflac_int64)metadata.data.picture.pictureDataSize) { /* <-- Note the order of operations to avoid overflow to a valid value */
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-
-                    onMeta(pUserDataMD, &metadata);
-
-                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                }
-            } break;
-
-            case DRFLAC_METADATA_BLOCK_TYPE_PADDING:
-            {
-                if (onMeta) {
-                    metadata.data.padding.unused = 0;
-
-                    /* Padding doesn't have anything meaningful in it, so just skip over it, but make sure the caller is aware of it by firing the callback. */
-                    if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
-                        isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
-                    } else {
-                        onMeta(pUserDataMD, &metadata);
-                    }
-                }
-            } break;
-
-            case DRFLAC_METADATA_BLOCK_TYPE_INVALID:
-            {
-                /* Invalid chunk. Just skip over this one. */
-                if (onMeta) {
-                    if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
-                        isLastBlock = DRFLAC_TRUE;  /* An error occurred while seeking. Attempt to recover by treating this as the last block which will in turn terminate the loop. */
-                    }
-                }
-            } break;
-
-            default:
-            {
-                /*
-                It's an unknown chunk, but not necessarily invalid. There's a chance more metadata blocks might be defined later on, so we
-                can at the very least report the chunk to the application and let it look at the raw data.
-                */
-                if (onMeta) {
-                    void* pRawData = drflac__malloc_from_callbacks(blockSize, pAllocationCallbacks);
-                    if (pRawData == NULL) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (onRead(pUserData, pRawData, blockSize) != blockSize) {
-                        drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                        return DRFLAC_FALSE;
-                    }
-
-                    metadata.pRawData = pRawData;
-                    metadata.rawDataSize = blockSize;
-                    onMeta(pUserDataMD, &metadata);
-
-                    drflac__free_from_callbacks(pRawData, pAllocationCallbacks);
-                }
-            } break;
-        }
-
-        /* If we're not handling metadata, just skip over the block. If we are, it will have been handled earlier in the switch statement above. */
-        if (onMeta == NULL && blockSize > 0) {
-            if (!onSeek(pUserData, blockSize, drflac_seek_origin_current)) {
-                isLastBlock = DRFLAC_TRUE;
-            }
-        }
-
-        runningFilePos += blockSize;
-        if (isLastBlock) {
-            break;
-        }
-    }
-
-    *pSeektablePos   = seektablePos;
-    *pSeekpointCount = seektableSize / DRFLAC_SEEKPOINT_SIZE_IN_BYTES;
-    *pFirstFramePos  = runningFilePos;
-
-    return DRFLAC_TRUE;
-}
-
-static drflac_bool32 drflac__init_private__native(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
-{
-    /* Pre Condition: The bit stream should be sitting just past the 4-byte id header. */
-
-    drflac_uint8 isLastBlock;
-    drflac_uint8 blockType;
-    drflac_uint32 blockSize;
-
-    (void)onSeek;
-
-    pInit->container = drflac_container_native;
-
-    /* The first metadata block should be the STREAMINFO block. */
-    if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
-        return DRFLAC_FALSE;
-    }
-
-    if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
-        if (!relaxed) {
-            /* We're opening in strict mode and the first block is not the STREAMINFO block. Error. */
-            return DRFLAC_FALSE;
-        } else {
-            /*
-            Relaxed mode. To open from here we need to just find the first frame and set the sample rate, etc. to whatever is defined
-            for that frame.
-            */
-            pInit->hasStreamInfoBlock = DRFLAC_FALSE;
-            pInit->hasMetadataBlocks  = DRFLAC_FALSE;
-
-            if (!drflac__read_next_flac_frame_header(&pInit->bs, 0, &pInit->firstFrameHeader)) {
-                return DRFLAC_FALSE;    /* Couldn't find a frame. */
-            }
-
-            if (pInit->firstFrameHeader.bitsPerSample == 0) {
-                return DRFLAC_FALSE;    /* Failed to initialize because the first frame depends on the STREAMINFO block, which does not exist. */
-            }
-
-            pInit->sampleRate              = pInit->firstFrameHeader.sampleRate;
-            pInit->channels                = drflac__get_channel_count_from_channel_assignment(pInit->firstFrameHeader.channelAssignment);
-            pInit->bitsPerSample           = pInit->firstFrameHeader.bitsPerSample;
-            pInit->maxBlockSizeInPCMFrames = 65535;   /* <-- See notes here: https://xiph.org/flac/format.html#metadata_block_streaminfo */
-            return DRFLAC_TRUE;
-        }
-    } else {
-        drflac_streaminfo streaminfo;
-        if (!drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
-            return DRFLAC_FALSE;
-        }
-
-        pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
-        pInit->sampleRate              = streaminfo.sampleRate;
-        pInit->channels                = streaminfo.channels;
-        pInit->bitsPerSample           = streaminfo.bitsPerSample;
-        pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
-        pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;    /* Don't care about the min block size - only the max (used for determining the size of the memory allocation). */
-        pInit->hasMetadataBlocks       = !isLastBlock;
-
-        if (onMeta) {
-            drflac_metadata metadata;
-            metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
-            metadata.pRawData = NULL;
-            metadata.rawDataSize = 0;
-            metadata.data.streaminfo = streaminfo;
-            onMeta(pUserDataMD, &metadata);
-        }
-
-        return DRFLAC_TRUE;
-    }
-}
-
-#ifndef DR_FLAC_NO_OGG
-#define DRFLAC_OGG_MAX_PAGE_SIZE            65307
-#define DRFLAC_OGG_CAPTURE_PATTERN_CRC32    1605413199  /* CRC-32 of "OggS". */
-
-typedef enum
-{
-    drflac_ogg_recover_on_crc_mismatch,
-    drflac_ogg_fail_on_crc_mismatch
-} drflac_ogg_crc_mismatch_recovery;
-
-#ifndef DR_FLAC_NO_CRC
-static drflac_uint32 drflac__crc32_table[] = {
-    0x00000000L, 0x04C11DB7L, 0x09823B6EL, 0x0D4326D9L,
-    0x130476DCL, 0x17C56B6BL, 0x1A864DB2L, 0x1E475005L,
-    0x2608EDB8L, 0x22C9F00FL, 0x2F8AD6D6L, 0x2B4BCB61L,
-    0x350C9B64L, 0x31CD86D3L, 0x3C8EA00AL, 0x384FBDBDL,
-    0x4C11DB70L, 0x48D0C6C7L, 0x4593E01EL, 0x4152FDA9L,
-    0x5F15ADACL, 0x5BD4B01BL, 0x569796C2L, 0x52568B75L,
-    0x6A1936C8L, 0x6ED82B7FL, 0x639B0DA6L, 0x675A1011L,
-    0x791D4014L, 0x7DDC5DA3L, 0x709F7B7AL, 0x745E66CDL,
-    0x9823B6E0L, 0x9CE2AB57L, 0x91A18D8EL, 0x95609039L,
-    0x8B27C03CL, 0x8FE6DD8BL, 0x82A5FB52L, 0x8664E6E5L,
-    0xBE2B5B58L, 0xBAEA46EFL, 0xB7A96036L, 0xB3687D81L,
-    0xAD2F2D84L, 0xA9EE3033L, 0xA4AD16EAL, 0xA06C0B5DL,
-    0xD4326D90L, 0xD0F37027L, 0xDDB056FEL, 0xD9714B49L,
-    0xC7361B4CL, 0xC3F706FBL, 0xCEB42022L, 0xCA753D95L,
-    0xF23A8028L, 0xF6FB9D9FL, 0xFBB8BB46L, 0xFF79A6F1L,
-    0xE13EF6F4L, 0xE5FFEB43L, 0xE8BCCD9AL, 0xEC7DD02DL,
-    0x34867077L, 0x30476DC0L, 0x3D044B19L, 0x39C556AEL,
-    0x278206ABL, 0x23431B1CL, 0x2E003DC5L, 0x2AC12072L,
-    0x128E9DCFL, 0x164F8078L, 0x1B0CA6A1L, 0x1FCDBB16L,
-    0x018AEB13L, 0x054BF6A4L, 0x0808D07DL, 0x0CC9CDCAL,
-    0x7897AB07L, 0x7C56B6B0L, 0x71159069L, 0x75D48DDEL,
-    0x6B93DDDBL, 0x6F52C06CL, 0x6211E6B5L, 0x66D0FB02L,
-    0x5E9F46BFL, 0x5A5E5B08L, 0x571D7DD1L, 0x53DC6066L,
-    0x4D9B3063L, 0x495A2DD4L, 0x44190B0DL, 0x40D816BAL,
-    0xACA5C697L, 0xA864DB20L, 0xA527FDF9L, 0xA1E6E04EL,
-    0xBFA1B04BL, 0xBB60ADFCL, 0xB6238B25L, 0xB2E29692L,
-    0x8AAD2B2FL, 0x8E6C3698L, 0x832F1041L, 0x87EE0DF6L,
-    0x99A95DF3L, 0x9D684044L, 0x902B669DL, 0x94EA7B2AL,
-    0xE0B41DE7L, 0xE4750050L, 0xE9362689L, 0xEDF73B3EL,
-    0xF3B06B3BL, 0xF771768CL, 0xFA325055L, 0xFEF34DE2L,
-    0xC6BCF05FL, 0xC27DEDE8L, 0xCF3ECB31L, 0xCBFFD686L,
-    0xD5B88683L, 0xD1799B34L, 0xDC3ABDEDL, 0xD8FBA05AL,
-    0x690CE0EEL, 0x6DCDFD59L, 0x608EDB80L, 0x644FC637L,
-    0x7A089632L, 0x7EC98B85L, 0x738AAD5CL, 0x774BB0EBL,
-    0x4F040D56L, 0x4BC510E1L, 0x46863638L, 0x42472B8FL,
-    0x5C007B8AL, 0x58C1663DL, 0x558240E4L, 0x51435D53L,
-    0x251D3B9EL, 0x21DC2629L, 0x2C9F00F0L, 0x285E1D47L,
-    0x36194D42L, 0x32D850F5L, 0x3F9B762CL, 0x3B5A6B9BL,
-    0x0315D626L, 0x07D4CB91L, 0x0A97ED48L, 0x0E56F0FFL,
-    0x1011A0FAL, 0x14D0BD4DL, 0x19939B94L, 0x1D528623L,
-    0xF12F560EL, 0xF5EE4BB9L, 0xF8AD6D60L, 0xFC6C70D7L,
-    0xE22B20D2L, 0xE6EA3D65L, 0xEBA91BBCL, 0xEF68060BL,
-    0xD727BBB6L, 0xD3E6A601L, 0xDEA580D8L, 0xDA649D6FL,
-    0xC423CD6AL, 0xC0E2D0DDL, 0xCDA1F604L, 0xC960EBB3L,
-    0xBD3E8D7EL, 0xB9FF90C9L, 0xB4BCB610L, 0xB07DABA7L,
-    0xAE3AFBA2L, 0xAAFBE615L, 0xA7B8C0CCL, 0xA379DD7BL,
-    0x9B3660C6L, 0x9FF77D71L, 0x92B45BA8L, 0x9675461FL,
-    0x8832161AL, 0x8CF30BADL, 0x81B02D74L, 0x857130C3L,
-    0x5D8A9099L, 0x594B8D2EL, 0x5408ABF7L, 0x50C9B640L,
-    0x4E8EE645L, 0x4A4FFBF2L, 0x470CDD2BL, 0x43CDC09CL,
-    0x7B827D21L, 0x7F436096L, 0x7200464FL, 0x76C15BF8L,
-    0x68860BFDL, 0x6C47164AL, 0x61043093L, 0x65C52D24L,
-    0x119B4BE9L, 0x155A565EL, 0x18197087L, 0x1CD86D30L,
-    0x029F3D35L, 0x065E2082L, 0x0B1D065BL, 0x0FDC1BECL,
-    0x3793A651L, 0x3352BBE6L, 0x3E119D3FL, 0x3AD08088L,
-    0x2497D08DL, 0x2056CD3AL, 0x2D15EBE3L, 0x29D4F654L,
-    0xC5A92679L, 0xC1683BCEL, 0xCC2B1D17L, 0xC8EA00A0L,
-    0xD6AD50A5L, 0xD26C4D12L, 0xDF2F6BCBL, 0xDBEE767CL,
-    0xE3A1CBC1L, 0xE760D676L, 0xEA23F0AFL, 0xEEE2ED18L,
-    0xF0A5BD1DL, 0xF464A0AAL, 0xF9278673L, 0xFDE69BC4L,
-    0x89B8FD09L, 0x8D79E0BEL, 0x803AC667L, 0x84FBDBD0L,
-    0x9ABC8BD5L, 0x9E7D9662L, 0x933EB0BBL, 0x97FFAD0CL,
-    0xAFB010B1L, 0xAB710D06L, 0xA6322BDFL, 0xA2F33668L,
-    0xBCB4666DL, 0xB8757BDAL, 0xB5365D03L, 0xB1F740B4L
-};
-#endif
-
-static DRFLAC_INLINE drflac_uint32 drflac_crc32_byte(drflac_uint32 crc32, drflac_uint8 data)
-{
-#ifndef DR_FLAC_NO_CRC
-    return (crc32 << 8) ^ drflac__crc32_table[(drflac_uint8)((crc32 >> 24) & 0xFF) ^ data];
-#else
-    (void)data;
-    return crc32;
-#endif
-}
-
-#if 0
-static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint32(drflac_uint32 crc32, drflac_uint32 data)
-{
-    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 24) & 0xFF));
-    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >> 16) & 0xFF));
-    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  8) & 0xFF));
-    crc32 = drflac_crc32_byte(crc32, (drflac_uint8)((data >>  0) & 0xFF));
-    return crc32;
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac_crc32_uint64(drflac_uint32 crc32, drflac_uint64 data)
-{
-    crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >> 32) & 0xFFFFFFFF));
-    crc32 = drflac_crc32_uint32(crc32, (drflac_uint32)((data >>  0) & 0xFFFFFFFF));
-    return crc32;
-}
-#endif
-
-static DRFLAC_INLINE drflac_uint32 drflac_crc32_buffer(drflac_uint32 crc32, drflac_uint8* pData, drflac_uint32 dataSize)
-{
-    /* This can be optimized. */
-    drflac_uint32 i;
-    for (i = 0; i < dataSize; ++i) {
-        crc32 = drflac_crc32_byte(crc32, pData[i]);
-    }
-    return crc32;
-}
-
-
-static DRFLAC_INLINE drflac_bool32 drflac_ogg__is_capture_pattern(drflac_uint8 pattern[4])
-{
-    return pattern[0] == 'O' && pattern[1] == 'g' && pattern[2] == 'g' && pattern[3] == 'S';
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_header_size(drflac_ogg_page_header* pHeader)
-{
-    return 27 + pHeader->segmentCount;
-}
-
-static DRFLAC_INLINE drflac_uint32 drflac_ogg__get_page_body_size(drflac_ogg_page_header* pHeader)
-{
-    drflac_uint32 pageBodySize = 0;
-    int i;
-
-    for (i = 0; i < pHeader->segmentCount; ++i) {
-        pageBodySize += pHeader->segmentTable[i];
-    }
-
-    return pageBodySize;
-}
-
-static drflac_result drflac_ogg__read_page_header_after_capture_pattern(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
-{
-    drflac_uint8 data[23];
-    drflac_uint32 i;
-
-    DRFLAC_ASSERT(*pCRC32 == DRFLAC_OGG_CAPTURE_PATTERN_CRC32);
-
-    if (onRead(pUserData, data, 23) != 23) {
-        return DRFLAC_AT_END;
-    }
-    *pBytesRead += 23;
-
-    /*
-    It's not actually used, but set the capture pattern to 'OggS' for completeness. Not doing this will cause static analysers to complain about
-    us trying to access uninitialized data. We could alternatively just comment out this member of the drflac_ogg_page_header structure, but I
-    like to have it map to the structure of the underlying data.
-    */
-    pHeader->capturePattern[0] = 'O';
-    pHeader->capturePattern[1] = 'g';
-    pHeader->capturePattern[2] = 'g';
-    pHeader->capturePattern[3] = 'S';
-
-    pHeader->structureVersion = data[0];
-    pHeader->headerType       = data[1];
-    DRFLAC_COPY_MEMORY(&pHeader->granulePosition, &data[ 2], 8);
-    DRFLAC_COPY_MEMORY(&pHeader->serialNumber,    &data[10], 4);
-    DRFLAC_COPY_MEMORY(&pHeader->sequenceNumber,  &data[14], 4);
-    DRFLAC_COPY_MEMORY(&pHeader->checksum,        &data[18], 4);
-    pHeader->segmentCount     = data[22];
-
-    /* Calculate the CRC. Note that for the calculation the checksum part of the page needs to be set to 0. */
-    data[18] = 0;
-    data[19] = 0;
-    data[20] = 0;
-    data[21] = 0;
-
-    for (i = 0; i < 23; ++i) {
-        *pCRC32 = drflac_crc32_byte(*pCRC32, data[i]);
-    }
-
-
-    if (onRead(pUserData, pHeader->segmentTable, pHeader->segmentCount) != pHeader->segmentCount) {
-        return DRFLAC_AT_END;
-    }
-    *pBytesRead += pHeader->segmentCount;
-
-    for (i = 0; i < pHeader->segmentCount; ++i) {
-        *pCRC32 = drflac_crc32_byte(*pCRC32, pHeader->segmentTable[i]);
-    }
-
-    return DRFLAC_SUCCESS;
-}
-
-static drflac_result drflac_ogg__read_page_header(drflac_read_proc onRead, void* pUserData, drflac_ogg_page_header* pHeader, drflac_uint32* pBytesRead, drflac_uint32* pCRC32)
-{
-    drflac_uint8 id[4];
-
-    *pBytesRead = 0;
-
-    if (onRead(pUserData, id, 4) != 4) {
-        return DRFLAC_AT_END;
-    }
-    *pBytesRead += 4;
-
-    /* We need to read byte-by-byte until we find the OggS capture pattern. */
-    for (;;) {
-        if (drflac_ogg__is_capture_pattern(id)) {
-            drflac_result result;
-
-            *pCRC32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
-
-            result = drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, pHeader, pBytesRead, pCRC32);
-            if (result == DRFLAC_SUCCESS) {
-                return DRFLAC_SUCCESS;
-            } else {
-                if (result == DRFLAC_CRC_MISMATCH) {
-                    continue;
-                } else {
-                    return result;
-                }
-            }
-        } else {
-            /* The first 4 bytes did not equal the capture pattern. Read the next byte and try again. */
-            id[0] = id[1];
-            id[1] = id[2];
-            id[2] = id[3];
-            if (onRead(pUserData, &id[3], 1) != 1) {
-                return DRFLAC_AT_END;
-            }
-            *pBytesRead += 1;
-        }
-    }
-}
-
-
-/*
-The main part of the Ogg encapsulation is the conversion from the physical Ogg bitstream to the native FLAC bitstream. It works
-in three general stages: Ogg Physical Bitstream -> Ogg/FLAC Logical Bitstream -> FLAC Native Bitstream. dr_flac is designed
-in such a way that the core sections assume everything is delivered in native format. Therefore, for each encapsulation type
-dr_flac is supporting there needs to be a layer sitting on top of the onRead and onSeek callbacks that ensures the bits read from
-the physical Ogg bitstream are converted and delivered in native FLAC format.
-*/
-typedef struct
-{
-    drflac_read_proc onRead;                /* The original onRead callback from drflac_open() and family. */
-    drflac_seek_proc onSeek;                /* The original onSeek callback from drflac_open() and family. */
-    void* pUserData;                        /* The user data passed on onRead and onSeek. This is the user data that was passed on drflac_open() and family. */
-    drflac_uint64 currentBytePos;           /* The position of the byte we are sitting on in the physical byte stream. Used for efficient seeking. */
-    drflac_uint64 firstBytePos;             /* The position of the first byte in the physical bitstream. Points to the start of the "OggS" identifier of the FLAC bos page. */
-    drflac_uint32 serialNumber;             /* The serial number of the FLAC audio pages. This is determined by the initial header page that was read during initialization. */
-    drflac_ogg_page_header bosPageHeader;   /* Used for seeking. */
-    drflac_ogg_page_header currentPageHeader;
-    drflac_uint32 bytesRemainingInPage;
-    drflac_uint32 pageDataSize;
-    drflac_uint8 pageData[DRFLAC_OGG_MAX_PAGE_SIZE];
-} drflac_oggbs; /* oggbs = Ogg Bitstream */
-
-static size_t drflac_oggbs__read_physical(drflac_oggbs* oggbs, void* bufferOut, size_t bytesToRead)
-{
-    size_t bytesActuallyRead = oggbs->onRead(oggbs->pUserData, bufferOut, bytesToRead);
-    oggbs->currentBytePos += bytesActuallyRead;
-
-    return bytesActuallyRead;
-}
-
-static drflac_bool32 drflac_oggbs__seek_physical(drflac_oggbs* oggbs, drflac_uint64 offset, drflac_seek_origin origin)
-{
-    if (origin == drflac_seek_origin_start) {
-        if (offset <= 0x7FFFFFFF) {
-            if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_start)) {
-                return DRFLAC_FALSE;
-            }
-            oggbs->currentBytePos = offset;
-
-            return DRFLAC_TRUE;
-        } else {
-            if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_start)) {
-                return DRFLAC_FALSE;
-            }
-            oggbs->currentBytePos = offset;
-
-            return drflac_oggbs__seek_physical(oggbs, offset - 0x7FFFFFFF, drflac_seek_origin_current);
-        }
-    } else {
-        while (offset > 0x7FFFFFFF) {
-            if (!oggbs->onSeek(oggbs->pUserData, 0x7FFFFFFF, drflac_seek_origin_current)) {
-                return DRFLAC_FALSE;
-            }
-            oggbs->currentBytePos += 0x7FFFFFFF;
-            offset -= 0x7FFFFFFF;
-        }
-
-        if (!oggbs->onSeek(oggbs->pUserData, (int)offset, drflac_seek_origin_current)) {    /* <-- Safe cast thanks to the loop above. */
-            return DRFLAC_FALSE;
-        }
-        oggbs->currentBytePos += offset;
-
-        return DRFLAC_TRUE;
-    }
-}
-
-static drflac_bool32 drflac_oggbs__goto_next_page(drflac_oggbs* oggbs, drflac_ogg_crc_mismatch_recovery recoveryMethod)
-{
-    drflac_ogg_page_header header;
-    for (;;) {
-        drflac_uint32 crc32 = 0;
-        drflac_uint32 bytesRead;
-        drflac_uint32 pageBodySize;
-#ifndef DR_FLAC_NO_CRC
-        drflac_uint32 actualCRC32;
-#endif
-
-        if (drflac_ogg__read_page_header(oggbs->onRead, oggbs->pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
-            return DRFLAC_FALSE;
-        }
-        oggbs->currentBytePos += bytesRead;
-
-        pageBodySize = drflac_ogg__get_page_body_size(&header);
-        if (pageBodySize > DRFLAC_OGG_MAX_PAGE_SIZE) {
-            continue;   /* Invalid page size. Assume it's corrupted and just move to the next page. */
-        }
-
-        if (header.serialNumber != oggbs->serialNumber) {
-            /* It's not a FLAC page. Skip it. */
-            if (pageBodySize > 0 && !drflac_oggbs__seek_physical(oggbs, pageBodySize, drflac_seek_origin_current)) {
-                return DRFLAC_FALSE;
-            }
-            continue;
-        }
-
-
-        /* We need to read the entire page and then do a CRC check on it. If there's a CRC mismatch we need to skip this page. */
-        if (drflac_oggbs__read_physical(oggbs, oggbs->pageData, pageBodySize) != pageBodySize) {
-            return DRFLAC_FALSE;
-        }
-        oggbs->pageDataSize = pageBodySize;
-
-#ifndef DR_FLAC_NO_CRC
-        actualCRC32 = drflac_crc32_buffer(crc32, oggbs->pageData, oggbs->pageDataSize);
-        if (actualCRC32 != header.checksum) {
-            if (recoveryMethod == drflac_ogg_recover_on_crc_mismatch) {
-                continue;   /* CRC mismatch. Skip this page. */
-            } else {
-                /*
-                Even though we are failing on a CRC mismatch, we still want our stream to be in a good state. Therefore we
-                go to the next valid page to ensure we're in a good state, but return false to let the caller know that the
-                seek did not fully complete.
-                */
-                drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch);
-                return DRFLAC_FALSE;
-            }
-        }
-#else
-        (void)recoveryMethod;   /* <-- Silence a warning. */
-#endif
-
-        oggbs->currentPageHeader = header;
-        oggbs->bytesRemainingInPage = pageBodySize;
-        return DRFLAC_TRUE;
-    }
-}
-
-/* Function below is unused at the moment, but I might be re-adding it later. */
-#if 0
-static drflac_uint8 drflac_oggbs__get_current_segment_index(drflac_oggbs* oggbs, drflac_uint8* pBytesRemainingInSeg)
-{
-    drflac_uint32 bytesConsumedInPage = drflac_ogg__get_page_body_size(&oggbs->currentPageHeader) - oggbs->bytesRemainingInPage;
-    drflac_uint8 iSeg = 0;
-    drflac_uint32 iByte = 0;
-    while (iByte < bytesConsumedInPage) {
-        drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
-        if (iByte + segmentSize > bytesConsumedInPage) {
-            break;
-        } else {
-            iSeg += 1;
-            iByte += segmentSize;
-        }
-    }
-
-    *pBytesRemainingInSeg = oggbs->currentPageHeader.segmentTable[iSeg] - (drflac_uint8)(bytesConsumedInPage - iByte);
-    return iSeg;
-}
-
-static drflac_bool32 drflac_oggbs__seek_to_next_packet(drflac_oggbs* oggbs)
-{
-    /* The current packet ends when we get to the segment with a lacing value of < 255 which is not at the end of a page. */
-    for (;;) {
-        drflac_bool32 atEndOfPage = DRFLAC_FALSE;
-
-        drflac_uint8 bytesRemainingInSeg;
-        drflac_uint8 iFirstSeg = drflac_oggbs__get_current_segment_index(oggbs, &bytesRemainingInSeg);
-
-        drflac_uint32 bytesToEndOfPacketOrPage = bytesRemainingInSeg;
-        for (drflac_uint8 iSeg = iFirstSeg; iSeg < oggbs->currentPageHeader.segmentCount; ++iSeg) {
-            drflac_uint8 segmentSize = oggbs->currentPageHeader.segmentTable[iSeg];
-            if (segmentSize < 255) {
-                if (iSeg == oggbs->currentPageHeader.segmentCount-1) {
-                    atEndOfPage = DRFLAC_TRUE;
-                }
-
-                break;
-            }
-
-            bytesToEndOfPacketOrPage += segmentSize;
-        }
-
-        /*
-        At this point we will have found either the packet or the end of the page. If we're at the end of the page we'll
-        want to load the next page and keep searching for the end of the packet.
-        */
-        drflac_oggbs__seek_physical(oggbs, bytesToEndOfPacketOrPage, drflac_seek_origin_current);
-        oggbs->bytesRemainingInPage -= bytesToEndOfPacketOrPage;
-
-        if (atEndOfPage) {
-            /*
-            We're potentially at the next packet, but we need to check the next page first to be sure because the packet may
-            straddle pages.
-            */
-            if (!drflac_oggbs__goto_next_page(oggbs)) {
-                return DRFLAC_FALSE;
-            }
-
-            /* If it's a fresh packet it most likely means we're at the next packet. */
-            if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {
-                return DRFLAC_TRUE;
-            }
-        } else {
-            /* We're at the next packet. */
-            return DRFLAC_TRUE;
-        }
-    }
-}
-
-static drflac_bool32 drflac_oggbs__seek_to_next_frame(drflac_oggbs* oggbs)
-{
-    /* The bitstream should be sitting on the first byte just after the header of the frame. */
-
-    /* What we're actually doing here is seeking to the start of the next packet. */
-    return drflac_oggbs__seek_to_next_packet(oggbs);
-}
-#endif
-
-static size_t drflac__on_read_ogg(void* pUserData, void* bufferOut, size_t bytesToRead)
-{
-    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
-    drflac_uint8* pRunningBufferOut = (drflac_uint8*)bufferOut;
-    size_t bytesRead = 0;
-
-    DRFLAC_ASSERT(oggbs != NULL);
-    DRFLAC_ASSERT(pRunningBufferOut != NULL);
-
-    /* Reading is done page-by-page. If we've run out of bytes in the page we need to move to the next one. */
-    while (bytesRead < bytesToRead) {
-        size_t bytesRemainingToRead = bytesToRead - bytesRead;
-
-        if (oggbs->bytesRemainingInPage >= bytesRemainingToRead) {
-            DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), bytesRemainingToRead);
-            bytesRead += bytesRemainingToRead;
-            oggbs->bytesRemainingInPage -= (drflac_uint32)bytesRemainingToRead;
-            break;
-        }
-
-        /* If we get here it means some of the requested data is contained in the next pages. */
-        if (oggbs->bytesRemainingInPage > 0) {
-            DRFLAC_COPY_MEMORY(pRunningBufferOut, oggbs->pageData + (oggbs->pageDataSize - oggbs->bytesRemainingInPage), oggbs->bytesRemainingInPage);
-            bytesRead += oggbs->bytesRemainingInPage;
-            pRunningBufferOut += oggbs->bytesRemainingInPage;
-            oggbs->bytesRemainingInPage = 0;
-        }
-
-        DRFLAC_ASSERT(bytesRemainingToRead > 0);
-        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
-            break;  /* Failed to go to the next page. Might have simply hit the end of the stream. */
-        }
-    }
-
-    return bytesRead;
-}
-
-static drflac_bool32 drflac__on_seek_ogg(void* pUserData, int offset, drflac_seek_origin origin)
-{
-    drflac_oggbs* oggbs = (drflac_oggbs*)pUserData;
-    int bytesSeeked = 0;
-
-    DRFLAC_ASSERT(oggbs != NULL);
-    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
-
-    /* Seeking is always forward which makes things a lot simpler. */
-    if (origin == drflac_seek_origin_start) {
-        if (!drflac_oggbs__seek_physical(oggbs, (int)oggbs->firstBytePos, drflac_seek_origin_start)) {
-            return DRFLAC_FALSE;
-        }
-
-        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
-            return DRFLAC_FALSE;
-        }
-
-        return drflac__on_seek_ogg(pUserData, offset, drflac_seek_origin_current);
-    }
-
-    DRFLAC_ASSERT(origin == drflac_seek_origin_current);
-
-    while (bytesSeeked < offset) {
-        int bytesRemainingToSeek = offset - bytesSeeked;
-        DRFLAC_ASSERT(bytesRemainingToSeek >= 0);
-
-        if (oggbs->bytesRemainingInPage >= (size_t)bytesRemainingToSeek) {
-            bytesSeeked += bytesRemainingToSeek;
-            (void)bytesSeeked;  /* <-- Silence a dead store warning emitted by Clang Static Analyzer. */
-            oggbs->bytesRemainingInPage -= bytesRemainingToSeek;
-            break;
-        }
-
-        /* If we get here it means some of the requested data is contained in the next pages. */
-        if (oggbs->bytesRemainingInPage > 0) {
-            bytesSeeked += (int)oggbs->bytesRemainingInPage;
-            oggbs->bytesRemainingInPage = 0;
-        }
-
-        DRFLAC_ASSERT(bytesRemainingToSeek > 0);
-        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_fail_on_crc_mismatch)) {
-            /* Failed to go to the next page. We either hit the end of the stream or had a CRC mismatch. */
-            return DRFLAC_FALSE;
-        }
-    }
-
-    return DRFLAC_TRUE;
-}
-
-
-static drflac_bool32 drflac_ogg__seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
-{
-    drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
-    drflac_uint64 originalBytePos;
-    drflac_uint64 runningGranulePosition;
-    drflac_uint64 runningFrameBytePos;
-    drflac_uint64 runningPCMFrameCount;
-
-    DRFLAC_ASSERT(oggbs != NULL);
-
-    originalBytePos = oggbs->currentBytePos;   /* For recovery. Points to the OggS identifier. */
-
-    /* First seek to the first frame. */
-    if (!drflac__seek_to_byte(&pFlac->bs, pFlac->firstFLACFramePosInBytes)) {
-        return DRFLAC_FALSE;
-    }
-    oggbs->bytesRemainingInPage = 0;
-
-    runningGranulePosition = 0;
-    for (;;) {
-        if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
-            drflac_oggbs__seek_physical(oggbs, originalBytePos, drflac_seek_origin_start);
-            return DRFLAC_FALSE;   /* Never did find that sample... */
-        }
-
-        runningFrameBytePos = oggbs->currentBytePos - drflac_ogg__get_page_header_size(&oggbs->currentPageHeader) - oggbs->pageDataSize;
-        if (oggbs->currentPageHeader.granulePosition >= pcmFrameIndex) {
-            break; /* The sample is somewhere in the previous page. */
-        }
-
-        /*
-        At this point we know the sample is not in the previous page. It could possibly be in this page. For simplicity we
-        disregard any pages that do not begin a fresh packet.
-        */
-        if ((oggbs->currentPageHeader.headerType & 0x01) == 0) {    /* <-- Is it a fresh page? */
-            if (oggbs->currentPageHeader.segmentTable[0] >= 2) {
-                drflac_uint8 firstBytesInPage[2];
-                firstBytesInPage[0] = oggbs->pageData[0];
-                firstBytesInPage[1] = oggbs->pageData[1];
-
-                if ((firstBytesInPage[0] == 0xFF) && (firstBytesInPage[1] & 0xFC) == 0xF8) {    /* <-- Does the page begin with a frame's sync code? */
-                    runningGranulePosition = oggbs->currentPageHeader.granulePosition;
-                }
-
-                continue;
-            }
-        }
-    }
-
-    /*
-    We found the page that is closest to the sample, so now we need to find it. The first thing to do is seek to the
-    start of that page. In the loop above we checked that it was a fresh page which means this page is also the start of
-    a new frame. This property means that after we've seeked to the page we can immediately start looping over frames until
-    we find the one containing the target sample.
-    */
-    if (!drflac_oggbs__seek_physical(oggbs, runningFrameBytePos, drflac_seek_origin_start)) {
-        return DRFLAC_FALSE;
-    }
-    if (!drflac_oggbs__goto_next_page(oggbs, drflac_ogg_recover_on_crc_mismatch)) {
-        return DRFLAC_FALSE;
-    }
-
-    /*
-    At this point we'll be sitting on the first byte of the frame header of the first frame in the page. We just keep
-    looping over these frames until we find the one containing the sample we're after.
-    */
-    runningPCMFrameCount = runningGranulePosition;
-    for (;;) {
-        /*
-        There are two ways to find the sample and seek past irrelevant frames:
-          1) Use the native FLAC decoder.
-          2) Use Ogg's framing system.
-
-        Both of these options have their own pros and cons. Using the native FLAC decoder is slower because it needs to
-        do a full decode of the frame. Using Ogg's framing system is faster, but more complicated and involves some code
-        duplication for the decoding of frame headers.
-
-        Another thing to consider is that using the Ogg framing system will perform direct seeking of the physical Ogg
-        bitstream. This is important to consider because it means we cannot read data from the drflac_bs object using the
-        standard drflac__*() APIs because that will read in extra data for its own internal caching which in turn breaks
-        the positioning of the read pointer of the physical Ogg bitstream. Therefore, anything that would normally be read
-        using the native FLAC decoding APIs, such as drflac__read_next_flac_frame_header(), need to be re-implemented so as to
-        avoid the use of the drflac_bs object.
-
-        Considering these issues, I have decided to use the slower native FLAC decoding method for the following reasons:
-          1) Seeking is already partially accelerated using Ogg's paging system in the code block above.
-          2) Seeking in an Ogg encapsulated FLAC stream is probably quite uncommon.
-          3) Simplicity.
-        */
-        drflac_uint64 firstPCMFrameInFLACFrame = 0;
-        drflac_uint64 lastPCMFrameInFLACFrame = 0;
-        drflac_uint64 pcmFrameCountInThisFrame;
-
-        if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-            return DRFLAC_FALSE;
-        }
-
-        drflac__get_pcm_frame_range_of_current_flac_frame(pFlac, &firstPCMFrameInFLACFrame, &lastPCMFrameInFLACFrame);
-
-        pcmFrameCountInThisFrame = (lastPCMFrameInFLACFrame - firstPCMFrameInFLACFrame) + 1;
-
-        /* If we are seeking to the end of the file and we've just hit it, we're done. */
-        if (pcmFrameIndex == pFlac->totalPCMFrameCount && (runningPCMFrameCount + pcmFrameCountInThisFrame) == pFlac->totalPCMFrameCount) {
-            drflac_result result = drflac__decode_flac_frame(pFlac);
-            if (result == DRFLAC_SUCCESS) {
-                pFlac->currentPCMFrame = pcmFrameIndex;
-                pFlac->currentFLACFrame.pcmFramesRemaining = 0;
-                return DRFLAC_TRUE;
-            } else {
-                return DRFLAC_FALSE;
-            }
-        }
-
-        if (pcmFrameIndex < (runningPCMFrameCount + pcmFrameCountInThisFrame)) {
-            /*
-            The sample should be in this FLAC frame. We need to fully decode it, however if it's an invalid frame (a CRC mismatch), we need to pretend
-            it never existed and keep iterating.
-            */
-            drflac_result result = drflac__decode_flac_frame(pFlac);
-            if (result == DRFLAC_SUCCESS) {
-                /* The frame is valid. We just need to skip over some samples to ensure it's sample-exact. */
-                drflac_uint64 pcmFramesToDecode = (size_t)(pcmFrameIndex - runningPCMFrameCount);    /* <-- Safe cast because the maximum number of samples in a frame is 65535. */
-                if (pcmFramesToDecode == 0) {
-                    return DRFLAC_TRUE;
-                }
-
-                pFlac->currentPCMFrame = runningPCMFrameCount;
-
-                return drflac__seek_forward_by_pcm_frames(pFlac, pcmFramesToDecode) == pcmFramesToDecode;  /* <-- If this fails, something bad has happened (it should never fail). */
-            } else {
-                if (result == DRFLAC_CRC_MISMATCH) {
-                    continue;   /* CRC mismatch. Pretend this frame never existed. */
-                } else {
-                    return DRFLAC_FALSE;
-                }
-            }
-        } else {
-            /*
-            It's not in this frame. We need to seek past the frame, but check if there was a CRC mismatch. If so, we pretend this
-            frame never existed and leave the running sample count untouched.
-            */
-            drflac_result result = drflac__seek_to_next_flac_frame(pFlac);
-            if (result == DRFLAC_SUCCESS) {
-                runningPCMFrameCount += pcmFrameCountInThisFrame;
-            } else {
-                if (result == DRFLAC_CRC_MISMATCH) {
-                    continue;   /* CRC mismatch. Pretend this frame never existed. */
-                } else {
-                    return DRFLAC_FALSE;
-                }
-            }
-        }
-    }
-}
-
-
-
-static drflac_bool32 drflac__init_private__ogg(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, void* pUserDataMD, drflac_bool32 relaxed)
-{
-    drflac_ogg_page_header header;
-    drflac_uint32 crc32 = DRFLAC_OGG_CAPTURE_PATTERN_CRC32;
-    drflac_uint32 bytesRead = 0;
-
-    /* Pre Condition: The bit stream should be sitting just past the 4-byte OggS capture pattern. */
-    (void)relaxed;
-
-    pInit->container = drflac_container_ogg;
-    pInit->oggFirstBytePos = 0;
-
-    /*
-    We'll get here if the first 4 bytes of the stream were the OggS capture pattern, however it doesn't necessarily mean the
-    stream includes FLAC encoded audio. To check for this we need to scan the beginning-of-stream page markers and check if
-    any match the FLAC specification. Important to keep in mind that the stream may be multiplexed.
-    */
-    if (drflac_ogg__read_page_header_after_capture_pattern(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
-        return DRFLAC_FALSE;
-    }
-    pInit->runningFilePos += bytesRead;
-
-    for (;;) {
-        int pageBodySize;
-
-        /* Break if we're past the beginning of stream page. */
-        if ((header.headerType & 0x02) == 0) {
-            return DRFLAC_FALSE;
-        }
-
-        /* Check if it's a FLAC header. */
-        pageBodySize = drflac_ogg__get_page_body_size(&header);
-        if (pageBodySize == 51) {   /* 51 = the lacing value of the FLAC header packet. */
-            /* It could be a FLAC page... */
-            drflac_uint32 bytesRemainingInPage = pageBodySize;
-            drflac_uint8 packetType;
-
-            if (onRead(pUserData, &packetType, 1) != 1) {
-                return DRFLAC_FALSE;
-            }
-
-            bytesRemainingInPage -= 1;
-            if (packetType == 0x7F) {
-                /* Increasingly more likely to be a FLAC page... */
-                drflac_uint8 sig[4];
-                if (onRead(pUserData, sig, 4) != 4) {
-                    return DRFLAC_FALSE;
-                }
-
-                bytesRemainingInPage -= 4;
-                if (sig[0] == 'F' && sig[1] == 'L' && sig[2] == 'A' && sig[3] == 'C') {
-                    /* Almost certainly a FLAC page... */
-                    drflac_uint8 mappingVersion[2];
-                    if (onRead(pUserData, mappingVersion, 2) != 2) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (mappingVersion[0] != 1) {
-                        return DRFLAC_FALSE;   /* Only supporting version 1.x of the Ogg mapping. */
-                    }
-
-                    /*
-                    The next 2 bytes are the non-audio packets, not including this one. We don't care about this because we're going to
-                    be handling it in a generic way based on the serial number and packet types.
-                    */
-                    if (!onSeek(pUserData, 2, drflac_seek_origin_current)) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    /* Expecting the native FLAC signature "fLaC". */
-                    if (onRead(pUserData, sig, 4) != 4) {
-                        return DRFLAC_FALSE;
-                    }
-
-                    if (sig[0] == 'f' && sig[1] == 'L' && sig[2] == 'a' && sig[3] == 'C') {
-                        /* The remaining data in the page should be the STREAMINFO block. */
-                        drflac_streaminfo streaminfo;
-                        drflac_uint8 isLastBlock;
-                        drflac_uint8 blockType;
-                        drflac_uint32 blockSize;
-                        if (!drflac__read_and_decode_block_header(onRead, pUserData, &isLastBlock, &blockType, &blockSize)) {
-                            return DRFLAC_FALSE;
-                        }
-
-                        if (blockType != DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO || blockSize != 34) {
-                            return DRFLAC_FALSE;    /* Invalid block type. First block must be the STREAMINFO block. */
-                        }
-
-                        if (drflac__read_streaminfo(onRead, pUserData, &streaminfo)) {
-                            /* Success! */
-                            pInit->hasStreamInfoBlock      = DRFLAC_TRUE;
-                            pInit->sampleRate              = streaminfo.sampleRate;
-                            pInit->channels                = streaminfo.channels;
-                            pInit->bitsPerSample           = streaminfo.bitsPerSample;
-                            pInit->totalPCMFrameCount      = streaminfo.totalPCMFrameCount;
-                            pInit->maxBlockSizeInPCMFrames = streaminfo.maxBlockSizeInPCMFrames;
-                            pInit->hasMetadataBlocks       = !isLastBlock;
-
-                            if (onMeta) {
-                                drflac_metadata metadata;
-                                metadata.type = DRFLAC_METADATA_BLOCK_TYPE_STREAMINFO;
-                                metadata.pRawData = NULL;
-                                metadata.rawDataSize = 0;
-                                metadata.data.streaminfo = streaminfo;
-                                onMeta(pUserDataMD, &metadata);
-                            }
-
-                            pInit->runningFilePos  += pageBodySize;
-                            pInit->oggFirstBytePos  = pInit->runningFilePos - 79;   /* Subtracting 79 will place us right on top of the "OggS" identifier of the FLAC bos page. */
-                            pInit->oggSerial        = header.serialNumber;
-                            pInit->oggBosHeader     = header;
-                            break;
-                        } else {
-                            /* Failed to read STREAMINFO block. Aww, so close... */
-                            return DRFLAC_FALSE;
-                        }
-                    } else {
-                        /* Invalid file. */
-                        return DRFLAC_FALSE;
-                    }
-                } else {
-                    /* Not a FLAC header. Skip it. */
-                    if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
-                        return DRFLAC_FALSE;
-                    }
-                }
-            } else {
-                /* Not a FLAC header. Seek past the entire page and move on to the next. */
-                if (!onSeek(pUserData, bytesRemainingInPage, drflac_seek_origin_current)) {
-                    return DRFLAC_FALSE;
-                }
-            }
-        } else {
-            if (!onSeek(pUserData, pageBodySize, drflac_seek_origin_current)) {
-                return DRFLAC_FALSE;
-            }
-        }
-
-        pInit->runningFilePos += pageBodySize;
-
-
-        /* Read the header of the next page. */
-        if (drflac_ogg__read_page_header(onRead, pUserData, &header, &bytesRead, &crc32) != DRFLAC_SUCCESS) {
-            return DRFLAC_FALSE;
-        }
-        pInit->runningFilePos += bytesRead;
-    }
-
-    /*
-    If we get here it means we found a FLAC audio stream. We should be sitting on the first byte of the header of the next page. The next
-    packets in the FLAC logical stream contain the metadata. The only thing left to do in the initialization phase for Ogg is to create the
-    Ogg bitstream object.
-    */
-    pInit->hasMetadataBlocks = DRFLAC_TRUE;    /* <-- Always have at least VORBIS_COMMENT metadata block. */
-    return DRFLAC_TRUE;
-}
-#endif
-
-static drflac_bool32 drflac__init_private(drflac_init_info* pInit, drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD)
-{
-    drflac_bool32 relaxed;
-    drflac_uint8 id[4];
-
-    if (pInit == NULL || onRead == NULL || onSeek == NULL) {
-        return DRFLAC_FALSE;
-    }
-
-    DRFLAC_ZERO_MEMORY(pInit, sizeof(*pInit));
-    pInit->onRead       = onRead;
-    pInit->onSeek       = onSeek;
-    pInit->onMeta       = onMeta;
-    pInit->container    = container;
-    pInit->pUserData    = pUserData;
-    pInit->pUserDataMD  = pUserDataMD;
-
-    pInit->bs.onRead    = onRead;
-    pInit->bs.onSeek    = onSeek;
-    pInit->bs.pUserData = pUserData;
-    drflac__reset_cache(&pInit->bs);
-
-
-    /* If the container is explicitly defined then we can try opening in relaxed mode. */
-    relaxed = container != drflac_container_unknown;
-
-    /* Skip over any ID3 tags. */
-    for (;;) {
-        if (onRead(pUserData, id, 4) != 4) {
-            return DRFLAC_FALSE;    /* Ran out of data. */
-        }
-        pInit->runningFilePos += 4;
-
-        if (id[0] == 'I' && id[1] == 'D' && id[2] == '3') {
-            drflac_uint8 header[6];
-            drflac_uint8 flags;
-            drflac_uint32 headerSize;
-
-            if (onRead(pUserData, header, 6) != 6) {
-                return DRFLAC_FALSE;    /* Ran out of data. */
-            }
-            pInit->runningFilePos += 6;
-
-            flags = header[1];
-
-            DRFLAC_COPY_MEMORY(&headerSize, header+2, 4);
-            headerSize = drflac__unsynchsafe_32(drflac__be2host_32(headerSize));
-            if (flags & 0x10) {
-                headerSize += 10;
-            }
-
-            if (!onSeek(pUserData, headerSize, drflac_seek_origin_current)) {
-                return DRFLAC_FALSE;    /* Failed to seek past the tag. */
-            }
-            pInit->runningFilePos += headerSize;
-        } else {
-            break;
-        }
-    }
-
-    if (id[0] == 'f' && id[1] == 'L' && id[2] == 'a' && id[3] == 'C') {
-        return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
-    }
-#ifndef DR_FLAC_NO_OGG
-    if (id[0] == 'O' && id[1] == 'g' && id[2] == 'g' && id[3] == 'S') {
-        return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
-    }
-#endif
-
-    /* If we get here it means we likely don't have a header. Try opening in relaxed mode, if applicable. */
-    if (relaxed) {
-        if (container == drflac_container_native) {
-            return drflac__init_private__native(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
-        }
-#ifndef DR_FLAC_NO_OGG
-        if (container == drflac_container_ogg) {
-            return drflac__init_private__ogg(pInit, onRead, onSeek, onMeta, pUserData, pUserDataMD, relaxed);
-        }
-#endif
-    }
-
-    /* Unsupported container. */
-    return DRFLAC_FALSE;
-}
-
-static void drflac__init_from_info(drflac* pFlac, const drflac_init_info* pInit)
-{
-    DRFLAC_ASSERT(pFlac != NULL);
-    DRFLAC_ASSERT(pInit != NULL);
-
-    DRFLAC_ZERO_MEMORY(pFlac, sizeof(*pFlac));
-    pFlac->bs                      = pInit->bs;
-    pFlac->onMeta                  = pInit->onMeta;
-    pFlac->pUserDataMD             = pInit->pUserDataMD;
-    pFlac->maxBlockSizeInPCMFrames = pInit->maxBlockSizeInPCMFrames;
-    pFlac->sampleRate              = pInit->sampleRate;
-    pFlac->channels                = (drflac_uint8)pInit->channels;
-    pFlac->bitsPerSample           = (drflac_uint8)pInit->bitsPerSample;
-    pFlac->totalPCMFrameCount      = pInit->totalPCMFrameCount;
-    pFlac->container               = pInit->container;
-}
-
-
-static drflac* drflac_open_with_metadata_private(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, void* pUserDataMD, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac_init_info init;
-    drflac_uint32 allocationSize;
-    drflac_uint32 wholeSIMDVectorCountPerChannel;
-    drflac_uint32 decodedSamplesAllocationSize;
-#ifndef DR_FLAC_NO_OGG
-    drflac_oggbs* pOggbs = NULL;
-#endif
-    drflac_uint64 firstFramePos;
-    drflac_uint64 seektablePos;
-    drflac_uint32 seekpointCount;
-    drflac_allocation_callbacks allocationCallbacks;
-    drflac* pFlac;
-
-    /* CPU support first. */
-    drflac__init_cpu_caps();
-
-    if (!drflac__init_private(&init, onRead, onSeek, onMeta, container, pUserData, pUserDataMD)) {
-        return NULL;
-    }
-
-    if (pAllocationCallbacks != NULL) {
-        allocationCallbacks = *pAllocationCallbacks;
-        if (allocationCallbacks.onFree == NULL || (allocationCallbacks.onMalloc == NULL && allocationCallbacks.onRealloc == NULL)) {
-            return NULL;    /* Invalid allocation callbacks. */
-        }
-    } else {
-        allocationCallbacks.pUserData = NULL;
-        allocationCallbacks.onMalloc  = drflac__malloc_default;
-        allocationCallbacks.onRealloc = drflac__realloc_default;
-        allocationCallbacks.onFree    = drflac__free_default;
-    }
-
-
-    /*
-    The size of the allocation for the drflac object needs to be large enough to fit the following:
-      1) The main members of the drflac structure
-      2) A block of memory large enough to store the decoded samples of the largest frame in the stream
-      3) If the container is Ogg, a drflac_oggbs object
-
-    The complicated part of the allocation is making sure there's enough room for the decoded samples, taking into consideration
-    the different SIMD instruction sets.
-    */
-    allocationSize = sizeof(drflac);
-
-    /*
-    The allocation size for decoded frames depends on the number of 32-bit integers that fit inside the largest SIMD vector
-    we are supporting.
-    */
-    if ((init.maxBlockSizeInPCMFrames % (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) == 0) {
-        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32)));
-    } else {
-        wholeSIMDVectorCountPerChannel = (init.maxBlockSizeInPCMFrames / (DRFLAC_MAX_SIMD_VECTOR_SIZE / sizeof(drflac_int32))) + 1;
-    }
-
-    decodedSamplesAllocationSize = wholeSIMDVectorCountPerChannel * DRFLAC_MAX_SIMD_VECTOR_SIZE * init.channels;
-
-    allocationSize += decodedSamplesAllocationSize;
-    allocationSize += DRFLAC_MAX_SIMD_VECTOR_SIZE;  /* Allocate extra bytes to ensure we have enough for alignment. */
-
-#ifndef DR_FLAC_NO_OGG
-    /* There's additional data required for Ogg streams. */
-    if (init.container == drflac_container_ogg) {
-        allocationSize += sizeof(drflac_oggbs);
-
-        pOggbs = (drflac_oggbs*)drflac__malloc_from_callbacks(sizeof(*pOggbs), &allocationCallbacks);
-        if (pOggbs == NULL) {
-            return NULL; /*DRFLAC_OUT_OF_MEMORY;*/
-        }
-
-        DRFLAC_ZERO_MEMORY(pOggbs, sizeof(*pOggbs));
-        pOggbs->onRead = onRead;
-        pOggbs->onSeek = onSeek;
-        pOggbs->pUserData = pUserData;
-        pOggbs->currentBytePos = init.oggFirstBytePos;
-        pOggbs->firstBytePos = init.oggFirstBytePos;
-        pOggbs->serialNumber = init.oggSerial;
-        pOggbs->bosPageHeader = init.oggBosHeader;
-        pOggbs->bytesRemainingInPage = 0;
-    }
-#endif
-
-    /*
-    This part is a bit awkward. We need to load the seektable so that it can be referenced in-memory, but I want the drflac object to
-    consist of only a single heap allocation. To do this, the size of the seek table needs to be known, which we determine when reading
-    and decoding the metadata.
-    */
-    firstFramePos  = 42;   /* <-- We know we are at byte 42 at this point. */
-    seektablePos   = 0;
-    seekpointCount = 0;
-    if (init.hasMetadataBlocks) {
-        drflac_read_proc onReadOverride = onRead;
-        drflac_seek_proc onSeekOverride = onSeek;
-        void* pUserDataOverride = pUserData;
-
-#ifndef DR_FLAC_NO_OGG
-        if (init.container == drflac_container_ogg) {
-            onReadOverride = drflac__on_read_ogg;
-            onSeekOverride = drflac__on_seek_ogg;
-            pUserDataOverride = (void*)pOggbs;
-        }
-#endif
-
-        if (!drflac__read_and_decode_metadata(onReadOverride, onSeekOverride, onMeta, pUserDataOverride, pUserDataMD, &firstFramePos, &seektablePos, &seekpointCount, &allocationCallbacks)) {
-        #ifndef DR_FLAC_NO_OGG
-            drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
-        #endif
-            return NULL;
-        }
-
-        allocationSize += seekpointCount * sizeof(drflac_seekpoint);
-    }
-
-
-    pFlac = (drflac*)drflac__malloc_from_callbacks(allocationSize, &allocationCallbacks);
-    if (pFlac == NULL) {
-    #ifndef DR_FLAC_NO_OGG
-        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
-    #endif
-        return NULL;
-    }
-
-    drflac__init_from_info(pFlac, &init);
-    pFlac->allocationCallbacks = allocationCallbacks;
-    pFlac->pDecodedSamples = (drflac_int32*)drflac_align((size_t)pFlac->pExtraData, DRFLAC_MAX_SIMD_VECTOR_SIZE);
-
-#ifndef DR_FLAC_NO_OGG
-    if (init.container == drflac_container_ogg) {
-        drflac_oggbs* pInternalOggbs = (drflac_oggbs*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize + (seekpointCount * sizeof(drflac_seekpoint)));
-        DRFLAC_COPY_MEMORY(pInternalOggbs, pOggbs, sizeof(*pOggbs));
-
-        /* At this point the pOggbs object has been handed over to pInternalOggbs and can be freed. */
-        drflac__free_from_callbacks(pOggbs, &allocationCallbacks);
-        pOggbs = NULL;
-
-        /* The Ogg bitstream needs to be layered on top of the original bitstream. */
-        pFlac->bs.onRead = drflac__on_read_ogg;
-        pFlac->bs.onSeek = drflac__on_seek_ogg;
-        pFlac->bs.pUserData = (void*)pInternalOggbs;
-        pFlac->_oggbs = (void*)pInternalOggbs;
-    }
-#endif
-
-    pFlac->firstFLACFramePosInBytes = firstFramePos;
-
-    /* NOTE: Seektables are not currently compatible with Ogg encapsulation (Ogg has its own accelerated seeking system). I may change this later, so I'm leaving this here for now. */
-#ifndef DR_FLAC_NO_OGG
-    if (init.container == drflac_container_ogg)
-    {
-        pFlac->pSeekpoints = NULL;
-        pFlac->seekpointCount = 0;
-    }
-    else
-#endif
-    {
-        /* If we have a seektable we need to load it now, making sure we move back to where we were previously. */
-        if (seektablePos != 0) {
-            pFlac->seekpointCount = seekpointCount;
-            pFlac->pSeekpoints = (drflac_seekpoint*)((drflac_uint8*)pFlac->pDecodedSamples + decodedSamplesAllocationSize);
-
-            DRFLAC_ASSERT(pFlac->bs.onSeek != NULL);
-            DRFLAC_ASSERT(pFlac->bs.onRead != NULL);
-
-            /* Seek to the seektable, then just read directly into our seektable buffer. */
-            if (pFlac->bs.onSeek(pFlac->bs.pUserData, (int)seektablePos, drflac_seek_origin_start)) {
-                drflac_uint32 iSeekpoint;
-
-                for (iSeekpoint = 0; iSeekpoint < seekpointCount; iSeekpoint += 1) {
-                    if (pFlac->bs.onRead(pFlac->bs.pUserData, pFlac->pSeekpoints + iSeekpoint, DRFLAC_SEEKPOINT_SIZE_IN_BYTES) == DRFLAC_SEEKPOINT_SIZE_IN_BYTES) {
-                        /* Endian swap. */
-                        pFlac->pSeekpoints[iSeekpoint].firstPCMFrame   = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].firstPCMFrame);
-                        pFlac->pSeekpoints[iSeekpoint].flacFrameOffset = drflac__be2host_64(pFlac->pSeekpoints[iSeekpoint].flacFrameOffset);
-                        pFlac->pSeekpoints[iSeekpoint].pcmFrameCount   = drflac__be2host_16(pFlac->pSeekpoints[iSeekpoint].pcmFrameCount);
-                    } else {
-                        /* Failed to read the seektable. Pretend we don't have one. */
-                        pFlac->pSeekpoints = NULL;
-                        pFlac->seekpointCount = 0;
-                        break;
-                    }
-                }
-
-                /* We need to seek back to where we were. If this fails it's a critical error. */
-                if (!pFlac->bs.onSeek(pFlac->bs.pUserData, (int)pFlac->firstFLACFramePosInBytes, drflac_seek_origin_start)) {
-                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
-                    return NULL;
-                }
-            } else {
-                /* Failed to seek to the seektable. Ominous sign, but for now we can just pretend we don't have one. */
-                pFlac->pSeekpoints = NULL;
-                pFlac->seekpointCount = 0;
-            }
-        }
-    }
-
-
-    /*
-    If we get here, but don't have a STREAMINFO block, it means we've opened the stream in relaxed mode and need to decode
-    the first frame.
-    */
-    if (!init.hasStreamInfoBlock) {
-        pFlac->currentFLACFrame.header = init.firstFrameHeader;
-        for (;;) {
-            drflac_result result = drflac__decode_flac_frame(pFlac);
-            if (result == DRFLAC_SUCCESS) {
-                break;
-            } else {
-                if (result == DRFLAC_CRC_MISMATCH) {
-                    if (!drflac__read_next_flac_frame_header(&pFlac->bs, pFlac->bitsPerSample, &pFlac->currentFLACFrame.header)) {
-                        drflac__free_from_callbacks(pFlac, &allocationCallbacks);
-                        return NULL;
-                    }
-                    continue;
-                } else {
-                    drflac__free_from_callbacks(pFlac, &allocationCallbacks);
-                    return NULL;
-                }
-            }
-        }
-    }
-
-    return pFlac;
-}
-
-
-
-#ifndef DR_FLAC_NO_STDIO
-#include <stdio.h>
-#ifndef DR_FLAC_NO_WCHAR
-#include <wchar.h>      /* For wcslen(), wcsrtombs() */
-#endif
-
-/* Errno */
-/* drflac_result_from_errno() is only used for fopen() and wfopen() so putting it inside DR_FLAC_NO_STDIO for now. If something else needs this later we can move it out. */
-#include <errno.h>
-static drflac_result drflac_result_from_errno(int e)
-{
-    switch (e)
-    {
-        case 0: return DRFLAC_SUCCESS;
-    #ifdef EPERM
-        case EPERM: return DRFLAC_INVALID_OPERATION;
-    #endif
-    #ifdef ENOENT
-        case ENOENT: return DRFLAC_DOES_NOT_EXIST;
-    #endif
-    #ifdef ESRCH
-        case ESRCH: return DRFLAC_DOES_NOT_EXIST;
-    #endif
-    #ifdef EINTR
-        case EINTR: return DRFLAC_INTERRUPT;
-    #endif
-    #ifdef EIO
-        case EIO: return DRFLAC_IO_ERROR;
-    #endif
-    #ifdef ENXIO
-        case ENXIO: return DRFLAC_DOES_NOT_EXIST;
-    #endif
-    #ifdef E2BIG
-        case E2BIG: return DRFLAC_INVALID_ARGS;
-    #endif
-    #ifdef ENOEXEC
-        case ENOEXEC: return DRFLAC_INVALID_FILE;
-    #endif
-    #ifdef EBADF
-        case EBADF: return DRFLAC_INVALID_FILE;
-    #endif
-    #ifdef ECHILD
-        case ECHILD: return DRFLAC_ERROR;
-    #endif
-    #ifdef EAGAIN
-        case EAGAIN: return DRFLAC_UNAVAILABLE;
-    #endif
-    #ifdef ENOMEM
-        case ENOMEM: return DRFLAC_OUT_OF_MEMORY;
-    #endif
-    #ifdef EACCES
-        case EACCES: return DRFLAC_ACCESS_DENIED;
-    #endif
-    #ifdef EFAULT
-        case EFAULT: return DRFLAC_BAD_ADDRESS;
-    #endif
-    #ifdef ENOTBLK
-        case ENOTBLK: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBUSY
-        case EBUSY: return DRFLAC_BUSY;
-    #endif
-    #ifdef EEXIST
-        case EEXIST: return DRFLAC_ALREADY_EXISTS;
-    #endif
-    #ifdef EXDEV
-        case EXDEV: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENODEV
-        case ENODEV: return DRFLAC_DOES_NOT_EXIST;
-    #endif
-    #ifdef ENOTDIR
-        case ENOTDIR: return DRFLAC_NOT_DIRECTORY;
-    #endif
-    #ifdef EISDIR
-        case EISDIR: return DRFLAC_IS_DIRECTORY;
-    #endif
-    #ifdef EINVAL
-        case EINVAL: return DRFLAC_INVALID_ARGS;
-    #endif
-    #ifdef ENFILE
-        case ENFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
-    #endif
-    #ifdef EMFILE
-        case EMFILE: return DRFLAC_TOO_MANY_OPEN_FILES;
-    #endif
-    #ifdef ENOTTY
-        case ENOTTY: return DRFLAC_INVALID_OPERATION;
-    #endif
-    #ifdef ETXTBSY
-        case ETXTBSY: return DRFLAC_BUSY;
-    #endif
-    #ifdef EFBIG
-        case EFBIG: return DRFLAC_TOO_BIG;
-    #endif
-    #ifdef ENOSPC
-        case ENOSPC: return DRFLAC_NO_SPACE;
-    #endif
-    #ifdef ESPIPE
-        case ESPIPE: return DRFLAC_BAD_SEEK;
-    #endif
-    #ifdef EROFS
-        case EROFS: return DRFLAC_ACCESS_DENIED;
-    #endif
-    #ifdef EMLINK
-        case EMLINK: return DRFLAC_TOO_MANY_LINKS;
-    #endif
-    #ifdef EPIPE
-        case EPIPE: return DRFLAC_BAD_PIPE;
-    #endif
-    #ifdef EDOM
-        case EDOM: return DRFLAC_OUT_OF_RANGE;
-    #endif
-    #ifdef ERANGE
-        case ERANGE: return DRFLAC_OUT_OF_RANGE;
-    #endif
-    #ifdef EDEADLK
-        case EDEADLK: return DRFLAC_DEADLOCK;
-    #endif
-    #ifdef ENAMETOOLONG
-        case ENAMETOOLONG: return DRFLAC_PATH_TOO_LONG;
-    #endif
-    #ifdef ENOLCK
-        case ENOLCK: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOSYS
-        case ENOSYS: return DRFLAC_NOT_IMPLEMENTED;
-    #endif
-    #ifdef ENOTEMPTY
-        case ENOTEMPTY: return DRFLAC_DIRECTORY_NOT_EMPTY;
-    #endif
-    #ifdef ELOOP
-        case ELOOP: return DRFLAC_TOO_MANY_LINKS;
-    #endif
-    #ifdef ENOMSG
-        case ENOMSG: return DRFLAC_NO_MESSAGE;
-    #endif
-    #ifdef EIDRM
-        case EIDRM: return DRFLAC_ERROR;
-    #endif
-    #ifdef ECHRNG
-        case ECHRNG: return DRFLAC_ERROR;
-    #endif
-    #ifdef EL2NSYNC
-        case EL2NSYNC: return DRFLAC_ERROR;
-    #endif
-    #ifdef EL3HLT
-        case EL3HLT: return DRFLAC_ERROR;
-    #endif
-    #ifdef EL3RST
-        case EL3RST: return DRFLAC_ERROR;
-    #endif
-    #ifdef ELNRNG
-        case ELNRNG: return DRFLAC_OUT_OF_RANGE;
-    #endif
-    #ifdef EUNATCH
-        case EUNATCH: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOCSI
-        case ENOCSI: return DRFLAC_ERROR;
-    #endif
-    #ifdef EL2HLT
-        case EL2HLT: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBADE
-        case EBADE: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBADR
-        case EBADR: return DRFLAC_ERROR;
-    #endif
-    #ifdef EXFULL
-        case EXFULL: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOANO
-        case ENOANO: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBADRQC
-        case EBADRQC: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBADSLT
-        case EBADSLT: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBFONT
-        case EBFONT: return DRFLAC_INVALID_FILE;
-    #endif
-    #ifdef ENOSTR
-        case ENOSTR: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENODATA
-        case ENODATA: return DRFLAC_NO_DATA_AVAILABLE;
-    #endif
-    #ifdef ETIME
-        case ETIME: return DRFLAC_TIMEOUT;
-    #endif
-    #ifdef ENOSR
-        case ENOSR: return DRFLAC_NO_DATA_AVAILABLE;
-    #endif
-    #ifdef ENONET
-        case ENONET: return DRFLAC_NO_NETWORK;
-    #endif
-    #ifdef ENOPKG
-        case ENOPKG: return DRFLAC_ERROR;
-    #endif
-    #ifdef EREMOTE
-        case EREMOTE: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOLINK
-        case ENOLINK: return DRFLAC_ERROR;
-    #endif
-    #ifdef EADV
-        case EADV: return DRFLAC_ERROR;
-    #endif
-    #ifdef ESRMNT
-        case ESRMNT: return DRFLAC_ERROR;
-    #endif
-    #ifdef ECOMM
-        case ECOMM: return DRFLAC_ERROR;
-    #endif
-    #ifdef EPROTO
-        case EPROTO: return DRFLAC_ERROR;
-    #endif
-    #ifdef EMULTIHOP
-        case EMULTIHOP: return DRFLAC_ERROR;
-    #endif
-    #ifdef EDOTDOT
-        case EDOTDOT: return DRFLAC_ERROR;
-    #endif
-    #ifdef EBADMSG
-        case EBADMSG: return DRFLAC_BAD_MESSAGE;
-    #endif
-    #ifdef EOVERFLOW
-        case EOVERFLOW: return DRFLAC_TOO_BIG;
-    #endif
-    #ifdef ENOTUNIQ
-        case ENOTUNIQ: return DRFLAC_NOT_UNIQUE;
-    #endif
-    #ifdef EBADFD
-        case EBADFD: return DRFLAC_ERROR;
-    #endif
-    #ifdef EREMCHG
-        case EREMCHG: return DRFLAC_ERROR;
-    #endif
-    #ifdef ELIBACC
-        case ELIBACC: return DRFLAC_ACCESS_DENIED;
-    #endif
-    #ifdef ELIBBAD
-        case ELIBBAD: return DRFLAC_INVALID_FILE;
-    #endif
-    #ifdef ELIBSCN
-        case ELIBSCN: return DRFLAC_INVALID_FILE;
-    #endif
-    #ifdef ELIBMAX
-        case ELIBMAX: return DRFLAC_ERROR;
-    #endif
-    #ifdef ELIBEXEC
-        case ELIBEXEC: return DRFLAC_ERROR;
-    #endif
-    #ifdef EILSEQ
-        case EILSEQ: return DRFLAC_INVALID_DATA;
-    #endif
-    #ifdef ERESTART
-        case ERESTART: return DRFLAC_ERROR;
-    #endif
-    #ifdef ESTRPIPE
-        case ESTRPIPE: return DRFLAC_ERROR;
-    #endif
-    #ifdef EUSERS
-        case EUSERS: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOTSOCK
-        case ENOTSOCK: return DRFLAC_NOT_SOCKET;
-    #endif
-    #ifdef EDESTADDRREQ
-        case EDESTADDRREQ: return DRFLAC_NO_ADDRESS;
-    #endif
-    #ifdef EMSGSIZE
-        case EMSGSIZE: return DRFLAC_TOO_BIG;
-    #endif
-    #ifdef EPROTOTYPE
-        case EPROTOTYPE: return DRFLAC_BAD_PROTOCOL;
-    #endif
-    #ifdef ENOPROTOOPT
-        case ENOPROTOOPT: return DRFLAC_PROTOCOL_UNAVAILABLE;
-    #endif
-    #ifdef EPROTONOSUPPORT
-        case EPROTONOSUPPORT: return DRFLAC_PROTOCOL_NOT_SUPPORTED;
-    #endif
-    #ifdef ESOCKTNOSUPPORT
-        case ESOCKTNOSUPPORT: return DRFLAC_SOCKET_NOT_SUPPORTED;
-    #endif
-    #ifdef EOPNOTSUPP
-        case EOPNOTSUPP: return DRFLAC_INVALID_OPERATION;
-    #endif
-    #ifdef EPFNOSUPPORT
-        case EPFNOSUPPORT: return DRFLAC_PROTOCOL_FAMILY_NOT_SUPPORTED;
-    #endif
-    #ifdef EAFNOSUPPORT
-        case EAFNOSUPPORT: return DRFLAC_ADDRESS_FAMILY_NOT_SUPPORTED;
-    #endif
-    #ifdef EADDRINUSE
-        case EADDRINUSE: return DRFLAC_ALREADY_IN_USE;
-    #endif
-    #ifdef EADDRNOTAVAIL
-        case EADDRNOTAVAIL: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENETDOWN
-        case ENETDOWN: return DRFLAC_NO_NETWORK;
-    #endif
-    #ifdef ENETUNREACH
-        case ENETUNREACH: return DRFLAC_NO_NETWORK;
-    #endif
-    #ifdef ENETRESET
-        case ENETRESET: return DRFLAC_NO_NETWORK;
-    #endif
-    #ifdef ECONNABORTED
-        case ECONNABORTED: return DRFLAC_NO_NETWORK;
-    #endif
-    #ifdef ECONNRESET
-        case ECONNRESET: return DRFLAC_CONNECTION_RESET;
-    #endif
-    #ifdef ENOBUFS
-        case ENOBUFS: return DRFLAC_NO_SPACE;
-    #endif
-    #ifdef EISCONN
-        case EISCONN: return DRFLAC_ALREADY_CONNECTED;
-    #endif
-    #ifdef ENOTCONN
-        case ENOTCONN: return DRFLAC_NOT_CONNECTED;
-    #endif
-    #ifdef ESHUTDOWN
-        case ESHUTDOWN: return DRFLAC_ERROR;
-    #endif
-    #ifdef ETOOMANYREFS
-        case ETOOMANYREFS: return DRFLAC_ERROR;
-    #endif
-    #ifdef ETIMEDOUT
-        case ETIMEDOUT: return DRFLAC_TIMEOUT;
-    #endif
-    #ifdef ECONNREFUSED
-        case ECONNREFUSED: return DRFLAC_CONNECTION_REFUSED;
-    #endif
-    #ifdef EHOSTDOWN
-        case EHOSTDOWN: return DRFLAC_NO_HOST;
-    #endif
-    #ifdef EHOSTUNREACH
-        case EHOSTUNREACH: return DRFLAC_NO_HOST;
-    #endif
-    #ifdef EALREADY
-        case EALREADY: return DRFLAC_IN_PROGRESS;
-    #endif
-    #ifdef EINPROGRESS
-        case EINPROGRESS: return DRFLAC_IN_PROGRESS;
-    #endif
-    #ifdef ESTALE
-        case ESTALE: return DRFLAC_INVALID_FILE;
-    #endif
-    #ifdef EUCLEAN
-        case EUCLEAN: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOTNAM
-        case ENOTNAM: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENAVAIL
-        case ENAVAIL: return DRFLAC_ERROR;
-    #endif
-    #ifdef EISNAM
-        case EISNAM: return DRFLAC_ERROR;
-    #endif
-    #ifdef EREMOTEIO
-        case EREMOTEIO: return DRFLAC_IO_ERROR;
-    #endif
-    #ifdef EDQUOT
-        case EDQUOT: return DRFLAC_NO_SPACE;
-    #endif
-    #ifdef ENOMEDIUM
-        case ENOMEDIUM: return DRFLAC_DOES_NOT_EXIST;
-    #endif
-    #ifdef EMEDIUMTYPE
-        case EMEDIUMTYPE: return DRFLAC_ERROR;
-    #endif
-    #ifdef ECANCELED
-        case ECANCELED: return DRFLAC_CANCELLED;
-    #endif
-    #ifdef ENOKEY
-        case ENOKEY: return DRFLAC_ERROR;
-    #endif
-    #ifdef EKEYEXPIRED
-        case EKEYEXPIRED: return DRFLAC_ERROR;
-    #endif
-    #ifdef EKEYREVOKED
-        case EKEYREVOKED: return DRFLAC_ERROR;
-    #endif
-    #ifdef EKEYREJECTED
-        case EKEYREJECTED: return DRFLAC_ERROR;
-    #endif
-    #ifdef EOWNERDEAD
-        case EOWNERDEAD: return DRFLAC_ERROR;
-    #endif
-    #ifdef ENOTRECOVERABLE
-        case ENOTRECOVERABLE: return DRFLAC_ERROR;
-    #endif
-    #ifdef ERFKILL
-        case ERFKILL: return DRFLAC_ERROR;
-    #endif
-    #ifdef EHWPOISON
-        case EHWPOISON: return DRFLAC_ERROR;
-    #endif
-        default: return DRFLAC_ERROR;
-    }
-}
-/* End Errno */
-
-/* fopen */
-static drflac_result drflac_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
-{
-#if defined(_MSC_VER) && _MSC_VER >= 1400
-    errno_t err;
-#endif
-
-    if (ppFile != NULL) {
-        *ppFile = NULL;  /* Safety. */
-    }
-
-    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
-        return DRFLAC_INVALID_ARGS;
-    }
-
-#if defined(_MSC_VER) && _MSC_VER >= 1400
-    err = fopen_s(ppFile, pFilePath, pOpenMode);
-    if (err != 0) {
-        return drflac_result_from_errno(err);
-    }
-#else
-#if defined(_WIN32) || defined(__APPLE__)
-    *ppFile = fopen(pFilePath, pOpenMode);
-#else
-    #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
-        *ppFile = fopen64(pFilePath, pOpenMode);
-    #else
-        *ppFile = fopen(pFilePath, pOpenMode);
-    #endif
-#endif
-    if (*ppFile == NULL) {
-        drflac_result result = drflac_result_from_errno(errno);
-        if (result == DRFLAC_SUCCESS) {
-            result = DRFLAC_ERROR;   /* Just a safety check to make sure we never ever return success when pFile == NULL. */
-        }
-
-        return result;
-    }
-#endif
-
-    return DRFLAC_SUCCESS;
-}
-
-/*
-_wfopen() isn't always available in all compilation environments.
-
-    * Windows only.
-    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
-    * MinGW-64 (both 32- and 64-bit) seems to support it.
-    * MinGW wraps it in !defined(__STRICT_ANSI__).
-    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
-
-This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
-fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
-*/
-#if defined(_WIN32)
-    #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
-        #define DRFLAC_HAS_WFOPEN
-    #endif
-#endif
-
-#ifndef DR_FLAC_NO_WCHAR
-static drflac_result drflac_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    if (ppFile != NULL) {
-        *ppFile = NULL;  /* Safety. */
-    }
-
-    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
-        return DRFLAC_INVALID_ARGS;
-    }
-
-#if defined(DRFLAC_HAS_WFOPEN)
-    {
-        /* Use _wfopen() on Windows. */
-    #if defined(_MSC_VER) && _MSC_VER >= 1400
-        errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
-        if (err != 0) {
-            return drflac_result_from_errno(err);
-        }
-    #else
-        *ppFile = _wfopen(pFilePath, pOpenMode);
-        if (*ppFile == NULL) {
-            return drflac_result_from_errno(errno);
-        }
-    #endif
-        (void)pAllocationCallbacks;
-    }
-#else
-    /*
-    Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
-	fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
-	that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
-    maintaining state. I've checked this with -std=c89 and it works, but if somebody gets a compiler
-	error I'll look into improving compatibility.
-    */
-
-	/*
-	Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
-	need to abort with an error. If you encounter a compiler lacking such support, add it to this list
-	and submit a bug report and it'll be added to the library upstream.
-	*/
-	#if defined(__DJGPP__)
-	{
-		/* Nothing to do here. This will fall through to the error check below. */
-	}
-	#else
-    {
-        mbstate_t mbs;
-        size_t lenMB;
-        const wchar_t* pFilePathTemp = pFilePath;
-        char* pFilePathMB = NULL;
-        char pOpenModeMB[32] = {0};
-
-        /* Get the length first. */
-        DRFLAC_ZERO_OBJECT(&mbs);
-        lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
-        if (lenMB == (size_t)-1) {
-            return drflac_result_from_errno(errno);
-        }
-
-        pFilePathMB = (char*)drflac__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
-        if (pFilePathMB == NULL) {
-            return DRFLAC_OUT_OF_MEMORY;
-        }
-
-        pFilePathTemp = pFilePath;
-        DRFLAC_ZERO_OBJECT(&mbs);
-        wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
-
-        /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
-        {
-            size_t i = 0;
-            for (;;) {
-                if (pOpenMode[i] == 0) {
-                    pOpenModeMB[i] = '\0';
-                    break;
-                }
-
-                pOpenModeMB[i] = (char)pOpenMode[i];
-                i += 1;
-            }
-        }
-
-        *ppFile = fopen(pFilePathMB, pOpenModeMB);
-
-        drflac__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
-    }
-	#endif
-
-    if (*ppFile == NULL) {
-        return DRFLAC_ERROR;
-    }
-#endif
-
-    return DRFLAC_SUCCESS;
-}
-#endif
-/* End fopen */
-
-static size_t drflac__on_read_stdio(void* pUserData, void* bufferOut, size_t bytesToRead)
-{
-    return fread(bufferOut, 1, bytesToRead, (FILE*)pUserData);
-}
-
-static drflac_bool32 drflac__on_seek_stdio(void* pUserData, int offset, drflac_seek_origin origin)
-{
-    DRFLAC_ASSERT(offset >= 0);  /* <-- Never seek backwards. */
-
-    return fseek((FILE*)pUserData, offset, (origin == drflac_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
-}
-
-
-DRFLAC_API drflac* drflac_open_file(const char* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-    FILE* pFile;
-
-    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
-        return NULL;
-    }
-
-    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        fclose(pFile);
-        return NULL;
-    }
-
-    return pFlac;
-}
-
-#ifndef DR_FLAC_NO_WCHAR
-DRFLAC_API drflac* drflac_open_file_w(const wchar_t* pFileName, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-    FILE* pFile;
-
-    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
-        return NULL;
-    }
-
-    pFlac = drflac_open(drflac__on_read_stdio, drflac__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        fclose(pFile);
-        return NULL;
-    }
-
-    return pFlac;
-}
-#endif
-
-DRFLAC_API drflac* drflac_open_file_with_metadata(const char* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-    FILE* pFile;
-
-    if (drflac_fopen(&pFile, pFileName, "rb") != DRFLAC_SUCCESS) {
-        return NULL;
-    }
-
-    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        fclose(pFile);
-        return pFlac;
-    }
-
-    return pFlac;
-}
-
-#ifndef DR_FLAC_NO_WCHAR
-DRFLAC_API drflac* drflac_open_file_with_metadata_w(const wchar_t* pFileName, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-    FILE* pFile;
-
-    if (drflac_wfopen(&pFile, pFileName, L"rb", pAllocationCallbacks) != DRFLAC_SUCCESS) {
-        return NULL;
-    }
-
-    pFlac = drflac_open_with_metadata_private(drflac__on_read_stdio, drflac__on_seek_stdio, onMeta, drflac_container_unknown, (void*)pFile, pUserData, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        fclose(pFile);
-        return pFlac;
-    }
-
-    return pFlac;
-}
-#endif
-#endif  /* DR_FLAC_NO_STDIO */
-
-static size_t drflac__on_read_memory(void* pUserData, void* bufferOut, size_t bytesToRead)
-{
-    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
-    size_t bytesRemaining;
-
-    DRFLAC_ASSERT(memoryStream != NULL);
-    DRFLAC_ASSERT(memoryStream->dataSize >= memoryStream->currentReadPos);
-
-    bytesRemaining = memoryStream->dataSize - memoryStream->currentReadPos;
-    if (bytesToRead > bytesRemaining) {
-        bytesToRead = bytesRemaining;
-    }
-
-    if (bytesToRead > 0) {
-        DRFLAC_COPY_MEMORY(bufferOut, memoryStream->data + memoryStream->currentReadPos, bytesToRead);
-        memoryStream->currentReadPos += bytesToRead;
-    }
-
-    return bytesToRead;
-}
-
-static drflac_bool32 drflac__on_seek_memory(void* pUserData, int offset, drflac_seek_origin origin)
-{
-    drflac__memory_stream* memoryStream = (drflac__memory_stream*)pUserData;
-
-    DRFLAC_ASSERT(memoryStream != NULL);
-    DRFLAC_ASSERT(offset >= 0); /* <-- Never seek backwards. */
-
-    if (offset > (drflac_int64)memoryStream->dataSize) {
-        return DRFLAC_FALSE;
-    }
-
-    if (origin == drflac_seek_origin_current) {
-        if (memoryStream->currentReadPos + offset <= memoryStream->dataSize) {
-            memoryStream->currentReadPos += offset;
-        } else {
-            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
-        }
-    } else {
-        if ((drflac_uint32)offset <= memoryStream->dataSize) {
-            memoryStream->currentReadPos = offset;
-        } else {
-            return DRFLAC_FALSE;  /* Trying to seek too far forward. */
-        }
-    }
-
-    return DRFLAC_TRUE;
-}
-
-DRFLAC_API drflac* drflac_open_memory(const void* pData, size_t dataSize, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac__memory_stream memoryStream;
-    drflac* pFlac;
-
-    memoryStream.data = (const drflac_uint8*)pData;
-    memoryStream.dataSize = dataSize;
-    memoryStream.currentReadPos = 0;
-    pFlac = drflac_open(drflac__on_read_memory, drflac__on_seek_memory, &memoryStream, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    pFlac->memoryStream = memoryStream;
-
-    /* This is an awful hack... */
-#ifndef DR_FLAC_NO_OGG
-    if (pFlac->container == drflac_container_ogg)
-    {
-        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
-        oggbs->pUserData = &pFlac->memoryStream;
-    }
-    else
-#endif
-    {
-        pFlac->bs.pUserData = &pFlac->memoryStream;
-    }
-
-    return pFlac;
-}
-
-DRFLAC_API drflac* drflac_open_memory_with_metadata(const void* pData, size_t dataSize, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac__memory_stream memoryStream;
-    drflac* pFlac;
-
-    memoryStream.data = (const drflac_uint8*)pData;
-    memoryStream.dataSize = dataSize;
-    memoryStream.currentReadPos = 0;
-    pFlac = drflac_open_with_metadata_private(drflac__on_read_memory, drflac__on_seek_memory, onMeta, drflac_container_unknown, &memoryStream, pUserData, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    pFlac->memoryStream = memoryStream;
-
-    /* This is an awful hack... */
-#ifndef DR_FLAC_NO_OGG
-    if (pFlac->container == drflac_container_ogg)
-    {
-        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
-        oggbs->pUserData = &pFlac->memoryStream;
-    }
-    else
-#endif
-    {
-        pFlac->bs.pUserData = &pFlac->memoryStream;
-    }
-
-    return pFlac;
-}
-
-
-
-DRFLAC_API drflac* drflac_open(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    return drflac_open_with_metadata_private(onRead, onSeek, NULL, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
-}
-DRFLAC_API drflac* drflac_open_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    return drflac_open_with_metadata_private(onRead, onSeek, NULL, container, pUserData, pUserData, pAllocationCallbacks);
-}
-
-DRFLAC_API drflac* drflac_open_with_metadata(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, drflac_container_unknown, pUserData, pUserData, pAllocationCallbacks);
-}
-DRFLAC_API drflac* drflac_open_with_metadata_relaxed(drflac_read_proc onRead, drflac_seek_proc onSeek, drflac_meta_proc onMeta, drflac_container container, void* pUserData, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    return drflac_open_with_metadata_private(onRead, onSeek, onMeta, container, pUserData, pUserData, pAllocationCallbacks);
-}
-
-DRFLAC_API void drflac_close(drflac* pFlac)
-{
-    if (pFlac == NULL) {
-        return;
-    }
-
-#ifndef DR_FLAC_NO_STDIO
-    /*
-    If we opened the file with drflac_open_file() we will want to close the file handle. We can know whether or not drflac_open_file()
-    was used by looking at the callbacks.
-    */
-    if (pFlac->bs.onRead == drflac__on_read_stdio) {
-        fclose((FILE*)pFlac->bs.pUserData);
-    }
-
-#ifndef DR_FLAC_NO_OGG
-    /* Need to clean up Ogg streams a bit differently due to the way the bit streaming is chained. */
-    if (pFlac->container == drflac_container_ogg) {
-        drflac_oggbs* oggbs = (drflac_oggbs*)pFlac->_oggbs;
-        DRFLAC_ASSERT(pFlac->bs.onRead == drflac__on_read_ogg);
-
-        if (oggbs->onRead == drflac__on_read_stdio) {
-            fclose((FILE*)oggbs->pUserData);
-        }
-    }
-#endif
-#endif
-
-    drflac__free_from_callbacks(pFlac, &pFlac->allocationCallbacks);
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    for (i = 0; i < frameCount; ++i) {
-        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
-
-        drflac_uint32 right0 = left0 - side0;
-        drflac_uint32 right1 = left1 - side1;
-        drflac_uint32 right2 = left2 - side2;
-        drflac_uint32 right3 = left3 - side3;
-
-        pOutputSamples[i*8+0] = (drflac_int32)left0;
-        pOutputSamples[i*8+1] = (drflac_int32)right0;
-        pOutputSamples[i*8+2] = (drflac_int32)left1;
-        pOutputSamples[i*8+3] = (drflac_int32)right1;
-        pOutputSamples[i*8+4] = (drflac_int32)left2;
-        pOutputSamples[i*8+5] = (drflac_int32)right2;
-        pOutputSamples[i*8+6] = (drflac_int32)left3;
-        pOutputSamples[i*8+7] = (drflac_int32)right3;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-        __m128i right = _mm_sub_epi32(left, side);
-
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-    int32x4_t shift0_4;
-    int32x4_t shift1_4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    shift0_4 = vdupq_n_s32(shift0);
-    shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        uint32x4_t left;
-        uint32x4_t side;
-        uint32x4_t right;
-
-        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
-        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
-        right = vsubq_u32(left, side);
-
-        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    for (i = 0; i < frameCount; ++i) {
-        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
-
-        drflac_uint32 left0 = right0 + side0;
-        drflac_uint32 left1 = right1 + side1;
-        drflac_uint32 left2 = right2 + side2;
-        drflac_uint32 left3 = right3 + side3;
-
-        pOutputSamples[i*8+0] = (drflac_int32)left0;
-        pOutputSamples[i*8+1] = (drflac_int32)right0;
-        pOutputSamples[i*8+2] = (drflac_int32)left1;
-        pOutputSamples[i*8+3] = (drflac_int32)right1;
-        pOutputSamples[i*8+4] = (drflac_int32)left2;
-        pOutputSamples[i*8+5] = (drflac_int32)right2;
-        pOutputSamples[i*8+6] = (drflac_int32)left3;
-        pOutputSamples[i*8+7] = (drflac_int32)right3;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-        __m128i left  = _mm_add_epi32(right, side);
-
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-    int32x4_t shift0_4;
-    int32x4_t shift1_4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    shift0_4 = vdupq_n_s32(shift0);
-    shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        uint32x4_t side;
-        uint32x4_t right;
-        uint32x4_t left;
-
-        side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
-        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
-        left  = vaddq_u32(right, side);
-
-        drflac__vst2q_u32((drflac_uint32*)pOutputSamples + i*8, vzipq_u32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left;
-        pOutputSamples[i*2+1] = (drflac_int32)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    for (drflac_uint64 i = 0; i < frameCount; ++i) {
-        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-        mid = (mid << 1) | (side & 0x01);
-
-        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
-        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_int32 shift = unusedBitsPerSample;
-
-    if (shift > 0) {
-        shift -= 1;
-        for (i = 0; i < frameCount4; ++i) {
-            drflac_uint32 temp0L;
-            drflac_uint32 temp1L;
-            drflac_uint32 temp2L;
-            drflac_uint32 temp3L;
-            drflac_uint32 temp0R;
-            drflac_uint32 temp1R;
-            drflac_uint32 temp2R;
-            drflac_uint32 temp3R;
-
-            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-
-            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid0 = (mid0 << 1) | (side0 & 0x01);
-            mid1 = (mid1 << 1) | (side1 & 0x01);
-            mid2 = (mid2 << 1) | (side2 & 0x01);
-            mid3 = (mid3 << 1) | (side3 & 0x01);
-
-            temp0L = (mid0 + side0) << shift;
-            temp1L = (mid1 + side1) << shift;
-            temp2L = (mid2 + side2) << shift;
-            temp3L = (mid3 + side3) << shift;
-
-            temp0R = (mid0 - side0) << shift;
-            temp1R = (mid1 - side1) << shift;
-            temp2R = (mid2 - side2) << shift;
-            temp3R = (mid3 - side3) << shift;
-
-            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
-            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
-            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
-            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
-            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
-            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
-            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
-            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
-        }
-    } else {
-        for (i = 0; i < frameCount4; ++i) {
-            drflac_uint32 temp0L;
-            drflac_uint32 temp1L;
-            drflac_uint32 temp2L;
-            drflac_uint32 temp3L;
-            drflac_uint32 temp0R;
-            drflac_uint32 temp1R;
-            drflac_uint32 temp2R;
-            drflac_uint32 temp3R;
-
-            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-
-            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid0 = (mid0 << 1) | (side0 & 0x01);
-            mid1 = (mid1 << 1) | (side1 & 0x01);
-            mid2 = (mid2 << 1) | (side2 & 0x01);
-            mid3 = (mid3 << 1) | (side3 & 0x01);
-
-            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
-            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
-            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
-            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
-
-            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
-            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
-            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
-            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
-
-            pOutputSamples[i*8+0] = (drflac_int32)temp0L;
-            pOutputSamples[i*8+1] = (drflac_int32)temp0R;
-            pOutputSamples[i*8+2] = (drflac_int32)temp1L;
-            pOutputSamples[i*8+3] = (drflac_int32)temp1R;
-            pOutputSamples[i*8+4] = (drflac_int32)temp2L;
-            pOutputSamples[i*8+5] = (drflac_int32)temp2R;
-            pOutputSamples[i*8+6] = (drflac_int32)temp3L;
-            pOutputSamples[i*8+7] = (drflac_int32)temp3R;
-        }
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-        mid = (mid << 1) | (side & 0x01);
-
-        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample);
-        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample);
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_int32 shift = unusedBitsPerSample;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    if (shift == 0) {
-        for (i = 0; i < frameCount4; ++i) {
-            __m128i mid;
-            __m128i side;
-            __m128i left;
-            __m128i right;
-
-            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
-
-            left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
-            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
-
-            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
-            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
-            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
-        }
-    } else {
-        shift -= 1;
-        for (i = 0; i < frameCount4; ++i) {
-            __m128i mid;
-            __m128i side;
-            __m128i left;
-            __m128i right;
-
-            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
-
-            left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
-            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
-
-            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
-            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
-            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
-        }
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_int32 shift = unusedBitsPerSample;
-    int32x4_t  wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
-    int32x4_t  wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
-    uint32x4_t one4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-    one4         = vdupq_n_u32(1);
-
-    if (shift == 0) {
-        for (i = 0; i < frameCount4; ++i) {
-            uint32x4_t mid;
-            uint32x4_t side;
-            int32x4_t left;
-            int32x4_t right;
-
-            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
-            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
-
-            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
-
-            left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
-            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
-
-            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int32)(mid + side) >> 1;
-            pOutputSamples[i*2+1] = (drflac_int32)(mid - side) >> 1;
-        }
-    } else {
-        int32x4_t shift4;
-
-        shift -= 1;
-        shift4 = vdupq_n_s32(shift);
-
-        for (i = 0; i < frameCount4; ++i) {
-            uint32x4_t mid;
-            uint32x4_t side;
-            int32x4_t left;
-            int32x4_t right;
-
-            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
-            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
-
-            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, one4));
-
-            left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
-            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
-
-            drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift);
-            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift);
-        }
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    for (drflac_uint64 i = 0; i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample));
-        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample));
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
-
-        pOutputSamples[i*8+0] = (drflac_int32)tempL0;
-        pOutputSamples[i*8+1] = (drflac_int32)tempR0;
-        pOutputSamples[i*8+2] = (drflac_int32)tempL1;
-        pOutputSamples[i*8+3] = (drflac_int32)tempR1;
-        pOutputSamples[i*8+4] = (drflac_int32)tempL2;
-        pOutputSamples[i*8+5] = (drflac_int32)tempR2;
-        pOutputSamples[i*8+6] = (drflac_int32)tempL3;
-        pOutputSamples[i*8+7] = (drflac_int32)tempR3;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
-        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 0), _mm_unpacklo_epi32(left, right));
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8 + 4), _mm_unpackhi_epi32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
-        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    int32x4_t shift4_0 = vdupq_n_s32(shift0);
-    int32x4_t shift4_1 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        int32x4_t left;
-        int32x4_t right;
-
-        left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift4_0));
-        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift4_1));
-
-        drflac__vst2q_s32(pOutputSamples + i*8, vzipq_s32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0);
-        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int32* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s32(drflac* pFlac, drflac_uint64 framesToRead, drflac_int32* pBufferOut)
-{
-    drflac_uint64 framesRead;
-    drflac_uint32 unusedBitsPerSample;
-
-    if (pFlac == NULL || framesToRead == 0) {
-        return 0;
-    }
-
-    if (pBufferOut == NULL) {
-        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
-    }
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
-    unusedBitsPerSample = 32 - pFlac->bitsPerSample;
-
-    framesRead = 0;
-    while (framesToRead > 0) {
-        /* If we've run out of samples in this frame, go to the next. */
-        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
-            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
-                break;  /* Couldn't read the next frame, so just break from the loop and return. */
-            }
-        } else {
-            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
-            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
-            drflac_uint64 frameCountThisIteration = framesToRead;
-
-            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
-                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
-            }
-
-            if (channelCount == 2) {
-                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
-                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
-
-                switch (pFlac->currentFLACFrame.header.channelAssignment)
-                {
-                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
-                    {
-                        drflac_read_pcm_frames_s32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
-                    {
-                        drflac_read_pcm_frames_s32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
-                    {
-                        drflac_read_pcm_frames_s32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
-                    default:
-                    {
-                        drflac_read_pcm_frames_s32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-                }
-            } else {
-                /* Generic interleaving. */
-                drflac_uint64 i;
-                for (i = 0; i < frameCountThisIteration; ++i) {
-                    unsigned int j;
-                    for (j = 0; j < channelCount; ++j) {
-                        pBufferOut[(i*channelCount)+j] = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
-                    }
-                }
-            }
-
-            framesRead                += frameCountThisIteration;
-            pBufferOut                += frameCountThisIteration * channelCount;
-            framesToRead              -= frameCountThisIteration;
-            pFlac->currentPCMFrame    += frameCountThisIteration;
-            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
-        }
-    }
-
-    return framesRead;
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    for (i = 0; i < frameCount; ++i) {
-        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-        drflac_uint32 right = left - side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
-
-        drflac_uint32 right0 = left0 - side0;
-        drflac_uint32 right1 = left1 - side1;
-        drflac_uint32 right2 = left2 - side2;
-        drflac_uint32 right3 = left3 - side3;
-
-        left0  >>= 16;
-        left1  >>= 16;
-        left2  >>= 16;
-        left3  >>= 16;
-
-        right0 >>= 16;
-        right1 >>= 16;
-        right2 >>= 16;
-        right3 >>= 16;
-
-        pOutputSamples[i*8+0] = (drflac_int16)left0;
-        pOutputSamples[i*8+1] = (drflac_int16)right0;
-        pOutputSamples[i*8+2] = (drflac_int16)left1;
-        pOutputSamples[i*8+3] = (drflac_int16)right1;
-        pOutputSamples[i*8+4] = (drflac_int16)left2;
-        pOutputSamples[i*8+5] = (drflac_int16)right2;
-        pOutputSamples[i*8+6] = (drflac_int16)left3;
-        pOutputSamples[i*8+7] = (drflac_int16)right3;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-        __m128i right = _mm_sub_epi32(left, side);
-
-        left  = _mm_srai_epi32(left,  16);
-        right = _mm_srai_epi32(right, 16);
-
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-    int32x4_t shift0_4;
-    int32x4_t shift1_4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    shift0_4 = vdupq_n_s32(shift0);
-    shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        uint32x4_t left;
-        uint32x4_t side;
-        uint32x4_t right;
-
-        left  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
-        side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
-        right = vsubq_u32(left, side);
-
-        left  = vshrq_n_u32(left,  16);
-        right = vshrq_n_u32(right, 16);
-
-        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s16__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s16__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    for (i = 0; i < frameCount; ++i) {
-        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-        drflac_uint32 left  = right + side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
-
-        drflac_uint32 left0 = right0 + side0;
-        drflac_uint32 left1 = right1 + side1;
-        drflac_uint32 left2 = right2 + side2;
-        drflac_uint32 left3 = right3 + side3;
-
-        left0  >>= 16;
-        left1  >>= 16;
-        left2  >>= 16;
-        left3  >>= 16;
-
-        right0 >>= 16;
-        right1 >>= 16;
-        right2 >>= 16;
-        right3 >>= 16;
-
-        pOutputSamples[i*8+0] = (drflac_int16)left0;
-        pOutputSamples[i*8+1] = (drflac_int16)right0;
-        pOutputSamples[i*8+2] = (drflac_int16)left1;
-        pOutputSamples[i*8+3] = (drflac_int16)right1;
-        pOutputSamples[i*8+4] = (drflac_int16)left2;
-        pOutputSamples[i*8+5] = (drflac_int16)right2;
-        pOutputSamples[i*8+6] = (drflac_int16)left3;
-        pOutputSamples[i*8+7] = (drflac_int16)right3;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-        __m128i left  = _mm_add_epi32(right, side);
-
-        left  = _mm_srai_epi32(left,  16);
-        right = _mm_srai_epi32(right, 16);
-
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-    int32x4_t shift0_4;
-    int32x4_t shift1_4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    shift0_4 = vdupq_n_s32(shift0);
-    shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        uint32x4_t side;
-        uint32x4_t right;
-        uint32x4_t left;
-
-        side  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
-        right = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
-        left  = vaddq_u32(right, side);
-
-        left  = vshrq_n_u32(left,  16);
-        right = vshrq_n_u32(right, 16);
-
-        drflac__vst2q_u16((drflac_uint16*)pOutputSamples + i*8, vzip_u16(vmovn_u32(left), vmovn_u32(right)));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        left  >>= 16;
-        right >>= 16;
-
-        pOutputSamples[i*2+0] = (drflac_int16)left;
-        pOutputSamples[i*2+1] = (drflac_int16)right;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s16__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s16__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    for (drflac_uint64 i = 0; i < frameCount; ++i) {
-        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-        mid = (mid << 1) | (side & 0x01);
-
-        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
-        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift = unusedBitsPerSample;
-
-    if (shift > 0) {
-        shift -= 1;
-        for (i = 0; i < frameCount4; ++i) {
-            drflac_uint32 temp0L;
-            drflac_uint32 temp1L;
-            drflac_uint32 temp2L;
-            drflac_uint32 temp3L;
-            drflac_uint32 temp0R;
-            drflac_uint32 temp1R;
-            drflac_uint32 temp2R;
-            drflac_uint32 temp3R;
-
-            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-
-            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid0 = (mid0 << 1) | (side0 & 0x01);
-            mid1 = (mid1 << 1) | (side1 & 0x01);
-            mid2 = (mid2 << 1) | (side2 & 0x01);
-            mid3 = (mid3 << 1) | (side3 & 0x01);
-
-            temp0L = (mid0 + side0) << shift;
-            temp1L = (mid1 + side1) << shift;
-            temp2L = (mid2 + side2) << shift;
-            temp3L = (mid3 + side3) << shift;
-
-            temp0R = (mid0 - side0) << shift;
-            temp1R = (mid1 - side1) << shift;
-            temp2R = (mid2 - side2) << shift;
-            temp3R = (mid3 - side3) << shift;
-
-            temp0L >>= 16;
-            temp1L >>= 16;
-            temp2L >>= 16;
-            temp3L >>= 16;
-
-            temp0R >>= 16;
-            temp1R >>= 16;
-            temp2R >>= 16;
-            temp3R >>= 16;
-
-            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
-            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
-            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
-            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
-            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
-            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
-            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
-            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
-        }
-    } else {
-        for (i = 0; i < frameCount4; ++i) {
-            drflac_uint32 temp0L;
-            drflac_uint32 temp1L;
-            drflac_uint32 temp2L;
-            drflac_uint32 temp3L;
-            drflac_uint32 temp0R;
-            drflac_uint32 temp1R;
-            drflac_uint32 temp2R;
-            drflac_uint32 temp3R;
-
-            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-
-            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid0 = (mid0 << 1) | (side0 & 0x01);
-            mid1 = (mid1 << 1) | (side1 & 0x01);
-            mid2 = (mid2 << 1) | (side2 & 0x01);
-            mid3 = (mid3 << 1) | (side3 & 0x01);
-
-            temp0L = ((drflac_int32)(mid0 + side0) >> 1);
-            temp1L = ((drflac_int32)(mid1 + side1) >> 1);
-            temp2L = ((drflac_int32)(mid2 + side2) >> 1);
-            temp3L = ((drflac_int32)(mid3 + side3) >> 1);
-
-            temp0R = ((drflac_int32)(mid0 - side0) >> 1);
-            temp1R = ((drflac_int32)(mid1 - side1) >> 1);
-            temp2R = ((drflac_int32)(mid2 - side2) >> 1);
-            temp3R = ((drflac_int32)(mid3 - side3) >> 1);
-
-            temp0L >>= 16;
-            temp1L >>= 16;
-            temp2L >>= 16;
-            temp3L >>= 16;
-
-            temp0R >>= 16;
-            temp1R >>= 16;
-            temp2R >>= 16;
-            temp3R >>= 16;
-
-            pOutputSamples[i*8+0] = (drflac_int16)temp0L;
-            pOutputSamples[i*8+1] = (drflac_int16)temp0R;
-            pOutputSamples[i*8+2] = (drflac_int16)temp1L;
-            pOutputSamples[i*8+3] = (drflac_int16)temp1R;
-            pOutputSamples[i*8+4] = (drflac_int16)temp2L;
-            pOutputSamples[i*8+5] = (drflac_int16)temp2R;
-            pOutputSamples[i*8+6] = (drflac_int16)temp3L;
-            pOutputSamples[i*8+7] = (drflac_int16)temp3R;
-        }
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-        mid = (mid << 1) | (side & 0x01);
-
-        pOutputSamples[i*2+0] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) >> 16);
-        pOutputSamples[i*2+1] = (drflac_int16)(((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) >> 16);
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift = unusedBitsPerSample;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    if (shift == 0) {
-        for (i = 0; i < frameCount4; ++i) {
-            __m128i mid;
-            __m128i side;
-            __m128i left;
-            __m128i right;
-
-            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
-
-            left  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
-            right = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
-
-            left  = _mm_srai_epi32(left,  16);
-            right = _mm_srai_epi32(right, 16);
-
-            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
-            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
-        }
-    } else {
-        shift -= 1;
-        for (i = 0; i < frameCount4; ++i) {
-            __m128i mid;
-            __m128i side;
-            __m128i left;
-            __m128i right;
-
-            mid   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-            side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-            mid   = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
-
-            left  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
-            right = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
-
-            left  = _mm_srai_epi32(left,  16);
-            right = _mm_srai_epi32(right, 16);
-
-            _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
-            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
-        }
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift = unusedBitsPerSample;
-    int32x4_t wbpsShift0_4; /* wbps = Wasted Bits Per Sample */
-    int32x4_t wbpsShift1_4; /* wbps = Wasted Bits Per Sample */
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    wbpsShift0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-    wbpsShift1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-    if (shift == 0) {
-        for (i = 0; i < frameCount4; ++i) {
-            uint32x4_t mid;
-            uint32x4_t side;
-            int32x4_t left;
-            int32x4_t right;
-
-            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
-            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
-
-            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
-
-            left  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
-            right = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
-
-            left  = vshrq_n_s32(left,  16);
-            right = vshrq_n_s32(right, 16);
-
-            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int16)(((drflac_int32)(mid + side) >> 1) >> 16);
-            pOutputSamples[i*2+1] = (drflac_int16)(((drflac_int32)(mid - side) >> 1) >> 16);
-        }
-    } else {
-        int32x4_t shift4;
-
-        shift -= 1;
-        shift4 = vdupq_n_s32(shift);
-
-        for (i = 0; i < frameCount4; ++i) {
-            uint32x4_t mid;
-            uint32x4_t side;
-            int32x4_t left;
-            int32x4_t right;
-
-            mid   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbpsShift0_4);
-            side  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbpsShift1_4);
-
-            mid   = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
-
-            left  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
-            right = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
-
-            left  = vshrq_n_s32(left,  16);
-            right = vshrq_n_s32(right, 16);
-
-            drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int16)(((mid + side) << shift) >> 16);
-            pOutputSamples[i*2+1] = (drflac_int16)(((mid - side) << shift) >> 16);
-        }
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s16__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s16__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    for (drflac_uint64 i = 0; i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) >> 16);
-        pOutputSamples[i*2+1] = (drflac_int16)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) >> 16);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
-
-        tempL0 >>= 16;
-        tempL1 >>= 16;
-        tempL2 >>= 16;
-        tempL3 >>= 16;
-
-        tempR0 >>= 16;
-        tempR1 >>= 16;
-        tempR2 >>= 16;
-        tempR3 >>= 16;
-
-        pOutputSamples[i*8+0] = (drflac_int16)tempL0;
-        pOutputSamples[i*8+1] = (drflac_int16)tempR0;
-        pOutputSamples[i*8+2] = (drflac_int16)tempL1;
-        pOutputSamples[i*8+3] = (drflac_int16)tempR1;
-        pOutputSamples[i*8+4] = (drflac_int16)tempL2;
-        pOutputSamples[i*8+5] = (drflac_int16)tempR2;
-        pOutputSamples[i*8+6] = (drflac_int16)tempL3;
-        pOutputSamples[i*8+7] = (drflac_int16)tempR3;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
-        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-
-        left  = _mm_srai_epi32(left,  16);
-        right = _mm_srai_epi32(right, 16);
-
-        /* At this point we have results. We can now pack and interleave these into a single __m128i object and then store the in the output buffer. */
-        _mm_storeu_si128((__m128i*)(pOutputSamples + i*8), drflac__mm_packs_interleaved_epi32(left, right));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
-        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    int32x4_t shift0_4 = vdupq_n_s32(shift0);
-    int32x4_t shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        int32x4_t left;
-        int32x4_t right;
-
-        left  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
-        right = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
-
-        left  = vshrq_n_s32(left,  16);
-        right = vshrq_n_s32(right, 16);
-
-        drflac__vst2q_s16(pOutputSamples + i*8, vzip_s16(vmovn_s32(left), vmovn_s32(right)));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int16)((pInputSamples0U32[i] << shift0) >> 16);
-        pOutputSamples[i*2+1] = (drflac_int16)((pInputSamples1U32[i] << shift1) >> 16);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_s16__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, drflac_int16* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_s16__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_s16__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_s16__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-DRFLAC_API drflac_uint64 drflac_read_pcm_frames_s16(drflac* pFlac, drflac_uint64 framesToRead, drflac_int16* pBufferOut)
-{
-    drflac_uint64 framesRead;
-    drflac_uint32 unusedBitsPerSample;
-
-    if (pFlac == NULL || framesToRead == 0) {
-        return 0;
-    }
-
-    if (pBufferOut == NULL) {
-        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
-    }
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
-    unusedBitsPerSample = 32 - pFlac->bitsPerSample;
-
-    framesRead = 0;
-    while (framesToRead > 0) {
-        /* If we've run out of samples in this frame, go to the next. */
-        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
-            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
-                break;  /* Couldn't read the next frame, so just break from the loop and return. */
-            }
-        } else {
-            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
-            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
-            drflac_uint64 frameCountThisIteration = framesToRead;
-
-            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
-                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
-            }
-
-            if (channelCount == 2) {
-                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
-                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
-
-                switch (pFlac->currentFLACFrame.header.channelAssignment)
-                {
-                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
-                    {
-                        drflac_read_pcm_frames_s16__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
-                    {
-                        drflac_read_pcm_frames_s16__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
-                    {
-                        drflac_read_pcm_frames_s16__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
-                    default:
-                    {
-                        drflac_read_pcm_frames_s16__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-                }
-            } else {
-                /* Generic interleaving. */
-                drflac_uint64 i;
-                for (i = 0; i < frameCountThisIteration; ++i) {
-                    unsigned int j;
-                    for (j = 0; j < channelCount; ++j) {
-                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
-                        pBufferOut[(i*channelCount)+j] = (drflac_int16)(sampleS32 >> 16);
-                    }
-                }
-            }
-
-            framesRead                += frameCountThisIteration;
-            pBufferOut                += frameCountThisIteration * channelCount;
-            framesToRead              -= frameCountThisIteration;
-            pFlac->currentPCMFrame    += frameCountThisIteration;
-            pFlac->currentFLACFrame.pcmFramesRemaining -= (drflac_uint32)frameCountThisIteration;
-        }
-    }
-
-    return framesRead;
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    for (i = 0; i < frameCount; ++i) {
-        drflac_uint32 left  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-        drflac_uint32 side  = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
-        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-    float factor = 1 / 2147483648.0;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 left0 = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 left1 = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 left2 = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 left3 = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 side0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 side1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 side2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 side3 = pInputSamples1U32[i*4+3] << shift1;
-
-        drflac_uint32 right0 = left0 - side0;
-        drflac_uint32 right1 = left1 - side1;
-        drflac_uint32 right2 = left2 - side2;
-        drflac_uint32 right3 = left3 - side3;
-
-        pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
-        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
-        pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
-        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
-        pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
-        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
-        pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
-        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
-        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
-    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
-    __m128 factor;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    factor = _mm_set1_ps(1.0f / 8388608.0f);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i left  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-        __m128i right = _mm_sub_epi32(left, side);
-        __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
-        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
-
-        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
-        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
-        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
-    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
-    float32x4_t factor4;
-    int32x4_t shift0_4;
-    int32x4_t shift1_4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    factor4  = vdupq_n_f32(1.0f / 8388608.0f);
-    shift0_4 = vdupq_n_s32(shift0);
-    shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        uint32x4_t left;
-        uint32x4_t side;
-        uint32x4_t right;
-        float32x4_t leftf;
-        float32x4_t rightf;
-
-        left   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
-        side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
-        right  = vsubq_u32(left, side);
-        leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
-        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
-
-        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 left  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 side  = pInputSamples1U32[i] << shift1;
-        drflac_uint32 right = left - side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
-        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_left_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_left_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_left_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_f32__decode_left_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_f32__decode_left_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    for (i = 0; i < frameCount; ++i) {
-        drflac_uint32 side  = (drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-        drflac_uint32 right = (drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (float)((drflac_int32)left  / 2147483648.0);
-        pOutputSamples[i*2+1] = (float)((drflac_int32)right / 2147483648.0);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-    float factor = 1 / 2147483648.0;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 side0  = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 side1  = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 side2  = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 side3  = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 right0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 right1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 right2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 right3 = pInputSamples1U32[i*4+3] << shift1;
-
-        drflac_uint32 left0 = right0 + side0;
-        drflac_uint32 left1 = right1 + side1;
-        drflac_uint32 left2 = right2 + side2;
-        drflac_uint32 left3 = right3 + side3;
-
-        pOutputSamples[i*8+0] = (drflac_int32)left0  * factor;
-        pOutputSamples[i*8+1] = (drflac_int32)right0 * factor;
-        pOutputSamples[i*8+2] = (drflac_int32)left1  * factor;
-        pOutputSamples[i*8+3] = (drflac_int32)right1 * factor;
-        pOutputSamples[i*8+4] = (drflac_int32)left2  * factor;
-        pOutputSamples[i*8+5] = (drflac_int32)right2 * factor;
-        pOutputSamples[i*8+6] = (drflac_int32)left3  * factor;
-        pOutputSamples[i*8+7] = (drflac_int32)right3 * factor;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left  * factor;
-        pOutputSamples[i*2+1] = (drflac_int32)right * factor;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
-    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
-    __m128 factor;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    factor = _mm_set1_ps(1.0f / 8388608.0f);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i side  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        __m128i right = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-        __m128i left  = _mm_add_epi32(right, side);
-        __m128 leftf  = _mm_mul_ps(_mm_cvtepi32_ps(left),  factor);
-        __m128 rightf = _mm_mul_ps(_mm_cvtepi32_ps(right), factor);
-
-        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
-        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
-        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
-    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
-    float32x4_t factor4;
-    int32x4_t shift0_4;
-    int32x4_t shift1_4;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    factor4  = vdupq_n_f32(1.0f / 8388608.0f);
-    shift0_4 = vdupq_n_s32(shift0);
-    shift1_4 = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        uint32x4_t side;
-        uint32x4_t right;
-        uint32x4_t left;
-        float32x4_t leftf;
-        float32x4_t rightf;
-
-        side   = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4);
-        right  = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4);
-        left   = vaddq_u32(right, side);
-        leftf  = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(left)),  factor4);
-        rightf = vmulq_f32(vcvtq_f32_s32(vreinterpretq_s32_u32(right)), factor4);
-
-        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 side  = pInputSamples0U32[i] << shift0;
-        drflac_uint32 right = pInputSamples1U32[i] << shift1;
-        drflac_uint32 left  = right + side;
-
-        pOutputSamples[i*2+0] = (drflac_int32)left  / 8388608.0f;
-        pOutputSamples[i*2+1] = (drflac_int32)right / 8388608.0f;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_right_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_right_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_right_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_f32__decode_right_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_f32__decode_right_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    for (drflac_uint64 i = 0; i < frameCount; ++i) {
-        drflac_uint32 mid  = (drflac_uint32)pInputSamples0[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-        drflac_uint32 side = (drflac_uint32)pInputSamples1[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-        mid = (mid << 1) | (side & 0x01);
-
-        pOutputSamples[i*2+0] = (float)((((drflac_int32)(mid + side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
-        pOutputSamples[i*2+1] = (float)((((drflac_int32)(mid - side) >> 1) << (unusedBitsPerSample)) / 2147483648.0);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift = unusedBitsPerSample;
-    float factor = 1 / 2147483648.0;
-
-    if (shift > 0) {
-        shift -= 1;
-        for (i = 0; i < frameCount4; ++i) {
-            drflac_uint32 temp0L;
-            drflac_uint32 temp1L;
-            drflac_uint32 temp2L;
-            drflac_uint32 temp3L;
-            drflac_uint32 temp0R;
-            drflac_uint32 temp1R;
-            drflac_uint32 temp2R;
-            drflac_uint32 temp3R;
-
-            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-
-            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid0 = (mid0 << 1) | (side0 & 0x01);
-            mid1 = (mid1 << 1) | (side1 & 0x01);
-            mid2 = (mid2 << 1) | (side2 & 0x01);
-            mid3 = (mid3 << 1) | (side3 & 0x01);
-
-            temp0L = (mid0 + side0) << shift;
-            temp1L = (mid1 + side1) << shift;
-            temp2L = (mid2 + side2) << shift;
-            temp3L = (mid3 + side3) << shift;
-
-            temp0R = (mid0 - side0) << shift;
-            temp1R = (mid1 - side1) << shift;
-            temp2R = (mid2 - side2) << shift;
-            temp3R = (mid3 - side3) << shift;
-
-            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
-            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
-            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
-            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
-            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
-            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
-            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
-            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
-        }
-    } else {
-        for (i = 0; i < frameCount4; ++i) {
-            drflac_uint32 temp0L;
-            drflac_uint32 temp1L;
-            drflac_uint32 temp2L;
-            drflac_uint32 temp3L;
-            drflac_uint32 temp0R;
-            drflac_uint32 temp1R;
-            drflac_uint32 temp2R;
-            drflac_uint32 temp3R;
-
-            drflac_uint32 mid0  = pInputSamples0U32[i*4+0] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid1  = pInputSamples0U32[i*4+1] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid2  = pInputSamples0U32[i*4+2] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 mid3  = pInputSamples0U32[i*4+3] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-
-            drflac_uint32 side0 = pInputSamples1U32[i*4+0] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side1 = pInputSamples1U32[i*4+1] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side2 = pInputSamples1U32[i*4+2] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-            drflac_uint32 side3 = pInputSamples1U32[i*4+3] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid0 = (mid0 << 1) | (side0 & 0x01);
-            mid1 = (mid1 << 1) | (side1 & 0x01);
-            mid2 = (mid2 << 1) | (side2 & 0x01);
-            mid3 = (mid3 << 1) | (side3 & 0x01);
-
-            temp0L = (drflac_uint32)((drflac_int32)(mid0 + side0) >> 1);
-            temp1L = (drflac_uint32)((drflac_int32)(mid1 + side1) >> 1);
-            temp2L = (drflac_uint32)((drflac_int32)(mid2 + side2) >> 1);
-            temp3L = (drflac_uint32)((drflac_int32)(mid3 + side3) >> 1);
-
-            temp0R = (drflac_uint32)((drflac_int32)(mid0 - side0) >> 1);
-            temp1R = (drflac_uint32)((drflac_int32)(mid1 - side1) >> 1);
-            temp2R = (drflac_uint32)((drflac_int32)(mid2 - side2) >> 1);
-            temp3R = (drflac_uint32)((drflac_int32)(mid3 - side3) >> 1);
-
-            pOutputSamples[i*8+0] = (drflac_int32)temp0L * factor;
-            pOutputSamples[i*8+1] = (drflac_int32)temp0R * factor;
-            pOutputSamples[i*8+2] = (drflac_int32)temp1L * factor;
-            pOutputSamples[i*8+3] = (drflac_int32)temp1R * factor;
-            pOutputSamples[i*8+4] = (drflac_int32)temp2L * factor;
-            pOutputSamples[i*8+5] = (drflac_int32)temp2R * factor;
-            pOutputSamples[i*8+6] = (drflac_int32)temp3L * factor;
-            pOutputSamples[i*8+7] = (drflac_int32)temp3R * factor;
-        }
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-        drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-        mid = (mid << 1) | (side & 0x01);
-
-        pOutputSamples[i*2+0] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid + side) >> 1) << unusedBitsPerSample) * factor;
-        pOutputSamples[i*2+1] = (drflac_int32)((drflac_uint32)((drflac_int32)(mid - side) >> 1) << unusedBitsPerSample) * factor;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift = unusedBitsPerSample - 8;
-    float factor;
-    __m128 factor128;
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    factor = 1.0f / 8388608.0f;
-    factor128 = _mm_set1_ps(factor);
-
-    if (shift == 0) {
-        for (i = 0; i < frameCount4; ++i) {
-            __m128i mid;
-            __m128i side;
-            __m128i tempL;
-            __m128i tempR;
-            __m128  leftf;
-            __m128  rightf;
-
-            mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-            side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-            mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
-
-            tempL  = _mm_srai_epi32(_mm_add_epi32(mid, side), 1);
-            tempR  = _mm_srai_epi32(_mm_sub_epi32(mid, side), 1);
-
-            leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
-            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
-
-            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
-            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
-            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
-        }
-    } else {
-        shift -= 1;
-        for (i = 0; i < frameCount4; ++i) {
-            __m128i mid;
-            __m128i side;
-            __m128i tempL;
-            __m128i tempR;
-            __m128 leftf;
-            __m128 rightf;
-
-            mid    = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-            side   = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-            mid    = _mm_or_si128(_mm_slli_epi32(mid, 1), _mm_and_si128(side, _mm_set1_epi32(0x01)));
-
-            tempL  = _mm_slli_epi32(_mm_add_epi32(mid, side), shift);
-            tempR  = _mm_slli_epi32(_mm_sub_epi32(mid, side), shift);
-
-            leftf  = _mm_mul_ps(_mm_cvtepi32_ps(tempL), factor128);
-            rightf = _mm_mul_ps(_mm_cvtepi32_ps(tempR), factor128);
-
-            _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
-            _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
-            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
-        }
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift = unusedBitsPerSample - 8;
-    float factor;
-    float32x4_t factor4;
-    int32x4_t shift4;
-    int32x4_t wbps0_4;  /* Wasted Bits Per Sample */
-    int32x4_t wbps1_4;  /* Wasted Bits Per Sample */
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 24);
-
-    factor  = 1.0f / 8388608.0f;
-    factor4 = vdupq_n_f32(factor);
-    wbps0_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample);
-    wbps1_4 = vdupq_n_s32(pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample);
-
-    if (shift == 0) {
-        for (i = 0; i < frameCount4; ++i) {
-            int32x4_t lefti;
-            int32x4_t righti;
-            float32x4_t leftf;
-            float32x4_t rightf;
-
-            uint32x4_t mid  = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
-            uint32x4_t side = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
-
-            mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
-
-            lefti  = vshrq_n_s32(vreinterpretq_s32_u32(vaddq_u32(mid, side)), 1);
-            righti = vshrq_n_s32(vreinterpretq_s32_u32(vsubq_u32(mid, side)), 1);
-
-            leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
-            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
-
-            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = ((drflac_int32)(mid + side) >> 1) * factor;
-            pOutputSamples[i*2+1] = ((drflac_int32)(mid - side) >> 1) * factor;
-        }
-    } else {
-        shift -= 1;
-        shift4 = vdupq_n_s32(shift);
-        for (i = 0; i < frameCount4; ++i) {
-            uint32x4_t mid;
-            uint32x4_t side;
-            int32x4_t lefti;
-            int32x4_t righti;
-            float32x4_t leftf;
-            float32x4_t rightf;
-
-            mid    = vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), wbps0_4);
-            side   = vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), wbps1_4);
-
-            mid    = vorrq_u32(vshlq_n_u32(mid, 1), vandq_u32(side, vdupq_n_u32(1)));
-
-            lefti  = vreinterpretq_s32_u32(vshlq_u32(vaddq_u32(mid, side), shift4));
-            righti = vreinterpretq_s32_u32(vshlq_u32(vsubq_u32(mid, side), shift4));
-
-            leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
-            rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
-
-            drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
-        }
-
-        for (i = (frameCount4 << 2); i < frameCount; ++i) {
-            drflac_uint32 mid  = pInputSamples0U32[i] << pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-            drflac_uint32 side = pInputSamples1U32[i] << pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-
-            mid = (mid << 1) | (side & 0x01);
-
-            pOutputSamples[i*2+0] = (drflac_int32)((mid + side) << shift) * factor;
-            pOutputSamples[i*2+1] = (drflac_int32)((mid - side) << shift) * factor;
-        }
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_mid_side(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_mid_side__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_mid_side__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_f32__decode_mid_side__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_f32__decode_mid_side__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-#if 0
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__reference(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    for (drflac_uint64 i = 0; i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (float)((drflac_int32)((drflac_uint32)pInputSamples0[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample)) / 2147483648.0);
-        pOutputSamples[i*2+1] = (float)((drflac_int32)((drflac_uint32)pInputSamples1[i] << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample)) / 2147483648.0);
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample;
-    drflac_uint32 shift1 = unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample;
-    float factor = 1 / 2147483648.0;
-
-    for (i = 0; i < frameCount4; ++i) {
-        drflac_uint32 tempL0 = pInputSamples0U32[i*4+0] << shift0;
-        drflac_uint32 tempL1 = pInputSamples0U32[i*4+1] << shift0;
-        drflac_uint32 tempL2 = pInputSamples0U32[i*4+2] << shift0;
-        drflac_uint32 tempL3 = pInputSamples0U32[i*4+3] << shift0;
-
-        drflac_uint32 tempR0 = pInputSamples1U32[i*4+0] << shift1;
-        drflac_uint32 tempR1 = pInputSamples1U32[i*4+1] << shift1;
-        drflac_uint32 tempR2 = pInputSamples1U32[i*4+2] << shift1;
-        drflac_uint32 tempR3 = pInputSamples1U32[i*4+3] << shift1;
-
-        pOutputSamples[i*8+0] = (drflac_int32)tempL0 * factor;
-        pOutputSamples[i*8+1] = (drflac_int32)tempR0 * factor;
-        pOutputSamples[i*8+2] = (drflac_int32)tempL1 * factor;
-        pOutputSamples[i*8+3] = (drflac_int32)tempR1 * factor;
-        pOutputSamples[i*8+4] = (drflac_int32)tempL2 * factor;
-        pOutputSamples[i*8+5] = (drflac_int32)tempR2 * factor;
-        pOutputSamples[i*8+6] = (drflac_int32)tempL3 * factor;
-        pOutputSamples[i*8+7] = (drflac_int32)tempR3 * factor;
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
-        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
-    }
-}
-
-#if defined(DRFLAC_SUPPORT_SSE2)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
-    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
-
-    float factor = 1.0f / 8388608.0f;
-    __m128 factor128 = _mm_set1_ps(factor);
-
-    for (i = 0; i < frameCount4; ++i) {
-        __m128i lefti;
-        __m128i righti;
-        __m128 leftf;
-        __m128 rightf;
-
-        lefti  = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples0 + i), shift0);
-        righti = _mm_slli_epi32(_mm_loadu_si128((const __m128i*)pInputSamples1 + i), shift1);
-
-        leftf  = _mm_mul_ps(_mm_cvtepi32_ps(lefti),  factor128);
-        rightf = _mm_mul_ps(_mm_cvtepi32_ps(righti), factor128);
-
-        _mm_storeu_ps(pOutputSamples + i*8 + 0, _mm_unpacklo_ps(leftf, rightf));
-        _mm_storeu_ps(pOutputSamples + i*8 + 4, _mm_unpackhi_ps(leftf, rightf));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
-        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
-    }
-}
-#endif
-
-#if defined(DRFLAC_SUPPORT_NEON)
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo__neon(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-    drflac_uint64 i;
-    drflac_uint64 frameCount4 = frameCount >> 2;
-    const drflac_uint32* pInputSamples0U32 = (const drflac_uint32*)pInputSamples0;
-    const drflac_uint32* pInputSamples1U32 = (const drflac_uint32*)pInputSamples1;
-    drflac_uint32 shift0 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[0].wastedBitsPerSample) - 8;
-    drflac_uint32 shift1 = (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[1].wastedBitsPerSample) - 8;
-
-    float factor = 1.0f / 8388608.0f;
-    float32x4_t factor4 = vdupq_n_f32(factor);
-    int32x4_t shift0_4  = vdupq_n_s32(shift0);
-    int32x4_t shift1_4  = vdupq_n_s32(shift1);
-
-    for (i = 0; i < frameCount4; ++i) {
-        int32x4_t lefti;
-        int32x4_t righti;
-        float32x4_t leftf;
-        float32x4_t rightf;
-
-        lefti  = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples0U32 + i*4), shift0_4));
-        righti = vreinterpretq_s32_u32(vshlq_u32(vld1q_u32(pInputSamples1U32 + i*4), shift1_4));
-
-        leftf  = vmulq_f32(vcvtq_f32_s32(lefti),  factor4);
-        rightf = vmulq_f32(vcvtq_f32_s32(righti), factor4);
-
-        drflac__vst2q_f32(pOutputSamples + i*8, vzipq_f32(leftf, rightf));
-    }
-
-    for (i = (frameCount4 << 2); i < frameCount; ++i) {
-        pOutputSamples[i*2+0] = (drflac_int32)(pInputSamples0U32[i] << shift0) * factor;
-        pOutputSamples[i*2+1] = (drflac_int32)(pInputSamples1U32[i] << shift1) * factor;
-    }
-}
-#endif
-
-static DRFLAC_INLINE void drflac_read_pcm_frames_f32__decode_independent_stereo(drflac* pFlac, drflac_uint64 frameCount, drflac_uint32 unusedBitsPerSample, const drflac_int32* pInputSamples0, const drflac_int32* pInputSamples1, float* pOutputSamples)
-{
-#if defined(DRFLAC_SUPPORT_SSE2)
-    if (drflac__gIsSSE2Supported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_independent_stereo__sse2(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#elif defined(DRFLAC_SUPPORT_NEON)
-    if (drflac__gIsNEONSupported && pFlac->bitsPerSample <= 24) {
-        drflac_read_pcm_frames_f32__decode_independent_stereo__neon(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-    } else
-#endif
-    {
-        /* Scalar fallback. */
-#if 0
-        drflac_read_pcm_frames_f32__decode_independent_stereo__reference(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#else
-        drflac_read_pcm_frames_f32__decode_independent_stereo__scalar(pFlac, frameCount, unusedBitsPerSample, pInputSamples0, pInputSamples1, pOutputSamples);
-#endif
-    }
-}
-
-DRFLAC_API drflac_uint64 drflac_read_pcm_frames_f32(drflac* pFlac, drflac_uint64 framesToRead, float* pBufferOut)
-{
-    drflac_uint64 framesRead;
-    drflac_uint32 unusedBitsPerSample;
-
-    if (pFlac == NULL || framesToRead == 0) {
-        return 0;
-    }
-
-    if (pBufferOut == NULL) {
-        return drflac__seek_forward_by_pcm_frames(pFlac, framesToRead);
-    }
-
-    DRFLAC_ASSERT(pFlac->bitsPerSample <= 32);
-    unusedBitsPerSample = 32 - pFlac->bitsPerSample;
-
-    framesRead = 0;
-    while (framesToRead > 0) {
-        /* If we've run out of samples in this frame, go to the next. */
-        if (pFlac->currentFLACFrame.pcmFramesRemaining == 0) {
-            if (!drflac__read_and_decode_next_flac_frame(pFlac)) {
-                break;  /* Couldn't read the next frame, so just break from the loop and return. */
-            }
-        } else {
-            unsigned int channelCount = drflac__get_channel_count_from_channel_assignment(pFlac->currentFLACFrame.header.channelAssignment);
-            drflac_uint64 iFirstPCMFrame = pFlac->currentFLACFrame.header.blockSizeInPCMFrames - pFlac->currentFLACFrame.pcmFramesRemaining;
-            drflac_uint64 frameCountThisIteration = framesToRead;
-
-            if (frameCountThisIteration > pFlac->currentFLACFrame.pcmFramesRemaining) {
-                frameCountThisIteration = pFlac->currentFLACFrame.pcmFramesRemaining;
-            }
-
-            if (channelCount == 2) {
-                const drflac_int32* pDecodedSamples0 = pFlac->currentFLACFrame.subframes[0].pSamplesS32 + iFirstPCMFrame;
-                const drflac_int32* pDecodedSamples1 = pFlac->currentFLACFrame.subframes[1].pSamplesS32 + iFirstPCMFrame;
-
-                switch (pFlac->currentFLACFrame.header.channelAssignment)
-                {
-                    case DRFLAC_CHANNEL_ASSIGNMENT_LEFT_SIDE:
-                    {
-                        drflac_read_pcm_frames_f32__decode_left_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_RIGHT_SIDE:
-                    {
-                        drflac_read_pcm_frames_f32__decode_right_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_MID_SIDE:
-                    {
-                        drflac_read_pcm_frames_f32__decode_mid_side(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-
-                    case DRFLAC_CHANNEL_ASSIGNMENT_INDEPENDENT:
-                    default:
-                    {
-                        drflac_read_pcm_frames_f32__decode_independent_stereo(pFlac, frameCountThisIteration, unusedBitsPerSample, pDecodedSamples0, pDecodedSamples1, pBufferOut);
-                    } break;
-                }
-            } else {
-                /* Generic interleaving. */
-                drflac_uint64 i;
-                for (i = 0; i < frameCountThisIteration; ++i) {
-                    unsigned int j;
-                    for (j = 0; j < channelCount; ++j) {
-                        drflac_int32 sampleS32 = (drflac_int32)((drflac_uint32)(pFlac->currentFLACFrame.subframes[j].pSamplesS32[iFirstPCMFrame + i]) << (unusedBitsPerSample + pFlac->currentFLACFrame.subframes[j].wastedBitsPerSample));
-                        pBufferOut[(i*channelCount)+j] = (float)(sampleS32 / 2147483648.0);
-                    }
-                }
-            }
-
-            framesRead                += frameCountThisIteration;
-            pBufferOut                += frameCountThisIteration * channelCount;
-            framesToRead              -= frameCountThisIteration;
-            pFlac->currentPCMFrame    += frameCountThisIteration;
-            pFlac->currentFLACFrame.pcmFramesRemaining -= (unsigned int)frameCountThisIteration;
-        }
-    }
-
-    return framesRead;
-}
-
-
-DRFLAC_API drflac_bool32 drflac_seek_to_pcm_frame(drflac* pFlac, drflac_uint64 pcmFrameIndex)
-{
-    if (pFlac == NULL) {
-        return DRFLAC_FALSE;
-    }
-
-    /* Don't do anything if we're already on the seek point. */
-    if (pFlac->currentPCMFrame == pcmFrameIndex) {
-        return DRFLAC_TRUE;
-    }
-
-    /*
-    If we don't know where the first frame begins then we can't seek. This will happen when the STREAMINFO block was not present
-    when the decoder was opened.
-    */
-    if (pFlac->firstFLACFramePosInBytes == 0) {
-        return DRFLAC_FALSE;
-    }
-
-    if (pcmFrameIndex == 0) {
-        pFlac->currentPCMFrame = 0;
-        return drflac__seek_to_first_frame(pFlac);
-    } else {
-        drflac_bool32 wasSuccessful = DRFLAC_FALSE;
-        drflac_uint64 originalPCMFrame = pFlac->currentPCMFrame;
-
-        /* Clamp the sample to the end. */
-        if (pcmFrameIndex > pFlac->totalPCMFrameCount) {
-            pcmFrameIndex = pFlac->totalPCMFrameCount;
-        }
-
-        /* If the target sample and the current sample are in the same frame we just move the position forward. */
-        if (pcmFrameIndex > pFlac->currentPCMFrame) {
-            /* Forward. */
-            drflac_uint32 offset = (drflac_uint32)(pcmFrameIndex - pFlac->currentPCMFrame);
-            if (pFlac->currentFLACFrame.pcmFramesRemaining >  offset) {
-                pFlac->currentFLACFrame.pcmFramesRemaining -= offset;
-                pFlac->currentPCMFrame = pcmFrameIndex;
-                return DRFLAC_TRUE;
-            }
-        } else {
-            /* Backward. */
-            drflac_uint32 offsetAbs = (drflac_uint32)(pFlac->currentPCMFrame - pcmFrameIndex);
-            drflac_uint32 currentFLACFramePCMFrameCount = pFlac->currentFLACFrame.header.blockSizeInPCMFrames;
-            drflac_uint32 currentFLACFramePCMFramesConsumed = currentFLACFramePCMFrameCount - pFlac->currentFLACFrame.pcmFramesRemaining;
-            if (currentFLACFramePCMFramesConsumed > offsetAbs) {
-                pFlac->currentFLACFrame.pcmFramesRemaining += offsetAbs;
-                pFlac->currentPCMFrame = pcmFrameIndex;
-                return DRFLAC_TRUE;
-            }
-        }
-
-        /*
-        Different techniques depending on encapsulation. Using the native FLAC seektable with Ogg encapsulation is a bit awkward so
-        we'll instead use Ogg's natural seeking facility.
-        */
-#ifndef DR_FLAC_NO_OGG
-        if (pFlac->container == drflac_container_ogg)
-        {
-            wasSuccessful = drflac_ogg__seek_to_pcm_frame(pFlac, pcmFrameIndex);
-        }
-        else
-#endif
-        {
-            /* First try seeking via the seek table. If this fails, fall back to a brute force seek which is much slower. */
-            if (/*!wasSuccessful && */!pFlac->_noSeekTableSeek) {
-                wasSuccessful = drflac__seek_to_pcm_frame__seek_table(pFlac, pcmFrameIndex);
-            }
-
-#if !defined(DR_FLAC_NO_CRC)
-            /* Fall back to binary search if seek table seeking fails. This requires the length of the stream to be known. */
-            if (!wasSuccessful && !pFlac->_noBinarySearchSeek && pFlac->totalPCMFrameCount > 0) {
-                wasSuccessful = drflac__seek_to_pcm_frame__binary_search(pFlac, pcmFrameIndex);
-            }
-#endif
-
-            /* Fall back to brute force if all else fails. */
-            if (!wasSuccessful && !pFlac->_noBruteForceSeek) {
-                wasSuccessful = drflac__seek_to_pcm_frame__brute_force(pFlac, pcmFrameIndex);
-            }
-        }
-
-        if (wasSuccessful) {
-            pFlac->currentPCMFrame = pcmFrameIndex;
-        } else {
-            /* Seek failed. Try putting the decoder back to it's original state. */
-            if (drflac_seek_to_pcm_frame(pFlac, originalPCMFrame) == DRFLAC_FALSE) {
-                /* Failed to seek back to the original PCM frame. Fall back to 0. */
-                drflac_seek_to_pcm_frame(pFlac, 0);
-            }
-        }
-
-        return wasSuccessful;
-    }
-}
-
-
-
-/* High Level APIs */
-
-/* SIZE_MAX */
-#if defined(SIZE_MAX)
-    #define DRFLAC_SIZE_MAX  SIZE_MAX
-#else
-    #if defined(DRFLAC_64BIT)
-        #define DRFLAC_SIZE_MAX  ((drflac_uint64)0xFFFFFFFFFFFFFFFF)
-    #else
-        #define DRFLAC_SIZE_MAX  0xFFFFFFFF
-    #endif
-#endif
-/* End SIZE_MAX */
-
-
-/* Using a macro as the definition of the drflac__full_decode_and_close_*() API family. Sue me. */
-#define DRFLAC_DEFINE_FULL_READ_AND_CLOSE(extension, type) \
-static type* drflac__full_read_and_close_ ## extension (drflac* pFlac, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut)\
-{                                                                                                                                                                   \
-    type* pSampleData = NULL;                                                                                                                                       \
-    drflac_uint64 totalPCMFrameCount;                                                                                                                               \
-                                                                                                                                                                    \
-    DRFLAC_ASSERT(pFlac != NULL);                                                                                                                                   \
-                                                                                                                                                                    \
-    totalPCMFrameCount = pFlac->totalPCMFrameCount;                                                                                                                 \
-                                                                                                                                                                    \
-    if (totalPCMFrameCount == 0) {                                                                                                                                  \
-        type buffer[4096];                                                                                                                                          \
-        drflac_uint64 pcmFramesRead;                                                                                                                                \
-        size_t sampleDataBufferSize = sizeof(buffer);                                                                                                               \
-                                                                                                                                                                    \
-        pSampleData = (type*)drflac__malloc_from_callbacks(sampleDataBufferSize, &pFlac->allocationCallbacks);                                                      \
-        if (pSampleData == NULL) {                                                                                                                                  \
-            goto on_error;                                                                                                                                          \
-        }                                                                                                                                                           \
-                                                                                                                                                                    \
-        while ((pcmFramesRead = (drflac_uint64)drflac_read_pcm_frames_##extension(pFlac, sizeof(buffer)/sizeof(buffer[0])/pFlac->channels, buffer)) > 0) {          \
-            if (((totalPCMFrameCount + pcmFramesRead) * pFlac->channels * sizeof(type)) > sampleDataBufferSize) {                                                   \
-                type* pNewSampleData;                                                                                                                               \
-                size_t newSampleDataBufferSize;                                                                                                                     \
-                                                                                                                                                                    \
-                newSampleDataBufferSize = sampleDataBufferSize * 2;                                                                                                 \
-                pNewSampleData = (type*)drflac__realloc_from_callbacks(pSampleData, newSampleDataBufferSize, sampleDataBufferSize, &pFlac->allocationCallbacks);    \
-                if (pNewSampleData == NULL) {                                                                                                                       \
-                    drflac__free_from_callbacks(pSampleData, &pFlac->allocationCallbacks);                                                                          \
-                    goto on_error;                                                                                                                                  \
-                }                                                                                                                                                   \
-                                                                                                                                                                    \
-                sampleDataBufferSize = newSampleDataBufferSize;                                                                                                     \
-                pSampleData = pNewSampleData;                                                                                                                       \
-            }                                                                                                                                                       \
-                                                                                                                                                                    \
-            DRFLAC_COPY_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), buffer, (size_t)(pcmFramesRead*pFlac->channels*sizeof(type)));                   \
-            totalPCMFrameCount += pcmFramesRead;                                                                                                                    \
-        }                                                                                                                                                           \
-                                                                                                                                                                    \
-        /* At this point everything should be decoded, but we just want to fill the unused part buffer with silence - need to                                       \
-           protect those ears from random noise! */                                                                                                                 \
-        DRFLAC_ZERO_MEMORY(pSampleData + (totalPCMFrameCount*pFlac->channels), (size_t)(sampleDataBufferSize - totalPCMFrameCount*pFlac->channels*sizeof(type)));   \
-    } else {                                                                                                                                                        \
-        drflac_uint64 dataSize = totalPCMFrameCount*pFlac->channels*sizeof(type);                                                                                   \
-        if (dataSize > (drflac_uint64)DRFLAC_SIZE_MAX) {                                                                                                            \
-            goto on_error;  /* The decoded data is too big. */                                                                                                      \
-        }                                                                                                                                                           \
-                                                                                                                                                                    \
-        pSampleData = (type*)drflac__malloc_from_callbacks((size_t)dataSize, &pFlac->allocationCallbacks);    /* <-- Safe cast as per the check above. */           \
-        if (pSampleData == NULL) {                                                                                                                                  \
-            goto on_error;                                                                                                                                          \
-        }                                                                                                                                                           \
-                                                                                                                                                                    \
-        totalPCMFrameCount = drflac_read_pcm_frames_##extension(pFlac, pFlac->totalPCMFrameCount, pSampleData);                                                     \
-    }                                                                                                                                                               \
-                                                                                                                                                                    \
-    if (sampleRateOut) *sampleRateOut = pFlac->sampleRate;                                                                                                          \
-    if (channelsOut) *channelsOut = pFlac->channels;                                                                                                                \
-    if (totalPCMFrameCountOut) *totalPCMFrameCountOut = totalPCMFrameCount;                                                                                         \
-                                                                                                                                                                    \
-    drflac_close(pFlac);                                                                                                                                            \
-    return pSampleData;                                                                                                                                             \
-                                                                                                                                                                    \
-on_error:                                                                                                                                                           \
-    drflac_close(pFlac);                                                                                                                                            \
-    return NULL;                                                                                                                                                    \
-}
-
-DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s32, drflac_int32)
-DRFLAC_DEFINE_FULL_READ_AND_CLOSE(s16, drflac_int16)
-DRFLAC_DEFINE_FULL_READ_AND_CLOSE(f32, float)
-
-DRFLAC_API drflac_int32* drflac_open_and_read_pcm_frames_s32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (channelsOut) {
-        *channelsOut = 0;
-    }
-    if (sampleRateOut) {
-        *sampleRateOut = 0;
-    }
-    if (totalPCMFrameCountOut) {
-        *totalPCMFrameCountOut = 0;
-    }
-
-    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_s32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
-}
-
-DRFLAC_API drflac_int16* drflac_open_and_read_pcm_frames_s16(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (channelsOut) {
-        *channelsOut = 0;
-    }
-    if (sampleRateOut) {
-        *sampleRateOut = 0;
-    }
-    if (totalPCMFrameCountOut) {
-        *totalPCMFrameCountOut = 0;
-    }
-
-    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_s16(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
-}
-
-DRFLAC_API float* drflac_open_and_read_pcm_frames_f32(drflac_read_proc onRead, drflac_seek_proc onSeek, void* pUserData, unsigned int* channelsOut, unsigned int* sampleRateOut, drflac_uint64* totalPCMFrameCountOut, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (channelsOut) {
-        *channelsOut = 0;
-    }
-    if (sampleRateOut) {
-        *sampleRateOut = 0;
-    }
-    if (totalPCMFrameCountOut) {
-        *totalPCMFrameCountOut = 0;
-    }
-
-    pFlac = drflac_open(onRead, onSeek, pUserData, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_f32(pFlac, channelsOut, sampleRateOut, totalPCMFrameCountOut);
-}
-
-#ifndef DR_FLAC_NO_STDIO
-DRFLAC_API drflac_int32* drflac_open_file_and_read_pcm_frames_s32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (sampleRate) {
-        *sampleRate = 0;
-    }
-    if (channels) {
-        *channels = 0;
-    }
-    if (totalPCMFrameCount) {
-        *totalPCMFrameCount = 0;
-    }
-
-    pFlac = drflac_open_file(filename, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
-}
-
-DRFLAC_API drflac_int16* drflac_open_file_and_read_pcm_frames_s16(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (sampleRate) {
-        *sampleRate = 0;
-    }
-    if (channels) {
-        *channels = 0;
-    }
-    if (totalPCMFrameCount) {
-        *totalPCMFrameCount = 0;
-    }
-
-    pFlac = drflac_open_file(filename, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
-}
-
-DRFLAC_API float* drflac_open_file_and_read_pcm_frames_f32(const char* filename, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (sampleRate) {
-        *sampleRate = 0;
-    }
-    if (channels) {
-        *channels = 0;
-    }
-    if (totalPCMFrameCount) {
-        *totalPCMFrameCount = 0;
-    }
-
-    pFlac = drflac_open_file(filename, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
-}
-#endif
-
-DRFLAC_API drflac_int32* drflac_open_memory_and_read_pcm_frames_s32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (sampleRate) {
-        *sampleRate = 0;
-    }
-    if (channels) {
-        *channels = 0;
-    }
-    if (totalPCMFrameCount) {
-        *totalPCMFrameCount = 0;
-    }
-
-    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_s32(pFlac, channels, sampleRate, totalPCMFrameCount);
-}
-
-DRFLAC_API drflac_int16* drflac_open_memory_and_read_pcm_frames_s16(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (sampleRate) {
-        *sampleRate = 0;
-    }
-    if (channels) {
-        *channels = 0;
-    }
-    if (totalPCMFrameCount) {
-        *totalPCMFrameCount = 0;
-    }
-
-    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_s16(pFlac, channels, sampleRate, totalPCMFrameCount);
-}
-
-DRFLAC_API float* drflac_open_memory_and_read_pcm_frames_f32(const void* data, size_t dataSize, unsigned int* channels, unsigned int* sampleRate, drflac_uint64* totalPCMFrameCount, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    drflac* pFlac;
-
-    if (sampleRate) {
-        *sampleRate = 0;
-    }
-    if (channels) {
-        *channels = 0;
-    }
-    if (totalPCMFrameCount) {
-        *totalPCMFrameCount = 0;
-    }
-
-    pFlac = drflac_open_memory(data, dataSize, pAllocationCallbacks);
-    if (pFlac == NULL) {
-        return NULL;
-    }
-
-    return drflac__full_read_and_close_f32(pFlac, channels, sampleRate, totalPCMFrameCount);
-}
-
-
-DRFLAC_API void drflac_free(void* p, const drflac_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks != NULL) {
-        drflac__free_from_callbacks(p, pAllocationCallbacks);
-    } else {
-        drflac__free_default(p, NULL);
-    }
-}
-
-
-
-
-DRFLAC_API void drflac_init_vorbis_comment_iterator(drflac_vorbis_comment_iterator* pIter, drflac_uint32 commentCount, const void* pComments)
-{
-    if (pIter == NULL) {
-        return;
-    }
-
-    pIter->countRemaining = commentCount;
-    pIter->pRunningData   = (const char*)pComments;
-}
-
-DRFLAC_API const char* drflac_next_vorbis_comment(drflac_vorbis_comment_iterator* pIter, drflac_uint32* pCommentLengthOut)
-{
-    drflac_int32 length;
-    const char* pComment;
-
-    /* Safety. */
-    if (pCommentLengthOut) {
-        *pCommentLengthOut = 0;
-    }
-
-    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
-        return NULL;
-    }
-
-    length = drflac__le2host_32_ptr_unaligned(pIter->pRunningData);
-    pIter->pRunningData += 4;
-
-    pComment = pIter->pRunningData;
-    pIter->pRunningData += length;
-    pIter->countRemaining -= 1;
-
-    if (pCommentLengthOut) {
-        *pCommentLengthOut = length;
-    }
-
-    return pComment;
-}
-
-
-
-
-DRFLAC_API void drflac_init_cuesheet_track_iterator(drflac_cuesheet_track_iterator* pIter, drflac_uint32 trackCount, const void* pTrackData)
-{
-    if (pIter == NULL) {
-        return;
-    }
-
-    pIter->countRemaining = trackCount;
-    pIter->pRunningData   = (const char*)pTrackData;
-}
-
-DRFLAC_API drflac_bool32 drflac_next_cuesheet_track(drflac_cuesheet_track_iterator* pIter, drflac_cuesheet_track* pCuesheetTrack)
-{
-    drflac_cuesheet_track cuesheetTrack;
-    const char* pRunningData;
-    drflac_uint64 offsetHi;
-    drflac_uint64 offsetLo;
-
-    if (pIter == NULL || pIter->countRemaining == 0 || pIter->pRunningData == NULL) {
-        return DRFLAC_FALSE;
-    }
-
-    pRunningData = pIter->pRunningData;
-
-    offsetHi                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
-    offsetLo                   = drflac__be2host_32(*(const drflac_uint32*)pRunningData); pRunningData += 4;
-    cuesheetTrack.offset       = offsetLo | (offsetHi << 32);
-    cuesheetTrack.trackNumber  = pRunningData[0];                                         pRunningData += 1;
-    DRFLAC_COPY_MEMORY(cuesheetTrack.ISRC, pRunningData, sizeof(cuesheetTrack.ISRC));     pRunningData += 12;
-    cuesheetTrack.isAudio      = (pRunningData[0] & 0x80) != 0;
-    cuesheetTrack.preEmphasis  = (pRunningData[0] & 0x40) != 0;                           pRunningData += 14;
-    cuesheetTrack.indexCount   = pRunningData[0];                                         pRunningData += 1;
-    cuesheetTrack.pIndexPoints = (const drflac_cuesheet_track_index*)pRunningData;        pRunningData += cuesheetTrack.indexCount * sizeof(drflac_cuesheet_track_index);
-
-    pIter->pRunningData = pRunningData;
-    pIter->countRemaining -= 1;
-
-    if (pCuesheetTrack) {
-        *pCuesheetTrack = cuesheetTrack;
-    }
-
-    return DRFLAC_TRUE;
-}
-
-#if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
-    #pragma GCC diagnostic pop
-#endif
-#endif  /* dr_flac_c */
-#endif  /* DR_FLAC_IMPLEMENTATION */
-
-
-/*
-REVISION HISTORY
-================
-v0.12.42 - 2023-11-02
-  - Fix build for ARMv6-M.
-  - Fix a compilation warning with GCC.
-
-v0.12.41 - 2023-06-17
-  - Fix an incorrect date in revision history. No functional change.
-
-v0.12.40 - 2023-05-22
-  - Minor code restructure. No functional change.
-
-v0.12.39 - 2022-09-17
-  - Fix compilation with DJGPP.
-  - Fix compilation error with Visual Studio 2019 and the ARM build.
-  - Fix an error with SSE 4.1 detection.
-  - Add support for disabling wchar_t with DR_WAV_NO_WCHAR.
-  - Improve compatibility with compilers which lack support for explicit struct packing.
-  - Improve compatibility with low-end and embedded hardware by reducing the amount of stack
-    allocation when loading an Ogg encapsulated file.
-
-v0.12.38 - 2022-04-10
-  - Fix compilation error on older versions of GCC.
-
-v0.12.37 - 2022-02-12
-  - Improve ARM detection.
-
-v0.12.36 - 2022-02-07
-  - Fix a compilation error with the ARM build.
-
-v0.12.35 - 2022-02-06
-  - Fix a bug due to underestimating the amount of precision required for the prediction stage.
-  - Fix some bugs found from fuzz testing.
-
-v0.12.34 - 2022-01-07
-  - Fix some misalignment bugs when reading metadata.
-
-v0.12.33 - 2021-12-22
-  - Fix a bug with seeking when the seek table does not start at PCM frame 0.
-
-v0.12.32 - 2021-12-11
-  - Fix a warning with Clang.
-
-v0.12.31 - 2021-08-16
-  - Silence some warnings.
-
-v0.12.30 - 2021-07-31
-  - Fix platform detection for ARM64.
-
-v0.12.29 - 2021-04-02
-  - Fix a bug where the running PCM frame index is set to an invalid value when over-seeking.
-  - Fix a decoding error due to an incorrect validation check.
-
-v0.12.28 - 2021-02-21
-  - Fix a warning due to referencing _MSC_VER when it is undefined.
-
-v0.12.27 - 2021-01-31
-  - Fix a static analysis warning.
-
-v0.12.26 - 2021-01-17
-  - Fix a compilation warning due to _BSD_SOURCE being deprecated.
-
-v0.12.25 - 2020-12-26
-  - Update documentation.
-
-v0.12.24 - 2020-11-29
-  - Fix ARM64/NEON detection when compiling with MSVC.
-
-v0.12.23 - 2020-11-21
-  - Fix compilation with OpenWatcom.
-
-v0.12.22 - 2020-11-01
-  - Fix an error with the previous release.
-
-v0.12.21 - 2020-11-01
-  - Fix a possible deadlock when seeking.
-  - Improve compiler support for older versions of GCC.
-
-v0.12.20 - 2020-09-08
-  - Fix a compilation error on older compilers.
-
-v0.12.19 - 2020-08-30
-  - Fix a bug due to an undefined 32-bit shift.
-
-v0.12.18 - 2020-08-14
-  - Fix a crash when compiling with clang-cl.
-
-v0.12.17 - 2020-08-02
-  - Simplify sized types.
-
-v0.12.16 - 2020-07-25
-  - Fix a compilation warning.
-
-v0.12.15 - 2020-07-06
-  - Check for negative LPC shifts and return an error.
-
-v0.12.14 - 2020-06-23
-  - Add include guard for the implementation section.
-
-v0.12.13 - 2020-05-16
-  - Add compile-time and run-time version querying.
-    - DRFLAC_VERSION_MINOR
-    - DRFLAC_VERSION_MAJOR
-    - DRFLAC_VERSION_REVISION
-    - DRFLAC_VERSION_STRING
-    - drflac_version()
-    - drflac_version_string()
-
-v0.12.12 - 2020-04-30
-  - Fix compilation errors with VC6.
-
-v0.12.11 - 2020-04-19
-  - Fix some pedantic warnings.
-  - Fix some undefined behaviour warnings.
-
-v0.12.10 - 2020-04-10
-  - Fix some bugs when trying to seek with an invalid seek table.
-
-v0.12.9 - 2020-04-05
-  - Fix warnings.
-
-v0.12.8 - 2020-04-04
-  - Add drflac_open_file_w() and drflac_open_file_with_metadata_w().
-  - Fix some static analysis warnings.
-  - Minor documentation updates.
-
-v0.12.7 - 2020-03-14
-  - Fix compilation errors with VC6.
-
-v0.12.6 - 2020-03-07
-  - Fix compilation error with Visual Studio .NET 2003.
-
-v0.12.5 - 2020-01-30
-  - Silence some static analysis warnings.
-
-v0.12.4 - 2020-01-29
-  - Silence some static analysis warnings.
-
-v0.12.3 - 2019-12-02
-  - Fix some warnings when compiling with GCC and the -Og flag.
-  - Fix a crash in out-of-memory situations.
-  - Fix potential integer overflow bug.
-  - Fix some static analysis warnings.
-  - Fix a possible crash when using custom memory allocators without a custom realloc() implementation.
-  - Fix a bug with binary search seeking where the bits per sample is not a multiple of 8.
-
-v0.12.2 - 2019-10-07
-  - Internal code clean up.
-
-v0.12.1 - 2019-09-29
-  - Fix some Clang Static Analyzer warnings.
-  - Fix an unused variable warning.
-
-v0.12.0 - 2019-09-23
-  - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
-    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
-    - drflac_open()
-    - drflac_open_relaxed()
-    - drflac_open_with_metadata()
-    - drflac_open_with_metadata_relaxed()
-    - drflac_open_file()
-    - drflac_open_file_with_metadata()
-    - drflac_open_memory()
-    - drflac_open_memory_with_metadata()
-    - drflac_open_and_read_pcm_frames_s32()
-    - drflac_open_and_read_pcm_frames_s16()
-    - drflac_open_and_read_pcm_frames_f32()
-    - drflac_open_file_and_read_pcm_frames_s32()
-    - drflac_open_file_and_read_pcm_frames_s16()
-    - drflac_open_file_and_read_pcm_frames_f32()
-    - drflac_open_memory_and_read_pcm_frames_s32()
-    - drflac_open_memory_and_read_pcm_frames_s16()
-    - drflac_open_memory_and_read_pcm_frames_f32()
-    Set this extra parameter to NULL to use defaults which is the same as the previous behaviour. Setting this NULL will use
-    DRFLAC_MALLOC, DRFLAC_REALLOC and DRFLAC_FREE.
-  - Remove deprecated APIs:
-    - drflac_read_s32()
-    - drflac_read_s16()
-    - drflac_read_f32()
-    - drflac_seek_to_sample()
-    - drflac_open_and_decode_s32()
-    - drflac_open_and_decode_s16()
-    - drflac_open_and_decode_f32()
-    - drflac_open_and_decode_file_s32()
-    - drflac_open_and_decode_file_s16()
-    - drflac_open_and_decode_file_f32()
-    - drflac_open_and_decode_memory_s32()
-    - drflac_open_and_decode_memory_s16()
-    - drflac_open_and_decode_memory_f32()
-  - Remove drflac.totalSampleCount which is now replaced with drflac.totalPCMFrameCount. You can emulate drflac.totalSampleCount
-    by doing pFlac->totalPCMFrameCount*pFlac->channels.
-  - Rename drflac.currentFrame to drflac.currentFLACFrame to remove ambiguity with PCM frames.
-  - Fix errors when seeking to the end of a stream.
-  - Optimizations to seeking.
-  - SSE improvements and optimizations.
-  - ARM NEON optimizations.
-  - Optimizations to drflac_read_pcm_frames_s16().
-  - Optimizations to drflac_read_pcm_frames_s32().
-
-v0.11.10 - 2019-06-26
-  - Fix a compiler error.
-
-v0.11.9 - 2019-06-16
-  - Silence some ThreadSanitizer warnings.
-
-v0.11.8 - 2019-05-21
-  - Fix warnings.
-
-v0.11.7 - 2019-05-06
-  - C89 fixes.
-
-v0.11.6 - 2019-05-05
-  - Add support for C89.
-  - Fix a compiler warning when CRC is disabled.
-  - Change license to choice of public domain or MIT-0.
-
-v0.11.5 - 2019-04-19
-  - Fix a compiler error with GCC.
-
-v0.11.4 - 2019-04-17
-  - Fix some warnings with GCC when compiling with -std=c99.
-
-v0.11.3 - 2019-04-07
-  - Silence warnings with GCC.
-
-v0.11.2 - 2019-03-10
-  - Fix a warning.
-
-v0.11.1 - 2019-02-17
-  - Fix a potential bug with seeking.
-
-v0.11.0 - 2018-12-16
-  - API CHANGE: Deprecated drflac_read_s32(), drflac_read_s16() and drflac_read_f32() and replaced them with
-    drflac_read_pcm_frames_s32(), drflac_read_pcm_frames_s16() and drflac_read_pcm_frames_f32(). The new APIs take
-    and return PCM frame counts instead of sample counts. To upgrade you will need to change the input count by
-    dividing it by the channel count, and then do the same with the return value.
-  - API_CHANGE: Deprecated drflac_seek_to_sample() and replaced with drflac_seek_to_pcm_frame(). Same rules as
-    the changes to drflac_read_*() apply.
-  - API CHANGE: Deprecated drflac_open_and_decode_*() and replaced with drflac_open_*_and_read_*(). Same rules as
-    the changes to drflac_read_*() apply.
-  - Optimizations.
-
-v0.10.0 - 2018-09-11
-  - Remove the DR_FLAC_NO_WIN32_IO option and the Win32 file IO functionality. If you need to use Win32 file IO you
-    need to do it yourself via the callback API.
-  - Fix the clang build.
-  - Fix undefined behavior.
-  - Fix errors with CUESHEET metdata blocks.
-  - Add an API for iterating over each cuesheet track in the CUESHEET metadata block. This works the same way as the
-    Vorbis comment API.
-  - Other miscellaneous bug fixes, mostly relating to invalid FLAC streams.
-  - Minor optimizations.
-
-v0.9.11 - 2018-08-29
-  - Fix a bug with sample reconstruction.
-
-v0.9.10 - 2018-08-07
-  - Improve 64-bit detection.
-
-v0.9.9 - 2018-08-05
-  - Fix C++ build on older versions of GCC.
-
-v0.9.8 - 2018-07-24
-  - Fix compilation errors.
-
-v0.9.7 - 2018-07-05
-  - Fix a warning.
-
-v0.9.6 - 2018-06-29
-  - Fix some typos.
-
-v0.9.5 - 2018-06-23
-  - Fix some warnings.
-
-v0.9.4 - 2018-06-14
-  - Optimizations to seeking.
-  - Clean up.
-
-v0.9.3 - 2018-05-22
-  - Bug fix.
-
-v0.9.2 - 2018-05-12
-  - Fix a compilation error due to a missing break statement.
-
-v0.9.1 - 2018-04-29
-  - Fix compilation error with Clang.
-
-v0.9 - 2018-04-24
-  - Fix Clang build.
-  - Start using major.minor.revision versioning.
-
-v0.8g - 2018-04-19
-  - Fix build on non-x86/x64 architectures.
-
-v0.8f - 2018-02-02
-  - Stop pretending to support changing rate/channels mid stream.
-
-v0.8e - 2018-02-01
-  - Fix a crash when the block size of a frame is larger than the maximum block size defined by the FLAC stream.
-  - Fix a crash when the Rice partition order is invalid.
-
-v0.8d - 2017-09-22
-  - Add support for decoding streams with ID3 tags. ID3 tags are just skipped.
-
-v0.8c - 2017-09-07
-  - Fix warning on non-x86/x64 architectures.
-
-v0.8b - 2017-08-19
-  - Fix build on non-x86/x64 architectures.
-
-v0.8a - 2017-08-13
-  - A small optimization for the Clang build.
-
-v0.8 - 2017-08-12
-  - API CHANGE: Rename dr_* types to drflac_*.
-  - Optimizations. This brings dr_flac back to about the same class of efficiency as the reference implementation.
-  - Add support for custom implementations of malloc(), realloc(), etc.
-  - Add CRC checking to Ogg encapsulated streams.
-  - Fix VC++ 6 build. This is only for the C++ compiler. The C compiler is not currently supported.
-  - Bug fixes.
-
-v0.7 - 2017-07-23
-  - Add support for opening a stream without a header block. To do this, use drflac_open_relaxed() / drflac_open_with_metadata_relaxed().
-
-v0.6 - 2017-07-22
-  - Add support for recovering from invalid frames. With this change, dr_flac will simply skip over invalid frames as if they
-    never existed. Frames are checked against their sync code, the CRC-8 of the frame header and the CRC-16 of the whole frame.
-
-v0.5 - 2017-07-16
-  - Fix typos.
-  - Change drflac_bool* types to unsigned.
-  - Add CRC checking. This makes dr_flac slower, but can be disabled with #define DR_FLAC_NO_CRC.
-
-v0.4f - 2017-03-10
-  - Fix a couple of bugs with the bitstreaming code.
-
-v0.4e - 2017-02-17
-  - Fix some warnings.
-
-v0.4d - 2016-12-26
-  - Add support for 32-bit floating-point PCM decoding.
-  - Use drflac_int* and drflac_uint* sized types to improve compiler support.
-  - Minor improvements to documentation.
-
-v0.4c - 2016-12-26
-  - Add support for signed 16-bit integer PCM decoding.
-
-v0.4b - 2016-10-23
-  - A minor change to drflac_bool8 and drflac_bool32 types.
-
-v0.4a - 2016-10-11
-  - Rename drBool32 to drflac_bool32 for styling consistency.
-
-v0.4 - 2016-09-29
-  - API/ABI CHANGE: Use fixed size 32-bit booleans instead of the built-in bool type.
-  - API CHANGE: Rename drflac_open_and_decode*() to drflac_open_and_decode*_s32().
-  - API CHANGE: Swap the order of "channels" and "sampleRate" parameters in drflac_open_and_decode*(). Rationale for this is to
-    keep it consistent with drflac_audio.
-
-v0.3f - 2016-09-21
-  - Fix a warning with GCC.
-
-v0.3e - 2016-09-18
-  - Fixed a bug where GCC 4.3+ was not getting properly identified.
-  - Fixed a few typos.
-  - Changed date formats to ISO 8601 (YYYY-MM-DD).
-
-v0.3d - 2016-06-11
-  - Minor clean up.
-
-v0.3c - 2016-05-28
-  - Fixed compilation error.
-
-v0.3b - 2016-05-16
-  - Fixed Linux/GCC build.
-  - Updated documentation.
-
-v0.3a - 2016-05-15
-  - Minor fixes to documentation.
-
-v0.3 - 2016-05-11
-  - Optimizations. Now at about parity with the reference implementation on 32-bit builds.
-  - Lots of clean up.
-
-v0.2b - 2016-05-10
-  - Bug fixes.
-
-v0.2a - 2016-05-10
-  - Made drflac_open_and_decode() more robust.
-  - Removed an unused debugging variable
-
-v0.2 - 2016-05-09
-  - Added support for Ogg encapsulation.
-  - API CHANGE. Have the onSeek callback take a third argument which specifies whether or not the seek
-    should be relative to the start or the current position. Also changes the seeking rules such that
-    seeking offsets will never be negative.
-  - Have drflac_open_and_decode() fail gracefully if the stream has an unknown total sample count.
-
-v0.1b - 2016-05-07
-  - Properly close the file handle in drflac_open_file() and family when the decoder fails to initialize.
-  - Removed a stale comment.
-
-v0.1a - 2016-05-05
-  - Minor formatting changes.
-  - Fixed a warning on the GCC build.
-
-v0.1 - 2016-05-03
-  - Initial versioned release.
-*/
-
-/*
-This software is available as a choice of the following licenses. Choose
-whichever you prefer.
-
-===============================================================================
-ALTERNATIVE 1 - Public Domain (www.unlicense.org)
-===============================================================================
-This is free and unencumbered software released into the public domain.
-
-Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
-software, either in source code form or as a compiled binary, for any purpose,
-commercial or non-commercial, and by any means.
-
-In jurisdictions that recognize copyright laws, the author or authors of this
-software dedicate any and all copyright interest in the software to the public
-domain. We make this dedication for the benefit of the public at large and to
-the detriment of our heirs and successors. We intend this dedication to be an
-overt act of relinquishment in perpetuity of all present and future rights to
-this software under copyright law.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
-For more information, please refer to <http://unlicense.org/>
-
-===============================================================================
-ALTERNATIVE 2 - MIT No Attribution
-===============================================================================
-Copyright 2023 David Reid
-
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
-of the Software, and to permit persons to whom the Software is furnished to do
-so.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-*/
diff --git a/modules/audio_decoders/source/dr_mp3.h b/modules/audio_decoders/source/dr_mp3.h
deleted file mode 100644
index 84849ee..0000000
--- a/modules/audio_decoders/source/dr_mp3.h
+++ /dev/null
@@ -1,4834 +0,0 @@
-/*
-MP3 audio decoder. Choice of public domain or MIT-0. See license statements at the end of this file.
-dr_mp3 - v0.6.38 - 2023-11-02
-
-David Reid - mackron@gmail.com
-
-GitHub: https://github.com/mackron/dr_libs
-
-Based on minimp3 (https://github.com/lieff/minimp3) which is where the real work was done. See the bottom of this file for differences between minimp3 and dr_mp3.
-*/
-
-/*
-RELEASE NOTES - VERSION 0.6
-===========================
-Version 0.6 includes breaking changes with the configuration of decoders. The ability to customize the number of output channels and the sample rate has been
-removed. You must now use the channel count and sample rate reported by the MP3 stream itself, and all channel and sample rate conversion must be done
-yourself.
-
-
-Changes to Initialization
--------------------------
-Previously, `drmp3_init()`, etc. took a pointer to a `drmp3_config` object that allowed you to customize the output channels and sample rate. This has been
-removed. If you need the old behaviour you will need to convert the data yourself or just not upgrade. The following APIs have changed.
-
-    `drmp3_init()`
-    `drmp3_init_memory()`
-    `drmp3_init_file()`
-
-
-Miscellaneous Changes
----------------------
-Support for loading a file from a `wchar_t` string has been added via the `drmp3_init_file_w()` API.
-*/
-
-/*
-Introduction
-============
-dr_mp3 is a single file library. To use it, do something like the following in one .c file.
-
-    ```c
-    #define DR_MP3_IMPLEMENTATION
-    #include "dr_mp3.h"
-    ```
-
-You can then #include this file in other parts of the program as you would with any other header file. To decode audio data, do something like the following:
-
-    ```c
-    drmp3 mp3;
-    if (!drmp3_init_file(&mp3, "MySong.mp3", NULL)) {
-        // Failed to open file
-    }
-
-    ...
-
-    drmp3_uint64 framesRead = drmp3_read_pcm_frames_f32(&mp3, framesToRead, pFrames);
-    ```
-
-The drmp3 object is transparent so you can get access to the channel count and sample rate like so:
-
-    ```c
-    drmp3_uint32 channels = mp3.channels;
-    drmp3_uint32 sampleRate = mp3.sampleRate;
-    ```
-
-The example above initializes a decoder from a file, but you can also initialize it from a block of memory and read and seek callbacks with
-`drmp3_init_memory()` and `drmp3_init()` respectively.
-
-You do not need to do any annoying memory management when reading PCM frames - this is all managed internally. You can request any number of PCM frames in each
-call to `drmp3_read_pcm_frames_f32()` and it will return as many PCM frames as it can, up to the requested amount.
-
-You can also decode an entire file in one go with `drmp3_open_and_read_pcm_frames_f32()`, `drmp3_open_memory_and_read_pcm_frames_f32()` and
-`drmp3_open_file_and_read_pcm_frames_f32()`.
-
-
-Build Options
-=============
-#define these options before including this file.
-
-#define DR_MP3_NO_STDIO
-  Disable drmp3_init_file(), etc.
-
-#define DR_MP3_NO_SIMD
-  Disable SIMD optimizations.
-*/
-
-#ifndef dr_mp3_h
-#define dr_mp3_h
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#define DRMP3_STRINGIFY(x)      #x
-#define DRMP3_XSTRINGIFY(x)     DRMP3_STRINGIFY(x)
-
-#define DRMP3_VERSION_MAJOR     0
-#define DRMP3_VERSION_MINOR     6
-#define DRMP3_VERSION_REVISION  38
-#define DRMP3_VERSION_STRING    DRMP3_XSTRINGIFY(DRMP3_VERSION_MAJOR) "." DRMP3_XSTRINGIFY(DRMP3_VERSION_MINOR) "." DRMP3_XSTRINGIFY(DRMP3_VERSION_REVISION)
-
-#include <stddef.h> /* For size_t. */
-
-/* Sized Types */
-typedef   signed char           drmp3_int8;
-typedef unsigned char           drmp3_uint8;
-typedef   signed short          drmp3_int16;
-typedef unsigned short          drmp3_uint16;
-typedef   signed int            drmp3_int32;
-typedef unsigned int            drmp3_uint32;
-#if defined(_MSC_VER) && !defined(__clang__)
-    typedef   signed __int64    drmp3_int64;
-    typedef unsigned __int64    drmp3_uint64;
-#else
-    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
-        #pragma GCC diagnostic push
-        #pragma GCC diagnostic ignored "-Wlong-long"
-        #if defined(__clang__)
-            #pragma GCC diagnostic ignored "-Wc++11-long-long"
-        #endif
-    #endif
-    typedef   signed long long  drmp3_int64;
-    typedef unsigned long long  drmp3_uint64;
-    #if defined(__clang__) || (defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)))
-        #pragma GCC diagnostic pop
-    #endif
-#endif
-#if defined(__LP64__) || defined(_WIN64) || (defined(__x86_64__) && !defined(__ILP32__)) || defined(_M_X64) || defined(__ia64) || defined (_M_IA64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
-    typedef drmp3_uint64        drmp3_uintptr;
-#else
-    typedef drmp3_uint32        drmp3_uintptr;
-#endif
-typedef drmp3_uint8             drmp3_bool8;
-typedef drmp3_uint32            drmp3_bool32;
-#define DRMP3_TRUE              1
-#define DRMP3_FALSE             0
-/* End Sized Types */
-
-/* Decorations */
-#if !defined(DRMP3_API)
-    #if defined(DRMP3_DLL)
-        #if defined(_WIN32)
-            #define DRMP3_DLL_IMPORT  __declspec(dllimport)
-            #define DRMP3_DLL_EXPORT  __declspec(dllexport)
-            #define DRMP3_DLL_PRIVATE static
-        #else
-            #if defined(__GNUC__) && __GNUC__ >= 4
-                #define DRMP3_DLL_IMPORT  __attribute__((visibility("default")))
-                #define DRMP3_DLL_EXPORT  __attribute__((visibility("default")))
-                #define DRMP3_DLL_PRIVATE __attribute__((visibility("hidden")))
-            #else
-                #define DRMP3_DLL_IMPORT
-                #define DRMP3_DLL_EXPORT
-                #define DRMP3_DLL_PRIVATE static
-            #endif
-        #endif
-
-        #if defined(DR_MP3_IMPLEMENTATION) || defined(DRMP3_IMPLEMENTATION)
-            #define DRMP3_API  DRMP3_DLL_EXPORT
-        #else
-            #define DRMP3_API  DRMP3_DLL_IMPORT
-        #endif
-        #define DRMP3_PRIVATE DRMP3_DLL_PRIVATE
-    #else
-        #define DRMP3_API extern
-        #define DRMP3_PRIVATE static
-    #endif
-#endif
-/* End Decorations */
-
-/* Result Codes */
-typedef drmp3_int32 drmp3_result;
-#define DRMP3_SUCCESS                        0
-#define DRMP3_ERROR                         -1   /* A generic error. */
-#define DRMP3_INVALID_ARGS                  -2
-#define DRMP3_INVALID_OPERATION             -3
-#define DRMP3_OUT_OF_MEMORY                 -4
-#define DRMP3_OUT_OF_RANGE                  -5
-#define DRMP3_ACCESS_DENIED                 -6
-#define DRMP3_DOES_NOT_EXIST                -7
-#define DRMP3_ALREADY_EXISTS                -8
-#define DRMP3_TOO_MANY_OPEN_FILES           -9
-#define DRMP3_INVALID_FILE                  -10
-#define DRMP3_TOO_BIG                       -11
-#define DRMP3_PATH_TOO_LONG                 -12
-#define DRMP3_NAME_TOO_LONG                 -13
-#define DRMP3_NOT_DIRECTORY                 -14
-#define DRMP3_IS_DIRECTORY                  -15
-#define DRMP3_DIRECTORY_NOT_EMPTY           -16
-#define DRMP3_END_OF_FILE                   -17
-#define DRMP3_NO_SPACE                      -18
-#define DRMP3_BUSY                          -19
-#define DRMP3_IO_ERROR                      -20
-#define DRMP3_INTERRUPT                     -21
-#define DRMP3_UNAVAILABLE                   -22
-#define DRMP3_ALREADY_IN_USE                -23
-#define DRMP3_BAD_ADDRESS                   -24
-#define DRMP3_BAD_SEEK                      -25
-#define DRMP3_BAD_PIPE                      -26
-#define DRMP3_DEADLOCK                      -27
-#define DRMP3_TOO_MANY_LINKS                -28
-#define DRMP3_NOT_IMPLEMENTED               -29
-#define DRMP3_NO_MESSAGE                    -30
-#define DRMP3_BAD_MESSAGE                   -31
-#define DRMP3_NO_DATA_AVAILABLE             -32
-#define DRMP3_INVALID_DATA                  -33
-#define DRMP3_TIMEOUT                       -34
-#define DRMP3_NO_NETWORK                    -35
-#define DRMP3_NOT_UNIQUE                    -36
-#define DRMP3_NOT_SOCKET                    -37
-#define DRMP3_NO_ADDRESS                    -38
-#define DRMP3_BAD_PROTOCOL                  -39
-#define DRMP3_PROTOCOL_UNAVAILABLE          -40
-#define DRMP3_PROTOCOL_NOT_SUPPORTED        -41
-#define DRMP3_PROTOCOL_FAMILY_NOT_SUPPORTED -42
-#define DRMP3_ADDRESS_FAMILY_NOT_SUPPORTED  -43
-#define DRMP3_SOCKET_NOT_SUPPORTED          -44
-#define DRMP3_CONNECTION_RESET              -45
-#define DRMP3_ALREADY_CONNECTED             -46
-#define DRMP3_NOT_CONNECTED                 -47
-#define DRMP3_CONNECTION_REFUSED            -48
-#define DRMP3_NO_HOST                       -49
-#define DRMP3_IN_PROGRESS                   -50
-#define DRMP3_CANCELLED                     -51
-#define DRMP3_MEMORY_ALREADY_MAPPED         -52
-#define DRMP3_AT_END                        -53
-/* End Result Codes */
-
-#define DRMP3_MAX_PCM_FRAMES_PER_MP3_FRAME  1152
-#define DRMP3_MAX_SAMPLES_PER_FRAME         (DRMP3_MAX_PCM_FRAMES_PER_MP3_FRAME*2)
-
-/* Inline */
-#ifdef _MSC_VER
-    #define DRMP3_INLINE __forceinline
-#elif defined(__GNUC__)
-    /*
-    I've had a bug report where GCC is emitting warnings about functions possibly not being inlineable. This warning happens when
-    the __attribute__((always_inline)) attribute is defined without an "inline" statement. I think therefore there must be some
-    case where "__inline__" is not always defined, thus the compiler emitting these warnings. When using -std=c89 or -ansi on the
-    command line, we cannot use the "inline" keyword and instead need to use "__inline__". In an attempt to work around this issue
-    I am using "__inline__" only when we're compiling in strict ANSI mode.
-    */
-    #if defined(__STRICT_ANSI__)
-        #define DRMP3_GNUC_INLINE_HINT __inline__
-    #else
-        #define DRMP3_GNUC_INLINE_HINT inline
-    #endif
-
-    #if (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 2)) || defined(__clang__)
-        #define DRMP3_INLINE DRMP3_GNUC_INLINE_HINT __attribute__((always_inline))
-    #else
-        #define DRMP3_INLINE DRMP3_GNUC_INLINE_HINT
-    #endif
-#elif defined(__WATCOMC__)
-    #define DRMP3_INLINE __inline
-#else
-    #define DRMP3_INLINE
-#endif
-/* End Inline */
-
-
-DRMP3_API void drmp3_version(drmp3_uint32* pMajor, drmp3_uint32* pMinor, drmp3_uint32* pRevision);
-DRMP3_API const char* drmp3_version_string(void);
-
-
-/* Allocation Callbacks */
-typedef struct
-{
-    void* pUserData;
-    void* (* onMalloc)(size_t sz, void* pUserData);
-    void* (* onRealloc)(void* p, size_t sz, void* pUserData);
-    void  (* onFree)(void* p, void* pUserData);
-} drmp3_allocation_callbacks;
-/* End Allocation Callbacks */
-
-
-/*
-Low Level Push API
-==================
-*/
-typedef struct
-{
-    int frame_bytes, channels, hz, layer, bitrate_kbps;
-} drmp3dec_frame_info;
-
-typedef struct
-{
-    float mdct_overlap[2][9*32], qmf_state[15*2*32];
-    int reserv, free_format_bytes;
-    drmp3_uint8 header[4], reserv_buf[511];
-} drmp3dec;
-
-/* Initializes a low level decoder. */
-DRMP3_API void drmp3dec_init(drmp3dec *dec);
-
-/* Reads a frame from a low level decoder. */
-DRMP3_API int drmp3dec_decode_frame(drmp3dec *dec, const drmp3_uint8 *mp3, int mp3_bytes, void *pcm, drmp3dec_frame_info *info);
-
-/* Helper for converting between f32 and s16. */
-DRMP3_API void drmp3dec_f32_to_s16(const float *in, drmp3_int16 *out, size_t num_samples);
-
-
-
-/*
-Main API (Pull API)
-===================
-*/
-typedef enum
-{
-    drmp3_seek_origin_start,
-    drmp3_seek_origin_current
-} drmp3_seek_origin;
-
-typedef struct
-{
-    drmp3_uint64 seekPosInBytes;        /* Points to the first byte of an MP3 frame. */
-    drmp3_uint64 pcmFrameIndex;         /* The index of the PCM frame this seek point targets. */
-    drmp3_uint16 mp3FramesToDiscard;    /* The number of whole MP3 frames to be discarded before pcmFramesToDiscard. */
-    drmp3_uint16 pcmFramesToDiscard;    /* The number of leading samples to read and discard. These are discarded after mp3FramesToDiscard. */
-} drmp3_seek_point;
-
-/*
-Callback for when data is read. Return value is the number of bytes actually read.
-
-pUserData   [in]  The user data that was passed to drmp3_init(), drmp3_open() and family.
-pBufferOut  [out] The output buffer.
-bytesToRead [in]  The number of bytes to read.
-
-Returns the number of bytes actually read.
-
-A return value of less than bytesToRead indicates the end of the stream. Do _not_ return from this callback until
-either the entire bytesToRead is filled or you have reached the end of the stream.
-*/
-typedef size_t (* drmp3_read_proc)(void* pUserData, void* pBufferOut, size_t bytesToRead);
-
-/*
-Callback for when data needs to be seeked.
-
-pUserData [in] The user data that was passed to drmp3_init(), drmp3_open() and family.
-offset    [in] The number of bytes to move, relative to the origin. Will never be negative.
-origin    [in] The origin of the seek - the current position or the start of the stream.
-
-Returns whether or not the seek was successful.
-
-Whether or not it is relative to the beginning or current position is determined by the "origin" parameter which
-will be either drmp3_seek_origin_start or drmp3_seek_origin_current.
-*/
-typedef drmp3_bool32 (* drmp3_seek_proc)(void* pUserData, int offset, drmp3_seek_origin origin);
-
-typedef struct
-{
-    drmp3_uint32 channels;
-    drmp3_uint32 sampleRate;
-} drmp3_config;
-
-typedef struct
-{
-    drmp3dec decoder;
-    drmp3_uint32 channels;
-    drmp3_uint32 sampleRate;
-    drmp3_read_proc onRead;
-    drmp3_seek_proc onSeek;
-    void* pUserData;
-    drmp3_allocation_callbacks allocationCallbacks;
-    drmp3_uint32 mp3FrameChannels;      /* The number of channels in the currently loaded MP3 frame. Internal use only. */
-    drmp3_uint32 mp3FrameSampleRate;    /* The sample rate of the currently loaded MP3 frame. Internal use only. */
-    drmp3_uint32 pcmFramesConsumedInMP3Frame;
-    drmp3_uint32 pcmFramesRemainingInMP3Frame;
-    drmp3_uint8 pcmFrames[sizeof(float)*DRMP3_MAX_SAMPLES_PER_FRAME];  /* <-- Multiplied by sizeof(float) to ensure there's enough room for DR_MP3_FLOAT_OUTPUT. */
-    drmp3_uint64 currentPCMFrame;       /* The current PCM frame, globally, based on the output sample rate. Mainly used for seeking. */
-    drmp3_uint64 streamCursor;          /* The current byte the decoder is sitting on in the raw stream. */
-    drmp3_seek_point* pSeekPoints;      /* NULL by default. Set with drmp3_bind_seek_table(). Memory is owned by the client. dr_mp3 will never attempt to free this pointer. */
-    drmp3_uint32 seekPointCount;        /* The number of items in pSeekPoints. When set to 0, assumes there is no seek table. Defaults to zero. */
-    size_t dataSize;
-    size_t dataCapacity;
-    size_t dataConsumed;
-    drmp3_uint8* pData;
-    drmp3_bool32 atEnd : 1;
-    struct
-    {
-        const drmp3_uint8* pData;
-        size_t dataSize;
-        size_t currentReadPos;
-    } memory;   /* Only used for decoders that were opened against a block of memory. */
-} drmp3;
-
-/*
-Initializes an MP3 decoder.
-
-onRead    [in]           The function to call when data needs to be read from the client.
-onSeek    [in]           The function to call when the read position of the client data needs to move.
-pUserData [in, optional] A pointer to application defined data that will be passed to onRead and onSeek.
-
-Returns true if successful; false otherwise.
-
-Close the loader with drmp3_uninit().
-
-See also: drmp3_init_file(), drmp3_init_memory(), drmp3_uninit()
-*/
-DRMP3_API drmp3_bool32 drmp3_init(drmp3* pMP3, drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, const drmp3_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Initializes an MP3 decoder from a block of memory.
-
-This does not create a copy of the data. It is up to the application to ensure the buffer remains valid for
-the lifetime of the drmp3 object.
-
-The buffer should contain the contents of the entire MP3 file.
-*/
-DRMP3_API drmp3_bool32 drmp3_init_memory(drmp3* pMP3, const void* pData, size_t dataSize, const drmp3_allocation_callbacks* pAllocationCallbacks);
-
-#ifndef DR_MP3_NO_STDIO
-/*
-Initializes an MP3 decoder from a file.
-
-This holds the internal FILE object until drmp3_uninit() is called. Keep this in mind if you're caching drmp3
-objects because the operating system may restrict the number of file handles an application can have open at
-any given time.
-*/
-DRMP3_API drmp3_bool32 drmp3_init_file(drmp3* pMP3, const char* pFilePath, const drmp3_allocation_callbacks* pAllocationCallbacks);
-DRMP3_API drmp3_bool32 drmp3_init_file_w(drmp3* pMP3, const wchar_t* pFilePath, const drmp3_allocation_callbacks* pAllocationCallbacks);
-#endif
-
-/*
-Uninitializes an MP3 decoder.
-*/
-DRMP3_API void drmp3_uninit(drmp3* pMP3);
-
-/*
-Reads PCM frames as interleaved 32-bit IEEE floating point PCM.
-
-Note that framesToRead specifies the number of PCM frames to read, _not_ the number of MP3 frames.
-*/
-DRMP3_API drmp3_uint64 drmp3_read_pcm_frames_f32(drmp3* pMP3, drmp3_uint64 framesToRead, float* pBufferOut);
-
-/*
-Reads PCM frames as interleaved signed 16-bit integer PCM.
-
-Note that framesToRead specifies the number of PCM frames to read, _not_ the number of MP3 frames.
-*/
-DRMP3_API drmp3_uint64 drmp3_read_pcm_frames_s16(drmp3* pMP3, drmp3_uint64 framesToRead, drmp3_int16* pBufferOut);
-
-/*
-Seeks to a specific frame.
-
-Note that this is _not_ an MP3 frame, but rather a PCM frame.
-*/
-DRMP3_API drmp3_bool32 drmp3_seek_to_pcm_frame(drmp3* pMP3, drmp3_uint64 frameIndex);
-
-/*
-Calculates the total number of PCM frames in the MP3 stream. Cannot be used for infinite streams such as internet
-radio. Runs in linear time. Returns 0 on error.
-*/
-DRMP3_API drmp3_uint64 drmp3_get_pcm_frame_count(drmp3* pMP3);
-
-/*
-Calculates the total number of MP3 frames in the MP3 stream. Cannot be used for infinite streams such as internet
-radio. Runs in linear time. Returns 0 on error.
-*/
-DRMP3_API drmp3_uint64 drmp3_get_mp3_frame_count(drmp3* pMP3);
-
-/*
-Calculates the total number of MP3 and PCM frames in the MP3 stream. Cannot be used for infinite streams such as internet
-radio. Runs in linear time. Returns 0 on error.
-
-This is equivalent to calling drmp3_get_mp3_frame_count() and drmp3_get_pcm_frame_count() except that it's more efficient.
-*/
-DRMP3_API drmp3_bool32 drmp3_get_mp3_and_pcm_frame_count(drmp3* pMP3, drmp3_uint64* pMP3FrameCount, drmp3_uint64* pPCMFrameCount);
-
-/*
-Calculates the seekpoints based on PCM frames. This is slow.
-
-pSeekPointCount is a pointer to a uint32 containing the seekpoint count. On input it contains the desired count.
-On output it contains the actual count. The reason for this design is that the client may request too many
-seekpoints, in which case dr_mp3 will return a corrected count.
-
-Note that seektable seeking is not quite sample exact when the MP3 stream contains inconsistent sample rates.
-*/
-DRMP3_API drmp3_bool32 drmp3_calculate_seek_points(drmp3* pMP3, drmp3_uint32* pSeekPointCount, drmp3_seek_point* pSeekPoints);
-
-/*
-Binds a seek table to the decoder.
-
-This does _not_ make a copy of pSeekPoints - it only references it. It is up to the application to ensure this
-remains valid while it is bound to the decoder.
-
-Use drmp3_calculate_seek_points() to calculate the seek points.
-*/
-DRMP3_API drmp3_bool32 drmp3_bind_seek_table(drmp3* pMP3, drmp3_uint32 seekPointCount, drmp3_seek_point* pSeekPoints);
-
-
-/*
-Opens and decodes an entire MP3 stream as a single operation.
-
-On output pConfig will receive the channel count and sample rate of the stream.
-
-Free the returned pointer with drmp3_free().
-*/
-DRMP3_API float* drmp3_open_and_read_pcm_frames_f32(drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks);
-DRMP3_API drmp3_int16* drmp3_open_and_read_pcm_frames_s16(drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks);
-
-DRMP3_API float* drmp3_open_memory_and_read_pcm_frames_f32(const void* pData, size_t dataSize, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks);
-DRMP3_API drmp3_int16* drmp3_open_memory_and_read_pcm_frames_s16(const void* pData, size_t dataSize, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks);
-
-#ifndef DR_MP3_NO_STDIO
-DRMP3_API float* drmp3_open_file_and_read_pcm_frames_f32(const char* filePath, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks);
-DRMP3_API drmp3_int16* drmp3_open_file_and_read_pcm_frames_s16(const char* filePath, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks);
-#endif
-
-/*
-Allocates a block of memory on the heap.
-*/
-DRMP3_API void* drmp3_malloc(size_t sz, const drmp3_allocation_callbacks* pAllocationCallbacks);
-
-/*
-Frees any memory that was allocated by a public drmp3 API.
-*/
-DRMP3_API void drmp3_free(void* p, const drmp3_allocation_callbacks* pAllocationCallbacks);
-
-#ifdef __cplusplus
-}
-#endif
-#endif  /* dr_mp3_h */
-
-
-/************************************************************************************************************************************************************
- ************************************************************************************************************************************************************
-
- IMPLEMENTATION
-
- ************************************************************************************************************************************************************
- ************************************************************************************************************************************************************/
-#if defined(DR_MP3_IMPLEMENTATION) || defined(DRMP3_IMPLEMENTATION)
-#ifndef dr_mp3_c
-#define dr_mp3_c
-
-#include <stdlib.h>
-#include <string.h>
-#include <limits.h> /* For INT_MAX */
-
-DRMP3_API void drmp3_version(drmp3_uint32* pMajor, drmp3_uint32* pMinor, drmp3_uint32* pRevision)
-{
-    if (pMajor) {
-        *pMajor = DRMP3_VERSION_MAJOR;
-    }
-
-    if (pMinor) {
-        *pMinor = DRMP3_VERSION_MINOR;
-    }
-
-    if (pRevision) {
-        *pRevision = DRMP3_VERSION_REVISION;
-    }
-}
-
-DRMP3_API const char* drmp3_version_string(void)
-{
-    return DRMP3_VERSION_STRING;
-}
-
-/* Disable SIMD when compiling with TCC for now. */
-#if defined(__TINYC__)
-#define DR_MP3_NO_SIMD
-#endif
-
-#define DRMP3_OFFSET_PTR(p, offset) ((void*)((drmp3_uint8*)(p) + (offset)))
-
-#define DRMP3_MAX_FREE_FORMAT_FRAME_SIZE  2304    /* more than ISO spec's */
-#ifndef DRMP3_MAX_FRAME_SYNC_MATCHES
-#define DRMP3_MAX_FRAME_SYNC_MATCHES      10
-#endif
-
-#define DRMP3_MAX_L3_FRAME_PAYLOAD_BYTES  DRMP3_MAX_FREE_FORMAT_FRAME_SIZE /* MUST be >= 320000/8/32000*1152 = 1440 */
-
-#define DRMP3_MAX_BITRESERVOIR_BYTES      511
-#define DRMP3_SHORT_BLOCK_TYPE            2
-#define DRMP3_STOP_BLOCK_TYPE             3
-#define DRMP3_MODE_MONO                   3
-#define DRMP3_MODE_JOINT_STEREO           1
-#define DRMP3_HDR_SIZE                    4
-#define DRMP3_HDR_IS_MONO(h)              (((h[3]) & 0xC0) == 0xC0)
-#define DRMP3_HDR_IS_MS_STEREO(h)         (((h[3]) & 0xE0) == 0x60)
-#define DRMP3_HDR_IS_FREE_FORMAT(h)       (((h[2]) & 0xF0) == 0)
-#define DRMP3_HDR_IS_CRC(h)               (!((h[1]) & 1))
-#define DRMP3_HDR_TEST_PADDING(h)         ((h[2]) & 0x2)
-#define DRMP3_HDR_TEST_MPEG1(h)           ((h[1]) & 0x8)
-#define DRMP3_HDR_TEST_NOT_MPEG25(h)      ((h[1]) & 0x10)
-#define DRMP3_HDR_TEST_I_STEREO(h)        ((h[3]) & 0x10)
-#define DRMP3_HDR_TEST_MS_STEREO(h)       ((h[3]) & 0x20)
-#define DRMP3_HDR_GET_STEREO_MODE(h)      (((h[3]) >> 6) & 3)
-#define DRMP3_HDR_GET_STEREO_MODE_EXT(h)  (((h[3]) >> 4) & 3)
-#define DRMP3_HDR_GET_LAYER(h)            (((h[1]) >> 1) & 3)
-#define DRMP3_HDR_GET_BITRATE(h)          ((h[2]) >> 4)
-#define DRMP3_HDR_GET_SAMPLE_RATE(h)      (((h[2]) >> 2) & 3)
-#define DRMP3_HDR_GET_MY_SAMPLE_RATE(h)   (DRMP3_HDR_GET_SAMPLE_RATE(h) + (((h[1] >> 3) & 1) + ((h[1] >> 4) & 1))*3)
-#define DRMP3_HDR_IS_FRAME_576(h)         ((h[1] & 14) == 2)
-#define DRMP3_HDR_IS_LAYER_1(h)           ((h[1] & 6) == 6)
-
-#define DRMP3_BITS_DEQUANTIZER_OUT        -1
-#define DRMP3_MAX_SCF                     (255 + DRMP3_BITS_DEQUANTIZER_OUT*4 - 210)
-#define DRMP3_MAX_SCFI                    ((DRMP3_MAX_SCF + 3) & ~3)
-
-#define DRMP3_MIN(a, b)           ((a) > (b) ? (b) : (a))
-#define DRMP3_MAX(a, b)           ((a) < (b) ? (b) : (a))
-
-#if !defined(DR_MP3_NO_SIMD)
-
-#if !defined(DR_MP3_ONLY_SIMD) && (defined(_M_X64) || defined(__x86_64__) || defined(__aarch64__) || defined(_M_ARM64))
-/* x64 always have SSE2, arm64 always have neon, no need for generic code */
-#define DR_MP3_ONLY_SIMD
-#endif
-
-#if ((defined(_MSC_VER) && _MSC_VER >= 1400) && defined(_M_X64)) || ((defined(__i386) || defined(_M_IX86) || defined(__i386__) || defined(__x86_64__)) && ((defined(_M_IX86_FP) && _M_IX86_FP == 2) || defined(__SSE2__)))
-#if defined(_MSC_VER)
-#include <intrin.h>
-#endif
-#include <emmintrin.h>
-#define DRMP3_HAVE_SSE 1
-#define DRMP3_HAVE_SIMD 1
-#define DRMP3_VSTORE _mm_storeu_ps
-#define DRMP3_VLD _mm_loadu_ps
-#define DRMP3_VSET _mm_set1_ps
-#define DRMP3_VADD _mm_add_ps
-#define DRMP3_VSUB _mm_sub_ps
-#define DRMP3_VMUL _mm_mul_ps
-#define DRMP3_VMAC(a, x, y) _mm_add_ps(a, _mm_mul_ps(x, y))
-#define DRMP3_VMSB(a, x, y) _mm_sub_ps(a, _mm_mul_ps(x, y))
-#define DRMP3_VMUL_S(x, s)  _mm_mul_ps(x, _mm_set1_ps(s))
-#define DRMP3_VREV(x) _mm_shuffle_ps(x, x, _MM_SHUFFLE(0, 1, 2, 3))
-typedef __m128 drmp3_f4;
-#if defined(_MSC_VER) || defined(DR_MP3_ONLY_SIMD)
-#define drmp3_cpuid __cpuid
-#else
-static __inline__ __attribute__((always_inline)) void drmp3_cpuid(int CPUInfo[], const int InfoType)
-{
-#if defined(__PIC__)
-    __asm__ __volatile__(
-#if defined(__x86_64__)
-        "push %%rbx\n"
-        "cpuid\n"
-        "xchgl %%ebx, %1\n"
-        "pop  %%rbx\n"
-#else
-        "xchgl %%ebx, %1\n"
-        "cpuid\n"
-        "xchgl %%ebx, %1\n"
-#endif
-        : "=a" (CPUInfo[0]), "=r" (CPUInfo[1]), "=c" (CPUInfo[2]), "=d" (CPUInfo[3])
-        : "a" (InfoType));
-#else
-    __asm__ __volatile__(
-        "cpuid"
-        : "=a" (CPUInfo[0]), "=b" (CPUInfo[1]), "=c" (CPUInfo[2]), "=d" (CPUInfo[3])
-        : "a" (InfoType));
-#endif
-}
-#endif
-static int drmp3_have_simd(void)
-{
-#ifdef DR_MP3_ONLY_SIMD
-    return 1;
-#else
-    static int g_have_simd;
-    int CPUInfo[4];
-#ifdef MINIMP3_TEST
-    static int g_counter;
-    if (g_counter++ > 100)
-        return 0;
-#endif
-    if (g_have_simd)
-        goto end;
-    drmp3_cpuid(CPUInfo, 0);
-    if (CPUInfo[0] > 0)
-    {
-        drmp3_cpuid(CPUInfo, 1);
-        g_have_simd = (CPUInfo[3] & (1 << 26)) + 1; /* SSE2 */
-        return g_have_simd - 1;
-    }
-
-end:
-    return g_have_simd - 1;
-#endif
-}
-#elif defined(__ARM_NEON) || defined(__aarch64__) || defined(_M_ARM64)
-#include <arm_neon.h>
-#define DRMP3_HAVE_SSE 0
-#define DRMP3_HAVE_SIMD 1
-#define DRMP3_VSTORE vst1q_f32
-#define DRMP3_VLD vld1q_f32
-#define DRMP3_VSET vmovq_n_f32
-#define DRMP3_VADD vaddq_f32
-#define DRMP3_VSUB vsubq_f32
-#define DRMP3_VMUL vmulq_f32
-#define DRMP3_VMAC(a, x, y) vmlaq_f32(a, x, y)
-#define DRMP3_VMSB(a, x, y) vmlsq_f32(a, x, y)
-#define DRMP3_VMUL_S(x, s)  vmulq_f32(x, vmovq_n_f32(s))
-#define DRMP3_VREV(x) vcombine_f32(vget_high_f32(vrev64q_f32(x)), vget_low_f32(vrev64q_f32(x)))
-typedef float32x4_t drmp3_f4;
-static int drmp3_have_simd(void)
-{   /* TODO: detect neon for !DR_MP3_ONLY_SIMD */
-    return 1;
-}
-#else
-#define DRMP3_HAVE_SSE 0
-#define DRMP3_HAVE_SIMD 0
-#ifdef DR_MP3_ONLY_SIMD
-#error DR_MP3_ONLY_SIMD used, but SSE/NEON not enabled
-#endif
-#endif
-
-#else
-
-#define DRMP3_HAVE_SIMD 0
-
-#endif
-
-#if defined(__ARM_ARCH) && (__ARM_ARCH >= 6) && !defined(__aarch64__) && !defined(_M_ARM64) && !defined(__ARM_ARCH_6M__)
-#define DRMP3_HAVE_ARMV6 1
-static __inline__ __attribute__((always_inline)) drmp3_int32 drmp3_clip_int16_arm(drmp3_int32 a)
-{
-    drmp3_int32 x = 0;
-    __asm__ ("ssat %0, #16, %1" : "=r"(x) : "r"(a));
-    return x;
-}
-#else
-#define DRMP3_HAVE_ARMV6 0
-#endif
-
-
-/* Standard library stuff. */
-#ifndef DRMP3_ASSERT
-#include <assert.h>
-#define DRMP3_ASSERT(expression) assert(expression)
-#endif
-#ifndef DRMP3_COPY_MEMORY
-#define DRMP3_COPY_MEMORY(dst, src, sz) memcpy((dst), (src), (sz))
-#endif
-#ifndef DRMP3_MOVE_MEMORY
-#define DRMP3_MOVE_MEMORY(dst, src, sz) memmove((dst), (src), (sz))
-#endif
-#ifndef DRMP3_ZERO_MEMORY
-#define DRMP3_ZERO_MEMORY(p, sz) memset((p), 0, (sz))
-#endif
-#define DRMP3_ZERO_OBJECT(p) DRMP3_ZERO_MEMORY((p), sizeof(*(p)))
-#ifndef DRMP3_MALLOC
-#define DRMP3_MALLOC(sz) malloc((sz))
-#endif
-#ifndef DRMP3_REALLOC
-#define DRMP3_REALLOC(p, sz) realloc((p), (sz))
-#endif
-#ifndef DRMP3_FREE
-#define DRMP3_FREE(p) free((p))
-#endif
-
-typedef struct
-{
-    const drmp3_uint8 *buf;
-    int pos, limit;
-} drmp3_bs;
-
-typedef struct
-{
-    float scf[3*64];
-    drmp3_uint8 total_bands, stereo_bands, bitalloc[64], scfcod[64];
-} drmp3_L12_scale_info;
-
-typedef struct
-{
-    drmp3_uint8 tab_offset, code_tab_width, band_count;
-} drmp3_L12_subband_alloc;
-
-typedef struct
-{
-    const drmp3_uint8 *sfbtab;
-    drmp3_uint16 part_23_length, big_values, scalefac_compress;
-    drmp3_uint8 global_gain, block_type, mixed_block_flag, n_long_sfb, n_short_sfb;
-    drmp3_uint8 table_select[3], region_count[3], subblock_gain[3];
-    drmp3_uint8 preflag, scalefac_scale, count1_table, scfsi;
-} drmp3_L3_gr_info;
-
-typedef struct
-{
-    drmp3_bs bs;
-    drmp3_uint8 maindata[DRMP3_MAX_BITRESERVOIR_BYTES + DRMP3_MAX_L3_FRAME_PAYLOAD_BYTES];
-    drmp3_L3_gr_info gr_info[4];
-    float grbuf[2][576], scf[40], syn[18 + 15][2*32];
-    drmp3_uint8 ist_pos[2][39];
-} drmp3dec_scratch;
-
-static void drmp3_bs_init(drmp3_bs *bs, const drmp3_uint8 *data, int bytes)
-{
-    bs->buf   = data;
-    bs->pos   = 0;
-    bs->limit = bytes*8;
-}
-
-static drmp3_uint32 drmp3_bs_get_bits(drmp3_bs *bs, int n)
-{
-    drmp3_uint32 next, cache = 0, s = bs->pos & 7;
-    int shl = n + s;
-    const drmp3_uint8 *p = bs->buf + (bs->pos >> 3);
-    if ((bs->pos += n) > bs->limit)
-        return 0;
-    next = *p++ & (255 >> s);
-    while ((shl -= 8) > 0)
-    {
-        cache |= next << shl;
-        next = *p++;
-    }
-    return cache | (next >> -shl);
-}
-
-static int drmp3_hdr_valid(const drmp3_uint8 *h)
-{
-    return h[0] == 0xff &&
-        ((h[1] & 0xF0) == 0xf0 || (h[1] & 0xFE) == 0xe2) &&
-        (DRMP3_HDR_GET_LAYER(h) != 0) &&
-        (DRMP3_HDR_GET_BITRATE(h) != 15) &&
-        (DRMP3_HDR_GET_SAMPLE_RATE(h) != 3);
-}
-
-static int drmp3_hdr_compare(const drmp3_uint8 *h1, const drmp3_uint8 *h2)
-{
-    return drmp3_hdr_valid(h2) &&
-        ((h1[1] ^ h2[1]) & 0xFE) == 0 &&
-        ((h1[2] ^ h2[2]) & 0x0C) == 0 &&
-        !(DRMP3_HDR_IS_FREE_FORMAT(h1) ^ DRMP3_HDR_IS_FREE_FORMAT(h2));
-}
-
-static unsigned drmp3_hdr_bitrate_kbps(const drmp3_uint8 *h)
-{
-    static const drmp3_uint8 halfrate[2][3][15] = {
-        { { 0,4,8,12,16,20,24,28,32,40,48,56,64,72,80 }, { 0,4,8,12,16,20,24,28,32,40,48,56,64,72,80 }, { 0,16,24,28,32,40,48,56,64,72,80,88,96,112,128 } },
-        { { 0,16,20,24,28,32,40,48,56,64,80,96,112,128,160 }, { 0,16,24,28,32,40,48,56,64,80,96,112,128,160,192 }, { 0,16,32,48,64,80,96,112,128,144,160,176,192,208,224 } },
-    };
-    return 2*halfrate[!!DRMP3_HDR_TEST_MPEG1(h)][DRMP3_HDR_GET_LAYER(h) - 1][DRMP3_HDR_GET_BITRATE(h)];
-}
-
-static unsigned drmp3_hdr_sample_rate_hz(const drmp3_uint8 *h)
-{
-    static const unsigned g_hz[3] = { 44100, 48000, 32000 };
-    return g_hz[DRMP3_HDR_GET_SAMPLE_RATE(h)] >> (int)!DRMP3_HDR_TEST_MPEG1(h) >> (int)!DRMP3_HDR_TEST_NOT_MPEG25(h);
-}
-
-static unsigned drmp3_hdr_frame_samples(const drmp3_uint8 *h)
-{
-    return DRMP3_HDR_IS_LAYER_1(h) ? 384 : (1152 >> (int)DRMP3_HDR_IS_FRAME_576(h));
-}
-
-static int drmp3_hdr_frame_bytes(const drmp3_uint8 *h, int free_format_size)
-{
-    int frame_bytes = drmp3_hdr_frame_samples(h)*drmp3_hdr_bitrate_kbps(h)*125/drmp3_hdr_sample_rate_hz(h);
-    if (DRMP3_HDR_IS_LAYER_1(h))
-    {
-        frame_bytes &= ~3; /* slot align */
-    }
-    return frame_bytes ? frame_bytes : free_format_size;
-}
-
-static int drmp3_hdr_padding(const drmp3_uint8 *h)
-{
-    return DRMP3_HDR_TEST_PADDING(h) ? (DRMP3_HDR_IS_LAYER_1(h) ? 4 : 1) : 0;
-}
-
-#ifndef DR_MP3_ONLY_MP3
-static const drmp3_L12_subband_alloc *drmp3_L12_subband_alloc_table(const drmp3_uint8 *hdr, drmp3_L12_scale_info *sci)
-{
-    const drmp3_L12_subband_alloc *alloc;
-    int mode = DRMP3_HDR_GET_STEREO_MODE(hdr);
-    int nbands, stereo_bands = (mode == DRMP3_MODE_MONO) ? 0 : (mode == DRMP3_MODE_JOINT_STEREO) ? (DRMP3_HDR_GET_STEREO_MODE_EXT(hdr) << 2) + 4 : 32;
-
-    if (DRMP3_HDR_IS_LAYER_1(hdr))
-    {
-        static const drmp3_L12_subband_alloc g_alloc_L1[] = { { 76, 4, 32 } };
-        alloc = g_alloc_L1;
-        nbands = 32;
-    } else if (!DRMP3_HDR_TEST_MPEG1(hdr))
-    {
-        static const drmp3_L12_subband_alloc g_alloc_L2M2[] = { { 60, 4, 4 }, { 44, 3, 7 }, { 44, 2, 19 } };
-        alloc = g_alloc_L2M2;
-        nbands = 30;
-    } else
-    {
-        static const drmp3_L12_subband_alloc g_alloc_L2M1[] = { { 0, 4, 3 }, { 16, 4, 8 }, { 32, 3, 12 }, { 40, 2, 7 } };
-        int sample_rate_idx = DRMP3_HDR_GET_SAMPLE_RATE(hdr);
-        unsigned kbps = drmp3_hdr_bitrate_kbps(hdr) >> (int)(mode != DRMP3_MODE_MONO);
-        if (!kbps) /* free-format */
-        {
-            kbps = 192;
-        }
-
-        alloc = g_alloc_L2M1;
-        nbands = 27;
-        if (kbps < 56)
-        {
-            static const drmp3_L12_subband_alloc g_alloc_L2M1_lowrate[] = { { 44, 4, 2 }, { 44, 3, 10 } };
-            alloc = g_alloc_L2M1_lowrate;
-            nbands = sample_rate_idx == 2 ? 12 : 8;
-        } else if (kbps >= 96 && sample_rate_idx != 1)
-        {
-            nbands = 30;
-        }
-    }
-
-    sci->total_bands = (drmp3_uint8)nbands;
-    sci->stereo_bands = (drmp3_uint8)DRMP3_MIN(stereo_bands, nbands);
-
-    return alloc;
-}
-
-static void drmp3_L12_read_scalefactors(drmp3_bs *bs, drmp3_uint8 *pba, drmp3_uint8 *scfcod, int bands, float *scf)
-{
-    static const float g_deq_L12[18*3] = {
-#define DRMP3_DQ(x) 9.53674316e-07f/x, 7.56931807e-07f/x, 6.00777173e-07f/x
-        DRMP3_DQ(3),DRMP3_DQ(7),DRMP3_DQ(15),DRMP3_DQ(31),DRMP3_DQ(63),DRMP3_DQ(127),DRMP3_DQ(255),DRMP3_DQ(511),DRMP3_DQ(1023),DRMP3_DQ(2047),DRMP3_DQ(4095),DRMP3_DQ(8191),DRMP3_DQ(16383),DRMP3_DQ(32767),DRMP3_DQ(65535),DRMP3_DQ(3),DRMP3_DQ(5),DRMP3_DQ(9)
-    };
-    int i, m;
-    for (i = 0; i < bands; i++)
-    {
-        float s = 0;
-        int ba = *pba++;
-        int mask = ba ? 4 + ((19 >> scfcod[i]) & 3) : 0;
-        for (m = 4; m; m >>= 1)
-        {
-            if (mask & m)
-            {
-                int b = drmp3_bs_get_bits(bs, 6);
-                s = g_deq_L12[ba*3 - 6 + b % 3]*(int)(1 << 21 >> b/3);
-            }
-            *scf++ = s;
-        }
-    }
-}
-
-static void drmp3_L12_read_scale_info(const drmp3_uint8 *hdr, drmp3_bs *bs, drmp3_L12_scale_info *sci)
-{
-    static const drmp3_uint8 g_bitalloc_code_tab[] = {
-        0,17, 3, 4, 5,6,7, 8,9,10,11,12,13,14,15,16,
-        0,17,18, 3,19,4,5, 6,7, 8, 9,10,11,12,13,16,
-        0,17,18, 3,19,4,5,16,
-        0,17,18,16,
-        0,17,18,19, 4,5,6, 7,8, 9,10,11,12,13,14,15,
-        0,17,18, 3,19,4,5, 6,7, 8, 9,10,11,12,13,14,
-        0, 2, 3, 4, 5,6,7, 8,9,10,11,12,13,14,15,16
-    };
-    const drmp3_L12_subband_alloc *subband_alloc = drmp3_L12_subband_alloc_table(hdr, sci);
-
-    int i, k = 0, ba_bits = 0;
-    const drmp3_uint8 *ba_code_tab = g_bitalloc_code_tab;
-
-    for (i = 0; i < sci->total_bands; i++)
-    {
-        drmp3_uint8 ba;
-        if (i == k)
-        {
-            k += subband_alloc->band_count;
-            ba_bits = subband_alloc->code_tab_width;
-            ba_code_tab = g_bitalloc_code_tab + subband_alloc->tab_offset;
-            subband_alloc++;
-        }
-        ba = ba_code_tab[drmp3_bs_get_bits(bs, ba_bits)];
-        sci->bitalloc[2*i] = ba;
-        if (i < sci->stereo_bands)
-        {
-            ba = ba_code_tab[drmp3_bs_get_bits(bs, ba_bits)];
-        }
-        sci->bitalloc[2*i + 1] = sci->stereo_bands ? ba : 0;
-    }
-
-    for (i = 0; i < 2*sci->total_bands; i++)
-    {
-        sci->scfcod[i] = (drmp3_uint8)(sci->bitalloc[i] ? DRMP3_HDR_IS_LAYER_1(hdr) ? 2 : drmp3_bs_get_bits(bs, 2) : 6);
-    }
-
-    drmp3_L12_read_scalefactors(bs, sci->bitalloc, sci->scfcod, sci->total_bands*2, sci->scf);
-
-    for (i = sci->stereo_bands; i < sci->total_bands; i++)
-    {
-        sci->bitalloc[2*i + 1] = 0;
-    }
-}
-
-static int drmp3_L12_dequantize_granule(float *grbuf, drmp3_bs *bs, drmp3_L12_scale_info *sci, int group_size)
-{
-    int i, j, k, choff = 576;
-    for (j = 0; j < 4; j++)
-    {
-        float *dst = grbuf + group_size*j;
-        for (i = 0; i < 2*sci->total_bands; i++)
-        {
-            int ba = sci->bitalloc[i];
-            if (ba != 0)
-            {
-                if (ba < 17)
-                {
-                    int half = (1 << (ba - 1)) - 1;
-                    for (k = 0; k < group_size; k++)
-                    {
-                        dst[k] = (float)((int)drmp3_bs_get_bits(bs, ba) - half);
-                    }
-                } else
-                {
-                    unsigned mod = (2 << (ba - 17)) + 1;    /* 3, 5, 9 */
-                    unsigned code = drmp3_bs_get_bits(bs, mod + 2 - (mod >> 3));  /* 5, 7, 10 */
-                    for (k = 0; k < group_size; k++, code /= mod)
-                    {
-                        dst[k] = (float)((int)(code % mod - mod/2));
-                    }
-                }
-            }
-            dst += choff;
-            choff = 18 - choff;
-        }
-    }
-    return group_size*4;
-}
-
-static void drmp3_L12_apply_scf_384(drmp3_L12_scale_info *sci, const float *scf, float *dst)
-{
-    int i, k;
-    DRMP3_COPY_MEMORY(dst + 576 + sci->stereo_bands*18, dst + sci->stereo_bands*18, (sci->total_bands - sci->stereo_bands)*18*sizeof(float));
-    for (i = 0; i < sci->total_bands; i++, dst += 18, scf += 6)
-    {
-        for (k = 0; k < 12; k++)
-        {
-            dst[k + 0]   *= scf[0];
-            dst[k + 576] *= scf[3];
-        }
-    }
-}
-#endif
-
-static int drmp3_L3_read_side_info(drmp3_bs *bs, drmp3_L3_gr_info *gr, const drmp3_uint8 *hdr)
-{
-    static const drmp3_uint8 g_scf_long[8][23] = {
-        { 6,6,6,6,6,6,8,10,12,14,16,20,24,28,32,38,46,52,60,68,58,54,0 },
-        { 12,12,12,12,12,12,16,20,24,28,32,40,48,56,64,76,90,2,2,2,2,2,0 },
-        { 6,6,6,6,6,6,8,10,12,14,16,20,24,28,32,38,46,52,60,68,58,54,0 },
-        { 6,6,6,6,6,6,8,10,12,14,16,18,22,26,32,38,46,54,62,70,76,36,0 },
-        { 6,6,6,6,6,6,8,10,12,14,16,20,24,28,32,38,46,52,60,68,58,54,0 },
-        { 4,4,4,4,4,4,6,6,8,8,10,12,16,20,24,28,34,42,50,54,76,158,0 },
-        { 4,4,4,4,4,4,6,6,6,8,10,12,16,18,22,28,34,40,46,54,54,192,0 },
-        { 4,4,4,4,4,4,6,6,8,10,12,16,20,24,30,38,46,56,68,84,102,26,0 }
-    };
-    static const drmp3_uint8 g_scf_short[8][40] = {
-        { 4,4,4,4,4,4,4,4,4,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,24,24,24,30,30,30,40,40,40,18,18,18,0 },
-        { 8,8,8,8,8,8,8,8,8,12,12,12,16,16,16,20,20,20,24,24,24,28,28,28,36,36,36,2,2,2,2,2,2,2,2,2,26,26,26,0 },
-        { 4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,8,8,8,10,10,10,14,14,14,18,18,18,26,26,26,32,32,32,42,42,42,18,18,18,0 },
-        { 4,4,4,4,4,4,4,4,4,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,24,24,24,32,32,32,44,44,44,12,12,12,0 },
-        { 4,4,4,4,4,4,4,4,4,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,24,24,24,30,30,30,40,40,40,18,18,18,0 },
-        { 4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,22,22,22,30,30,30,56,56,56,0 },
-        { 4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,6,6,6,10,10,10,12,12,12,14,14,14,16,16,16,20,20,20,26,26,26,66,66,66,0 },
-        { 4,4,4,4,4,4,4,4,4,4,4,4,6,6,6,8,8,8,12,12,12,16,16,16,20,20,20,26,26,26,34,34,34,42,42,42,12,12,12,0 }
-    };
-    static const drmp3_uint8 g_scf_mixed[8][40] = {
-        { 6,6,6,6,6,6,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,24,24,24,30,30,30,40,40,40,18,18,18,0 },
-        { 12,12,12,4,4,4,8,8,8,12,12,12,16,16,16,20,20,20,24,24,24,28,28,28,36,36,36,2,2,2,2,2,2,2,2,2,26,26,26,0 },
-        { 6,6,6,6,6,6,6,6,6,6,6,6,8,8,8,10,10,10,14,14,14,18,18,18,26,26,26,32,32,32,42,42,42,18,18,18,0 },
-        { 6,6,6,6,6,6,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,24,24,24,32,32,32,44,44,44,12,12,12,0 },
-        { 6,6,6,6,6,6,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,24,24,24,30,30,30,40,40,40,18,18,18,0 },
-        { 4,4,4,4,4,4,6,6,4,4,4,6,6,6,8,8,8,10,10,10,12,12,12,14,14,14,18,18,18,22,22,22,30,30,30,56,56,56,0 },
-        { 4,4,4,4,4,4,6,6,4,4,4,6,6,6,6,6,6,10,10,10,12,12,12,14,14,14,16,16,16,20,20,20,26,26,26,66,66,66,0 },
-        { 4,4,4,4,4,4,6,6,4,4,4,6,6,6,8,8,8,12,12,12,16,16,16,20,20,20,26,26,26,34,34,34,42,42,42,12,12,12,0 }
-    };
-
-    unsigned tables, scfsi = 0;
-    int main_data_begin, part_23_sum = 0;
-    int gr_count = DRMP3_HDR_IS_MONO(hdr) ? 1 : 2;
-    int sr_idx = DRMP3_HDR_GET_MY_SAMPLE_RATE(hdr); sr_idx -= (sr_idx != 0);
-
-    if (DRMP3_HDR_TEST_MPEG1(hdr))
-    {
-        gr_count *= 2;
-        main_data_begin = drmp3_bs_get_bits(bs, 9);
-        scfsi = drmp3_bs_get_bits(bs, 7 + gr_count);
-    } else
-    {
-        main_data_begin = drmp3_bs_get_bits(bs, 8 + gr_count) >> gr_count;
-    }
-
-    do
-    {
-        if (DRMP3_HDR_IS_MONO(hdr))
-        {
-            scfsi <<= 4;
-        }
-        gr->part_23_length = (drmp3_uint16)drmp3_bs_get_bits(bs, 12);
-        part_23_sum += gr->part_23_length;
-        gr->big_values = (drmp3_uint16)drmp3_bs_get_bits(bs,  9);
-        if (gr->big_values > 288)
-        {
-            return -1;
-        }
-        gr->global_gain = (drmp3_uint8)drmp3_bs_get_bits(bs, 8);
-        gr->scalefac_compress = (drmp3_uint16)drmp3_bs_get_bits(bs, DRMP3_HDR_TEST_MPEG1(hdr) ? 4 : 9);
-        gr->sfbtab = g_scf_long[sr_idx];
-        gr->n_long_sfb  = 22;
-        gr->n_short_sfb = 0;
-        if (drmp3_bs_get_bits(bs, 1))
-        {
-            gr->block_type = (drmp3_uint8)drmp3_bs_get_bits(bs, 2);
-            if (!gr->block_type)
-            {
-                return -1;
-            }
-            gr->mixed_block_flag = (drmp3_uint8)drmp3_bs_get_bits(bs, 1);
-            gr->region_count[0] = 7;
-            gr->region_count[1] = 255;
-            if (gr->block_type == DRMP3_SHORT_BLOCK_TYPE)
-            {
-                scfsi &= 0x0F0F;
-                if (!gr->mixed_block_flag)
-                {
-                    gr->region_count[0] = 8;
-                    gr->sfbtab = g_scf_short[sr_idx];
-                    gr->n_long_sfb = 0;
-                    gr->n_short_sfb = 39;
-                } else
-                {
-                    gr->sfbtab = g_scf_mixed[sr_idx];
-                    gr->n_long_sfb = DRMP3_HDR_TEST_MPEG1(hdr) ? 8 : 6;
-                    gr->n_short_sfb = 30;
-                }
-            }
-            tables = drmp3_bs_get_bits(bs, 10);
-            tables <<= 5;
-            gr->subblock_gain[0] = (drmp3_uint8)drmp3_bs_get_bits(bs, 3);
-            gr->subblock_gain[1] = (drmp3_uint8)drmp3_bs_get_bits(bs, 3);
-            gr->subblock_gain[2] = (drmp3_uint8)drmp3_bs_get_bits(bs, 3);
-        } else
-        {
-            gr->block_type = 0;
-            gr->mixed_block_flag = 0;
-            tables = drmp3_bs_get_bits(bs, 15);
-            gr->region_count[0] = (drmp3_uint8)drmp3_bs_get_bits(bs, 4);
-            gr->region_count[1] = (drmp3_uint8)drmp3_bs_get_bits(bs, 3);
-            gr->region_count[2] = 255;
-        }
-        gr->table_select[0] = (drmp3_uint8)(tables >> 10);
-        gr->table_select[1] = (drmp3_uint8)((tables >> 5) & 31);
-        gr->table_select[2] = (drmp3_uint8)((tables) & 31);
-        gr->preflag = (drmp3_uint8)(DRMP3_HDR_TEST_MPEG1(hdr) ? drmp3_bs_get_bits(bs, 1) : (gr->scalefac_compress >= 500));
-        gr->scalefac_scale = (drmp3_uint8)drmp3_bs_get_bits(bs, 1);
-        gr->count1_table = (drmp3_uint8)drmp3_bs_get_bits(bs, 1);
-        gr->scfsi = (drmp3_uint8)((scfsi >> 12) & 15);
-        scfsi <<= 4;
-        gr++;
-    } while(--gr_count);
-
-    if (part_23_sum + bs->pos > bs->limit + main_data_begin*8)
-    {
-        return -1;
-    }
-
-    return main_data_begin;
-}
-
-static void drmp3_L3_read_scalefactors(drmp3_uint8 *scf, drmp3_uint8 *ist_pos, const drmp3_uint8 *scf_size, const drmp3_uint8 *scf_count, drmp3_bs *bitbuf, int scfsi)
-{
-    int i, k;
-    for (i = 0; i < 4 && scf_count[i]; i++, scfsi *= 2)
-    {
-        int cnt = scf_count[i];
-        if (scfsi & 8)
-        {
-            DRMP3_COPY_MEMORY(scf, ist_pos, cnt);
-        } else
-        {
-            int bits = scf_size[i];
-            if (!bits)
-            {
-                DRMP3_ZERO_MEMORY(scf, cnt);
-                DRMP3_ZERO_MEMORY(ist_pos, cnt);
-            } else
-            {
-                int max_scf = (scfsi < 0) ? (1 << bits) - 1 : -1;
-                for (k = 0; k < cnt; k++)
-                {
-                    int s = drmp3_bs_get_bits(bitbuf, bits);
-                    ist_pos[k] = (drmp3_uint8)(s == max_scf ? -1 : s);
-                    scf[k] = (drmp3_uint8)s;
-                }
-            }
-        }
-        ist_pos += cnt;
-        scf += cnt;
-    }
-    scf[0] = scf[1] = scf[2] = 0;
-}
-
-static float drmp3_L3_ldexp_q2(float y, int exp_q2)
-{
-    static const float g_expfrac[4] = { 9.31322575e-10f,7.83145814e-10f,6.58544508e-10f,5.53767716e-10f };
-    int e;
-    do
-    {
-        e = DRMP3_MIN(30*4, exp_q2);
-        y *= g_expfrac[e & 3]*(1 << 30 >> (e >> 2));
-    } while ((exp_q2 -= e) > 0);
-    return y;
-}
-
-static void drmp3_L3_decode_scalefactors(const drmp3_uint8 *hdr, drmp3_uint8 *ist_pos, drmp3_bs *bs, const drmp3_L3_gr_info *gr, float *scf, int ch)
-{
-    static const drmp3_uint8 g_scf_partitions[3][28] = {
-        { 6,5,5, 5,6,5,5,5,6,5, 7,3,11,10,0,0, 7, 7, 7,0, 6, 6,6,3, 8, 8,5,0 },
-        { 8,9,6,12,6,9,9,9,6,9,12,6,15,18,0,0, 6,15,12,0, 6,12,9,6, 6,18,9,0 },
-        { 9,9,6,12,9,9,9,9,9,9,12,6,18,18,0,0,12,12,12,0,12, 9,9,6,15,12,9,0 }
-    };
-    const drmp3_uint8 *scf_partition = g_scf_partitions[!!gr->n_short_sfb + !gr->n_long_sfb];
-    drmp3_uint8 scf_size[4], iscf[40];
-    int i, scf_shift = gr->scalefac_scale + 1, gain_exp, scfsi = gr->scfsi;
-    float gain;
-
-    if (DRMP3_HDR_TEST_MPEG1(hdr))
-    {
-        static const drmp3_uint8 g_scfc_decode[16] = { 0,1,2,3, 12,5,6,7, 9,10,11,13, 14,15,18,19 };
-        int part = g_scfc_decode[gr->scalefac_compress];
-        scf_size[1] = scf_size[0] = (drmp3_uint8)(part >> 2);
-        scf_size[3] = scf_size[2] = (drmp3_uint8)(part & 3);
-    } else
-    {
-        static const drmp3_uint8 g_mod[6*4] = { 5,5,4,4,5,5,4,1,4,3,1,1,5,6,6,1,4,4,4,1,4,3,1,1 };
-        int k, modprod, sfc, ist = DRMP3_HDR_TEST_I_STEREO(hdr) && ch;
-        sfc = gr->scalefac_compress >> ist;
-        for (k = ist*3*4; sfc >= 0; sfc -= modprod, k += 4)
-        {
-            for (modprod = 1, i = 3; i >= 0; i--)
-            {
-                scf_size[i] = (drmp3_uint8)(sfc / modprod % g_mod[k + i]);
-                modprod *= g_mod[k + i];
-            }
-        }
-        scf_partition += k;
-        scfsi = -16;
-    }
-    drmp3_L3_read_scalefactors(iscf, ist_pos, scf_size, scf_partition, bs, scfsi);
-
-    if (gr->n_short_sfb)
-    {
-        int sh = 3 - scf_shift;
-        for (i = 0; i < gr->n_short_sfb; i += 3)
-        {
-            iscf[gr->n_long_sfb + i + 0] = (drmp3_uint8)(iscf[gr->n_long_sfb + i + 0] + (gr->subblock_gain[0] << sh));
-            iscf[gr->n_long_sfb + i + 1] = (drmp3_uint8)(iscf[gr->n_long_sfb + i + 1] + (gr->subblock_gain[1] << sh));
-            iscf[gr->n_long_sfb + i + 2] = (drmp3_uint8)(iscf[gr->n_long_sfb + i + 2] + (gr->subblock_gain[2] << sh));
-        }
-    } else if (gr->preflag)
-    {
-        static const drmp3_uint8 g_preamp[10] = { 1,1,1,1,2,2,3,3,3,2 };
-        for (i = 0; i < 10; i++)
-        {
-            iscf[11 + i] = (drmp3_uint8)(iscf[11 + i] + g_preamp[i]);
-        }
-    }
-
-    gain_exp = gr->global_gain + DRMP3_BITS_DEQUANTIZER_OUT*4 - 210 - (DRMP3_HDR_IS_MS_STEREO(hdr) ? 2 : 0);
-    gain = drmp3_L3_ldexp_q2(1 << (DRMP3_MAX_SCFI/4),  DRMP3_MAX_SCFI - gain_exp);
-    for (i = 0; i < (int)(gr->n_long_sfb + gr->n_short_sfb); i++)
-    {
-        scf[i] = drmp3_L3_ldexp_q2(gain, iscf[i] << scf_shift);
-    }
-}
-
-static const float g_drmp3_pow43[129 + 16] = {
-    0,-1,-2.519842f,-4.326749f,-6.349604f,-8.549880f,-10.902724f,-13.390518f,-16.000000f,-18.720754f,-21.544347f,-24.463781f,-27.473142f,-30.567351f,-33.741992f,-36.993181f,
-    0,1,2.519842f,4.326749f,6.349604f,8.549880f,10.902724f,13.390518f,16.000000f,18.720754f,21.544347f,24.463781f,27.473142f,30.567351f,33.741992f,36.993181f,40.317474f,43.711787f,47.173345f,50.699631f,54.288352f,57.937408f,61.644865f,65.408941f,69.227979f,73.100443f,77.024898f,81.000000f,85.024491f,89.097188f,93.216975f,97.382800f,101.593667f,105.848633f,110.146801f,114.487321f,118.869381f,123.292209f,127.755065f,132.257246f,136.798076f,141.376907f,145.993119f,150.646117f,155.335327f,160.060199f,164.820202f,169.614826f,174.443577f,179.305980f,184.201575f,189.129918f,194.090580f,199.083145f,204.107210f,209.162385f,214.248292f,219.364564f,224.510845f,229.686789f,234.892058f,240.126328f,245.389280f,250.680604f,256.000000f,261.347174f,266.721841f,272.123723f,277.552547f,283.008049f,288.489971f,293.998060f,299.532071f,305.091761f,310.676898f,316.287249f,321.922592f,327.582707f,333.267377f,338.976394f,344.709550f,350.466646f,356.247482f,362.051866f,367.879608f,373.730522f,379.604427f,385.501143f,391.420496f,397.362314f,403.326427f,409.312672f,415.320884f,421.350905f,427.402579f,433.475750f,439.570269f,445.685987f,451.822757f,457.980436f,464.158883f,470.357960f,476.577530f,482.817459f,489.077615f,495.357868f,501.658090f,507.978156f,514.317941f,520.677324f,527.056184f,533.454404f,539.871867f,546.308458f,552.764065f,559.238575f,565.731879f,572.243870f,578.774440f,585.323483f,591.890898f,598.476581f,605.080431f,611.702349f,618.342238f,625.000000f,631.675540f,638.368763f,645.079578f
-};
-
-static float drmp3_L3_pow_43(int x)
-{
-    float frac;
-    int sign, mult = 256;
-
-    if (x < 129)
-    {
-        return g_drmp3_pow43[16 + x];
-    }
-
-    if (x < 1024)
-    {
-        mult = 16;
-        x <<= 3;
-    }
-
-    sign = 2*x & 64;
-    frac = (float)((x & 63) - sign) / ((x & ~63) + sign);
-    return g_drmp3_pow43[16 + ((x + sign) >> 6)]*(1.f + frac*((4.f/3) + frac*(2.f/9)))*mult;
-}
-
-static void drmp3_L3_huffman(float *dst, drmp3_bs *bs, const drmp3_L3_gr_info *gr_info, const float *scf, int layer3gr_limit)
-{
-    static const drmp3_int16 tabs[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
-        785,785,785,785,784,784,784,784,513,513,513,513,513,513,513,513,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,
-        -255,1313,1298,1282,785,785,785,785,784,784,784,784,769,769,769,769,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,290,288,
-        -255,1313,1298,1282,769,769,769,769,529,529,529,529,529,529,529,529,528,528,528,528,528,528,528,528,512,512,512,512,512,512,512,512,290,288,
-        -253,-318,-351,-367,785,785,785,785,784,784,784,784,769,769,769,769,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,819,818,547,547,275,275,275,275,561,560,515,546,289,274,288,258,
-        -254,-287,1329,1299,1314,1312,1057,1057,1042,1042,1026,1026,784,784,784,784,529,529,529,529,529,529,529,529,769,769,769,769,768,768,768,768,563,560,306,306,291,259,
-        -252,-413,-477,-542,1298,-575,1041,1041,784,784,784,784,769,769,769,769,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,-383,-399,1107,1092,1106,1061,849,849,789,789,1104,1091,773,773,1076,1075,341,340,325,309,834,804,577,577,532,532,516,516,832,818,803,816,561,561,531,531,515,546,289,289,288,258,
-        -252,-429,-493,-559,1057,1057,1042,1042,529,529,529,529,529,529,529,529,784,784,784,784,769,769,769,769,512,512,512,512,512,512,512,512,-382,1077,-415,1106,1061,1104,849,849,789,789,1091,1076,1029,1075,834,834,597,581,340,340,339,324,804,833,532,532,832,772,818,803,817,787,816,771,290,290,290,290,288,258,
-        -253,-349,-414,-447,-463,1329,1299,-479,1314,1312,1057,1057,1042,1042,1026,1026,785,785,785,785,784,784,784,784,769,769,769,769,768,768,768,768,-319,851,821,-335,836,850,805,849,341,340,325,336,533,533,579,579,564,564,773,832,578,548,563,516,321,276,306,291,304,259,
-        -251,-572,-733,-830,-863,-879,1041,1041,784,784,784,784,769,769,769,769,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,-511,-527,-543,1396,1351,1381,1366,1395,1335,1380,-559,1334,1138,1138,1063,1063,1350,1392,1031,1031,1062,1062,1364,1363,1120,1120,1333,1348,881,881,881,881,375,374,359,373,343,358,341,325,791,791,1123,1122,-703,1105,1045,-719,865,865,790,790,774,774,1104,1029,338,293,323,308,-799,-815,833,788,772,818,803,816,322,292,307,320,561,531,515,546,289,274,288,258,
-        -251,-525,-605,-685,-765,-831,-846,1298,1057,1057,1312,1282,785,785,785,785,784,784,784,784,769,769,769,769,512,512,512,512,512,512,512,512,1399,1398,1383,1367,1382,1396,1351,-511,1381,1366,1139,1139,1079,1079,1124,1124,1364,1349,1363,1333,882,882,882,882,807,807,807,807,1094,1094,1136,1136,373,341,535,535,881,775,867,822,774,-591,324,338,-671,849,550,550,866,864,609,609,293,336,534,534,789,835,773,-751,834,804,308,307,833,788,832,772,562,562,547,547,305,275,560,515,290,290,
-        -252,-397,-477,-557,-622,-653,-719,-735,-750,1329,1299,1314,1057,1057,1042,1042,1312,1282,1024,1024,785,785,785,785,784,784,784,784,769,769,769,769,-383,1127,1141,1111,1126,1140,1095,1110,869,869,883,883,1079,1109,882,882,375,374,807,868,838,881,791,-463,867,822,368,263,852,837,836,-543,610,610,550,550,352,336,534,534,865,774,851,821,850,805,593,533,579,564,773,832,578,578,548,548,577,577,307,276,306,291,516,560,259,259,
-        -250,-2107,-2507,-2764,-2909,-2974,-3007,-3023,1041,1041,1040,1040,769,769,769,769,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,-767,-1052,-1213,-1277,-1358,-1405,-1469,-1535,-1550,-1582,-1614,-1647,-1662,-1694,-1726,-1759,-1774,-1807,-1822,-1854,-1886,1565,-1919,-1935,-1951,-1967,1731,1730,1580,1717,-1983,1729,1564,-1999,1548,-2015,-2031,1715,1595,-2047,1714,-2063,1610,-2079,1609,-2095,1323,1323,1457,1457,1307,1307,1712,1547,1641,1700,1699,1594,1685,1625,1442,1442,1322,1322,-780,-973,-910,1279,1278,1277,1262,1276,1261,1275,1215,1260,1229,-959,974,974,989,989,-943,735,478,478,495,463,506,414,-1039,1003,958,1017,927,942,987,957,431,476,1272,1167,1228,-1183,1256,-1199,895,895,941,941,1242,1227,1212,1135,1014,1014,490,489,503,487,910,1013,985,925,863,894,970,955,1012,847,-1343,831,755,755,984,909,428,366,754,559,-1391,752,486,457,924,997,698,698,983,893,740,740,908,877,739,739,667,667,953,938,497,287,271,271,683,606,590,712,726,574,302,302,738,736,481,286,526,725,605,711,636,724,696,651,589,681,666,710,364,467,573,695,466,466,301,465,379,379,709,604,665,679,316,316,634,633,436,436,464,269,424,394,452,332,438,363,347,408,393,448,331,422,362,407,392,421,346,406,391,376,375,359,1441,1306,-2367,1290,-2383,1337,-2399,-2415,1426,1321,-2431,1411,1336,-2447,-2463,-2479,1169,1169,1049,1049,1424,1289,1412,1352,1319,-2495,1154,1154,1064,1064,1153,1153,416,390,360,404,403,389,344,374,373,343,358,372,327,357,342,311,356,326,1395,1394,1137,1137,1047,1047,1365,1392,1287,1379,1334,1364,1349,1378,1318,1363,792,792,792,792,1152,1152,1032,1032,1121,1121,1046,1046,1120,1120,1030,1030,-2895,1106,1061,1104,849,849,789,789,1091,1076,1029,1090,1060,1075,833,833,309,324,532,532,832,772,818,803,561,561,531,560,515,546,289,274,288,258,
-        -250,-1179,-1579,-1836,-1996,-2124,-2253,-2333,-2413,-2477,-2542,-2574,-2607,-2622,-2655,1314,1313,1298,1312,1282,785,785,785,785,1040,1040,1025,1025,768,768,768,768,-766,-798,-830,-862,-895,-911,-927,-943,-959,-975,-991,-1007,-1023,-1039,-1055,-1070,1724,1647,-1103,-1119,1631,1767,1662,1738,1708,1723,-1135,1780,1615,1779,1599,1677,1646,1778,1583,-1151,1777,1567,1737,1692,1765,1722,1707,1630,1751,1661,1764,1614,1736,1676,1763,1750,1645,1598,1721,1691,1762,1706,1582,1761,1566,-1167,1749,1629,767,766,751,765,494,494,735,764,719,749,734,763,447,447,748,718,477,506,431,491,446,476,461,505,415,430,475,445,504,399,460,489,414,503,383,474,429,459,502,502,746,752,488,398,501,473,413,472,486,271,480,270,-1439,-1455,1357,-1471,-1487,-1503,1341,1325,-1519,1489,1463,1403,1309,-1535,1372,1448,1418,1476,1356,1462,1387,-1551,1475,1340,1447,1402,1386,-1567,1068,1068,1474,1461,455,380,468,440,395,425,410,454,364,467,466,464,453,269,409,448,268,432,1371,1473,1432,1417,1308,1460,1355,1446,1459,1431,1083,1083,1401,1416,1458,1445,1067,1067,1370,1457,1051,1051,1291,1430,1385,1444,1354,1415,1400,1443,1082,1082,1173,1113,1186,1066,1185,1050,-1967,1158,1128,1172,1097,1171,1081,-1983,1157,1112,416,266,375,400,1170,1142,1127,1065,793,793,1169,1033,1156,1096,1141,1111,1155,1080,1126,1140,898,898,808,808,897,897,792,792,1095,1152,1032,1125,1110,1139,1079,1124,882,807,838,881,853,791,-2319,867,368,263,822,852,837,866,806,865,-2399,851,352,262,534,534,821,836,594,594,549,549,593,593,533,533,848,773,579,579,564,578,548,563,276,276,577,576,306,291,516,560,305,305,275,259,
-        -251,-892,-2058,-2620,-2828,-2957,-3023,-3039,1041,1041,1040,1040,769,769,769,769,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,256,-511,-527,-543,-559,1530,-575,-591,1528,1527,1407,1526,1391,1023,1023,1023,1023,1525,1375,1268,1268,1103,1103,1087,1087,1039,1039,1523,-604,815,815,815,815,510,495,509,479,508,463,507,447,431,505,415,399,-734,-782,1262,-815,1259,1244,-831,1258,1228,-847,-863,1196,-879,1253,987,987,748,-767,493,493,462,477,414,414,686,669,478,446,461,445,474,429,487,458,412,471,1266,1264,1009,1009,799,799,-1019,-1276,-1452,-1581,-1677,-1757,-1821,-1886,-1933,-1997,1257,1257,1483,1468,1512,1422,1497,1406,1467,1496,1421,1510,1134,1134,1225,1225,1466,1451,1374,1405,1252,1252,1358,1480,1164,1164,1251,1251,1238,1238,1389,1465,-1407,1054,1101,-1423,1207,-1439,830,830,1248,1038,1237,1117,1223,1148,1236,1208,411,426,395,410,379,269,1193,1222,1132,1235,1221,1116,976,976,1192,1162,1177,1220,1131,1191,963,963,-1647,961,780,-1663,558,558,994,993,437,408,393,407,829,978,813,797,947,-1743,721,721,377,392,844,950,828,890,706,706,812,859,796,960,948,843,934,874,571,571,-1919,690,555,689,421,346,539,539,944,779,918,873,932,842,903,888,570,570,931,917,674,674,-2575,1562,-2591,1609,-2607,1654,1322,1322,1441,1441,1696,1546,1683,1593,1669,1624,1426,1426,1321,1321,1639,1680,1425,1425,1305,1305,1545,1668,1608,1623,1667,1592,1638,1666,1320,1320,1652,1607,1409,1409,1304,1304,1288,1288,1664,1637,1395,1395,1335,1335,1622,1636,1394,1394,1319,1319,1606,1621,1392,1392,1137,1137,1137,1137,345,390,360,375,404,373,1047,-2751,-2767,-2783,1062,1121,1046,-2799,1077,-2815,1106,1061,789,789,1105,1104,263,355,310,340,325,354,352,262,339,324,1091,1076,1029,1090,1060,1075,833,833,788,788,1088,1028,818,818,803,803,561,561,531,531,816,771,546,546,289,274,288,258,
-        -253,-317,-381,-446,-478,-509,1279,1279,-811,-1179,-1451,-1756,-1900,-2028,-2189,-2253,-2333,-2414,-2445,-2511,-2526,1313,1298,-2559,1041,1041,1040,1040,1025,1025,1024,1024,1022,1007,1021,991,1020,975,1019,959,687,687,1018,1017,671,671,655,655,1016,1015,639,639,758,758,623,623,757,607,756,591,755,575,754,559,543,543,1009,783,-575,-621,-685,-749,496,-590,750,749,734,748,974,989,1003,958,988,973,1002,942,987,957,972,1001,926,986,941,971,956,1000,910,985,925,999,894,970,-1071,-1087,-1102,1390,-1135,1436,1509,1451,1374,-1151,1405,1358,1480,1420,-1167,1507,1494,1389,1342,1465,1435,1450,1326,1505,1310,1493,1373,1479,1404,1492,1464,1419,428,443,472,397,736,526,464,464,486,457,442,471,484,482,1357,1449,1434,1478,1388,1491,1341,1490,1325,1489,1463,1403,1309,1477,1372,1448,1418,1433,1476,1356,1462,1387,-1439,1475,1340,1447,1402,1474,1324,1461,1371,1473,269,448,1432,1417,1308,1460,-1711,1459,-1727,1441,1099,1099,1446,1386,1431,1401,-1743,1289,1083,1083,1160,1160,1458,1445,1067,1067,1370,1457,1307,1430,1129,1129,1098,1098,268,432,267,416,266,400,-1887,1144,1187,1082,1173,1113,1186,1066,1050,1158,1128,1143,1172,1097,1171,1081,420,391,1157,1112,1170,1142,1127,1065,1169,1049,1156,1096,1141,1111,1155,1080,1126,1154,1064,1153,1140,1095,1048,-2159,1125,1110,1137,-2175,823,823,1139,1138,807,807,384,264,368,263,868,838,853,791,867,822,852,837,866,806,865,790,-2319,851,821,836,352,262,850,805,849,-2399,533,533,835,820,336,261,578,548,563,577,532,532,832,772,562,562,547,547,305,275,560,515,290,290,288,258 };
-    static const drmp3_uint8 tab32[] = { 130,162,193,209,44,28,76,140,9,9,9,9,9,9,9,9,190,254,222,238,126,94,157,157,109,61,173,205};
-    static const drmp3_uint8 tab33[] = { 252,236,220,204,188,172,156,140,124,108,92,76,60,44,28,12 };
-    static const drmp3_int16 tabindex[2*16] = { 0,32,64,98,0,132,180,218,292,364,426,538,648,746,0,1126,1460,1460,1460,1460,1460,1460,1460,1460,1842,1842,1842,1842,1842,1842,1842,1842 };
-    static const drmp3_uint8 g_linbits[] =  { 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,3,4,6,8,10,13,4,5,6,7,8,9,11,13 };
-
-#define DRMP3_PEEK_BITS(n)    (bs_cache >> (32 - (n)))
-#define DRMP3_FLUSH_BITS(n)   { bs_cache <<= (n); bs_sh += (n); }
-#define DRMP3_CHECK_BITS      while (bs_sh >= 0) { bs_cache |= (drmp3_uint32)*bs_next_ptr++ << bs_sh; bs_sh -= 8; }
-#define DRMP3_BSPOS           ((bs_next_ptr - bs->buf)*8 - 24 + bs_sh)
-
-    float one = 0.0f;
-    int ireg = 0, big_val_cnt = gr_info->big_values;
-    const drmp3_uint8 *sfb = gr_info->sfbtab;
-    const drmp3_uint8 *bs_next_ptr = bs->buf + bs->pos/8;
-    drmp3_uint32 bs_cache = (((bs_next_ptr[0]*256u + bs_next_ptr[1])*256u + bs_next_ptr[2])*256u + bs_next_ptr[3]) << (bs->pos & 7);
-    int pairs_to_decode, np, bs_sh = (bs->pos & 7) - 8;
-    bs_next_ptr += 4;
-
-    while (big_val_cnt > 0)
-    {
-        int tab_num = gr_info->table_select[ireg];
-        int sfb_cnt = gr_info->region_count[ireg++];
-        const drmp3_int16 *codebook = tabs + tabindex[tab_num];
-        int linbits = g_linbits[tab_num];
-        if (linbits)
-        {
-            do
-            {
-                np = *sfb++ / 2;
-                pairs_to_decode = DRMP3_MIN(big_val_cnt, np);
-                one = *scf++;
-                do
-                {
-                    int j, w = 5;
-                    int leaf = codebook[DRMP3_PEEK_BITS(w)];
-                    while (leaf < 0)
-                    {
-                        DRMP3_FLUSH_BITS(w);
-                        w = leaf & 7;
-                        leaf = codebook[DRMP3_PEEK_BITS(w) - (leaf >> 3)];
-                    }
-                    DRMP3_FLUSH_BITS(leaf >> 8);
-
-                    for (j = 0; j < 2; j++, dst++, leaf >>= 4)
-                    {
-                        int lsb = leaf & 0x0F;
-                        if (lsb == 15)
-                        {
-                            lsb += DRMP3_PEEK_BITS(linbits);
-                            DRMP3_FLUSH_BITS(linbits);
-                            DRMP3_CHECK_BITS;
-                            *dst = one*drmp3_L3_pow_43(lsb)*((drmp3_int32)bs_cache < 0 ? -1: 1);
-                        } else
-                        {
-                            *dst = g_drmp3_pow43[16 + lsb - 16*(bs_cache >> 31)]*one;
-                        }
-                        DRMP3_FLUSH_BITS(lsb ? 1 : 0);
-                    }
-                    DRMP3_CHECK_BITS;
-                } while (--pairs_to_decode);
-            } while ((big_val_cnt -= np) > 0 && --sfb_cnt >= 0);
-        } else
-        {
-            do
-            {
-                np = *sfb++ / 2;
-                pairs_to_decode = DRMP3_MIN(big_val_cnt, np);
-                one = *scf++;
-                do
-                {
-                    int j, w = 5;
-                    int leaf = codebook[DRMP3_PEEK_BITS(w)];
-                    while (leaf < 0)
-                    {
-                        DRMP3_FLUSH_BITS(w);
-                        w = leaf & 7;
-                        leaf = codebook[DRMP3_PEEK_BITS(w) - (leaf >> 3)];
-                    }
-                    DRMP3_FLUSH_BITS(leaf >> 8);
-
-                    for (j = 0; j < 2; j++, dst++, leaf >>= 4)
-                    {
-                        int lsb = leaf & 0x0F;
-                        *dst = g_drmp3_pow43[16 + lsb - 16*(bs_cache >> 31)]*one;
-                        DRMP3_FLUSH_BITS(lsb ? 1 : 0);
-                    }
-                    DRMP3_CHECK_BITS;
-                } while (--pairs_to_decode);
-            } while ((big_val_cnt -= np) > 0 && --sfb_cnt >= 0);
-        }
-    }
-
-    for (np = 1 - big_val_cnt;; dst += 4)
-    {
-        const drmp3_uint8 *codebook_count1 = (gr_info->count1_table) ? tab33 : tab32;
-        int leaf = codebook_count1[DRMP3_PEEK_BITS(4)];
-        if (!(leaf & 8))
-        {
-            leaf = codebook_count1[(leaf >> 3) + (bs_cache << 4 >> (32 - (leaf & 3)))];
-        }
-        DRMP3_FLUSH_BITS(leaf & 7);
-        if (DRMP3_BSPOS > layer3gr_limit)
-        {
-            break;
-        }
-#define DRMP3_RELOAD_SCALEFACTOR  if (!--np) { np = *sfb++/2; if (!np) break; one = *scf++; }
-#define DRMP3_DEQ_COUNT1(s) if (leaf & (128 >> s)) { dst[s] = ((drmp3_int32)bs_cache < 0) ? -one : one; DRMP3_FLUSH_BITS(1) }
-        DRMP3_RELOAD_SCALEFACTOR;
-        DRMP3_DEQ_COUNT1(0);
-        DRMP3_DEQ_COUNT1(1);
-        DRMP3_RELOAD_SCALEFACTOR;
-        DRMP3_DEQ_COUNT1(2);
-        DRMP3_DEQ_COUNT1(3);
-        DRMP3_CHECK_BITS;
-    }
-
-    bs->pos = layer3gr_limit;
-}
-
-static void drmp3_L3_midside_stereo(float *left, int n)
-{
-    int i = 0;
-    float *right = left + 576;
-#if DRMP3_HAVE_SIMD
-    if (drmp3_have_simd())
-    {
-        for (; i < n - 3; i += 4)
-        {
-            drmp3_f4 vl = DRMP3_VLD(left + i);
-            drmp3_f4 vr = DRMP3_VLD(right + i);
-            DRMP3_VSTORE(left + i, DRMP3_VADD(vl, vr));
-            DRMP3_VSTORE(right + i, DRMP3_VSUB(vl, vr));
-        }
-#ifdef __GNUC__
-        /* Workaround for spurious -Waggressive-loop-optimizations warning from gcc.
-         * For more info see: https://github.com/lieff/minimp3/issues/88
-         */
-        if (__builtin_constant_p(n % 4 == 0) && n % 4 == 0)
-            return;
-#endif
-    }
-#endif
-    for (; i < n; i++)
-    {
-        float a = left[i];
-        float b = right[i];
-        left[i] = a + b;
-        right[i] = a - b;
-    }
-}
-
-static void drmp3_L3_intensity_stereo_band(float *left, int n, float kl, float kr)
-{
-    int i;
-    for (i = 0; i < n; i++)
-    {
-        left[i + 576] = left[i]*kr;
-        left[i] = left[i]*kl;
-    }
-}
-
-static void drmp3_L3_stereo_top_band(const float *right, const drmp3_uint8 *sfb, int nbands, int max_band[3])
-{
-    int i, k;
-
-    max_band[0] = max_band[1] = max_band[2] = -1;
-
-    for (i = 0; i < nbands; i++)
-    {
-        for (k = 0; k < sfb[i]; k += 2)
-        {
-            if (right[k] != 0 || right[k + 1] != 0)
-            {
-                max_band[i % 3] = i;
-                break;
-            }
-        }
-        right += sfb[i];
-    }
-}
-
-static void drmp3_L3_stereo_process(float *left, const drmp3_uint8 *ist_pos, const drmp3_uint8 *sfb, const drmp3_uint8 *hdr, int max_band[3], int mpeg2_sh)
-{
-    static const float g_pan[7*2] = { 0,1,0.21132487f,0.78867513f,0.36602540f,0.63397460f,0.5f,0.5f,0.63397460f,0.36602540f,0.78867513f,0.21132487f,1,0 };
-    unsigned i, max_pos = DRMP3_HDR_TEST_MPEG1(hdr) ? 7 : 64;
-
-    for (i = 0; sfb[i]; i++)
-    {
-        unsigned ipos = ist_pos[i];
-        if ((int)i > max_band[i % 3] && ipos < max_pos)
-        {
-            float kl, kr, s = DRMP3_HDR_TEST_MS_STEREO(hdr) ? 1.41421356f : 1;
-            if (DRMP3_HDR_TEST_MPEG1(hdr))
-            {
-                kl = g_pan[2*ipos];
-                kr = g_pan[2*ipos + 1];
-            } else
-            {
-                kl = 1;
-                kr = drmp3_L3_ldexp_q2(1, (ipos + 1) >> 1 << mpeg2_sh);
-                if (ipos & 1)
-                {
-                    kl = kr;
-                    kr = 1;
-                }
-            }
-            drmp3_L3_intensity_stereo_band(left, sfb[i], kl*s, kr*s);
-        } else if (DRMP3_HDR_TEST_MS_STEREO(hdr))
-        {
-            drmp3_L3_midside_stereo(left, sfb[i]);
-        }
-        left += sfb[i];
-    }
-}
-
-static void drmp3_L3_intensity_stereo(float *left, drmp3_uint8 *ist_pos, const drmp3_L3_gr_info *gr, const drmp3_uint8 *hdr)
-{
-    int max_band[3], n_sfb = gr->n_long_sfb + gr->n_short_sfb;
-    int i, max_blocks = gr->n_short_sfb ? 3 : 1;
-
-    drmp3_L3_stereo_top_band(left + 576, gr->sfbtab, n_sfb, max_band);
-    if (gr->n_long_sfb)
-    {
-        max_band[0] = max_band[1] = max_band[2] = DRMP3_MAX(DRMP3_MAX(max_band[0], max_band[1]), max_band[2]);
-    }
-    for (i = 0; i < max_blocks; i++)
-    {
-        int default_pos = DRMP3_HDR_TEST_MPEG1(hdr) ? 3 : 0;
-        int itop = n_sfb - max_blocks + i;
-        int prev = itop - max_blocks;
-        ist_pos[itop] = (drmp3_uint8)(max_band[i] >= prev ? default_pos : ist_pos[prev]);
-    }
-    drmp3_L3_stereo_process(left, ist_pos, gr->sfbtab, hdr, max_band, gr[1].scalefac_compress & 1);
-}
-
-static void drmp3_L3_reorder(float *grbuf, float *scratch, const drmp3_uint8 *sfb)
-{
-    int i, len;
-    float *src = grbuf, *dst = scratch;
-
-    for (;0 != (len = *sfb); sfb += 3, src += 2*len)
-    {
-        for (i = 0; i < len; i++, src++)
-        {
-            *dst++ = src[0*len];
-            *dst++ = src[1*len];
-            *dst++ = src[2*len];
-        }
-    }
-    DRMP3_COPY_MEMORY(grbuf, scratch, (dst - scratch)*sizeof(float));
-}
-
-static void drmp3_L3_antialias(float *grbuf, int nbands)
-{
-    static const float g_aa[2][8] = {
-        {0.85749293f,0.88174200f,0.94962865f,0.98331459f,0.99551782f,0.99916056f,0.99989920f,0.99999316f},
-        {0.51449576f,0.47173197f,0.31337745f,0.18191320f,0.09457419f,0.04096558f,0.01419856f,0.00369997f}
-    };
-
-    for (; nbands > 0; nbands--, grbuf += 18)
-    {
-        int i = 0;
-#if DRMP3_HAVE_SIMD
-        if (drmp3_have_simd()) for (; i < 8; i += 4)
-        {
-            drmp3_f4 vu = DRMP3_VLD(grbuf + 18 + i);
-            drmp3_f4 vd = DRMP3_VLD(grbuf + 14 - i);
-            drmp3_f4 vc0 = DRMP3_VLD(g_aa[0] + i);
-            drmp3_f4 vc1 = DRMP3_VLD(g_aa[1] + i);
-            vd = DRMP3_VREV(vd);
-            DRMP3_VSTORE(grbuf + 18 + i, DRMP3_VSUB(DRMP3_VMUL(vu, vc0), DRMP3_VMUL(vd, vc1)));
-            vd = DRMP3_VADD(DRMP3_VMUL(vu, vc1), DRMP3_VMUL(vd, vc0));
-            DRMP3_VSTORE(grbuf + 14 - i, DRMP3_VREV(vd));
-        }
-#endif
-#ifndef DR_MP3_ONLY_SIMD
-        for(; i < 8; i++)
-        {
-            float u = grbuf[18 + i];
-            float d = grbuf[17 - i];
-            grbuf[18 + i] = u*g_aa[0][i] - d*g_aa[1][i];
-            grbuf[17 - i] = u*g_aa[1][i] + d*g_aa[0][i];
-        }
-#endif
-    }
-}
-
-static void drmp3_L3_dct3_9(float *y)
-{
-    float s0, s1, s2, s3, s4, s5, s6, s7, s8, t0, t2, t4;
-
-    s0 = y[0]; s2 = y[2]; s4 = y[4]; s6 = y[6]; s8 = y[8];
-    t0 = s0 + s6*0.5f;
-    s0 -= s6;
-    t4 = (s4 + s2)*0.93969262f;
-    t2 = (s8 + s2)*0.76604444f;
-    s6 = (s4 - s8)*0.17364818f;
-    s4 += s8 - s2;
-
-    s2 = s0 - s4*0.5f;
-    y[4] = s4 + s0;
-    s8 = t0 - t2 + s6;
-    s0 = t0 - t4 + t2;
-    s4 = t0 + t4 - s6;
-
-    s1 = y[1]; s3 = y[3]; s5 = y[5]; s7 = y[7];
-
-    s3 *= 0.86602540f;
-    t0 = (s5 + s1)*0.98480775f;
-    t4 = (s5 - s7)*0.34202014f;
-    t2 = (s1 + s7)*0.64278761f;
-    s1 = (s1 - s5 - s7)*0.86602540f;
-
-    s5 = t0 - s3 - t2;
-    s7 = t4 - s3 - t0;
-    s3 = t4 + s3 - t2;
-
-    y[0] = s4 - s7;
-    y[1] = s2 + s1;
-    y[2] = s0 - s3;
-    y[3] = s8 + s5;
-    y[5] = s8 - s5;
-    y[6] = s0 + s3;
-    y[7] = s2 - s1;
-    y[8] = s4 + s7;
-}
-
-static void drmp3_L3_imdct36(float *grbuf, float *overlap, const float *window, int nbands)
-{
-    int i, j;
-    static const float g_twid9[18] = {
-        0.73727734f,0.79335334f,0.84339145f,0.88701083f,0.92387953f,0.95371695f,0.97629601f,0.99144486f,0.99904822f,0.67559021f,0.60876143f,0.53729961f,0.46174861f,0.38268343f,0.30070580f,0.21643961f,0.13052619f,0.04361938f
-    };
-
-    for (j = 0; j < nbands; j++, grbuf += 18, overlap += 9)
-    {
-        float co[9], si[9];
-        co[0] = -grbuf[0];
-        si[0] = grbuf[17];
-        for (i = 0; i < 4; i++)
-        {
-            si[8 - 2*i] =   grbuf[4*i + 1] - grbuf[4*i + 2];
-            co[1 + 2*i] =   grbuf[4*i + 1] + grbuf[4*i + 2];
-            si[7 - 2*i] =   grbuf[4*i + 4] - grbuf[4*i + 3];
-            co[2 + 2*i] = -(grbuf[4*i + 3] + grbuf[4*i + 4]);
-        }
-        drmp3_L3_dct3_9(co);
-        drmp3_L3_dct3_9(si);
-
-        si[1] = -si[1];
-        si[3] = -si[3];
-        si[5] = -si[5];
-        si[7] = -si[7];
-
-        i = 0;
-
-#if DRMP3_HAVE_SIMD
-        if (drmp3_have_simd()) for (; i < 8; i += 4)
-        {
-            drmp3_f4 vovl = DRMP3_VLD(overlap + i);
-            drmp3_f4 vc = DRMP3_VLD(co + i);
-            drmp3_f4 vs = DRMP3_VLD(si + i);
-            drmp3_f4 vr0 = DRMP3_VLD(g_twid9 + i);
-            drmp3_f4 vr1 = DRMP3_VLD(g_twid9 + 9 + i);
-            drmp3_f4 vw0 = DRMP3_VLD(window + i);
-            drmp3_f4 vw1 = DRMP3_VLD(window + 9 + i);
-            drmp3_f4 vsum = DRMP3_VADD(DRMP3_VMUL(vc, vr1), DRMP3_VMUL(vs, vr0));
-            DRMP3_VSTORE(overlap + i, DRMP3_VSUB(DRMP3_VMUL(vc, vr0), DRMP3_VMUL(vs, vr1)));
-            DRMP3_VSTORE(grbuf + i, DRMP3_VSUB(DRMP3_VMUL(vovl, vw0), DRMP3_VMUL(vsum, vw1)));
-            vsum = DRMP3_VADD(DRMP3_VMUL(vovl, vw1), DRMP3_VMUL(vsum, vw0));
-            DRMP3_VSTORE(grbuf + 14 - i, DRMP3_VREV(vsum));
-        }
-#endif
-        for (; i < 9; i++)
-        {
-            float ovl  = overlap[i];
-            float sum  = co[i]*g_twid9[9 + i] + si[i]*g_twid9[0 + i];
-            overlap[i] = co[i]*g_twid9[0 + i] - si[i]*g_twid9[9 + i];
-            grbuf[i]      = ovl*window[0 + i] - sum*window[9 + i];
-            grbuf[17 - i] = ovl*window[9 + i] + sum*window[0 + i];
-        }
-    }
-}
-
-static void drmp3_L3_idct3(float x0, float x1, float x2, float *dst)
-{
-    float m1 = x1*0.86602540f;
-    float a1 = x0 - x2*0.5f;
-    dst[1] = x0 + x2;
-    dst[0] = a1 + m1;
-    dst[2] = a1 - m1;
-}
-
-static void drmp3_L3_imdct12(float *x, float *dst, float *overlap)
-{
-    static const float g_twid3[6] = { 0.79335334f,0.92387953f,0.99144486f, 0.60876143f,0.38268343f,0.13052619f };
-    float co[3], si[3];
-    int i;
-
-    drmp3_L3_idct3(-x[0], x[6] + x[3], x[12] + x[9], co);
-    drmp3_L3_idct3(x[15], x[12] - x[9], x[6] - x[3], si);
-    si[1] = -si[1];
-
-    for (i = 0; i < 3; i++)
-    {
-        float ovl  = overlap[i];
-        float sum  = co[i]*g_twid3[3 + i] + si[i]*g_twid3[0 + i];
-        overlap[i] = co[i]*g_twid3[0 + i] - si[i]*g_twid3[3 + i];
-        dst[i]     = ovl*g_twid3[2 - i] - sum*g_twid3[5 - i];
-        dst[5 - i] = ovl*g_twid3[5 - i] + sum*g_twid3[2 - i];
-    }
-}
-
-static void drmp3_L3_imdct_short(float *grbuf, float *overlap, int nbands)
-{
-    for (;nbands > 0; nbands--, overlap += 9, grbuf += 18)
-    {
-        float tmp[18];
-        DRMP3_COPY_MEMORY(tmp, grbuf, sizeof(tmp));
-        DRMP3_COPY_MEMORY(grbuf, overlap, 6*sizeof(float));
-        drmp3_L3_imdct12(tmp, grbuf + 6, overlap + 6);
-        drmp3_L3_imdct12(tmp + 1, grbuf + 12, overlap + 6);
-        drmp3_L3_imdct12(tmp + 2, overlap, overlap + 6);
-    }
-}
-
-static void drmp3_L3_change_sign(float *grbuf)
-{
-    int b, i;
-    for (b = 0, grbuf += 18; b < 32; b += 2, grbuf += 36)
-        for (i = 1; i < 18; i += 2)
-            grbuf[i] = -grbuf[i];
-}
-
-static void drmp3_L3_imdct_gr(float *grbuf, float *overlap, unsigned block_type, unsigned n_long_bands)
-{
-    static const float g_mdct_window[2][18] = {
-        { 0.99904822f,0.99144486f,0.97629601f,0.95371695f,0.92387953f,0.88701083f,0.84339145f,0.79335334f,0.73727734f,0.04361938f,0.13052619f,0.21643961f,0.30070580f,0.38268343f,0.46174861f,0.53729961f,0.60876143f,0.67559021f },
-        { 1,1,1,1,1,1,0.99144486f,0.92387953f,0.79335334f,0,0,0,0,0,0,0.13052619f,0.38268343f,0.60876143f }
-    };
-    if (n_long_bands)
-    {
-        drmp3_L3_imdct36(grbuf, overlap, g_mdct_window[0], n_long_bands);
-        grbuf += 18*n_long_bands;
-        overlap += 9*n_long_bands;
-    }
-    if (block_type == DRMP3_SHORT_BLOCK_TYPE)
-        drmp3_L3_imdct_short(grbuf, overlap, 32 - n_long_bands);
-    else
-        drmp3_L3_imdct36(grbuf, overlap, g_mdct_window[block_type == DRMP3_STOP_BLOCK_TYPE], 32 - n_long_bands);
-}
-
-static void drmp3_L3_save_reservoir(drmp3dec *h, drmp3dec_scratch *s)
-{
-    int pos = (s->bs.pos + 7)/8u;
-    int remains = s->bs.limit/8u - pos;
-    if (remains > DRMP3_MAX_BITRESERVOIR_BYTES)
-    {
-        pos += remains - DRMP3_MAX_BITRESERVOIR_BYTES;
-        remains = DRMP3_MAX_BITRESERVOIR_BYTES;
-    }
-    if (remains > 0)
-    {
-        DRMP3_MOVE_MEMORY(h->reserv_buf, s->maindata + pos, remains);
-    }
-    h->reserv = remains;
-}
-
-static int drmp3_L3_restore_reservoir(drmp3dec *h, drmp3_bs *bs, drmp3dec_scratch *s, int main_data_begin)
-{
-    int frame_bytes = (bs->limit - bs->pos)/8;
-    int bytes_have = DRMP3_MIN(h->reserv, main_data_begin);
-    DRMP3_COPY_MEMORY(s->maindata, h->reserv_buf + DRMP3_MAX(0, h->reserv - main_data_begin), DRMP3_MIN(h->reserv, main_data_begin));
-    DRMP3_COPY_MEMORY(s->maindata + bytes_have, bs->buf + bs->pos/8, frame_bytes);
-    drmp3_bs_init(&s->bs, s->maindata, bytes_have + frame_bytes);
-    return h->reserv >= main_data_begin;
-}
-
-static void drmp3_L3_decode(drmp3dec *h, drmp3dec_scratch *s, drmp3_L3_gr_info *gr_info, int nch)
-{
-    int ch;
-
-    for (ch = 0; ch < nch; ch++)
-    {
-        int layer3gr_limit = s->bs.pos + gr_info[ch].part_23_length;
-        drmp3_L3_decode_scalefactors(h->header, s->ist_pos[ch], &s->bs, gr_info + ch, s->scf, ch);
-        drmp3_L3_huffman(s->grbuf[ch], &s->bs, gr_info + ch, s->scf, layer3gr_limit);
-    }
-
-    if (DRMP3_HDR_TEST_I_STEREO(h->header))
-    {
-        drmp3_L3_intensity_stereo(s->grbuf[0], s->ist_pos[1], gr_info, h->header);
-    } else if (DRMP3_HDR_IS_MS_STEREO(h->header))
-    {
-        drmp3_L3_midside_stereo(s->grbuf[0], 576);
-    }
-
-    for (ch = 0; ch < nch; ch++, gr_info++)
-    {
-        int aa_bands = 31;
-        int n_long_bands = (gr_info->mixed_block_flag ? 2 : 0) << (int)(DRMP3_HDR_GET_MY_SAMPLE_RATE(h->header) == 2);
-
-        if (gr_info->n_short_sfb)
-        {
-            aa_bands = n_long_bands - 1;
-            drmp3_L3_reorder(s->grbuf[ch] + n_long_bands*18, s->syn[0], gr_info->sfbtab + gr_info->n_long_sfb);
-        }
-
-        drmp3_L3_antialias(s->grbuf[ch], aa_bands);
-        drmp3_L3_imdct_gr(s->grbuf[ch], h->mdct_overlap[ch], gr_info->block_type, n_long_bands);
-        drmp3_L3_change_sign(s->grbuf[ch]);
-    }
-}
-
-static void drmp3d_DCT_II(float *grbuf, int n)
-{
-    static const float g_sec[24] = {
-        10.19000816f,0.50060302f,0.50241929f,3.40760851f,0.50547093f,0.52249861f,2.05778098f,0.51544732f,0.56694406f,1.48416460f,0.53104258f,0.64682180f,1.16943991f,0.55310392f,0.78815460f,0.97256821f,0.58293498f,1.06067765f,0.83934963f,0.62250412f,1.72244716f,0.74453628f,0.67480832f,5.10114861f
-    };
-    int i, k = 0;
-#if DRMP3_HAVE_SIMD
-    if (drmp3_have_simd()) for (; k < n; k += 4)
-    {
-        drmp3_f4 t[4][8], *x;
-        float *y = grbuf + k;
-
-        for (x = t[0], i = 0; i < 8; i++, x++)
-        {
-            drmp3_f4 x0 = DRMP3_VLD(&y[i*18]);
-            drmp3_f4 x1 = DRMP3_VLD(&y[(15 - i)*18]);
-            drmp3_f4 x2 = DRMP3_VLD(&y[(16 + i)*18]);
-            drmp3_f4 x3 = DRMP3_VLD(&y[(31 - i)*18]);
-            drmp3_f4 t0 = DRMP3_VADD(x0, x3);
-            drmp3_f4 t1 = DRMP3_VADD(x1, x2);
-            drmp3_f4 t2 = DRMP3_VMUL_S(DRMP3_VSUB(x1, x2), g_sec[3*i + 0]);
-            drmp3_f4 t3 = DRMP3_VMUL_S(DRMP3_VSUB(x0, x3), g_sec[3*i + 1]);
-            x[0] = DRMP3_VADD(t0, t1);
-            x[8] = DRMP3_VMUL_S(DRMP3_VSUB(t0, t1), g_sec[3*i + 2]);
-            x[16] = DRMP3_VADD(t3, t2);
-            x[24] = DRMP3_VMUL_S(DRMP3_VSUB(t3, t2), g_sec[3*i + 2]);
-        }
-        for (x = t[0], i = 0; i < 4; i++, x += 8)
-        {
-            drmp3_f4 x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3], x4 = x[4], x5 = x[5], x6 = x[6], x7 = x[7], xt;
-            xt = DRMP3_VSUB(x0, x7); x0 = DRMP3_VADD(x0, x7);
-            x7 = DRMP3_VSUB(x1, x6); x1 = DRMP3_VADD(x1, x6);
-            x6 = DRMP3_VSUB(x2, x5); x2 = DRMP3_VADD(x2, x5);
-            x5 = DRMP3_VSUB(x3, x4); x3 = DRMP3_VADD(x3, x4);
-            x4 = DRMP3_VSUB(x0, x3); x0 = DRMP3_VADD(x0, x3);
-            x3 = DRMP3_VSUB(x1, x2); x1 = DRMP3_VADD(x1, x2);
-            x[0] = DRMP3_VADD(x0, x1);
-            x[4] = DRMP3_VMUL_S(DRMP3_VSUB(x0, x1), 0.70710677f);
-            x5 = DRMP3_VADD(x5, x6);
-            x6 = DRMP3_VMUL_S(DRMP3_VADD(x6, x7), 0.70710677f);
-            x7 = DRMP3_VADD(x7, xt);
-            x3 = DRMP3_VMUL_S(DRMP3_VADD(x3, x4), 0.70710677f);
-            x5 = DRMP3_VSUB(x5, DRMP3_VMUL_S(x7, 0.198912367f)); /* rotate by PI/8 */
-            x7 = DRMP3_VADD(x7, DRMP3_VMUL_S(x5, 0.382683432f));
-            x5 = DRMP3_VSUB(x5, DRMP3_VMUL_S(x7, 0.198912367f));
-            x0 = DRMP3_VSUB(xt, x6); xt = DRMP3_VADD(xt, x6);
-            x[1] = DRMP3_VMUL_S(DRMP3_VADD(xt, x7), 0.50979561f);
-            x[2] = DRMP3_VMUL_S(DRMP3_VADD(x4, x3), 0.54119611f);
-            x[3] = DRMP3_VMUL_S(DRMP3_VSUB(x0, x5), 0.60134488f);
-            x[5] = DRMP3_VMUL_S(DRMP3_VADD(x0, x5), 0.89997619f);
-            x[6] = DRMP3_VMUL_S(DRMP3_VSUB(x4, x3), 1.30656302f);
-            x[7] = DRMP3_VMUL_S(DRMP3_VSUB(xt, x7), 2.56291556f);
-        }
-
-        if (k > n - 3)
-        {
-#if DRMP3_HAVE_SSE
-#define DRMP3_VSAVE2(i, v) _mm_storel_pi((__m64 *)(void*)&y[i*18], v)
-#else
-#define DRMP3_VSAVE2(i, v) vst1_f32((float32_t *)&y[(i)*18],  vget_low_f32(v))
-#endif
-            for (i = 0; i < 7; i++, y += 4*18)
-            {
-                drmp3_f4 s = DRMP3_VADD(t[3][i], t[3][i + 1]);
-                DRMP3_VSAVE2(0, t[0][i]);
-                DRMP3_VSAVE2(1, DRMP3_VADD(t[2][i], s));
-                DRMP3_VSAVE2(2, DRMP3_VADD(t[1][i], t[1][i + 1]));
-                DRMP3_VSAVE2(3, DRMP3_VADD(t[2][1 + i], s));
-            }
-            DRMP3_VSAVE2(0, t[0][7]);
-            DRMP3_VSAVE2(1, DRMP3_VADD(t[2][7], t[3][7]));
-            DRMP3_VSAVE2(2, t[1][7]);
-            DRMP3_VSAVE2(3, t[3][7]);
-        } else
-        {
-#define DRMP3_VSAVE4(i, v) DRMP3_VSTORE(&y[(i)*18], v)
-            for (i = 0; i < 7; i++, y += 4*18)
-            {
-                drmp3_f4 s = DRMP3_VADD(t[3][i], t[3][i + 1]);
-                DRMP3_VSAVE4(0, t[0][i]);
-                DRMP3_VSAVE4(1, DRMP3_VADD(t[2][i], s));
-                DRMP3_VSAVE4(2, DRMP3_VADD(t[1][i], t[1][i + 1]));
-                DRMP3_VSAVE4(3, DRMP3_VADD(t[2][1 + i], s));
-            }
-            DRMP3_VSAVE4(0, t[0][7]);
-            DRMP3_VSAVE4(1, DRMP3_VADD(t[2][7], t[3][7]));
-            DRMP3_VSAVE4(2, t[1][7]);
-            DRMP3_VSAVE4(3, t[3][7]);
-        }
-    } else
-#endif
-#ifdef DR_MP3_ONLY_SIMD
-    {} /* for HAVE_SIMD=1, MINIMP3_ONLY_SIMD=1 case we do not need non-intrinsic "else" branch */
-#else
-    for (; k < n; k++)
-    {
-        float t[4][8], *x, *y = grbuf + k;
-
-        for (x = t[0], i = 0; i < 8; i++, x++)
-        {
-            float x0 = y[i*18];
-            float x1 = y[(15 - i)*18];
-            float x2 = y[(16 + i)*18];
-            float x3 = y[(31 - i)*18];
-            float t0 = x0 + x3;
-            float t1 = x1 + x2;
-            float t2 = (x1 - x2)*g_sec[3*i + 0];
-            float t3 = (x0 - x3)*g_sec[3*i + 1];
-            x[0] = t0 + t1;
-            x[8] = (t0 - t1)*g_sec[3*i + 2];
-            x[16] = t3 + t2;
-            x[24] = (t3 - t2)*g_sec[3*i + 2];
-        }
-        for (x = t[0], i = 0; i < 4; i++, x += 8)
-        {
-            float x0 = x[0], x1 = x[1], x2 = x[2], x3 = x[3], x4 = x[4], x5 = x[5], x6 = x[6], x7 = x[7], xt;
-            xt = x0 - x7; x0 += x7;
-            x7 = x1 - x6; x1 += x6;
-            x6 = x2 - x5; x2 += x5;
-            x5 = x3 - x4; x3 += x4;
-            x4 = x0 - x3; x0 += x3;
-            x3 = x1 - x2; x1 += x2;
-            x[0] = x0 + x1;
-            x[4] = (x0 - x1)*0.70710677f;
-            x5 =  x5 + x6;
-            x6 = (x6 + x7)*0.70710677f;
-            x7 =  x7 + xt;
-            x3 = (x3 + x4)*0.70710677f;
-            x5 -= x7*0.198912367f;  /* rotate by PI/8 */
-            x7 += x5*0.382683432f;
-            x5 -= x7*0.198912367f;
-            x0 = xt - x6; xt += x6;
-            x[1] = (xt + x7)*0.50979561f;
-            x[2] = (x4 + x3)*0.54119611f;
-            x[3] = (x0 - x5)*0.60134488f;
-            x[5] = (x0 + x5)*0.89997619f;
-            x[6] = (x4 - x3)*1.30656302f;
-            x[7] = (xt - x7)*2.56291556f;
-
-        }
-        for (i = 0; i < 7; i++, y += 4*18)
-        {
-            y[0*18] = t[0][i];
-            y[1*18] = t[2][i] + t[3][i] + t[3][i + 1];
-            y[2*18] = t[1][i] + t[1][i + 1];
-            y[3*18] = t[2][i + 1] + t[3][i] + t[3][i + 1];
-        }
-        y[0*18] = t[0][7];
-        y[1*18] = t[2][7] + t[3][7];
-        y[2*18] = t[1][7];
-        y[3*18] = t[3][7];
-    }
-#endif
-}
-
-#ifndef DR_MP3_FLOAT_OUTPUT
-typedef drmp3_int16 drmp3d_sample_t;
-
-static drmp3_int16 drmp3d_scale_pcm(float sample)
-{
-    drmp3_int16 s;
-#if DRMP3_HAVE_ARMV6
-    drmp3_int32 s32 = (drmp3_int32)(sample + .5f);
-    s32 -= (s32 < 0);
-    s = (drmp3_int16)drmp3_clip_int16_arm(s32);
-#else
-    if (sample >=  32766.5) return (drmp3_int16) 32767;
-    if (sample <= -32767.5) return (drmp3_int16)-32768;
-    s = (drmp3_int16)(sample + .5f);
-    s -= (s < 0);   /* away from zero, to be compliant */
-#endif
-    return s;
-}
-#else
-typedef float drmp3d_sample_t;
-
-static float drmp3d_scale_pcm(float sample)
-{
-    return sample*(1.f/32768.f);
-}
-#endif
-
-static void drmp3d_synth_pair(drmp3d_sample_t *pcm, int nch, const float *z)
-{
-    float a;
-    a  = (z[14*64] - z[    0]) * 29;
-    a += (z[ 1*64] + z[13*64]) * 213;
-    a += (z[12*64] - z[ 2*64]) * 459;
-    a += (z[ 3*64] + z[11*64]) * 2037;
-    a += (z[10*64] - z[ 4*64]) * 5153;
-    a += (z[ 5*64] + z[ 9*64]) * 6574;
-    a += (z[ 8*64] - z[ 6*64]) * 37489;
-    a +=  z[ 7*64]             * 75038;
-    pcm[0] = drmp3d_scale_pcm(a);
-
-    z += 2;
-    a  = z[14*64] * 104;
-    a += z[12*64] * 1567;
-    a += z[10*64] * 9727;
-    a += z[ 8*64] * 64019;
-    a += z[ 6*64] * -9975;
-    a += z[ 4*64] * -45;
-    a += z[ 2*64] * 146;
-    a += z[ 0*64] * -5;
-    pcm[16*nch] = drmp3d_scale_pcm(a);
-}
-
-static void drmp3d_synth(float *xl, drmp3d_sample_t *dstl, int nch, float *lins)
-{
-    int i;
-    float *xr = xl + 576*(nch - 1);
-    drmp3d_sample_t *dstr = dstl + (nch - 1);
-
-    static const float g_win[] = {
-        -1,26,-31,208,218,401,-519,2063,2000,4788,-5517,7134,5959,35640,-39336,74992,
-        -1,24,-35,202,222,347,-581,2080,1952,4425,-5879,7640,5288,33791,-41176,74856,
-        -1,21,-38,196,225,294,-645,2087,1893,4063,-6237,8092,4561,31947,-43006,74630,
-        -1,19,-41,190,227,244,-711,2085,1822,3705,-6589,8492,3776,30112,-44821,74313,
-        -1,17,-45,183,228,197,-779,2075,1739,3351,-6935,8840,2935,28289,-46617,73908,
-        -1,16,-49,176,228,153,-848,2057,1644,3004,-7271,9139,2037,26482,-48390,73415,
-        -2,14,-53,169,227,111,-919,2032,1535,2663,-7597,9389,1082,24694,-50137,72835,
-        -2,13,-58,161,224,72,-991,2001,1414,2330,-7910,9592,70,22929,-51853,72169,
-        -2,11,-63,154,221,36,-1064,1962,1280,2006,-8209,9750,-998,21189,-53534,71420,
-        -2,10,-68,147,215,2,-1137,1919,1131,1692,-8491,9863,-2122,19478,-55178,70590,
-        -3,9,-73,139,208,-29,-1210,1870,970,1388,-8755,9935,-3300,17799,-56778,69679,
-        -3,8,-79,132,200,-57,-1283,1817,794,1095,-8998,9966,-4533,16155,-58333,68692,
-        -4,7,-85,125,189,-83,-1356,1759,605,814,-9219,9959,-5818,14548,-59838,67629,
-        -4,7,-91,117,177,-106,-1428,1698,402,545,-9416,9916,-7154,12980,-61289,66494,
-        -5,6,-97,111,163,-127,-1498,1634,185,288,-9585,9838,-8540,11455,-62684,65290
-    };
-    float *zlin = lins + 15*64;
-    const float *w = g_win;
-
-    zlin[4*15]     = xl[18*16];
-    zlin[4*15 + 1] = xr[18*16];
-    zlin[4*15 + 2] = xl[0];
-    zlin[4*15 + 3] = xr[0];
-
-    zlin[4*31]     = xl[1 + 18*16];
-    zlin[4*31 + 1] = xr[1 + 18*16];
-    zlin[4*31 + 2] = xl[1];
-    zlin[4*31 + 3] = xr[1];
-
-    drmp3d_synth_pair(dstr, nch, lins + 4*15 + 1);
-    drmp3d_synth_pair(dstr + 32*nch, nch, lins + 4*15 + 64 + 1);
-    drmp3d_synth_pair(dstl, nch, lins + 4*15);
-    drmp3d_synth_pair(dstl + 32*nch, nch, lins + 4*15 + 64);
-
-#if DRMP3_HAVE_SIMD
-    if (drmp3_have_simd()) for (i = 14; i >= 0; i--)
-    {
-#define DRMP3_VLOAD(k) drmp3_f4 w0 = DRMP3_VSET(*w++); drmp3_f4 w1 = DRMP3_VSET(*w++); drmp3_f4 vz = DRMP3_VLD(&zlin[4*i - 64*k]); drmp3_f4 vy = DRMP3_VLD(&zlin[4*i - 64*(15 - k)]);
-#define DRMP3_V0(k) { DRMP3_VLOAD(k) b =               DRMP3_VADD(DRMP3_VMUL(vz, w1), DRMP3_VMUL(vy, w0)) ; a =               DRMP3_VSUB(DRMP3_VMUL(vz, w0), DRMP3_VMUL(vy, w1));  }
-#define DRMP3_V1(k) { DRMP3_VLOAD(k) b = DRMP3_VADD(b, DRMP3_VADD(DRMP3_VMUL(vz, w1), DRMP3_VMUL(vy, w0))); a = DRMP3_VADD(a, DRMP3_VSUB(DRMP3_VMUL(vz, w0), DRMP3_VMUL(vy, w1))); }
-#define DRMP3_V2(k) { DRMP3_VLOAD(k) b = DRMP3_VADD(b, DRMP3_VADD(DRMP3_VMUL(vz, w1), DRMP3_VMUL(vy, w0))); a = DRMP3_VADD(a, DRMP3_VSUB(DRMP3_VMUL(vy, w1), DRMP3_VMUL(vz, w0))); }
-        drmp3_f4 a, b;
-        zlin[4*i]     = xl[18*(31 - i)];
-        zlin[4*i + 1] = xr[18*(31 - i)];
-        zlin[4*i + 2] = xl[1 + 18*(31 - i)];
-        zlin[4*i + 3] = xr[1 + 18*(31 - i)];
-        zlin[4*i + 64] = xl[1 + 18*(1 + i)];
-        zlin[4*i + 64 + 1] = xr[1 + 18*(1 + i)];
-        zlin[4*i - 64 + 2] = xl[18*(1 + i)];
-        zlin[4*i - 64 + 3] = xr[18*(1 + i)];
-
-        DRMP3_V0(0) DRMP3_V2(1) DRMP3_V1(2) DRMP3_V2(3) DRMP3_V1(4) DRMP3_V2(5) DRMP3_V1(6) DRMP3_V2(7)
-
-        {
-#ifndef DR_MP3_FLOAT_OUTPUT
-#if DRMP3_HAVE_SSE
-            static const drmp3_f4 g_max = { 32767.0f, 32767.0f, 32767.0f, 32767.0f };
-            static const drmp3_f4 g_min = { -32768.0f, -32768.0f, -32768.0f, -32768.0f };
-            __m128i pcm8 = _mm_packs_epi32(_mm_cvtps_epi32(_mm_max_ps(_mm_min_ps(a, g_max), g_min)),
-                                           _mm_cvtps_epi32(_mm_max_ps(_mm_min_ps(b, g_max), g_min)));
-            dstr[(15 - i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 1);
-            dstr[(17 + i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 5);
-            dstl[(15 - i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 0);
-            dstl[(17 + i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 4);
-            dstr[(47 - i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 3);
-            dstr[(49 + i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 7);
-            dstl[(47 - i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 2);
-            dstl[(49 + i)*nch] = (drmp3_int16)_mm_extract_epi16(pcm8, 6);
-#else
-            int16x4_t pcma, pcmb;
-            a = DRMP3_VADD(a, DRMP3_VSET(0.5f));
-            b = DRMP3_VADD(b, DRMP3_VSET(0.5f));
-            pcma = vqmovn_s32(vqaddq_s32(vcvtq_s32_f32(a), vreinterpretq_s32_u32(vcltq_f32(a, DRMP3_VSET(0)))));
-            pcmb = vqmovn_s32(vqaddq_s32(vcvtq_s32_f32(b), vreinterpretq_s32_u32(vcltq_f32(b, DRMP3_VSET(0)))));
-            vst1_lane_s16(dstr + (15 - i)*nch, pcma, 1);
-            vst1_lane_s16(dstr + (17 + i)*nch, pcmb, 1);
-            vst1_lane_s16(dstl + (15 - i)*nch, pcma, 0);
-            vst1_lane_s16(dstl + (17 + i)*nch, pcmb, 0);
-            vst1_lane_s16(dstr + (47 - i)*nch, pcma, 3);
-            vst1_lane_s16(dstr + (49 + i)*nch, pcmb, 3);
-            vst1_lane_s16(dstl + (47 - i)*nch, pcma, 2);
-            vst1_lane_s16(dstl + (49 + i)*nch, pcmb, 2);
-#endif
-#else
-        #if DRMP3_HAVE_SSE
-            static const drmp3_f4 g_scale = { 1.0f/32768.0f, 1.0f/32768.0f, 1.0f/32768.0f, 1.0f/32768.0f };
-        #else
-            const drmp3_f4 g_scale = vdupq_n_f32(1.0f/32768.0f);
-        #endif
-            a = DRMP3_VMUL(a, g_scale);
-            b = DRMP3_VMUL(b, g_scale);
-#if DRMP3_HAVE_SSE
-            _mm_store_ss(dstr + (15 - i)*nch, _mm_shuffle_ps(a, a, _MM_SHUFFLE(1, 1, 1, 1)));
-            _mm_store_ss(dstr + (17 + i)*nch, _mm_shuffle_ps(b, b, _MM_SHUFFLE(1, 1, 1, 1)));
-            _mm_store_ss(dstl + (15 - i)*nch, _mm_shuffle_ps(a, a, _MM_SHUFFLE(0, 0, 0, 0)));
-            _mm_store_ss(dstl + (17 + i)*nch, _mm_shuffle_ps(b, b, _MM_SHUFFLE(0, 0, 0, 0)));
-            _mm_store_ss(dstr + (47 - i)*nch, _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 3, 3, 3)));
-            _mm_store_ss(dstr + (49 + i)*nch, _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 3, 3, 3)));
-            _mm_store_ss(dstl + (47 - i)*nch, _mm_shuffle_ps(a, a, _MM_SHUFFLE(2, 2, 2, 2)));
-            _mm_store_ss(dstl + (49 + i)*nch, _mm_shuffle_ps(b, b, _MM_SHUFFLE(2, 2, 2, 2)));
-#else
-            vst1q_lane_f32(dstr + (15 - i)*nch, a, 1);
-            vst1q_lane_f32(dstr + (17 + i)*nch, b, 1);
-            vst1q_lane_f32(dstl + (15 - i)*nch, a, 0);
-            vst1q_lane_f32(dstl + (17 + i)*nch, b, 0);
-            vst1q_lane_f32(dstr + (47 - i)*nch, a, 3);
-            vst1q_lane_f32(dstr + (49 + i)*nch, b, 3);
-            vst1q_lane_f32(dstl + (47 - i)*nch, a, 2);
-            vst1q_lane_f32(dstl + (49 + i)*nch, b, 2);
-#endif
-#endif /* DR_MP3_FLOAT_OUTPUT */
-        }
-    } else
-#endif
-#ifdef DR_MP3_ONLY_SIMD
-    {} /* for HAVE_SIMD=1, MINIMP3_ONLY_SIMD=1 case we do not need non-intrinsic "else" branch */
-#else
-    for (i = 14; i >= 0; i--)
-    {
-#define DRMP3_LOAD(k) float w0 = *w++; float w1 = *w++; float *vz = &zlin[4*i - k*64]; float *vy = &zlin[4*i - (15 - k)*64];
-#define DRMP3_S0(k) { int j; DRMP3_LOAD(k); for (j = 0; j < 4; j++) b[j]  = vz[j]*w1 + vy[j]*w0, a[j]  = vz[j]*w0 - vy[j]*w1; }
-#define DRMP3_S1(k) { int j; DRMP3_LOAD(k); for (j = 0; j < 4; j++) b[j] += vz[j]*w1 + vy[j]*w0, a[j] += vz[j]*w0 - vy[j]*w1; }
-#define DRMP3_S2(k) { int j; DRMP3_LOAD(k); for (j = 0; j < 4; j++) b[j] += vz[j]*w1 + vy[j]*w0, a[j] += vy[j]*w1 - vz[j]*w0; }
-        float a[4], b[4];
-
-        zlin[4*i]     = xl[18*(31 - i)];
-        zlin[4*i + 1] = xr[18*(31 - i)];
-        zlin[4*i + 2] = xl[1 + 18*(31 - i)];
-        zlin[4*i + 3] = xr[1 + 18*(31 - i)];
-        zlin[4*(i + 16)]   = xl[1 + 18*(1 + i)];
-        zlin[4*(i + 16) + 1] = xr[1 + 18*(1 + i)];
-        zlin[4*(i - 16) + 2] = xl[18*(1 + i)];
-        zlin[4*(i - 16) + 3] = xr[18*(1 + i)];
-
-        DRMP3_S0(0) DRMP3_S2(1) DRMP3_S1(2) DRMP3_S2(3) DRMP3_S1(4) DRMP3_S2(5) DRMP3_S1(6) DRMP3_S2(7)
-
-        dstr[(15 - i)*nch] = drmp3d_scale_pcm(a[1]);
-        dstr[(17 + i)*nch] = drmp3d_scale_pcm(b[1]);
-        dstl[(15 - i)*nch] = drmp3d_scale_pcm(a[0]);
-        dstl[(17 + i)*nch] = drmp3d_scale_pcm(b[0]);
-        dstr[(47 - i)*nch] = drmp3d_scale_pcm(a[3]);
-        dstr[(49 + i)*nch] = drmp3d_scale_pcm(b[3]);
-        dstl[(47 - i)*nch] = drmp3d_scale_pcm(a[2]);
-        dstl[(49 + i)*nch] = drmp3d_scale_pcm(b[2]);
-    }
-#endif
-}
-
-static void drmp3d_synth_granule(float *qmf_state, float *grbuf, int nbands, int nch, drmp3d_sample_t *pcm, float *lins)
-{
-    int i;
-    for (i = 0; i < nch; i++)
-    {
-        drmp3d_DCT_II(grbuf + 576*i, nbands);
-    }
-
-    DRMP3_COPY_MEMORY(lins, qmf_state, sizeof(float)*15*64);
-
-    for (i = 0; i < nbands; i += 2)
-    {
-        drmp3d_synth(grbuf + i, pcm + 32*nch*i, nch, lins + i*64);
-    }
-#ifndef DR_MP3_NONSTANDARD_BUT_LOGICAL
-    if (nch == 1)
-    {
-        for (i = 0; i < 15*64; i += 2)
-        {
-            qmf_state[i] = lins[nbands*64 + i];
-        }
-    } else
-#endif
-    {
-        DRMP3_COPY_MEMORY(qmf_state, lins + nbands*64, sizeof(float)*15*64);
-    }
-}
-
-static int drmp3d_match_frame(const drmp3_uint8 *hdr, int mp3_bytes, int frame_bytes)
-{
-    int i, nmatch;
-    for (i = 0, nmatch = 0; nmatch < DRMP3_MAX_FRAME_SYNC_MATCHES; nmatch++)
-    {
-        i += drmp3_hdr_frame_bytes(hdr + i, frame_bytes) + drmp3_hdr_padding(hdr + i);
-        if (i + DRMP3_HDR_SIZE > mp3_bytes)
-            return nmatch > 0;
-        if (!drmp3_hdr_compare(hdr, hdr + i))
-            return 0;
-    }
-    return 1;
-}
-
-static int drmp3d_find_frame(const drmp3_uint8 *mp3, int mp3_bytes, int *free_format_bytes, int *ptr_frame_bytes)
-{
-    int i, k;
-    for (i = 0; i < mp3_bytes - DRMP3_HDR_SIZE; i++, mp3++)
-    {
-        if (drmp3_hdr_valid(mp3))
-        {
-            int frame_bytes = drmp3_hdr_frame_bytes(mp3, *free_format_bytes);
-            int frame_and_padding = frame_bytes + drmp3_hdr_padding(mp3);
-
-            for (k = DRMP3_HDR_SIZE; !frame_bytes && k < DRMP3_MAX_FREE_FORMAT_FRAME_SIZE && i + 2*k < mp3_bytes - DRMP3_HDR_SIZE; k++)
-            {
-                if (drmp3_hdr_compare(mp3, mp3 + k))
-                {
-                    int fb = k - drmp3_hdr_padding(mp3);
-                    int nextfb = fb + drmp3_hdr_padding(mp3 + k);
-                    if (i + k + nextfb + DRMP3_HDR_SIZE > mp3_bytes || !drmp3_hdr_compare(mp3, mp3 + k + nextfb))
-                        continue;
-                    frame_and_padding = k;
-                    frame_bytes = fb;
-                    *free_format_bytes = fb;
-                }
-            }
-
-            if ((frame_bytes && i + frame_and_padding <= mp3_bytes &&
-                drmp3d_match_frame(mp3, mp3_bytes - i, frame_bytes)) ||
-                (!i && frame_and_padding == mp3_bytes))
-            {
-                *ptr_frame_bytes = frame_and_padding;
-                return i;
-            }
-            *free_format_bytes = 0;
-        }
-    }
-    *ptr_frame_bytes = 0;
-    return mp3_bytes;
-}
-
-DRMP3_API void drmp3dec_init(drmp3dec *dec)
-{
-    dec->header[0] = 0;
-}
-
-DRMP3_API int drmp3dec_decode_frame(drmp3dec *dec, const drmp3_uint8 *mp3, int mp3_bytes, void *pcm, drmp3dec_frame_info *info)
-{
-    int i = 0, igr, frame_size = 0, success = 1;
-    const drmp3_uint8 *hdr;
-    drmp3_bs bs_frame[1];
-    drmp3dec_scratch scratch;
-
-    if (mp3_bytes > 4 && dec->header[0] == 0xff && drmp3_hdr_compare(dec->header, mp3))
-    {
-        frame_size = drmp3_hdr_frame_bytes(mp3, dec->free_format_bytes) + drmp3_hdr_padding(mp3);
-        if (frame_size != mp3_bytes && (frame_size + DRMP3_HDR_SIZE > mp3_bytes || !drmp3_hdr_compare(mp3, mp3 + frame_size)))
-        {
-            frame_size = 0;
-        }
-    }
-    if (!frame_size)
-    {
-        DRMP3_ZERO_MEMORY(dec, sizeof(drmp3dec));
-        i = drmp3d_find_frame(mp3, mp3_bytes, &dec->free_format_bytes, &frame_size);
-        if (!frame_size || i + frame_size > mp3_bytes)
-        {
-            info->frame_bytes = i;
-            return 0;
-        }
-    }
-
-    hdr = mp3 + i;
-    DRMP3_COPY_MEMORY(dec->header, hdr, DRMP3_HDR_SIZE);
-    info->frame_bytes = i + frame_size;
-    info->channels = DRMP3_HDR_IS_MONO(hdr) ? 1 : 2;
-    info->hz = drmp3_hdr_sample_rate_hz(hdr);
-    info->layer = 4 - DRMP3_HDR_GET_LAYER(hdr);
-    info->bitrate_kbps = drmp3_hdr_bitrate_kbps(hdr);
-
-    drmp3_bs_init(bs_frame, hdr + DRMP3_HDR_SIZE, frame_size - DRMP3_HDR_SIZE);
-    if (DRMP3_HDR_IS_CRC(hdr))
-    {
-        drmp3_bs_get_bits(bs_frame, 16);
-    }
-
-    if (info->layer == 3)
-    {
-        int main_data_begin = drmp3_L3_read_side_info(bs_frame, scratch.gr_info, hdr);
-        if (main_data_begin < 0 || bs_frame->pos > bs_frame->limit)
-        {
-            drmp3dec_init(dec);
-            return 0;
-        }
-        success = drmp3_L3_restore_reservoir(dec, bs_frame, &scratch, main_data_begin);
-        if (success && pcm != NULL)
-        {
-            for (igr = 0; igr < (DRMP3_HDR_TEST_MPEG1(hdr) ? 2 : 1); igr++, pcm = DRMP3_OFFSET_PTR(pcm, sizeof(drmp3d_sample_t)*576*info->channels))
-            {
-                DRMP3_ZERO_MEMORY(scratch.grbuf[0], 576*2*sizeof(float));
-                drmp3_L3_decode(dec, &scratch, scratch.gr_info + igr*info->channels, info->channels);
-                drmp3d_synth_granule(dec->qmf_state, scratch.grbuf[0], 18, info->channels, (drmp3d_sample_t*)pcm, scratch.syn[0]);
-            }
-        }
-        drmp3_L3_save_reservoir(dec, &scratch);
-    } else
-    {
-#ifdef DR_MP3_ONLY_MP3
-        return 0;
-#else
-        drmp3_L12_scale_info sci[1];
-
-        if (pcm == NULL) {
-            return drmp3_hdr_frame_samples(hdr);
-        }
-
-        drmp3_L12_read_scale_info(hdr, bs_frame, sci);
-
-        DRMP3_ZERO_MEMORY(scratch.grbuf[0], 576*2*sizeof(float));
-        for (i = 0, igr = 0; igr < 3; igr++)
-        {
-            if (12 == (i += drmp3_L12_dequantize_granule(scratch.grbuf[0] + i, bs_frame, sci, info->layer | 1)))
-            {
-                i = 0;
-                drmp3_L12_apply_scf_384(sci, sci->scf + igr, scratch.grbuf[0]);
-                drmp3d_synth_granule(dec->qmf_state, scratch.grbuf[0], 12, info->channels, (drmp3d_sample_t*)pcm, scratch.syn[0]);
-                DRMP3_ZERO_MEMORY(scratch.grbuf[0], 576*2*sizeof(float));
-                pcm = DRMP3_OFFSET_PTR(pcm, sizeof(drmp3d_sample_t)*384*info->channels);
-            }
-            if (bs_frame->pos > bs_frame->limit)
-            {
-                drmp3dec_init(dec);
-                return 0;
-            }
-        }
-#endif
-    }
-
-    return success*drmp3_hdr_frame_samples(dec->header);
-}
-
-DRMP3_API void drmp3dec_f32_to_s16(const float *in, drmp3_int16 *out, size_t num_samples)
-{
-    size_t i = 0;
-#if DRMP3_HAVE_SIMD
-    size_t aligned_count = num_samples & ~7;
-    for(; i < aligned_count; i+=8)
-    {
-        drmp3_f4 scale = DRMP3_VSET(32768.0f);
-        drmp3_f4 a = DRMP3_VMUL(DRMP3_VLD(&in[i  ]), scale);
-        drmp3_f4 b = DRMP3_VMUL(DRMP3_VLD(&in[i+4]), scale);
-#if DRMP3_HAVE_SSE
-        drmp3_f4 s16max = DRMP3_VSET( 32767.0f);
-        drmp3_f4 s16min = DRMP3_VSET(-32768.0f);
-        __m128i pcm8 = _mm_packs_epi32(_mm_cvtps_epi32(_mm_max_ps(_mm_min_ps(a, s16max), s16min)),
-                                        _mm_cvtps_epi32(_mm_max_ps(_mm_min_ps(b, s16max), s16min)));
-        out[i  ] = (drmp3_int16)_mm_extract_epi16(pcm8, 0);
-        out[i+1] = (drmp3_int16)_mm_extract_epi16(pcm8, 1);
-        out[i+2] = (drmp3_int16)_mm_extract_epi16(pcm8, 2);
-        out[i+3] = (drmp3_int16)_mm_extract_epi16(pcm8, 3);
-        out[i+4] = (drmp3_int16)_mm_extract_epi16(pcm8, 4);
-        out[i+5] = (drmp3_int16)_mm_extract_epi16(pcm8, 5);
-        out[i+6] = (drmp3_int16)_mm_extract_epi16(pcm8, 6);
-        out[i+7] = (drmp3_int16)_mm_extract_epi16(pcm8, 7);
-#else
-        int16x4_t pcma, pcmb;
-        a = DRMP3_VADD(a, DRMP3_VSET(0.5f));
-        b = DRMP3_VADD(b, DRMP3_VSET(0.5f));
-        pcma = vqmovn_s32(vqaddq_s32(vcvtq_s32_f32(a), vreinterpretq_s32_u32(vcltq_f32(a, DRMP3_VSET(0)))));
-        pcmb = vqmovn_s32(vqaddq_s32(vcvtq_s32_f32(b), vreinterpretq_s32_u32(vcltq_f32(b, DRMP3_VSET(0)))));
-        vst1_lane_s16(out+i  , pcma, 0);
-        vst1_lane_s16(out+i+1, pcma, 1);
-        vst1_lane_s16(out+i+2, pcma, 2);
-        vst1_lane_s16(out+i+3, pcma, 3);
-        vst1_lane_s16(out+i+4, pcmb, 0);
-        vst1_lane_s16(out+i+5, pcmb, 1);
-        vst1_lane_s16(out+i+6, pcmb, 2);
-        vst1_lane_s16(out+i+7, pcmb, 3);
-#endif
-    }
-#endif
-    for(; i < num_samples; i++)
-    {
-        float sample = in[i] * 32768.0f;
-        if (sample >=  32766.5)
-            out[i] = (drmp3_int16) 32767;
-        else if (sample <= -32767.5)
-            out[i] = (drmp3_int16)-32768;
-        else
-        {
-            short s = (drmp3_int16)(sample + .5f);
-            s -= (s < 0);   /* away from zero, to be compliant */
-            out[i] = s;
-        }
-    }
-}
-
-
-
-/************************************************************************************************************************************************************
-
- Main Public API
-
- ************************************************************************************************************************************************************/
-/* SIZE_MAX */
-#if defined(SIZE_MAX)
-    #define DRMP3_SIZE_MAX  SIZE_MAX
-#else
-    #if defined(_WIN64) || defined(_LP64) || defined(__LP64__)
-        #define DRMP3_SIZE_MAX  ((drmp3_uint64)0xFFFFFFFFFFFFFFFF)
-    #else
-        #define DRMP3_SIZE_MAX  0xFFFFFFFF
-    #endif
-#endif
-/* End SIZE_MAX */
-
-/* Options. */
-#ifndef DRMP3_SEEK_LEADING_MP3_FRAMES
-#define DRMP3_SEEK_LEADING_MP3_FRAMES   2
-#endif
-
-#define DRMP3_MIN_DATA_CHUNK_SIZE   16384
-
-/* The size in bytes of each chunk of data to read from the MP3 stream. minimp3 recommends at least 16K, but in an attempt to reduce data movement I'm making this slightly larger. */
-#ifndef DRMP3_DATA_CHUNK_SIZE
-#define DRMP3_DATA_CHUNK_SIZE  (DRMP3_MIN_DATA_CHUNK_SIZE*4)
-#endif
-
-
-#define DRMP3_COUNTOF(x)        (sizeof(x) / sizeof(x[0]))
-#define DRMP3_CLAMP(x, lo, hi)  (DRMP3_MAX(lo, DRMP3_MIN(x, hi)))
-
-#ifndef DRMP3_PI_D
-#define DRMP3_PI_D    3.14159265358979323846264
-#endif
-
-#define DRMP3_DEFAULT_RESAMPLER_LPF_ORDER   2
-
-static DRMP3_INLINE float drmp3_mix_f32(float x, float y, float a)
-{
-    return x*(1-a) + y*a;
-}
-static DRMP3_INLINE float drmp3_mix_f32_fast(float x, float y, float a)
-{
-    float r0 = (y - x);
-    float r1 = r0*a;
-    return x + r1;
-    /*return x + (y - x)*a;*/
-}
-
-
-/*
-Greatest common factor using Euclid's algorithm iteratively.
-*/
-static DRMP3_INLINE drmp3_uint32 drmp3_gcf_u32(drmp3_uint32 a, drmp3_uint32 b)
-{
-    for (;;) {
-        if (b == 0) {
-            break;
-        } else {
-            drmp3_uint32 t = a;
-            a = b;
-            b = t % a;
-        }
-    }
-
-    return a;
-}
-
-
-static void* drmp3__malloc_default(size_t sz, void* pUserData)
-{
-    (void)pUserData;
-    return DRMP3_MALLOC(sz);
-}
-
-static void* drmp3__realloc_default(void* p, size_t sz, void* pUserData)
-{
-    (void)pUserData;
-    return DRMP3_REALLOC(p, sz);
-}
-
-static void drmp3__free_default(void* p, void* pUserData)
-{
-    (void)pUserData;
-    DRMP3_FREE(p);
-}
-
-
-static void* drmp3__malloc_from_callbacks(size_t sz, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks == NULL) {
-        return NULL;
-    }
-
-    if (pAllocationCallbacks->onMalloc != NULL) {
-        return pAllocationCallbacks->onMalloc(sz, pAllocationCallbacks->pUserData);
-    }
-
-    /* Try using realloc(). */
-    if (pAllocationCallbacks->onRealloc != NULL) {
-        return pAllocationCallbacks->onRealloc(NULL, sz, pAllocationCallbacks->pUserData);
-    }
-
-    return NULL;
-}
-
-static void* drmp3__realloc_from_callbacks(void* p, size_t szNew, size_t szOld, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks == NULL) {
-        return NULL;
-    }
-
-    if (pAllocationCallbacks->onRealloc != NULL) {
-        return pAllocationCallbacks->onRealloc(p, szNew, pAllocationCallbacks->pUserData);
-    }
-
-    /* Try emulating realloc() in terms of malloc()/free(). */
-    if (pAllocationCallbacks->onMalloc != NULL && pAllocationCallbacks->onFree != NULL) {
-        void* p2;
-
-        p2 = pAllocationCallbacks->onMalloc(szNew, pAllocationCallbacks->pUserData);
-        if (p2 == NULL) {
-            return NULL;
-        }
-
-        if (p != NULL) {
-            DRMP3_COPY_MEMORY(p2, p, szOld);
-            pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
-        }
-
-        return p2;
-    }
-
-    return NULL;
-}
-
-static void drmp3__free_from_callbacks(void* p, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (p == NULL || pAllocationCallbacks == NULL) {
-        return;
-    }
-
-    if (pAllocationCallbacks->onFree != NULL) {
-        pAllocationCallbacks->onFree(p, pAllocationCallbacks->pUserData);
-    }
-}
-
-
-static drmp3_allocation_callbacks drmp3_copy_allocation_callbacks_or_defaults(const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks != NULL) {
-        /* Copy. */
-        return *pAllocationCallbacks;
-    } else {
-        /* Defaults. */
-        drmp3_allocation_callbacks allocationCallbacks;
-        allocationCallbacks.pUserData = NULL;
-        allocationCallbacks.onMalloc  = drmp3__malloc_default;
-        allocationCallbacks.onRealloc = drmp3__realloc_default;
-        allocationCallbacks.onFree    = drmp3__free_default;
-        return allocationCallbacks;
-    }
-}
-
-
-
-static size_t drmp3__on_read(drmp3* pMP3, void* pBufferOut, size_t bytesToRead)
-{
-    size_t bytesRead = pMP3->onRead(pMP3->pUserData, pBufferOut, bytesToRead);
-    pMP3->streamCursor += bytesRead;
-    return bytesRead;
-}
-
-static drmp3_bool32 drmp3__on_seek(drmp3* pMP3, int offset, drmp3_seek_origin origin)
-{
-    DRMP3_ASSERT(offset >= 0);
-
-    if (!pMP3->onSeek(pMP3->pUserData, offset, origin)) {
-        return DRMP3_FALSE;
-    }
-
-    if (origin == drmp3_seek_origin_start) {
-        pMP3->streamCursor = (drmp3_uint64)offset;
-    } else {
-        pMP3->streamCursor += offset;
-    }
-
-    return DRMP3_TRUE;
-}
-
-static drmp3_bool32 drmp3__on_seek_64(drmp3* pMP3, drmp3_uint64 offset, drmp3_seek_origin origin)
-{
-    if (offset <= 0x7FFFFFFF) {
-        return drmp3__on_seek(pMP3, (int)offset, origin);
-    }
-
-
-    /* Getting here "offset" is too large for a 32-bit integer. We just keep seeking forward until we hit the offset. */
-    if (!drmp3__on_seek(pMP3, 0x7FFFFFFF, drmp3_seek_origin_start)) {
-        return DRMP3_FALSE;
-    }
-
-    offset -= 0x7FFFFFFF;
-    while (offset > 0) {
-        if (offset <= 0x7FFFFFFF) {
-            if (!drmp3__on_seek(pMP3, (int)offset, drmp3_seek_origin_current)) {
-                return DRMP3_FALSE;
-            }
-            offset = 0;
-        } else {
-            if (!drmp3__on_seek(pMP3, 0x7FFFFFFF, drmp3_seek_origin_current)) {
-                return DRMP3_FALSE;
-            }
-            offset -= 0x7FFFFFFF;
-        }
-    }
-
-    return DRMP3_TRUE;
-}
-
-
-static drmp3_uint32 drmp3_decode_next_frame_ex__callbacks(drmp3* pMP3, drmp3d_sample_t* pPCMFrames)
-{
-    drmp3_uint32 pcmFramesRead = 0;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(pMP3->onRead != NULL);
-
-    if (pMP3->atEnd) {
-        return 0;
-    }
-
-    for (;;) {
-        drmp3dec_frame_info info;
-
-        /* minimp3 recommends doing data submission in chunks of at least 16K. If we don't have at least 16K bytes available, get more. */
-        if (pMP3->dataSize < DRMP3_MIN_DATA_CHUNK_SIZE) {
-            size_t bytesRead;
-
-            /* First we need to move the data down. */
-            if (pMP3->pData != NULL) {
-                DRMP3_MOVE_MEMORY(pMP3->pData, pMP3->pData + pMP3->dataConsumed, pMP3->dataSize);
-            }
-
-            pMP3->dataConsumed = 0;
-
-            if (pMP3->dataCapacity < DRMP3_DATA_CHUNK_SIZE) {
-                drmp3_uint8* pNewData;
-                size_t newDataCap;
-
-                newDataCap = DRMP3_DATA_CHUNK_SIZE;
-
-                pNewData = (drmp3_uint8*)drmp3__realloc_from_callbacks(pMP3->pData, newDataCap, pMP3->dataCapacity, &pMP3->allocationCallbacks);
-                if (pNewData == NULL) {
-                    return 0; /* Out of memory. */
-                }
-
-                pMP3->pData = pNewData;
-                pMP3->dataCapacity = newDataCap;
-            }
-
-            bytesRead = drmp3__on_read(pMP3, pMP3->pData + pMP3->dataSize, (pMP3->dataCapacity - pMP3->dataSize));
-            if (bytesRead == 0) {
-                if (pMP3->dataSize == 0) {
-                    pMP3->atEnd = DRMP3_TRUE;
-                    return 0; /* No data. */
-                }
-            }
-
-            pMP3->dataSize += bytesRead;
-        }
-
-        if (pMP3->dataSize > INT_MAX) {
-            pMP3->atEnd = DRMP3_TRUE;
-            return 0; /* File too big. */
-        }
-
-        DRMP3_ASSERT(pMP3->pData != NULL);
-        DRMP3_ASSERT(pMP3->dataCapacity > 0);
-
-        /* Do a runtime check here to try silencing a false-positive from clang-analyzer. */
-        if (pMP3->pData == NULL) {
-            return 0;
-        }
-
-        pcmFramesRead = drmp3dec_decode_frame(&pMP3->decoder, pMP3->pData + pMP3->dataConsumed, (int)pMP3->dataSize, pPCMFrames, &info);    /* <-- Safe size_t -> int conversion thanks to the check above. */
-
-        /* Consume the data. */
-        if (info.frame_bytes > 0) {
-            pMP3->dataConsumed += (size_t)info.frame_bytes;
-            pMP3->dataSize     -= (size_t)info.frame_bytes;
-        }
-
-        /* pcmFramesRead will be equal to 0 if decoding failed. If it is zero and info.frame_bytes > 0 then we have successfully decoded the frame. */
-        if (pcmFramesRead > 0) {
-            pcmFramesRead = drmp3_hdr_frame_samples(pMP3->decoder.header);
-            pMP3->pcmFramesConsumedInMP3Frame = 0;
-            pMP3->pcmFramesRemainingInMP3Frame = pcmFramesRead;
-            pMP3->mp3FrameChannels = info.channels;
-            pMP3->mp3FrameSampleRate = info.hz;
-            break;
-        } else if (info.frame_bytes == 0) {
-            /* Need more data. minimp3 recommends doing data submission in 16K chunks. */
-            size_t bytesRead;
-
-            /* First we need to move the data down. */
-            DRMP3_MOVE_MEMORY(pMP3->pData, pMP3->pData + pMP3->dataConsumed, pMP3->dataSize);
-            pMP3->dataConsumed = 0;
-
-            if (pMP3->dataCapacity == pMP3->dataSize) {
-                /* No room. Expand. */
-                drmp3_uint8* pNewData;
-                size_t newDataCap;
-
-                newDataCap = pMP3->dataCapacity + DRMP3_DATA_CHUNK_SIZE;
-
-                pNewData = (drmp3_uint8*)drmp3__realloc_from_callbacks(pMP3->pData, newDataCap, pMP3->dataCapacity, &pMP3->allocationCallbacks);
-                if (pNewData == NULL) {
-                    return 0; /* Out of memory. */
-                }
-
-                pMP3->pData = pNewData;
-                pMP3->dataCapacity = newDataCap;
-            }
-
-            /* Fill in a chunk. */
-            bytesRead = drmp3__on_read(pMP3, pMP3->pData + pMP3->dataSize, (pMP3->dataCapacity - pMP3->dataSize));
-            if (bytesRead == 0) {
-                pMP3->atEnd = DRMP3_TRUE;
-                return 0; /* Error reading more data. */
-            }
-
-            pMP3->dataSize += bytesRead;
-        }
-    };
-
-    return pcmFramesRead;
-}
-
-static drmp3_uint32 drmp3_decode_next_frame_ex__memory(drmp3* pMP3, drmp3d_sample_t* pPCMFrames)
-{
-    drmp3_uint32 pcmFramesRead = 0;
-    drmp3dec_frame_info info;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(pMP3->memory.pData != NULL);
-
-    if (pMP3->atEnd) {
-        return 0;
-    }
-
-    for (;;) {
-        pcmFramesRead = drmp3dec_decode_frame(&pMP3->decoder, pMP3->memory.pData + pMP3->memory.currentReadPos, (int)(pMP3->memory.dataSize - pMP3->memory.currentReadPos), pPCMFrames, &info);
-        if (pcmFramesRead > 0) {
-            pcmFramesRead = drmp3_hdr_frame_samples(pMP3->decoder.header);
-            pMP3->pcmFramesConsumedInMP3Frame  = 0;
-            pMP3->pcmFramesRemainingInMP3Frame = pcmFramesRead;
-            pMP3->mp3FrameChannels             = info.channels;
-            pMP3->mp3FrameSampleRate           = info.hz;
-            break;
-        } else if (info.frame_bytes > 0) {
-            /* No frames were read, but it looks like we skipped past one. Read the next MP3 frame. */
-            pMP3->memory.currentReadPos += (size_t)info.frame_bytes;
-        } else {
-            /* Nothing at all was read. Abort. */
-            break;
-        }
-    }
-
-    /* Consume the data. */
-    pMP3->memory.currentReadPos += (size_t)info.frame_bytes;
-
-    return pcmFramesRead;
-}
-
-static drmp3_uint32 drmp3_decode_next_frame_ex(drmp3* pMP3, drmp3d_sample_t* pPCMFrames)
-{
-    if (pMP3->memory.pData != NULL && pMP3->memory.dataSize > 0) {
-        return drmp3_decode_next_frame_ex__memory(pMP3, pPCMFrames);
-    } else {
-        return drmp3_decode_next_frame_ex__callbacks(pMP3, pPCMFrames);
-    }
-}
-
-static drmp3_uint32 drmp3_decode_next_frame(drmp3* pMP3)
-{
-    DRMP3_ASSERT(pMP3 != NULL);
-    return drmp3_decode_next_frame_ex(pMP3, (drmp3d_sample_t*)pMP3->pcmFrames);
-}
-
-#if 0
-static drmp3_uint32 drmp3_seek_next_frame(drmp3* pMP3)
-{
-    drmp3_uint32 pcmFrameCount;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-
-    pcmFrameCount = drmp3_decode_next_frame_ex(pMP3, NULL);
-    if (pcmFrameCount == 0) {
-        return 0;
-    }
-
-    /* We have essentially just skipped past the frame, so just set the remaining samples to 0. */
-    pMP3->currentPCMFrame             += pcmFrameCount;
-    pMP3->pcmFramesConsumedInMP3Frame  = pcmFrameCount;
-    pMP3->pcmFramesRemainingInMP3Frame = 0;
-
-    return pcmFrameCount;
-}
-#endif
-
-static drmp3_bool32 drmp3_init_internal(drmp3* pMP3, drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(onRead != NULL);
-
-    /* This function assumes the output object has already been reset to 0. Do not do that here, otherwise things will break. */
-    drmp3dec_init(&pMP3->decoder);
-
-    pMP3->onRead = onRead;
-    pMP3->onSeek = onSeek;
-    pMP3->pUserData = pUserData;
-    pMP3->allocationCallbacks = drmp3_copy_allocation_callbacks_or_defaults(pAllocationCallbacks);
-
-    if (pMP3->allocationCallbacks.onFree == NULL || (pMP3->allocationCallbacks.onMalloc == NULL && pMP3->allocationCallbacks.onRealloc == NULL)) {
-        return DRMP3_FALSE;    /* Invalid allocation callbacks. */
-    }
-
-    /* Decode the first frame to confirm that it is indeed a valid MP3 stream. */
-    if (drmp3_decode_next_frame(pMP3) == 0) {
-        drmp3__free_from_callbacks(pMP3->pData, &pMP3->allocationCallbacks);    /* The call above may have allocated memory. Need to make sure it's freed before aborting. */
-        return DRMP3_FALSE; /* Not a valid MP3 stream. */
-    }
-
-    pMP3->channels   = pMP3->mp3FrameChannels;
-    pMP3->sampleRate = pMP3->mp3FrameSampleRate;
-
-    return DRMP3_TRUE;
-}
-
-DRMP3_API drmp3_bool32 drmp3_init(drmp3* pMP3, drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pMP3 == NULL || onRead == NULL) {
-        return DRMP3_FALSE;
-    }
-
-    DRMP3_ZERO_OBJECT(pMP3);
-    return drmp3_init_internal(pMP3, onRead, onSeek, pUserData, pAllocationCallbacks);
-}
-
-
-static size_t drmp3__on_read_memory(void* pUserData, void* pBufferOut, size_t bytesToRead)
-{
-    drmp3* pMP3 = (drmp3*)pUserData;
-    size_t bytesRemaining;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(pMP3->memory.dataSize >= pMP3->memory.currentReadPos);
-
-    bytesRemaining = pMP3->memory.dataSize - pMP3->memory.currentReadPos;
-    if (bytesToRead > bytesRemaining) {
-        bytesToRead = bytesRemaining;
-    }
-
-    if (bytesToRead > 0) {
-        DRMP3_COPY_MEMORY(pBufferOut, pMP3->memory.pData + pMP3->memory.currentReadPos, bytesToRead);
-        pMP3->memory.currentReadPos += bytesToRead;
-    }
-
-    return bytesToRead;
-}
-
-static drmp3_bool32 drmp3__on_seek_memory(void* pUserData, int byteOffset, drmp3_seek_origin origin)
-{
-    drmp3* pMP3 = (drmp3*)pUserData;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-
-    if (origin == drmp3_seek_origin_current) {
-        if (byteOffset > 0) {
-            if (pMP3->memory.currentReadPos + byteOffset > pMP3->memory.dataSize) {
-                byteOffset = (int)(pMP3->memory.dataSize - pMP3->memory.currentReadPos);  /* Trying to seek too far forward. */
-            }
-        } else {
-            if (pMP3->memory.currentReadPos < (size_t)-byteOffset) {
-                byteOffset = -(int)pMP3->memory.currentReadPos;  /* Trying to seek too far backwards. */
-            }
-        }
-
-        /* This will never underflow thanks to the clamps above. */
-        pMP3->memory.currentReadPos += byteOffset;
-    } else {
-        if ((drmp3_uint32)byteOffset <= pMP3->memory.dataSize) {
-            pMP3->memory.currentReadPos = byteOffset;
-        } else {
-            pMP3->memory.currentReadPos = pMP3->memory.dataSize;  /* Trying to seek too far forward. */
-        }
-    }
-
-    return DRMP3_TRUE;
-}
-
-DRMP3_API drmp3_bool32 drmp3_init_memory(drmp3* pMP3, const void* pData, size_t dataSize, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pMP3 == NULL) {
-        return DRMP3_FALSE;
-    }
-
-    DRMP3_ZERO_OBJECT(pMP3);
-
-    if (pData == NULL || dataSize == 0) {
-        return DRMP3_FALSE;
-    }
-
-    pMP3->memory.pData = (const drmp3_uint8*)pData;
-    pMP3->memory.dataSize = dataSize;
-    pMP3->memory.currentReadPos = 0;
-
-    return drmp3_init_internal(pMP3, drmp3__on_read_memory, drmp3__on_seek_memory, pMP3, pAllocationCallbacks);
-}
-
-
-#ifndef DR_MP3_NO_STDIO
-#include <stdio.h>
-#include <wchar.h>      /* For wcslen(), wcsrtombs() */
-
-/* Errno */
-/* drmp3_result_from_errno() is only used inside DR_MP3_NO_STDIO for now. Move this out if it's ever used elsewhere. */
-#include <errno.h>
-static drmp3_result drmp3_result_from_errno(int e)
-{
-    switch (e)
-    {
-        case 0: return DRMP3_SUCCESS;
-    #ifdef EPERM
-        case EPERM: return DRMP3_INVALID_OPERATION;
-    #endif
-    #ifdef ENOENT
-        case ENOENT: return DRMP3_DOES_NOT_EXIST;
-    #endif
-    #ifdef ESRCH
-        case ESRCH: return DRMP3_DOES_NOT_EXIST;
-    #endif
-    #ifdef EINTR
-        case EINTR: return DRMP3_INTERRUPT;
-    #endif
-    #ifdef EIO
-        case EIO: return DRMP3_IO_ERROR;
-    #endif
-    #ifdef ENXIO
-        case ENXIO: return DRMP3_DOES_NOT_EXIST;
-    #endif
-    #ifdef E2BIG
-        case E2BIG: return DRMP3_INVALID_ARGS;
-    #endif
-    #ifdef ENOEXEC
-        case ENOEXEC: return DRMP3_INVALID_FILE;
-    #endif
-    #ifdef EBADF
-        case EBADF: return DRMP3_INVALID_FILE;
-    #endif
-    #ifdef ECHILD
-        case ECHILD: return DRMP3_ERROR;
-    #endif
-    #ifdef EAGAIN
-        case EAGAIN: return DRMP3_UNAVAILABLE;
-    #endif
-    #ifdef ENOMEM
-        case ENOMEM: return DRMP3_OUT_OF_MEMORY;
-    #endif
-    #ifdef EACCES
-        case EACCES: return DRMP3_ACCESS_DENIED;
-    #endif
-    #ifdef EFAULT
-        case EFAULT: return DRMP3_BAD_ADDRESS;
-    #endif
-    #ifdef ENOTBLK
-        case ENOTBLK: return DRMP3_ERROR;
-    #endif
-    #ifdef EBUSY
-        case EBUSY: return DRMP3_BUSY;
-    #endif
-    #ifdef EEXIST
-        case EEXIST: return DRMP3_ALREADY_EXISTS;
-    #endif
-    #ifdef EXDEV
-        case EXDEV: return DRMP3_ERROR;
-    #endif
-    #ifdef ENODEV
-        case ENODEV: return DRMP3_DOES_NOT_EXIST;
-    #endif
-    #ifdef ENOTDIR
-        case ENOTDIR: return DRMP3_NOT_DIRECTORY;
-    #endif
-    #ifdef EISDIR
-        case EISDIR: return DRMP3_IS_DIRECTORY;
-    #endif
-    #ifdef EINVAL
-        case EINVAL: return DRMP3_INVALID_ARGS;
-    #endif
-    #ifdef ENFILE
-        case ENFILE: return DRMP3_TOO_MANY_OPEN_FILES;
-    #endif
-    #ifdef EMFILE
-        case EMFILE: return DRMP3_TOO_MANY_OPEN_FILES;
-    #endif
-    #ifdef ENOTTY
-        case ENOTTY: return DRMP3_INVALID_OPERATION;
-    #endif
-    #ifdef ETXTBSY
-        case ETXTBSY: return DRMP3_BUSY;
-    #endif
-    #ifdef EFBIG
-        case EFBIG: return DRMP3_TOO_BIG;
-    #endif
-    #ifdef ENOSPC
-        case ENOSPC: return DRMP3_NO_SPACE;
-    #endif
-    #ifdef ESPIPE
-        case ESPIPE: return DRMP3_BAD_SEEK;
-    #endif
-    #ifdef EROFS
-        case EROFS: return DRMP3_ACCESS_DENIED;
-    #endif
-    #ifdef EMLINK
-        case EMLINK: return DRMP3_TOO_MANY_LINKS;
-    #endif
-    #ifdef EPIPE
-        case EPIPE: return DRMP3_BAD_PIPE;
-    #endif
-    #ifdef EDOM
-        case EDOM: return DRMP3_OUT_OF_RANGE;
-    #endif
-    #ifdef ERANGE
-        case ERANGE: return DRMP3_OUT_OF_RANGE;
-    #endif
-    #ifdef EDEADLK
-        case EDEADLK: return DRMP3_DEADLOCK;
-    #endif
-    #ifdef ENAMETOOLONG
-        case ENAMETOOLONG: return DRMP3_PATH_TOO_LONG;
-    #endif
-    #ifdef ENOLCK
-        case ENOLCK: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOSYS
-        case ENOSYS: return DRMP3_NOT_IMPLEMENTED;
-    #endif
-    #ifdef ENOTEMPTY
-        case ENOTEMPTY: return DRMP3_DIRECTORY_NOT_EMPTY;
-    #endif
-    #ifdef ELOOP
-        case ELOOP: return DRMP3_TOO_MANY_LINKS;
-    #endif
-    #ifdef ENOMSG
-        case ENOMSG: return DRMP3_NO_MESSAGE;
-    #endif
-    #ifdef EIDRM
-        case EIDRM: return DRMP3_ERROR;
-    #endif
-    #ifdef ECHRNG
-        case ECHRNG: return DRMP3_ERROR;
-    #endif
-    #ifdef EL2NSYNC
-        case EL2NSYNC: return DRMP3_ERROR;
-    #endif
-    #ifdef EL3HLT
-        case EL3HLT: return DRMP3_ERROR;
-    #endif
-    #ifdef EL3RST
-        case EL3RST: return DRMP3_ERROR;
-    #endif
-    #ifdef ELNRNG
-        case ELNRNG: return DRMP3_OUT_OF_RANGE;
-    #endif
-    #ifdef EUNATCH
-        case EUNATCH: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOCSI
-        case ENOCSI: return DRMP3_ERROR;
-    #endif
-    #ifdef EL2HLT
-        case EL2HLT: return DRMP3_ERROR;
-    #endif
-    #ifdef EBADE
-        case EBADE: return DRMP3_ERROR;
-    #endif
-    #ifdef EBADR
-        case EBADR: return DRMP3_ERROR;
-    #endif
-    #ifdef EXFULL
-        case EXFULL: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOANO
-        case ENOANO: return DRMP3_ERROR;
-    #endif
-    #ifdef EBADRQC
-        case EBADRQC: return DRMP3_ERROR;
-    #endif
-    #ifdef EBADSLT
-        case EBADSLT: return DRMP3_ERROR;
-    #endif
-    #ifdef EBFONT
-        case EBFONT: return DRMP3_INVALID_FILE;
-    #endif
-    #ifdef ENOSTR
-        case ENOSTR: return DRMP3_ERROR;
-    #endif
-    #ifdef ENODATA
-        case ENODATA: return DRMP3_NO_DATA_AVAILABLE;
-    #endif
-    #ifdef ETIME
-        case ETIME: return DRMP3_TIMEOUT;
-    #endif
-    #ifdef ENOSR
-        case ENOSR: return DRMP3_NO_DATA_AVAILABLE;
-    #endif
-    #ifdef ENONET
-        case ENONET: return DRMP3_NO_NETWORK;
-    #endif
-    #ifdef ENOPKG
-        case ENOPKG: return DRMP3_ERROR;
-    #endif
-    #ifdef EREMOTE
-        case EREMOTE: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOLINK
-        case ENOLINK: return DRMP3_ERROR;
-    #endif
-    #ifdef EADV
-        case EADV: return DRMP3_ERROR;
-    #endif
-    #ifdef ESRMNT
-        case ESRMNT: return DRMP3_ERROR;
-    #endif
-    #ifdef ECOMM
-        case ECOMM: return DRMP3_ERROR;
-    #endif
-    #ifdef EPROTO
-        case EPROTO: return DRMP3_ERROR;
-    #endif
-    #ifdef EMULTIHOP
-        case EMULTIHOP: return DRMP3_ERROR;
-    #endif
-    #ifdef EDOTDOT
-        case EDOTDOT: return DRMP3_ERROR;
-    #endif
-    #ifdef EBADMSG
-        case EBADMSG: return DRMP3_BAD_MESSAGE;
-    #endif
-    #ifdef EOVERFLOW
-        case EOVERFLOW: return DRMP3_TOO_BIG;
-    #endif
-    #ifdef ENOTUNIQ
-        case ENOTUNIQ: return DRMP3_NOT_UNIQUE;
-    #endif
-    #ifdef EBADFD
-        case EBADFD: return DRMP3_ERROR;
-    #endif
-    #ifdef EREMCHG
-        case EREMCHG: return DRMP3_ERROR;
-    #endif
-    #ifdef ELIBACC
-        case ELIBACC: return DRMP3_ACCESS_DENIED;
-    #endif
-    #ifdef ELIBBAD
-        case ELIBBAD: return DRMP3_INVALID_FILE;
-    #endif
-    #ifdef ELIBSCN
-        case ELIBSCN: return DRMP3_INVALID_FILE;
-    #endif
-    #ifdef ELIBMAX
-        case ELIBMAX: return DRMP3_ERROR;
-    #endif
-    #ifdef ELIBEXEC
-        case ELIBEXEC: return DRMP3_ERROR;
-    #endif
-    #ifdef EILSEQ
-        case EILSEQ: return DRMP3_INVALID_DATA;
-    #endif
-    #ifdef ERESTART
-        case ERESTART: return DRMP3_ERROR;
-    #endif
-    #ifdef ESTRPIPE
-        case ESTRPIPE: return DRMP3_ERROR;
-    #endif
-    #ifdef EUSERS
-        case EUSERS: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOTSOCK
-        case ENOTSOCK: return DRMP3_NOT_SOCKET;
-    #endif
-    #ifdef EDESTADDRREQ
-        case EDESTADDRREQ: return DRMP3_NO_ADDRESS;
-    #endif
-    #ifdef EMSGSIZE
-        case EMSGSIZE: return DRMP3_TOO_BIG;
-    #endif
-    #ifdef EPROTOTYPE
-        case EPROTOTYPE: return DRMP3_BAD_PROTOCOL;
-    #endif
-    #ifdef ENOPROTOOPT
-        case ENOPROTOOPT: return DRMP3_PROTOCOL_UNAVAILABLE;
-    #endif
-    #ifdef EPROTONOSUPPORT
-        case EPROTONOSUPPORT: return DRMP3_PROTOCOL_NOT_SUPPORTED;
-    #endif
-    #ifdef ESOCKTNOSUPPORT
-        case ESOCKTNOSUPPORT: return DRMP3_SOCKET_NOT_SUPPORTED;
-    #endif
-    #ifdef EOPNOTSUPP
-        case EOPNOTSUPP: return DRMP3_INVALID_OPERATION;
-    #endif
-    #ifdef EPFNOSUPPORT
-        case EPFNOSUPPORT: return DRMP3_PROTOCOL_FAMILY_NOT_SUPPORTED;
-    #endif
-    #ifdef EAFNOSUPPORT
-        case EAFNOSUPPORT: return DRMP3_ADDRESS_FAMILY_NOT_SUPPORTED;
-    #endif
-    #ifdef EADDRINUSE
-        case EADDRINUSE: return DRMP3_ALREADY_IN_USE;
-    #endif
-    #ifdef EADDRNOTAVAIL
-        case EADDRNOTAVAIL: return DRMP3_ERROR;
-    #endif
-    #ifdef ENETDOWN
-        case ENETDOWN: return DRMP3_NO_NETWORK;
-    #endif
-    #ifdef ENETUNREACH
-        case ENETUNREACH: return DRMP3_NO_NETWORK;
-    #endif
-    #ifdef ENETRESET
-        case ENETRESET: return DRMP3_NO_NETWORK;
-    #endif
-    #ifdef ECONNABORTED
-        case ECONNABORTED: return DRMP3_NO_NETWORK;
-    #endif
-    #ifdef ECONNRESET
-        case ECONNRESET: return DRMP3_CONNECTION_RESET;
-    #endif
-    #ifdef ENOBUFS
-        case ENOBUFS: return DRMP3_NO_SPACE;
-    #endif
-    #ifdef EISCONN
-        case EISCONN: return DRMP3_ALREADY_CONNECTED;
-    #endif
-    #ifdef ENOTCONN
-        case ENOTCONN: return DRMP3_NOT_CONNECTED;
-    #endif
-    #ifdef ESHUTDOWN
-        case ESHUTDOWN: return DRMP3_ERROR;
-    #endif
-    #ifdef ETOOMANYREFS
-        case ETOOMANYREFS: return DRMP3_ERROR;
-    #endif
-    #ifdef ETIMEDOUT
-        case ETIMEDOUT: return DRMP3_TIMEOUT;
-    #endif
-    #ifdef ECONNREFUSED
-        case ECONNREFUSED: return DRMP3_CONNECTION_REFUSED;
-    #endif
-    #ifdef EHOSTDOWN
-        case EHOSTDOWN: return DRMP3_NO_HOST;
-    #endif
-    #ifdef EHOSTUNREACH
-        case EHOSTUNREACH: return DRMP3_NO_HOST;
-    #endif
-    #ifdef EALREADY
-        case EALREADY: return DRMP3_IN_PROGRESS;
-    #endif
-    #ifdef EINPROGRESS
-        case EINPROGRESS: return DRMP3_IN_PROGRESS;
-    #endif
-    #ifdef ESTALE
-        case ESTALE: return DRMP3_INVALID_FILE;
-    #endif
-    #ifdef EUCLEAN
-        case EUCLEAN: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOTNAM
-        case ENOTNAM: return DRMP3_ERROR;
-    #endif
-    #ifdef ENAVAIL
-        case ENAVAIL: return DRMP3_ERROR;
-    #endif
-    #ifdef EISNAM
-        case EISNAM: return DRMP3_ERROR;
-    #endif
-    #ifdef EREMOTEIO
-        case EREMOTEIO: return DRMP3_IO_ERROR;
-    #endif
-    #ifdef EDQUOT
-        case EDQUOT: return DRMP3_NO_SPACE;
-    #endif
-    #ifdef ENOMEDIUM
-        case ENOMEDIUM: return DRMP3_DOES_NOT_EXIST;
-    #endif
-    #ifdef EMEDIUMTYPE
-        case EMEDIUMTYPE: return DRMP3_ERROR;
-    #endif
-    #ifdef ECANCELED
-        case ECANCELED: return DRMP3_CANCELLED;
-    #endif
-    #ifdef ENOKEY
-        case ENOKEY: return DRMP3_ERROR;
-    #endif
-    #ifdef EKEYEXPIRED
-        case EKEYEXPIRED: return DRMP3_ERROR;
-    #endif
-    #ifdef EKEYREVOKED
-        case EKEYREVOKED: return DRMP3_ERROR;
-    #endif
-    #ifdef EKEYREJECTED
-        case EKEYREJECTED: return DRMP3_ERROR;
-    #endif
-    #ifdef EOWNERDEAD
-        case EOWNERDEAD: return DRMP3_ERROR;
-    #endif
-    #ifdef ENOTRECOVERABLE
-        case ENOTRECOVERABLE: return DRMP3_ERROR;
-    #endif
-    #ifdef ERFKILL
-        case ERFKILL: return DRMP3_ERROR;
-    #endif
-    #ifdef EHWPOISON
-        case EHWPOISON: return DRMP3_ERROR;
-    #endif
-        default: return DRMP3_ERROR;
-    }
-}
-/* End Errno */
-
-/* fopen */
-static drmp3_result drmp3_fopen(FILE** ppFile, const char* pFilePath, const char* pOpenMode)
-{
-#if defined(_MSC_VER) && _MSC_VER >= 1400
-    errno_t err;
-#endif
-
-    if (ppFile != NULL) {
-        *ppFile = NULL;  /* Safety. */
-    }
-
-    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
-        return DRMP3_INVALID_ARGS;
-    }
-
-#if defined(_MSC_VER) && _MSC_VER >= 1400
-    err = fopen_s(ppFile, pFilePath, pOpenMode);
-    if (err != 0) {
-        return drmp3_result_from_errno(err);
-    }
-#else
-#if defined(_WIN32) || defined(__APPLE__)
-    *ppFile = fopen(pFilePath, pOpenMode);
-#else
-    #if defined(_FILE_OFFSET_BITS) && _FILE_OFFSET_BITS == 64 && defined(_LARGEFILE64_SOURCE)
-        *ppFile = fopen64(pFilePath, pOpenMode);
-    #else
-        *ppFile = fopen(pFilePath, pOpenMode);
-    #endif
-#endif
-    if (*ppFile == NULL) {
-        drmp3_result result = drmp3_result_from_errno(errno);
-        if (result == DRMP3_SUCCESS) {
-            result = DRMP3_ERROR;   /* Just a safety check to make sure we never ever return success when pFile == NULL. */
-        }
-
-        return result;
-    }
-#endif
-
-    return DRMP3_SUCCESS;
-}
-
-/*
-_wfopen() isn't always available in all compilation environments.
-
-    * Windows only.
-    * MSVC seems to support it universally as far back as VC6 from what I can tell (haven't checked further back).
-    * MinGW-64 (both 32- and 64-bit) seems to support it.
-    * MinGW wraps it in !defined(__STRICT_ANSI__).
-    * OpenWatcom wraps it in !defined(_NO_EXT_KEYS).
-
-This can be reviewed as compatibility issues arise. The preference is to use _wfopen_s() and _wfopen() as opposed to the wcsrtombs()
-fallback, so if you notice your compiler not detecting this properly I'm happy to look at adding support.
-*/
-#if defined(_WIN32)
-    #if defined(_MSC_VER) || defined(__MINGW64__) || (!defined(__STRICT_ANSI__) && !defined(_NO_EXT_KEYS))
-        #define DRMP3_HAS_WFOPEN
-    #endif
-#endif
-
-static drmp3_result drmp3_wfopen(FILE** ppFile, const wchar_t* pFilePath, const wchar_t* pOpenMode, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (ppFile != NULL) {
-        *ppFile = NULL;  /* Safety. */
-    }
-
-    if (pFilePath == NULL || pOpenMode == NULL || ppFile == NULL) {
-        return DRMP3_INVALID_ARGS;
-    }
-
-#if defined(DRMP3_HAS_WFOPEN)
-    {
-        /* Use _wfopen() on Windows. */
-    #if defined(_MSC_VER) && _MSC_VER >= 1400
-        errno_t err = _wfopen_s(ppFile, pFilePath, pOpenMode);
-        if (err != 0) {
-            return drmp3_result_from_errno(err);
-        }
-    #else
-        *ppFile = _wfopen(pFilePath, pOpenMode);
-        if (*ppFile == NULL) {
-            return drmp3_result_from_errno(errno);
-        }
-    #endif
-        (void)pAllocationCallbacks;
-    }
-#else
-    /*
-    Use fopen() on anything other than Windows. Requires a conversion. This is annoying because
-    fopen() is locale specific. The only real way I can think of to do this is with wcsrtombs(). Note
-    that wcstombs() is apparently not thread-safe because it uses a static global mbstate_t object for
-    maintaining state. I've checked this with -std=c89 and it works, but if somebody gets a compiler
-    error I'll look into improving compatibility.
-    */
-
-	/*
-	Some compilers don't support wchar_t or wcsrtombs() which we're using below. In this case we just
-	need to abort with an error. If you encounter a compiler lacking such support, add it to this list
-	and submit a bug report and it'll be added to the library upstream.
-	*/
-	#if defined(__DJGPP__)
-	{
-		/* Nothing to do here. This will fall through to the error check below. */
-	}
-	#else
-    {
-        mbstate_t mbs;
-        size_t lenMB;
-        const wchar_t* pFilePathTemp = pFilePath;
-        char* pFilePathMB = NULL;
-        char pOpenModeMB[32] = {0};
-
-        /* Get the length first. */
-        DRMP3_ZERO_OBJECT(&mbs);
-        lenMB = wcsrtombs(NULL, &pFilePathTemp, 0, &mbs);
-        if (lenMB == (size_t)-1) {
-            return drmp3_result_from_errno(errno);
-        }
-
-        pFilePathMB = (char*)drmp3__malloc_from_callbacks(lenMB + 1, pAllocationCallbacks);
-        if (pFilePathMB == NULL) {
-            return DRMP3_OUT_OF_MEMORY;
-        }
-
-        pFilePathTemp = pFilePath;
-        DRMP3_ZERO_OBJECT(&mbs);
-        wcsrtombs(pFilePathMB, &pFilePathTemp, lenMB + 1, &mbs);
-
-        /* The open mode should always consist of ASCII characters so we should be able to do a trivial conversion. */
-        {
-            size_t i = 0;
-            for (;;) {
-                if (pOpenMode[i] == 0) {
-                    pOpenModeMB[i] = '\0';
-                    break;
-                }
-
-                pOpenModeMB[i] = (char)pOpenMode[i];
-                i += 1;
-            }
-        }
-
-        *ppFile = fopen(pFilePathMB, pOpenModeMB);
-
-        drmp3__free_from_callbacks(pFilePathMB, pAllocationCallbacks);
-    }
-	#endif
-
-    if (*ppFile == NULL) {
-        return DRMP3_ERROR;
-    }
-#endif
-
-    return DRMP3_SUCCESS;
-}
-/* End fopen */
-
-
-static size_t drmp3__on_read_stdio(void* pUserData, void* pBufferOut, size_t bytesToRead)
-{
-    return fread(pBufferOut, 1, bytesToRead, (FILE*)pUserData);
-}
-
-static drmp3_bool32 drmp3__on_seek_stdio(void* pUserData, int offset, drmp3_seek_origin origin)
-{
-    return fseek((FILE*)pUserData, offset, (origin == drmp3_seek_origin_current) ? SEEK_CUR : SEEK_SET) == 0;
-}
-
-DRMP3_API drmp3_bool32 drmp3_init_file(drmp3* pMP3, const char* pFilePath, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3_bool32 result;
-    FILE* pFile;
-
-    if (drmp3_fopen(&pFile, pFilePath, "rb") != DRMP3_SUCCESS) {
-        return DRMP3_FALSE;
-    }
-
-    result = drmp3_init(pMP3, drmp3__on_read_stdio, drmp3__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
-    if (result != DRMP3_TRUE) {
-        fclose(pFile);
-        return result;
-    }
-
-    return DRMP3_TRUE;
-}
-
-DRMP3_API drmp3_bool32 drmp3_init_file_w(drmp3* pMP3, const wchar_t* pFilePath, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3_bool32 result;
-    FILE* pFile;
-
-    if (drmp3_wfopen(&pFile, pFilePath, L"rb", pAllocationCallbacks) != DRMP3_SUCCESS) {
-        return DRMP3_FALSE;
-    }
-
-    result = drmp3_init(pMP3, drmp3__on_read_stdio, drmp3__on_seek_stdio, (void*)pFile, pAllocationCallbacks);
-    if (result != DRMP3_TRUE) {
-        fclose(pFile);
-        return result;
-    }
-
-    return DRMP3_TRUE;
-}
-#endif
-
-DRMP3_API void drmp3_uninit(drmp3* pMP3)
-{
-    if (pMP3 == NULL) {
-        return;
-    }
-    
-#ifndef DR_MP3_NO_STDIO
-    if (pMP3->onRead == drmp3__on_read_stdio) {
-        FILE* pFile = (FILE*)pMP3->pUserData;
-        if (pFile != NULL) {
-            fclose(pFile);
-            pMP3->pUserData = NULL; /* Make sure the file handle is cleared to NULL so we don't attempt to close it a second time. */
-        }
-    }
-#endif
-
-    drmp3__free_from_callbacks(pMP3->pData, &pMP3->allocationCallbacks);
-}
-
-#if defined(DR_MP3_FLOAT_OUTPUT)
-static void drmp3_f32_to_s16(drmp3_int16* dst, const float* src, drmp3_uint64 sampleCount)
-{
-    drmp3_uint64 i;
-    drmp3_uint64 i4;
-    drmp3_uint64 sampleCount4;
-
-    /* Unrolled. */
-    i = 0;
-    sampleCount4 = sampleCount >> 2;
-    for (i4 = 0; i4 < sampleCount4; i4 += 1) {
-        float x0 = src[i+0];
-        float x1 = src[i+1];
-        float x2 = src[i+2];
-        float x3 = src[i+3];
-
-        x0 = ((x0 < -1) ? -1 : ((x0 > 1) ? 1 : x0));
-        x1 = ((x1 < -1) ? -1 : ((x1 > 1) ? 1 : x1));
-        x2 = ((x2 < -1) ? -1 : ((x2 > 1) ? 1 : x2));
-        x3 = ((x3 < -1) ? -1 : ((x3 > 1) ? 1 : x3));
-
-        x0 = x0 * 32767.0f;
-        x1 = x1 * 32767.0f;
-        x2 = x2 * 32767.0f;
-        x3 = x3 * 32767.0f;
-
-        dst[i+0] = (drmp3_int16)x0;
-        dst[i+1] = (drmp3_int16)x1;
-        dst[i+2] = (drmp3_int16)x2;
-        dst[i+3] = (drmp3_int16)x3;
-
-        i += 4;
-    }
-
-    /* Leftover. */
-    for (; i < sampleCount; i += 1) {
-        float x = src[i];
-        x = ((x < -1) ? -1 : ((x > 1) ? 1 : x));    /* clip */
-        x = x * 32767.0f;                           /* -1..1 to -32767..32767 */
-
-        dst[i] = (drmp3_int16)x;
-    }
-}
-#endif
-
-#if !defined(DR_MP3_FLOAT_OUTPUT)
-static void drmp3_s16_to_f32(float* dst, const drmp3_int16* src, drmp3_uint64 sampleCount)
-{
-    drmp3_uint64 i;
-    for (i = 0; i < sampleCount; i += 1) {
-        float x = (float)src[i];
-        x = x * 0.000030517578125f;         /* -32768..32767 to -1..0.999969482421875 */
-        dst[i] = x;
-    }
-}
-#endif
-
-
-static drmp3_uint64 drmp3_read_pcm_frames_raw(drmp3* pMP3, drmp3_uint64 framesToRead, void* pBufferOut)
-{
-    drmp3_uint64 totalFramesRead = 0;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(pMP3->onRead != NULL);
-
-    while (framesToRead > 0) {
-        drmp3_uint32 framesToConsume = (drmp3_uint32)DRMP3_MIN(pMP3->pcmFramesRemainingInMP3Frame, framesToRead);
-        if (pBufferOut != NULL) {
-        #if defined(DR_MP3_FLOAT_OUTPUT)
-            /* f32 */
-            float* pFramesOutF32 = (float*)DRMP3_OFFSET_PTR(pBufferOut,          sizeof(float) * totalFramesRead                   * pMP3->channels);
-            float* pFramesInF32  = (float*)DRMP3_OFFSET_PTR(&pMP3->pcmFrames[0], sizeof(float) * pMP3->pcmFramesConsumedInMP3Frame * pMP3->mp3FrameChannels);
-            DRMP3_COPY_MEMORY(pFramesOutF32, pFramesInF32, sizeof(float) * framesToConsume * pMP3->channels);
-        #else
-            /* s16 */
-            drmp3_int16* pFramesOutS16 = (drmp3_int16*)DRMP3_OFFSET_PTR(pBufferOut,          sizeof(drmp3_int16) * totalFramesRead                   * pMP3->channels);
-            drmp3_int16* pFramesInS16  = (drmp3_int16*)DRMP3_OFFSET_PTR(&pMP3->pcmFrames[0], sizeof(drmp3_int16) * pMP3->pcmFramesConsumedInMP3Frame * pMP3->mp3FrameChannels);
-            DRMP3_COPY_MEMORY(pFramesOutS16, pFramesInS16, sizeof(drmp3_int16) * framesToConsume * pMP3->channels);
-        #endif
-        }
-
-        pMP3->currentPCMFrame              += framesToConsume;
-        pMP3->pcmFramesConsumedInMP3Frame  += framesToConsume;
-        pMP3->pcmFramesRemainingInMP3Frame -= framesToConsume;
-        totalFramesRead                    += framesToConsume;
-        framesToRead                       -= framesToConsume;
-
-        if (framesToRead == 0) {
-            break;
-        }
-
-        DRMP3_ASSERT(pMP3->pcmFramesRemainingInMP3Frame == 0);
-
-        /*
-        At this point we have exhausted our in-memory buffer so we need to re-fill. Note that the sample rate may have changed
-        at this point which means we'll also need to update our sample rate conversion pipeline.
-        */
-        if (drmp3_decode_next_frame(pMP3) == 0) {
-            break;
-        }
-    }
-
-    return totalFramesRead;
-}
-
-
-DRMP3_API drmp3_uint64 drmp3_read_pcm_frames_f32(drmp3* pMP3, drmp3_uint64 framesToRead, float* pBufferOut)
-{
-    if (pMP3 == NULL || pMP3->onRead == NULL) {
-        return 0;
-    }
-
-#if defined(DR_MP3_FLOAT_OUTPUT)
-    /* Fast path. No conversion required. */
-    return drmp3_read_pcm_frames_raw(pMP3, framesToRead, pBufferOut);
-#else
-    /* Slow path. Convert from s16 to f32. */
-    {
-        drmp3_int16 pTempS16[8192];
-        drmp3_uint64 totalPCMFramesRead = 0;
-
-        while (totalPCMFramesRead < framesToRead) {
-            drmp3_uint64 framesJustRead;
-            drmp3_uint64 framesRemaining = framesToRead - totalPCMFramesRead;
-            drmp3_uint64 framesToReadNow = DRMP3_COUNTOF(pTempS16) / pMP3->channels;
-            if (framesToReadNow > framesRemaining) {
-                framesToReadNow = framesRemaining;
-            }
-
-            framesJustRead = drmp3_read_pcm_frames_raw(pMP3, framesToReadNow, pTempS16);
-            if (framesJustRead == 0) {
-                break;
-            }
-
-            drmp3_s16_to_f32((float*)DRMP3_OFFSET_PTR(pBufferOut, sizeof(float) * totalPCMFramesRead * pMP3->channels), pTempS16, framesJustRead * pMP3->channels);
-            totalPCMFramesRead += framesJustRead;
-        }
-
-        return totalPCMFramesRead;
-    }
-#endif
-}
-
-DRMP3_API drmp3_uint64 drmp3_read_pcm_frames_s16(drmp3* pMP3, drmp3_uint64 framesToRead, drmp3_int16* pBufferOut)
-{
-    if (pMP3 == NULL || pMP3->onRead == NULL) {
-        return 0;
-    }
-
-#if !defined(DR_MP3_FLOAT_OUTPUT)
-    /* Fast path. No conversion required. */
-    return drmp3_read_pcm_frames_raw(pMP3, framesToRead, pBufferOut);
-#else
-    /* Slow path. Convert from f32 to s16. */
-    {
-        float pTempF32[4096];
-        drmp3_uint64 totalPCMFramesRead = 0;
-
-        while (totalPCMFramesRead < framesToRead) {
-            drmp3_uint64 framesJustRead;
-            drmp3_uint64 framesRemaining = framesToRead - totalPCMFramesRead;
-            drmp3_uint64 framesToReadNow = DRMP3_COUNTOF(pTempF32) / pMP3->channels;
-            if (framesToReadNow > framesRemaining) {
-                framesToReadNow = framesRemaining;
-            }
-
-            framesJustRead = drmp3_read_pcm_frames_raw(pMP3, framesToReadNow, pTempF32);
-            if (framesJustRead == 0) {
-                break;
-            }
-
-            drmp3_f32_to_s16((drmp3_int16*)DRMP3_OFFSET_PTR(pBufferOut, sizeof(drmp3_int16) * totalPCMFramesRead * pMP3->channels), pTempF32, framesJustRead * pMP3->channels);
-            totalPCMFramesRead += framesJustRead;
-        }
-
-        return totalPCMFramesRead;
-    }
-#endif
-}
-
-static void drmp3_reset(drmp3* pMP3)
-{
-    DRMP3_ASSERT(pMP3 != NULL);
-
-    pMP3->pcmFramesConsumedInMP3Frame = 0;
-    pMP3->pcmFramesRemainingInMP3Frame = 0;
-    pMP3->currentPCMFrame = 0;
-    pMP3->dataSize = 0;
-    pMP3->atEnd = DRMP3_FALSE;
-    drmp3dec_init(&pMP3->decoder);
-}
-
-static drmp3_bool32 drmp3_seek_to_start_of_stream(drmp3* pMP3)
-{
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(pMP3->onSeek != NULL);
-
-    /* Seek to the start of the stream to begin with. */
-    if (!drmp3__on_seek(pMP3, 0, drmp3_seek_origin_start)) {
-        return DRMP3_FALSE;
-    }
-
-    /* Clear any cached data. */
-    drmp3_reset(pMP3);
-    return DRMP3_TRUE;
-}
-
-
-static drmp3_bool32 drmp3_seek_forward_by_pcm_frames__brute_force(drmp3* pMP3, drmp3_uint64 frameOffset)
-{
-    drmp3_uint64 framesRead;
-
-    /*
-    Just using a dumb read-and-discard for now. What would be nice is to parse only the header of the MP3 frame, and then skip over leading
-    frames without spending the time doing a full decode. I cannot see an easy way to do this in minimp3, however, so it may involve some
-    kind of manual processing.
-    */
-#if defined(DR_MP3_FLOAT_OUTPUT)
-    framesRead = drmp3_read_pcm_frames_f32(pMP3, frameOffset, NULL);
-#else
-    framesRead = drmp3_read_pcm_frames_s16(pMP3, frameOffset, NULL);
-#endif
-    if (framesRead != frameOffset) {
-        return DRMP3_FALSE;
-    }
-
-    return DRMP3_TRUE;
-}
-
-static drmp3_bool32 drmp3_seek_to_pcm_frame__brute_force(drmp3* pMP3, drmp3_uint64 frameIndex)
-{
-    DRMP3_ASSERT(pMP3 != NULL);
-
-    if (frameIndex == pMP3->currentPCMFrame) {
-        return DRMP3_TRUE;
-    }
-
-    /*
-    If we're moving forward we just read from where we're at. Otherwise we need to move back to the start of
-    the stream and read from the beginning.
-    */
-    if (frameIndex < pMP3->currentPCMFrame) {
-        /* Moving backward. Move to the start of the stream and then move forward. */
-        if (!drmp3_seek_to_start_of_stream(pMP3)) {
-            return DRMP3_FALSE;
-        }
-    }
-
-    DRMP3_ASSERT(frameIndex >= pMP3->currentPCMFrame);
-    return drmp3_seek_forward_by_pcm_frames__brute_force(pMP3, (frameIndex - pMP3->currentPCMFrame));
-}
-
-static drmp3_bool32 drmp3_find_closest_seek_point(drmp3* pMP3, drmp3_uint64 frameIndex, drmp3_uint32* pSeekPointIndex)
-{
-    drmp3_uint32 iSeekPoint;
-
-    DRMP3_ASSERT(pSeekPointIndex != NULL);
-
-    *pSeekPointIndex = 0;
-
-    if (frameIndex < pMP3->pSeekPoints[0].pcmFrameIndex) {
-        return DRMP3_FALSE;
-    }
-
-    /* Linear search for simplicity to begin with while I'm getting this thing working. Once it's all working change this to a binary search. */
-    for (iSeekPoint = 0; iSeekPoint < pMP3->seekPointCount; ++iSeekPoint) {
-        if (pMP3->pSeekPoints[iSeekPoint].pcmFrameIndex > frameIndex) {
-            break;  /* Found it. */
-        }
-
-        *pSeekPointIndex = iSeekPoint;
-    }
-
-    return DRMP3_TRUE;
-}
-
-static drmp3_bool32 drmp3_seek_to_pcm_frame__seek_table(drmp3* pMP3, drmp3_uint64 frameIndex)
-{
-    drmp3_seek_point seekPoint;
-    drmp3_uint32 priorSeekPointIndex;
-    drmp3_uint16 iMP3Frame;
-    drmp3_uint64 leftoverFrames;
-
-    DRMP3_ASSERT(pMP3 != NULL);
-    DRMP3_ASSERT(pMP3->pSeekPoints != NULL);
-    DRMP3_ASSERT(pMP3->seekPointCount > 0);
-
-    /* If there is no prior seekpoint it means the target PCM frame comes before the first seek point. Just assume a seekpoint at the start of the file in this case. */
-    if (drmp3_find_closest_seek_point(pMP3, frameIndex, &priorSeekPointIndex)) {
-        seekPoint = pMP3->pSeekPoints[priorSeekPointIndex];
-    } else {
-        seekPoint.seekPosInBytes     = 0;
-        seekPoint.pcmFrameIndex      = 0;
-        seekPoint.mp3FramesToDiscard = 0;
-        seekPoint.pcmFramesToDiscard = 0;
-    }
-
-    /* First thing to do is seek to the first byte of the relevant MP3 frame. */
-    if (!drmp3__on_seek_64(pMP3, seekPoint.seekPosInBytes, drmp3_seek_origin_start)) {
-        return DRMP3_FALSE; /* Failed to seek. */
-    }
-
-    /* Clear any cached data. */
-    drmp3_reset(pMP3);
-
-    /* Whole MP3 frames need to be discarded first. */
-    for (iMP3Frame = 0; iMP3Frame < seekPoint.mp3FramesToDiscard; ++iMP3Frame) {
-        drmp3_uint32 pcmFramesRead;
-        drmp3d_sample_t* pPCMFrames;
-
-        /* Pass in non-null for the last frame because we want to ensure the sample rate converter is preloaded correctly. */
-        pPCMFrames = NULL;
-        if (iMP3Frame == seekPoint.mp3FramesToDiscard-1) {
-            pPCMFrames = (drmp3d_sample_t*)pMP3->pcmFrames;
-        }
-
-        /* We first need to decode the next frame. */
-        pcmFramesRead = drmp3_decode_next_frame_ex(pMP3, pPCMFrames);
-        if (pcmFramesRead == 0) {
-            return DRMP3_FALSE;
-        }
-    }
-
-    /* We seeked to an MP3 frame in the raw stream so we need to make sure the current PCM frame is set correctly. */
-    pMP3->currentPCMFrame = seekPoint.pcmFrameIndex - seekPoint.pcmFramesToDiscard;
-
-    /*
-    Now at this point we can follow the same process as the brute force technique where we just skip over unnecessary MP3 frames and then
-    read-and-discard at least 2 whole MP3 frames.
-    */
-    leftoverFrames = frameIndex - pMP3->currentPCMFrame;
-    return drmp3_seek_forward_by_pcm_frames__brute_force(pMP3, leftoverFrames);
-}
-
-DRMP3_API drmp3_bool32 drmp3_seek_to_pcm_frame(drmp3* pMP3, drmp3_uint64 frameIndex)
-{
-    if (pMP3 == NULL || pMP3->onSeek == NULL) {
-        return DRMP3_FALSE;
-    }
-
-    if (frameIndex == 0) {
-        return drmp3_seek_to_start_of_stream(pMP3);
-    }
-
-    /* Use the seek table if we have one. */
-    if (pMP3->pSeekPoints != NULL && pMP3->seekPointCount > 0) {
-        return drmp3_seek_to_pcm_frame__seek_table(pMP3, frameIndex);
-    } else {
-        return drmp3_seek_to_pcm_frame__brute_force(pMP3, frameIndex);
-    }
-}
-
-DRMP3_API drmp3_bool32 drmp3_get_mp3_and_pcm_frame_count(drmp3* pMP3, drmp3_uint64* pMP3FrameCount, drmp3_uint64* pPCMFrameCount)
-{
-    drmp3_uint64 currentPCMFrame;
-    drmp3_uint64 totalPCMFrameCount;
-    drmp3_uint64 totalMP3FrameCount;
-
-    if (pMP3 == NULL) {
-        return DRMP3_FALSE;
-    }
-
-    /*
-    The way this works is we move back to the start of the stream, iterate over each MP3 frame and calculate the frame count based
-    on our output sample rate, then seek back to the PCM frame we were sitting on before calling this function.
-    */
-
-    /* The stream must support seeking for this to work. */
-    if (pMP3->onSeek == NULL) {
-        return DRMP3_FALSE;
-    }
-
-    /* We'll need to seek back to where we were, so grab the PCM frame we're currently sitting on so we can restore later. */
-    currentPCMFrame = pMP3->currentPCMFrame;
-    
-    if (!drmp3_seek_to_start_of_stream(pMP3)) {
-        return DRMP3_FALSE;
-    }
-
-    totalPCMFrameCount = 0;
-    totalMP3FrameCount = 0;
-
-    for (;;) {
-        drmp3_uint32 pcmFramesInCurrentMP3Frame;
-
-        pcmFramesInCurrentMP3Frame = drmp3_decode_next_frame_ex(pMP3, NULL);
-        if (pcmFramesInCurrentMP3Frame == 0) {
-            break;
-        }
-
-        totalPCMFrameCount += pcmFramesInCurrentMP3Frame;
-        totalMP3FrameCount += 1;
-    }
-
-    /* Finally, we need to seek back to where we were. */
-    if (!drmp3_seek_to_start_of_stream(pMP3)) {
-        return DRMP3_FALSE;
-    }
-
-    if (!drmp3_seek_to_pcm_frame(pMP3, currentPCMFrame)) {
-        return DRMP3_FALSE;
-    }
-
-    if (pMP3FrameCount != NULL) {
-        *pMP3FrameCount = totalMP3FrameCount;
-    }
-    if (pPCMFrameCount != NULL) {
-        *pPCMFrameCount = totalPCMFrameCount;
-    }
-
-    return DRMP3_TRUE;
-}
-
-DRMP3_API drmp3_uint64 drmp3_get_pcm_frame_count(drmp3* pMP3)
-{
-    drmp3_uint64 totalPCMFrameCount;
-    if (!drmp3_get_mp3_and_pcm_frame_count(pMP3, NULL, &totalPCMFrameCount)) {
-        return 0;
-    }
-
-    return totalPCMFrameCount;
-}
-
-DRMP3_API drmp3_uint64 drmp3_get_mp3_frame_count(drmp3* pMP3)
-{
-    drmp3_uint64 totalMP3FrameCount;
-    if (!drmp3_get_mp3_and_pcm_frame_count(pMP3, &totalMP3FrameCount, NULL)) {
-        return 0;
-    }
-
-    return totalMP3FrameCount;
-}
-
-static void drmp3__accumulate_running_pcm_frame_count(drmp3* pMP3, drmp3_uint32 pcmFrameCountIn, drmp3_uint64* pRunningPCMFrameCount, float* pRunningPCMFrameCountFractionalPart)
-{
-    float srcRatio;
-    float pcmFrameCountOutF;
-    drmp3_uint32 pcmFrameCountOut;
-
-    srcRatio = (float)pMP3->mp3FrameSampleRate / (float)pMP3->sampleRate;
-    DRMP3_ASSERT(srcRatio > 0);
-
-    pcmFrameCountOutF = *pRunningPCMFrameCountFractionalPart + (pcmFrameCountIn / srcRatio);
-    pcmFrameCountOut  = (drmp3_uint32)pcmFrameCountOutF;
-    *pRunningPCMFrameCountFractionalPart = pcmFrameCountOutF - pcmFrameCountOut;
-    *pRunningPCMFrameCount += pcmFrameCountOut;
-}
-
-typedef struct
-{
-    drmp3_uint64 bytePos;
-    drmp3_uint64 pcmFrameIndex; /* <-- After sample rate conversion. */
-} drmp3__seeking_mp3_frame_info;
-
-DRMP3_API drmp3_bool32 drmp3_calculate_seek_points(drmp3* pMP3, drmp3_uint32* pSeekPointCount, drmp3_seek_point* pSeekPoints)
-{
-    drmp3_uint32 seekPointCount;
-    drmp3_uint64 currentPCMFrame;
-    drmp3_uint64 totalMP3FrameCount;
-    drmp3_uint64 totalPCMFrameCount;
-
-    if (pMP3 == NULL || pSeekPointCount == NULL || pSeekPoints == NULL) {
-        return DRMP3_FALSE; /* Invalid args. */
-    }
-
-    seekPointCount = *pSeekPointCount;
-    if (seekPointCount == 0) {
-        return DRMP3_FALSE;  /* The client has requested no seek points. Consider this to be invalid arguments since the client has probably not intended this. */
-    }
-
-    /* We'll need to seek back to the current sample after calculating the seekpoints so we need to go ahead and grab the current location at the top. */
-    currentPCMFrame = pMP3->currentPCMFrame;
-    
-    /* We never do more than the total number of MP3 frames and we limit it to 32-bits. */
-    if (!drmp3_get_mp3_and_pcm_frame_count(pMP3, &totalMP3FrameCount, &totalPCMFrameCount)) {
-        return DRMP3_FALSE;
-    }
-
-    /* If there's less than DRMP3_SEEK_LEADING_MP3_FRAMES+1 frames we just report 1 seek point which will be the very start of the stream. */
-    if (totalMP3FrameCount < DRMP3_SEEK_LEADING_MP3_FRAMES+1) {
-        seekPointCount = 1;
-        pSeekPoints[0].seekPosInBytes     = 0;
-        pSeekPoints[0].pcmFrameIndex      = 0;
-        pSeekPoints[0].mp3FramesToDiscard = 0;
-        pSeekPoints[0].pcmFramesToDiscard = 0;
-    } else {
-        drmp3_uint64 pcmFramesBetweenSeekPoints;
-        drmp3__seeking_mp3_frame_info mp3FrameInfo[DRMP3_SEEK_LEADING_MP3_FRAMES+1];
-        drmp3_uint64 runningPCMFrameCount = 0;
-        float runningPCMFrameCountFractionalPart = 0;
-        drmp3_uint64 nextTargetPCMFrame;
-        drmp3_uint32 iMP3Frame;
-        drmp3_uint32 iSeekPoint;
-
-        if (seekPointCount > totalMP3FrameCount-1) {
-            seekPointCount = (drmp3_uint32)totalMP3FrameCount-1;
-        }
-
-        pcmFramesBetweenSeekPoints = totalPCMFrameCount / (seekPointCount+1);
-
-        /*
-        Here is where we actually calculate the seek points. We need to start by moving the start of the stream. We then enumerate over each
-        MP3 frame.
-        */
-        if (!drmp3_seek_to_start_of_stream(pMP3)) {
-            return DRMP3_FALSE;
-        }
-
-        /*
-        We need to cache the byte positions of the previous MP3 frames. As a new MP3 frame is iterated, we cycle the byte positions in this
-        array. The value in the first item in this array is the byte position that will be reported in the next seek point.
-        */
-
-        /* We need to initialize the array of MP3 byte positions for the leading MP3 frames. */
-        for (iMP3Frame = 0; iMP3Frame < DRMP3_SEEK_LEADING_MP3_FRAMES+1; ++iMP3Frame) {
-            drmp3_uint32 pcmFramesInCurrentMP3FrameIn;
-
-            /* The byte position of the next frame will be the stream's cursor position, minus whatever is sitting in the buffer. */
-            DRMP3_ASSERT(pMP3->streamCursor >= pMP3->dataSize);
-            mp3FrameInfo[iMP3Frame].bytePos       = pMP3->streamCursor - pMP3->dataSize;
-            mp3FrameInfo[iMP3Frame].pcmFrameIndex = runningPCMFrameCount;
-
-            /* We need to get information about this frame so we can know how many samples it contained. */
-            pcmFramesInCurrentMP3FrameIn = drmp3_decode_next_frame_ex(pMP3, NULL);
-            if (pcmFramesInCurrentMP3FrameIn == 0) {
-                return DRMP3_FALSE; /* This should never happen. */
-            }
-
-            drmp3__accumulate_running_pcm_frame_count(pMP3, pcmFramesInCurrentMP3FrameIn, &runningPCMFrameCount, &runningPCMFrameCountFractionalPart);
-        }
-
-        /*
-        At this point we will have extracted the byte positions of the leading MP3 frames. We can now start iterating over each seek point and
-        calculate them.
-        */
-        nextTargetPCMFrame = 0;
-        for (iSeekPoint = 0; iSeekPoint < seekPointCount; ++iSeekPoint) {
-            nextTargetPCMFrame += pcmFramesBetweenSeekPoints;
-
-            for (;;) {
-                if (nextTargetPCMFrame < runningPCMFrameCount) {
-                    /* The next seek point is in the current MP3 frame. */
-                    pSeekPoints[iSeekPoint].seekPosInBytes     = mp3FrameInfo[0].bytePos;
-                    pSeekPoints[iSeekPoint].pcmFrameIndex      = nextTargetPCMFrame;
-                    pSeekPoints[iSeekPoint].mp3FramesToDiscard = DRMP3_SEEK_LEADING_MP3_FRAMES;
-                    pSeekPoints[iSeekPoint].pcmFramesToDiscard = (drmp3_uint16)(nextTargetPCMFrame - mp3FrameInfo[DRMP3_SEEK_LEADING_MP3_FRAMES-1].pcmFrameIndex);
-                    break;
-                } else {
-                    size_t i;
-                    drmp3_uint32 pcmFramesInCurrentMP3FrameIn;
-
-                    /*
-                    The next seek point is not in the current MP3 frame, so continue on to the next one. The first thing to do is cycle the cached
-                    MP3 frame info.
-                    */
-                    for (i = 0; i < DRMP3_COUNTOF(mp3FrameInfo)-1; ++i) {
-                        mp3FrameInfo[i] = mp3FrameInfo[i+1];
-                    }
-
-                    /* Cache previous MP3 frame info. */
-                    mp3FrameInfo[DRMP3_COUNTOF(mp3FrameInfo)-1].bytePos       = pMP3->streamCursor - pMP3->dataSize;
-                    mp3FrameInfo[DRMP3_COUNTOF(mp3FrameInfo)-1].pcmFrameIndex = runningPCMFrameCount;
-
-                    /*
-                    Go to the next MP3 frame. This shouldn't ever fail, but just in case it does we just set the seek point and break. If it happens, it
-                    should only ever do it for the last seek point.
-                    */
-                    pcmFramesInCurrentMP3FrameIn = drmp3_decode_next_frame_ex(pMP3, NULL);
-                    if (pcmFramesInCurrentMP3FrameIn == 0) {
-                        pSeekPoints[iSeekPoint].seekPosInBytes     = mp3FrameInfo[0].bytePos;
-                        pSeekPoints[iSeekPoint].pcmFrameIndex      = nextTargetPCMFrame;
-                        pSeekPoints[iSeekPoint].mp3FramesToDiscard = DRMP3_SEEK_LEADING_MP3_FRAMES;
-                        pSeekPoints[iSeekPoint].pcmFramesToDiscard = (drmp3_uint16)(nextTargetPCMFrame - mp3FrameInfo[DRMP3_SEEK_LEADING_MP3_FRAMES-1].pcmFrameIndex);
-                        break;
-                    }
-
-                    drmp3__accumulate_running_pcm_frame_count(pMP3, pcmFramesInCurrentMP3FrameIn, &runningPCMFrameCount, &runningPCMFrameCountFractionalPart);
-                }
-            }
-        }
-
-        /* Finally, we need to seek back to where we were. */
-        if (!drmp3_seek_to_start_of_stream(pMP3)) {
-            return DRMP3_FALSE;
-        }
-        if (!drmp3_seek_to_pcm_frame(pMP3, currentPCMFrame)) {
-            return DRMP3_FALSE;
-        }
-    }
-
-    *pSeekPointCount = seekPointCount;
-    return DRMP3_TRUE;
-}
-
-DRMP3_API drmp3_bool32 drmp3_bind_seek_table(drmp3* pMP3, drmp3_uint32 seekPointCount, drmp3_seek_point* pSeekPoints)
-{
-    if (pMP3 == NULL) {
-        return DRMP3_FALSE;
-    }
-
-    if (seekPointCount == 0 || pSeekPoints == NULL) {
-        /* Unbinding. */
-        pMP3->seekPointCount = 0;
-        pMP3->pSeekPoints = NULL;
-    } else {
-        /* Binding. */
-        pMP3->seekPointCount = seekPointCount;
-        pMP3->pSeekPoints = pSeekPoints;
-    }
-
-    return DRMP3_TRUE;
-}
-
-
-static float* drmp3__full_read_and_close_f32(drmp3* pMP3, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount)
-{
-    drmp3_uint64 totalFramesRead = 0;
-    drmp3_uint64 framesCapacity = 0;
-    float* pFrames = NULL;
-    float temp[4096];
-
-    DRMP3_ASSERT(pMP3 != NULL);
-
-    for (;;) {
-        drmp3_uint64 framesToReadRightNow = DRMP3_COUNTOF(temp) / pMP3->channels;
-        drmp3_uint64 framesJustRead = drmp3_read_pcm_frames_f32(pMP3, framesToReadRightNow, temp);
-        if (framesJustRead == 0) {
-            break;
-        }
-
-        /* Reallocate the output buffer if there's not enough room. */
-        if (framesCapacity < totalFramesRead + framesJustRead) {
-            drmp3_uint64 oldFramesBufferSize;
-            drmp3_uint64 newFramesBufferSize;
-            drmp3_uint64 newFramesCap;
-            float* pNewFrames;
-
-            newFramesCap = framesCapacity * 2;
-            if (newFramesCap < totalFramesRead + framesJustRead) {
-                newFramesCap = totalFramesRead + framesJustRead;
-            }
-
-            oldFramesBufferSize = framesCapacity * pMP3->channels * sizeof(float);
-            newFramesBufferSize = newFramesCap   * pMP3->channels * sizeof(float);
-            if (newFramesBufferSize > (drmp3_uint64)DRMP3_SIZE_MAX) {
-                break;
-            }
-
-            pNewFrames = (float*)drmp3__realloc_from_callbacks(pFrames, (size_t)newFramesBufferSize, (size_t)oldFramesBufferSize, &pMP3->allocationCallbacks);
-            if (pNewFrames == NULL) {
-                drmp3__free_from_callbacks(pFrames, &pMP3->allocationCallbacks);
-                break;
-            }
-
-            pFrames = pNewFrames;
-            framesCapacity = newFramesCap;
-        }
-
-        DRMP3_COPY_MEMORY(pFrames + totalFramesRead*pMP3->channels, temp, (size_t)(framesJustRead*pMP3->channels*sizeof(float)));
-        totalFramesRead += framesJustRead;
-
-        /* If the number of frames we asked for is less that what we actually read it means we've reached the end. */
-        if (framesJustRead != framesToReadRightNow) {
-            break;
-        }
-    }
-
-    if (pConfig != NULL) {
-        pConfig->channels   = pMP3->channels;
-        pConfig->sampleRate = pMP3->sampleRate;
-    }
-
-    drmp3_uninit(pMP3);
-
-    if (pTotalFrameCount) {
-        *pTotalFrameCount = totalFramesRead;
-    }
-
-    return pFrames;
-}
-
-static drmp3_int16* drmp3__full_read_and_close_s16(drmp3* pMP3, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount)
-{
-    drmp3_uint64 totalFramesRead = 0;
-    drmp3_uint64 framesCapacity = 0;
-    drmp3_int16* pFrames = NULL;
-    drmp3_int16 temp[4096];
-
-    DRMP3_ASSERT(pMP3 != NULL);
-
-    for (;;) {
-        drmp3_uint64 framesToReadRightNow = DRMP3_COUNTOF(temp) / pMP3->channels;
-        drmp3_uint64 framesJustRead = drmp3_read_pcm_frames_s16(pMP3, framesToReadRightNow, temp);
-        if (framesJustRead == 0) {
-            break;
-        }
-
-        /* Reallocate the output buffer if there's not enough room. */
-        if (framesCapacity < totalFramesRead + framesJustRead) {
-            drmp3_uint64 newFramesBufferSize;
-            drmp3_uint64 oldFramesBufferSize;
-            drmp3_uint64 newFramesCap;
-            drmp3_int16* pNewFrames;
-
-            newFramesCap = framesCapacity * 2;
-            if (newFramesCap < totalFramesRead + framesJustRead) {
-                newFramesCap = totalFramesRead + framesJustRead;
-            }
-
-            oldFramesBufferSize = framesCapacity * pMP3->channels * sizeof(drmp3_int16);
-            newFramesBufferSize = newFramesCap   * pMP3->channels * sizeof(drmp3_int16);
-            if (newFramesBufferSize > (drmp3_uint64)DRMP3_SIZE_MAX) {
-                break;
-            }
-
-            pNewFrames = (drmp3_int16*)drmp3__realloc_from_callbacks(pFrames, (size_t)newFramesBufferSize, (size_t)oldFramesBufferSize, &pMP3->allocationCallbacks);
-            if (pNewFrames == NULL) {
-                drmp3__free_from_callbacks(pFrames, &pMP3->allocationCallbacks);
-                break;
-            }
-
-            pFrames = pNewFrames;
-            framesCapacity = newFramesCap;
-        }
-
-        DRMP3_COPY_MEMORY(pFrames + totalFramesRead*pMP3->channels, temp, (size_t)(framesJustRead*pMP3->channels*sizeof(drmp3_int16)));
-        totalFramesRead += framesJustRead;
-
-        /* If the number of frames we asked for is less that what we actually read it means we've reached the end. */
-        if (framesJustRead != framesToReadRightNow) {
-            break;
-        }
-    }
-
-    if (pConfig != NULL) {
-        pConfig->channels   = pMP3->channels;
-        pConfig->sampleRate = pMP3->sampleRate;
-    }
-
-    drmp3_uninit(pMP3);
-
-    if (pTotalFrameCount) {
-        *pTotalFrameCount = totalFramesRead;
-    }
-
-    return pFrames;
-}
-
-
-DRMP3_API float* drmp3_open_and_read_pcm_frames_f32(drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3 mp3;
-    if (!drmp3_init(&mp3, onRead, onSeek, pUserData, pAllocationCallbacks)) {
-        return NULL;
-    }
-
-    return drmp3__full_read_and_close_f32(&mp3, pConfig, pTotalFrameCount);
-}
-
-DRMP3_API drmp3_int16* drmp3_open_and_read_pcm_frames_s16(drmp3_read_proc onRead, drmp3_seek_proc onSeek, void* pUserData, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3 mp3;
-    if (!drmp3_init(&mp3, onRead, onSeek, pUserData, pAllocationCallbacks)) {
-        return NULL;
-    }
-
-    return drmp3__full_read_and_close_s16(&mp3, pConfig, pTotalFrameCount);
-}
-
-
-DRMP3_API float* drmp3_open_memory_and_read_pcm_frames_f32(const void* pData, size_t dataSize, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3 mp3;
-    if (!drmp3_init_memory(&mp3, pData, dataSize, pAllocationCallbacks)) {
-        return NULL;
-    }
-
-    return drmp3__full_read_and_close_f32(&mp3, pConfig, pTotalFrameCount);
-}
-
-DRMP3_API drmp3_int16* drmp3_open_memory_and_read_pcm_frames_s16(const void* pData, size_t dataSize, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3 mp3;
-    if (!drmp3_init_memory(&mp3, pData, dataSize, pAllocationCallbacks)) {
-        return NULL;
-    }
-
-    return drmp3__full_read_and_close_s16(&mp3, pConfig, pTotalFrameCount);
-}
-
-
-#ifndef DR_MP3_NO_STDIO
-DRMP3_API float* drmp3_open_file_and_read_pcm_frames_f32(const char* filePath, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3 mp3;
-    if (!drmp3_init_file(&mp3, filePath, pAllocationCallbacks)) {
-        return NULL;
-    }
-
-    return drmp3__full_read_and_close_f32(&mp3, pConfig, pTotalFrameCount);
-}
-
-DRMP3_API drmp3_int16* drmp3_open_file_and_read_pcm_frames_s16(const char* filePath, drmp3_config* pConfig, drmp3_uint64* pTotalFrameCount, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    drmp3 mp3;
-    if (!drmp3_init_file(&mp3, filePath, pAllocationCallbacks)) {
-        return NULL;
-    }
-
-    return drmp3__full_read_and_close_s16(&mp3, pConfig, pTotalFrameCount);
-}
-#endif
-
-DRMP3_API void* drmp3_malloc(size_t sz, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks != NULL) {
-        return drmp3__malloc_from_callbacks(sz, pAllocationCallbacks);
-    } else {
-        return drmp3__malloc_default(sz, NULL);
-    }
-}
-
-DRMP3_API void drmp3_free(void* p, const drmp3_allocation_callbacks* pAllocationCallbacks)
-{
-    if (pAllocationCallbacks != NULL) {
-        drmp3__free_from_callbacks(p, pAllocationCallbacks);
-    } else {
-        drmp3__free_default(p, NULL);
-    }
-}
-
-#endif  /* dr_mp3_c */
-#endif  /*DR_MP3_IMPLEMENTATION*/
-
-/*
-DIFFERENCES BETWEEN minimp3 AND dr_mp3
-======================================
-- First, keep in mind that minimp3 (https://github.com/lieff/minimp3) is where all the real work was done. All of the
-  code relating to the actual decoding remains mostly unmodified, apart from some namespacing changes.
-- dr_mp3 adds a pulling style API which allows you to deliver raw data via callbacks. So, rather than pushing data
-  to the decoder, the decoder _pulls_ data from your callbacks.
-- In addition to callbacks, a decoder can be initialized from a block of memory and a file.
-- The dr_mp3 pull API reads PCM frames rather than whole MP3 frames.
-- dr_mp3 adds convenience APIs for opening and decoding entire files in one go.
-- dr_mp3 is fully namespaced, including the implementation section, which is more suitable when compiling projects
-  as a single translation unit (aka unity builds). At the time of writing this, a unity build is not possible when
-  using minimp3 in conjunction with stb_vorbis. dr_mp3 addresses this.
-*/
-
-/*
-RELEASE NOTES - v0.5.0
-=======================
-Version 0.5.0 has breaking API changes.
-
-Improved Client-Defined Memory Allocation
------------------------------------------
-The main change with this release is the addition of a more flexible way of implementing custom memory allocation routines. The
-existing system of DRMP3_MALLOC, DRMP3_REALLOC and DRMP3_FREE are still in place and will be used by default when no custom
-allocation callbacks are specified.
-
-To use the new system, you pass in a pointer to a drmp3_allocation_callbacks object to drmp3_init() and family, like this:
-
-    void* my_malloc(size_t sz, void* pUserData)
-    {
-        return malloc(sz);
-    }
-    void* my_realloc(void* p, size_t sz, void* pUserData)
-    {
-        return realloc(p, sz);
-    }
-    void my_free(void* p, void* pUserData)
-    {
-        free(p);
-    }
-
-    ...
-
-    drmp3_allocation_callbacks allocationCallbacks;
-    allocationCallbacks.pUserData = &myData;
-    allocationCallbacks.onMalloc  = my_malloc;
-    allocationCallbacks.onRealloc = my_realloc;
-    allocationCallbacks.onFree    = my_free;
-    drmp3_init_file(&mp3, "my_file.mp3", NULL, &allocationCallbacks);
-
-The advantage of this new system is that it allows you to specify user data which will be passed in to the allocation routines.
-
-Passing in null for the allocation callbacks object will cause dr_mp3 to use defaults which is the same as DRMP3_MALLOC,
-DRMP3_REALLOC and DRMP3_FREE and the equivalent of how it worked in previous versions.
-
-Every API that opens a drmp3 object now takes this extra parameter. These include the following:
-
-    drmp3_init()
-    drmp3_init_file()
-    drmp3_init_memory()
-    drmp3_open_and_read_pcm_frames_f32()
-    drmp3_open_and_read_pcm_frames_s16()
-    drmp3_open_memory_and_read_pcm_frames_f32()
-    drmp3_open_memory_and_read_pcm_frames_s16()
-    drmp3_open_file_and_read_pcm_frames_f32()
-    drmp3_open_file_and_read_pcm_frames_s16()
-
-Renamed APIs
-------------
-The following APIs have been renamed for consistency with other dr_* libraries and to make it clear that they return PCM frame
-counts rather than sample counts.
-
-    drmp3_open_and_read_f32()        -> drmp3_open_and_read_pcm_frames_f32()
-    drmp3_open_and_read_s16()        -> drmp3_open_and_read_pcm_frames_s16()
-    drmp3_open_memory_and_read_f32() -> drmp3_open_memory_and_read_pcm_frames_f32()
-    drmp3_open_memory_and_read_s16() -> drmp3_open_memory_and_read_pcm_frames_s16()
-    drmp3_open_file_and_read_f32()   -> drmp3_open_file_and_read_pcm_frames_f32()
-    drmp3_open_file_and_read_s16()   -> drmp3_open_file_and_read_pcm_frames_s16()
-*/
-
-/*
-REVISION HISTORY
-================
-v0.6.38 - 2023-11-02
-  - Fix build for ARMv6-M.
-
-v0.6.37 - 2023-07-07
-  - Silence a static analysis warning.
-
-v0.6.36 - 2023-06-17
-  - Fix an incorrect date in revision history. No functional change.
-
-v0.6.35 - 2023-05-22
-  - Minor code restructure. No functional change.
-
-v0.6.34 - 2022-09-17
-  - Fix compilation with DJGPP.
-  - Fix compilation when compiling with x86 with no SSE2.
-  - Remove an unnecessary variable from the drmp3 structure.
-
-v0.6.33 - 2022-04-10
-  - Fix compilation error with the MSVC ARM64 build.
-  - Fix compilation error on older versions of GCC.
-  - Remove some unused functions.
-
-v0.6.32 - 2021-12-11
-  - Fix a warning with Clang.
-
-v0.6.31 - 2021-08-22
-  - Fix a bug when loading from memory.
-
-v0.6.30 - 2021-08-16
-  - Silence some warnings.
-  - Replace memory operations with DRMP3_* macros.
-
-v0.6.29 - 2021-08-08
-  - Bring up to date with minimp3.
-
-v0.6.28 - 2021-07-31
-  - Fix platform detection for ARM64.
-  - Fix a compilation error with C89.
-
-v0.6.27 - 2021-02-21
-  - Fix a warning due to referencing _MSC_VER when it is undefined.
-
-v0.6.26 - 2021-01-31
-  - Bring up to date with minimp3.
-
-v0.6.25 - 2020-12-26
-  - Remove DRMP3_DEFAULT_CHANNELS and DRMP3_DEFAULT_SAMPLE_RATE which are leftovers from some removed APIs.
-
-v0.6.24 - 2020-12-07
-  - Fix a typo in version date for 0.6.23.
-
-v0.6.23 - 2020-12-03
-  - Fix an error where a file can be closed twice when initialization of the decoder fails.
-
-v0.6.22 - 2020-12-02
-  - Fix an error where it's possible for a file handle to be left open when initialization of the decoder fails.
-
-v0.6.21 - 2020-11-28
-  - Bring up to date with minimp3.
-
-v0.6.20 - 2020-11-21
-  - Fix compilation with OpenWatcom.
-
-v0.6.19 - 2020-11-13
-  - Minor code clean up.
-
-v0.6.18 - 2020-11-01
-  - Improve compiler support for older versions of GCC.
-
-v0.6.17 - 2020-09-28
-  - Bring up to date with minimp3.
-
-v0.6.16 - 2020-08-02
-  - Simplify sized types.
-
-v0.6.15 - 2020-07-25
-  - Fix a compilation warning.
-
-v0.6.14 - 2020-07-23
-  - Fix undefined behaviour with memmove().
-
-v0.6.13 - 2020-07-06
-  - Fix a bug when converting from s16 to f32 in drmp3_read_pcm_frames_f32().
-
-v0.6.12 - 2020-06-23
-  - Add include guard for the implementation section.
-
-v0.6.11 - 2020-05-26
-  - Fix use of uninitialized variable error.
-
-v0.6.10 - 2020-05-16
-  - Add compile-time and run-time version querying.
-    - DRMP3_VERSION_MINOR
-    - DRMP3_VERSION_MAJOR
-    - DRMP3_VERSION_REVISION
-    - DRMP3_VERSION_STRING
-    - drmp3_version()
-    - drmp3_version_string()
-
-v0.6.9 - 2020-04-30
-  - Change the `pcm` parameter of drmp3dec_decode_frame() to a `const drmp3_uint8*` for consistency with internal APIs.
-
-v0.6.8 - 2020-04-26
-  - Optimizations to decoding when initializing from memory.
-
-v0.6.7 - 2020-04-25
-  - Fix a compilation error with DR_MP3_NO_STDIO
-  - Optimization to decoding by reducing some data movement.
-
-v0.6.6 - 2020-04-23
-  - Fix a minor bug with the running PCM frame counter.
-
-v0.6.5 - 2020-04-19
-  - Fix compilation error on ARM builds.
-
-v0.6.4 - 2020-04-19
-  - Bring up to date with changes to minimp3.
-
-v0.6.3 - 2020-04-13
-  - Fix some pedantic warnings.
-
-v0.6.2 - 2020-04-10
-  - Fix a crash in drmp3_open_*_and_read_pcm_frames_*() if the output config object is NULL.
-
-v0.6.1 - 2020-04-05
-  - Fix warnings.
-
-v0.6.0 - 2020-04-04
-  - API CHANGE: Remove the pConfig parameter from the following APIs:
-    - drmp3_init()
-    - drmp3_init_memory()
-    - drmp3_init_file()
-  - Add drmp3_init_file_w() for opening a file from a wchar_t encoded path.
-
-v0.5.6 - 2020-02-12
-  - Bring up to date with minimp3.
-
-v0.5.5 - 2020-01-29
-  - Fix a memory allocation bug in high level s16 decoding APIs.
-
-v0.5.4 - 2019-12-02
-  - Fix a possible null pointer dereference when using custom memory allocators for realloc().
-
-v0.5.3 - 2019-11-14
-  - Fix typos in documentation.
-
-v0.5.2 - 2019-11-02
-  - Bring up to date with minimp3.
-
-v0.5.1 - 2019-10-08
-  - Fix a warning with GCC.
-
-v0.5.0 - 2019-10-07
-  - API CHANGE: Add support for user defined memory allocation routines. This system allows the program to specify their own memory allocation
-    routines with a user data pointer for client-specific contextual data. This adds an extra parameter to the end of the following APIs:
-    - drmp3_init()
-    - drmp3_init_file()
-    - drmp3_init_memory()
-    - drmp3_open_and_read_pcm_frames_f32()
-    - drmp3_open_and_read_pcm_frames_s16()
-    - drmp3_open_memory_and_read_pcm_frames_f32()
-    - drmp3_open_memory_and_read_pcm_frames_s16()
-    - drmp3_open_file_and_read_pcm_frames_f32()
-    - drmp3_open_file_and_read_pcm_frames_s16()
-  - API CHANGE: Renamed the following APIs:
-    - drmp3_open_and_read_f32()        -> drmp3_open_and_read_pcm_frames_f32()
-    - drmp3_open_and_read_s16()        -> drmp3_open_and_read_pcm_frames_s16()
-    - drmp3_open_memory_and_read_f32() -> drmp3_open_memory_and_read_pcm_frames_f32()
-    - drmp3_open_memory_and_read_s16() -> drmp3_open_memory_and_read_pcm_frames_s16()
-    - drmp3_open_file_and_read_f32()   -> drmp3_open_file_and_read_pcm_frames_f32()
-    - drmp3_open_file_and_read_s16()   -> drmp3_open_file_and_read_pcm_frames_s16()
-
-v0.4.7 - 2019-07-28
-  - Fix a compiler error.
-
-v0.4.6 - 2019-06-14
-  - Fix a compiler error.
-
-v0.4.5 - 2019-06-06
-  - Bring up to date with minimp3.
-
-v0.4.4 - 2019-05-06
-  - Fixes to the VC6 build.
-
-v0.4.3 - 2019-05-05
-  - Use the channel count and/or sample rate of the first MP3 frame instead of DRMP3_DEFAULT_CHANNELS and
-    DRMP3_DEFAULT_SAMPLE_RATE when they are set to 0. To use the old behaviour, just set the relevant property to
-    DRMP3_DEFAULT_CHANNELS or DRMP3_DEFAULT_SAMPLE_RATE.
-  - Add s16 reading APIs
-    - drmp3_read_pcm_frames_s16
-    - drmp3_open_memory_and_read_pcm_frames_s16
-    - drmp3_open_and_read_pcm_frames_s16
-    - drmp3_open_file_and_read_pcm_frames_s16
-  - Add drmp3_get_mp3_and_pcm_frame_count() to the public header section.
-  - Add support for C89.
-  - Change license to choice of public domain or MIT-0.
-
-v0.4.2 - 2019-02-21
-  - Fix a warning.
-
-v0.4.1 - 2018-12-30
-  - Fix a warning.
-
-v0.4.0 - 2018-12-16
-  - API CHANGE: Rename some APIs:
-    - drmp3_read_f32 -> to drmp3_read_pcm_frames_f32
-    - drmp3_seek_to_frame -> drmp3_seek_to_pcm_frame
-    - drmp3_open_and_decode_f32 -> drmp3_open_and_read_pcm_frames_f32
-    - drmp3_open_and_decode_memory_f32 -> drmp3_open_memory_and_read_pcm_frames_f32
-    - drmp3_open_and_decode_file_f32 -> drmp3_open_file_and_read_pcm_frames_f32
-  - Add drmp3_get_pcm_frame_count().
-  - Add drmp3_get_mp3_frame_count().
-  - Improve seeking performance.
-
-v0.3.2 - 2018-09-11
-  - Fix a couple of memory leaks.
-  - Bring up to date with minimp3.
-
-v0.3.1 - 2018-08-25
-  - Fix C++ build.
-
-v0.3.0 - 2018-08-25
-  - Bring up to date with minimp3. This has a minor API change: the "pcm" parameter of drmp3dec_decode_frame() has
-    been changed from short* to void* because it can now output both s16 and f32 samples, depending on whether or
-    not the DR_MP3_FLOAT_OUTPUT option is set.
-
-v0.2.11 - 2018-08-08
-  - Fix a bug where the last part of a file is not read.
-
-v0.2.10 - 2018-08-07
-  - Improve 64-bit detection.
-
-v0.2.9 - 2018-08-05
-  - Fix C++ build on older versions of GCC.
-  - Bring up to date with minimp3.
-
-v0.2.8 - 2018-08-02
-  - Fix compilation errors with older versions of GCC.
-
-v0.2.7 - 2018-07-13
-  - Bring up to date with minimp3.
-
-v0.2.6 - 2018-07-12
-  - Bring up to date with minimp3.
-
-v0.2.5 - 2018-06-22
-  - Bring up to date with minimp3.
-
-v0.2.4 - 2018-05-12
-  - Bring up to date with minimp3.
-
-v0.2.3 - 2018-04-29
-  - Fix TCC build.
-
-v0.2.2 - 2018-04-28
-  - Fix bug when opening a decoder from memory.
-
-v0.2.1 - 2018-04-27
-  - Efficiency improvements when the decoder reaches the end of the stream.
-
-v0.2 - 2018-04-21
-  - Bring up to date with minimp3.
-  - Start using major.minor.revision versioning.
-
-v0.1d - 2018-03-30
-  - Bring up to date with minimp3.
-
-v0.1c - 2018-03-11
-  - Fix C++ build error.
-
-v0.1b - 2018-03-07
-  - Bring up to date with minimp3.
-
-v0.1a - 2018-02-28
-  - Fix compilation error on GCC/Clang.
-  - Fix some warnings.
-
-v0.1 - 2018-02-xx
-  - Initial versioned release.
-*/
-
-/*
-This software is available as a choice of the following licenses. Choose
-whichever you prefer.
-
-===============================================================================
-ALTERNATIVE 1 - Public Domain (www.unlicense.org)
-===============================================================================
-This is free and unencumbered software released into the public domain.
-
-Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
-software, either in source code form or as a compiled binary, for any purpose,
-commercial or non-commercial, and by any means.
-
-In jurisdictions that recognize copyright laws, the author or authors of this
-software dedicate any and all copyright interest in the software to the public
-domain. We make this dedication for the benefit of the public at large and to
-the detriment of our heirs and successors. We intend this dedication to be an
-overt act of relinquishment in perpetuity of all present and future rights to
-this software under copyright law.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-
-For more information, please refer to <http://unlicense.org/>
-
-===============================================================================
-ALTERNATIVE 2 - MIT No Attribution
-===============================================================================
-Copyright 2023 David Reid
-
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
-of the Software, and to permit persons to whom the Software is furnished to do
-so.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-*/
-
-/*
-    https://github.com/lieff/minimp3
-    To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide.
-    This software is distributed without any warranty.
-    See <http://creativecommons.org/publicdomain/zero/1.0/>.
-*/
diff --git a/modules/jai-mpris/example/main b/modules/jai-mpris/example/main
new file mode 100755
index 0000000..368264d
Binary files /dev/null and b/modules/jai-mpris/example/main differ
diff --git a/modules/jai-mpris/example/main.jai b/modules/jai-mpris/example/main.jai
new file mode 100644
index 0000000..d63fe9b
--- /dev/null
+++ b/modules/jai-mpris/example/main.jai
@@ -0,0 +1,63 @@
+#import "Basic";
+#import,file "../module.jai";
+
+// State shared between the main loop and D-Bus callbacks.
+// Callbacks are #c_call, so they run without a Jai context (no allocator, no logger).
+// Communicate through a plain struct instead.
+Player_State :: struct {
+    should_play_pause : bool;
+    should_next       : bool;
+    should_previous   : bool;
+}
+
+on_play_pause :: (ud: *void) #c_call {
+    state := cast(*Player_State) ud;
+    state.should_play_pause = true;
+}
+on_next :: (ud: *void) #c_call {
+    (cast(*Player_State) ud).should_next = true;
+}
+on_previous :: (ud: *void) #c_call {
+    (cast(*Player_State) ud).should_previous = true;
+}
+
+main :: () {
+    player := mpris_player_create("ExamplePlayer", "Example Music Player");
+    if !player { log_error("Failed to create MPRIS player"); return; }
+    defer mpris_player_destroy(player);
+
+    state: Player_State;
+    mpris_on_play_pause(player, on_play_pause, *state);
+    mpris_on_next      (player, on_next,       *state);
+    mpris_on_previous  (player, on_previous,   *state);
+
+    mpris_set_playback_status(player, "Playing");
+    mpris_set_can_go_next    (player, true);
+    mpris_set_can_go_previous(player, true);
+
+    meta: Mpris_Metadata;
+    meta.title     = "Some Song";
+    meta.artist    = "Some Artist";
+    meta.album     = "Some Album";
+    meta.length_us = 210 * 1_000_000;
+    mpris_set_metadata(player, meta);
+
+    print("Player registered. Press Ctrl+C to stop.\n");
+
+    while true {
+        if !mpris_process(player) sleep_milliseconds(10); // Idle: sleep briefly instead of spinning the CPU.
+
+        if state.should_play_pause {
+            state.should_play_pause = false;
+            print("PlayPause\n");
+        }
+        if state.should_next {
+            state.should_next = false;
+            print("Next\n");
+        }
+        if state.should_previous {
+            state.should_previous = false;
+            print("Previous\n");
+        }
+    }
+}
diff --git a/modules/jai-mpris/linux/libmpris.so b/modules/jai-mpris/linux/libmpris.so
new file mode 100755
index 0000000..196e81e
Binary files /dev/null and b/modules/jai-mpris/linux/libmpris.so differ
diff --git a/modules/jai-mpris/main.c b/modules/jai-mpris/main.c
new file mode 100644
index 0000000..812eff8
--- /dev/null
+++ b/modules/jai-mpris/main.c
@@ -0,0 +1,61 @@
+#include <stddef.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <string.h>
+#include <systemd/sd-bus.h>
+
+// Minimal player state; the default sd-bus property getter reads status via offsetof().
+struct my_player {
+    const char *status;
+};
+
+// 1. Handle Method Calls (Play, Pause, etc.)
+static int handle_play_pause(sd_bus_message *m, void *userdata, sd_bus_error *ret_error) {
+    printf("Action: Play/Pause toggled!\n");
+    return sd_bus_reply_method_return(m, NULL);
+}
+
+// 2. Define the MPRIS Interface VTable
+static const sd_bus_vtable mpris_vtable[] = {
+    SD_BUS_VTABLE_START(0),
+    // Methods
+    SD_BUS_METHOD("PlayPause", NULL, NULL, handle_play_pause, SD_BUS_VTABLE_UNPRIVILEGED),
+    // Properties (Metadata is complex, Status is a simple string)
+    SD_BUS_PROPERTY("PlaybackStatus", "s", NULL, offsetof(struct my_player, status), SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_VTABLE_END
+};
+
+int main() {
+    struct my_player player = { .status = "Stopped" };
+    sd_bus *bus = NULL;
+    int r;
+
+    // Connect to the user session bus
+    r = sd_bus_open_user(&bus);
+    if (r < 0) return fprintf(stderr, "Failed to connect to bus: %s\n", strerror(-r));
+
+    // Own the MPRIS name so menus find you
+    r = sd_bus_request_name(bus, "org.mpris.MediaPlayer2.MyCPlayer", 0);
+    if (r < 0) return fprintf(stderr, "Failed to acquire name: %s\n", strerror(-r));
+
+    // Register the object path and interfaces; userdata is the base pointer for property offsets
+    r = sd_bus_add_object_vtable(bus, NULL,
+                                 "/org/mpris/MediaPlayer2",
+                                 "org.mpris.MediaPlayer2.Player",
+                                 mpris_vtable,
+                                 &player);
+    if (r < 0) return fprintf(stderr, "Failed to register vtable: %s\n", strerror(-r));
+
+    printf("Player active. Press Ctrl+C to stop.\n");
+
+    for (;;) {
+        r = sd_bus_process(bus, NULL);
+        if (r < 0) break;
+        if (r > 0) continue;
+        r = sd_bus_wait(bus, (uint64_t) -1);
+        if (r < 0) break;
+    }
+
+    sd_bus_unref(bus);
+    return 0;
+}
diff --git a/modules/jai-mpris/module.jai b/modules/jai-mpris/module.jai
new file mode 100644
index 0000000..f6ea5aa
--- /dev/null
+++ b/modules/jai-mpris/module.jai
@@ -0,0 +1,110 @@
+#import "Basic";
+
+// Build the C library first (only the .so is needed; Jai links it dynamically, and the .so itself pulls in libsystemd):
+//   cc -fPIC -shared -o linux/libmpris.so mpris.c -lsystemd
+libmpris :: #library "linux/libmpris";
+
+/* ── Types ──────────────────────────────────────────────────────────────── */
+
+Mpris_Player :: *void;
+
+// Callbacks must be #c_call. Pass your context pointer as userdata.
+Mpris_Callback             :: #type (userdata: *void)                                         -> void #c_call;
+Mpris_Seek_Callback        :: #type (offset_us: s64,  userdata: *void)                       -> void #c_call;
+Mpris_Set_Position_Callback :: #type (track_id: *u8, position_us: s64, userdata: *void)      -> void #c_call;
+Mpris_Volume_Callback      :: #type (volume: float64, userdata: *void)                       -> void #c_call;
+
+/* ── Raw C bindings ─────────────────────────────────────────────────────── */
+
+_mpris_player_create       :: (player_name: *u8, identity: *u8) -> Mpris_Player                                             #foreign libmpris "mpris_player_create";
+_mpris_player_destroy      :: (p: Mpris_Player)                                                                              #foreign libmpris "mpris_player_destroy";
+_mpris_set_playback_status :: (p: Mpris_Player, status: *u8)                                                                 #foreign libmpris "mpris_set_playback_status";
+_mpris_set_metadata        :: (p: Mpris_Player, track_id: *u8, title: *u8, artist: *u8, album: *u8, length_us: s64)          #foreign libmpris "mpris_set_metadata";
+_mpris_set_position        :: (p: Mpris_Player, position_us: s64)                                                            #foreign libmpris "mpris_set_position";
+_mpris_set_volume          :: (p: Mpris_Player, volume: float64)                                                             #foreign libmpris "mpris_set_volume";
+_mpris_set_can_go_next     :: (p: Mpris_Player, value: s32)                                                                  #foreign libmpris "mpris_set_can_go_next";
+_mpris_set_can_go_previous :: (p: Mpris_Player, value: s32)                                                                  #foreign libmpris "mpris_set_can_go_previous";
+_mpris_set_can_play        :: (p: Mpris_Player, value: s32)                                                                  #foreign libmpris "mpris_set_can_play";
+_mpris_set_can_pause       :: (p: Mpris_Player, value: s32)                                                                  #foreign libmpris "mpris_set_can_pause";
+_mpris_set_can_seek        :: (p: Mpris_Player, value: s32)                                                                  #foreign libmpris "mpris_set_can_seek";
+_mpris_emit_seeked         :: (p: Mpris_Player, position_us: s64)                                                            #foreign libmpris "mpris_emit_seeked";
+_mpris_on_play             :: (p: Mpris_Player, cb: Mpris_Callback,              userdata: *void)                            #foreign libmpris "mpris_on_play";
+_mpris_on_pause            :: (p: Mpris_Player, cb: Mpris_Callback,              userdata: *void)                            #foreign libmpris "mpris_on_pause";
+_mpris_on_play_pause       :: (p: Mpris_Player, cb: Mpris_Callback,              userdata: *void)                            #foreign libmpris "mpris_on_play_pause";
+_mpris_on_stop             :: (p: Mpris_Player, cb: Mpris_Callback,              userdata: *void)                            #foreign libmpris "mpris_on_stop";
+_mpris_on_next             :: (p: Mpris_Player, cb: Mpris_Callback,              userdata: *void)                            #foreign libmpris "mpris_on_next";
+_mpris_on_previous         :: (p: Mpris_Player, cb: Mpris_Callback,              userdata: *void)                            #foreign libmpris "mpris_on_previous";
+_mpris_on_seek             :: (p: Mpris_Player, cb: Mpris_Seek_Callback,         userdata: *void)                            #foreign libmpris "mpris_on_seek";
+_mpris_on_set_position     :: (p: Mpris_Player, cb: Mpris_Set_Position_Callback, userdata: *void)                            #foreign libmpris "mpris_on_set_position";
+_mpris_on_volume           :: (p: Mpris_Player, cb: Mpris_Volume_Callback,       userdata: *void)                            #foreign libmpris "mpris_on_volume";
+_mpris_process             :: (p: Mpris_Player) -> s32                                                                       #foreign libmpris "mpris_process";
+
+/* ── Jai-friendly wrappers (handle string conversion) ───────────────────── */
+
+// player_name: D-Bus name component, no spaces (e.g. "MyPlayer")
+// identity:    Human-readable name shown in media menus (e.g. "My Music Player")
+mpris_player_create :: (player_name: string, identity: string) -> Mpris_Player {
+    n := temp_c_string(player_name);
+    i := temp_c_string(identity);
+    return _mpris_player_create(n, i);
+}
+
+mpris_player_destroy :: (p: Mpris_Player) {
+    _mpris_player_destroy(p);
+}
+
+// status: "Playing" | "Paused" | "Stopped"
+mpris_set_playback_status :: (p: Mpris_Player, status: string) {
+    _mpris_set_playback_status(p, temp_c_string(status));
+}
+
+Mpris_Metadata :: struct {
+    track_id  : string;  // D-Bus object path, e.g. "/myapp/track/42". Leave empty for auto.
+    title     : string;
+    artist    : string;
+    album     : string;
+    length_us : s64;     // Duration in microseconds. 0 = unknown.
+}
+
+mpris_set_metadata :: (p: Mpris_Player, meta: Mpris_Metadata) {
+    tid := ifx meta.track_id then temp_c_string(meta.track_id) else null;
+    t   := ifx meta.title    then temp_c_string(meta.title)    else null;
+    a   := ifx meta.artist   then temp_c_string(meta.artist)   else null;
+    al  := ifx meta.album    then temp_c_string(meta.album)    else null;
+    _mpris_set_metadata(p, tid, t, a, al, meta.length_us);
+}
+
+mpris_set_position :: (p: Mpris_Player, position_us: s64) {
+    _mpris_set_position(p, position_us);
+}
+
+mpris_set_volume :: (p: Mpris_Player, volume: float64) {
+    _mpris_set_volume(p, volume);
+}
+
+mpris_set_can_go_next     :: (p: Mpris_Player, v: bool) { _mpris_set_can_go_next(p,     cast(s32) ifx v then 1 else 0); }
+mpris_set_can_go_previous :: (p: Mpris_Player, v: bool) { _mpris_set_can_go_previous(p, cast(s32) ifx v then 1 else 0); }
+mpris_set_can_play        :: (p: Mpris_Player, v: bool) { _mpris_set_can_play(p,        cast(s32) ifx v then 1 else 0); }
+mpris_set_can_pause       :: (p: Mpris_Player, v: bool) { _mpris_set_can_pause(p,       cast(s32) ifx v then 1 else 0); }
+mpris_set_can_seek        :: (p: Mpris_Player, v: bool) { _mpris_set_can_seek(p,        cast(s32) ifx v then 1 else 0); }
+
+mpris_emit_seeked :: (p: Mpris_Player, position_us: s64) {
+    _mpris_emit_seeked(p, position_us);
+}
+
+mpris_on_play         :: (p: Mpris_Player, cb: Mpris_Callback,              ud: *void) { _mpris_on_play(p, cb, ud); }
+mpris_on_pause        :: (p: Mpris_Player, cb: Mpris_Callback,              ud: *void) { _mpris_on_pause(p, cb, ud); }
+mpris_on_play_pause   :: (p: Mpris_Player, cb: Mpris_Callback,              ud: *void) { _mpris_on_play_pause(p, cb, ud); }
+mpris_on_stop         :: (p: Mpris_Player, cb: Mpris_Callback,              ud: *void) { _mpris_on_stop(p, cb, ud); }
+mpris_on_next         :: (p: Mpris_Player, cb: Mpris_Callback,              ud: *void) { _mpris_on_next(p, cb, ud); }
+mpris_on_previous     :: (p: Mpris_Player, cb: Mpris_Callback,              ud: *void) { _mpris_on_previous(p, cb, ud); }
+mpris_on_seek         :: (p: Mpris_Player, cb: Mpris_Seek_Callback,         ud: *void) { _mpris_on_seek(p, cb, ud); }
+mpris_on_set_position :: (p: Mpris_Player, cb: Mpris_Set_Position_Callback, ud: *void) { _mpris_on_set_position(p, cb, ud); }
+mpris_on_volume       :: (p: Mpris_Player, cb: Mpris_Volume_Callback,       ud: *void) { _mpris_on_volume(p, cb, ud); }
+
+// Returns true if messages were processed, false if idle or on error (errors are logged, not fatal).
+mpris_process :: (p: Mpris_Player) -> bool {
+    r := _mpris_process(p);
+    if r < 0 { log_error("mpris_process failed: %", r); return false; }
+    return r > 0;
+}
diff --git a/modules/jai-mpris/mpris.c b/modules/jai-mpris/mpris.c
new file mode 100644
index 0000000..5fd0776
--- /dev/null
+++ b/modules/jai-mpris/mpris.c
@@ -0,0 +1,409 @@
+#include "mpris.h"
+#include <systemd/sd-bus.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#define MPRIS_PATH  "/org/mpris/MediaPlayer2"
+#define IFACE_ROOT  "org.mpris.MediaPlayer2"
+#define IFACE_PLAYER "org.mpris.MediaPlayer2.Player"
+
+struct MprisPlayer {
+    sd_bus      *bus;
+    sd_bus_slot *slot_root;
+    sd_bus_slot *slot_player;
+
+    char    *identity;
+    char    *playback_status;
+    double   volume;
+    int64_t  position_us;
+    int      can_go_next;
+    int      can_go_previous;
+    int      can_play;
+    int      can_pause;
+    int      can_seek;
+
+    char    *track_id;
+    char    *title;
+    char    *artist;
+    char    *album;
+    int64_t  length_us;
+
+    MprisCallback            on_play;         void *ud_play;
+    MprisCallback            on_pause;        void *ud_pause;
+    MprisCallback            on_play_pause;   void *ud_play_pause;
+    MprisCallback            on_stop;         void *ud_stop;
+    MprisCallback            on_next;         void *ud_next;
+    MprisCallback            on_previous;     void *ud_previous;
+    MprisSeekCallback        on_seek;         void *ud_seek;
+    MprisSetPositionCallback on_set_position; void *ud_set_position;
+    MprisVolumeCallback      on_volume;       void *ud_volume;
+};
+
+/* ── Root interface ─────────────────────────────────────────────────────── */
+
+static int handle_raise(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_quit(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    return sd_bus_reply_method_return(m, "");
+}
+
+static int get_identity(sd_bus *bus, const char *path, const char *iface,
+                        const char *prop, sd_bus_message *reply,
+                        void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "s", ((MprisPlayer *)ud)->identity);
+}
+static int get_false(sd_bus *bus, const char *path, const char *iface,
+                     const char *prop, sd_bus_message *reply,
+                     void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", 0);
+}
+static int get_empty_strv(sd_bus *bus, const char *path, const char *iface,
+                          const char *prop, sd_bus_message *reply,
+                          void *ud, sd_bus_error *e) {
+    int r = sd_bus_message_open_container(reply, 'a', "s");
+    if (r < 0) return r;
+    return sd_bus_message_close_container(reply);
+}
+
+static const sd_bus_vtable root_vtable[] = {
+    SD_BUS_VTABLE_START(0),
+    SD_BUS_METHOD("Raise", "", "", handle_raise, SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("Quit",  "", "", handle_quit,  SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_PROPERTY("CanQuit",             "b",  get_false,      0, 0),
+    SD_BUS_PROPERTY("CanRaise",            "b",  get_false,      0, 0),
+    SD_BUS_PROPERTY("HasTrackList",        "b",  get_false,      0, 0),
+    SD_BUS_PROPERTY("Identity",            "s",  get_identity,   0, 0),
+    SD_BUS_PROPERTY("SupportedUriSchemes", "as", get_empty_strv, 0, 0),
+    SD_BUS_PROPERTY("SupportedMimeTypes",  "as", get_empty_strv, 0, 0),
+    SD_BUS_VTABLE_END
+};
+
+/* ── Player interface ───────────────────────────────────────────────────── */
+
+static int handle_play(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    if (p->on_play) p->on_play(p->ud_play);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_pause(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    if (p->on_pause) p->on_pause(p->ud_pause);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_play_pause(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    if (p->on_play_pause) p->on_play_pause(p->ud_play_pause);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_stop(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    if (p->on_stop) p->on_stop(p->ud_stop);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_next(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    if (p->on_next) p->on_next(p->ud_next);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_previous(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    if (p->on_previous) p->on_previous(p->ud_previous);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_seek(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    int64_t offset;
+    int r = sd_bus_message_read(m, "x", &offset);
+    if (r < 0) return r;
+    if (p->on_seek) p->on_seek(offset, p->ud_seek);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_set_position(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    const char *track_id;
+    int64_t position;
+    int r = sd_bus_message_read(m, "ox", &track_id, &position);
+    if (r < 0) return r;
+    if (p->on_set_position) p->on_set_position(track_id, position, p->ud_set_position);
+    return sd_bus_reply_method_return(m, "");
+}
+static int handle_open_uri(sd_bus_message *m, void *ud, sd_bus_error *e) {
+    return sd_bus_reply_method_return(m, "");
+}
+
+static int get_playback_status(sd_bus *bus, const char *path, const char *iface,
+                                const char *prop, sd_bus_message *reply,
+                                void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "s", ((MprisPlayer *)ud)->playback_status);
+}
+
+static int get_loop_status(sd_bus *bus, const char *path, const char *iface,
+                            const char *prop, sd_bus_message *reply,
+                            void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "s", "None");
+}
+
+static int get_shuffle(sd_bus *bus, const char *path, const char *iface,
+                        const char *prop, sd_bus_message *reply,
+                        void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", 0);
+}
+
+static int get_metadata(sd_bus *bus, const char *path, const char *iface,
+                         const char *prop, sd_bus_message *reply,
+                         void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    int r;
+
+    r = sd_bus_message_open_container(reply, 'a', "{sv}");
+    if (r < 0) return r;
+
+    /* mpris:trackid is mandatory */
+    const char *trackid = p->track_id ? p->track_id : "/org/mpris/MediaPlayer2/TrackList/NoTrack";
+    r = sd_bus_message_open_container(reply, 'e', "sv"); if (r < 0) return r;
+    r = sd_bus_message_append(reply, "s", "mpris:trackid"); if (r < 0) return r;
+    r = sd_bus_message_open_container(reply, 'v', "o"); if (r < 0) return r;
+    r = sd_bus_message_append(reply, "o", trackid); if (r < 0) return r;
+    r = sd_bus_message_close_container(reply); if (r < 0) return r;
+    r = sd_bus_message_close_container(reply); if (r < 0) return r;
+
+    if (p->title) {
+        r = sd_bus_message_open_container(reply, 'e', "sv"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", "xesam:title"); if (r < 0) return r;
+        r = sd_bus_message_open_container(reply, 'v', "s"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", p->title); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+    }
+
+    /* xesam:artist is an array of strings */
+    if (p->artist) {
+        r = sd_bus_message_open_container(reply, 'e', "sv"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", "xesam:artist"); if (r < 0) return r;
+        r = sd_bus_message_open_container(reply, 'v', "as"); if (r < 0) return r;
+        r = sd_bus_message_open_container(reply, 'a', "s"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", p->artist); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+    }
+
+    if (p->album) {
+        r = sd_bus_message_open_container(reply, 'e', "sv"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", "xesam:album"); if (r < 0) return r;
+        r = sd_bus_message_open_container(reply, 'v', "s"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", p->album); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+    }
+
+    if (p->length_us > 0) {
+        r = sd_bus_message_open_container(reply, 'e', "sv"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "s", "mpris:length"); if (r < 0) return r;
+        r = sd_bus_message_open_container(reply, 'v', "x"); if (r < 0) return r;
+        r = sd_bus_message_append(reply, "x", p->length_us); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+        r = sd_bus_message_close_container(reply); if (r < 0) return r;
+    }
+
+    return sd_bus_message_close_container(reply);
+}
+
+static int get_volume(sd_bus *bus, const char *path, const char *iface,
+                       const char *prop, sd_bus_message *reply,
+                       void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "d", ((MprisPlayer *)ud)->volume);
+}
+
+static int set_volume(sd_bus *bus, const char *path, const char *iface,
+                       const char *prop, sd_bus_message *value,
+                       void *ud, sd_bus_error *e) {
+    MprisPlayer *p = ud;
+    double v;
+    int r = sd_bus_message_read(value, "d", &v);
+    if (r < 0) return r;
+    p->volume = v;
+    if (p->on_volume) p->on_volume(v, p->ud_volume);
+    return 1; /* signal that the value changed so PropertiesChanged is emitted */
+}
+
+static int get_position(sd_bus *bus, const char *path, const char *iface,
+                         const char *prop, sd_bus_message *reply,
+                         void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "x", ((MprisPlayer *)ud)->position_us);
+}
+
+static int get_rate(sd_bus *bus, const char *path, const char *iface,
+                     const char *prop, sd_bus_message *reply,
+                     void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "d", 1.0);
+}
+
+static int get_can_go_next(sd_bus *bus, const char *path, const char *iface,
+                            const char *prop, sd_bus_message *reply,
+                            void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", ((MprisPlayer *)ud)->can_go_next);
+}
+static int get_can_go_previous(sd_bus *bus, const char *path, const char *iface,
+                                const char *prop, sd_bus_message *reply,
+                                void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", ((MprisPlayer *)ud)->can_go_previous);
+}
+static int get_can_play(sd_bus *bus, const char *path, const char *iface,
+                         const char *prop, sd_bus_message *reply,
+                         void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", ((MprisPlayer *)ud)->can_play);
+}
+static int get_can_pause(sd_bus *bus, const char *path, const char *iface,
+                          const char *prop, sd_bus_message *reply,
+                          void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", ((MprisPlayer *)ud)->can_pause);
+}
+static int get_can_seek(sd_bus *bus, const char *path, const char *iface,
+                         const char *prop, sd_bus_message *reply,
+                         void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", ((MprisPlayer *)ud)->can_seek);
+}
+static int get_can_control(sd_bus *bus, const char *path, const char *iface,
+                            const char *prop, sd_bus_message *reply,
+                            void *ud, sd_bus_error *e) {
+    return sd_bus_message_append(reply, "b", 1);
+}
+
+static const sd_bus_vtable player_vtable[] = {
+    SD_BUS_VTABLE_START(0),
+    SD_BUS_METHOD("Play",        "",   "", handle_play,         SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("Pause",       "",   "", handle_pause,        SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("PlayPause",   "",   "", handle_play_pause,   SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("Stop",        "",   "", handle_stop,         SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("Next",        "",   "", handle_next,         SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("Previous",    "",   "", handle_previous,     SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("Seek",        "x",  "", handle_seek,         SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("SetPosition", "ox", "", handle_set_position, SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_METHOD("OpenUri",     "s",  "", handle_open_uri,     SD_BUS_VTABLE_UNPRIVILEGED),
+    SD_BUS_SIGNAL("Seeked", "x", 0),
+    SD_BUS_PROPERTY        ("PlaybackStatus", "s",     get_playback_status, 0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("LoopStatus",     "s",     get_loop_status,     0, 0),
+    SD_BUS_PROPERTY        ("Shuffle",        "b",     get_shuffle,         0, 0),
+    SD_BUS_PROPERTY        ("Metadata",       "a{sv}", get_metadata,        0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_WRITABLE_PROPERTY("Volume",        "d",     get_volume, set_volume, 0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("Position",       "x",     get_position,        0, SD_BUS_VTABLE_PROPERTY_EMITS_INVALIDATION),
+    SD_BUS_PROPERTY        ("Rate",           "d",     get_rate,            0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("MinimumRate",    "d",     get_rate,            0, 0),
+    SD_BUS_PROPERTY        ("MaximumRate",    "d",     get_rate,            0, 0),
+    SD_BUS_PROPERTY        ("CanGoNext",      "b",     get_can_go_next,     0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("CanGoPrevious",  "b",     get_can_go_previous, 0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("CanPlay",        "b",     get_can_play,        0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("CanPause",       "b",     get_can_pause,       0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("CanSeek",        "b",     get_can_seek,        0, SD_BUS_VTABLE_PROPERTY_EMITS_CHANGE),
+    SD_BUS_PROPERTY        ("CanControl",     "b",     get_can_control,     0, 0),
+    SD_BUS_VTABLE_END
+};
+
+/* ── Public API ─────────────────────────────────────────────────────────── */
+
+MprisPlayer *mpris_player_create(const char *player_name, const char *identity) {
+    MprisPlayer *p = calloc(1, sizeof(MprisPlayer));
+    if (!p) return NULL;
+
+    p->identity        = strdup(identity);
+    p->playback_status = strdup("Stopped");
+    p->volume          = 1.0;
+    p->can_play        = 1;
+    p->can_pause       = 1;
+
+    int r;
+
+    r = sd_bus_open_user(&p->bus);
+    if (r < 0) { fprintf(stderr, "mpris: D-Bus connect failed: %s\n", strerror(-r)); goto fail; }
+
+    char bus_name[256];
+    snprintf(bus_name, sizeof(bus_name), "org.mpris.MediaPlayer2.%s", player_name);
+
+    r = sd_bus_request_name(p->bus, bus_name, 0);
+    if (r < 0) { fprintf(stderr, "mpris: could not acquire %s: %s\n", bus_name, strerror(-r)); goto fail; }
+
+    r = sd_bus_add_object_vtable(p->bus, &p->slot_root,
+                                  MPRIS_PATH, IFACE_ROOT, root_vtable, p);
+    if (r < 0) { fprintf(stderr, "mpris: register root interface failed: %s\n", strerror(-r)); goto fail; }
+
+    r = sd_bus_add_object_vtable(p->bus, &p->slot_player,
+                                  MPRIS_PATH, IFACE_PLAYER, player_vtable, p);
+    if (r < 0) { fprintf(stderr, "mpris: register player interface failed: %s\n", strerror(-r)); goto fail; }
+
+    return p;
+
+fail:
+    mpris_player_destroy(p);
+    return NULL;
+}
+
+void mpris_player_destroy(MprisPlayer *p) {
+    if (!p) return;
+    sd_bus_slot_unref(p->slot_player);
+    sd_bus_slot_unref(p->slot_root);
+    sd_bus_unref(p->bus);
+    free(p->identity);
+    free(p->playback_status);
+    free(p->track_id);
+    free(p->title);
+    free(p->artist);
+    free(p->album);
+    free(p);
+}
+
+#define EMIT_PLAYER(p, ...) \
+    sd_bus_emit_properties_changed((p)->bus, MPRIS_PATH, IFACE_PLAYER, __VA_ARGS__, NULL)
+
+void mpris_set_playback_status(MprisPlayer *p, const char *status) {
+    free(p->playback_status);
+    p->playback_status = strdup(status);
+    EMIT_PLAYER(p, "PlaybackStatus");
+}
+
+void mpris_set_metadata(MprisPlayer *p, const char *track_id, const char *title,
+                         const char *artist, const char *album, int64_t length_us) {
+    free(p->track_id); p->track_id = track_id ? strdup(track_id) : NULL;
+    free(p->title);    p->title    = title    ? strdup(title)    : NULL;
+    free(p->artist);   p->artist   = artist   ? strdup(artist)   : NULL;
+    free(p->album);    p->album    = album     ? strdup(album)    : NULL;
+    p->length_us = length_us;
+    EMIT_PLAYER(p, "Metadata");
+}
+
+void mpris_set_volume(MprisPlayer *p, double volume) {
+    p->volume = volume;
+    EMIT_PLAYER(p, "Volume");
+}
+
+void mpris_set_position(MprisPlayer *p, int64_t position_us) {
+    p->position_us = position_us;
+    /* Deliberately no PropertiesChanged here: per MPRIS, clients poll Position or watch the Seeked signal */
+}
+
+void mpris_set_can_go_next(MprisPlayer *p, int v)     { p->can_go_next = v;     EMIT_PLAYER(p, "CanGoNext"); }
+void mpris_set_can_go_previous(MprisPlayer *p, int v) { p->can_go_previous = v; EMIT_PLAYER(p, "CanGoPrevious"); }
+void mpris_set_can_play(MprisPlayer *p, int v)        { p->can_play = v;        EMIT_PLAYER(p, "CanPlay"); }
+void mpris_set_can_pause(MprisPlayer *p, int v)       { p->can_pause = v;       EMIT_PLAYER(p, "CanPause"); }
+void mpris_set_can_seek(MprisPlayer *p, int v)        { p->can_seek = v;        EMIT_PLAYER(p, "CanSeek"); }
+
+void mpris_emit_seeked(MprisPlayer *p, int64_t position_us) {
+    p->position_us = position_us;
+    sd_bus_emit_signal(p->bus, MPRIS_PATH, IFACE_PLAYER, "Seeked", "x", position_us);
+}
+
+void mpris_on_play        (MprisPlayer *p, MprisCallback cb,            void *ud) { p->on_play = cb;         p->ud_play = ud; }
+void mpris_on_pause       (MprisPlayer *p, MprisCallback cb,            void *ud) { p->on_pause = cb;        p->ud_pause = ud; }
+void mpris_on_play_pause  (MprisPlayer *p, MprisCallback cb,            void *ud) { p->on_play_pause = cb;   p->ud_play_pause = ud; }
+void mpris_on_stop        (MprisPlayer *p, MprisCallback cb,            void *ud) { p->on_stop = cb;         p->ud_stop = ud; }
+void mpris_on_next        (MprisPlayer *p, MprisCallback cb,            void *ud) { p->on_next = cb;         p->ud_next = ud; }
+void mpris_on_previous    (MprisPlayer *p, MprisCallback cb,            void *ud) { p->on_previous = cb;     p->ud_previous = ud; }
+void mpris_on_seek        (MprisPlayer *p, MprisSeekCallback cb,        void *ud) { p->on_seek = cb;         p->ud_seek = ud; }
+void mpris_on_set_position(MprisPlayer *p, MprisSetPositionCallback cb, void *ud) { p->on_set_position = cb; p->ud_set_position = ud; }
+void mpris_on_volume      (MprisPlayer *p, MprisVolumeCallback cb,      void *ud) { p->on_volume = cb;       p->ud_volume = ud; }
+
+int mpris_process(MprisPlayer *p) {
+    return sd_bus_process(p->bus, NULL);
+}
diff --git a/modules/jai-mpris/mpris.h b/modules/jai-mpris/mpris.h
new file mode 100644
index 0000000..8b8260a
--- /dev/null
+++ b/modules/jai-mpris/mpris.h
@@ -0,0 +1,54 @@
+#pragma once
+#include <stdint.h>
+
+typedef struct MprisPlayer MprisPlayer;
+
+typedef void     (*MprisCallback)            (void *userdata);
+typedef void     (*MprisSeekCallback)        (int64_t offset_us, void *userdata);
+typedef void     (*MprisSetPositionCallback) (const char *track_id, int64_t position_us, void *userdata);
+typedef void     (*MprisVolumeCallback)      (double volume, void *userdata);
+
+// Creates the player and acquires "org.mpris.MediaPlayer2.<player_name>" on the session bus.
+// identity is the human-readable name shown in media menus.
+// Returns NULL on failure.
+MprisPlayer *mpris_player_create(const char *player_name, const char *identity);
+void         mpris_player_destroy(MprisPlayer *p);
+
+// Update state. Every setter except mpris_set_position() emits PropertiesChanged, so media menus update immediately.
+// status must be one of: "Playing", "Paused", "Stopped"
+void mpris_set_playback_status(MprisPlayer *p, const char *status);
+
+// Pass NULL for any field you don't have. length_us is track duration in microseconds.
+void mpris_set_metadata(MprisPlayer *p, const char *track_id, const char *title,
+                        const char *artist, const char *album, int64_t length_us);
+
+// position_us is the current playback position; not broadcast automatically,
+// only reported when polled. Call mpris_emit_seeked() after an actual seek.
+void mpris_set_position(MprisPlayer *p, int64_t position_us);
+
+void mpris_set_volume         (MprisPlayer *p, double volume);
+void mpris_set_can_go_next    (MprisPlayer *p, int value);
+void mpris_set_can_go_previous(MprisPlayer *p, int value);
+void mpris_set_can_play       (MprisPlayer *p, int value);
+void mpris_set_can_pause      (MprisPlayer *p, int value);
+void mpris_set_can_seek       (MprisPlayer *p, int value);
+
+// Emit the Seeked signal after a seek completes.
+void mpris_emit_seeked(MprisPlayer *p, int64_t position_us);
+
+// Register callbacks for incoming control commands.
+// userdata is passed through unchanged to your callback.
+void mpris_on_play        (MprisPlayer *p, MprisCallback cb,            void *userdata);
+void mpris_on_pause       (MprisPlayer *p, MprisCallback cb,            void *userdata);
+void mpris_on_play_pause  (MprisPlayer *p, MprisCallback cb,            void *userdata);
+void mpris_on_stop        (MprisPlayer *p, MprisCallback cb,            void *userdata);
+void mpris_on_next        (MprisPlayer *p, MprisCallback cb,            void *userdata);
+void mpris_on_previous    (MprisPlayer *p, MprisCallback cb,            void *userdata);
+void mpris_on_seek        (MprisPlayer *p, MprisSeekCallback cb,        void *userdata);
+void mpris_on_set_position(MprisPlayer *p, MprisSetPositionCallback cb, void *userdata);
+// Called when a remote client changes the Volume property.
+void mpris_on_volume      (MprisPlayer *p, MprisVolumeCallback cb,      void *userdata);
+
+// Drive the D-Bus event loop. Call this every frame / in your event loop.
+// Returns >0 if progress was made (more work may be pending, so call again), 0 if idle, <0 on error.
+int mpris_process(MprisPlayer *p);
diff --git a/modules/jai-mpris/mpris.o b/modules/jai-mpris/mpris.o
new file mode 100644
index 0000000..c6cf0b4
Binary files /dev/null and b/modules/jai-mpris/mpris.o differ
diff --git a/modules/stb_image/android/arm64/stb_image.a b/modules/stb_image/android/arm64/stb_image.a
deleted file mode 100644
index 45b38a5..0000000
Binary files a/modules/stb_image/android/arm64/stb_image.a and /dev/null differ
diff --git a/modules/stb_image/android/arm64/stb_image.so b/modules/stb_image/android/arm64/stb_image.so
deleted file mode 100644
index df77500..0000000
Binary files a/modules/stb_image/android/arm64/stb_image.so and /dev/null differ
diff --git a/modules/stb_image/android/x64/stb_image.a b/modules/stb_image/android/x64/stb_image.a
deleted file mode 100644
index 17c8b88..0000000
Binary files a/modules/stb_image/android/x64/stb_image.a and /dev/null differ
diff --git a/modules/stb_image/android/x64/stb_image.so b/modules/stb_image/android/x64/stb_image.so
deleted file mode 100644
index 8a94227..0000000
Binary files a/modules/stb_image/android/x64/stb_image.so and /dev/null differ
diff --git a/modules/stb_image/bindings.jai b/modules/stb_image/bindings.jai
deleted file mode 100644
index fb4fe14..0000000
--- a/modules/stb_image/bindings.jai
+++ /dev/null
@@ -1,145 +0,0 @@
-//
-// This file was auto-generated using the following command:
-//
-// jai generate.jai
-//
-
-
-
-STBI_VERSION :: 1;
-
-STBI :: enum u32 {
-    default    :: 0;
-
-    grey       :: 1;
-    grey_alpha :: 2;
-    rgb        :: 3;
-    rgb_alpha  :: 4;
-
-    STBI_default    :: default;
-
-    STBI_grey       :: grey;
-    STBI_grey_alpha :: grey_alpha;
-    STBI_rgb        :: rgb;
-    STBI_rgb_alpha  :: rgb_alpha;
-}
-
-//
-// load image by filename, open file, or memory buffer
-//
-stbi_io_callbacks :: struct {
-    read: #type (user: *void, data: *u8, size: s32) -> s32 #c_call; // fill 'data' with 'size' bytes.  return number of bytes actually read
-    skip: #type (user: *void, n: s32) -> void #c_call; // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
-    eof:  #type (user: *void) -> s32 #c_call; // returns nonzero if we are at end of file/data
-}
-
-////////////////////////////////////
-//
-// 8-bits-per-channel interface
-//
-stbi_load_from_memory :: (buffer: *u8, len: s32, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u8 #foreign stb_image;
-stbi_load_from_callbacks :: (clbk: *stbi_io_callbacks, user: *void, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u8 #foreign stb_image;
-
-stbi_load :: (filename: *u8, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u8 #foreign stb_image;
-stbi_load_from_file :: (f: *FILE, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u8 #foreign stb_image;
-
-stbi_load_gif_from_memory :: (buffer: *u8, len: s32, delays: **s32, x: *s32, y: *s32, z: *s32, comp: *s32, req_comp: s32) -> *u8 #foreign stb_image;
-
-////////////////////////////////////
-//
-// 16-bits-per-channel interface
-//
-stbi_load_16_from_memory :: (buffer: *u8, len: s32, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u16 #foreign stb_image;
-stbi_load_16_from_callbacks :: (clbk: *stbi_io_callbacks, user: *void, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u16 #foreign stb_image;
-
-stbi_load_16 :: (filename: *u8, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u16 #foreign stb_image;
-stbi_load_from_file_16 :: (f: *FILE, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *u16 #foreign stb_image;
-
-stbi_loadf_from_memory :: (buffer: *u8, len: s32, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *float #foreign stb_image;
-stbi_loadf_from_callbacks :: (clbk: *stbi_io_callbacks, user: *void, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *float #foreign stb_image;
-
-stbi_loadf :: (filename: *u8, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *float #foreign stb_image;
-stbi_loadf_from_file :: (f: *FILE, x: *s32, y: *s32, channels_in_file: *s32, desired_channels: s32) -> *float #foreign stb_image;
-
-stbi_hdr_to_ldr_gamma :: (gamma: float) -> void #foreign stb_image;
-stbi_hdr_to_ldr_scale :: (scale: float) -> void #foreign stb_image;
-
-stbi_ldr_to_hdr_gamma :: (gamma: float) -> void #foreign stb_image;
-stbi_ldr_to_hdr_scale :: (scale: float) -> void #foreign stb_image;
-
-// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
-stbi_is_hdr_from_callbacks :: (clbk: *stbi_io_callbacks, user: *void) -> s32 #foreign stb_image;
-stbi_is_hdr_from_memory :: (buffer: *u8, len: s32) -> s32 #foreign stb_image;
-
-stbi_is_hdr :: (filename: *u8) -> s32 #foreign stb_image;
-stbi_is_hdr_from_file :: (f: *FILE) -> s32 #foreign stb_image;
-
-// get a VERY brief reason for failure
-// on most compilers (and ALL modern mainstream compilers) this is threadsafe
-stbi_failure_reason :: () -> *u8 #foreign stb_image;
-
-// free the loaded image -- this is just free()
-stbi_image_free :: (retval_from_stbi_load: *void) -> void #foreign stb_image;
-
-// get image dimensions & components without fully decoding
-stbi_info_from_memory :: (buffer: *u8, len: s32, x: *s32, y: *s32, comp: *s32) -> s32 #foreign stb_image;
-stbi_info_from_callbacks :: (clbk: *stbi_io_callbacks, user: *void, x: *s32, y: *s32, comp: *s32) -> s32 #foreign stb_image;
-stbi_is_16_bit_from_memory :: (buffer: *u8, len: s32) -> s32 #foreign stb_image;
-stbi_is_16_bit_from_callbacks :: (clbk: *stbi_io_callbacks, user: *void) -> s32 #foreign stb_image;
-
-stbi_info :: (filename: *u8, x: *s32, y: *s32, comp: *s32) -> s32 #foreign stb_image;
-stbi_info_from_file :: (f: *FILE, x: *s32, y: *s32, comp: *s32) -> s32 #foreign stb_image;
-stbi_is_16_bit :: (filename: *u8) -> s32 #foreign stb_image;
-stbi_is_16_bit_from_file :: (f: *FILE) -> s32 #foreign stb_image;
-
-// for image formats that explicitly notate that they have premultiplied alpha,
-// we just return the colors as stored in the file. set this flag to force
-// unpremultiplication. results are undefined if the unpremultiply overflow.
-stbi_set_unpremultiply_on_load :: (flag_true_if_should_unpremultiply: s32) -> void #foreign stb_image;
-
-// indicate whether we should process iphone images back to canonical format,
-// or just pass them through "as-is"
-stbi_convert_iphone_png_to_rgb :: (flag_true_if_should_convert: s32) -> void #foreign stb_image;
-
-// flip the image vertically, so the first pixel in the output array is the bottom left
-stbi_set_flip_vertically_on_load :: (flag_true_if_should_flip: s32) -> void #foreign stb_image;
-
-// as above, but only applies to images loaded on the thread that calls the function
-// this function is only available if your compiler supports thread-local variables;
-// calling it will fail to link if your compiler doesn't
-stbi_set_unpremultiply_on_load_thread :: (flag_true_if_should_unpremultiply: s32) -> void #foreign stb_image;
-stbi_convert_iphone_png_to_rgb_thread :: (flag_true_if_should_convert: s32) -> void #foreign stb_image;
-stbi_set_flip_vertically_on_load_thread :: (flag_true_if_should_flip: s32) -> void #foreign stb_image;
-
-// ZLIB client - used by PNG, available for other purposes
-stbi_zlib_decode_malloc_guesssize :: (buffer: *u8, len: s32, initial_size: s32, outlen: *s32) -> *u8 #foreign stb_image;
-stbi_zlib_decode_malloc_guesssize_headerflag :: (buffer: *u8, len: s32, initial_size: s32, outlen: *s32, parse_header: s32) -> *u8 #foreign stb_image;
-stbi_zlib_decode_malloc :: (buffer: *u8, len: s32, outlen: *s32) -> *u8 #foreign stb_image;
-stbi_zlib_decode_buffer :: (obuffer: *u8, olen: s32, ibuffer: *u8, ilen: s32) -> s32 #foreign stb_image;
-
-stbi_zlib_decode_noheader_malloc :: (buffer: *u8, len: s32, outlen: *s32) -> *u8 #foreign stb_image;
-stbi_zlib_decode_noheader_buffer :: (obuffer: *u8, olen: s32, ibuffer: *u8, ilen: s32) -> s32 #foreign stb_image;
-
-#scope_file
-
-
-#if OS == .WINDOWS {
-    stb_image :: #library "windows/stb_image";
-} else #if OS == .LINUX {
-    stb_image :: #library "linux/stb_image";
-} else #if OS == .MACOS {
-    stb_image :: #library "macos/stb_image";
-} else #if OS == .ANDROID {
-    #if CPU == .X64 {
-        stb_image :: #library "android/x64/stb_image";
-    } else #if CPU == .ARM64 {
-        stb_image :: #library "android/arm64/stb_image";
-    }
-} else #if OS == .PS5 {
-    stb_image :: #library "ps5/stb_image";
-} else #if OS == .WASM {
-    stb_image :: #library "wasm/stb_image";
-} else {
-    #assert false;
-}
-
diff --git a/modules/stb_image/generate.jai b/modules/stb_image/generate.jai
deleted file mode 100644
index 8f9e3bf..0000000
--- a/modules/stb_image/generate.jai
+++ /dev/null
@@ -1,151 +0,0 @@
-AT_COMPILE_TIME :: true;
-
-SOURCE_PATH :: "source";
-LIB_BASE_NAME :: "stb_image";
-
-#if AT_COMPILE_TIME {
-    #run,stallable {
-        set_build_options_dc(.{do_output=false});
-        options := get_build_options();
-        args := options.compile_time_command_line;
-        if !generate_bindings(args, options.minimum_os_version) {
-            compiler_set_workspace_status(.FAILED);
-        }
-    }
-} else {
-    #import "System";
-
-    main :: () {
-        set_working_directory(path_strip_filename(get_path_of_running_executable()));
-        if !generate_bindings(get_command_line_arguments(), #run get_build_options().minimum_os_version) {
-            exit(1);
-        }
-    }
-}
-
-generate_bindings :: (args: [] string, minimum_os_version: type_of(Build_Options.minimum_os_version)) -> bool {
-    target_android := array_find(args, "-android");
-    target_x64     := array_find(args, "-x64");
-    target_arm     := array_find(args, "-arm64");
-    compile        := array_find(args, "-compile");
-    compile_debug  := array_find(args, "-debug");
-
-    os_target  := OS;
-    cpu_target := CPU;
-    if target_android os_target  = .ANDROID;
-    if target_x64     cpu_target = .X64;
-    if target_arm     cpu_target = .ARM64;
- 
-    lib_directory: string;
-    if os_target == {
-        case .WINDOWS;
-            lib_directory = "windows";
-        case .LINUX;
-            lib_directory = "linux";
-        case .MACOS;
-            lib_directory = "macos";
-        case .ANDROID;
-            lib_directory = ifx cpu_target == .X64 then "android/x64" else "android/arm64";
-        case .PS5;
-            lib_directory = "ps5";
-        case;
-            assert(false);
-    }
-
-    if compile {
-        source_file := tprint("%/stb_image.c", SOURCE_PATH);
-
-        make_directory_if_it_does_not_exist(lib_directory, recursive = true);
-        lib_path := tprint("%/%", lib_directory, LIB_BASE_NAME);
-        success := true;
-        if os_target == .MACOS {
-            lib_path_x64   := tprint("%_x64", lib_path);
-            lib_path_arm64 := tprint("%_arm64", lib_path);
-            macos_x64_version_arg   := "-mmacos-version-min=10.13"; // Our current x64 min version
-            macos_arm64_version_arg := "-mmacos-version-min=11.0";  // Earliest version that supports arm64
-            // x64 variant
-            success &&= build_cpp_dynamic_lib(lib_path_x64,   source_file, extra = .["-arch", "x86_64", macos_x64_version_arg],   debug=compile_debug);
-            success &&= build_cpp_static_lib( lib_path_x64,   source_file, extra = .["-arch", "x86_64", macos_x64_version_arg],   debug=compile_debug);
-            // arm64 variant
-            success &&= build_cpp_dynamic_lib(lib_path_arm64, source_file, extra = .["-arch", "arm64",  macos_arm64_version_arg], debug=compile_debug);
-            success &&= build_cpp_static_lib( lib_path_arm64, source_file, extra = .["-arch", "arm64",  macos_arm64_version_arg], debug=compile_debug);
-            // create universal binaries
-            run_result := run_command("lipo", "-create", tprint("%.dylib", lib_path_x64), tprint("%.dylib", lib_path_arm64), "-output", tprint("%.dylib", lib_path));
-            success &&= (run_result.exit_code == 0);
-            run_result  = run_command("lipo", "-create", tprint("%.a",     lib_path_x64), tprint("%.a",     lib_path_arm64), "-output", tprint("%.a",     lib_path));
-            success &&= (run_result.exit_code == 0);
-        } else {
-            extra: [..] string;
-            if os_target == .ANDROID {
-                _, target_triple_with_sdk := get_android_target_triple(cpu_target);
-                array_add(*extra, "-target", target_triple_with_sdk);
-            }
-            if os_target != .WINDOWS {
-                array_add(*extra, "-fPIC");
-            }
-
-            if os_target != .PS5 && os_target != .WASM {
-                success &&= build_cpp_dynamic_lib(lib_path, source_file, target = os_target, debug = compile_debug, extra = extra);
-            }
-            success &&= build_cpp_static_lib(lib_path, source_file, target = os_target, debug = compile_debug, extra = extra);
-        }
-
-        if !success     return false;
-    }
-
-    options: Generate_Bindings_Options;
-    options.os = os_target;
-    options.cpu = cpu_target;
-    {
-        using options;
-
-        array_add(*libpaths, lib_directory);
-        array_add(*libnames, LIB_BASE_NAME);
-        array_add(*source_files, tprint("%/stb_image.h", SOURCE_PATH));
-        array_add(*typedef_prefixes_to_unwrap, "stbi_");
-
-
-        generate_library_declarations = false;
-        footer = tprint(FOOTER_TEMPLATE, LIB_BASE_NAME);
-
-        auto_detect_enum_prefixes = true;
-        log_stripped_declarations = false;
-        generate_compile_time_struct_checks = false;
-    }
-
-    output_filename := "bindings.jai";
-    return generate_bindings(options, output_filename);
-}
-
-FOOTER_TEMPLATE :: #string END
-
-#if OS == .WINDOWS {
-    %1 :: #library "windows/%1";
-} else #if OS == .LINUX {
-    %1 :: #library "linux/%1";
-} else #if OS == .MACOS {
-    %1 :: #library "macos/%1";
-} else #if OS == .ANDROID {
-    #if CPU == .X64 {
-        %1 :: #library "android/x64/%1";
-    } else #if CPU == .ARM64 {
-        %1 :: #library "android/arm64/%1";
-    }
-} else #if OS == .PS5 {
-    %1 :: #library "ps5/%1";
-} else #if OS == .WASM {
-    // Wasm will be linked with emcc.
-} else {
-    #assert false;
-}
-
-END
-
-#import "Basic";
-#import "Bindings_Generator";
-#import "BuildCpp";
-#import "Compiler";
-#import "File";
-#import "Process";
-#import "Toolchains/Android";
-
diff --git a/modules/stb_image/linux/stb_image.a b/modules/stb_image/linux/stb_image.a
deleted file mode 100644
index fa8f925..0000000
Binary files a/modules/stb_image/linux/stb_image.a and /dev/null differ
diff --git a/modules/stb_image/linux/stb_image.so b/modules/stb_image/linux/stb_image.so
deleted file mode 100755
index e12cea9..0000000
Binary files a/modules/stb_image/linux/stb_image.so and /dev/null differ
diff --git a/modules/stb_image/macos/stb_image.a b/modules/stb_image/macos/stb_image.a
deleted file mode 100644
index 2003d88..0000000
Binary files a/modules/stb_image/macos/stb_image.a and /dev/null differ
diff --git a/modules/stb_image/macos/stb_image.dylib b/modules/stb_image/macos/stb_image.dylib
deleted file mode 100644
index 19972f7..0000000
Binary files a/modules/stb_image/macos/stb_image.dylib and /dev/null differ
diff --git a/modules/stb_image/module.jai b/modules/stb_image/module.jai
deleted file mode 100644
index a35c6bc..0000000
--- a/modules/stb_image/module.jai
+++ /dev/null
@@ -1,12 +0,0 @@
-#load "bindings.jai";
-
-#if OS == .WINDOWS ||  OS == .PS5 || OS == .WASM {
-    #scope_module
-    FILE :: void;
-} else #if OS_IS_UNIX {
-    #import "POSIX";
-    #library,system,link_always "libm";
-} else {
-    #assert false;
-}
-
diff --git a/modules/stb_image/source/stb_image.c b/modules/stb_image/source/stb_image.c
deleted file mode 100644
index da81036..0000000
--- a/modules/stb_image/source/stb_image.c
+++ /dev/null
@@ -1,11 +0,0 @@
-#ifdef WIN32
-#define __EXPORT __declspec(dllexport)
-#else
-#define __EXPORT
-#endif
-
-#define STBIDEF extern __EXPORT
-
-#define STBI_NO_STDIO
-#define STB_IMAGE_IMPLEMENTATION
-#include "stb_image.h"
diff --git a/modules/stb_image/source/stb_image.h b/modules/stb_image/source/stb_image.h
deleted file mode 100644
index a632d54..0000000
--- a/modules/stb_image/source/stb_image.h
+++ /dev/null
@@ -1,7985 +0,0 @@
-/* stb_image - v2.29 - public domain image loader - http://nothings.org/stb
-                                  no warranty implied; use at your own risk
-
-   Do this:
-      #define STB_IMAGE_IMPLEMENTATION
-   before you include this file in *one* C or C++ file to create the implementation.
-
-   // i.e. it should look like this:
-   #include ...
-   #include ...
-   #include ...
-   #define STB_IMAGE_IMPLEMENTATION
-   #include "stb_image.h"
-
-   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
-   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
-
-
-   QUICK NOTES:
-      Primarily of interest to game developers and other people who can
-          avoid problematic images and only need the trivial interface
-
-      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
-      PNG 1/2/4/8/16-bit-per-channel
-
-      TGA (not sure what subset, if a subset)
-      BMP non-1bpp, non-RLE
-      PSD (composited view only, no extra channels, 8/16 bit-per-channel)
-
-      GIF (*comp always reports as 4-channel)
-      HDR (radiance rgbE format)
-      PIC (Softimage PIC)
-      PNM (PPM and PGM binary only)
-
-      Animated GIF still needs a proper API, but here's one way to do it:
-          http://gist.github.com/urraka/685d9a6340b26b830d49
-
-      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
-      - decode from arbitrary I/O callbacks
-      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
-
-   Full documentation under "DOCUMENTATION" below.
-
-
-LICENSE
-
-  See end of file for license information.
-
-RECENT REVISION HISTORY:
-
-      2.29  (2023-05-xx) optimizations
-      2.28  (2023-01-29) many error fixes, security errors, just tons of stuff
-      2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
-      2.26  (2020-07-13) many minor fixes
-      2.25  (2020-02-02) fix warnings
-      2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
-      2.23  (2019-08-11) fix clang static analysis warning
-      2.22  (2019-03-04) gif fixes, fix warnings
-      2.21  (2019-02-25) fix typo in comment
-      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
-      2.19  (2018-02-11) fix warning
-      2.18  (2018-01-30) fix warnings
-      2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
-      2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
-      2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
-      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
-      2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
-      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
-      2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
-                         RGB-format JPEG; remove white matting in PSD;
-                         allocate large structures on the stack;
-                         correct channel count for PNG & BMP
-      2.10  (2016-01-22) avoid warning introduced in 2.09
-      2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
-
-   See end of file for full revision history.
-
-
- ============================    Contributors    =========================
-
- Image formats                          Extensions, features
-    Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
-    Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
-    Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
-    Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
-    Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
-    Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
-    Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
-    github:urraka (animated gif)           Junggon Kim (PNM comments)
-    Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
-                                           socks-the-fox (16-bit PNG)
-                                           Jeremy Sawicki (handle all ImageNet JPGs)
- Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
-    Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
-    Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
-    John-Mark Allen
-    Carmelo J Fdez-Aguera
-
- Bug & warning fixes
-    Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
-    Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
-    Phil Jordan                                Dave Moore           Roy Eltham
-    Hayaki Saito            Nathan Reed        Won Chun
-    Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
-    Thomas Ruf              Ronny Chevalier                         github:rlyeh
-    Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
-    Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
-    Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
-    Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
-    Cass Everitt            Ryamond Barbiero                        github:grim210
-    Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
-    Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
-    Josh Tobin              Neil Bickford      Matthew Gregan       github:poppolopoppo
-    Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
-    Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
-                            Brad Weinberger    Matvey Cherevko      github:mosra
-    Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
-    Ryan C. Gordon          [reserved]                              [reserved]
-                     DO NOT ADD YOUR NAME HERE
-
-                     Jacko Dirks
-
-  To add your name to the credits, pick a random blank space in the middle and fill it.
-  80% of merge conflicts on stb PRs are due to people adding their name at the end
-  of the credits.
-*/
-
-#ifndef STBI_INCLUDE_STB_IMAGE_H
-#define STBI_INCLUDE_STB_IMAGE_H
-
-// DOCUMENTATION
-//
-// Limitations:
-//    - no 12-bit-per-channel JPEG
-//    - no JPEGs with arithmetic coding
-//    - GIF always returns *comp=4
-//
-// Basic usage (see HDR discussion below for HDR usage):
-//    int x,y,n;
-//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
-//    // ... process data if not NULL ...
-//    // ... x = width, y = height, n = # 8-bit components per pixel ...
-//    // ... replace '0' with '1'..'4' to force that many components per pixel
-//    // ... but 'n' will always be the number that it would have been if you said 0
-//    stbi_image_free(data);
-//
-// Standard parameters:
-//    int *x                 -- outputs image width in pixels
-//    int *y                 -- outputs image height in pixels
-//    int *channels_in_file  -- outputs # of image components in image file
-//    int desired_channels   -- if non-zero, # of image components requested in result
-//
-// The return value from an image loader is an 'unsigned char *' which points
-// to the pixel data, or NULL on an allocation failure or if the image is
-// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
-// with each pixel consisting of N interleaved 8-bit components; the first
-// pixel pointed to is top-left-most in the image. There is no padding between
-// image scanlines or between pixels, regardless of format. The number of
-// components N is 'desired_channels' if desired_channels is non-zero, or
-// *channels_in_file otherwise. If desired_channels is non-zero,
-// *channels_in_file has the number of components that _would_ have been
-// output otherwise. E.g. if you set desired_channels to 4, you will always
-// get RGBA output, but you can check *channels_in_file to see if it's trivially
-// opaque because e.g. there were only 3 channels in the source image.
-//
-// An output image with N components has the following components interleaved
-// in this order in each pixel:
-//
-//     N=#comp     components
-//       1           grey
-//       2           grey, alpha
-//       3           red, green, blue
-//       4           red, green, blue, alpha
-//
-// If image loading fails for any reason, the return value will be NULL,
-// and *x, *y, *channels_in_file will be unchanged. The function
-// stbi_failure_reason() can be queried for an extremely brief, end-user
-// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
-// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
-// more user-friendly ones.
-//
-// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
-//
-// To query the width, height and component count of an image without having to
-// decode the full file, you can use the stbi_info family of functions:
-//
-//   int x,y,n,ok;
-//   ok = stbi_info(filename, &x, &y, &n);
-//   // returns ok=1 and sets x, y, n if image is a supported format,
-//   // 0 otherwise.
-//
-// Note that stb_image pervasively uses ints in its public API for sizes,
-// including sizes of memory buffers. This is now part of the API and thus
-// hard to change without causing breakage. As a result, the various image
-// loaders all have certain limits on image size; these differ somewhat
-// by format but generally boil down to either just under 2GB or just under
-// 1GB. When the decoded image would be larger than this, stb_image decoding
-// will fail.
-//
-// Additionally, stb_image will reject image files that have any of their
-// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
-// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
-// the only way to have an image with such dimensions load correctly
-// is for it to have a rather extreme aspect ratio. Either way, the
-// assumption here is that such larger images are likely to be malformed
-// or malicious. If you do need to load an image with individual dimensions
-// larger than that, and it still fits in the overall size limit, you can
-// #define STBI_MAX_DIMENSIONS on your own to be something larger.
-//
-// ===========================================================================
-//
-// UNICODE:
-//
-//   If compiling for Windows and you wish to use Unicode filenames, compile
-//   with
-//       #define STBI_WINDOWS_UTF8
-//   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
-//   Windows wchar_t filenames to utf8.
-//
-// ===========================================================================
-//
-// Philosophy
-//
-// stb libraries are designed with the following priorities:
-//
-//    1. easy to use
-//    2. easy to maintain
-//    3. good performance
-//
-// Sometimes I let "good performance" creep up in priority over "easy to maintain",
-// and for best performance I may provide less-easy-to-use APIs that give higher
-// performance, in addition to the easy-to-use ones. Nevertheless, it's important
-// to keep in mind that from the standpoint of you, a client of this library,
-// all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
-//
-// Some secondary priorities arise directly from the first two, some of which
-// provide more explicit reasons why performance can't be emphasized.
-//
-//    - Portable ("ease of use")
-//    - Small source code footprint ("easy to maintain")
-//    - No dependencies ("ease of use")
-//
-// ===========================================================================
-//
-// I/O callbacks
-//
-// I/O callbacks allow you to read from arbitrary sources, like packaged
-// files or some other source. Data read from callbacks are processed
-// through a small internal buffer (currently 128 bytes) to try to reduce
-// overhead.
-//
-// The three functions you must define are "read" (reads some bytes of data),
-// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
-//
-// ===========================================================================
-//
-// SIMD support
-//
-// The JPEG decoder will try to automatically use SIMD kernels on x86 when
-// supported by the compiler. For ARM Neon support, you must explicitly
-// request it.
-//
-// (The old do-it-yourself SIMD API is no longer supported in the current
-// code.)
-//
-// On x86, SSE2 will automatically be used when available based on a run-time
-// test; if not, the generic C versions are used as a fall-back. On ARM targets,
-// the typical path is to have separate builds for NEON and non-NEON devices
-// (at least this is true for iOS and Android). Therefore, the NEON support is
-// toggled by a build flag: define STBI_NEON to get NEON loops.
-//
-// If for some reason you do not want to use any of SIMD code, or if
-// you have issues compiling it, you can disable it entirely by
-// defining STBI_NO_SIMD.
-//
-// ===========================================================================
-//
-// HDR image support   (disable by defining STBI_NO_HDR)
-//
-// stb_image supports loading HDR images in general, and currently the Radiance
-// .HDR file format specifically. You can still load any file through the existing
-// interface; if you attempt to load an HDR file, it will be automatically remapped
-// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
-// both of these constants can be reconfigured through this interface:
-//
-//     stbi_hdr_to_ldr_gamma(2.2f);
-//     stbi_hdr_to_ldr_scale(1.0f);
-//
-// (note, do not use _inverse_ constants; stbi_image will invert them
-// appropriately).
-//
-// Additionally, there is a new, parallel interface for loading files as
-// (linear) floats to preserve the full dynamic range:
-//
-//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
-//
-// If you load LDR images through this interface, those images will
-// be promoted to floating point values, run through the inverse of
-// constants corresponding to the above:
-//
-//     stbi_ldr_to_hdr_scale(1.0f);
-//     stbi_ldr_to_hdr_gamma(2.2f);
-//
-// Finally, given a filename (or an open file or memory block--see header
-// file for details) containing image data, you can query for the "most
-// appropriate" interface to use (that is, whether the image is HDR or
-// not), using:
-//
-//     stbi_is_hdr(char *filename);
-//
-// ===========================================================================
-//
-// iPhone PNG support:
-//
-// We optionally support converting iPhone-formatted PNGs (which store
-// premultiplied BGRA) back to RGB, even though they're internally encoded
-// differently. To enable this conversion, call
-// stbi_convert_iphone_png_to_rgb(1).
-//
-// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
-// pixel to remove any premultiplied alpha *only* if the image file explicitly
-// says there's premultiplied data (currently only happens in iPhone images,
-// and only if iPhone convert-to-rgb processing is on).
-//
-// ===========================================================================
-//
-// ADDITIONAL CONFIGURATION
-//
-//  - You can suppress implementation of any of the decoders to reduce
-//    your code footprint by #defining one or more of the following
-//    symbols before creating the implementation.
-//
-//        STBI_NO_JPEG
-//        STBI_NO_PNG
-//        STBI_NO_BMP
-//        STBI_NO_PSD
-//        STBI_NO_TGA
-//        STBI_NO_GIF
-//        STBI_NO_HDR
-//        STBI_NO_PIC
-//        STBI_NO_PNM   (.ppm and .pgm)
-//
-//  - You can request *only* certain decoders and suppress all other ones
-//    (this will be more forward-compatible, as addition of new decoders
-//    doesn't require you to disable them explicitly):
-//
-//        STBI_ONLY_JPEG
-//        STBI_ONLY_PNG
-//        STBI_ONLY_BMP
-//        STBI_ONLY_PSD
-//        STBI_ONLY_TGA
-//        STBI_ONLY_GIF
-//        STBI_ONLY_HDR
-//        STBI_ONLY_PIC
-//        STBI_ONLY_PNM   (.ppm and .pgm)
-//
-//   - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
-//     want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
-//
-//  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
-//    than that size (in either width or height) without further processing.
-//    This is to let programs in the wild set an upper bound to prevent
-//    denial-of-service attacks on untrusted data, as one could generate a
-//    valid image of gigantic dimensions and force stb_image to allocate a
-//    huge block of memory and spend disproportionate time decoding it. By
-//    default this is set to (1 << 24), which is 16777216, but that's still
-//    very big.
-
-#ifndef STBI_NO_STDIO
-#include <stdio.h>
-#endif // STBI_NO_STDIO
-
-#define STBI_VERSION 1
-
-enum
-{
-   STBI_default = 0, // only used for desired_channels
-
-   STBI_grey       = 1,
-   STBI_grey_alpha = 2,
-   STBI_rgb        = 3,
-   STBI_rgb_alpha  = 4
-};
-
-#include <stdlib.h>
-typedef unsigned char stbi_uc;
-typedef unsigned short stbi_us;
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#ifndef STBIDEF
-#ifdef STB_IMAGE_STATIC
-#define STBIDEF static
-#else
-#define STBIDEF extern
-#endif
-#endif
-
-//////////////////////////////////////////////////////////////////////////////
-//
-// PRIMARY API - works on images of any type
-//
-
-//
-// load image by filename, open file, or memory buffer
-//
-
-typedef struct
-{
-   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
-   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
-   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
-} stbi_io_callbacks;
-
-////////////////////////////////////
-//
-// 8-bits-per-channel interface
-//
-
-STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
-STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
-
-#ifndef STBI_NO_STDIO
-STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
-STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
-// for stbi_load_from_file, file pointer is left pointing immediately after image
-#endif
-
-#ifndef STBI_NO_GIF
-STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
-#endif
-
-#ifdef STBI_WINDOWS_UTF8
-STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
-#endif
-
-////////////////////////////////////
-//
-// 16-bits-per-channel interface
-//
-
-STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
-STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
-
-#ifndef STBI_NO_STDIO
-STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
-STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
-#endif
-
-////////////////////////////////////
-//
-// float-per-channel interface
-//
-#ifndef STBI_NO_LINEAR
-   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
-   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
-
-   #ifndef STBI_NO_STDIO
-   STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
-   STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
-   #endif
-#endif
-
-#ifndef STBI_NO_HDR
-   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
-   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
-#endif // STBI_NO_HDR
-
-#ifndef STBI_NO_LINEAR
-   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
-   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
-#endif // STBI_NO_LINEAR
-
-// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR
-STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
-STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
-#ifndef STBI_NO_STDIO
-STBIDEF int      stbi_is_hdr          (char const *filename);
-STBIDEF int      stbi_is_hdr_from_file(FILE *f);
-#endif // STBI_NO_STDIO
-
-
-// get a VERY brief reason for failure
-// on most compilers (and ALL modern mainstream compilers) this is threadsafe
-STBIDEF const char *stbi_failure_reason  (void);
-
-// free the loaded image -- this is just free()
-STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
-
-// get image dimensions & components without fully decoding
-STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
-STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
-STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
-STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
-
-#ifndef STBI_NO_STDIO
-STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
-STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
-STBIDEF int      stbi_is_16_bit          (char const *filename);
-STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
-#endif
-
-
-
-// for image formats that explicitly notate that they have premultiplied alpha,
-// we just return the colors as stored in the file. set this flag to force
-// unpremultiplication. results are undefined if the unpremultiply overflow.
-STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
-
-// indicate whether we should process iphone images back to canonical format,
-// or just pass them through "as-is"
-STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
-
-// flip the image vertically, so the first pixel in the output array is the bottom left
-STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
-
-// as above, but only applies to images loaded on the thread that calls the function
-// this function is only available if your compiler supports thread-local variables;
-// calling it will fail to link if your compiler doesn't
-STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
-STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
-STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
-
-// ZLIB client - used by PNG, available for other purposes
-
-STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
-STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
-STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
-STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
-
-STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
-STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
-
-
-#ifdef __cplusplus
-}
-#endif
-
-//
-//
-////   end header file   /////////////////////////////////////////////////////
-#endif // STBI_INCLUDE_STB_IMAGE_H
-
-#ifdef STB_IMAGE_IMPLEMENTATION
-
-#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
-  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
-  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
-  || defined(STBI_ONLY_ZLIB)
-   #ifndef STBI_ONLY_JPEG
-   #define STBI_NO_JPEG
-   #endif
-   #ifndef STBI_ONLY_PNG
-   #define STBI_NO_PNG
-   #endif
-   #ifndef STBI_ONLY_BMP
-   #define STBI_NO_BMP
-   #endif
-   #ifndef STBI_ONLY_PSD
-   #define STBI_NO_PSD
-   #endif
-   #ifndef STBI_ONLY_TGA
-   #define STBI_NO_TGA
-   #endif
-   #ifndef STBI_ONLY_GIF
-   #define STBI_NO_GIF
-   #endif
-   #ifndef STBI_ONLY_HDR
-   #define STBI_NO_HDR
-   #endif
-   #ifndef STBI_ONLY_PIC
-   #define STBI_NO_PIC
-   #endif
-   #ifndef STBI_ONLY_PNM
-   #define STBI_NO_PNM
-   #endif
-#endif
-
-#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
-#define STBI_NO_ZLIB
-#endif
-
-
-#include <stdarg.h>
-#include <stddef.h> // ptrdiff_t on osx
-#include <stdlib.h>
-#include <string.h>
-#include <limits.h>
-
-#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
-#include <math.h>  // ldexp, pow
-#endif
-
-#ifndef STBI_NO_STDIO
-#include <stdio.h>
-#endif
-
-#ifndef STBI_ASSERT
-#include <assert.h>
-#define STBI_ASSERT(x) assert(x)
-#endif
-
-#ifdef __cplusplus
-#define STBI_EXTERN extern "C"
-#else
-#define STBI_EXTERN extern
-#endif
-
-
-#ifndef _MSC_VER
-   #ifdef __cplusplus
-   #define stbi_inline inline
-   #else
-   #define stbi_inline
-   #endif
-#else
-   #define stbi_inline __forceinline
-#endif
-
-#ifndef STBI_NO_THREAD_LOCALS
-   #if defined(__cplusplus) &&  __cplusplus >= 201103L
-      #define STBI_THREAD_LOCAL       thread_local
-   #elif defined(__GNUC__) && __GNUC__ < 5
-      #define STBI_THREAD_LOCAL       __thread
-   #elif defined(_MSC_VER)
-      #define STBI_THREAD_LOCAL       __declspec(thread)
-   #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
-      #define STBI_THREAD_LOCAL       _Thread_local
-   #endif
-
-   #ifndef STBI_THREAD_LOCAL
-      #if defined(__GNUC__)
-        #define STBI_THREAD_LOCAL       __thread
-      #endif
-   #endif
-#endif
-
-#if defined(_MSC_VER) || defined(__SYMBIAN32__)
-typedef unsigned short stbi__uint16;
-typedef   signed short stbi__int16;
-typedef unsigned int   stbi__uint32;
-typedef   signed int   stbi__int32;
-#else
-#include <stdint.h>
-typedef uint16_t stbi__uint16;
-typedef int16_t  stbi__int16;
-typedef uint32_t stbi__uint32;
-typedef int32_t  stbi__int32;
-#endif
-
-// should produce compiler error if size is wrong
-typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
-
-#ifdef _MSC_VER
-#define STBI_NOTUSED(v)  (void)(v)
-#else
-#define STBI_NOTUSED(v)  (void)sizeof(v)
-#endif
-
-#ifdef _MSC_VER
-#define STBI_HAS_LROTL
-#endif
-
-#ifdef STBI_HAS_LROTL
-   #define stbi_lrot(x,y)  _lrotl(x,y)
-#else
-   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
-#endif
-
-#if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
-// ok
-#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
-// ok
-#else
-#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
-#endif
-
-#ifndef STBI_MALLOC
-#define STBI_MALLOC(sz)           malloc(sz)
-#define STBI_REALLOC(p,newsz)     realloc(p,newsz)
-#define STBI_FREE(p)              free(p)
-#endif
-
-#ifndef STBI_REALLOC_SIZED
-#define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
-#endif
-
-// x86/x64 detection
-#if defined(__x86_64__) || defined(_M_X64)
-#define STBI__X64_TARGET
-#elif defined(__i386) || defined(_M_IX86)
-#define STBI__X86_TARGET
-#endif
-
-#if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
-// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
-// which in turn means it gets to use SSE2 everywhere. This is unfortunate,
-// but previous attempts to provide the SSE2 functions with runtime
-// detection caused numerous issues. The way architecture extensions are
-// exposed in GCC/Clang is, sadly, not really suited for one-file libs.
-// New behavior: if compiled with -msse2, we use SSE2 without any
-// detection; if not, we don't use it at all.
-#define STBI_NO_SIMD
-#endif
-
-#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
-// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
-//
-// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
-// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
-// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
-// simultaneously enabling "-mstackrealign".
-//
-// See https://github.com/nothings/stb/issues/81 for more information.
-//
-// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
-// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
-#define STBI_NO_SIMD
-#endif
-
-#if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
-#define STBI_SSE2
-#include <emmintrin.h>
-
-#ifdef _MSC_VER
-
-#if _MSC_VER >= 1400  // not VC6
-#include <intrin.h> // __cpuid
-static int stbi__cpuid3(void)
-{
-   int info[4];
-   __cpuid(info,1);
-   return info[3];
-}
-#else
-static int stbi__cpuid3(void)
-{
-   int res;
-   __asm {
-      mov  eax,1
-      cpuid
-      mov  res,edx
-   }
-   return res;
-}
-#endif
-
-#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
-
-#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
-static int stbi__sse2_available(void)
-{
-   int info3 = stbi__cpuid3();
-   return ((info3 >> 26) & 1) != 0;
-}
-#endif
-
-#else // assume GCC-style if not VC++
-#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
-
-#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
-static int stbi__sse2_available(void)
-{
-   // If we're even attempting to compile this on GCC/Clang, that means
-   // -msse2 is on, which means the compiler is allowed to use SSE2
-   // instructions at will, and so are we.
-   return 1;
-}
-#endif
-
-#endif
-#endif
-
-// ARM NEON
-#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
-#undef STBI_NEON
-#endif
-
-#ifdef STBI_NEON
-#include <arm_neon.h>
-#ifdef _MSC_VER
-#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
-#else
-#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
-#endif
-#endif
-
-#ifndef STBI_SIMD_ALIGN
-#define STBI_SIMD_ALIGN(type, name) type name
-#endif
-
-#ifndef STBI_MAX_DIMENSIONS
-#define STBI_MAX_DIMENSIONS (1 << 24)
-#endif
-
-///////////////////////////////////////////////
-//
-//  stbi__context struct and start_xxx functions
-
-// stbi__context structure is our basic context used by all images, so it
-// contains all the IO context, plus some basic image information
-typedef struct
-{
-   stbi__uint32 img_x, img_y;
-   int img_n, img_out_n;
-
-   stbi_io_callbacks io;
-   void *io_user_data;
-
-   int read_from_callbacks;
-   int buflen;
-   stbi_uc buffer_start[128];
-   int callback_already_read;
-
-   stbi_uc *img_buffer, *img_buffer_end;
-   stbi_uc *img_buffer_original, *img_buffer_original_end;
-} stbi__context;
-
-
-static void stbi__refill_buffer(stbi__context *s);
-
-// initialize a memory-decode context
-static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
-{
-   s->io.read = NULL;
-   s->read_from_callbacks = 0;
-   s->callback_already_read = 0;
-   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
-   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
-}
-
-// initialize a callback-based context
-static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
-{
-   s->io = *c;
-   s->io_user_data = user;
-   s->buflen = sizeof(s->buffer_start);
-   s->read_from_callbacks = 1;
-   s->callback_already_read = 0;
-   s->img_buffer = s->img_buffer_original = s->buffer_start;
-   stbi__refill_buffer(s);
-   s->img_buffer_original_end = s->img_buffer_end;
-}
-
-#ifndef STBI_NO_STDIO
-
-static int stbi__stdio_read(void *user, char *data, int size)
-{
-   return (int) fread(data,1,size,(FILE*) user);
-}
-
-static void stbi__stdio_skip(void *user, int n)
-{
-   int ch;
-   fseek((FILE*) user, n, SEEK_CUR);
-   ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
-   if (ch != EOF) {
-      ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
-   }
-}
-
-static int stbi__stdio_eof(void *user)
-{
-   return feof((FILE*) user) || ferror((FILE *) user);
-}
-
-static stbi_io_callbacks stbi__stdio_callbacks =
-{
-   stbi__stdio_read,
-   stbi__stdio_skip,
-   stbi__stdio_eof,
-};
-
-static void stbi__start_file(stbi__context *s, FILE *f)
-{
-   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
-}
-
-//static void stop_file(stbi__context *s) { }
-
-#endif // !STBI_NO_STDIO
-
-static void stbi__rewind(stbi__context *s)
-{
-   // conceptually rewind SHOULD rewind to the beginning of the stream,
-   // but we just rewind to the beginning of the initial buffer, because
-   // we only use it after doing 'test', which only ever looks at at most 92 bytes
-   s->img_buffer = s->img_buffer_original;
-   s->img_buffer_end = s->img_buffer_original_end;
-}
-
-enum
-{
-   STBI_ORDER_RGB,
-   STBI_ORDER_BGR
-};
-
-typedef struct
-{
-   int bits_per_channel;
-   int num_channels;
-   int channel_order;
-} stbi__result_info;
-
-#ifndef STBI_NO_JPEG
-static int      stbi__jpeg_test(stbi__context *s);
-static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
-#endif
-
-#ifndef STBI_NO_PNG
-static int      stbi__png_test(stbi__context *s);
-static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
-static int      stbi__png_is16(stbi__context *s);
-#endif
-
-#ifndef STBI_NO_BMP
-static int      stbi__bmp_test(stbi__context *s);
-static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
-#endif
-
-#ifndef STBI_NO_TGA
-static int      stbi__tga_test(stbi__context *s);
-static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
-#endif
-
-#ifndef STBI_NO_PSD
-static int      stbi__psd_test(stbi__context *s);
-static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
-static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
-static int      stbi__psd_is16(stbi__context *s);
-#endif
-
-#ifndef STBI_NO_HDR
-static int      stbi__hdr_test(stbi__context *s);
-static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
-#endif
-
-#ifndef STBI_NO_PIC
-static int      stbi__pic_test(stbi__context *s);
-static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
-#endif
-
-#ifndef STBI_NO_GIF
-static int      stbi__gif_test(stbi__context *s);
-static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
-static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
-#endif
-
-#ifndef STBI_NO_PNM
-static int      stbi__pnm_test(stbi__context *s);
-static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
-static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
-static int      stbi__pnm_is16(stbi__context *s);
-#endif
-
-static
-#ifdef STBI_THREAD_LOCAL
-STBI_THREAD_LOCAL
-#endif
-const char *stbi__g_failure_reason;
-
-STBIDEF const char *stbi_failure_reason(void)
-{
-   return stbi__g_failure_reason;
-}
-
-#ifndef STBI_NO_FAILURE_STRINGS
-static int stbi__err(const char *str)
-{
-   stbi__g_failure_reason = str;
-   return 0;
-}
-#endif
-
-static void *stbi__malloc(size_t size)
-{
-    return STBI_MALLOC(size);
-}
-
-// stb_image uses ints pervasively, including for offset calculations.
-// therefore the largest decoded image size we can support with the
-// current code, even on 64-bit targets, is INT_MAX. this is not a
-// significant limitation for the intended use case.
-//
-// we do, however, need to make sure our size calculations don't
-// overflow. hence a few helper functions for size calculations that
-// multiply integers together, making sure that they're non-negative
-// and no overflow occurs.
-
-// return 1 if the sum is valid, 0 on overflow.
-// negative terms are considered invalid.
-static int stbi__addsizes_valid(int a, int b)
-{
-   if (b < 0) return 0;
-   // now 0 <= b <= INT_MAX, hence also
-   // 0 <= INT_MAX - b <= INTMAX.
-   // And "a + b <= INT_MAX" (which might overflow) is the
-   // same as a <= INT_MAX - b (no overflow)
-   return a <= INT_MAX - b;
-}
-
-// returns 1 if the product is valid, 0 on overflow.
-// negative factors are considered invalid.
-static int stbi__mul2sizes_valid(int a, int b)
-{
-   if (a < 0 || b < 0) return 0;
-   if (b == 0) return 1; // mul-by-0 is always safe
-   // portable way to check for no overflows in a*b
-   return a <= INT_MAX/b;
-}
-
-#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
-// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
-static int stbi__mad2sizes_valid(int a, int b, int add)
-{
-   return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
-}
-#endif
-
-// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
-static int stbi__mad3sizes_valid(int a, int b, int c, int add)
-{
-   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
-      stbi__addsizes_valid(a*b*c, add);
-}
-
-// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
-#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
-static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
-{
-   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
-      stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
-}
-#endif
-
-#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
-// mallocs with size overflow checking
-static void *stbi__malloc_mad2(int a, int b, int add)
-{
-   if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
-   return stbi__malloc(a*b + add);
-}
-#endif
-
-static void *stbi__malloc_mad3(int a, int b, int c, int add)
-{
-   if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
-   return stbi__malloc(a*b*c + add);
-}
-
-#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
-static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
-{
-   if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
-   return stbi__malloc(a*b*c*d + add);
-}
-#endif
-
-// returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow.
-static int stbi__addints_valid(int a, int b)
-{
-   if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow
-   if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0.
-   return a <= INT_MAX - b;
-}
-
-// returns 1 if the product of two ints fits in a signed short, 0 on overflow.
-static int stbi__mul2shorts_valid(int a, int b)
-{
-   if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow
-   if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid
-   if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN
-   return a >= SHRT_MIN / b;
-}
-
-// stbi__err - error
-// stbi__errpf - error returning pointer to float
-// stbi__errpuc - error returning pointer to unsigned char
-
-#ifdef STBI_NO_FAILURE_STRINGS
-   #define stbi__err(x,y)  0
-#elif defined(STBI_FAILURE_USERMSG)
-   #define stbi__err(x,y)  stbi__err(y)
-#else
-   #define stbi__err(x,y)  stbi__err(x)
-#endif
-
-#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
-#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
-
-STBIDEF void stbi_image_free(void *retval_from_stbi_load)
-{
-   STBI_FREE(retval_from_stbi_load);
-}
-
-#ifndef STBI_NO_LINEAR
-static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
-#endif
-
-#ifndef STBI_NO_HDR
-static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
-#endif
-
-static int stbi__vertically_flip_on_load_global = 0;
-
-STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
-{
-   stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
-}
-
-#ifndef STBI_THREAD_LOCAL
-#define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
-#else
-static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
-
-STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
-{
-   stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
-   stbi__vertically_flip_on_load_set = 1;
-}
-
-#define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
-                                         ? stbi__vertically_flip_on_load_local  \
-                                         : stbi__vertically_flip_on_load_global)
-#endif // STBI_THREAD_LOCAL
-
-static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
-{
-   memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
-   ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
-   ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
-   ri->num_channels = 0;
-
-   // test the formats with a very explicit header first (at least a FOURCC
-   // or distinctive magic number first)
-   #ifndef STBI_NO_PNG
-   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
-   #endif
-   #ifndef STBI_NO_BMP
-   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
-   #endif
-   #ifndef STBI_NO_GIF
-   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
-   #endif
-   #ifndef STBI_NO_PSD
-   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
-   #else
-   STBI_NOTUSED(bpc);
-   #endif
-   #ifndef STBI_NO_PIC
-   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
-   #endif
-
-   // then the formats that can end up attempting to load with just 1 or 2
-   // bytes matching expectations; these are prone to false positives, so
-   // try them later
-   #ifndef STBI_NO_JPEG
-   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
-   #endif
-   #ifndef STBI_NO_PNM
-   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
-   #endif
-
-   #ifndef STBI_NO_HDR
-   if (stbi__hdr_test(s)) {
-      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
-      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
-   }
-   #endif
-
-   #ifndef STBI_NO_TGA
-   // test tga last because it's a crappy test!
-   if (stbi__tga_test(s))
-      return stbi__tga_load(s,x,y,comp,req_comp, ri);
-   #endif
-
-   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
-}
-
-static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
-{
-   int i;
-   int img_len = w * h * channels;
-   stbi_uc *reduced;
-
-   reduced = (stbi_uc *) stbi__malloc(img_len);
-   if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
-
-   for (i = 0; i < img_len; ++i)
-      reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top half of each byte is sufficient approx of 16->8 bit scaling
-
-   STBI_FREE(orig);
-   return reduced;
-}
-
-static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
-{
-   int i;
-   int img_len = w * h * channels;
-   stbi__uint16 *enlarged;
-
-   enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
-   if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
-
-   for (i = 0; i < img_len; ++i)
-      enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
-
-   STBI_FREE(orig);
-   return enlarged;
-}
-
-static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
-{
-   int row;
-   size_t bytes_per_row = (size_t)w * bytes_per_pixel;
-   stbi_uc temp[2048];
-   stbi_uc *bytes = (stbi_uc *)image;
-
-   for (row = 0; row < (h>>1); row++) {
-      stbi_uc *row0 = bytes + row*bytes_per_row;
-      stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
-      // swap row0 with row1
-      size_t bytes_left = bytes_per_row;
-      while (bytes_left) {
-         size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
-         memcpy(temp, row0, bytes_copy);
-         memcpy(row0, row1, bytes_copy);
-         memcpy(row1, temp, bytes_copy);
-         row0 += bytes_copy;
-         row1 += bytes_copy;
-         bytes_left -= bytes_copy;
-      }
-   }
-}
-
-#ifndef STBI_NO_GIF
-static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
-{
-   int slice;
-   int slice_size = w * h * bytes_per_pixel;
-
-   stbi_uc *bytes = (stbi_uc *)image;
-   for (slice = 0; slice < z; ++slice) {
-      stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
-      bytes += slice_size;
-   }
-}
-#endif
-
-static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__result_info ri;
-   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
-
-   if (result == NULL)
-      return NULL;
-
-   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
-   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
-
-   if (ri.bits_per_channel != 8) {
-      result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
-      ri.bits_per_channel = 8;
-   }
-
-   // @TODO: move stbi__convert_format to here
-
-   if (stbi__vertically_flip_on_load) {
-      int channels = req_comp ? req_comp : *comp;
-      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
-   }
-
-   return (unsigned char *) result;
-}
-
-static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__result_info ri;
-   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
-
-   if (result == NULL)
-      return NULL;
-
-   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
-   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
-
-   if (ri.bits_per_channel != 16) {
-      result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
-      ri.bits_per_channel = 16;
-   }
-
-   // @TODO: move stbi__convert_format16 to here
-   // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
-
-   if (stbi__vertically_flip_on_load) {
-      int channels = req_comp ? req_comp : *comp;
-      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
-   }
-
-   return (stbi__uint16 *) result;
-}
-
-#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
-static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
-{
-   if (stbi__vertically_flip_on_load && result != NULL) {
-      int channels = req_comp ? req_comp : *comp;
-      stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
-   }
-}
-#endif
-
-#ifndef STBI_NO_STDIO
-
-#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
-STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
-STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
-#endif
-
-#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
-STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
-{
-	return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
-}
-#endif
-
-static FILE *stbi__fopen(char const *filename, char const *mode)
-{
-   FILE *f;
-#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
-   wchar_t wMode[64];
-   wchar_t wFilename[1024];
-	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
-      return 0;
-
-	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
-      return 0;
-
-#if defined(_MSC_VER) && _MSC_VER >= 1400
-	if (0 != _wfopen_s(&f, wFilename, wMode))
-		f = 0;
-#else
-   f = _wfopen(wFilename, wMode);
-#endif
-
-#elif defined(_MSC_VER) && _MSC_VER >= 1400
-   if (0 != fopen_s(&f, filename, mode))
-      f=0;
-#else
-   f = fopen(filename, mode);
-#endif
-   return f;
-}
-
-
-STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
-{
-   FILE *f = stbi__fopen(filename, "rb");
-   unsigned char *result;
-   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
-   result = stbi_load_from_file(f,x,y,comp,req_comp);
-   fclose(f);
-   return result;
-}
-
-STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
-{
-   unsigned char *result;
-   stbi__context s;
-   stbi__start_file(&s,f);
-   result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
-   if (result) {
-      // need to 'unget' all the characters in the IO buffer
-      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
-   }
-   return result;
-}
-
-STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__uint16 *result;
-   stbi__context s;
-   stbi__start_file(&s,f);
-   result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
-   if (result) {
-      // need to 'unget' all the characters in the IO buffer
-      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
-   }
-   return result;
-}
-
-STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
-{
-   FILE *f = stbi__fopen(filename, "rb");
-   stbi__uint16 *result;
-   if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
-   result = stbi_load_from_file_16(f,x,y,comp,req_comp);
-   fclose(f);
-   return result;
-}
-
-
-#endif //!STBI_NO_STDIO
-
-STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
-{
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
-}
-
-STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
-{
-   stbi__context s;
-   stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
-   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
-}
-
-STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
-}
-
-STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__context s;
-   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
-   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
-}
-
-#ifndef STBI_NO_GIF
-STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
-{
-   unsigned char *result;
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-
-   result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
-   if (stbi__vertically_flip_on_load) {
-      stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
-   }
-
-   return result;
-}
-#endif
-
-#ifndef STBI_NO_LINEAR
-static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
-{
-   unsigned char *data;
-   #ifndef STBI_NO_HDR
-   if (stbi__hdr_test(s)) {
-      stbi__result_info ri;
-      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
-      if (hdr_data)
-         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
-      return hdr_data;
-   }
-   #endif
-   data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
-   if (data)
-      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
-   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
-}
-
-STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-   return stbi__loadf_main(&s,x,y,comp,req_comp);
-}
-
-STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__context s;
-   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
-   return stbi__loadf_main(&s,x,y,comp,req_comp);
-}
-
-#ifndef STBI_NO_STDIO
-STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
-{
-   float *result;
-   FILE *f = stbi__fopen(filename, "rb");
-   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
-   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
-   fclose(f);
-   return result;
-}
-
-STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
-{
-   stbi__context s;
-   stbi__start_file(&s,f);
-   return stbi__loadf_main(&s,x,y,comp,req_comp);
-}
-#endif // !STBI_NO_STDIO
-
-#endif // !STBI_NO_LINEAR
-
-// these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
-// defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
-// reports false!
-
-STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
-{
-   #ifndef STBI_NO_HDR
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-   return stbi__hdr_test(&s);
-   #else
-   STBI_NOTUSED(buffer);
-   STBI_NOTUSED(len);
-   return 0;
-   #endif
-}
-
-#ifndef STBI_NO_STDIO
-STBIDEF int      stbi_is_hdr          (char const *filename)
-{
-   FILE *f = stbi__fopen(filename, "rb");
-   int result=0;
-   if (f) {
-      result = stbi_is_hdr_from_file(f);
-      fclose(f);
-   }
-   return result;
-}
-
-STBIDEF int stbi_is_hdr_from_file(FILE *f)
-{
-   #ifndef STBI_NO_HDR
-   long pos = ftell(f);
-   int res;
-   stbi__context s;
-   stbi__start_file(&s,f);
-   res = stbi__hdr_test(&s);
-   fseek(f, pos, SEEK_SET);
-   return res;
-   #else
-   STBI_NOTUSED(f);
-   return 0;
-   #endif
-}
-#endif // !STBI_NO_STDIO
-
-STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
-{
-   #ifndef STBI_NO_HDR
-   stbi__context s;
-   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
-   return stbi__hdr_test(&s);
-   #else
-   STBI_NOTUSED(clbk);
-   STBI_NOTUSED(user);
-   return 0;
-   #endif
-}
-
-#ifndef STBI_NO_LINEAR
-static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
-
-STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
-STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
-#endif
-
-static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
-
-STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
-STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
-
-
-//////////////////////////////////////////////////////////////////////////////
-//
-// Common code used by all image loaders
-//
-
-enum
-{
-   STBI__SCAN_load=0,
-   STBI__SCAN_type,
-   STBI__SCAN_header
-};
-
-static void stbi__refill_buffer(stbi__context *s)
-{
-   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
-   s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
-   if (n == 0) {
-      // at end of file, treat same as if from memory, but need to handle case
-      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
-      s->read_from_callbacks = 0;
-      s->img_buffer = s->buffer_start;
-      s->img_buffer_end = s->buffer_start+1;
-      *s->img_buffer = 0;
-   } else {
-      s->img_buffer = s->buffer_start;
-      s->img_buffer_end = s->buffer_start + n;
-   }
-}
-
-stbi_inline static stbi_uc stbi__get8(stbi__context *s)
-{
-   if (s->img_buffer < s->img_buffer_end)
-      return *s->img_buffer++;
-   if (s->read_from_callbacks) {
-      stbi__refill_buffer(s);
-      return *s->img_buffer++;
-   }
-   return 0;
-}
-
-#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
-// nothing
-#else
-stbi_inline static int stbi__at_eof(stbi__context *s)
-{
-   if (s->io.read) {
-      if (!(s->io.eof)(s->io_user_data)) return 0;
-      // if feof() is true, check if buffer = end
-      // special case: we've only got the special 0 character at the end
-      if (s->read_from_callbacks == 0) return 1;
-   }
-
-   return s->img_buffer >= s->img_buffer_end;
-}
-#endif
-
-#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
-// nothing
-#else
-static void stbi__skip(stbi__context *s, int n)
-{
-   if (n == 0) return;  // already there!
-   if (n < 0) {
-      s->img_buffer = s->img_buffer_end;
-      return;
-   }
-   if (s->io.read) {
-      int blen = (int) (s->img_buffer_end - s->img_buffer);
-      if (blen < n) {
-         s->img_buffer = s->img_buffer_end;
-         (s->io.skip)(s->io_user_data, n - blen);
-         return;
-      }
-   }
-   s->img_buffer += n;
-}
-#endif
-
-#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
-// nothing
-#else
-static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
-{
-   if (s->io.read) {
-      int blen = (int) (s->img_buffer_end - s->img_buffer);
-      if (blen < n) {
-         int res, count;
-
-         memcpy(buffer, s->img_buffer, blen);
-
-         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
-         res = (count == (n-blen));
-         s->img_buffer = s->img_buffer_end;
-         return res;
-      }
-   }
-
-   if (s->img_buffer+n <= s->img_buffer_end) {
-      memcpy(buffer, s->img_buffer, n);
-      s->img_buffer += n;
-      return 1;
-   } else
-      return 0;
-}
-#endif
-
-#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
-// nothing
-#else
-static int stbi__get16be(stbi__context *s)
-{
-   int z = stbi__get8(s);
-   return (z << 8) + stbi__get8(s);
-}
-#endif
-
-#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
-// nothing
-#else
-static stbi__uint32 stbi__get32be(stbi__context *s)
-{
-   stbi__uint32 z = stbi__get16be(s);
-   return (z << 16) + stbi__get16be(s);
-}
-#endif
-
-#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
-// nothing
-#else
-static int stbi__get16le(stbi__context *s)
-{
-   int z = stbi__get8(s);
-   return z + (stbi__get8(s) << 8);
-}
-#endif
-
-#ifndef STBI_NO_BMP
-static stbi__uint32 stbi__get32le(stbi__context *s)
-{
-   stbi__uint32 z = stbi__get16le(s);
-   z += (stbi__uint32)stbi__get16le(s) << 16;
-   return z;
-}
-#endif
-
-#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
-
-#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
-// nothing
-#else
-//////////////////////////////////////////////////////////////////////////////
-//
-//  generic converter from built-in img_n to req_comp
-//    individual types do this automatically as much as possible (e.g. jpeg
-//    does all cases internally since it needs to colorspace convert anyway,
-//    and it never has alpha, so very few cases ). png can automatically
-//    interleave an alpha=255 channel, but falls back to this for other cases
-//
-//  assume data buffer is malloced, so malloc a new one and free that one
-//  only failure mode is malloc failing
-
-static stbi_uc stbi__compute_y(int r, int g, int b)
-{
-   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
-}
-#endif
-
-#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
-// nothing
-#else
-static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
-{
-   int i,j;
-   unsigned char *good;
-
-   if (req_comp == img_n) return data;
-   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
-
-   good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
-   if (good == NULL) {
-      STBI_FREE(data);
-      return stbi__errpuc("outofmem", "Out of memory");
-   }
-
-   for (j=0; j < (int) y; ++j) {
-      unsigned char *src  = data + j * x * img_n   ;
-      unsigned char *dest = good + j * x * req_comp;
-
-      #define STBI__COMBO(a,b)  ((a)*8+(b))
-      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
-      // convert source image with img_n components to one with req_comp components;
-      // avoid switch per pixel, so use switch per scanline and massive macros
-      switch (STBI__COMBO(img_n, req_comp)) {
-         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
-         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
-         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
-         STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
-         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
-         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
-         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
-         STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
-         STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
-         STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
-         STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
-         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
-         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
-      }
-      #undef STBI__CASE
-   }
-
-   STBI_FREE(data);
-   return good;
-}
-#endif
-
-#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
-// nothing
-#else
-static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
-{
-   return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
-}
-#endif
-
-#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
-// nothing
-#else
-static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
-{
-   int i,j;
-   stbi__uint16 *good;
-
-   if (req_comp == img_n) return data;
-   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
-
-   good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
-   if (good == NULL) {
-      STBI_FREE(data);
-      return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
-   }
-
-   for (j=0; j < (int) y; ++j) {
-      stbi__uint16 *src  = data + j * x * img_n   ;
-      stbi__uint16 *dest = good + j * x * req_comp;
-
-      #define STBI__COMBO(a,b)  ((a)*8+(b))
-      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
-      // convert source image with img_n components to one with req_comp components;
-      // avoid switch per pixel, so use switch per scanline and massive macros
-      switch (STBI__COMBO(img_n, req_comp)) {
-         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
-         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
-         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
-         STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
-         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
-         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
-         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
-         STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
-         STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
-         STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
-         STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
-         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
-         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
-      }
-      #undef STBI__CASE
-   }
-
-   STBI_FREE(data);
-   return good;
-}
-#endif
-
-#ifndef STBI_NO_LINEAR
-static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
-{
-   int i,k,n;
-   float *output;
-   if (!data) return NULL;
-   output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
-   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
-   // compute number of non-alpha components
-   if (comp & 1) n = comp; else n = comp-1;
-   for (i=0; i < x*y; ++i) {
-      for (k=0; k < n; ++k) {
-         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
-      }
-   }
-   if (n < comp) {
-      for (i=0; i < x*y; ++i) {
-         output[i*comp + n] = data[i*comp + n]/255.0f;
-      }
-   }
-   STBI_FREE(data);
-   return output;
-}
-#endif
-
-#ifndef STBI_NO_HDR
-#define stbi__float2int(x)   ((int) (x))
-static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
-{
-   int i,k,n;
-   stbi_uc *output;
-   if (!data) return NULL;
-   output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
-   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
-   // compute number of non-alpha components
-   if (comp & 1) n = comp; else n = comp-1;
-   for (i=0; i < x*y; ++i) {
-      for (k=0; k < n; ++k) {
-         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
-         if (z < 0) z = 0;
-         if (z > 255) z = 255;
-         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
-      }
-      if (k < comp) {
-         float z = data[i*comp+k] * 255 + 0.5f;
-         if (z < 0) z = 0;
-         if (z > 255) z = 255;
-         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
-      }
-   }
-   STBI_FREE(data);
-   return output;
-}
-#endif
-
-//////////////////////////////////////////////////////////////////////////////
-//
-//  "baseline" JPEG/JFIF decoder
-//
-//    simple implementation
-//      - doesn't support delayed output of y-dimension
-//      - simple interface (only one output format: 8-bit interleaved RGB)
-//      - doesn't try to recover corrupt jpegs
-//      - doesn't allow partial loading, loading multiple at once
-//      - still fast on x86 (copying globals into locals doesn't help x86)
-//      - allocates lots of intermediate memory (full size of all components)
-//        - non-interleaved case requires this anyway
-//        - allows good upsampling (see next)
-//    high-quality
-//      - upsampled channels are bilinearly interpolated, even across blocks
-//      - quality integer IDCT derived from IJG's 'slow'
-//    performance
-//      - fast huffman; reasonable integer IDCT
-//      - some SIMD kernels for common paths on targets with SSE2/NEON
-//      - uses a lot of intermediate memory, could cache poorly
-
-#ifndef STBI_NO_JPEG
-
-// huffman decoding acceleration
-#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
-
-typedef struct
-{
-   stbi_uc  fast[1 << FAST_BITS];
-   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
-   stbi__uint16 code[256];
-   stbi_uc  values[256];
-   stbi_uc  size[257];
-   unsigned int maxcode[18];
-   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
-} stbi__huffman;
-
-typedef struct
-{
-   stbi__context *s;
-   stbi__huffman huff_dc[4];
-   stbi__huffman huff_ac[4];
-   stbi__uint16 dequant[4][64];
-   stbi__int16 fast_ac[4][1 << FAST_BITS];
-
-// sizes for components, interleaved MCUs
-   int img_h_max, img_v_max;
-   int img_mcu_x, img_mcu_y;
-   int img_mcu_w, img_mcu_h;
-
-// definition of jpeg image component
-   struct
-   {
-      int id;
-      int h,v;
-      int tq;
-      int hd,ha;
-      int dc_pred;
-
-      int x,y,w2,h2;
-      stbi_uc *data;
-      void *raw_data, *raw_coeff;
-      stbi_uc *linebuf;
-      short   *coeff;   // progressive only
-      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
-   } img_comp[4];
-
-   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
-   int            code_bits;   // number of valid bits
-   unsigned char  marker;      // marker seen while filling entropy buffer
-   int            nomore;      // flag if we saw a marker so must stop
-
-   int            progressive;
-   int            spec_start;
-   int            spec_end;
-   int            succ_high;
-   int            succ_low;
-   int            eob_run;
-   int            jfif;
-   int            app14_color_transform; // Adobe APP14 tag
-   int            rgb;
-
-   int scan_n, order[4];
-   int restart_interval, todo;
-
-// kernels
-   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
-   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
-   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
-} stbi__jpeg;
-
-static int stbi__build_huffman(stbi__huffman *h, int *count)
-{
-   int i,j,k=0;
-   unsigned int code;
-   // build size list for each symbol (from JPEG spec)
-   for (i=0; i < 16; ++i) {
-      for (j=0; j < count[i]; ++j) {
-         h->size[k++] = (stbi_uc) (i+1);
-         if(k >= 257) return stbi__err("bad size list","Corrupt JPEG");
-      }
-   }
-   h->size[k] = 0;
-
-   // compute actual symbols (from jpeg spec)
-   code = 0;
-   k = 0;
-   for(j=1; j <= 16; ++j) {
-      // compute delta to add to code to compute symbol id
-      h->delta[j] = k - code;
-      if (h->size[k] == j) {
-         while (h->size[k] == j)
-            h->code[k++] = (stbi__uint16) (code++);
-         if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
-      }
-      // compute largest code + 1 for this size, preshifted as needed later
-      h->maxcode[j] = code << (16-j);
-      code <<= 1;
-   }
-   h->maxcode[j] = 0xffffffff;
-
-   // build non-spec acceleration table; 255 is flag for not-accelerated
-   memset(h->fast, 255, 1 << FAST_BITS);
-   for (i=0; i < k; ++i) {
-      int s = h->size[i];
-      if (s <= FAST_BITS) {
-         int c = h->code[i] << (FAST_BITS-s);
-         int m = 1 << (FAST_BITS-s);
-         for (j=0; j < m; ++j) {
-            h->fast[c+j] = (stbi_uc) i;
-         }
-      }
-   }
-   return 1;
-}
-
-// build a table that decodes both magnitude and value of small ACs in
-// one go.
-static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
-{
-   int i;
-   for (i=0; i < (1 << FAST_BITS); ++i) {
-      stbi_uc fast = h->fast[i];
-      fast_ac[i] = 0;
-      if (fast < 255) {
-         int rs = h->values[fast];
-         int run = (rs >> 4) & 15;
-         int magbits = rs & 15;
-         int len = h->size[fast];
-
-         if (magbits && len + magbits <= FAST_BITS) {
-            // magnitude code followed by receive_extend code
-            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
-            int m = 1 << (magbits - 1);
-            if (k < m) k += (~0U << magbits) + 1;
-            // if the result is small enough, we can fit it in fast_ac table
-            if (k >= -128 && k <= 127)
-               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
-         }
-      }
-   }
-}
-
-static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
-{
-   do {
-      unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
-      if (b == 0xff) {
-         int c = stbi__get8(j->s);
-         while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
-         if (c != 0) {
-            j->marker = (unsigned char) c;
-            j->nomore = 1;
-            return;
-         }
-      }
-      j->code_buffer |= b << (24 - j->code_bits);
-      j->code_bits += 8;
-   } while (j->code_bits <= 24);
-}
-
-// (1 << n) - 1
-static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
-
-// decode a jpeg huffman value from the bitstream
-stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
-{
-   unsigned int temp;
-   int c,k;
-
-   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
-
-   // look at the top FAST_BITS and determine what symbol ID it is,
-   // if the code is <= FAST_BITS
-   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
-   k = h->fast[c];
-   if (k < 255) {
-      int s = h->size[k];
-      if (s > j->code_bits)
-         return -1;
-      j->code_buffer <<= s;
-      j->code_bits -= s;
-      return h->values[k];
-   }
-
-   // naive test is to shift the code_buffer down so k bits are
-   // valid, then test against maxcode. To speed this up, we've
-   // preshifted maxcode left so that it has (16-k) 0s at the
-   // end; in other words, regardless of the number of bits, it
-   // wants to be compared against something shifted to have 16;
-   // that way we don't need to shift inside the loop.
-   temp = j->code_buffer >> 16;
-   for (k=FAST_BITS+1 ; ; ++k)
-      if (temp < h->maxcode[k])
-         break;
-   if (k == 17) {
-      // error! code not found
-      j->code_bits -= 16;
-      return -1;
-   }
-
-   if (k > j->code_bits)
-      return -1;
-
-   // convert the huffman code to the symbol id
-   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
-   if(c < 0 || c >= 256) // symbol id out of bounds!
-       return -1;
-   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
-
-   // convert the id to a symbol
-   j->code_bits -= k;
-   j->code_buffer <<= k;
-   return h->values[c];
-}
-
-// bias[n] = (-1<<n) + 1
-static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
-
-// combined JPEG 'receive' and JPEG 'extend', since baseline
-// always extends everything it receives.
-stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
-{
-   unsigned int k;
-   int sgn;
-   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
-   if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
-
-   sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative)
-   k = stbi_lrot(j->code_buffer, n);
-   j->code_buffer = k & ~stbi__bmask[n];
-   k &= stbi__bmask[n];
-   j->code_bits -= n;
-   return k + (stbi__jbias[n] & (sgn - 1));
-}
-
-// get some unsigned bits
-stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
-{
-   unsigned int k;
-   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
-   if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing
-   k = stbi_lrot(j->code_buffer, n);
-   j->code_buffer = k & ~stbi__bmask[n];
-   k &= stbi__bmask[n];
-   j->code_bits -= n;
-   return k;
-}
-
-stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
-{
-   unsigned int k;
-   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
-   if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s intead of continuing
-   k = j->code_buffer;
-   j->code_buffer <<= 1;
-   --j->code_bits;
-   return k & 0x80000000;
-}
-
-// given a value that's at position X in the zigzag stream,
-// where does it appear in the 8x8 matrix coded as row-major?
-static const stbi_uc stbi__jpeg_dezigzag[64+15] =
-{
-    0,  1,  8, 16,  9,  2,  3, 10,
-   17, 24, 32, 25, 18, 11,  4,  5,
-   12, 19, 26, 33, 40, 48, 41, 34,
-   27, 20, 13,  6,  7, 14, 21, 28,
-   35, 42, 49, 56, 57, 50, 43, 36,
-   29, 22, 15, 23, 30, 37, 44, 51,
-   58, 59, 52, 45, 38, 31, 39, 46,
-   53, 60, 61, 54, 47, 55, 62, 63,
-   // let corrupt input sample past end
-   63, 63, 63, 63, 63, 63, 63, 63,
-   63, 63, 63, 63, 63, 63, 63
-};
-
-// decode one 64-entry block--
-static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
-{
-   int diff,dc,k;
-   int t;
-
-   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
-   t = stbi__jpeg_huff_decode(j, hdc);
-   if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
-
-   // 0 all the ac values now so we can do it 32-bits at a time
-   memset(data,0,64*sizeof(data[0]));
-
-   diff = t ? stbi__extend_receive(j, t) : 0;
-   if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG");
-   dc = j->img_comp[b].dc_pred + diff;
-   j->img_comp[b].dc_pred = dc;
-   if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
-   data[0] = (short) (dc * dequant[0]);
-
-   // decode AC components, see JPEG spec
-   k = 1;
-   do {
-      unsigned int zig;
-      int c,r,s;
-      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
-      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
-      r = fac[c];
-      if (r) { // fast-AC path
-         k += (r >> 4) & 15; // run
-         s = r & 15; // combined length
-         if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
-         j->code_buffer <<= s;
-         j->code_bits -= s;
-         // decode into unzigzag'd location
-         zig = stbi__jpeg_dezigzag[k++];
-         data[zig] = (short) ((r >> 8) * dequant[zig]);
-      } else {
-         int rs = stbi__jpeg_huff_decode(j, hac);
-         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
-         s = rs & 15;
-         r = rs >> 4;
-         if (s == 0) {
-            if (rs != 0xf0) break; // end block
-            k += 16;
-         } else {
-            k += r;
-            // decode into unzigzag'd location
-            zig = stbi__jpeg_dezigzag[k++];
-            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
-         }
-      }
-   } while (k < 64);
-   return 1;
-}
-
-static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
-{
-   int diff,dc;
-   int t;
-   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
-
-   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
-
-   if (j->succ_high == 0) {
-      // first scan for DC coefficient, must be first
-      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
-      t = stbi__jpeg_huff_decode(j, hdc);
-      if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
-      diff = t ? stbi__extend_receive(j, t) : 0;
-
-      if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG");
-      dc = j->img_comp[b].dc_pred + diff;
-      j->img_comp[b].dc_pred = dc;
-      if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
-      data[0] = (short) (dc * (1 << j->succ_low));
-   } else {
-      // refinement scan for DC coefficient
-      if (stbi__jpeg_get_bit(j))
-         data[0] += (short) (1 << j->succ_low);
-   }
-   return 1;
-}
-
-// @OPTIMIZE: store non-zigzagged during the decode passes,
-// and only de-zigzag when dequantizing
-static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
-{
-   int k;
-   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
-
-   if (j->succ_high == 0) {
-      int shift = j->succ_low;
-
-      if (j->eob_run) {
-         --j->eob_run;
-         return 1;
-      }
-
-      k = j->spec_start;
-      do {
-         unsigned int zig;
-         int c,r,s;
-         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
-         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
-         r = fac[c];
-         if (r) { // fast-AC path
-            k += (r >> 4) & 15; // run
-            s = r & 15; // combined length
-            if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available");
-            j->code_buffer <<= s;
-            j->code_bits -= s;
-            zig = stbi__jpeg_dezigzag[k++];
-            data[zig] = (short) ((r >> 8) * (1 << shift));
-         } else {
-            int rs = stbi__jpeg_huff_decode(j, hac);
-            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
-            s = rs & 15;
-            r = rs >> 4;
-            if (s == 0) {
-               if (r < 15) {
-                  j->eob_run = (1 << r);
-                  if (r)
-                     j->eob_run += stbi__jpeg_get_bits(j, r);
-                  --j->eob_run;
-                  break;
-               }
-               k += 16;
-            } else {
-               k += r;
-               zig = stbi__jpeg_dezigzag[k++];
-               data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
-            }
-         }
-      } while (k <= j->spec_end);
-   } else {
-      // refinement scan for these AC coefficients
-
-      short bit = (short) (1 << j->succ_low);
-
-      if (j->eob_run) {
-         --j->eob_run;
-         for (k = j->spec_start; k <= j->spec_end; ++k) {
-            short *p = &data[stbi__jpeg_dezigzag[k]];
-            if (*p != 0)
-               if (stbi__jpeg_get_bit(j))
-                  if ((*p & bit)==0) {
-                     if (*p > 0)
-                        *p += bit;
-                     else
-                        *p -= bit;
-                  }
-         }
-      } else {
-         k = j->spec_start;
-         do {
-            int r,s;
-            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
-            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
-            s = rs & 15;
-            r = rs >> 4;
-            if (s == 0) {
-               if (r < 15) {
-                  j->eob_run = (1 << r) - 1;
-                  if (r)
-                     j->eob_run += stbi__jpeg_get_bits(j, r);
-                  r = 64; // force end of block
-               } else {
-                  // r=15 s=0 should write 16 0s, so we just do
-                  // a run of 15 0s and then write s (which is 0),
-                  // so we don't have to do anything special here
-               }
-            } else {
-               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
-               // sign bit
-               if (stbi__jpeg_get_bit(j))
-                  s = bit;
-               else
-                  s = -bit;
-            }
-
-            // advance by r
-            while (k <= j->spec_end) {
-               short *p = &data[stbi__jpeg_dezigzag[k++]];
-               if (*p != 0) {
-                  if (stbi__jpeg_get_bit(j))
-                     if ((*p & bit)==0) {
-                        if (*p > 0)
-                           *p += bit;
-                        else
-                           *p -= bit;
-                     }
-               } else {
-                  if (r == 0) {
-                     *p = (short) s;
-                     break;
-                  }
-                  --r;
-               }
-            }
-         } while (k <= j->spec_end);
-      }
-   }
-   return 1;
-}
-
-// take a -128..127 value and stbi__clamp it and convert to 0..255
-stbi_inline static stbi_uc stbi__clamp(int x)
-{
-   // trick to use a single test to catch both cases
-   if ((unsigned int) x > 255) {
-      if (x < 0) return 0;
-      if (x > 255) return 255;
-   }
-   return (stbi_uc) x;
-}
-
-#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
-#define stbi__fsh(x)  ((x) * 4096)
-
-// derived from jidctint -- DCT_ISLOW
-#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
-   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
-   p2 = s2;                                    \
-   p3 = s6;                                    \
-   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
-   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
-   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
-   p2 = s0;                                    \
-   p3 = s4;                                    \
-   t0 = stbi__fsh(p2+p3);                      \
-   t1 = stbi__fsh(p2-p3);                      \
-   x0 = t0+t3;                                 \
-   x3 = t0-t3;                                 \
-   x1 = t1+t2;                                 \
-   x2 = t1-t2;                                 \
-   t0 = s7;                                    \
-   t1 = s5;                                    \
-   t2 = s3;                                    \
-   t3 = s1;                                    \
-   p3 = t0+t2;                                 \
-   p4 = t1+t3;                                 \
-   p1 = t0+t3;                                 \
-   p2 = t1+t2;                                 \
-   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
-   t0 = t0*stbi__f2f( 0.298631336f);           \
-   t1 = t1*stbi__f2f( 2.053119869f);           \
-   t2 = t2*stbi__f2f( 3.072711026f);           \
-   t3 = t3*stbi__f2f( 1.501321110f);           \
-   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
-   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
-   p3 = p3*stbi__f2f(-1.961570560f);           \
-   p4 = p4*stbi__f2f(-0.390180644f);           \
-   t3 += p1+p4;                                \
-   t2 += p2+p3;                                \
-   t1 += p2+p4;                                \
-   t0 += p1+p3;
-
-static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
-{
-   int i,val[64],*v=val;
-   stbi_uc *o;
-   short *d = data;
-
-   // columns
-   for (i=0; i < 8; ++i,++d, ++v) {
-      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
-      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
-           && d[40]==0 && d[48]==0 && d[56]==0) {
-         //    no shortcut                 0     seconds
-         //    (1|2|3|4|5|6|7)==0          0     seconds
-         //    all separate               -0.047 seconds
-         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
-         int dcterm = d[0]*4;
-         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
-      } else {
-         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
-         // constants scaled things up by 1<<12; let's bring them back
-         // down, but keep 2 extra bits of precision
-         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
-         v[ 0] = (x0+t3) >> 10;
-         v[56] = (x0-t3) >> 10;
-         v[ 8] = (x1+t2) >> 10;
-         v[48] = (x1-t2) >> 10;
-         v[16] = (x2+t1) >> 10;
-         v[40] = (x2-t1) >> 10;
-         v[24] = (x3+t0) >> 10;
-         v[32] = (x3-t0) >> 10;
-      }
-   }
-
-   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
-      // no fast case since the first 1D IDCT spread components out
-      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
-      // constants scaled things up by 1<<12, plus we had 1<<2 from first
-      // loop, plus horizontal and vertical each scale by sqrt(8) so together
-      // we've got an extra 1<<3, so 1<<17 total we need to remove.
-      // so we want to round that, which means adding 0.5 * 1<<17,
-      // aka 65536. Also, we'll end up with -128 to 127 that we want
-      // to encode as 0..255 by adding 128, so we'll add that before the shift
-      x0 += 65536 + (128<<17);
-      x1 += 65536 + (128<<17);
-      x2 += 65536 + (128<<17);
-      x3 += 65536 + (128<<17);
-      // tried computing the shifts into temps, or'ing the temps to see
-      // if any were out of range, but that was slower
-      o[0] = stbi__clamp((x0+t3) >> 17);
-      o[7] = stbi__clamp((x0-t3) >> 17);
-      o[1] = stbi__clamp((x1+t2) >> 17);
-      o[6] = stbi__clamp((x1-t2) >> 17);
-      o[2] = stbi__clamp((x2+t1) >> 17);
-      o[5] = stbi__clamp((x2-t1) >> 17);
-      o[3] = stbi__clamp((x3+t0) >> 17);
-      o[4] = stbi__clamp((x3-t0) >> 17);
-   }
-}
-
-#ifdef STBI_SSE2
-// sse2 integer IDCT. not the fastest possible implementation but it
-// produces bit-identical results to the generic C version so it's
-// fully "transparent".
-static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
-{
-   // This is constructed to match our regular (generic) integer IDCT exactly.
-   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
-   __m128i tmp;
-
-   // dot product constant: even elems=x, odd elems=y
-   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
-
-   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
-   // out(1) = c1[even]*x + c1[odd]*y
-   #define dct_rot(out0,out1, x,y,c0,c1) \
-      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
-      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
-      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
-      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
-      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
-      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
-
-   // out = in << 12  (in 16-bit, out 32-bit)
-   #define dct_widen(out, in) \
-      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
-      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
-
-   // wide add
-   #define dct_wadd(out, a, b) \
-      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
-      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
-
-   // wide sub
-   #define dct_wsub(out, a, b) \
-      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
-      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
-
-   // butterfly a/b, add bias, then shift by "s" and pack
-   #define dct_bfly32o(out0, out1, a,b,bias,s) \
-      { \
-         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
-         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
-         dct_wadd(sum, abiased, b); \
-         dct_wsub(dif, abiased, b); \
-         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
-         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
-      }
-
-   // 8-bit interleave step (for transposes)
-   #define dct_interleave8(a, b) \
-      tmp = a; \
-      a = _mm_unpacklo_epi8(a, b); \
-      b = _mm_unpackhi_epi8(tmp, b)
-
-   // 16-bit interleave step (for transposes)
-   #define dct_interleave16(a, b) \
-      tmp = a; \
-      a = _mm_unpacklo_epi16(a, b); \
-      b = _mm_unpackhi_epi16(tmp, b)
-
-   #define dct_pass(bias,shift) \
-      { \
-         /* even part */ \
-         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
-         __m128i sum04 = _mm_add_epi16(row0, row4); \
-         __m128i dif04 = _mm_sub_epi16(row0, row4); \
-         dct_widen(t0e, sum04); \
-         dct_widen(t1e, dif04); \
-         dct_wadd(x0, t0e, t3e); \
-         dct_wsub(x3, t0e, t3e); \
-         dct_wadd(x1, t1e, t2e); \
-         dct_wsub(x2, t1e, t2e); \
-         /* odd part */ \
-         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
-         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
-         __m128i sum17 = _mm_add_epi16(row1, row7); \
-         __m128i sum35 = _mm_add_epi16(row3, row5); \
-         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
-         dct_wadd(x4, y0o, y4o); \
-         dct_wadd(x5, y1o, y5o); \
-         dct_wadd(x6, y2o, y5o); \
-         dct_wadd(x7, y3o, y4o); \
-         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
-         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
-         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
-         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
-      }
-
-   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
-   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
-   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
-   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
-   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
-   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
-   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
-   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
-
-   // rounding biases in column/row passes, see stbi__idct_block for explanation.
-   __m128i bias_0 = _mm_set1_epi32(512);
-   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
-
-   // load
-   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
-   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
-   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
-   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
-   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
-   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
-   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
-   row7 = _mm_load_si128((const __m128i *) (data + 7*8));
-
-   // column pass
-   dct_pass(bias_0, 10);
-
-   {
-      // 16bit 8x8 transpose pass 1
-      dct_interleave16(row0, row4);
-      dct_interleave16(row1, row5);
-      dct_interleave16(row2, row6);
-      dct_interleave16(row3, row7);
-
-      // transpose pass 2
-      dct_interleave16(row0, row2);
-      dct_interleave16(row1, row3);
-      dct_interleave16(row4, row6);
-      dct_interleave16(row5, row7);
-
-      // transpose pass 3
-      dct_interleave16(row0, row1);
-      dct_interleave16(row2, row3);
-      dct_interleave16(row4, row5);
-      dct_interleave16(row6, row7);
-   }
-
-   // row pass
-   dct_pass(bias_1, 17);
-
-   {
-      // pack
-      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
-      __m128i p1 = _mm_packus_epi16(row2, row3);
-      __m128i p2 = _mm_packus_epi16(row4, row5);
-      __m128i p3 = _mm_packus_epi16(row6, row7);
-
-      // 8bit 8x8 transpose pass 1
-      dct_interleave8(p0, p2); // a0e0a1e1...
-      dct_interleave8(p1, p3); // c0g0c1g1...
-
-      // transpose pass 2
-      dct_interleave8(p0, p1); // a0c0e0g0...
-      dct_interleave8(p2, p3); // b0d0f0h0...
-
-      // transpose pass 3
-      dct_interleave8(p0, p2); // a0b0c0d0...
-      dct_interleave8(p1, p3); // a4b4c4d4...
-
-      // store
-      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
-      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
-   }
-
-#undef dct_const
-#undef dct_rot
-#undef dct_widen
-#undef dct_wadd
-#undef dct_wsub
-#undef dct_bfly32o
-#undef dct_interleave8
-#undef dct_interleave16
-#undef dct_pass
-}
-
-#endif // STBI_SSE2
-
-#ifdef STBI_NEON
-
-// NEON integer IDCT. should produce bit-identical
-// results to the generic C version.
-static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
-{
-   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
-
-   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
-   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
-   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
-   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
-   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
-   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
-   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
-   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
-   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
-   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
-   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
-   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
-
-#define dct_long_mul(out, inq, coeff) \
-   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
-   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
-
-#define dct_long_mac(out, acc, inq, coeff) \
-   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
-   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
-
-#define dct_widen(out, inq) \
-   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
-   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
-
-// wide add
-#define dct_wadd(out, a, b) \
-   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
-   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
-
-// wide sub
-#define dct_wsub(out, a, b) \
-   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
-   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
-
-// butterfly a/b, then shift using "shiftop" by "s" and pack
-#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
-   { \
-      dct_wadd(sum, a, b); \
-      dct_wsub(dif, a, b); \
-      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
-      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
-   }
-
-#define dct_pass(shiftop, shift) \
-   { \
-      /* even part */ \
-      int16x8_t sum26 = vaddq_s16(row2, row6); \
-      dct_long_mul(p1e, sum26, rot0_0); \
-      dct_long_mac(t2e, p1e, row6, rot0_1); \
-      dct_long_mac(t3e, p1e, row2, rot0_2); \
-      int16x8_t sum04 = vaddq_s16(row0, row4); \
-      int16x8_t dif04 = vsubq_s16(row0, row4); \
-      dct_widen(t0e, sum04); \
-      dct_widen(t1e, dif04); \
-      dct_wadd(x0, t0e, t3e); \
-      dct_wsub(x3, t0e, t3e); \
-      dct_wadd(x1, t1e, t2e); \
-      dct_wsub(x2, t1e, t2e); \
-      /* odd part */ \
-      int16x8_t sum15 = vaddq_s16(row1, row5); \
-      int16x8_t sum17 = vaddq_s16(row1, row7); \
-      int16x8_t sum35 = vaddq_s16(row3, row5); \
-      int16x8_t sum37 = vaddq_s16(row3, row7); \
-      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
-      dct_long_mul(p5o, sumodd, rot1_0); \
-      dct_long_mac(p1o, p5o, sum17, rot1_1); \
-      dct_long_mac(p2o, p5o, sum35, rot1_2); \
-      dct_long_mul(p3o, sum37, rot2_0); \
-      dct_long_mul(p4o, sum15, rot2_1); \
-      dct_wadd(sump13o, p1o, p3o); \
-      dct_wadd(sump24o, p2o, p4o); \
-      dct_wadd(sump23o, p2o, p3o); \
-      dct_wadd(sump14o, p1o, p4o); \
-      dct_long_mac(x4, sump13o, row7, rot3_0); \
-      dct_long_mac(x5, sump24o, row5, rot3_1); \
-      dct_long_mac(x6, sump23o, row3, rot3_2); \
-      dct_long_mac(x7, sump14o, row1, rot3_3); \
-      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
-      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
-      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
-      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
-   }
-
-   // load
-   row0 = vld1q_s16(data + 0*8);
-   row1 = vld1q_s16(data + 1*8);
-   row2 = vld1q_s16(data + 2*8);
-   row3 = vld1q_s16(data + 3*8);
-   row4 = vld1q_s16(data + 4*8);
-   row5 = vld1q_s16(data + 5*8);
-   row6 = vld1q_s16(data + 6*8);
-   row7 = vld1q_s16(data + 7*8);
-
-   // add DC bias
-   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
-
-   // column pass
-   dct_pass(vrshrn_n_s32, 10);
-
-   // 16bit 8x8 transpose
-   {
-// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
-// whether compilers actually get this is another story, sadly.
-#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
-#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
-#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
-
-      // pass 1
-      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
-      dct_trn16(row2, row3);
-      dct_trn16(row4, row5);
-      dct_trn16(row6, row7);
-
-      // pass 2
-      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
-      dct_trn32(row1, row3);
-      dct_trn32(row4, row6);
-      dct_trn32(row5, row7);
-
-      // pass 3
-      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
-      dct_trn64(row1, row5);
-      dct_trn64(row2, row6);
-      dct_trn64(row3, row7);
-
-#undef dct_trn16
-#undef dct_trn32
-#undef dct_trn64
-   }
-
-   // row pass
-   // vrshrn_n_s32 only supports shifts up to 16, we need
-   // 17. so do a non-rounding shift of 16 first then follow
-   // up with a rounding shift by 1.
-   dct_pass(vshrn_n_s32, 16);
-
-   {
-      // pack and round
-      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
-      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
-      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
-      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
-      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
-      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
-      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
-      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
-
-      // again, these can translate into one instruction, but often don't.
-#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
-#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
-#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
-
-      // sadly can't use interleaved stores here since we only write
-      // 8 bytes to each scan line!
-
-      // 8x8 8-bit transpose pass 1
-      dct_trn8_8(p0, p1);
-      dct_trn8_8(p2, p3);
-      dct_trn8_8(p4, p5);
-      dct_trn8_8(p6, p7);
-
-      // pass 2
-      dct_trn8_16(p0, p2);
-      dct_trn8_16(p1, p3);
-      dct_trn8_16(p4, p6);
-      dct_trn8_16(p5, p7);
-
-      // pass 3
-      dct_trn8_32(p0, p4);
-      dct_trn8_32(p1, p5);
-      dct_trn8_32(p2, p6);
-      dct_trn8_32(p3, p7);
-
-      // store
-      vst1_u8(out, p0); out += out_stride;
-      vst1_u8(out, p1); out += out_stride;
-      vst1_u8(out, p2); out += out_stride;
-      vst1_u8(out, p3); out += out_stride;
-      vst1_u8(out, p4); out += out_stride;
-      vst1_u8(out, p5); out += out_stride;
-      vst1_u8(out, p6); out += out_stride;
-      vst1_u8(out, p7);
-
-#undef dct_trn8_8
-#undef dct_trn8_16
-#undef dct_trn8_32
-   }
-
-#undef dct_long_mul
-#undef dct_long_mac
-#undef dct_widen
-#undef dct_wadd
-#undef dct_wsub
-#undef dct_bfly32o
-#undef dct_pass
-}
-
-#endif // STBI_NEON
-
-#define STBI__MARKER_none  0xff
-// if there's a pending marker from the entropy stream, return that
-// otherwise, fetch from the stream and get a marker. if there's no
-// marker, return 0xff, which is never a valid marker value
-static stbi_uc stbi__get_marker(stbi__jpeg *j)
-{
-   stbi_uc x;
-   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
-   x = stbi__get8(j->s);
-   if (x != 0xff) return STBI__MARKER_none;
-   while (x == 0xff)
-      x = stbi__get8(j->s); // consume repeated 0xff fill bytes
-   return x;
-}
-
-// in each scan, we'll have scan_n components, and the order
-// of the components is specified by order[]
-#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
-
-// after a restart interval, stbi__jpeg_reset the entropy decoder and
-// the dc prediction
-static void stbi__jpeg_reset(stbi__jpeg *j)
-{
-   j->code_bits = 0;
-   j->code_buffer = 0;
-   j->nomore = 0;
-   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
-   j->marker = STBI__MARKER_none;
-   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
-   j->eob_run = 0;
-   // no more than 1<<31 MCUs if no restart_interal? that's plenty safe,
-   // since we don't even allow 1<<30 pixels
-}
-
-static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
-{
-   stbi__jpeg_reset(z);
-   if (!z->progressive) {
-      if (z->scan_n == 1) {
-         int i,j;
-         STBI_SIMD_ALIGN(short, data[64]);
-         int n = z->order[0];
-         // non-interleaved data, we just need to process one block at a time,
-         // in trivial scanline order
-         // number of blocks to do just depends on how many actual "pixels" this
-         // component has, independent of interleaved MCU blocking and such
-         int w = (z->img_comp[n].x+7) >> 3;
-         int h = (z->img_comp[n].y+7) >> 3;
-         for (j=0; j < h; ++j) {
-            for (i=0; i < w; ++i) {
-               int ha = z->img_comp[n].ha;
-               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
-               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
-               // every data block is an MCU, so countdown the restart interval
-               if (--z->todo <= 0) {
-                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
-                  // if it's NOT a restart, then just bail, so we get corrupt data
-                  // rather than no data
-                  if (!STBI__RESTART(z->marker)) return 1;
-                  stbi__jpeg_reset(z);
-               }
-            }
-         }
-         return 1;
-      } else { // interleaved
-         int i,j,k,x,y;
-         STBI_SIMD_ALIGN(short, data[64]);
-         for (j=0; j < z->img_mcu_y; ++j) {
-            for (i=0; i < z->img_mcu_x; ++i) {
-               // scan an interleaved mcu... process scan_n components in order
-               for (k=0; k < z->scan_n; ++k) {
-                  int n = z->order[k];
-                  // scan out an mcu's worth of this component; that's just determined
-                  // by the basic H and V specified for the component
-                  for (y=0; y < z->img_comp[n].v; ++y) {
-                     for (x=0; x < z->img_comp[n].h; ++x) {
-                        int x2 = (i*z->img_comp[n].h + x)*8;
-                        int y2 = (j*z->img_comp[n].v + y)*8;
-                        int ha = z->img_comp[n].ha;
-                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
-                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
-                     }
-                  }
-               }
-               // after all interleaved components, that's an interleaved MCU,
-               // so now count down the restart interval
-               if (--z->todo <= 0) {
-                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
-                  if (!STBI__RESTART(z->marker)) return 1;
-                  stbi__jpeg_reset(z);
-               }
-            }
-         }
-         return 1;
-      }
-   } else {
-      if (z->scan_n == 1) {
-         int i,j;
-         int n = z->order[0];
-         // non-interleaved data, we just need to process one block at a time,
-         // in trivial scanline order
-         // number of blocks to do just depends on how many actual "pixels" this
-         // component has, independent of interleaved MCU blocking and such
-         int w = (z->img_comp[n].x+7) >> 3;
-         int h = (z->img_comp[n].y+7) >> 3;
-         for (j=0; j < h; ++j) {
-            for (i=0; i < w; ++i) {
-               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
-               if (z->spec_start == 0) {
-                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
-                     return 0;
-               } else {
-                  int ha = z->img_comp[n].ha;
-                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
-                     return 0;
-               }
-               // every data block is an MCU, so countdown the restart interval
-               if (--z->todo <= 0) {
-                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
-                  if (!STBI__RESTART(z->marker)) return 1;
-                  stbi__jpeg_reset(z);
-               }
-            }
-         }
-         return 1;
-      } else { // interleaved
-         int i,j,k,x,y;
-         for (j=0; j < z->img_mcu_y; ++j) {
-            for (i=0; i < z->img_mcu_x; ++i) {
-               // scan an interleaved mcu... process scan_n components in order
-               for (k=0; k < z->scan_n; ++k) {
-                  int n = z->order[k];
-                  // scan out an mcu's worth of this component; that's just determined
-                  // by the basic H and V specified for the component
-                  for (y=0; y < z->img_comp[n].v; ++y) {
-                     for (x=0; x < z->img_comp[n].h; ++x) {
-                        int x2 = (i*z->img_comp[n].h + x);
-                        int y2 = (j*z->img_comp[n].v + y);
-                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
-                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
-                           return 0;
-                     }
-                  }
-               }
-               // after all interleaved components, that's an interleaved MCU,
-               // so now count down the restart interval
-               if (--z->todo <= 0) {
-                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
-                  if (!STBI__RESTART(z->marker)) return 1;
-                  stbi__jpeg_reset(z);
-               }
-            }
-         }
-         return 1;
-      }
-   }
-}
-
-static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
-{
-   int i;
-   for (i=0; i < 64; ++i)
-      data[i] *= dequant[i];
-}
-
-static void stbi__jpeg_finish(stbi__jpeg *z)
-{
-   if (z->progressive) {
-      // dequantize and idct the data
-      int i,j,n;
-      for (n=0; n < z->s->img_n; ++n) {
-         int w = (z->img_comp[n].x+7) >> 3;
-         int h = (z->img_comp[n].y+7) >> 3;
-         for (j=0; j < h; ++j) {
-            for (i=0; i < w; ++i) {
-               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
-               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
-               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
-            }
-         }
-      }
-   }
-}
-
-static int stbi__process_marker(stbi__jpeg *z, int m)
-{
-   int L;
-   switch (m) {
-      case STBI__MARKER_none: // no marker found
-         return stbi__err("expected marker","Corrupt JPEG");
-
-      case 0xDD: // DRI - specify restart interval
-         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
-         z->restart_interval = stbi__get16be(z->s);
-         return 1;
-
-      case 0xDB: // DQT - define quantization table
-         L = stbi__get16be(z->s)-2;
-         while (L > 0) {
-            int q = stbi__get8(z->s);
-            int p = q >> 4, sixteen = (p != 0);
-            int t = q & 15,i;
-            if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
-            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
-
-            for (i=0; i < 64; ++i)
-               z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
-            L -= (sixteen ? 129 : 65);
-         }
-         return L==0;
-
-      case 0xC4: // DHT - define huffman table
-         L = stbi__get16be(z->s)-2;
-         while (L > 0) {
-            stbi_uc *v;
-            int sizes[16],i,n=0;
-            int q = stbi__get8(z->s);
-            int tc = q >> 4;
-            int th = q & 15;
-            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
-            for (i=0; i < 16; ++i) {
-               sizes[i] = stbi__get8(z->s);
-               n += sizes[i];
-            }
-            if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values!
-            L -= 17;
-            if (tc == 0) {
-               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
-               v = z->huff_dc[th].values;
-            } else {
-               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
-               v = z->huff_ac[th].values;
-            }
-            for (i=0; i < n; ++i)
-               v[i] = stbi__get8(z->s);
-            if (tc != 0)
-               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
-            L -= n;
-         }
-         return L==0;
-   }
-
-   // check for comment block or APP blocks
-   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
-      L = stbi__get16be(z->s);
-      if (L < 2) {
-         if (m == 0xFE)
-            return stbi__err("bad COM len","Corrupt JPEG");
-         else
-            return stbi__err("bad APP len","Corrupt JPEG");
-      }
-      L -= 2;
-
-      if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
-         static const unsigned char tag[5] = {'J','F','I','F','\0'};
-         int ok = 1;
-         int i;
-         for (i=0; i < 5; ++i)
-            if (stbi__get8(z->s) != tag[i])
-               ok = 0;
-         L -= 5;
-         if (ok)
-            z->jfif = 1;
-      } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
-         static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
-         int ok = 1;
-         int i;
-         for (i=0; i < 6; ++i)
-            if (stbi__get8(z->s) != tag[i])
-               ok = 0;
-         L -= 6;
-         if (ok) {
-            stbi__get8(z->s); // version
-            stbi__get16be(z->s); // flags0
-            stbi__get16be(z->s); // flags1
-            z->app14_color_transform = stbi__get8(z->s); // color transform
-            L -= 6;
-         }
-      }
-
-      stbi__skip(z->s, L);
-      return 1;
-   }
-
-   return stbi__err("unknown marker","Corrupt JPEG");
-}
-
-// after we see SOS
-static int stbi__process_scan_header(stbi__jpeg *z)
-{
-   int i;
-   int Ls = stbi__get16be(z->s);
-   z->scan_n = stbi__get8(z->s);
-   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
-   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
-   for (i=0; i < z->scan_n; ++i) {
-      int id = stbi__get8(z->s), which;
-      int q = stbi__get8(z->s);
-      for (which = 0; which < z->s->img_n; ++which)
-         if (z->img_comp[which].id == id)
-            break;
-      if (which == z->s->img_n) return 0; // no match
-      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
-      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
-      z->order[i] = which;
-   }
-
-   {
-      int aa;
-      z->spec_start = stbi__get8(z->s);
-      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
-      aa = stbi__get8(z->s);
-      z->succ_high = (aa >> 4);
-      z->succ_low  = (aa & 15);
-      if (z->progressive) {
-         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
-            return stbi__err("bad SOS", "Corrupt JPEG");
-      } else {
-         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
-         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
-         z->spec_end = 63;
-      }
-   }
-
-   return 1;
-}
-
-static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
-{
-   int i;
-   for (i=0; i < ncomp; ++i) {
-      if (z->img_comp[i].raw_data) {
-         STBI_FREE(z->img_comp[i].raw_data);
-         z->img_comp[i].raw_data = NULL;
-         z->img_comp[i].data = NULL;
-      }
-      if (z->img_comp[i].raw_coeff) {
-         STBI_FREE(z->img_comp[i].raw_coeff);
-         z->img_comp[i].raw_coeff = 0;
-         z->img_comp[i].coeff = 0;
-      }
-      if (z->img_comp[i].linebuf) {
-         STBI_FREE(z->img_comp[i].linebuf);
-         z->img_comp[i].linebuf = NULL;
-      }
-   }
-   return why;
-}
-
-static int stbi__process_frame_header(stbi__jpeg *z, int scan)
-{
-   stbi__context *s = z->s;
-   int Lf,p,i,q, h_max=1,v_max=1,c;
-   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
-   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
-   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
-   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
-   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
-   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
-   c = stbi__get8(s);
-   if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
-   s->img_n = c;
-   for (i=0; i < c; ++i) {
-      z->img_comp[i].data = NULL;
-      z->img_comp[i].linebuf = NULL;
-   }
-
-   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
-
-   z->rgb = 0;
-   for (i=0; i < s->img_n; ++i) {
-      static const unsigned char rgb[3] = { 'R', 'G', 'B' };
-      z->img_comp[i].id = stbi__get8(s);
-      if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
-         ++z->rgb;
-      q = stbi__get8(s);
-      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
-      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
-      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
-   }
-
-   if (scan != STBI__SCAN_load) return 1;
-
-   if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
-
-   for (i=0; i < s->img_n; ++i) {
-      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
-      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
-   }
-
-   // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
-   // and I've never seen a non-corrupted JPEG file actually use them
-   for (i=0; i < s->img_n; ++i) {
-      if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
-      if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
-   }
-
-   // compute interleaved mcu info
-   z->img_h_max = h_max;
-   z->img_v_max = v_max;
-   z->img_mcu_w = h_max * 8;
-   z->img_mcu_h = v_max * 8;
-   // these sizes can't be more than 17 bits
-   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
-   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
-
-   for (i=0; i < s->img_n; ++i) {
-      // number of effective pixels (e.g. for non-interleaved MCU)
-      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
-      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
-      // to simplify generation, we'll allocate enough memory to decode
-      // the bogus oversized data from using interleaved MCUs and their
-      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
-      // discard the extra data until colorspace conversion
-      //
-      // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
-      // so these muls can't overflow with 32-bit ints (which we require)
-      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
-      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
-      z->img_comp[i].coeff = 0;
-      z->img_comp[i].raw_coeff = 0;
-      z->img_comp[i].linebuf = NULL;
-      z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
-      if (z->img_comp[i].raw_data == NULL)
-         return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
-      // align blocks for idct using mmx/sse
-      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
-      if (z->progressive) {
-         // w2, h2 are multiples of 8 (see above)
-         z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
-         z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
-         z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
-         if (z->img_comp[i].raw_coeff == NULL)
-            return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
-         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
-      }
-   }
-
-   return 1;
-}
-
-// use comparisons since in some cases we handle more than one case (e.g. SOF)
-#define stbi__DNL(x)         ((x) == 0xdc)
-#define stbi__SOI(x)         ((x) == 0xd8)
-#define stbi__EOI(x)         ((x) == 0xd9)
-#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
-#define stbi__SOS(x)         ((x) == 0xda)
-
-#define stbi__SOF_progressive(x)   ((x) == 0xc2)
-
-static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
-{
-   int m;
-   z->jfif = 0;
-   z->app14_color_transform = -1; // valid values are 0,1,2
-   z->marker = STBI__MARKER_none; // initialize cached marker to empty
-   m = stbi__get_marker(z);
-   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
-   if (scan == STBI__SCAN_type) return 1;
-   m = stbi__get_marker(z);
-   while (!stbi__SOF(m)) {
-      if (!stbi__process_marker(z,m)) return 0;
-      m = stbi__get_marker(z);
-      while (m == STBI__MARKER_none) {
-         // some files have extra padding after their blocks, so ok, we'll scan
-         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
-         m = stbi__get_marker(z);
-      }
-   }
-   z->progressive = stbi__SOF_progressive(m);
-   if (!stbi__process_frame_header(z, scan)) return 0;
-   return 1;
-}
-
-static stbi_uc stbi__skip_jpeg_junk_at_end(stbi__jpeg *j)
-{
-   // some JPEGs have junk at end, skip over it but if we find what looks
-   // like a valid marker, resume there
-   while (!stbi__at_eof(j->s)) {
-      stbi_uc x = stbi__get8(j->s);
-      while (x == 0xff) { // might be a marker
-         if (stbi__at_eof(j->s)) return STBI__MARKER_none;
-         x = stbi__get8(j->s);
-         if (x != 0x00 && x != 0xff) {
-            // not a stuffed zero or lead-in to another marker, looks
-            // like an actual marker, return it
-            return x;
-         }
-         // stuffed zero has x=0 now which ends the loop, meaning we go
-         // back to regular scan loop.
-         // repeated 0xff keeps trying to read the next byte of the marker.
-      }
-   }
-   return STBI__MARKER_none;
-}
-
-// decode image to YCbCr format
-static int stbi__decode_jpeg_image(stbi__jpeg *j)
-{
-   int m;
-   for (m = 0; m < 4; m++) {
-      j->img_comp[m].raw_data = NULL;
-      j->img_comp[m].raw_coeff = NULL;
-   }
-   j->restart_interval = 0;
-   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
-   m = stbi__get_marker(j);
-   while (!stbi__EOI(m)) {
-      if (stbi__SOS(m)) {
-         if (!stbi__process_scan_header(j)) return 0;
-         if (!stbi__parse_entropy_coded_data(j)) return 0;
-         if (j->marker == STBI__MARKER_none ) {
-         j->marker = stbi__skip_jpeg_junk_at_end(j);
-            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
-         }
-         m = stbi__get_marker(j);
-         if (STBI__RESTART(m))
-            m = stbi__get_marker(j);
-      } else if (stbi__DNL(m)) {
-         int Ld = stbi__get16be(j->s);
-         stbi__uint32 NL = stbi__get16be(j->s);
-         if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
-         if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
-         m = stbi__get_marker(j);
-      } else {
-         if (!stbi__process_marker(j, m)) return 1;
-         m = stbi__get_marker(j);
-      }
-   }
-   if (j->progressive)
-      stbi__jpeg_finish(j);
-   return 1;
-}
-
-// static jfif-centered resampling (across block boundaries)
-
-typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
-                                    int w, int hs);
-
-#define stbi__div4(x) ((stbi_uc) ((x) >> 2))
-
-static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
-{
-   STBI_NOTUSED(out);
-   STBI_NOTUSED(in_far);
-   STBI_NOTUSED(w);
-   STBI_NOTUSED(hs);
-   return in_near;
-}
-
-static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
-{
-   // need to generate two samples vertically for every one in input
-   int i;
-   STBI_NOTUSED(hs);
-   for (i=0; i < w; ++i)
-      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
-   return out;
-}
-
-static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
-{
-   // need to generate two samples horizontally for every one in input
-   int i;
-   stbi_uc *input = in_near;
-
-   if (w == 1) {
-      // if only one sample, can't do any interpolation
-      out[0] = out[1] = input[0];
-      return out;
-   }
-
-   out[0] = input[0];
-   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
-   for (i=1; i < w-1; ++i) {
-      int n = 3*input[i]+2;
-      out[i*2+0] = stbi__div4(n+input[i-1]);
-      out[i*2+1] = stbi__div4(n+input[i+1]);
-   }
-   out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
-   out[i*2+1] = input[w-1];
-
-   STBI_NOTUSED(in_far);
-   STBI_NOTUSED(hs);
-
-   return out;
-}
-
-#define stbi__div16(x) ((stbi_uc) ((x) >> 4))
-
-static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
-{
-   // need to generate 2x2 samples for every one in input
-   int i,t0,t1;
-   if (w == 1) {
-      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
-      return out;
-   }
-
-   t1 = 3*in_near[0] + in_far[0];
-   out[0] = stbi__div4(t1+2);
-   for (i=1; i < w; ++i) {
-      t0 = t1;
-      t1 = 3*in_near[i]+in_far[i];
-      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
-      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
-   }
-   out[w*2-1] = stbi__div4(t1+2);
-
-   STBI_NOTUSED(hs);
-
-   return out;
-}
-
-#if defined(STBI_SSE2) || defined(STBI_NEON)
-static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
-{
-   // need to generate 2x2 samples for every one in input
-   int i=0,t0,t1;
-
-   if (w == 1) {
-      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
-      return out;
-   }
-
-   t1 = 3*in_near[0] + in_far[0];
-   // process groups of 8 pixels for as long as we can.
-   // note we can't handle the last pixel in a row in this loop
-   // because we need to handle the filter boundary conditions.
-   for (; i < ((w-1) & ~7); i += 8) {
-#if defined(STBI_SSE2)
-      // load and perform the vertical filtering pass
-      // this uses 3*x + y = 4*x + (y - x)
-      __m128i zero  = _mm_setzero_si128();
-      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
-      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
-      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
-      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
-      __m128i diff  = _mm_sub_epi16(farw, nearw);
-      __m128i nears = _mm_slli_epi16(nearw, 2);
-      __m128i curr  = _mm_add_epi16(nears, diff); // current row
-
-      // horizontal filter works the same based on shifted vers of current
-      // row. "prev" is current row shifted right by 1 pixel; we need to
-      // insert the previous pixel value (from t1).
-      // "next" is current row shifted left by 1 pixel, with first pixel
-      // of next block of 8 pixels added in.
-      __m128i prv0 = _mm_slli_si128(curr, 2);
-      __m128i nxt0 = _mm_srli_si128(curr, 2);
-      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
-      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
-
-      // horizontal filter, polyphase implementation since it's convenient:
-      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
-      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
-      // note the shared term.
-      __m128i bias  = _mm_set1_epi16(8);
-      __m128i curs = _mm_slli_epi16(curr, 2);
-      __m128i prvd = _mm_sub_epi16(prev, curr);
-      __m128i nxtd = _mm_sub_epi16(next, curr);
-      __m128i curb = _mm_add_epi16(curs, bias);
-      __m128i even = _mm_add_epi16(prvd, curb);
-      __m128i odd  = _mm_add_epi16(nxtd, curb);
-
-      // interleave even and odd pixels, then undo scaling.
-      __m128i int0 = _mm_unpacklo_epi16(even, odd);
-      __m128i int1 = _mm_unpackhi_epi16(even, odd);
-      __m128i de0  = _mm_srli_epi16(int0, 4);
-      __m128i de1  = _mm_srli_epi16(int1, 4);
-
-      // pack and write output
-      __m128i outv = _mm_packus_epi16(de0, de1);
-      _mm_storeu_si128((__m128i *) (out + i*2), outv);
-#elif defined(STBI_NEON)
-      // load and perform the vertical filtering pass
-      // this uses 3*x + y = 4*x + (y - x)
-      uint8x8_t farb  = vld1_u8(in_far + i);
-      uint8x8_t nearb = vld1_u8(in_near + i);
-      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
-      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
-      int16x8_t curr  = vaddq_s16(nears, diff); // current row
-
-      // horizontal filter works the same based on shifted vers of current
-      // row. "prev" is current row shifted right by 1 pixel; we need to
-      // insert the previous pixel value (from t1).
-      // "next" is current row shifted left by 1 pixel, with first pixel
-      // of next block of 8 pixels added in.
-      int16x8_t prv0 = vextq_s16(curr, curr, 7);
-      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
-      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
-      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
-
-      // horizontal filter, polyphase implementation since it's convenient:
-      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
-      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
-      // note the shared term.
-      int16x8_t curs = vshlq_n_s16(curr, 2);
-      int16x8_t prvd = vsubq_s16(prev, curr);
-      int16x8_t nxtd = vsubq_s16(next, curr);
-      int16x8_t even = vaddq_s16(curs, prvd);
-      int16x8_t odd  = vaddq_s16(curs, nxtd);
-
-      // undo scaling and round, then store with even/odd phases interleaved
-      uint8x8x2_t o;
-      o.val[0] = vqrshrun_n_s16(even, 4);
-      o.val[1] = vqrshrun_n_s16(odd,  4);
-      vst2_u8(out + i*2, o);
-#endif
-
-      // "previous" value for next iter
-      t1 = 3*in_near[i+7] + in_far[i+7];
-   }
-
-   t0 = t1;
-   t1 = 3*in_near[i] + in_far[i];
-   out[i*2] = stbi__div16(3*t1 + t0 + 8);
-
-   for (++i; i < w; ++i) {
-      t0 = t1;
-      t1 = 3*in_near[i]+in_far[i];
-      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
-      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
-   }
-   out[w*2-1] = stbi__div4(t1+2);
-
-   STBI_NOTUSED(hs);
-
-   return out;
-}
-#endif
-
-static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
-{
-   // resample with nearest-neighbor
-   int i,j;
-   STBI_NOTUSED(in_far);
-   for (i=0; i < w; ++i)
-      for (j=0; j < hs; ++j)
-         out[i*hs+j] = in_near[i];
-   return out;
-}
-
-// this is a reduced-precision calculation of YCbCr-to-RGB introduced
-// to make sure the code produces the same results in both SIMD and scalar
-#define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
-static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
-{
-   int i;
-   for (i=0; i < count; ++i) {
-      int y_fixed = (y[i] << 20) + (1<<19); // rounding
-      int r,g,b;
-      int cr = pcr[i] - 128;
-      int cb = pcb[i] - 128;
-      r = y_fixed +  cr* stbi__float2fixed(1.40200f);
-      g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
-      b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
-      r >>= 20;
-      g >>= 20;
-      b >>= 20;
-      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
-      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
-      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
-      out[0] = (stbi_uc)r;
-      out[1] = (stbi_uc)g;
-      out[2] = (stbi_uc)b;
-      out[3] = 255;
-      out += step;
-   }
-}
-
-#if defined(STBI_SSE2) || defined(STBI_NEON)
-static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
-{
-   int i = 0;
-
-#ifdef STBI_SSE2
-   // step == 3 is pretty ugly on the final interleave, and i'm not convinced
-   // it's useful in practice (you wouldn't use it for textures, for example).
-   // so just accelerate step == 4 case.
-   if (step == 4) {
-      // this is a fairly straightforward implementation and not super-optimized.
-      __m128i signflip  = _mm_set1_epi8(-0x80);
-      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
-      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
-      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
-      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
-      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
-      __m128i xw = _mm_set1_epi16(255); // alpha channel
-
-      for (; i+7 < count; i += 8) {
-         // load
-         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
-         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
-         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
-         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
-         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
-
-         // unpack to short (and left-shift cr, cb by 8)
-         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
-         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
-         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
-
-         // color transform
-         __m128i yws = _mm_srli_epi16(yw, 4);
-         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
-         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
-         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
-         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
-         __m128i rws = _mm_add_epi16(cr0, yws);
-         __m128i gwt = _mm_add_epi16(cb0, yws);
-         __m128i bws = _mm_add_epi16(yws, cb1);
-         __m128i gws = _mm_add_epi16(gwt, cr1);
-
-         // descale
-         __m128i rw = _mm_srai_epi16(rws, 4);
-         __m128i bw = _mm_srai_epi16(bws, 4);
-         __m128i gw = _mm_srai_epi16(gws, 4);
-
-         // back to byte, set up for transpose
-         __m128i brb = _mm_packus_epi16(rw, bw);
-         __m128i gxb = _mm_packus_epi16(gw, xw);
-
-         // transpose to interleave channels
-         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
-         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
-         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
-         __m128i o1 = _mm_unpackhi_epi16(t0, t1);
-
-         // store
-         _mm_storeu_si128((__m128i *) (out + 0), o0);
-         _mm_storeu_si128((__m128i *) (out + 16), o1);
-         out += 32;
-      }
-   }
-#endif
-
-#ifdef STBI_NEON
-   // in this version, step=3 support would be easy to add. but is there demand?
-   if (step == 4) {
-      // this is a fairly straightforward implementation and not super-optimized.
-      uint8x8_t signflip = vdup_n_u8(0x80);
-      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
-      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
-      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
-      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
-
-      for (; i+7 < count; i += 8) {
-         // load
-         uint8x8_t y_bytes  = vld1_u8(y + i);
-         uint8x8_t cr_bytes = vld1_u8(pcr + i);
-         uint8x8_t cb_bytes = vld1_u8(pcb + i);
-         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
-         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
-
-         // expand to s16
-         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
-         int16x8_t crw = vshll_n_s8(cr_biased, 7);
-         int16x8_t cbw = vshll_n_s8(cb_biased, 7);
-
-         // color transform
-         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
-         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
-         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
-         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
-         int16x8_t rws = vaddq_s16(yws, cr0);
-         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
-         int16x8_t bws = vaddq_s16(yws, cb1);
-
-         // undo scaling, round, convert to byte
-         uint8x8x4_t o;
-         o.val[0] = vqrshrun_n_s16(rws, 4);
-         o.val[1] = vqrshrun_n_s16(gws, 4);
-         o.val[2] = vqrshrun_n_s16(bws, 4);
-         o.val[3] = vdup_n_u8(255);
-
-         // store, interleaving r/g/b/a
-         vst4_u8(out, o);
-         out += 8*4;
-      }
-   }
-#endif
-
-   for (; i < count; ++i) {
-      int y_fixed = (y[i] << 20) + (1<<19); // rounding
-      int r,g,b;
-      int cr = pcr[i] - 128;
-      int cb = pcb[i] - 128;
-      r = y_fixed + cr* stbi__float2fixed(1.40200f);
-      g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
-      b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
-      r >>= 20;
-      g >>= 20;
-      b >>= 20;
-      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
-      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
-      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
-      out[0] = (stbi_uc)r;
-      out[1] = (stbi_uc)g;
-      out[2] = (stbi_uc)b;
-      out[3] = 255;
-      out += step;
-   }
-}
-#endif
-
-// set up the kernels
-static void stbi__setup_jpeg(stbi__jpeg *j)
-{
-   j->idct_block_kernel = stbi__idct_block;
-   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
-   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
-
-#ifdef STBI_SSE2
-   if (stbi__sse2_available()) {
-      j->idct_block_kernel = stbi__idct_simd;
-      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
-      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
-   }
-#endif
-
-#ifdef STBI_NEON
-   j->idct_block_kernel = stbi__idct_simd;
-   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
-   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
-#endif
-}
-
-// clean up the temporary component buffers
-static void stbi__cleanup_jpeg(stbi__jpeg *j)
-{
-   stbi__free_jpeg_components(j, j->s->img_n, 0);
-}
-
-typedef struct
-{
-   resample_row_func resample;
-   stbi_uc *line0,*line1;
-   int hs,vs;   // expansion factor in each axis
-   int w_lores; // horizontal pixels pre-expansion
-   int ystep;   // how far through vertical expansion we are
-   int ypos;    // which pre-expansion row we're on
-} stbi__resample;
-
-// fast 0..255 * 0..255 => 0..255 rounded multiplication
-static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
-{
-   unsigned int t = x*y + 128;
-   return (stbi_uc) ((t + (t >>8)) >> 8);
-}
-
-static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
-{
-   int n, decode_n, is_rgb;
-   z->s->img_n = 0; // make stbi__cleanup_jpeg safe
-
-   // validate req_comp
-   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
-
-   // load a jpeg image from whichever source, but leave in YCbCr format
-   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
-
-   // determine actual number of components to generate
-   n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
-
-   is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
-
-   if (z->s->img_n == 3 && n < 3 && !is_rgb)
-      decode_n = 1;
-   else
-      decode_n = z->s->img_n;
-
-   // nothing to do if no components requested; check this now to avoid
-   // accessing uninitialized coutput[0] later
-   if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
-
-   // resample and color-convert
-   {
-      int k;
-      unsigned int i,j;
-      stbi_uc *output;
-      stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
-
-      stbi__resample res_comp[4];
-
-      for (k=0; k < decode_n; ++k) {
-         stbi__resample *r = &res_comp[k];
-
-         // allocate line buffer big enough for upsampling off the edges
-         // with upsample factor of 4
-         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
-         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
-
-         r->hs      = z->img_h_max / z->img_comp[k].h;
-         r->vs      = z->img_v_max / z->img_comp[k].v;
-         r->ystep   = r->vs >> 1;
-         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
-         r->ypos    = 0;
-         r->line0   = r->line1 = z->img_comp[k].data;
-
-         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
-         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
-         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
-         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
-         else                               r->resample = stbi__resample_row_generic;
-      }
-
-      // can't error after this so, this is safe
-      output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
-      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
-
-      // now go ahead and resample
-      for (j=0; j < z->s->img_y; ++j) {
-         stbi_uc *out = output + n * z->s->img_x * j;
-         for (k=0; k < decode_n; ++k) {
-            stbi__resample *r = &res_comp[k];
-            int y_bot = r->ystep >= (r->vs >> 1);
-            coutput[k] = r->resample(z->img_comp[k].linebuf,
-                                     y_bot ? r->line1 : r->line0,
-                                     y_bot ? r->line0 : r->line1,
-                                     r->w_lores, r->hs);
-            if (++r->ystep >= r->vs) {
-               r->ystep = 0;
-               r->line0 = r->line1;
-               if (++r->ypos < z->img_comp[k].y)
-                  r->line1 += z->img_comp[k].w2;
-            }
-         }
-         if (n >= 3) {
-            stbi_uc *y = coutput[0];
-            if (z->s->img_n == 3) {
-               if (is_rgb) {
-                  for (i=0; i < z->s->img_x; ++i) {
-                     out[0] = y[i];
-                     out[1] = coutput[1][i];
-                     out[2] = coutput[2][i];
-                     out[3] = 255;
-                     out += n;
-                  }
-               } else {
-                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
-               }
-            } else if (z->s->img_n == 4) {
-               if (z->app14_color_transform == 0) { // CMYK
-                  for (i=0; i < z->s->img_x; ++i) {
-                     stbi_uc m = coutput[3][i];
-                     out[0] = stbi__blinn_8x8(coutput[0][i], m);
-                     out[1] = stbi__blinn_8x8(coutput[1][i], m);
-                     out[2] = stbi__blinn_8x8(coutput[2][i], m);
-                     out[3] = 255;
-                     out += n;
-                  }
-               } else if (z->app14_color_transform == 2) { // YCCK
-                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
-                  for (i=0; i < z->s->img_x; ++i) {
-                     stbi_uc m = coutput[3][i];
-                     out[0] = stbi__blinn_8x8(255 - out[0], m);
-                     out[1] = stbi__blinn_8x8(255 - out[1], m);
-                     out[2] = stbi__blinn_8x8(255 - out[2], m);
-                     out += n;
-                  }
-               } else { // YCbCr + alpha?  Ignore the fourth channel for now
-                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
-               }
-            } else
-               for (i=0; i < z->s->img_x; ++i) {
-                  out[0] = out[1] = out[2] = y[i];
-                  out[3] = 255; // not used if n==3
-                  out += n;
-               }
-         } else {
-            if (is_rgb) {
-               if (n == 1)
-                  for (i=0; i < z->s->img_x; ++i)
-                     *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
-               else {
-                  for (i=0; i < z->s->img_x; ++i, out += 2) {
-                     out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
-                     out[1] = 255;
-                  }
-               }
-            } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
-               for (i=0; i < z->s->img_x; ++i) {
-                  stbi_uc m = coutput[3][i];
-                  stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
-                  stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
-                  stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
-                  out[0] = stbi__compute_y(r, g, b);
-                  out[1] = 255;
-                  out += n;
-               }
-            } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
-               for (i=0; i < z->s->img_x; ++i) {
-                  out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
-                  out[1] = 255;
-                  out += n;
-               }
-            } else {
-               stbi_uc *y = coutput[0];
-               if (n == 1)
-                  for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
-               else
-                  for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
-            }
-         }
-      }
-      stbi__cleanup_jpeg(z);
-      *out_x = z->s->img_x;
-      *out_y = z->s->img_y;
-      if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
-      return output;
-   }
-}
-
-static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   unsigned char* result;
-   stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
-   if (!j) return stbi__errpuc("outofmem", "Out of memory");
-   memset(j, 0, sizeof(stbi__jpeg));
-   STBI_NOTUSED(ri);
-   j->s = s;
-   stbi__setup_jpeg(j);
-   result = load_jpeg_image(j, x,y,comp,req_comp);
-   STBI_FREE(j);
-   return result;
-}
-
-static int stbi__jpeg_test(stbi__context *s)
-{
-   int r;
-   stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
-   if (!j) return stbi__err("outofmem", "Out of memory");
-   memset(j, 0, sizeof(stbi__jpeg));
-   j->s = s;
-   stbi__setup_jpeg(j);
-   r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
-   stbi__rewind(s);
-   STBI_FREE(j);
-   return r;
-}
-
-static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
-{
-   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
-      stbi__rewind( j->s );
-      return 0;
-   }
-   if (x) *x = j->s->img_x;
-   if (y) *y = j->s->img_y;
-   if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
-   return 1;
-}
-
-static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   int result;
-   stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
-   if (!j) return stbi__err("outofmem", "Out of memory");
-   memset(j, 0, sizeof(stbi__jpeg));
-   j->s = s;
-   result = stbi__jpeg_info_raw(j, x, y, comp);
-   STBI_FREE(j);
-   return result;
-}
-#endif
-
-// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
-//    simple implementation
-//      - all input must be provided in an upfront buffer
-//      - all output is written to a single output buffer (can malloc/realloc)
-//    performance
-//      - fast huffman
-
-#ifndef STBI_NO_ZLIB
-
-// fast-way is faster to check than jpeg huffman, but slow way is slower
-#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
-#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
-#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
-
-// zlib-style huffman encoding
-// (jpeg packs from left, zlib from right, so can't share code)
-typedef struct
-{
-   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
-   stbi__uint16 firstcode[16];
-   int maxcode[17];
-   stbi__uint16 firstsymbol[16];
-   stbi_uc  size[STBI__ZNSYMS];
-   stbi__uint16 value[STBI__ZNSYMS];
-} stbi__zhuffman;
-
-stbi_inline static int stbi__bitreverse16(int n)
-{
-  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
-  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
-  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
-  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
-  return n;
-}
-
-stbi_inline static int stbi__bit_reverse(int v, int bits)
-{
-   STBI_ASSERT(bits <= 16);
-   // to bit reverse n bits, reverse 16 and shift
-   // e.g. 11 bits, bit reverse and shift away 5
-   return stbi__bitreverse16(v) >> (16-bits);
-}
-
-static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
-{
-   int i,k=0;
-   int code, next_code[16], sizes[17];
-
-   // DEFLATE spec for generating codes
-   memset(sizes, 0, sizeof(sizes));
-   memset(z->fast, 0, sizeof(z->fast));
-   for (i=0; i < num; ++i)
-      ++sizes[sizelist[i]];
-   sizes[0] = 0;
-   for (i=1; i < 16; ++i)
-      if (sizes[i] > (1 << i))
-         return stbi__err("bad sizes", "Corrupt PNG");
-   code = 0;
-   for (i=1; i < 16; ++i) {
-      next_code[i] = code;
-      z->firstcode[i] = (stbi__uint16) code;
-      z->firstsymbol[i] = (stbi__uint16) k;
-      code = (code + sizes[i]);
-      if (sizes[i])
-         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
-      z->maxcode[i] = code << (16-i); // preshift for inner loop
-      code <<= 1;
-      k += sizes[i];
-   }
-   z->maxcode[16] = 0x10000; // sentinel
-   for (i=0; i < num; ++i) {
-      int s = sizelist[i];
-      if (s) {
-         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
-         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i);
-         z->size [c] = (stbi_uc     ) s;
-         z->value[c] = (stbi__uint16) i;
-         if (s <= STBI__ZFAST_BITS) {
-            int j = stbi__bit_reverse(next_code[s],s);
-            while (j < (1 << STBI__ZFAST_BITS)) {
-               z->fast[j] = fastv;
-               j += (1 << s);
-            }
-         }
-         ++next_code[s];
-      }
-   }
-   return 1;
-}
-
-// zlib-from-memory implementation for PNG reading
-//    because PNG allows splitting the zlib stream arbitrarily,
-//    and it's annoying structurally to have PNG call ZLIB call PNG,
-//    we require PNG read all the IDATs and combine them into a single
-//    memory buffer
-
-typedef struct
-{
-   stbi_uc *zbuffer, *zbuffer_end;
-   int num_bits;
-   int hit_zeof_once;
-   stbi__uint32 code_buffer;
-
-   char *zout;
-   char *zout_start;
-   char *zout_end;
-   int   z_expandable;
-
-   stbi__zhuffman z_length, z_distance;
-} stbi__zbuf;
-
-stbi_inline static int stbi__zeof(stbi__zbuf *z)
-{
-   return (z->zbuffer >= z->zbuffer_end);
-}
-
-stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
-{
-   return stbi__zeof(z) ? 0 : *z->zbuffer++;
-}
-
-static void stbi__fill_bits(stbi__zbuf *z)
-{
-   do {
-      if (z->code_buffer >= (1U << z->num_bits)) {
-        z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
-        return;
-      }
-      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
-      z->num_bits += 8;
-   } while (z->num_bits <= 24);
-}
-
-stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
-{
-   unsigned int k;
-   if (z->num_bits < n) stbi__fill_bits(z);
-   k = z->code_buffer & ((1 << n) - 1);
-   z->code_buffer >>= n;
-   z->num_bits -= n;
-   return k;
-}
-
-static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
-{
-   int b,s,k;
-   // not resolved by fast table, so compute it the slow way
-   // use jpeg approach, which requires MSbits at top
-   k = stbi__bit_reverse(a->code_buffer, 16);
-   for (s=STBI__ZFAST_BITS+1; ; ++s)
-      if (k < z->maxcode[s])
-         break;
-   if (s >= 16) return -1; // invalid code!
-   // code size is s, so:
-   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
-   if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
-   if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
-   a->code_buffer >>= s;
-   a->num_bits -= s;
-   return z->value[b];
-}
-
-stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
-{
-   int b,s;
-   if (a->num_bits < 16) {
-      if (stbi__zeof(a)) {
-         if (!a->hit_zeof_once) {
-            // This is the first time we hit eof, insert 16 extra padding bits
-            // to allow us to keep going; if we actually consume any of them
-            // though, that is invalid data. This is caught later.
-            a->hit_zeof_once = 1;
-            a->num_bits += 16; // add 16 implicit zero bits
-         } else {
-            // We already inserted our extra 16 padding bits and are again
-            // out, this stream is actually prematurely terminated.
-            return -1;
-         }
-      } else {
-         stbi__fill_bits(a);
-      }
-   }
-   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
-   if (b) {
-      s = b >> 9;
-      a->code_buffer >>= s;
-      a->num_bits -= s;
-      return b & 511;
-   }
-   return stbi__zhuffman_decode_slowpath(a, z);
-}
-
-static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
-{
-   char *q;
-   unsigned int cur, limit, old_limit;
-   z->zout = zout;
-   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
-   cur   = (unsigned int) (z->zout - z->zout_start);
-   limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
-   if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
-   while (cur + n > limit) {
-      if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
-      limit *= 2;
-   }
-   q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
-   STBI_NOTUSED(old_limit);
-   if (q == NULL) return stbi__err("outofmem", "Out of memory");
-   z->zout_start = q;
-   z->zout       = q + cur;
-   z->zout_end   = q + limit;
-   return 1;
-}
-
-static const int stbi__zlength_base[31] = {
-   3,4,5,6,7,8,9,10,11,13,
-   15,17,19,23,27,31,35,43,51,59,
-   67,83,99,115,131,163,195,227,258,0,0 };
-
-static const int stbi__zlength_extra[31]=
-{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
-
-static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
-257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
-
-static const int stbi__zdist_extra[32] =
-{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
-
-static int stbi__parse_huffman_block(stbi__zbuf *a)
-{
-   char *zout = a->zout;
-   for(;;) {
-      int z = stbi__zhuffman_decode(a, &a->z_length);
-      if (z < 256) {
-         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
-         if (zout >= a->zout_end) {
-            if (!stbi__zexpand(a, zout, 1)) return 0;
-            zout = a->zout;
-         }
-         *zout++ = (char) z;
-      } else {
-         stbi_uc *p;
-         int len,dist;
-         if (z == 256) {
-            a->zout = zout;
-            if (a->hit_zeof_once && a->num_bits < 16) {
-               // The first time we hit zeof, we inserted 16 extra zero bits into our bit
-               // buffer so the decoder can just do its speculative decoding. But if we
-               // actually consumed any of those bits (which is the case when num_bits < 16),
-               // the stream actually read past the end so it is malformed.
-               return stbi__err("unexpected end","Corrupt PNG");
-            }
-            return 1;
-         }
-         if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data
-         z -= 257;
-         len = stbi__zlength_base[z];
-         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
-         z = stbi__zhuffman_decode(a, &a->z_distance);
-         if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data
-         dist = stbi__zdist_base[z];
-         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
-         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
-         if (len > a->zout_end - zout) {
-            if (!stbi__zexpand(a, zout, len)) return 0;
-            zout = a->zout;
-         }
-         p = (stbi_uc *) (zout - dist);
-         if (dist == 1) { // run of one byte; common in images.
-            stbi_uc v = *p;
-            if (len) { do *zout++ = v; while (--len); }
-         } else {
-            if (len) { do *zout++ = *p++; while (--len); }
-         }
-      }
-   }
-}
-
-static int stbi__compute_huffman_codes(stbi__zbuf *a)
-{
-   static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
-   stbi__zhuffman z_codelength;
-   stbi_uc lencodes[286+32+137];//padding for maximum single op
-   stbi_uc codelength_sizes[19];
-   int i,n;
-
-   int hlit  = stbi__zreceive(a,5) + 257;
-   int hdist = stbi__zreceive(a,5) + 1;
-   int hclen = stbi__zreceive(a,4) + 4;
-   int ntot  = hlit + hdist;
-
-   memset(codelength_sizes, 0, sizeof(codelength_sizes));
-   for (i=0; i < hclen; ++i) {
-      int s = stbi__zreceive(a,3);
-      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
-   }
-   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
-
-   n = 0;
-   while (n < ntot) {
-      int c = stbi__zhuffman_decode(a, &z_codelength);
-      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
-      if (c < 16)
-         lencodes[n++] = (stbi_uc) c;
-      else {
-         stbi_uc fill = 0;
-         if (c == 16) {
-            c = stbi__zreceive(a,2)+3;
-            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
-            fill = lencodes[n-1];
-         } else if (c == 17) {
-            c = stbi__zreceive(a,3)+3;
-         } else if (c == 18) {
-            c = stbi__zreceive(a,7)+11;
-         } else {
-            return stbi__err("bad codelengths", "Corrupt PNG");
-         }
-         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
-         memset(lencodes+n, fill, c);
-         n += c;
-      }
-   }
-   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
-   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
-   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
-   return 1;
-}
-
-static int stbi__parse_uncompressed_block(stbi__zbuf *a)
-{
-   stbi_uc header[4];
-   int len,nlen,k;
-   if (a->num_bits & 7)
-      stbi__zreceive(a, a->num_bits & 7); // discard
-   // drain the bit-packed data into header
-   k = 0;
-   while (a->num_bits > 0) {
-      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
-      a->code_buffer >>= 8;
-      a->num_bits -= 8;
-   }
-   if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
-   // now fill header the normal way
-   while (k < 4)
-      header[k++] = stbi__zget8(a);
-   len  = header[1] * 256 + header[0];
-   nlen = header[3] * 256 + header[2];
-   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
-   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
-   if (a->zout + len > a->zout_end)
-      if (!stbi__zexpand(a, a->zout, len)) return 0;
-   memcpy(a->zout, a->zbuffer, len);
-   a->zbuffer += len;
-   a->zout += len;
-   return 1;
-}
-
-static int stbi__parse_zlib_header(stbi__zbuf *a)
-{
-   int cmf   = stbi__zget8(a);
-   int cm    = cmf & 15;
-   /* int cinfo = cmf >> 4; */
-   int flg   = stbi__zget8(a);
-   if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
-   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
-   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
-   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
-   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
-   return 1;
-}
-
-static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
-{
-   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
-   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
-   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
-   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
-   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
-   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
-   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
-   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
-   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
-};
-static const stbi_uc stbi__zdefault_distance[32] =
-{
-   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
-};
-/*
-Init algorithm:
-{
-   int i;   // use <= to match clearly with spec
-   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
-   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
-   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
-   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
-
-   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
-}
-*/
-
-static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
-{
-   int final, type;
-   if (parse_header)
-      if (!stbi__parse_zlib_header(a)) return 0;
-   a->num_bits = 0;
-   a->code_buffer = 0;
-   a->hit_zeof_once = 0;
-   do {
-      final = stbi__zreceive(a,1);
-      type = stbi__zreceive(a,2);
-      if (type == 0) {
-         if (!stbi__parse_uncompressed_block(a)) return 0;
-      } else if (type == 3) {
-         return 0;
-      } else {
-         if (type == 1) {
-            // use fixed code lengths
-            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
-            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
-         } else {
-            if (!stbi__compute_huffman_codes(a)) return 0;
-         }
-         if (!stbi__parse_huffman_block(a)) return 0;
-      }
-   } while (!final);
-   return 1;
-}
-
-static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
-{
-   a->zout_start = obuf;
-   a->zout       = obuf;
-   a->zout_end   = obuf + olen;
-   a->z_expandable = exp;
-
-   return stbi__parse_zlib(a, parse_header);
-}
-
-STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
-{
-   stbi__zbuf a;
-   char *p = (char *) stbi__malloc(initial_size);
-   if (p == NULL) return NULL;
-   a.zbuffer = (stbi_uc *) buffer;
-   a.zbuffer_end = (stbi_uc *) buffer + len;
-   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
-      if (outlen) *outlen = (int) (a.zout - a.zout_start);
-      return a.zout_start;
-   } else {
-      STBI_FREE(a.zout_start);
-      return NULL;
-   }
-}
-
-STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
-{
-   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
-}
-
-STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
-{
-   stbi__zbuf a;
-   char *p = (char *) stbi__malloc(initial_size);
-   if (p == NULL) return NULL;
-   a.zbuffer = (stbi_uc *) buffer;
-   a.zbuffer_end = (stbi_uc *) buffer + len;
-   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
-      if (outlen) *outlen = (int) (a.zout - a.zout_start);
-      return a.zout_start;
-   } else {
-      STBI_FREE(a.zout_start);
-      return NULL;
-   }
-}
-
-STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
-{
-   stbi__zbuf a;
-   a.zbuffer = (stbi_uc *) ibuffer;
-   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
-   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
-      return (int) (a.zout - a.zout_start);
-   else
-      return -1;
-}
-
-STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
-{
-   stbi__zbuf a;
-   char *p = (char *) stbi__malloc(16384);
-   if (p == NULL) return NULL;
-   a.zbuffer = (stbi_uc *) buffer;
-   a.zbuffer_end = (stbi_uc *) buffer+len;
-   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
-      if (outlen) *outlen = (int) (a.zout - a.zout_start);
-      return a.zout_start;
-   } else {
-      STBI_FREE(a.zout_start);
-      return NULL;
-   }
-}
-
-STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
-{
-   stbi__zbuf a;
-   a.zbuffer = (stbi_uc *) ibuffer;
-   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
-   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
-      return (int) (a.zout - a.zout_start);
-   else
-      return -1;
-}
-#endif
-
-// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
-//    simple implementation
-//      - only 8-bit samples
-//      - no CRC checking
-//      - allocates lots of intermediate memory
-//        - avoids problem of streaming data between subsystems
-//        - avoids explicit window management
-//    performance
-//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
-
-#ifndef STBI_NO_PNG
-typedef struct
-{
-   stbi__uint32 length;
-   stbi__uint32 type;
-} stbi__pngchunk;
-
-static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
-{
-   stbi__pngchunk c;
-   c.length = stbi__get32be(s);
-   c.type   = stbi__get32be(s);
-   return c;
-}
-
-static int stbi__check_png_header(stbi__context *s)
-{
-   static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
-   int i;
-   for (i=0; i < 8; ++i)
-      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
-   return 1;
-}
-
-typedef struct
-{
-   stbi__context *s;
-   stbi_uc *idata, *expanded, *out;
-   int depth;
-} stbi__png;
-
-
-enum {
-   STBI__F_none=0,
-   STBI__F_sub=1,
-   STBI__F_up=2,
-   STBI__F_avg=3,
-   STBI__F_paeth=4,
-   // synthetic filter used for first scanline to avoid needing a dummy row of 0s
-   STBI__F_avg_first
-};
-
-static stbi_uc first_row_filter[5] =
-{
-   STBI__F_none,
-   STBI__F_sub,
-   STBI__F_none,
-   STBI__F_avg_first,
-   STBI__F_sub // Paeth with b=c=0 turns out to be equivalent to sub
-};
-
-static int stbi__paeth(int a, int b, int c)
-{
-   // This formulation looks very different from the reference in the PNG spec, but is
-   // actually equivalent and has favorable data dependencies and admits straightforward
-   // generation of branch-free code, which helps performance significantly.
-   int thresh = c*3 - (a + b);
-   int lo = a < b ? a : b;
-   int hi = a < b ? b : a;
-   int t0 = (hi <= thresh) ? lo : c;
-   int t1 = (thresh <= lo) ? hi : t0;
-   return t1;
-}
-
-static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
-
-// adds an extra all-255 alpha channel
-// dest == src is legal
-// img_n must be 1 or 3
-static void stbi__create_png_alpha_expand8(stbi_uc *dest, stbi_uc *src, stbi__uint32 x, int img_n)
-{
-   int i;
-   // must process data backwards since we allow dest==src
-   if (img_n == 1) {
-      for (i=x-1; i >= 0; --i) {
-         dest[i*2+1] = 255;
-         dest[i*2+0] = src[i];
-      }
-   } else {
-      STBI_ASSERT(img_n == 3);
-      for (i=x-1; i >= 0; --i) {
-         dest[i*4+3] = 255;
-         dest[i*4+2] = src[i*3+2];
-         dest[i*4+1] = src[i*3+1];
-         dest[i*4+0] = src[i*3+0];
-      }
-   }
-}
-
-// create the png data from post-deflated data
-static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
-{
-   int bytes = (depth == 16 ? 2 : 1);
-   stbi__context *s = a->s;
-   stbi__uint32 i,j,stride = x*out_n*bytes;
-   stbi__uint32 img_len, img_width_bytes;
-   stbi_uc *filter_buf;
-   int all_ok = 1;
-   int k;
-   int img_n = s->img_n; // copy it into a local for later
-
-   int output_bytes = out_n*bytes;
-   int filter_bytes = img_n*bytes;
-   int width = x;
-
-   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
-   a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
-   if (!a->out) return stbi__err("outofmem", "Out of memory");
-
-   // note: error exits here don't need to clean up a->out individually,
-   // stbi__do_png always does on error.
-   if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
-   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
-   if (!stbi__mad2sizes_valid(img_width_bytes, y, img_width_bytes)) return stbi__err("too large", "Corrupt PNG");
-   img_len = (img_width_bytes + 1) * y;
-
-   // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
-   // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
-   // so just check for raw_len < img_len always.
-   if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
-
-   // Allocate two scan lines worth of filter workspace buffer.
-   filter_buf = (stbi_uc *) stbi__malloc_mad2(img_width_bytes, 2, 0);
-   if (!filter_buf) return stbi__err("outofmem", "Out of memory");
-
-   // Filtering for low-bit-depth images
-   if (depth < 8) {
-      filter_bytes = 1;
-      width = img_width_bytes;
-   }
-
-   for (j=0; j < y; ++j) {
-      // cur/prior filter buffers alternate
-      stbi_uc *cur = filter_buf + (j & 1)*img_width_bytes;
-      stbi_uc *prior = filter_buf + (~j & 1)*img_width_bytes;
-      stbi_uc *dest = a->out + stride*j;
-      int nk = width * filter_bytes;
-      int filter = *raw++;
-
-      // check filter type
-      if (filter > 4) {
-         all_ok = stbi__err("invalid filter","Corrupt PNG");
-         break;
-      }
-
-      // if first row, use special filter that doesn't sample previous row
-      if (j == 0) filter = first_row_filter[filter];
-
-      // perform actual filtering
-      switch (filter) {
-      case STBI__F_none:
-         memcpy(cur, raw, nk);
-         break;
-      case STBI__F_sub:
-         memcpy(cur, raw, filter_bytes);
-         for (k = filter_bytes; k < nk; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]);
-         break;
-      case STBI__F_up:
-         for (k = 0; k < nk; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + prior[k]);
-         break;
-      case STBI__F_avg:
-         for (k = 0; k < filter_bytes; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1));
-         for (k = filter_bytes; k < nk; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1));
-         break;
-      case STBI__F_paeth:
-         for (k = 0; k < filter_bytes; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + prior[k]); // prior[k] == stbi__paeth(0,prior[k],0)
-         for (k = filter_bytes; k < nk; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes], prior[k], prior[k-filter_bytes]));
-         break;
-      case STBI__F_avg_first:
-         memcpy(cur, raw, filter_bytes);
-         for (k = filter_bytes; k < nk; ++k)
-            cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1));
-         break;
-      }
-
-      raw += nk;
-
-      // expand decoded bits in cur to dest, also adding an extra alpha channel if desired
-      if (depth < 8) {
-         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
-         stbi_uc *in = cur;
-         stbi_uc *out = dest;
-         stbi_uc inb = 0;
-         stbi__uint32 nsmp = x*img_n;
-
-         // expand bits to bytes first
-         if (depth == 4) {
-            for (i=0; i < nsmp; ++i) {
-               if ((i & 1) == 0) inb = *in++;
-               *out++ = scale * (inb >> 4);
-               inb <<= 4;
-            }
-         } else if (depth == 2) {
-            for (i=0; i < nsmp; ++i) {
-               if ((i & 3) == 0) inb = *in++;
-               *out++ = scale * (inb >> 6);
-               inb <<= 2;
-            }
-         } else {
-            STBI_ASSERT(depth == 1);
-            for (i=0; i < nsmp; ++i) {
-               if ((i & 7) == 0) inb = *in++;
-               *out++ = scale * (inb >> 7);
-               inb <<= 1;
-            }
-         }
-
-         // insert alpha=255 values if desired
-         if (img_n != out_n)
-            stbi__create_png_alpha_expand8(dest, dest, x, img_n);
-      } else if (depth == 8) {
-         if (img_n == out_n)
-            memcpy(dest, cur, x*img_n);
-         else
-            stbi__create_png_alpha_expand8(dest, cur, x, img_n);
-      } else if (depth == 16) {
-         // convert the image data from big-endian to platform-native
-         stbi__uint16 *dest16 = (stbi__uint16*)dest;
-         stbi__uint32 nsmp = x*img_n;
-
-         if (img_n == out_n) {
-            for (i = 0; i < nsmp; ++i, ++dest16, cur += 2)
-               *dest16 = (cur[0] << 8) | cur[1];
-         } else {
-            STBI_ASSERT(img_n+1 == out_n);
-            if (img_n == 1) {
-               for (i = 0; i < x; ++i, dest16 += 2, cur += 2) {
-                  dest16[0] = (cur[0] << 8) | cur[1];
-                  dest16[1] = 0xffff;
-               }
-            } else {
-               STBI_ASSERT(img_n == 3);
-               for (i = 0; i < x; ++i, dest16 += 4, cur += 6) {
-                  dest16[0] = (cur[0] << 8) | cur[1];
-                  dest16[1] = (cur[2] << 8) | cur[3];
-                  dest16[2] = (cur[4] << 8) | cur[5];
-                  dest16[3] = 0xffff;
-               }
-            }
-         }
-      }
-   }
-
-   STBI_FREE(filter_buf);
-   if (!all_ok) return 0;
-
-   return 1;
-}
-
-static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
-{
-   int bytes = (depth == 16 ? 2 : 1);
-   int out_bytes = out_n * bytes;
-   stbi_uc *final;
-   int p;
-   if (!interlaced)
-      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
-
-   // de-interlacing
-   final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
-   if (!final) return stbi__err("outofmem", "Out of memory");
-   for (p=0; p < 7; ++p) {
-      int xorig[] = { 0,4,0,2,0,1,0 };
-      int yorig[] = { 0,0,4,0,2,0,1 };
-      int xspc[]  = { 8,8,4,4,2,2,1 };
-      int yspc[]  = { 8,8,8,4,4,2,2 };
-      int i,j,x,y;
-      // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
-      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
-      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
-      if (x && y) {
-         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
-         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
-            STBI_FREE(final);
-            return 0;
-         }
-         for (j=0; j < y; ++j) {
-            for (i=0; i < x; ++i) {
-               int out_y = j*yspc[p]+yorig[p];
-               int out_x = i*xspc[p]+xorig[p];
-               memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
-                      a->out + (j*x+i)*out_bytes, out_bytes);
-            }
-         }
-         STBI_FREE(a->out);
-         image_data += img_len;
-         image_data_len -= img_len;
-      }
-   }
-   a->out = final;
-
-   return 1;
-}
-
-static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
-{
-   stbi__context *s = z->s;
-   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
-   stbi_uc *p = z->out;
-
-   // compute color-based transparency, assuming we've
-   // already got 255 as the alpha value in the output
-   STBI_ASSERT(out_n == 2 || out_n == 4);
-
-   if (out_n == 2) {
-      for (i=0; i < pixel_count; ++i) {
-         p[1] = (p[0] == tc[0] ? 0 : 255);
-         p += 2;
-      }
-   } else {
-      for (i=0; i < pixel_count; ++i) {
-         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
-            p[3] = 0;
-         p += 4;
-      }
-   }
-   return 1;
-}
-
-static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
-{
-   stbi__context *s = z->s;
-   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
-   stbi__uint16 *p = (stbi__uint16*) z->out;
-
-   // compute color-based transparency, assuming we've
-   // already got 65535 as the alpha value in the output
-   STBI_ASSERT(out_n == 2 || out_n == 4);
-
-   if (out_n == 2) {
-      for (i = 0; i < pixel_count; ++i) {
-         p[1] = (p[0] == tc[0] ? 0 : 65535);
-         p += 2;
-      }
-   } else {
-      for (i = 0; i < pixel_count; ++i) {
-         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
-            p[3] = 0;
-         p += 4;
-      }
-   }
-   return 1;
-}
-
-static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
-{
-   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
-   stbi_uc *p, *temp_out, *orig = a->out;
-
-   p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
-   if (p == NULL) return stbi__err("outofmem", "Out of memory");
-
-   // between here and free(out) below, exitting would leak
-   temp_out = p;
-
-   if (pal_img_n == 3) {
-      for (i=0; i < pixel_count; ++i) {
-         int n = orig[i]*4;
-         p[0] = palette[n  ];
-         p[1] = palette[n+1];
-         p[2] = palette[n+2];
-         p += 3;
-      }
-   } else {
-      for (i=0; i < pixel_count; ++i) {
-         int n = orig[i]*4;
-         p[0] = palette[n  ];
-         p[1] = palette[n+1];
-         p[2] = palette[n+2];
-         p[3] = palette[n+3];
-         p += 4;
-      }
-   }
-   STBI_FREE(a->out);
-   a->out = temp_out;
-
-   STBI_NOTUSED(len);
-
-   return 1;
-}
-
-static int stbi__unpremultiply_on_load_global = 0;
-static int stbi__de_iphone_flag_global = 0;
-
-STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
-{
-   stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
-}
-
-STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
-{
-   stbi__de_iphone_flag_global = flag_true_if_should_convert;
-}
-
-#ifndef STBI_THREAD_LOCAL
-#define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
-#define stbi__de_iphone_flag  stbi__de_iphone_flag_global
-#else
-static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
-static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
-
-STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
-{
-   stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
-   stbi__unpremultiply_on_load_set = 1;
-}
-
-STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
-{
-   stbi__de_iphone_flag_local = flag_true_if_should_convert;
-   stbi__de_iphone_flag_set = 1;
-}
-
-#define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
-                                       ? stbi__unpremultiply_on_load_local      \
-                                       : stbi__unpremultiply_on_load_global)
-#define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
-                                ? stbi__de_iphone_flag_local                    \
-                                : stbi__de_iphone_flag_global)
-#endif // STBI_THREAD_LOCAL
-
-static void stbi__de_iphone(stbi__png *z)
-{
-   stbi__context *s = z->s;
-   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
-   stbi_uc *p = z->out;
-
-   if (s->img_out_n == 3) {  // convert bgr to rgb
-      for (i=0; i < pixel_count; ++i) {
-         stbi_uc t = p[0];
-         p[0] = p[2];
-         p[2] = t;
-         p += 3;
-      }
-   } else {
-      STBI_ASSERT(s->img_out_n == 4);
-      if (stbi__unpremultiply_on_load) {
-         // convert bgr to rgb and unpremultiply
-         for (i=0; i < pixel_count; ++i) {
-            stbi_uc a = p[3];
-            stbi_uc t = p[0];
-            if (a) {
-               stbi_uc half = a / 2;
-               p[0] = (p[2] * 255 + half) / a;
-               p[1] = (p[1] * 255 + half) / a;
-               p[2] = ( t   * 255 + half) / a;
-            } else {
-               p[0] = p[2];
-               p[2] = t;
-            }
-            p += 4;
-         }
-      } else {
-         // convert bgr to rgb
-         for (i=0; i < pixel_count; ++i) {
-            stbi_uc t = p[0];
-            p[0] = p[2];
-            p[2] = t;
-            p += 4;
-         }
-      }
-   }
-}
-
-#define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
-
-static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
-{
-   stbi_uc palette[1024], pal_img_n=0;
-   stbi_uc has_trans=0, tc[3]={0};
-   stbi__uint16 tc16[3];
-   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
-   int first=1,k,interlace=0, color=0, is_iphone=0;
-   stbi__context *s = z->s;
-
-   z->expanded = NULL;
-   z->idata = NULL;
-   z->out = NULL;
-
-   if (!stbi__check_png_header(s)) return 0;
-
-   if (scan == STBI__SCAN_type) return 1;
-
-   for (;;) {
-      stbi__pngchunk c = stbi__get_chunk_header(s);
-      switch (c.type) {
-         case STBI__PNG_TYPE('C','g','B','I'):
-            is_iphone = 1;
-            stbi__skip(s, c.length);
-            break;
-         case STBI__PNG_TYPE('I','H','D','R'): {
-            int comp,filter;
-            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
-            first = 0;
-            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
-            s->img_x = stbi__get32be(s);
-            s->img_y = stbi__get32be(s);
-            if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
-            if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
-            z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
-            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
-            if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
-            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
-            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
-            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
-            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
-            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
-            if (!pal_img_n) {
-               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
-               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
-            } else {
-               // if paletted, then pal_n is our final components, and
-               // img_n is # components to decompress/filter.
-               s->img_n = 1;
-               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
-            }
-            // even with SCAN_header, have to scan to see if we have a tRNS
-            break;
-         }
-
-         case STBI__PNG_TYPE('P','L','T','E'):  {
-            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
-            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
-            pal_len = c.length / 3;
-            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
-            for (i=0; i < pal_len; ++i) {
-               palette[i*4+0] = stbi__get8(s);
-               palette[i*4+1] = stbi__get8(s);
-               palette[i*4+2] = stbi__get8(s);
-               palette[i*4+3] = 255;
-            }
-            break;
-         }
-
-         case STBI__PNG_TYPE('t','R','N','S'): {
-            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
-            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
-            if (pal_img_n) {
-               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
-               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
-               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
-               pal_img_n = 4;
-               for (i=0; i < c.length; ++i)
-                  palette[i*4+3] = stbi__get8(s);
-            } else {
-               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
-               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
-               has_trans = 1;
-               // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now.
-               if (scan == STBI__SCAN_header) { ++s->img_n; return 1; }
-               if (z->depth == 16) {
-                  for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
-               } else {
-                  for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
-               }
-            }
-            break;
-         }
-
-         case STBI__PNG_TYPE('I','D','A','T'): {
-            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
-            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
-            if (scan == STBI__SCAN_header) {
-               // header scan definitely stops at first IDAT
-               if (pal_img_n)
-                  s->img_n = pal_img_n;
-               return 1;
-            }
-            if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes");
-            if ((int)(ioff + c.length) < (int)ioff) return 0;
-            if (ioff + c.length > idata_limit) {
-               stbi__uint32 idata_limit_old = idata_limit;
-               stbi_uc *p;
-               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
-               while (ioff + c.length > idata_limit)
-                  idata_limit *= 2;
-               STBI_NOTUSED(idata_limit_old);
-               p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
-               z->idata = p;
-            }
-            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
-            ioff += c.length;
-            break;
-         }
-
-         case STBI__PNG_TYPE('I','E','N','D'): {
-            stbi__uint32 raw_len, bpl;
-            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
-            if (scan != STBI__SCAN_load) return 1;
-            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
-            // initial guess for decoded data size to avoid unnecessary reallocs
-            bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
-            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
-            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
-            if (z->expanded == NULL) return 0; // zlib should set error
-            STBI_FREE(z->idata); z->idata = NULL;
-            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
-               s->img_out_n = s->img_n+1;
-            else
-               s->img_out_n = s->img_n;
-            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
-            if (has_trans) {
-               if (z->depth == 16) {
-                  if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
-               } else {
-                  if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
-               }
-            }
-            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
-               stbi__de_iphone(z);
-            if (pal_img_n) {
-               // pal_img_n == 3 or 4
-               s->img_n = pal_img_n; // record the actual colors we had
-               s->img_out_n = pal_img_n;
-               if (req_comp >= 3) s->img_out_n = req_comp;
-               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
-                  return 0;
-            } else if (has_trans) {
-               // non-paletted image with tRNS -> source image has (constant) alpha
-               ++s->img_n;
-            }
-            STBI_FREE(z->expanded); z->expanded = NULL;
-            // end of PNG chunk, read and skip CRC
-            stbi__get32be(s);
-            return 1;
-         }
-
-         default:
-            // if critical, fail
-            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
-            if ((c.type & (1 << 29)) == 0) {
-               #ifndef STBI_NO_FAILURE_STRINGS
-               // not threadsafe
-               static char invalid_chunk[] = "XXXX PNG chunk not known";
-               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
-               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
-               invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
-               invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
-               #endif
-               return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
-            }
-            stbi__skip(s, c.length);
-            break;
-      }
-      // end of PNG chunk, read and skip CRC
-      stbi__get32be(s);
-   }
-}
-
-static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
-{
-   void *result=NULL;
-   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
-   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
-      if (p->depth <= 8)
-         ri->bits_per_channel = 8;
-      else if (p->depth == 16)
-         ri->bits_per_channel = 16;
-      else
-         return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
-      result = p->out;
-      p->out = NULL;
-      if (req_comp && req_comp != p->s->img_out_n) {
-         if (ri->bits_per_channel == 8)
-            result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
-         else
-            result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
-         p->s->img_out_n = req_comp;
-         if (result == NULL) return result;
-      }
-      *x = p->s->img_x;
-      *y = p->s->img_y;
-      if (n) *n = p->s->img_n;
-   }
-   STBI_FREE(p->out);      p->out      = NULL;
-   STBI_FREE(p->expanded); p->expanded = NULL;
-   STBI_FREE(p->idata);    p->idata    = NULL;
-
-   return result;
-}
-
-static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   stbi__png p;
-   p.s = s;
-   return stbi__do_png(&p, x,y,comp,req_comp, ri);
-}
-
-static int stbi__png_test(stbi__context *s)
-{
-   int r;
-   r = stbi__check_png_header(s);
-   stbi__rewind(s);
-   return r;
-}
-
-static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
-{
-   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
-      stbi__rewind( p->s );
-      return 0;
-   }
-   if (x) *x = p->s->img_x;
-   if (y) *y = p->s->img_y;
-   if (comp) *comp = p->s->img_n;
-   return 1;
-}
-
-static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   stbi__png p;
-   p.s = s;
-   return stbi__png_info_raw(&p, x, y, comp);
-}
-
-static int stbi__png_is16(stbi__context *s)
-{
-   stbi__png p;
-   p.s = s;
-   if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
-	   return 0;
-   if (p.depth != 16) {
-      stbi__rewind(p.s);
-      return 0;
-   }
-   return 1;
-}
-#endif
-
-// Microsoft/Windows BMP image
-
-#ifndef STBI_NO_BMP
-static int stbi__bmp_test_raw(stbi__context *s)
-{
-   int r;
-   int sz;
-   if (stbi__get8(s) != 'B') return 0;
-   if (stbi__get8(s) != 'M') return 0;
-   stbi__get32le(s); // discard filesize
-   stbi__get16le(s); // discard reserved
-   stbi__get16le(s); // discard reserved
-   stbi__get32le(s); // discard data offset
-   sz = stbi__get32le(s);
-   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
-   return r;
-}
-
-static int stbi__bmp_test(stbi__context *s)
-{
-   int r = stbi__bmp_test_raw(s);
-   stbi__rewind(s);
-   return r;
-}
-
-
-// returns 0..31 for the highest set bit
-static int stbi__high_bit(unsigned int z)
-{
-   int n=0;
-   if (z == 0) return -1;
-   if (z >= 0x10000) { n += 16; z >>= 16; }
-   if (z >= 0x00100) { n +=  8; z >>=  8; }
-   if (z >= 0x00010) { n +=  4; z >>=  4; }
-   if (z >= 0x00004) { n +=  2; z >>=  2; }
-   if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
-   return n;
-}
-
-static int stbi__bitcount(unsigned int a)
-{
-   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
-   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
-   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
-   a = (a + (a >> 8)); // max 16 per 8 bits
-   a = (a + (a >> 16)); // max 32 per 8 bits
-   return a & 0xff;
-}
-
-// extract an arbitrarily-aligned N-bit value (N=bits)
-// from v, and then make it 8-bits long and fractionally
-// extend it to full full range.
-static int stbi__shiftsigned(unsigned int v, int shift, int bits)
-{
-   static unsigned int mul_table[9] = {
-      0,
-      0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
-      0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
-   };
-   static unsigned int shift_table[9] = {
-      0, 0,0,1,0,2,4,6,0,
-   };
-   if (shift < 0)
-      v <<= -shift;
-   else
-      v >>= shift;
-   STBI_ASSERT(v < 256);
-   v >>= (8-bits);
-   STBI_ASSERT(bits >= 0 && bits <= 8);
-   return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
-}
-
-typedef struct
-{
-   int bpp, offset, hsz;
-   unsigned int mr,mg,mb,ma, all_a;
-   int extra_read;
-} stbi__bmp_data;
-
-static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
-{
-   // BI_BITFIELDS specifies masks explicitly, don't override
-   if (compress == 3)
-      return 1;
-
-   if (compress == 0) {
-      if (info->bpp == 16) {
-         info->mr = 31u << 10;
-         info->mg = 31u <<  5;
-         info->mb = 31u <<  0;
-      } else if (info->bpp == 32) {
-         info->mr = 0xffu << 16;
-         info->mg = 0xffu <<  8;
-         info->mb = 0xffu <<  0;
-         info->ma = 0xffu << 24;
-         info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
-      } else {
-         // otherwise, use defaults, which is all-0
-         info->mr = info->mg = info->mb = info->ma = 0;
-      }
-      return 1;
-   }
-   return 0; // error
-}
-
-static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
-{
-   int hsz;
-   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
-   stbi__get32le(s); // discard filesize
-   stbi__get16le(s); // discard reserved
-   stbi__get16le(s); // discard reserved
-   info->offset = stbi__get32le(s);
-   info->hsz = hsz = stbi__get32le(s);
-   info->mr = info->mg = info->mb = info->ma = 0;
-   info->extra_read = 14;
-
-   if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
-
-   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
-   if (hsz == 12) {
-      s->img_x = stbi__get16le(s);
-      s->img_y = stbi__get16le(s);
-   } else {
-      s->img_x = stbi__get32le(s);
-      s->img_y = stbi__get32le(s);
-   }
-   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
-   info->bpp = stbi__get16le(s);
-   if (hsz != 12) {
-      int compress = stbi__get32le(s);
-      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
-      if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
-      if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
-      stbi__get32le(s); // discard sizeof
-      stbi__get32le(s); // discard hres
-      stbi__get32le(s); // discard vres
-      stbi__get32le(s); // discard colorsused
-      stbi__get32le(s); // discard max important
-      if (hsz == 40 || hsz == 56) {
-         if (hsz == 56) {
-            stbi__get32le(s);
-            stbi__get32le(s);
-            stbi__get32le(s);
-            stbi__get32le(s);
-         }
-         if (info->bpp == 16 || info->bpp == 32) {
-            if (compress == 0) {
-               stbi__bmp_set_mask_defaults(info, compress);
-            } else if (compress == 3) {
-               info->mr = stbi__get32le(s);
-               info->mg = stbi__get32le(s);
-               info->mb = stbi__get32le(s);
-               info->extra_read += 12;
-               // not documented, but generated by photoshop and handled by mspaint
-               if (info->mr == info->mg && info->mg == info->mb) {
-                  // ?!?!?
-                  return stbi__errpuc("bad BMP", "bad BMP");
-               }
-            } else
-               return stbi__errpuc("bad BMP", "bad BMP");
-         }
-      } else {
-         // V4/V5 header
-         int i;
-         if (hsz != 108 && hsz != 124)
-            return stbi__errpuc("bad BMP", "bad BMP");
-         info->mr = stbi__get32le(s);
-         info->mg = stbi__get32le(s);
-         info->mb = stbi__get32le(s);
-         info->ma = stbi__get32le(s);
-         if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
-            stbi__bmp_set_mask_defaults(info, compress);
-         stbi__get32le(s); // discard color space
-         for (i=0; i < 12; ++i)
-            stbi__get32le(s); // discard color space parameters
-         if (hsz == 124) {
-            stbi__get32le(s); // discard rendering intent
-            stbi__get32le(s); // discard offset of profile data
-            stbi__get32le(s); // discard size of profile data
-            stbi__get32le(s); // discard reserved
-         }
-      }
-   }
-   return (void *) 1;
-}
-
-
-static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   stbi_uc *out;
-   unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
-   stbi_uc pal[256][4];
-   int psize=0,i,j,width;
-   int flip_vertically, pad, target;
-   stbi__bmp_data info;
-   STBI_NOTUSED(ri);
-
-   info.all_a = 255;
-   if (stbi__bmp_parse_header(s, &info) == NULL)
-      return NULL; // error code already set
-
-   flip_vertically = ((int) s->img_y) > 0;
-   s->img_y = abs((int) s->img_y);
-
-   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-
-   mr = info.mr;
-   mg = info.mg;
-   mb = info.mb;
-   ma = info.ma;
-   all_a = info.all_a;
-
-   if (info.hsz == 12) {
-      if (info.bpp < 24)
-         psize = (info.offset - info.extra_read - 24) / 3;
-   } else {
-      if (info.bpp < 16)
-         psize = (info.offset - info.extra_read - info.hsz) >> 2;
-   }
-   if (psize == 0) {
-      // accept some number of extra bytes after the header, but if the offset points either to before
-      // the header ends or implies a large amount of extra data, reject the file as malformed
-      int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original);
-      int header_limit = 1024; // max we actually read is below 256 bytes currently.
-      int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size.
-      if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) {
-         return stbi__errpuc("bad header", "Corrupt BMP");
-      }
-      // we established that bytes_read_so_far is positive and sensible.
-      // the first half of this test rejects offsets that are either too small positives, or
-      // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn
-      // ensures the number computed in the second half of the test can't overflow.
-      if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) {
-         return stbi__errpuc("bad offset", "Corrupt BMP");
-      } else {
-         stbi__skip(s, info.offset - bytes_read_so_far);
-      }
-   }
-
-   if (info.bpp == 24 && ma == 0xff000000)
-      s->img_n = 3;
-   else
-      s->img_n = ma ? 4 : 3;
-   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
-      target = req_comp;
-   else
-      target = s->img_n; // if they want monochrome, we'll post-convert
-
-   // sanity-check size
-   if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
-      return stbi__errpuc("too large", "Corrupt BMP");
-
-   out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
-   if (!out) return stbi__errpuc("outofmem", "Out of memory");
-   if (info.bpp < 16) {
-      int z=0;
-      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
-      for (i=0; i < psize; ++i) {
-         pal[i][2] = stbi__get8(s);
-         pal[i][1] = stbi__get8(s);
-         pal[i][0] = stbi__get8(s);
-         if (info.hsz != 12) stbi__get8(s);
-         pal[i][3] = 255;
-      }
-      stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
-      if (info.bpp == 1) width = (s->img_x + 7) >> 3;
-      else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
-      else if (info.bpp == 8) width = s->img_x;
-      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
-      pad = (-width)&3;
-      if (info.bpp == 1) {
-         for (j=0; j < (int) s->img_y; ++j) {
-            int bit_offset = 7, v = stbi__get8(s);
-            for (i=0; i < (int) s->img_x; ++i) {
-               int color = (v>>bit_offset)&0x1;
-               out[z++] = pal[color][0];
-               out[z++] = pal[color][1];
-               out[z++] = pal[color][2];
-               if (target == 4) out[z++] = 255;
-               if (i+1 == (int) s->img_x) break;
-               if((--bit_offset) < 0) {
-                  bit_offset = 7;
-                  v = stbi__get8(s);
-               }
-            }
-            stbi__skip(s, pad);
-         }
-      } else {
-         for (j=0; j < (int) s->img_y; ++j) {
-            for (i=0; i < (int) s->img_x; i += 2) {
-               int v=stbi__get8(s),v2=0;
-               if (info.bpp == 4) {
-                  v2 = v & 15;
-                  v >>= 4;
-               }
-               out[z++] = pal[v][0];
-               out[z++] = pal[v][1];
-               out[z++] = pal[v][2];
-               if (target == 4) out[z++] = 255;
-               if (i+1 == (int) s->img_x) break;
-               v = (info.bpp == 8) ? stbi__get8(s) : v2;
-               out[z++] = pal[v][0];
-               out[z++] = pal[v][1];
-               out[z++] = pal[v][2];
-               if (target == 4) out[z++] = 255;
-            }
-            stbi__skip(s, pad);
-         }
-      }
-   } else {
-      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
-      int z = 0;
-      int easy=0;
-      stbi__skip(s, info.offset - info.extra_read - info.hsz);
-      if (info.bpp == 24) width = 3 * s->img_x;
-      else if (info.bpp == 16) width = 2*s->img_x;
-      else /* bpp = 32 and pad = 0 */ width=0;
-      pad = (-width) & 3;
-      if (info.bpp == 24) {
-         easy = 1;
-      } else if (info.bpp == 32) {
-         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
-            easy = 2;
-      }
-      if (!easy) {
-         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
-         // right shift amt to put high bit in position #7
-         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
-         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
-         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
-         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
-         if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
-      }
-      for (j=0; j < (int) s->img_y; ++j) {
-         if (easy) {
-            for (i=0; i < (int) s->img_x; ++i) {
-               unsigned char a;
-               out[z+2] = stbi__get8(s);
-               out[z+1] = stbi__get8(s);
-               out[z+0] = stbi__get8(s);
-               z += 3;
-               a = (easy == 2 ? stbi__get8(s) : 255);
-               all_a |= a;
-               if (target == 4) out[z++] = a;
-            }
-         } else {
-            int bpp = info.bpp;
-            for (i=0; i < (int) s->img_x; ++i) {
-               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
-               unsigned int a;
-               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
-               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
-               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
-               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
-               all_a |= a;
-               if (target == 4) out[z++] = STBI__BYTECAST(a);
-            }
-         }
-         stbi__skip(s, pad);
-      }
-   }
-
-   // if alpha channel is all 0s, replace with all 255s
-   if (target == 4 && all_a == 0)
-      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
-         out[i] = 255;
-
-   if (flip_vertically) {
-      stbi_uc t;
-      for (j=0; j < (int) s->img_y>>1; ++j) {
-         stbi_uc *p1 = out +      j     *s->img_x*target;
-         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
-         for (i=0; i < (int) s->img_x*target; ++i) {
-            t = p1[i]; p1[i] = p2[i]; p2[i] = t;
-         }
-      }
-   }
-
-   if (req_comp && req_comp != target) {
-      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
-      if (out == NULL) return out; // stbi__convert_format frees input on failure
-   }
-
-   *x = s->img_x;
-   *y = s->img_y;
-   if (comp) *comp = s->img_n;
-   return out;
-}
-#endif
-
-// Targa Truevision - TGA
-// by Jonathan Dummer
-#ifndef STBI_NO_TGA
-// returns STBI_rgb or whatever, 0 on error
-static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
-{
-   // only RGB or RGBA (incl. 16bit) or grey allowed
-   if (is_rgb16) *is_rgb16 = 0;
-   switch(bits_per_pixel) {
-      case 8:  return STBI_grey;
-      case 16: if(is_grey) return STBI_grey_alpha;
-               // fallthrough
-      case 15: if(is_rgb16) *is_rgb16 = 1;
-               return STBI_rgb;
-      case 24: // fallthrough
-      case 32: return bits_per_pixel/8;
-      default: return 0;
-   }
-}
-
-static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
-{
-    int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
-    int sz, tga_colormap_type;
-    stbi__get8(s);                   // discard Offset
-    tga_colormap_type = stbi__get8(s); // colormap type
-    if( tga_colormap_type > 1 ) {
-        stbi__rewind(s);
-        return 0;      // only RGB or indexed allowed
-    }
-    tga_image_type = stbi__get8(s); // image type
-    if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
-        if (tga_image_type != 1 && tga_image_type != 9) {
-            stbi__rewind(s);
-            return 0;
-        }
-        stbi__skip(s,4);       // skip index of first colormap entry and number of entries
-        sz = stbi__get8(s);    //   check bits per palette color entry
-        if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
-            stbi__rewind(s);
-            return 0;
-        }
-        stbi__skip(s,4);       // skip image x and y origin
-        tga_colormap_bpp = sz;
-    } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
-        if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
-            stbi__rewind(s);
-            return 0; // only RGB or grey allowed, +/- RLE
-        }
-        stbi__skip(s,9); // skip colormap specification and image x/y origin
-        tga_colormap_bpp = 0;
-    }
-    tga_w = stbi__get16le(s);
-    if( tga_w < 1 ) {
-        stbi__rewind(s);
-        return 0;   // test width
-    }
-    tga_h = stbi__get16le(s);
-    if( tga_h < 1 ) {
-        stbi__rewind(s);
-        return 0;   // test height
-    }
-    tga_bits_per_pixel = stbi__get8(s); // bits per pixel
-    stbi__get8(s); // ignore alpha bits
-    if (tga_colormap_bpp != 0) {
-        if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
-            // when using a colormap, tga_bits_per_pixel is the size of the indexes
-            // I don't think anything but 8 or 16bit indexes makes sense
-            stbi__rewind(s);
-            return 0;
-        }
-        tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
-    } else {
-        tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
-    }
-    if(!tga_comp) {
-      stbi__rewind(s);
-      return 0;
-    }
-    if (x) *x = tga_w;
-    if (y) *y = tga_h;
-    if (comp) *comp = tga_comp;
-    return 1;                   // seems to have passed everything
-}
-
-static int stbi__tga_test(stbi__context *s)
-{
-   int res = 0;
-   int sz, tga_color_type;
-   stbi__get8(s);      //   discard Offset
-   tga_color_type = stbi__get8(s);   //   color type
-   if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
-   sz = stbi__get8(s);   //   image type
-   if ( tga_color_type == 1 ) { // colormapped (paletted) image
-      if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
-      stbi__skip(s,4);       // skip index of first colormap entry and number of entries
-      sz = stbi__get8(s);    //   check bits per palette color entry
-      if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
-      stbi__skip(s,4);       // skip image x and y origin
-   } else { // "normal" image w/o colormap
-      if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
-      stbi__skip(s,9); // skip colormap specification and image x/y origin
-   }
-   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
-   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
-   sz = stbi__get8(s);   //   bits per pixel
-   if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
-   if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
-
-   res = 1; // if we got this far, everything's good and we can return 1 instead of 0
-
-errorEnd:
-   stbi__rewind(s);
-   return res;
-}
-
-// read 16bit value and convert to 24bit RGB
-static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
-{
-   stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
-   stbi__uint16 fiveBitMask = 31;
-   // we have 3 channels with 5bits each
-   int r = (px >> 10) & fiveBitMask;
-   int g = (px >> 5) & fiveBitMask;
-   int b = px & fiveBitMask;
-   // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
-   out[0] = (stbi_uc)((r * 255)/31);
-   out[1] = (stbi_uc)((g * 255)/31);
-   out[2] = (stbi_uc)((b * 255)/31);
-
-   // some people claim that the most significant bit might be used for alpha
-   // (possibly if an alpha-bit is set in the "image descriptor byte")
-   // but that only made 16bit test images completely translucent..
-   // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
-}
-
-static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   //   read in the TGA header stuff
-   int tga_offset = stbi__get8(s);
-   int tga_indexed = stbi__get8(s);
-   int tga_image_type = stbi__get8(s);
-   int tga_is_RLE = 0;
-   int tga_palette_start = stbi__get16le(s);
-   int tga_palette_len = stbi__get16le(s);
-   int tga_palette_bits = stbi__get8(s);
-   int tga_x_origin = stbi__get16le(s);
-   int tga_y_origin = stbi__get16le(s);
-   int tga_width = stbi__get16le(s);
-   int tga_height = stbi__get16le(s);
-   int tga_bits_per_pixel = stbi__get8(s);
-   int tga_comp, tga_rgb16=0;
-   int tga_inverted = stbi__get8(s);
-   // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
-   //   image data
-   unsigned char *tga_data;
-   unsigned char *tga_palette = NULL;
-   int i, j;
-   unsigned char raw_data[4] = {0};
-   int RLE_count = 0;
-   int RLE_repeating = 0;
-   int read_next_pixel = 1;
-   STBI_NOTUSED(ri);
-   STBI_NOTUSED(tga_x_origin); // @TODO
-   STBI_NOTUSED(tga_y_origin); // @TODO
-
-   if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-   if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-
-   //   do a tiny bit of precessing
-   if ( tga_image_type >= 8 )
-   {
-      tga_image_type -= 8;
-      tga_is_RLE = 1;
-   }
-   tga_inverted = 1 - ((tga_inverted >> 5) & 1);
-
-   //   If I'm paletted, then I'll use the number of bits from the palette
-   if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
-   else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
-
-   if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
-      return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
-
-   //   tga info
-   *x = tga_width;
-   *y = tga_height;
-   if (comp) *comp = tga_comp;
-
-   if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
-      return stbi__errpuc("too large", "Corrupt TGA");
-
-   tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
-   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
-
-   // skip to the data's starting position (offset usually = 0)
-   stbi__skip(s, tga_offset );
-
-   if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
-      for (i=0; i < tga_height; ++i) {
-         int row = tga_inverted ? tga_height -i - 1 : i;
-         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
-         stbi__getn(s, tga_row, tga_width * tga_comp);
-      }
-   } else  {
-      //   do I need to load a palette?
-      if ( tga_indexed)
-      {
-         if (tga_palette_len == 0) {  /* you have to have at least one entry! */
-            STBI_FREE(tga_data);
-            return stbi__errpuc("bad palette", "Corrupt TGA");
-         }
-
-         //   any data to skip? (offset usually = 0)
-         stbi__skip(s, tga_palette_start );
-         //   load the palette
-         tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
-         if (!tga_palette) {
-            STBI_FREE(tga_data);
-            return stbi__errpuc("outofmem", "Out of memory");
-         }
-         if (tga_rgb16) {
-            stbi_uc *pal_entry = tga_palette;
-            STBI_ASSERT(tga_comp == STBI_rgb);
-            for (i=0; i < tga_palette_len; ++i) {
-               stbi__tga_read_rgb16(s, pal_entry);
-               pal_entry += tga_comp;
-            }
-         } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
-               STBI_FREE(tga_data);
-               STBI_FREE(tga_palette);
-               return stbi__errpuc("bad palette", "Corrupt TGA");
-         }
-      }
-      //   load the data
-      for (i=0; i < tga_width * tga_height; ++i)
-      {
-         //   if I'm in RLE mode, do I need to get a RLE stbi__pngchunk?
-         if ( tga_is_RLE )
-         {
-            if ( RLE_count == 0 )
-            {
-               //   yep, get the next byte as a RLE command
-               int RLE_cmd = stbi__get8(s);
-               RLE_count = 1 + (RLE_cmd & 127);
-               RLE_repeating = RLE_cmd >> 7;
-               read_next_pixel = 1;
-            } else if ( !RLE_repeating )
-            {
-               read_next_pixel = 1;
-            }
-         } else
-         {
-            read_next_pixel = 1;
-         }
-         //   OK, if I need to read a pixel, do it now
-         if ( read_next_pixel )
-         {
-            //   load however much data we did have
-            if ( tga_indexed )
-            {
-               // read in index, then perform the lookup
-               int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
-               if ( pal_idx >= tga_palette_len ) {
-                  // invalid index
-                  pal_idx = 0;
-               }
-               pal_idx *= tga_comp;
-               for (j = 0; j < tga_comp; ++j) {
-                  raw_data[j] = tga_palette[pal_idx+j];
-               }
-            } else if(tga_rgb16) {
-               STBI_ASSERT(tga_comp == STBI_rgb);
-               stbi__tga_read_rgb16(s, raw_data);
-            } else {
-               //   read in the data raw
-               for (j = 0; j < tga_comp; ++j) {
-                  raw_data[j] = stbi__get8(s);
-               }
-            }
-            //   clear the reading flag for the next pixel
-            read_next_pixel = 0;
-         } // end of reading a pixel
-
-         // copy data
-         for (j = 0; j < tga_comp; ++j)
-           tga_data[i*tga_comp+j] = raw_data[j];
-
-         //   in case we're in RLE mode, keep counting down
-         --RLE_count;
-      }
-      //   do I need to invert the image?
-      if ( tga_inverted )
-      {
-         for (j = 0; j*2 < tga_height; ++j)
-         {
-            int index1 = j * tga_width * tga_comp;
-            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
-            for (i = tga_width * tga_comp; i > 0; --i)
-            {
-               unsigned char temp = tga_data[index1];
-               tga_data[index1] = tga_data[index2];
-               tga_data[index2] = temp;
-               ++index1;
-               ++index2;
-            }
-         }
-      }
-      //   clear my palette, if I had one
-      if ( tga_palette != NULL )
-      {
-         STBI_FREE( tga_palette );
-      }
-   }
-
-   // swap RGB - if the source data was RGB16, it already is in the right order
-   if (tga_comp >= 3 && !tga_rgb16)
-   {
-      unsigned char* tga_pixel = tga_data;
-      for (i=0; i < tga_width * tga_height; ++i)
-      {
-         unsigned char temp = tga_pixel[0];
-         tga_pixel[0] = tga_pixel[2];
-         tga_pixel[2] = temp;
-         tga_pixel += tga_comp;
-      }
-   }
-
-   // convert to target component count
-   if (req_comp && req_comp != tga_comp)
-      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
-
-   //   the things I do to get rid of an error message, and yet keep
-   //   Microsoft's C compilers happy... [8^(
-   tga_palette_start = tga_palette_len = tga_palette_bits =
-         tga_x_origin = tga_y_origin = 0;
-   STBI_NOTUSED(tga_palette_start);
-   //   OK, done
-   return tga_data;
-}
-#endif
-
-// *************************************************************************************************
-// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
-
-#ifndef STBI_NO_PSD
-static int stbi__psd_test(stbi__context *s)
-{
-   int r = (stbi__get32be(s) == 0x38425053);
-   stbi__rewind(s);
-   return r;
-}
-
-static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
-{
-   int count, nleft, len;
-
-   count = 0;
-   while ((nleft = pixelCount - count) > 0) {
-      len = stbi__get8(s);
-      if (len == 128) {
-         // No-op.
-      } else if (len < 128) {
-         // Copy next len+1 bytes literally.
-         len++;
-         if (len > nleft) return 0; // corrupt data
-         count += len;
-         while (len) {
-            *p = stbi__get8(s);
-            p += 4;
-            len--;
-         }
-      } else if (len > 128) {
-         stbi_uc   val;
-         // Next -len+1 bytes in the dest are replicated from next source byte.
-         // (Interpret len as a negative 8-bit int.)
-         len = 257 - len;
-         if (len > nleft) return 0; // corrupt data
-         val = stbi__get8(s);
-         count += len;
-         while (len) {
-            *p = val;
-            p += 4;
-            len--;
-         }
-      }
-   }
-
-   return 1;
-}
-
-static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
-{
-   int pixelCount;
-   int channelCount, compression;
-   int channel, i;
-   int bitdepth;
-   int w,h;
-   stbi_uc *out;
-   STBI_NOTUSED(ri);
-
-   // Check identifier
-   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
-      return stbi__errpuc("not PSD", "Corrupt PSD image");
-
-   // Check file type version.
-   if (stbi__get16be(s) != 1)
-      return stbi__errpuc("wrong version", "Unsupported version of PSD image");
-
-   // Skip 6 reserved bytes.
-   stbi__skip(s, 6 );
-
-   // Read the number of channels (R, G, B, A, etc).
-   channelCount = stbi__get16be(s);
-   if (channelCount < 0 || channelCount > 16)
-      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
-
-   // Read the rows and columns of the image.
-   h = stbi__get32be(s);
-   w = stbi__get32be(s);
-
-   if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-   if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-
-   // Make sure the depth is 8 bits.
-   bitdepth = stbi__get16be(s);
-   if (bitdepth != 8 && bitdepth != 16)
-      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
-
-   // Make sure the color mode is RGB.
-   // Valid options are:
-   //   0: Bitmap
-   //   1: Grayscale
-   //   2: Indexed color
-   //   3: RGB color
-   //   4: CMYK color
-   //   7: Multichannel
-   //   8: Duotone
-   //   9: Lab color
-   if (stbi__get16be(s) != 3)
-      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
-
-   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
-   stbi__skip(s,stbi__get32be(s) );
-
-   // Skip the image resources.  (resolution, pen tool paths, etc)
-   stbi__skip(s, stbi__get32be(s) );
-
-   // Skip the reserved data.
-   stbi__skip(s, stbi__get32be(s) );
-
-   // Find out if the data is compressed.
-   // Known values:
-   //   0: no compression
-   //   1: RLE compressed
-   compression = stbi__get16be(s);
-   if (compression > 1)
-      return stbi__errpuc("bad compression", "PSD has an unknown compression format");
-
-   // Check size
-   if (!stbi__mad3sizes_valid(4, w, h, 0))
-      return stbi__errpuc("too large", "Corrupt PSD");
-
-   // Create the destination image.
-
-   if (!compression && bitdepth == 16 && bpc == 16) {
-      out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
-      ri->bits_per_channel = 16;
-   } else
-      out = (stbi_uc *) stbi__malloc(4 * w*h);
-
-   if (!out) return stbi__errpuc("outofmem", "Out of memory");
-   pixelCount = w*h;
-
-   // Initialize the data to zero.
-   //memset( out, 0, pixelCount * 4 );
-
-   // Finally, the image data.
-   if (compression) {
-      // RLE as used by .PSD and .TIFF
-      // Loop until you get the number of unpacked bytes you are expecting:
-      //     Read the next source byte into n.
-      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
-      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
-      //     Else if n is 128, noop.
-      // Endloop
-
-      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
-      // which we're going to just skip.
-      stbi__skip(s, h * channelCount * 2 );
-
-      // Read the RLE data by channel.
-      for (channel = 0; channel < 4; channel++) {
-         stbi_uc *p;
-
-         p = out+channel;
-         if (channel >= channelCount) {
-            // Fill this channel with default data.
-            for (i = 0; i < pixelCount; i++, p += 4)
-               *p = (channel == 3 ? 255 : 0);
-         } else {
-            // Read the RLE data.
-            if (!stbi__psd_decode_rle(s, p, pixelCount)) {
-               STBI_FREE(out);
-               return stbi__errpuc("corrupt", "bad RLE data");
-            }
-         }
-      }
-
-   } else {
-      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
-      // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
-
-      // Read the data by channel.
-      for (channel = 0; channel < 4; channel++) {
-         if (channel >= channelCount) {
-            // Fill this channel with default data.
-            if (bitdepth == 16 && bpc == 16) {
-               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
-               stbi__uint16 val = channel == 3 ? 65535 : 0;
-               for (i = 0; i < pixelCount; i++, q += 4)
-                  *q = val;
-            } else {
-               stbi_uc *p = out+channel;
-               stbi_uc val = channel == 3 ? 255 : 0;
-               for (i = 0; i < pixelCount; i++, p += 4)
-                  *p = val;
-            }
-         } else {
-            if (ri->bits_per_channel == 16) {    // output bpc
-               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
-               for (i = 0; i < pixelCount; i++, q += 4)
-                  *q = (stbi__uint16) stbi__get16be(s);
-            } else {
-               stbi_uc *p = out+channel;
-               if (bitdepth == 16) {  // input bpc
-                  for (i = 0; i < pixelCount; i++, p += 4)
-                     *p = (stbi_uc) (stbi__get16be(s) >> 8);
-               } else {
-                  for (i = 0; i < pixelCount; i++, p += 4)
-                     *p = stbi__get8(s);
-               }
-            }
-         }
-      }
-   }
-
-   // remove weird white matte from PSD
-   if (channelCount >= 4) {
-      if (ri->bits_per_channel == 16) {
-         for (i=0; i < w*h; ++i) {
-            stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
-            if (pixel[3] != 0 && pixel[3] != 65535) {
-               float a = pixel[3] / 65535.0f;
-               float ra = 1.0f / a;
-               float inv_a = 65535.0f * (1 - ra);
-               pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
-               pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
-               pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
-            }
-         }
-      } else {
-         for (i=0; i < w*h; ++i) {
-            unsigned char *pixel = out + 4*i;
-            if (pixel[3] != 0 && pixel[3] != 255) {
-               float a = pixel[3] / 255.0f;
-               float ra = 1.0f / a;
-               float inv_a = 255.0f * (1 - ra);
-               pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
-               pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
-               pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
-            }
-         }
-      }
-   }
-
-   // convert to desired output format
-   if (req_comp && req_comp != 4) {
-      if (ri->bits_per_channel == 16)
-         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
-      else
-         out = stbi__convert_format(out, 4, req_comp, w, h);
-      if (out == NULL) return out; // stbi__convert_format frees input on failure
-   }
-
-   if (comp) *comp = 4;
-   *y = h;
-   *x = w;
-
-   return out;
-}
-#endif
-
-// *************************************************************************************************
-// Softimage PIC loader
-// by Tom Seddon
-//
-// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
-// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
-
-#ifndef STBI_NO_PIC
-static int stbi__pic_is4(stbi__context *s,const char *str)
-{
-   int i;
-   for (i=0; i<4; ++i)
-      if (stbi__get8(s) != (stbi_uc)str[i])
-         return 0;
-
-   return 1;
-}
-
-static int stbi__pic_test_core(stbi__context *s)
-{
-   int i;
-
-   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
-      return 0;
-
-   for(i=0;i<84;++i)
-      stbi__get8(s);
-
-   if (!stbi__pic_is4(s,"PICT"))
-      return 0;
-
-   return 1;
-}
-
-typedef struct
-{
-   stbi_uc size,type,channel;
-} stbi__pic_packet;
-
-static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
-{
-   int mask=0x80, i;
-
-   for (i=0; i<4; ++i, mask>>=1) {
-      if (channel & mask) {
-         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
-         dest[i]=stbi__get8(s);
-      }
-   }
-
-   return dest;
-}
-
-static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
-{
-   int mask=0x80,i;
-
-   for (i=0;i<4; ++i, mask>>=1)
-      if (channel&mask)
-         dest[i]=src[i];
-}
-
-static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
-{
-   int act_comp=0,num_packets=0,y,chained;
-   stbi__pic_packet packets[10];
-
-   // this will (should...) cater for even some bizarre stuff like having data
-    // for the same channel in multiple packets.
-   do {
-      stbi__pic_packet *packet;
-
-      if (num_packets==sizeof(packets)/sizeof(packets[0]))
-         return stbi__errpuc("bad format","too many packets");
-
-      packet = &packets[num_packets++];
-
-      chained = stbi__get8(s);
-      packet->size    = stbi__get8(s);
-      packet->type    = stbi__get8(s);
-      packet->channel = stbi__get8(s);
-
-      act_comp |= packet->channel;
-
-      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
-      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
-   } while (chained);
-
-   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
-
-   for(y=0; y<height; ++y) {
-      int packet_idx;
-
-      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
-         stbi__pic_packet *packet = &packets[packet_idx];
-         stbi_uc *dest = result+y*width*4;
-
-         switch (packet->type) {
-            default:
-               return stbi__errpuc("bad format","packet has bad compression type");
-
-            case 0: {//uncompressed
-               int x;
-
-               for(x=0;x<width;++x, dest+=4)
-                  if (!stbi__readval(s,packet->channel,dest))
-                     return 0;
-               break;
-            }
-
-            case 1://Pure RLE
-               {
-                  int left=width, i;
-
-                  while (left>0) {
-                     stbi_uc count,value[4];
-
-                     count=stbi__get8(s);
-                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
-
-                     if (count > left)
-                        count = (stbi_uc) left;
-
-                     if (!stbi__readval(s,packet->channel,value))  return 0;
-
-                     for(i=0; i<count; ++i,dest+=4)
-                        stbi__copyval(packet->channel,dest,value);
-                     left -= count;
-                  }
-               }
-               break;
-
-            case 2: {//Mixed RLE
-               int left=width;
-               while (left>0) {
-                  int count = stbi__get8(s), i;
-                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
-
-                  if (count >= 128) { // Repeated
-                     stbi_uc value[4];
-
-                     if (count==128)
-                        count = stbi__get16be(s);
-                     else
-                        count -= 127;
-                     if (count > left)
-                        return stbi__errpuc("bad file","scanline overrun");
-
-                     if (!stbi__readval(s,packet->channel,value))
-                        return 0;
-
-                     for(i=0;i<count;++i, dest += 4)
-                        stbi__copyval(packet->channel,dest,value);
-                  } else { // Raw
-                     ++count;
-                     if (count>left) return stbi__errpuc("bad file","scanline overrun");
-
-                     for(i=0;i<count;++i, dest+=4)
-                        if (!stbi__readval(s,packet->channel,dest))
-                           return 0;
-                  }
-                  left-=count;
-               }
-               break;
-            }
-         }
-      }
-   }
-
-   return result;
-}
-
-static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
-{
-   stbi_uc *result;
-   int i, x,y, internal_comp;
-   STBI_NOTUSED(ri);
-
-   if (!comp) comp = &internal_comp;
-
-   for (i=0; i<92; ++i)
-      stbi__get8(s);
-
-   x = stbi__get16be(s);
-   y = stbi__get16be(s);
-
-   if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-   if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-
-   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
-   if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
-
-   stbi__get32be(s); //skip `ratio'
-   stbi__get16be(s); //skip `fields'
-   stbi__get16be(s); //skip `pad'
-
-   // intermediate buffer is RGBA
-   result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
-   if (!result) return stbi__errpuc("outofmem", "Out of memory");
-   memset(result, 0xff, x*y*4);
-
-   if (!stbi__pic_load_core(s,x,y,comp, result)) {
-      STBI_FREE(result);
-      result=0;
-   }
-   *px = x;
-   *py = y;
-   if (req_comp == 0) req_comp = *comp;
-   result=stbi__convert_format(result,4,req_comp,x,y);
-
-   return result;
-}
-
-static int stbi__pic_test(stbi__context *s)
-{
-   int r = stbi__pic_test_core(s);
-   stbi__rewind(s);
-   return r;
-}
-#endif
-
-// *************************************************************************************************
-// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
-
-#ifndef STBI_NO_GIF
-typedef struct
-{
-   stbi__int16 prefix;
-   stbi_uc first;
-   stbi_uc suffix;
-} stbi__gif_lzw;
-
-typedef struct
-{
-   int w,h;
-   stbi_uc *out;                 // output buffer (always 4 components)
-   stbi_uc *background;          // The current "background" as far as a gif is concerned
-   stbi_uc *history;
-   int flags, bgindex, ratio, transparent, eflags;
-   stbi_uc  pal[256][4];
-   stbi_uc lpal[256][4];
-   stbi__gif_lzw codes[8192];
-   stbi_uc *color_table;
-   int parse, step;
-   int lflags;
-   int start_x, start_y;
-   int max_x, max_y;
-   int cur_x, cur_y;
-   int line_size;
-   int delay;
-} stbi__gif;
-
-static int stbi__gif_test_raw(stbi__context *s)
-{
-   int sz;
-   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
-   sz = stbi__get8(s);
-   if (sz != '9' && sz != '7') return 0;
-   if (stbi__get8(s) != 'a') return 0;
-   return 1;
-}
-
-static int stbi__gif_test(stbi__context *s)
-{
-   int r = stbi__gif_test_raw(s);
-   stbi__rewind(s);
-   return r;
-}
-
-static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
-{
-   int i;
-   for (i=0; i < num_entries; ++i) {
-      pal[i][2] = stbi__get8(s);
-      pal[i][1] = stbi__get8(s);
-      pal[i][0] = stbi__get8(s);
-      pal[i][3] = transp == i ? 0 : 255;
-   }
-}
-
-static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
-{
-   stbi_uc version;
-   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
-      return stbi__err("not GIF", "Corrupt GIF");
-
-   version = stbi__get8(s);
-   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
-   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
-
-   stbi__g_failure_reason = "";
-   g->w = stbi__get16le(s);
-   g->h = stbi__get16le(s);
-   g->flags = stbi__get8(s);
-   g->bgindex = stbi__get8(s);
-   g->ratio = stbi__get8(s);
-   g->transparent = -1;
-
-   if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
-   if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
-
-   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
-
-   if (is_info) return 1;
-
-   if (g->flags & 0x80)
-      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
-
-   return 1;
-}
-
-static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
-{
-   stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
-   if (!g) return stbi__err("outofmem", "Out of memory");
-   if (!stbi__gif_header(s, g, comp, 1)) {
-      STBI_FREE(g);
-      stbi__rewind( s );
-      return 0;
-   }
-   if (x) *x = g->w;
-   if (y) *y = g->h;
-   STBI_FREE(g);
-   return 1;
-}
-
-static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
-{
-   stbi_uc *p, *c;
-   int idx;
-
-   // recurse to decode the prefixes, since the linked-list is backwards,
-   // and working backwards through an interleaved image would be nasty
-   if (g->codes[code].prefix >= 0)
-      stbi__out_gif_code(g, g->codes[code].prefix);
-
-   if (g->cur_y >= g->max_y) return;
-
-   idx = g->cur_x + g->cur_y;
-   p = &g->out[idx];
-   g->history[idx / 4] = 1;
-
-   c = &g->color_table[g->codes[code].suffix * 4];
-   if (c[3] > 128) { // don't render transparent pixels;
-      p[0] = c[2];
-      p[1] = c[1];
-      p[2] = c[0];
-      p[3] = c[3];
-   }
-   g->cur_x += 4;
-
-   if (g->cur_x >= g->max_x) {
-      g->cur_x = g->start_x;
-      g->cur_y += g->step;
-
-      while (g->cur_y >= g->max_y && g->parse > 0) {
-         g->step = (1 << g->parse) * g->line_size;
-         g->cur_y = g->start_y + (g->step >> 1);
-         --g->parse;
-      }
-   }
-}
-
-static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
-{
-   stbi_uc lzw_cs;
-   stbi__int32 len, init_code;
-   stbi__uint32 first;
-   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
-   stbi__gif_lzw *p;
-
-   lzw_cs = stbi__get8(s);
-   if (lzw_cs > 12) return NULL;
-   clear = 1 << lzw_cs;
-   first = 1;
-   codesize = lzw_cs + 1;
-   codemask = (1 << codesize) - 1;
-   bits = 0;
-   valid_bits = 0;
-   for (init_code = 0; init_code < clear; init_code++) {
-      g->codes[init_code].prefix = -1;
-      g->codes[init_code].first = (stbi_uc) init_code;
-      g->codes[init_code].suffix = (stbi_uc) init_code;
-   }
-
-   // support no starting clear code
-   avail = clear+2;
-   oldcode = -1;
-
-   len = 0;
-   for(;;) {
-      if (valid_bits < codesize) {
-         if (len == 0) {
-            len = stbi__get8(s); // start new block
-            if (len == 0)
-               return g->out;
-         }
-         --len;
-         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
-         valid_bits += 8;
-      } else {
-         stbi__int32 code = bits & codemask;
-         bits >>= codesize;
-         valid_bits -= codesize;
-         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
-         if (code == clear) {  // clear code
-            codesize = lzw_cs + 1;
-            codemask = (1 << codesize) - 1;
-            avail = clear + 2;
-            oldcode = -1;
-            first = 0;
-         } else if (code == clear + 1) { // end of stream code
-            stbi__skip(s, len);
-            while ((len = stbi__get8(s)) > 0)
-               stbi__skip(s,len);
-            return g->out;
-         } else if (code <= avail) {
-            if (first) {
-               return stbi__errpuc("no clear code", "Corrupt GIF");
-            }
-
-            if (oldcode >= 0) {
-               p = &g->codes[avail++];
-               if (avail > 8192) {
-                  return stbi__errpuc("too many codes", "Corrupt GIF");
-               }
-
-               p->prefix = (stbi__int16) oldcode;
-               p->first = g->codes[oldcode].first;
-               p->suffix = (code == avail) ? p->first : g->codes[code].first;
-            } else if (code == avail)
-               return stbi__errpuc("illegal code in raster", "Corrupt GIF");
-
-            stbi__out_gif_code(g, (stbi__uint16) code);
-
-            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
-               codesize++;
-               codemask = (1 << codesize) - 1;
-            }
-
-            oldcode = code;
-         } else {
-            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
-         }
-      }
-   }
-}
-
-// this function is designed to support animated gifs, although stb_image doesn't support it
-// two back is the image from two frames ago, used for a very specific disposal format
-static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
-{
-   int dispose;
-   int first_frame;
-   int pi;
-   int pcount;
-   STBI_NOTUSED(req_comp);
-
-   // on first frame, any non-written pixels get the background colour (non-transparent)
-   first_frame = 0;
-   if (g->out == 0) {
-      if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
-      if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
-         return stbi__errpuc("too large", "GIF image is too large");
-      pcount = g->w * g->h;
-      g->out = (stbi_uc *) stbi__malloc(4 * pcount);
-      g->background = (stbi_uc *) stbi__malloc(4 * pcount);
-      g->history = (stbi_uc *) stbi__malloc(pcount);
-      if (!g->out || !g->background || !g->history)
-         return stbi__errpuc("outofmem", "Out of memory");
-
-      // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
-      // background colour is only used for pixels that are not rendered first frame, after that "background"
-      // color refers to the color that was there the previous frame.
-      memset(g->out, 0x00, 4 * pcount);
-      memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
-      memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
-      first_frame = 1;
-   } else {
-      // second frame - how do we dispose of the previous one?
-      dispose = (g->eflags & 0x1C) >> 2;
-      pcount = g->w * g->h;
-
-      if ((dispose == 3) && (two_back == 0)) {
-         dispose = 2; // if I don't have an image to revert back to, default to the old background
-      }
-
-      if (dispose == 3) { // use previous graphic
-         for (pi = 0; pi < pcount; ++pi) {
-            if (g->history[pi]) {
-               memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
-            }
-         }
-      } else if (dispose == 2) {
-         // restore what was changed last frame to background before that frame;
-         for (pi = 0; pi < pcount; ++pi) {
-            if (g->history[pi]) {
-               memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
-            }
-         }
-      } else {
-         // This is a non-disposal case either way, so just
-         // leave the pixels as is, and they will become the new background
-         // 1: do not dispose
-         // 0:  not specified.
-      }
-
-      // background is what out is after the undoing of the previous frame;
-      memcpy( g->background, g->out, 4 * g->w * g->h );
-   }
-
-   // clear my history;
-   memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
-
-   for (;;) {
-      int tag = stbi__get8(s);
-      switch (tag) {
-         case 0x2C: /* Image Descriptor */
-         {
-            stbi__int32 x, y, w, h;
-            stbi_uc *o;
-
-            x = stbi__get16le(s);
-            y = stbi__get16le(s);
-            w = stbi__get16le(s);
-            h = stbi__get16le(s);
-            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
-               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
-
-            g->line_size = g->w * 4;
-            g->start_x = x * 4;
-            g->start_y = y * g->line_size;
-            g->max_x   = g->start_x + w * 4;
-            g->max_y   = g->start_y + h * g->line_size;
-            g->cur_x   = g->start_x;
-            g->cur_y   = g->start_y;
-
-            // if the width of the specified rectangle is 0, that means
-            // we may not see *any* pixels or the image is malformed;
-            // to make sure this is caught, move the current y down to
-            // max_y (which is what out_gif_code checks).
-            if (w == 0)
-               g->cur_y = g->max_y;
-
-            g->lflags = stbi__get8(s);
-
-            if (g->lflags & 0x40) {
-               g->step = 8 * g->line_size; // first interlaced spacing
-               g->parse = 3;
-            } else {
-               g->step = g->line_size;
-               g->parse = 0;
-            }
-
-            if (g->lflags & 0x80) {
-               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
-               g->color_table = (stbi_uc *) g->lpal;
-            } else if (g->flags & 0x80) {
-               g->color_table = (stbi_uc *) g->pal;
-            } else
-               return stbi__errpuc("missing color table", "Corrupt GIF");
-
-            o = stbi__process_gif_raster(s, g);
-            if (!o) return NULL;
-
-            // if this was the first frame,
-            pcount = g->w * g->h;
-            if (first_frame && (g->bgindex > 0)) {
-               // if first frame, any pixel not drawn to gets the background color
-               for (pi = 0; pi < pcount; ++pi) {
-                  if (g->history[pi] == 0) {
-                     g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
-                     memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
-                  }
-               }
-            }
-
-            return o;
-         }
-
-         case 0x21: // Comment Extension.
-         {
-            int len;
-            int ext = stbi__get8(s);
-            if (ext == 0xF9) { // Graphic Control Extension.
-               len = stbi__get8(s);
-               if (len == 4) {
-                  g->eflags = stbi__get8(s);
-                  g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
-
-                  // unset old transparent
-                  if (g->transparent >= 0) {
-                     g->pal[g->transparent][3] = 255;
-                  }
-                  if (g->eflags & 0x01) {
-                     g->transparent = stbi__get8(s);
-                     if (g->transparent >= 0) {
-                        g->pal[g->transparent][3] = 0;
-                     }
-                  } else {
-                     // don't need transparent
-                     stbi__skip(s, 1);
-                     g->transparent = -1;
-                  }
-               } else {
-                  stbi__skip(s, len);
-                  break;
-               }
-            }
-            while ((len = stbi__get8(s)) != 0) {
-               stbi__skip(s, len);
-            }
-            break;
-         }
-
-         case 0x3B: // gif stream termination code
-            return (stbi_uc *) s; // using '1' causes warning on some compilers
-
-         default:
-            return stbi__errpuc("unknown code", "Corrupt GIF");
-      }
-   }
-}
-
-static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
-{
-   STBI_FREE(g->out);
-   STBI_FREE(g->history);
-   STBI_FREE(g->background);
-
-   if (out) STBI_FREE(out);
-   if (delays && *delays) STBI_FREE(*delays);
-   return stbi__errpuc("outofmem", "Out of memory");
-}
-
-static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
-{
-   if (stbi__gif_test(s)) {
-      int layers = 0;
-      stbi_uc *u = 0;
-      stbi_uc *out = 0;
-      stbi_uc *two_back = 0;
-      stbi__gif g;
-      int stride;
-      int out_size = 0;
-      int delays_size = 0;
-
-      STBI_NOTUSED(out_size);
-      STBI_NOTUSED(delays_size);
-
-      memset(&g, 0, sizeof(g));
-      if (delays) {
-         *delays = 0;
-      }
-
-      do {
-         u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
-         if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
-
-         if (u) {
-            *x = g.w;
-            *y = g.h;
-            ++layers;
-            stride = g.w * g.h * 4;
-
-            if (out) {
-               void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
-               if (!tmp)
-                  return stbi__load_gif_main_outofmem(&g, out, delays);
-               else {
-                   out = (stbi_uc*) tmp;
-                   out_size = layers * stride;
-               }
-
-               if (delays) {
-                  int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
-                  if (!new_delays)
-                     return stbi__load_gif_main_outofmem(&g, out, delays);
-                  *delays = new_delays;
-                  delays_size = layers * sizeof(int);
-               }
-            } else {
-               out = (stbi_uc*)stbi__malloc( layers * stride );
-               if (!out)
-                  return stbi__load_gif_main_outofmem(&g, out, delays);
-               out_size = layers * stride;
-               if (delays) {
-                  *delays = (int*) stbi__malloc( layers * sizeof(int) );
-                  if (!*delays)
-                     return stbi__load_gif_main_outofmem(&g, out, delays);
-                  delays_size = layers * sizeof(int);
-               }
-            }
-            memcpy( out + ((layers - 1) * stride), u, stride );
-            if (layers >= 2) {
-               two_back = out - 2 * stride;
-            }
-
-            if (delays) {
-               (*delays)[layers - 1U] = g.delay;
-            }
-         }
-      } while (u != 0);
-
-      // free temp buffer;
-      STBI_FREE(g.out);
-      STBI_FREE(g.history);
-      STBI_FREE(g.background);
-
-      // do the final conversion after loading everything;
-      if (req_comp && req_comp != 4)
-         out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
-
-      *z = layers;
-      return out;
-   } else {
-      return stbi__errpuc("not GIF", "Image was not as a gif type.");
-   }
-}
-
-static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   stbi_uc *u = 0;
-   stbi__gif g;
-   memset(&g, 0, sizeof(g));
-   STBI_NOTUSED(ri);
-
-   u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
-   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
-   if (u) {
-      *x = g.w;
-      *y = g.h;
-
-      // moved conversion to after successful load so that the same
-      // can be done for multiple frames.
-      if (req_comp && req_comp != 4)
-         u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
-   } else if (g.out) {
-      // if there was an error and we allocated an image buffer, free it!
-      STBI_FREE(g.out);
-   }
-
-   // free buffers needed for multiple frame loading;
-   STBI_FREE(g.history);
-   STBI_FREE(g.background);
-
-   return u;
-}
-
-static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   return stbi__gif_info_raw(s,x,y,comp);
-}
-#endif
-
-// *************************************************************************************************
-// Radiance RGBE HDR loader
-// originally by Nicolas Schulz
-#ifndef STBI_NO_HDR
-static int stbi__hdr_test_core(stbi__context *s, const char *signature)
-{
-   int i;
-   for (i=0; signature[i]; ++i)
-      if (stbi__get8(s) != signature[i])
-          return 0;
-   stbi__rewind(s);
-   return 1;
-}
-
-static int stbi__hdr_test(stbi__context* s)
-{
-   int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
-   stbi__rewind(s);
-   if(!r) {
-       r = stbi__hdr_test_core(s, "#?RGBE\n");
-       stbi__rewind(s);
-   }
-   return r;
-}
-
-#define STBI__HDR_BUFLEN  1024
-static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
-{
-   int len=0;
-   char c = '\0';
-
-   c = (char) stbi__get8(z);
-
-   while (!stbi__at_eof(z) && c != '\n') {
-      buffer[len++] = c;
-      if (len == STBI__HDR_BUFLEN-1) {
-         // flush to end of line
-         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
-            ;
-         break;
-      }
-      c = (char) stbi__get8(z);
-   }
-
-   buffer[len] = 0;
-   return buffer;
-}
-
-static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
-{
-   if ( input[3] != 0 ) {
-      float f1;
-      // Exponent
-      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
-      if (req_comp <= 2)
-         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
-      else {
-         output[0] = input[0] * f1;
-         output[1] = input[1] * f1;
-         output[2] = input[2] * f1;
-      }
-      if (req_comp == 2) output[1] = 1;
-      if (req_comp == 4) output[3] = 1;
-   } else {
-      switch (req_comp) {
-         case 4: output[3] = 1; /* fallthrough */
-         case 3: output[0] = output[1] = output[2] = 0;
-                 break;
-         case 2: output[1] = 1; /* fallthrough */
-         case 1: output[0] = 0;
-                 break;
-      }
-   }
-}
-
-static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   char buffer[STBI__HDR_BUFLEN];
-   char *token;
-   int valid = 0;
-   int width, height;
-   stbi_uc *scanline;
-   float *hdr_data;
-   int len;
-   unsigned char count, value;
-   int i, j, k, c1,c2, z;
-   const char *headerToken;
-   STBI_NOTUSED(ri);
-
-   // Check identifier
-   headerToken = stbi__hdr_gettoken(s,buffer);
-   if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
-      return stbi__errpf("not HDR", "Corrupt HDR image");
-
-   // Parse header
-   for(;;) {
-      token = stbi__hdr_gettoken(s,buffer);
-      if (token[0] == 0) break;
-      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
-   }
-
-   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
-
-   // Parse width and height
-   // can't use sscanf() if we're not using stdio!
-   token = stbi__hdr_gettoken(s,buffer);
-   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
-   token += 3;
-   height = (int) strtol(token, &token, 10);
-   while (*token == ' ') ++token;
-   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
-   token += 3;
-   width = (int) strtol(token, NULL, 10);
-
-   if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
-   if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
-
-   *x = width;
-   *y = height;
-
-   if (comp) *comp = 3;
-   if (req_comp == 0) req_comp = 3;
-
-   if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
-      return stbi__errpf("too large", "HDR image is too large");
-
-   // Read data
-   hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
-   if (!hdr_data)
-      return stbi__errpf("outofmem", "Out of memory");
-
-   // Load image data
-   // image data is stored as some number of sca
-   if ( width < 8 || width >= 32768) {
-      // Read flat data
-      for (j=0; j < height; ++j) {
-         for (i=0; i < width; ++i) {
-            stbi_uc rgbe[4];
-           main_decode_loop:
-            stbi__getn(s, rgbe, 4);
-            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
-         }
-      }
-   } else {
-      // Read RLE-encoded data
-      scanline = NULL;
-
-      for (j = 0; j < height; ++j) {
-         c1 = stbi__get8(s);
-         c2 = stbi__get8(s);
-         len = stbi__get8(s);
-         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
-            // not run-length encoded, so we have to actually use THIS data as a decoded
-            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
-            stbi_uc rgbe[4];
-            rgbe[0] = (stbi_uc) c1;
-            rgbe[1] = (stbi_uc) c2;
-            rgbe[2] = (stbi_uc) len;
-            rgbe[3] = (stbi_uc) stbi__get8(s);
-            stbi__hdr_convert(hdr_data, rgbe, req_comp);
-            i = 1;
-            j = 0;
-            STBI_FREE(scanline);
-            goto main_decode_loop; // yes, this makes no sense
-         }
-         len <<= 8;
-         len |= stbi__get8(s);
-         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
-         if (scanline == NULL) {
-            scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
-            if (!scanline) {
-               STBI_FREE(hdr_data);
-               return stbi__errpf("outofmem", "Out of memory");
-            }
-         }
-
-         for (k = 0; k < 4; ++k) {
-            int nleft;
-            i = 0;
-            while ((nleft = width - i) > 0) {
-               count = stbi__get8(s);
-               if (count > 128) {
-                  // Run
-                  value = stbi__get8(s);
-                  count -= 128;
-                  if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
-                  for (z = 0; z < count; ++z)
-                     scanline[i++ * 4 + k] = value;
-               } else {
-                  // Dump
-                  if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
-                  for (z = 0; z < count; ++z)
-                     scanline[i++ * 4 + k] = stbi__get8(s);
-               }
-            }
-         }
-         for (i=0; i < width; ++i)
-            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
-      }
-      if (scanline)
-         STBI_FREE(scanline);
-   }
-
-   return hdr_data;
-}
-
-static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   char buffer[STBI__HDR_BUFLEN];
-   char *token;
-   int valid = 0;
-   int dummy;
-
-   if (!x) x = &dummy;
-   if (!y) y = &dummy;
-   if (!comp) comp = &dummy;
-
-   if (stbi__hdr_test(s) == 0) {
-       stbi__rewind( s );
-       return 0;
-   }
-
-   for(;;) {
-      token = stbi__hdr_gettoken(s,buffer);
-      if (token[0] == 0) break;
-      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
-   }
-
-   if (!valid) {
-       stbi__rewind( s );
-       return 0;
-   }
-   token = stbi__hdr_gettoken(s,buffer);
-   if (strncmp(token, "-Y ", 3)) {
-       stbi__rewind( s );
-       return 0;
-   }
-   token += 3;
-   *y = (int) strtol(token, &token, 10);
-   while (*token == ' ') ++token;
-   if (strncmp(token, "+X ", 3)) {
-       stbi__rewind( s );
-       return 0;
-   }
-   token += 3;
-   *x = (int) strtol(token, NULL, 10);
-   *comp = 3;
-   return 1;
-}
-#endif // STBI_NO_HDR
-
-#ifndef STBI_NO_BMP
-static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   void *p;
-   stbi__bmp_data info;
-
-   info.all_a = 255;
-   p = stbi__bmp_parse_header(s, &info);
-   if (p == NULL) {
-      stbi__rewind( s );
-      return 0;
-   }
-   if (x) *x = s->img_x;
-   if (y) *y = s->img_y;
-   if (comp) {
-      if (info.bpp == 24 && info.ma == 0xff000000)
-         *comp = 3;
-      else
-         *comp = info.ma ? 4 : 3;
-   }
-   return 1;
-}
-#endif
-
-#ifndef STBI_NO_PSD
-static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   int channelCount, dummy, depth;
-   if (!x) x = &dummy;
-   if (!y) y = &dummy;
-   if (!comp) comp = &dummy;
-   if (stbi__get32be(s) != 0x38425053) {
-       stbi__rewind( s );
-       return 0;
-   }
-   if (stbi__get16be(s) != 1) {
-       stbi__rewind( s );
-       return 0;
-   }
-   stbi__skip(s, 6);
-   channelCount = stbi__get16be(s);
-   if (channelCount < 0 || channelCount > 16) {
-       stbi__rewind( s );
-       return 0;
-   }
-   *y = stbi__get32be(s);
-   *x = stbi__get32be(s);
-   depth = stbi__get16be(s);
-   if (depth != 8 && depth != 16) {
-       stbi__rewind( s );
-       return 0;
-   }
-   if (stbi__get16be(s) != 3) {
-       stbi__rewind( s );
-       return 0;
-   }
-   *comp = 4;
-   return 1;
-}
-
-static int stbi__psd_is16(stbi__context *s)
-{
-   int channelCount, depth;
-   if (stbi__get32be(s) != 0x38425053) {
-       stbi__rewind( s );
-       return 0;
-   }
-   if (stbi__get16be(s) != 1) {
-       stbi__rewind( s );
-       return 0;
-   }
-   stbi__skip(s, 6);
-   channelCount = stbi__get16be(s);
-   if (channelCount < 0 || channelCount > 16) {
-       stbi__rewind( s );
-       return 0;
-   }
-   STBI_NOTUSED(stbi__get32be(s));
-   STBI_NOTUSED(stbi__get32be(s));
-   depth = stbi__get16be(s);
-   if (depth != 16) {
-       stbi__rewind( s );
-       return 0;
-   }
-   return 1;
-}
-#endif
-
-#ifndef STBI_NO_PIC
-static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   int act_comp=0,num_packets=0,chained,dummy;
-   stbi__pic_packet packets[10];
-
-   if (!x) x = &dummy;
-   if (!y) y = &dummy;
-   if (!comp) comp = &dummy;
-
-   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
-      stbi__rewind(s);
-      return 0;
-   }
-
-   stbi__skip(s, 88);
-
-   *x = stbi__get16be(s);
-   *y = stbi__get16be(s);
-   if (stbi__at_eof(s)) {
-      stbi__rewind( s);
-      return 0;
-   }
-   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
-      stbi__rewind( s );
-      return 0;
-   }
-
-   stbi__skip(s, 8);
-
-   do {
-      stbi__pic_packet *packet;
-
-      if (num_packets==sizeof(packets)/sizeof(packets[0]))
-         return 0;
-
-      packet = &packets[num_packets++];
-      chained = stbi__get8(s);
-      packet->size    = stbi__get8(s);
-      packet->type    = stbi__get8(s);
-      packet->channel = stbi__get8(s);
-      act_comp |= packet->channel;
-
-      if (stbi__at_eof(s)) {
-          stbi__rewind( s );
-          return 0;
-      }
-      if (packet->size != 8) {
-          stbi__rewind( s );
-          return 0;
-      }
-   } while (chained);
-
-   *comp = (act_comp & 0x10 ? 4 : 3);
-
-   return 1;
-}
-#endif
-
-// *************************************************************************************************
-// Portable Gray Map and Portable Pixel Map loader
-// by Ken Miller
-//
-// PGM: http://netpbm.sourceforge.net/doc/pgm.html
-// PPM: http://netpbm.sourceforge.net/doc/ppm.html
-//
-// Known limitations:
-//    Does not support comments in the header section
-//    Does not support ASCII image data (formats P2 and P3)
-
-#ifndef STBI_NO_PNM
-
-static int      stbi__pnm_test(stbi__context *s)
-{
-   char p, t;
-   p = (char) stbi__get8(s);
-   t = (char) stbi__get8(s);
-   if (p != 'P' || (t != '5' && t != '6')) {
-       stbi__rewind( s );
-       return 0;
-   }
-   return 1;
-}
-
-static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
-{
-   stbi_uc *out;
-   STBI_NOTUSED(ri);
-
-   ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
-   if (ri->bits_per_channel == 0)
-      return 0;
-
-   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
-
-   *x = s->img_x;
-   *y = s->img_y;
-   if (comp) *comp = s->img_n;
-
-   if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
-      return stbi__errpuc("too large", "PNM too large");
-
-   out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
-   if (!out) return stbi__errpuc("outofmem", "Out of memory");
-   if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) {
-      STBI_FREE(out);
-      return stbi__errpuc("bad PNM", "PNM file truncated");
-   }
-
-   if (req_comp && req_comp != s->img_n) {
-      if (ri->bits_per_channel == 16) {
-         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y);
-      } else {
-         out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
-      }
-      if (out == NULL) return out; // stbi__convert_format frees input on failure
-   }
-   return out;
-}
-
-static int      stbi__pnm_isspace(char c)
-{
-   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
-}
-
-static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
-{
-   for (;;) {
-      while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
-         *c = (char) stbi__get8(s);
-
-      if (stbi__at_eof(s) || *c != '#')
-         break;
-
-      while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
-         *c = (char) stbi__get8(s);
-   }
-}
-
-static int      stbi__pnm_isdigit(char c)
-{
-   return c >= '0' && c <= '9';
-}
-
-static int      stbi__pnm_getinteger(stbi__context *s, char *c)
-{
-   int value = 0;
-
-   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
-      value = value*10 + (*c - '0');
-      *c = (char) stbi__get8(s);
-      if((value > 214748364) || (value == 214748364 && *c > '7'))
-          return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int");
-   }
-
-   return value;
-}
-
-static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
-{
-   int maxv, dummy;
-   char c, p, t;
-
-   if (!x) x = &dummy;
-   if (!y) y = &dummy;
-   if (!comp) comp = &dummy;
-
-   stbi__rewind(s);
-
-   // Get identifier
-   p = (char) stbi__get8(s);
-   t = (char) stbi__get8(s);
-   if (p != 'P' || (t != '5' && t != '6')) {
-       stbi__rewind(s);
-       return 0;
-   }
-
-   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
-
-   c = (char) stbi__get8(s);
-   stbi__pnm_skip_whitespace(s, &c);
-
-   *x = stbi__pnm_getinteger(s, &c); // read width
-   if(*x == 0)
-       return stbi__err("invalid width", "PPM image header had zero or overflowing width");
-   stbi__pnm_skip_whitespace(s, &c);
-
-   *y = stbi__pnm_getinteger(s, &c); // read height
-   if (*y == 0)
-       return stbi__err("invalid width", "PPM image header had zero or overflowing width");
-   stbi__pnm_skip_whitespace(s, &c);
-
-   maxv = stbi__pnm_getinteger(s, &c);  // read max value
-   if (maxv > 65535)
-      return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
-   else if (maxv > 255)
-      return 16;
-   else
-      return 8;
-}
-
-static int stbi__pnm_is16(stbi__context *s)
-{
-   if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
-	   return 1;
-   return 0;
-}
-#endif
-
-static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
-{
-   #ifndef STBI_NO_JPEG
-   if (stbi__jpeg_info(s, x, y, comp)) return 1;
-   #endif
-
-   #ifndef STBI_NO_PNG
-   if (stbi__png_info(s, x, y, comp))  return 1;
-   #endif
-
-   #ifndef STBI_NO_GIF
-   if (stbi__gif_info(s, x, y, comp))  return 1;
-   #endif
-
-   #ifndef STBI_NO_BMP
-   if (stbi__bmp_info(s, x, y, comp))  return 1;
-   #endif
-
-   #ifndef STBI_NO_PSD
-   if (stbi__psd_info(s, x, y, comp))  return 1;
-   #endif
-
-   #ifndef STBI_NO_PIC
-   if (stbi__pic_info(s, x, y, comp))  return 1;
-   #endif
-
-   #ifndef STBI_NO_PNM
-   if (stbi__pnm_info(s, x, y, comp))  return 1;
-   #endif
-
-   #ifndef STBI_NO_HDR
-   if (stbi__hdr_info(s, x, y, comp))  return 1;
-   #endif
-
-   // test tga last because it's a crappy test!
-   #ifndef STBI_NO_TGA
-   if (stbi__tga_info(s, x, y, comp))
-       return 1;
-   #endif
-   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
-}
-
-static int stbi__is_16_main(stbi__context *s)
-{
-   #ifndef STBI_NO_PNG
-   if (stbi__png_is16(s))  return 1;
-   #endif
-
-   #ifndef STBI_NO_PSD
-   if (stbi__psd_is16(s))  return 1;
-   #endif
-
-   #ifndef STBI_NO_PNM
-   if (stbi__pnm_is16(s))  return 1;
-   #endif
-   return 0;
-}
-
-#ifndef STBI_NO_STDIO
-STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
-{
-    FILE *f = stbi__fopen(filename, "rb");
-    int result;
-    if (!f) return stbi__err("can't fopen", "Unable to open file");
-    result = stbi_info_from_file(f, x, y, comp);
-    fclose(f);
-    return result;
-}
-
-STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
-{
-   int r;
-   stbi__context s;
-   long pos = ftell(f);
-   stbi__start_file(&s, f);
-   r = stbi__info_main(&s,x,y,comp);
-   fseek(f,pos,SEEK_SET);
-   return r;
-}
-
-STBIDEF int stbi_is_16_bit(char const *filename)
-{
-    FILE *f = stbi__fopen(filename, "rb");
-    int result;
-    if (!f) return stbi__err("can't fopen", "Unable to open file");
-    result = stbi_is_16_bit_from_file(f);
-    fclose(f);
-    return result;
-}
-
-STBIDEF int stbi_is_16_bit_from_file(FILE *f)
-{
-   int r;
-   stbi__context s;
-   long pos = ftell(f);
-   stbi__start_file(&s, f);
-   r = stbi__is_16_main(&s);
-   fseek(f,pos,SEEK_SET);
-   return r;
-}
-#endif // !STBI_NO_STDIO
-
-STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
-{
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-   return stbi__info_main(&s,x,y,comp);
-}
-
-STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
-{
-   stbi__context s;
-   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
-   return stbi__info_main(&s,x,y,comp);
-}
-
-STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
-{
-   stbi__context s;
-   stbi__start_mem(&s,buffer,len);
-   return stbi__is_16_main(&s);
-}
-
-STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
-{
-   stbi__context s;
-   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
-   return stbi__is_16_main(&s);
-}
-
-#endif // STB_IMAGE_IMPLEMENTATION
-
-/*
-   revision history:
-      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
-      2.19  (2018-02-11) fix warning
-      2.18  (2018-01-30) fix warnings
-      2.17  (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug
-                         1-bit BMP
-                         *_is_16_bit api
-                         avoid warnings
-      2.16  (2017-07-23) all functions have 16-bit variants;
-                         STBI_NO_STDIO works again;
-                         compilation fixes;
-                         fix rounding in unpremultiply;
-                         optimize vertical flip;
-                         disable raw_len validation;
-                         documentation fixes
-      2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
-                         warning fixes; disable run-time SSE detection on gcc;
-                         uniform handling of optional "return" values;
-                         thread-safe initialization of zlib tables
-      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
-      2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
-      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
-      2.11  (2016-04-02) allocate large structures on the stack
-                         remove white matting for transparent PSD
-                         fix reported channel count for PNG & BMP
-                         re-enable SSE2 in non-gcc 64-bit
-                         support RGB-formatted JPEG
-                         read 16-bit PNGs (only as 8-bit)
-      2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
-      2.09  (2016-01-16) allow comments in PNM files
-                         16-bit-per-pixel TGA (not bit-per-component)
-                         info() for TGA could break due to .hdr handling
-                         info() for BMP to shares code instead of sloppy parse
-                         can use STBI_REALLOC_SIZED if allocator doesn't support realloc
-                         code cleanup
-      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
-      2.07  (2015-09-13) fix compiler warnings
-                         partial animated GIF support
-                         limited 16-bpc PSD support
-                         #ifdef unused functions
-                         bug with < 92 byte PIC,PNM,HDR,TGA
-      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
-      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
-      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
-      2.03  (2015-04-12) extra corruption checking (mmozeiko)
-                         stbi_set_flip_vertically_on_load (nguillemot)
-                         fix NEON support; fix mingw support
-      2.02  (2015-01-19) fix incorrect assert, fix warning
-      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
-      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
-      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
-                         progressive JPEG (stb)
-                         PGM/PPM support (Ken Miller)
-                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
-                         GIF bugfix -- seemingly never worked
-                         STBI_NO_*, STBI_ONLY_*
-      1.48  (2014-12-14) fix incorrectly-named assert()
-      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
-                         optimize PNG (ryg)
-                         fix bug in interlaced PNG with user-specified channel count (stb)
-      1.46  (2014-08-26)
-              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
-      1.45  (2014-08-16)
-              fix MSVC-ARM internal compiler error by wrapping malloc
-      1.44  (2014-08-07)
-              various warning fixes from Ronny Chevalier
-      1.43  (2014-07-15)
-              fix MSVC-only compiler problem in code changed in 1.42
-      1.42  (2014-07-09)
-              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
-              fixes to stbi__cleanup_jpeg path
-              added STBI_ASSERT to avoid requiring assert.h
-      1.41  (2014-06-25)
-              fix search&replace from 1.36 that messed up comments/error messages
-      1.40  (2014-06-22)
-              fix gcc struct-initialization warning
-      1.39  (2014-06-15)
-              fix to TGA optimization when req_comp != number of components in TGA;
-              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
-              add support for BMP version 5 (more ignored fields)
-      1.38  (2014-06-06)
-              suppress MSVC warnings on integer casts truncating values
-              fix accidental rename of 'skip' field of I/O
-      1.37  (2014-06-04)
-              remove duplicate typedef
-      1.36  (2014-06-03)
-              convert to header file single-file library
-              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
-      1.35  (2014-05-27)
-              various warnings
-              fix broken STBI_SIMD path
-              fix bug where stbi_load_from_file no longer left file pointer in correct place
-              fix broken non-easy path for 32-bit BMP (possibly never used)
-              TGA optimization by Arseny Kapoulkine
-      1.34  (unknown)
-              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
-      1.33  (2011-07-14)
-              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
-      1.32  (2011-07-13)
-              support for "info" function for all supported filetypes (SpartanJ)
-      1.31  (2011-06-20)
-              a few more leak fixes, bug in PNG handling (SpartanJ)
-      1.30  (2011-06-11)
-              added ability to load files via callbacks to accomidate custom input streams (Ben Wenger)
-              removed deprecated format-specific test/load functions
-              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
-              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
-              fix inefficiency in decoding 32-bit BMP (David Woo)
-      1.29  (2010-08-16)
-              various warning fixes from Aurelien Pocheville
-      1.28  (2010-08-01)
-              fix bug in GIF palette transparency (SpartanJ)
-      1.27  (2010-08-01)
-              cast-to-stbi_uc to fix warnings
-      1.26  (2010-07-24)
-              fix bug in file buffering for PNG reported by SpartanJ
-      1.25  (2010-07-17)
-              refix trans_data warning (Won Chun)
-      1.24  (2010-07-12)
-              perf improvements reading from files on platforms with lock-heavy fgetc()
-              minor perf improvements for jpeg
-              deprecated type-specific functions so we'll get feedback if they're needed
-              attempt to fix trans_data warning (Won Chun)
-      1.23    fixed bug in iPhone support
-      1.22  (2010-07-10)
-              removed image *writing* support
-              stbi_info support from Jetro Lauha
-              GIF support from Jean-Marc Lienher
-              iPhone PNG-extensions from James Brown
-              warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. Janez (U+017D)emva)
-      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
-      1.20    added support for Softimage PIC, by Tom Seddon
-      1.19    bug in interlaced PNG corruption check (found by ryg)
-      1.18  (2008-08-02)
-              fix a threading bug (local mutable static)
-      1.17    support interlaced PNG
-      1.16    major bugfix - stbi__convert_format converted one too many pixels
-      1.15    initialize some fields for thread safety
-      1.14    fix threadsafe conversion bug
-              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
-      1.13    threadsafe
-      1.12    const qualifiers in the API
-      1.11    Support installable IDCT, colorspace conversion routines
-      1.10    Fixes for 64-bit (don't use "unsigned long")
-              optimized upsampling by Fabian "ryg" Giesen
-      1.09    Fix format-conversion for PSD code (bad global variables!)
-      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
-      1.07    attempt to fix C++ warning/errors again
-      1.06    attempt to fix C++ warning/errors again
-      1.05    fix TGA loading to return correct *comp and use good luminance calc
-      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
-      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
-      1.02    support for (subset of) HDR files, float interface for preferred access to them
-      1.01    fix bug: possible bug in handling right-side up bmps... not sure
-              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
-      1.00    interface to zlib that skips zlib header
-      0.99    correct handling of alpha in palette
-      0.98    TGA loader by lonesock; dynamically add loaders (untested)
-      0.97    jpeg errors on too large a file; also catch another malloc failure
-      0.96    fix detection of invalid v value - particleman@mollyrocket forum
-      0.95    during header scan, seek to markers in case of padding
-      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
-      0.93    handle jpegtran output; verbose errors
-      0.92    read 4,8,16,24,32-bit BMP files of several formats
-      0.91    output 24-bit Windows 3.0 BMP files
-      0.90    fix a few more warnings; bump version number to approach 1.0
-      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
-      0.60    fix compiling as c++
-      0.59    fix warnings: merge Dave Moore's -Wall fixes
-      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
-      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
-      0.56    fix bug: zlib uncompressed mode len vs. nlen
-      0.55    fix bug: restart_interval not initialized to 0
-      0.54    allow NULL for 'int *comp'
-      0.53    fix bug in png 3->4; speedup png decoding
-      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
-      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
-              on 'test' only check type, not whether we support this variant
-      0.50  (2006-11-19)
-              first released version
-*/
-
-
-/*
-------------------------------------------------------------------------------
-This software is available under 2 licenses -- choose whichever you prefer.
-------------------------------------------------------------------------------
-ALTERNATIVE A - MIT License
-Copyright (c) 2017 Sean Barrett
-Permission is hereby granted, free of charge, to any person obtaining a copy of
-this software and associated documentation files (the "Software"), to deal in
-the Software without restriction, including without limitation the rights to
-use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
-of the Software, and to permit persons to whom the Software is furnished to do
-so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-------------------------------------------------------------------------------
-ALTERNATIVE B - Public Domain (www.unlicense.org)
-This is free and unencumbered software released into the public domain.
-Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
-software, either in source code form or as a compiled binary, for any purpose,
-commercial or non-commercial, and by any means.
-In jurisdictions that recognize copyright laws, the author or authors of this
-software dedicate any and all copyright interest in the software to the public
-domain. We make this dedication for the benefit of the public at large and to
-the detriment of our heirs and successors. We intend this dedication to be an
-overt act of relinquishment in perpetuity of all present and future rights to
-this software under copyright law.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
-ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
-WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-------------------------------------------------------------------------------
-*/
diff --git a/modules/stb_image/windows/stb_image.dll b/modules/stb_image/windows/stb_image.dll
deleted file mode 100644
index f27c3c3..0000000
Binary files a/modules/stb_image/windows/stb_image.dll and /dev/null differ
diff --git a/modules/stb_image/windows/stb_image.lib b/modules/stb_image/windows/stb_image.lib
deleted file mode 100644
index aa89310..0000000
Binary files a/modules/stb_image/windows/stb_image.lib and /dev/null differ
diff --git a/src/audio/analysis.jai b/src/audio/analysis.jai
index 39cb7cb..8ff09e6 100644
--- a/src/audio/analysis.jai
+++ b/src/audio/analysis.jai
@@ -1,14 +1,14 @@
 //
 // Real spectrum analysis driving the visualizer.
 //
-// Since we now play OGG_COMPRESSED (Sound_Player has no public PCM output),
-// we maintain a separate stb_vorbis decoder in app.analysis_vorbis. Each
-// frame we seek it to the current play_cursor position and decode FFT_SIZE
-// samples for the FFT. Forward seeks (normal playback) are near-instant since
-// they just continue from the current bitstream position.
+// We read directly from the same s16 PCM buffer Sound_Player is mixing from
+// (sd.samples): FFT_SIZE frames centered on play_cursor, clamped to the buffer.
+// No second decoder, no seeking, no drift between bars and audio.
 //
-// play_cursor is in "virtual 44100 Hz sample" units (Sound_Player default);
-// we convert to OGG sample position using the vorbis file's actual rate.
+// Sound_Player advances play_cursor at sd.sampling_rate * current_rate per
+// second, in *frame* units (not interleaved samples). To pull FFT_SIZE
+// frames from sd.samples (which is interleaved), index by
+// frame_index * nchannels.
 //
 
 MIN_VISUAL_FREQ :: 30.0;
@@ -21,49 +21,30 @@ update_audio_analysis :: () {
         return;
     }
 
-    if !app.analysis_vorbis {
+    sd := app.current_stream.sound_data;
+    if sd.type != .LINEAR_SAMPLE_ARRAY || !sd.samples || sd.sampling_rate == 0 || sd.nchannels == 0 {
         decay_spectrum(0.10);
         return;
     }
 
-    vorbis := cast(*Stb_Vorbis.stb_vorbis) app.analysis_vorbis;
-    info   := Stb_Vorbis.stb_vorbis_get_info(vorbis);
-    if info.sample_rate == 0 || info.channels == 0 {
-        decay_spectrum(0.10);
-        return;
-    }
+    nch          := cast(s64) sd.nchannels;
+    total_frames := sd.nsamples_times_nchannels / nch;
 
-    // Convert play_cursor to real elapsed seconds, then to OGG sample position.
-    // play_cursor advances at play_rate * current_rate per second, where
-    // current_rate = play_rate / ogg_rate (set by make_stream). So:
-    //   time_s  = play_cursor / (play_rate * current_rate)
-    //   ogg_pos = time_s * ogg_rate
-    sd           := app.current_stream.sound_data;
-    play_rate    := cast(float) sd.sampling_rate;     // 44100 default for OGG_COMPRESSED
-    ogg_rate     := cast(float) info.sample_rate;     // actual OGG sample rate
-    current_rate := app.current_stream.current_rate;
-    if current_rate <= 0  { decay_spectrum(0.10); return; }
-    time_s     := app.current_stream.play_cursor / (play_rate * current_rate);
-    ogg_frame  := cast(s64)(time_s * ogg_rate);
-    seek_frame := max(0, ogg_frame - FFT_SIZE / 2);
-
-    Stb_Vorbis.stb_vorbis_seek_frame(vorbis, cast(u32) seek_frame);
-
-    nch     := cast(s32) min(info.channels, 2);
-    decoded := Stb_Vorbis.stb_vorbis_get_samples_short_interleaved(
-        vorbis, nch, analysis_pcm_buf.data, FFT_SIZE * nch);
-
-    if decoded < FFT_SIZE / 4 {
+    cursor_frame := cast(s64) app.current_stream.play_cursor;
+    start_frame  := cursor_frame - FFT_SIZE / 2;
+    if start_frame < 0  start_frame = 0;
+    if start_frame + FFT_SIZE > total_frames  start_frame = total_frames - FFT_SIZE;
+    if start_frame < 0 {
         decay_spectrum(0.10);
         return;
     }
 
     // Mix to mono and apply Hann window.
     inv_chan := 1.0 / cast(float) nch;
+    base     := start_frame * nch;
     for k: 0..FFT_SIZE-1 {
-        if k >= decoded { fft_re[k] = 0; fft_im[k] = 0; continue; }
         sum: float = 0;
-        for ch: 0..nch-1  sum += cast(float) analysis_pcm_buf[k * nch + ch];
+        for ch: 0..nch-1  sum += cast(float) sd.samples[base + k * nch + ch];
         mono     := sum * inv_chan / 32768.0;
         fft_re[k] = mono * fft_window[k];
         fft_im[k] = 0;
@@ -71,7 +52,7 @@ update_audio_analysis :: () {
 
     fft();
 
-    rate     := ogg_rate;
+    rate     := cast(float) sd.sampling_rate;
     nyquist  := rate * 0.5;
     log_lo   := log(MIN_VISUAL_FREQ);
     log_hi   := log(nyquist);
@@ -105,8 +86,6 @@ update_audio_analysis :: () {
 
 #scope_file
 
-analysis_pcm_buf: [FFT_SIZE * 2] s16;
-
 decay_spectrum :: (rate: float) {
     for 0..SPECTRUM_BINS-1 {
         app.spectrum[it] = lerp(app.spectrum[it], 0, rate);
diff --git a/src/audio/decoders.jai b/src/audio/decoders.jai
index 8d25fc1..9edc8a6 100644
--- a/src/audio/decoders.jai
+++ b/src/audio/decoders.jai
@@ -1,110 +1,78 @@
 //
-// Format-aware decoder. Detects MP3 / FLAC / OGG / WAV from the leading bytes
-// and routes through dr_mp3/dr_flac (vendored, modules/audio_decoders) for the
-// formats Sound_Player doesn't natively handle.
+// OGG Vorbis decoder. Jellyfin always serves us /universal as OGG, so this
+// is the single decode path. Returns Sound_Data with type LINEAR_SAMPLE_ARRAY,
+// the same buffer the visualizer reads via sound_data.samples (no parallel
+// decoder, no cursor desync).
 //
-// Native formats (OGG, WAV) go through Sound.load_audio_data unchanged.
-// MP3/FLAC are decoded to interleaved s16 PCM and wrapped in a Sound_Data
-// with type = .LINEAR_SAMPLE_ARRAY.
+// We do NOT use stb_vorbis_decode_memory: it malloc()s the output buffer
+// with libc's allocator, while Sound_Player's release_asset path frees it
+// with Jai's allocator, which segfaults at song end. Instead we open the
+// stream, allocate the PCM buffer ourselves, and decode into it, so the
+// same allocator is used on both ends.
 //
 
-Audio_Format :: enum {
-    UNKNOWN;
-    OGG;
-    WAV;
-    MP3;
-    FLAC;
-}
+decode_ogg :: (bytes: string, name: string) -> Sound.Sound_Data, bool {
+    err: s32;
+    v := Stb_Vorbis.stb_vorbis_open_memory(bytes.data, cast(s32) bytes.count, *err, null);
+    if !v {
+        log_error("decode_ogg: open failed for '%' (err=%)", name, err);
+        return .{}, false;
+    }
+    defer Stb_Vorbis.stb_vorbis_close(v);
 
-detect_audio_format :: (bytes: string) -> Audio_Format {
-    if bytes.count < 4  return .UNKNOWN;
-
-    b := bytes.data;
-    // MP3: ID3 tag
-    if b[0] == #char "I" && b[1] == #char "D" && b[2] == #char "3"   return .MP3;
-    // MP3: frame sync 0xFFEx (also 0xFFFx for MPEG-1, 0xFFEx for MPEG-2)
-    if b[0] == 0xFF && (b[1] & 0xE0) == 0xE0                          return .MP3;
-    // FLAC: "fLaC"
-    if b[0] == #char "f" && b[1] == #char "L" && b[2] == #char "a" && b[3] == #char "C"  return .FLAC;
-    // OGG: "OggS"
-    if b[0] == #char "O" && b[1] == #char "g" && b[2] == #char "g" && b[3] == #char "S"  return .OGG;
-    // WAV: "RIFF"
-    if b[0] == #char "R" && b[1] == #char "I" && b[2] == #char "F" && b[3] == #char "F"  return .WAV;
-
-    return .UNKNOWN;
-}
-
-//
-// Decode `bytes` into a Sound_Data ready to feed to Sound.make_stream().
-//
-// Ownership notes:
-// - On the OGG/WAV path, Sound_Player keeps `bytes` alive via Sound_Data.buffer.
-// - On the MP3/FLAC path, we call decoder_free on the original bytes immediately
-//   (we have decoded to a separate s16 PCM buffer); the Sound_Data references
-//   the dr_libs-malloc'd PCM buffer instead.
-//
-// TODO: free Sound_Data + its buffer in the release_asset path. Today we leak.
-//
-decode_audio :: (bytes: string, name: string) -> Sound.Sound_Data, Audio_Format, bool {
-    format := detect_audio_format(bytes);
-    log_info("decode_audio: '%' bytes=% format=%", name, bytes.count, format);
-
-    if format == .OGG || format == .WAV {
-        result := Sound.load_audio_data(name, bytes);
-        return result, format, result.loaded;
+    info := Stb_Vorbis.stb_vorbis_get_info(v);
+    nch  := cast(s64) info.channels;
+    if nch <= 0 || info.sample_rate == 0 {
+        log_error("decode_ogg: bad info for '%' (ch=%, rate=%)", name, info.channels, info.sample_rate);
+        return .{}, false;
     }
 
-    if format == .MP3 || format == .FLAC {
-        channels:     u32;
-        sample_rate:  u32;
-        total_frames: u64;
-
-        samples: *s16;
-        if format == .MP3 {
-            samples = Audio_Decoders.decode_mp3(
-                bytes.data, cast(u64) bytes.count,
-                *channels, *sample_rate, *total_frames);
-        } else {
-            samples = Audio_Decoders.decode_flac(
-                bytes.data, cast(u64) bytes.count,
-                *channels, *sample_rate, *total_frames);
-        }
-
-        if !samples {
-            log_error("decode_audio: decoder failed for '%'", name);
-            return .{}, format, false;
-        }
-
-        // Source bytes are no longer needed.
-        free(bytes);
-
-        total_samples := total_frames * cast(u64) channels;
-
-        result: Sound.Sound_Data;
-        result.name                     = copy_string(name);
-        result.loaded                   = true;
-        result.type                     = .LINEAR_SAMPLE_ARRAY;
-        result.nchannels                = cast(u16) channels;
-        result.sampling_rate            = sample_rate;
-        result.nsamples_times_nchannels = cast(s64) total_samples;
-        result.samples                  = samples;
-        result.buffer.count             = cast(s64) (total_samples * size_of(s16));
-        result.buffer.data              = cast(*u8) samples;
-
-        log_info("decode_audio: % ch=%, rate=%, frames=%", format, channels, sample_rate, total_frames);
-        return result, format, true;
+    total_frames := cast(s64) Stb_Vorbis.stb_vorbis_stream_length_in_samples(v);
+    if total_frames <= 0 {
+        log_error("decode_ogg: zero-length stream for '%'", name);
+        return .{}, false;
     }
 
-    if bytes.count >= 8 && bytes.data {
-        b := bytes.data;
-        log_error("decode_audio: unknown format for '%' (% bytes; first 8: %x %x %x %x %x %x %x %x)",
-                  name, bytes.count,
-                  formatInt(b[0], base=16), formatInt(b[1], base=16),
-                  formatInt(b[2], base=16), formatInt(b[3], base=16),
-                  formatInt(b[4], base=16), formatInt(b[5], base=16),
-                  formatInt(b[6], base=16), formatInt(b[7], base=16));
-    } else {
-        log_error("decode_audio: unknown format for '%' (% bytes — too short or null)", name, bytes.count);
+    total_samples := total_frames * nch;
+    samples := cast(*s16) alloc(total_samples * size_of(s16));
+    if !samples {
+        log_error("decode_ogg: alloc failed for '%'", name);
+        return .{}, false;
     }
-    return .{}, format, false;
+
+    // stb_vorbis sometimes hands back fewer frames than the header advertised
+    // (chained / corrupt streams), so loop until it returns 0 and use what
+    // we got.
+    decoded_frames: s64 = 0;
+    while decoded_frames < total_frames {
+        remaining_shorts := cast(s32) ((total_frames - decoded_frames) * nch);
+        n := Stb_Vorbis.stb_vorbis_get_samples_short_interleaved(
+            v, cast(s32) nch,
+            samples + decoded_frames * nch,
+            remaining_shorts);
+        if n <= 0  break;
+        decoded_frames += n;
+    }
+
+    if decoded_frames == 0 {
+        log_error("decode_ogg: no frames decoded for '%'", name);
+        free(samples);
+        return .{}, false;
+    }
+
+    actual_samples := decoded_frames * nch;
+
+    sd: Sound.Sound_Data;
+    sd.name                     = copy_string(name);
+    sd.loaded                   = true;
+    sd.type                     = .LINEAR_SAMPLE_ARRAY;
+    sd.nchannels                = cast(u16) nch;
+    sd.sampling_rate            = cast(u32) info.sample_rate;
+    sd.nsamples_times_nchannels = actual_samples;
+    sd.samples                  = samples;
+    sd.buffer.count             = actual_samples * size_of(s16);
+    sd.buffer.data              = cast(*u8) samples;
+
+    log_info("decode_ogg: '%' ch=%, rate=%, frames=%", name, nch, info.sample_rate, decoded_frames);
+    return sd, true;
 }
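
Reviewer sketch (not part of the patch): the decode loop in `decode_ogg` above tolerates a decoder that delivers fewer frames than the header promised. The same short-read pattern is shown below in Python, purely as an illustration. `read_frames` and `make_reader` are hypothetical stand-ins for the stb_vorbis call, not real bindings.

```python
def decode_all(read_frames, total_frames, channels):
    """Collect interleaved samples until the decoder stops delivering frames.

    read_frames(max_frames) is a hypothetical stand-in for the decoder call;
    it returns a list of interleaved samples covering at most max_frames frames.
    """
    samples = []
    decoded = 0
    while decoded < total_frames:
        chunk = read_frames(total_frames - decoded)
        n = len(chunk) // channels      # frames actually delivered
        if n <= 0:
            break                       # decoder ran dry before the advertised length
        samples.extend(chunk[: n * channels])
        decoded += n
    return samples, decoded


def make_reader(available_frames, channels, max_chunk):
    """Fake decoder that hands back at most max_chunk frames per call."""
    state = {"left": available_frames}

    def read_frames(max_frames):
        n = min(max_frames, max_chunk, state["left"])
        state["left"] -= n
        return [0] * (n * channels)

    return read_frames
```

The caller sizes everything from `decoded`, not from the advertised `total_frames`, which mirrors how the patch derives `actual_samples` from `decoded_frames`.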
diff --git a/src/audio/index.jai b/src/audio/index.jai
index 9f26022..db34159 100644
--- a/src/audio/index.jai
+++ b/src/audio/index.jai
@@ -3,3 +3,4 @@
 #load "queue.jai";
 #load "fft.jai";
 #load "analysis.jai";
+#load "media_controls.jai";
diff --git a/src/audio/media_controls.jai b/src/audio/media_controls.jai
new file mode 100644
index 0000000..ed5b5fd
--- /dev/null
+++ b/src/audio/media_controls.jai
@@ -0,0 +1,154 @@
+//
+// Platform media controls integration.
+//
+// Generic API used throughout the codebase. Platform backends live below:
+//   Linux  — MPRIS2 via D-Bus (jai-mpris)
+//   macOS  — stub (add MPNowPlayingInfoCenter bindings here when needed)
+//
+// Call sites use media_controls_* unconditionally; the procs no-op on
+// platforms with no backend yet.
+//
+
+media_controls_init :: () {
+    #if OS == .LINUX   _mc_linux_init();
+}
+
+media_controls_shutdown :: () {
+    #if OS == .LINUX   _mc_linux_shutdown();
+}
+
+// Call once per frame from the main loop; dispatches pending D-Bus messages
+// and acts on any remote commands (play-pause/next/prev/stop/seek).
+media_controls_pump :: () {
+    #if OS == .LINUX   _mc_linux_pump();
+}
+
+// Call after a new track starts playing.
+media_controls_notify_track :: () {
+    #if OS == .LINUX   _mc_linux_notify_track();
+}
+
+// Call after pause/unpause state changes.
+media_controls_notify_status :: () {
+    #if OS == .LINUX   _mc_linux_notify_status();
+}
+
+// ── Linux / MPRIS2 ───────────────────────────────────────────────────────────
+
+#if OS == .LINUX {
+
+Mpris :: #import "jai-mpris";
+
+#scope_file
+
+_mpris: Mpris.Mpris_Player;
+
+// Flags set by #c_call callbacks, read + cleared in _mc_linux_pump().
+_want_play_pause  : bool;
+_want_next        : bool;
+_want_prev        : bool;
+_want_stop        : bool;
+_seek_offset_us   : s64;
+_seek_pending     : bool;
+_set_pos_us       : s64;
+_set_pos_pending  : bool;
+
+_cb_play_pause :: (ud: *void) #c_call { _want_play_pause = true; }
+_cb_next       :: (ud: *void) #c_call { _want_next       = true; }
+_cb_prev       :: (ud: *void) #c_call { _want_prev       = true; }
+_cb_stop       :: (ud: *void) #c_call { _want_stop       = true; }
+_cb_seek       :: (offset_us: s64, ud: *void) #c_call {
+    _seek_offset_us = offset_us;
+    _seek_pending   = true;
+}
+_cb_set_position :: (track_id: *u8, pos_us: s64, ud: *void) #c_call {
+    _set_pos_us      = pos_us;
+    _set_pos_pending = true;
+}
+
+_mc_linux_init :: () {
+    _mpris = Mpris.mpris_player_create("CelicaPlayer", "Celica");
+    if !_mpris {
+        log_warn("media_controls: MPRIS registration failed");
+        return;
+    }
+    Mpris.mpris_on_play_pause  (_mpris, _cb_play_pause,   null);
+    Mpris.mpris_on_play        (_mpris, _cb_play_pause,   null);
+    Mpris.mpris_on_pause       (_mpris, _cb_play_pause,   null);
+    Mpris.mpris_on_next        (_mpris, _cb_next,         null);
+    Mpris.mpris_on_previous    (_mpris, _cb_prev,         null);
+    Mpris.mpris_on_stop        (_mpris, _cb_stop,         null);
+    Mpris.mpris_on_seek        (_mpris, _cb_seek,         null);
+    Mpris.mpris_on_set_position(_mpris, _cb_set_position, null);
+
+    Mpris.mpris_set_can_play       (_mpris, true);
+    Mpris.mpris_set_can_pause      (_mpris, true);
+    Mpris.mpris_set_can_go_next    (_mpris, true);
+    Mpris.mpris_set_can_go_previous(_mpris, true);
+    Mpris.mpris_set_can_seek       (_mpris, true);
+    Mpris.mpris_set_playback_status(_mpris, "Stopped");
+    log_info("media_controls: MPRIS registered as org.mpris.MediaPlayer2.CelicaPlayer");
+}
+
+_mc_linux_shutdown :: () {
+    if !_mpris  return;
+    Mpris.mpris_player_destroy(_mpris);
+    _mpris = null;
+}
+
+_mc_linux_pump :: () {
+    if !_mpris  return;
+
+    Mpris.mpris_process(_mpris);
+
+    if _want_play_pause {
+        _want_play_pause = false;
+        audio_toggle_pause();
+        _mc_linux_notify_status();
+    }
+    if _want_next { _want_next = false; queue_next(); }
+    if _want_prev { _want_prev = false; queue_prev(); }
+    if _want_stop {
+        _want_stop = false;
+        stop_current_stream();
+        Mpris.mpris_set_playback_status(_mpris, "Stopped");
+    }
+    if _seek_pending {
+        _seek_pending = false;
+        if app.current_stream && app.current_stream.sound_data {
+            rate   := cast(float64) app.current_stream.sound_data.sampling_rate;
+            cur_us := cast(s64)(app.current_stream.play_cursor / rate * 1_000_000.0);
+            audio_seek_seconds(cast(float)(cur_us + _seek_offset_us) / 1_000_000.0);
+        }
+    }
+    if _set_pos_pending {
+        _set_pos_pending = false;
+        audio_seek_seconds(cast(float) _set_pos_us / 1_000_000.0);
+    }
+
+    // Keep MPRIS playhead in sync every frame.
+    if app.current_stream && app.current_stream.sound_data {
+        rate   := cast(float64) app.current_stream.sound_data.sampling_rate;
+        pos_us := cast(s64)(app.current_stream.play_cursor / rate * 1_000_000.0);
+        Mpris.mpris_set_position(_mpris, pos_us);
+    }
+}
+
+_mc_linux_notify_track :: () {
+    if !_mpris  return;
+    meta: Mpris.Mpris_Metadata;
+    meta.title     = app.current_track.name;
+    meta.artist    = app.current_track.artist;
+    meta.album     = app.current_track.album;
+    // Jellyfin duration_ticks are 100-ns intervals; MPRIS wants microseconds.
+    meta.length_us = app.current_track.duration_ticks / 10;
+    Mpris.mpris_set_metadata(_mpris, meta);
+    Mpris.mpris_set_playback_status(_mpris, "Playing");
+}
+
+_mc_linux_notify_status :: () {
+    if !_mpris  return;
+    Mpris.mpris_set_playback_status(_mpris, ifx app.paused then "Paused" else "Playing");
+}
+
+} // #if OS == .LINUX
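
Reviewer sketch (not part of the patch): `media_controls.jai` converts between three time units: PCM frames (`play_cursor`), microseconds (MPRIS), and 100-ns ticks (Jellyfin). The arithmetic is restated below in Python as an illustration only. The function names are hypothetical, and the sketch assumes `play_cursor` counts frames, as the decisions note in this change describes.

```python
def cursor_to_us(play_cursor_frames, sample_rate):
    """Frames played so far -> microseconds, as reported to MPRIS."""
    return int(play_cursor_frames / sample_rate * 1_000_000)


def ticks_to_us(duration_ticks):
    """Jellyfin ticks are 100 ns each, so 10 ticks make one microsecond."""
    return duration_ticks // 10


def relative_seek_seconds(play_cursor_frames, sample_rate, offset_us):
    """Target position in seconds for a relative MPRIS Seek(offset_us)."""
    target_us = cursor_to_us(play_cursor_frames, sample_rate) + offset_us
    return target_us / 1_000_000
```

The sketch does not clamp the seek target, and neither does the patch; whether `audio_seek_seconds` handles negative or past-the-end values is not visible in this diff.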
diff --git a/src/audio/player.jai b/src/audio/player.jai
index 75f63e6..32d25f0 100644
--- a/src/audio/player.jai
+++ b/src/audio/player.jai
@@ -1,12 +1,13 @@
 //
 // Audio playback. Downloads the track as OGG Vorbis (Jellyfin transcodes
-// everything server-side) and hands it to Sound_Player's native OGG path.
+// everything server-side), decodes locally to s16 PCM via stb_vorbis, and
+// hands the LINEAR_SAMPLE_ARRAY to Sound_Player. The visualizer reads from
+// the same sd.samples buffer — single source of truth, no parallel decoder,
+// no cursor desync.
 //
-// OGG_COMPRESSED streams from a decoder rather than decoding the full PCM
-// into memory, so there is no large heap buffer to manage — Sound_Player
-// owns the decoder lifecycle. The only memory we own is the heap-allocated
-// Sound_Data struct and the raw OGG bytes (stored in sound_data.buffer);
-// both are freed in sound_release_callback when Sound_Player is done.
+// We own the Sound_Data struct, the PCM buffer allocated in decode_ogg (held
+// in sd.samples / sd.buffer), and the Track copy. All are freed in
+// sound_release_callback when Sound_Player is finished with the stream.
 //
 
 audio_play_track :: (track: Track) {
@@ -25,7 +26,7 @@ audio_play_track :: (track: Track) {
     if output_hz <= 0  output_hz = 44100;
     path := tprint(
         "/Audio/%/universal?container=ogg&audioCodec=vorbis&maxStreamingBitrate=192000&audioSampleRate=%&audioChannels=2&userId=%&deviceId=%&api_key=%",
-        track.id, output_hz, app.jellyfin.user_id, DEVICE_ID, app.jellyfin.auth_token,
+        track.id, output_hz, app.jellyfin.user_id, app.jellyfin.device_id, app.jellyfin.auth_token,
     );
     log_info("audio: downloading '%' (% ticks)", track.name, track.duration_ticks);
     http_submit("GET", path, on_done=on_track_downloaded, user_data=pending);
@@ -44,6 +45,7 @@ audio_toggle_pause :: () {
         s.desired_rate = 1;
         s.inaudible    = false;
     }
+    media_controls_notify_status();
 }
 
 audio_is_paused :: () -> bool {
@@ -64,15 +66,6 @@ stop_current_stream :: () {
     app.current_stream = null;
 }
 
-analysis_close :: () {
-    if app.analysis_vorbis {
-        Stb_Vorbis.stb_vorbis_close(cast(*Stb_Vorbis.stb_vorbis) app.analysis_vorbis);
-        app.analysis_vorbis = null;
-    }
-    free(app.analysis_ogg);
-    app.analysis_ogg = "";
-}
-
 // Called by Sound_Player when it is finished with a stream (natural end or
 // stop_stream_abruptly). We own the Sound_Data struct and the OGG bytes in
 // sound_data.buffer; free both here.
@@ -122,36 +115,19 @@ on_track_downloaded :: (task: *Http_Task) {
         return;
     }
 
-    // Transfer ownership of the bytes to Sound_Data.buffer.
     bytes := task.response.body;
     task.response.body = "";
+    defer free(bytes);  // PCM is fully decoded once decode_ogg returns; OGG bytes are no longer needed
 
-    sd := Sound.load_audio_data(pending.name, bytes);
-    if !sd.loaded {
-        log_error("audio: OGG decode failed for '%'", pending.name);
-        free(bytes);
-        return;
-    }
+    sd, ok := decode_ogg(bytes, pending.name);
+    if !ok  return;
 
     data := New(Sound.Sound_Data);
     data.* = sd;
 
-    // Open a separate OGG decoder for the visualizer FFT before we transfer
-    // the bytes to Sound_Player (bytes is still valid at this point).
-    analysis_close();
-    app.analysis_ogg = copy_string(bytes);  // independent copy — Sound_Player owns `bytes`
-    {
-        err: s32;
-        v := Stb_Vorbis.stb_vorbis_open_memory(
-            app.analysis_ogg.data, cast(s32) app.analysis_ogg.count, *err, null);
-        if v  app.analysis_vorbis = v;
-        else  log_warn("audio: analysis decoder open failed (err=%)", err);
-    }
-
     stop_current_stream();
     free_track(*app.current_track);
-    app.current_track  = clone_track(pending.*);
-    app.current_format = .OGG;
+    app.current_track = clone_track(pending.*);
     app.paused = false;
 
     app.track_entity_id += 1;
@@ -161,6 +137,7 @@ on_track_downloaded :: (task: *Http_Task) {
 
     app.current_stream = stream;
     app.track_finished = false;
-    log_info("audio: playing '% — %' [OGG] artist_id=%",
+    media_controls_notify_track();
+    log_info("audio: playing '% — %' artist_id=%",
         app.current_track.artist, app.current_track.name, app.current_track.artist_id);
 }
diff --git a/src/core/app.jai b/src/core/app.jai
index 6c8c3a5..636cc0f 100644
--- a/src/core/app.jai
+++ b/src/core/app.jai
@@ -38,7 +38,6 @@ App :: struct {
     audio_inited:    bool;
     current_stream:  *Sound.Sound_Stream;
     current_track:   Track;
-    current_format:  Audio_Format;   // format of the bytes we played
     track_finished:  bool;            // set by Sound_Player release_asset callback
     track_entity_id: s64;             // monotonic — identifies the active stream
 
@@ -46,14 +45,11 @@ App :: struct {
     master_volume:   float = 1.0;     // 0..1; persisted in config.json
     scrub_seconds:   float;           // bound to the seek slider in now_playing_view
 
-    // Per-frame audio analysis used by the visualizer shader.
+    // Per-frame audio analysis used by the visualizer shader. Reads
+    // current_stream.sound_data.samples around play_cursor — same buffer
+    // Sound_Player is mixing from, so the bars stay perfectly in sync.
     spectrum: [SPECTRUM_BINS] float;
 
-    // Separate OGG decoder kept open for the visualizer FFT. Sound_Player's
-    // OGG_COMPRESSED type gives no PCM access, so we maintain our own handle.
-    analysis_vorbis: *void;   // *Stb_Vorbis.stb_vorbis, void to avoid header dep
-    analysis_ogg:    string;  // copy of OGG bytes kept alive for the decoder
-
     // Cover art palette — extracted once per album, used for theming.
     palette:          [4] Vector4;   // [0]=bg  [1]=highlight  [2]=mid  [3]=accent
     palette_ready:    bool;
@@ -70,6 +66,12 @@ app: App;
 app_init :: () {
     log_info("player starting up");
 
+    // Random seed feeds device_id generation on first run; left unseeded,
+    // every install would mint the same id, and because Jellyfin allows one
+    // active token per DeviceId, each new login would revoke the previous one.
+    ns, _ := to_nanoseconds(current_time_monotonic());
+    random_seed(cast(u64) ns);
+
     setup_data_directory();
 
     app.window = create_window(app.window_width, app.window_height, "Celica");
@@ -96,13 +98,20 @@ app_init :: () {
     app.jellyfin.username   = copy_string("");
     app.jellyfin.password   = copy_string("");
 
-    if config_load() {
-        app.current_view = .LIBRARY;
-        library_refresh_artists();
+    loaded := config_load();
+    ensure_device_id();
+    media_controls_init();
+
+    if loaded {
+        // Don't trust the saved token on faith — the server may have
+        // revoked it. Probe /System/Info; on_validate_session decides
+        // whether to land in LIBRARY or LOGIN.
+        jellyfin_validate_session_async();
     }
 }
 
 app_shutdown :: () {
+    media_controls_shutdown();
     if app.audio_inited  Sound.sound_player_shutdown();
     jellyfin_client_shutdown(*app.jellyfin);
     log_info("bye");
diff --git a/src/core/config.jai b/src/core/config.jai
index e2dbc32..d3a3b3a 100644
--- a/src/core/config.jai
+++ b/src/core/config.jai
@@ -5,6 +5,9 @@
 // We persist:
 // - server URL / username / auth token / user id (so subsequent launches
 //   skip the login screen entirely)
+// - device_id: a stable per-install id. Jellyfin permits only one active
+//   token per DeviceId, so re-using a hardcoded constant means any second
+//   instance silently revokes the first instance's saved token.
 // - master volume (so the user doesn't have to re-set it every time)
 //
 // Password is never persisted.
@@ -15,6 +18,7 @@ Persisted_Config :: struct {
     username:      string;
     auth_token:    string;
     user_id:       string;
+    device_id:     string;
     master_volume: float = 1.0;
 }
 
@@ -37,6 +41,7 @@ config_save :: () {
     cfg.username      = app.jellyfin.username;
     cfg.auth_token    = app.jellyfin.auth_token;
     cfg.user_id       = app.jellyfin.user_id;
+    cfg.device_id     = app.jellyfin.device_id;
     cfg.master_volume = app.master_volume;
 
     json := Jaison.json_write_string(cfg);
@@ -69,6 +74,7 @@ config_load :: () -> bool {
     app.jellyfin.username   = copy_string(cfg.username);
     app.jellyfin.auth_token = copy_string(cfg.auth_token);
     app.jellyfin.user_id    = copy_string(cfg.user_id);
+    app.jellyfin.device_id  = copy_string(cfg.device_id);
 
     if cfg.master_volume > 0  app.master_volume = clamp(cfg.master_volume, 0, 1);
 
@@ -80,8 +86,28 @@ config_load :: () -> bool {
     return false;
 }
 
+// Ensure we have a stable device_id. Called after config_load so first-run
+// installs (or pre-device_id configs migrating in) get one and persist it.
+ensure_device_id :: () {
+    if app.jellyfin.device_id.count > 0  return;
+    app.jellyfin.device_id = generate_device_id();
+    log_info("auth: generated new device_id");
+    config_save();
+}
+
 config_clear :: () {
     p := config_path();
     if !p  return;
     file_delete(p);
 }
+
+#scope_file
+
+generate_device_id :: () -> string {
+    builder: String_Builder;
+    for 0..15 {
+        b := random_get() & 0xff;
+        print_to_builder(*builder, "%", formatInt(b, base=16, minimum_digits=2));
+    }
+    return builder_to_string(*builder);
+}
diff --git a/src/core/imports.jai b/src/core/imports.jai
index 80d5e74..ff43565 100644
--- a/src/core/imports.jai
+++ b/src/core/imports.jai
@@ -14,6 +14,7 @@
 
 #import "Window_Creation";
 #import "GetRect_LeftHanded";
+#import "GL";
 
 Simp  :: #import "Simp";
 Input :: #import "Input";
@@ -21,6 +22,5 @@ Sound :: #import "Sound_Player";
 
 Jaison         :: #import "Jaison";
 Curl           :: #import "Curl"()(LINUX_USE_SYSTEM_LIBRARY=true);
-Audio_Decoders :: #import "audio_decoders";
 Stb_Image      :: #import "stb_image";
 Stb_Vorbis     :: #import "stb_vorbis";
diff --git a/src/core/window.jai b/src/core/window.jai
index 63d89a9..23fe9c2 100644
--- a/src/core/window.jai
+++ b/src/core/window.jai
@@ -24,8 +24,10 @@ run_main_loop :: () {
             getrect_handle_event(event);
         }
 
-        http_pump();   // fire callbacks for any HTTP requests that finished since last frame
-        image_pump();  // hand queued image fetches to http_submit, up to the concurrency cap
+        http_pump();                   // fire callbacks for any HTTP requests that finished since last frame
+        image_pump();                  // hand queued image fetches to http_submit, up to the concurrency cap
+        jellyfin_quick_connect_pump(); // drives /QuickConnect/Connect polling at QC_POLL_INTERVAL_S cadence
+        media_controls_pump();         // dispatch D-Bus / OS media key events
 
         if app.audio_inited  Sound.set_master_volume(app.master_volume);
 
diff --git a/src/gfx/shaders.jai b/src/gfx/shaders.jai
index e21b0c9..599f8ef 100644
--- a/src/gfx/shaders.jai
+++ b/src/gfx/shaders.jai
@@ -1,7 +1,8 @@
 //
 // Visualizer backdrop. Reads `app.spectrum` (filled by audio/analysis.jai with
 // a real FFT each frame) and draws a mirrored bar visualizer with a
-// bass-driven background pulse.
+// bass-driven background pulse, plus a particle system that shoots sparks
+// from bar tips and arcs them under gravity.
 //
 // All immediate-mode quads via Simp for now. A real GLSL fragment shader is
 // on the roadmap (ai/todo.md) — once that lands this becomes a fullscreen
@@ -71,7 +72,7 @@ gfx_draw_visualizer_background :: (w: float, h: float, bg_alpha := 1.0) {
             }
             // Modulate brightness by amplitude; dim quiet bars a little.
             bright := 0.45 + 0.55 * v;
-            body = .{c.x * bright, c.y * bright, c.z * bright, 0.65 + 0.35 * v};
+            body = .{c.x * bright, c.y * bright, c.z * bright, 0.85 + 0.15 * v};
         } else {
             body = neon_color(hue, v, t);
         }
@@ -91,6 +92,18 @@ gfx_draw_visualizer_background :: (w: float, h: float, bg_alpha := 1.0) {
         if bar_h > cap_h * 2 {
             Simp.immediate_quad(x0, cy - bar_h - cap_h, x1, cy - bar_h, tip);
         }
+
+        // Spawn sparks from the top bar tip when the bar is energetic.
+        // Spawn x is random across the full bar width.
+        if v > 0.18 {
+            spawn_chance := (v - 0.18) / 0.82 * 0.28;
+            if random_get_zero_to_one() < spawn_chance {
+                spark_col := tip;
+                spark_col.w = 1.0;
+                spawn_x := x0 + random_get_zero_to_one() * bar_w;
+                particle_spawn(spawn_x, cy - bar_h, spark_col, min(w, h));
+            }
+        }
     }
 
     // Center mirror line — a thin neon glow across the middle.
@@ -100,10 +113,81 @@ gfx_draw_visualizer_background :: (w: float, h: float, bg_alpha := 1.0) {
         line_col.w = 0.5 + 0.3 * bass;
         Simp.immediate_quad(0, cy - line_h * 0.5, w, cy + line_h * 0.5, line_col);
     }
+
+    particles_update_draw(w, h);
 }
 
 #scope_file
 
+// ── Spark particle system ─────────────────────────────────────────────────────
+
+MAX_PARTICLES :: 800;
+
+Particle :: struct {
+    x, y:     float;
+    vx, vy:   float;
+    life:     float;   // remaining seconds
+    max_life: float;
+    color:    Vector4;
+    size:     float;
+}
+
+particles:  [MAX_PARTICLES] Particle;
+live_count: int;
+
+particle_spawn :: (x: float, y: float, color: Vector4, ref_size: float) {
+    if live_count >= MAX_PARTICLES  return;
+    p := *particles[live_count];
+    p.x        = x;
+    p.y        = y;
+    // Mostly upward: angle biased toward -PI/2 (up in screen coords), ±40° spread.
+    base_angle := -1.5708;  // -PI/2
+    angle      := base_angle + (random_get_zero_to_one() - 0.5) * 1.396;  // ±40°
+    speed      := 150.0 + random_get_zero_to_one() * 200.0;
+    p.vx       = cos(angle) * speed;
+    p.vy       = sin(angle) * speed;
+    p.max_life = 2.0 + random_get_zero_to_one() * 2.0;
+    p.life     = p.max_life;
+    p.color    = color;
+    // 0.3 – 0.8% of the smaller screen dimension.
+    p.size     = ref_size * (0.003 + random_get_zero_to_one() * 0.005);
+    live_count += 1;
+}
+
+particles_update_draw :: (w: float, h: float) {
+    dt := app.dt;
+    i  := 0;
+    while i < live_count {
+        p := *particles[i];
+        p.life -= dt;
+        if p.life <= 0 {
+            particles[i] = particles[live_count - 1];
+            live_count -= 1;
+            continue;
+        }
+
+        p.x += p.vx * dt;
+        p.y += p.vy * dt;
+
+        // Elastic bounce off all four walls.
+        if p.x < 0  { p.x = -p.x;      p.vx = -p.vx; }
+        if p.x > w  { p.x = 2*w - p.x; p.vx = -p.vx; }
+        if p.y < 0  { p.y = -p.y;      p.vy = -p.vy; }
+        if p.y > h  { p.y = 2*h - p.y; p.vy = -p.vy; }
+
+        // Linear fade from 0.5 alpha at spawn down to 0 at end of life.
+        frac := p.life / p.max_life;
+        col  := p.color;
+        col.w = frac * 0.5;
+
+        half := p.size * 0.5;
+        Simp.immediate_quad(p.x - half, p.y - half, p.x + half, p.y + half, col);
+        i += 1;
+    }
+}
+
+// ── Colour helpers ────────────────────────────────────────────────────────────
+
 neon_color :: (hue: float, intensity: float, t: float) -> Vector4 {
     // Slow hue drift over time so the color palette evolves even on
     // sustained tones.
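
Reviewer sketch (not part of the patch): `particles_update_draw` above combines two small techniques, swap-remove from a dense array and mirror reflection at the walls. Both are restated below in Python for illustration. The dict-based particle layout and the function names are assumptions made for the sketch, and drawing is omitted.

```python
def reflect(pos, vel, limit):
    """Mirror a coordinate back inside [0, limit] and flip its velocity."""
    if pos < 0:
        return -pos, -vel
    if pos > limit:
        return 2 * limit - pos, -vel
    return pos, vel


def step_particles(particles, dt, w, h):
    """Advance all particles in place; dead ones are removed by swap-remove."""
    live = len(particles)
    i = 0
    while i < live:
        p = particles[i]
        p["life"] -= dt
        if p["life"] <= 0:
            # Overwrite the dead slot with the last live particle and shrink.
            # Do not advance i: the moved particle still needs its update.
            particles[i] = particles[live - 1]
            live -= 1
            continue
        p["x"] += p["vx"] * dt
        p["y"] += p["vy"] * dt
        p["x"], p["vx"] = reflect(p["x"], p["vx"], w)
        p["y"], p["vy"] = reflect(p["y"], p["vy"], h)
        i += 1
    del particles[live:]   # drop the now-unused tail
    return live
```

Swap-remove keeps the live particles contiguous without shifting the array, at the cost of not preserving order, which does not matter for sparks.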
diff --git a/src/jellyfin/async.jai b/src/jellyfin/async.jai
index 1aaf6ab..684aca6 100644
--- a/src/jellyfin/async.jai
+++ b/src/jellyfin/async.jai
@@ -137,7 +137,7 @@ perform_curl_blocking :: (task: *Http_Task) -> Http_Response {
     curl_easy_setopt(handle, CURLoption.WRITEDATA, *buf);
 
     headers: *curl_slist;
-    headers = curl_slist_append(headers, temp_c_string(tprint("X-Emby-Authorization: %", task.auth)));
+    headers = curl_slist_append(headers, temp_c_string(tprint("Authorization: %", task.auth)));
     headers = curl_slist_append(headers, temp_c_string("Accept: application/json"));
     if task.method == "POST" || task.method == "PUT" {
         headers = curl_slist_append(headers, temp_c_string("Content-Type: application/json"));
diff --git a/src/jellyfin/auth.jai b/src/jellyfin/auth.jai
index 49b0157..0af0cd0 100644
--- a/src/jellyfin/auth.jai
+++ b/src/jellyfin/auth.jai
@@ -3,6 +3,10 @@
 //
 // Async via http_submit; on_login_response runs on the main thread.
 //
+// We also have a startup-time session probe (jellyfin_validate_session_async)
+// that hits /System/Info with the saved token. On 401 we drop to the login
+// screen instead of leaving the user staring at an empty library.
+//
 
 Login_User :: struct {
     Id:   string;
@@ -31,6 +35,29 @@ jellyfin_logout :: (c: *Jellyfin_Client) {
     config_clear();
 }
 
+// Token may have been revoked (a login from another install sharing our
+// DeviceId, a password change, an admin revoke). Clear the token but keep
+// server_url, username, and the persisted device_id so the user can re-auth
+// in one click, and so we don't burn a new device_id, which would orphan all
+// prior session entries on the server.
+jellyfin_force_logout :: () {
+    log_warn("auth: clearing session (token rejected)");
+    free(app.jellyfin.auth_token);
+    free(app.jellyfin.user_id);
+    app.jellyfin.auth_token = "";
+    app.jellyfin.user_id    = "";
+    app.jellyfin.logged_in  = false;
+    config_save();
+    app.current_view = .LOGIN;
+}
+
+// Hit a cheap authenticated endpoint with the saved token to check whether
+// the server still honors it. /System/Info requires auth and is light.
+jellyfin_validate_session_async :: () {
+    log_info("auth: validating saved session");
+    http_submit("GET", "/System/Info", on_done=on_validate_session);
+}
+
 #scope_file
 
 on_login_response :: (task: *Http_Task) {
@@ -55,3 +82,21 @@ on_login_response :: (task: *Http_Task) {
     app.current_view = .LIBRARY;
     library_refresh_artists();
 }
+
+on_validate_session :: (task: *Http_Task) {
+    if task.response.status_code == 401 {
+        jellyfin_force_logout();
+        return;
+    }
+    if !task.response.ok {
+        // Network blip, server down, etc. — don't nuke the saved token over
+        // a transient failure; keep the user on the login screen so they
+        // can retry, but leave the credentials in place.
+        log_error("auth: validate failed status=% (keeping saved token)", task.response.status_code);
+        app.current_view = .LOGIN;
+        return;
+    }
+    log_info("auth: saved session ok");
+    app.current_view = .LIBRARY;
+    library_refresh_artists();
+}
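
Reviewer sketch (not part of the patch): `on_validate_session` above has three outcomes, restated below as a pure function in Python for illustration. The `View` enum and `decide_after_probe` are hypothetical names; the actual handler mutates `app` and calls `jellyfin_force_logout` or `library_refresh_artists` directly.

```python
from enum import Enum


class View(Enum):
    LOGIN = "login"
    LIBRARY = "library"


def decide_after_probe(status_code, ok):
    """Return (next_view, keep_token) for a /System/Info probe result."""
    if status_code == 401:
        # Server rejected the token: drop it and ask the user to log in.
        return View.LOGIN, False
    if not ok:
        # Transient failure (network, server down): show login, keep the token.
        return View.LOGIN, True
    return View.LIBRARY, True
```

Only an explicit 401 discards the token; every other failure is treated as possibly temporary.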
diff --git a/src/jellyfin/client.jai b/src/jellyfin/client.jai
index 759eee0..dae165e 100644
--- a/src/jellyfin/client.jai
+++ b/src/jellyfin/client.jai
@@ -2,15 +2,23 @@
 // HTTP client for Jellyfin. Wraps libcurl with a write-to-builder callback so
 // each request returns the response body as a string.
 //
-// We follow Jellyfin's auth scheme: every authenticated request carries the
-// `Authorization: MediaBrowser Token="<token>"` header (along with a Client
-// fingerprint).
+// We follow Jellyfin's auth scheme: every request carries an
+// `Authorization: MediaBrowser ...` header. Authenticated requests append a
+// Token field; the login request omits it.
+//
+// Note: the legacy `X-Emby-Authorization` header is being removed in
+// Jellyfin 12.0 and admins on 10.11+ can already disable it. Use the modern
+// `Authorization` header.
+//
+// The DeviceId is per-install and persisted in config.json. Jellyfin
+// permits only one active access token per DeviceId — sharing one across
+// installs (or hardcoding a constant) silently revokes prior tokens the
+// next time anyone logs in.
 //
 
-CLIENT_NAME    :: "player";
+CLIENT_NAME    :: "Jellyfin Celica Music Player";
 CLIENT_VERSION :: "0.0.1";
-DEVICE_NAME    :: "player";
-DEVICE_ID      :: "player-dev-device";  // TODO: persist a real id per install
+DEVICE_NAME    :: "Celica";
 
 Jellyfin_Client :: struct {
     server_url:  string;
@@ -19,8 +27,23 @@ Jellyfin_Client :: struct {
 
     auth_token:    string;
     user_id:       string;
+    device_id:     string;     // persisted; one per install (see config.jai)
     logged_in:     bool;
     login_pending: bool;
+
+    // Quick Connect — see quick_connect.jai. Lives on the client so the
+    // login view can render its current state.
+    qc_state:   Quick_Connect_State;
+    qc_code:    string;        // 6-char user-facing code shown in the UI
+    qc_secret:  string;        // server token we poll with
+    qc_poll_at: float64;       // app.current_time when next poll is due
+}
+
+Quick_Connect_State :: enum {
+    IDLE;
+    INITIATING;
+    WAITING;
+    AUTHENTICATING;
 }
 
 jellyfin_client_init :: (c: *Jellyfin_Client) {
@@ -45,11 +68,11 @@ build_auth_header :: (c: *Jellyfin_Client) -> string {
     if c.logged_in && c.auth_token {
         return tprint(
             "MediaBrowser Client=\"%\", Device=\"%\", DeviceId=\"%\", Version=\"%\", Token=\"%\"",
-            CLIENT_NAME, DEVICE_NAME, DEVICE_ID, CLIENT_VERSION, c.auth_token);
+            CLIENT_NAME, DEVICE_NAME, c.device_id, CLIENT_VERSION, c.auth_token);
     }
     return tprint(
         "MediaBrowser Client=\"%\", Device=\"%\", DeviceId=\"%\", Version=\"%\"",
-        CLIENT_NAME, DEVICE_NAME, DEVICE_ID, CLIENT_VERSION);
+        CLIENT_NAME, DEVICE_NAME, c.device_id, CLIENT_VERSION);
 }
 
 http_get :: (c: *Jellyfin_Client, path: string) -> Http_Response {
@@ -95,7 +118,7 @@ http_request :: (c: *Jellyfin_Client, method: string, path: string, body: string
 
     headers: *curl_slist;
     auth := build_auth_header(c);
-    headers = curl_slist_append(headers, temp_c_string(tprint("X-Emby-Authorization: %", auth)));
+    headers = curl_slist_append(headers, temp_c_string(tprint("Authorization: %", auth)));
     headers = curl_slist_append(headers, temp_c_string("Accept: application/json"));
 
     if method == "POST" || method == "PUT" {
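
Reviewer sketch (not part of the patch): the header value assembled by `build_auth_header` has a fixed field order, with `Token` appended only for authenticated requests. The Python below illustrates that format. The function signature is hypothetical, and the sketch does no escaping of quotes inside values, which the patch does not do either.

```python
def build_auth_header(client, device, device_id, version, token=None):
    """Format the value of the `Authorization` header in MediaBrowser style."""
    fields = [
        ("Client", client),
        ("Device", device),
        ("DeviceId", device_id),
        ("Version", version),
    ]
    if token:
        fields.append(("Token", token))   # omitted on the login request
    return "MediaBrowser " + ", ".join(f'{k}="{v}"' for k, v in fields)
```

The result is sent as `Authorization: <value>`, matching the header name this change switches to.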
diff --git a/src/jellyfin/index.jai b/src/jellyfin/index.jai
index 564ea95..1618c5f 100644
--- a/src/jellyfin/index.jai
+++ b/src/jellyfin/index.jai
@@ -1,6 +1,7 @@
 #load "client.jai";
 #load "async.jai";
 #load "auth.jai";
+#load "quick_connect.jai";
 #load "library.jai";
 #load "images.jai";
 #load "stream.jai";
diff --git a/src/jellyfin/library.jai b/src/jellyfin/library.jai
index d88f5bf..2291403 100644
--- a/src/jellyfin/library.jai
+++ b/src/jellyfin/library.jai
@@ -153,6 +153,7 @@ library_select_album :: (album_id: string) {
 
 on_artists_loaded :: (task: *Http_Task) {
     app.library.artists_loading = false;
+    if task.response.status_code == 401 { jellyfin_force_logout(); return; }
     if !task.response.ok {
         log_error("artists: status=% body=%", task.response.status_code, slice(task.response.body, 0, min(300, task.response.body.count)));
         return;
@@ -182,6 +183,7 @@ on_albums_loaded :: (task: *Http_Task) {
     if gen != app.library.albums_request_gen  return;  // user moved on; discard
 
     app.library.albums_loading = false;
+    if task.response.status_code == 401 { jellyfin_force_logout(); return; }
     if !task.response.ok {
         log_error("albums: status=% body=%", task.response.status_code, slice(task.response.body, 0, min(300, task.response.body.count)));
         return;
@@ -212,6 +214,7 @@ on_tracks_loaded :: (task: *Http_Task) {
     if gen != app.library.tracks_request_gen  return;
 
     app.library.tracks_loading = false;
+    if task.response.status_code == 401 { jellyfin_force_logout(); return; }
     if !task.response.ok {
         log_error("tracks: status=% body=%", task.response.status_code, slice(task.response.body, 0, min(300, task.response.body.count)));
         return;
diff --git a/src/jellyfin/quick_connect.jai b/src/jellyfin/quick_connect.jai
new file mode 100644
index 0000000..b7c2644
--- /dev/null
+++ b/src/jellyfin/quick_connect.jai
@@ -0,0 +1,136 @@
+//
+// Quick Connect login. The user opens any other authenticated Jellyfin
+// client (web UI, mobile app, etc.), enters the 6-character code we
+// display, and the server hands us an AccessToken. This client never
+// sees the password, so we don't have to store one.
+//
+// Flow:
+//   1. POST /QuickConnect/Initiate   → { Code, Secret }
+//   2. show Code, poll GET /QuickConnect/Connect?secret=… every 3s
+//      until { Authenticated: true }
+//   3. POST /Users/AuthenticateWithQuickConnect with { Secret }
+//      → standard AuthenticationResult { AccessToken, User }
+//
+// The poll cadence runs off the main-loop pump (jellyfin_quick_connect_pump)
+// so we don't burn a thread sleeping between polls. State lives on
+// Jellyfin_Client; the login view reads it to draw either the form or the
+// "enter this code" panel.
+//
+
+QC_POLL_INTERVAL_S :: 3.0;
+
+// Field names map to the Jellyfin response via JsonName notes. `Code` is
+// a built-in type name in Jai, so we rename the field to UserCode and
+// instruct Jaison to look for "Code" in the JSON.
+Quick_Connect_Result :: struct {
+    Authenticated: bool;
+    Secret:        string;
+    UserCode:      string;  @JsonName(Code)
+}
+
+jellyfin_quick_connect_start :: () {
+    c := *app.jellyfin;
+    if c.qc_state != .IDLE  return;
+    qc_reset(c);
+    c.qc_state = .INITIATING;
+    log_info("qc: initiating");
+    http_submit("POST", "/QuickConnect/Initiate", on_done=on_qc_initiated);
+}
+
+jellyfin_quick_connect_cancel :: () {
+    c := *app.jellyfin;
+    if c.qc_state == .IDLE  return;
+    log_info("qc: cancelled");
+    qc_reset(c);
+}
+
+// Called from the main loop. Fires the next /QuickConnect/Connect poll once
+// the cooldown has elapsed.
+jellyfin_quick_connect_pump :: () {
+    c := *app.jellyfin;
+    if c.qc_state != .WAITING  return;
+    if app.current_time < c.qc_poll_at  return;
+    c.qc_poll_at = app.current_time + QC_POLL_INTERVAL_S;
+    path := tprint("/QuickConnect/Connect?secret=%", c.qc_secret);
+    http_submit("GET", path, on_done=on_qc_polled);
+}
+
+#scope_file
+
+qc_reset :: (c: *Jellyfin_Client) {
+    free(c.qc_code);
+    free(c.qc_secret);
+    c.qc_code    = "";
+    c.qc_secret  = "";
+    c.qc_poll_at = 0;
+    c.qc_state   = .IDLE;
+}
+
+on_qc_initiated :: (task: *Http_Task) {
+    c := *app.jellyfin;
+    if c.qc_state != .INITIATING  return;
+    if !task.response.ok {
+        log_error("qc: initiate failed status=% body=%", task.response.status_code,
+            slice(task.response.body, 0, min(300, task.response.body.count)));
+        qc_reset(c);
+        return;
+    }
+    ok, parsed := Jaison.json_parse_string(task.response.body, Quick_Connect_Result);
+    if !ok || !parsed.UserCode || !parsed.Secret {
+        log_error("qc: initiate parse failed");
+        qc_reset(c);
+        return;
+    }
+    c.qc_code    = copy_string(parsed.UserCode);
+    c.qc_secret  = copy_string(parsed.Secret);
+    c.qc_state   = .WAITING;
+    c.qc_poll_at = app.current_time + QC_POLL_INTERVAL_S;
+    log_info("qc: code=%", c.qc_code);
+}
+
+on_qc_polled :: (task: *Http_Task) {
+    c := *app.jellyfin;
+    if c.qc_state != .WAITING  return;   // user cancelled while in flight
+    if !task.response.ok {
+        // The server returns 404 once it has cleaned up an expired request.
+        // Every failure, 404 included, is treated as transient: keep polling.
+        log_warn("qc: poll status=%", task.response.status_code);
+        return;
+    }
+    ok, parsed := Jaison.json_parse_string(task.response.body, Quick_Connect_Result);
+    if !ok  return;
+    if !parsed.Authenticated  return;
+
+    log_info("qc: approved, exchanging secret for token");
+    c.qc_state = .AUTHENTICATING;
+    body := tprint("{\"Secret\":\"%\"}", c.qc_secret);
+    http_submit("POST", "/Users/AuthenticateWithQuickConnect", body, on_done=on_qc_authenticated);
+}
+
+on_qc_authenticated :: (task: *Http_Task) {
+    c := *app.jellyfin;
+    if c.qc_state != .AUTHENTICATING  return;
+
+    if !task.response.ok {
+        log_error("qc: authenticate failed status=%", task.response.status_code);
+        qc_reset(c);
+        return;
+    }
+    ok, parsed := Jaison.json_parse_string(task.response.body, Login_Response);
+    if !ok || !parsed.AccessToken {
+        log_error("qc: authenticate parse failed");
+        qc_reset(c);
+        return;
+    }
+
+    free(c.auth_token); free(c.user_id);
+    c.auth_token = copy_string(parsed.AccessToken);
+    c.user_id    = copy_string(parsed.User.Id);
+    c.logged_in  = true;
+    log_info("auth: logged in via Quick Connect as % (id=%)", parsed.User.Name, c.user_id);
+
+    qc_reset(c);
+    config_save();
+    app.current_view = .LIBRARY;
+    library_refresh_artists();
+}
diff --git a/src/ui/views/library_view.jai b/src/ui/views/library_view.jai
index 6c8832f..57eebfa 100644
--- a/src/ui/views/library_view.jai
+++ b/src/ui/views/library_view.jai
@@ -217,7 +217,7 @@ draw_transport_strip :: (x: float, y: float, w: float, h: float) {
     label_theme.text_color = .{1, 1, 1, 1};
 
     title := ifx app.current_stream
-        then tprint("% — % [%]", app.current_track.artist, app.current_track.name, app.current_format)
+        then tprint("% — %", app.current_track.artist, app.current_track.name)
         else "—";
     label(get_rect(x + h * 0.3, y + h * 0.2, w * 0.40, h * 0.6), title, *label_theme);
 
diff --git a/src/ui/views/login_view.jai b/src/ui/views/login_view.jai
index 6dcc385..93e9fec 100644
--- a/src/ui/views/login_view.jai
+++ b/src/ui/views/login_view.jai
@@ -1,6 +1,9 @@
 //
-// Login screen. Three text inputs (server URL, username, password) and a
-// big chunky CONNECT button.
+// Login screen. Two paths:
+//   1. Password — server URL + username + password + CONNECT
+//   2. Quick Connect — server URL + QUICK CONNECT, then enter the code
+//      shown here in another authenticated Jellyfin client. No password
+//      stored on disk.
 //
 
 draw_login_view :: () {
@@ -26,6 +29,11 @@ draw_login_view :: () {
         label(r, "a jellyfin music player", *label_theme);
     }
 
+    if app.jellyfin.qc_state != .IDLE {
+        draw_quick_connect_panel(w, h, k);
+        return;
+    }
+
     // Form column.
     field_w := min(w * 0.5, 12.0 * k);
     field_x := (w - field_w) * 0.5;
@@ -60,6 +68,52 @@ draw_login_view :: () {
             jellyfin_login_async(*app.jellyfin);
         }
     }
+
+    // QUICK CONNECT button — secondary action, smaller and below.
+    cursor_y += field_h * 1.2 + k * 0.3;
+    qc_button_theme := button_theme;
+    qc_button_theme.label_theme.text_color = .{0.7, 0.7, 0.9, 1};
+    if button(get_rect(field_x, cursor_y, field_w, field_h), "QUICK CONNECT", *qc_button_theme) {
+        jellyfin_quick_connect_start();
+    }
+}
+
+draw_quick_connect_panel :: (w: float, h: float, k: float) {
+    field_w := min(w * 0.5, 12.0 * k);
+    field_x := (w - field_w) * 0.5;
+    field_h := app.button_font.character_height * 1.7;
+
+    instructions := "Open another Jellyfin client and enter this code:";
+    if app.jellyfin.qc_state == .INITIATING  instructions = "Contacting server...";
+    if app.jellyfin.qc_state == .AUTHENTICATING  instructions = "Approved — signing in...";
+
+    {
+        label_theme := app.theme.label_theme;
+        label_theme.font = app.body_font;
+        label_theme.alignment = .Center;
+        label_theme.text_color = .{0.85, 0.85, 0.95, 1};
+        r := get_rect(0, h * 0.36, w, app.body_font.character_height * 1.5);
+        label(r, instructions, *label_theme);
+    }
+
+    {
+        label_theme := app.theme.label_theme;
+        label_theme.font = app.title_font;
+        label_theme.alignment = .Center;
+        label_theme.text_color = .{1, 0.4, 0.8, 1};
+        r := get_rect(0, h * 0.45, w, app.title_font.character_height * 1.4);
+        code := ifx app.jellyfin.qc_code then app.jellyfin.qc_code else "------";
+        label(r, code, *label_theme);
+    }
+
+    cancel_y := h * 0.65;
+    button_theme := app.theme.button_theme;
+    button_theme.font = app.button_font;
+    button_theme.label_theme.alignment = .Center;
+    button_theme.label_theme.text_color = .{0.7, 0.7, 0.9, 1};
+    if button(get_rect(field_x, cancel_y, field_w, field_h), "CANCEL", *button_theme) {
+        jellyfin_quick_connect_cancel();
+    }
 }
 
 //