{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Maximising Performance\n",
    "\n",
    "PyProBE uses [Polars LazyFrames](https://docs.pola.rs/user-guide/lazy/) under-the-hood. This means that data isn't loaded into memory and calculations aren't run until the data is requested by the user, either as a plot or as a DataFrame. This is what makes working with PyProBE much faster than working with Pandas DataFrames, as [this example notebook](comparing-pyprobe-performance.ipynb) demonstrates.\n",
    "\n",
    "Working with LazyFrames efficiently, though, requires use of some best practises which this notebook will demonstrate."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1",
   "metadata": {},
   "outputs": [],
   "source": [
    "import timeit\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "import pyprobe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load test data\n",
    "data_directory = \"../../../tests/sample_data/neware\"\n",
    "info_dictionary = {\"test_name\": \"Sample\", \"device\": \"Neware\"}\n",
    "\n",
    "\n",
    "def load_data():\n",
    "    \"\"\"Helper function to load fresh data for each benchmark run.\"\"\"\n",
    "    cell_new = pyprobe.Cell(info=info_dictionary)\n",
    "    cell_new.import_data(\n",
    "        procedure_name=\"Sample\",\n",
    "        data_path=data_directory + \"/sample_data_neware.parquet\",\n",
    "    )\n",
    "    return (\n",
    "        cell_new.procedure[\"Sample\"].experiment(\"Break-in Cycles\").cycle(1).discharge(0)\n",
    "    )"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Single get() with Multiple Arguments vs Multiple get() Calls\n",
    "\n",
    "When you need to retrieve multiple columns, the most efficient approach is to use a single `get()` call with multiple column arguments. This processes all columns in a single lazy evaluation plan, compared to calling `get()` separately for each column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Method 1: Multiple separate get() calls\n",
    "def multiple_get_calls():\n",
    "    result = load_data()\n",
    "    _ = result.get(\"Time [s]\")\n",
    "    _ = result.get(\"Current [A]\")\n",
    "    _ = result.get(\"Voltage [V]\")\n",
    "\n",
    "\n",
    "# Method 2: Single get() with multiple column arguments\n",
    "def single_get_multiple_args():\n",
    "    result = load_data()\n",
    "    _ = result.get(\"Time [s]\", \"Current [A]\", \"Voltage [V]\")\n",
    "\n",
    "\n",
    "# Benchmark the two methods\n",
    "num_runs = 10\n",
    "time_multiple_get = timeit.timeit(multiple_get_calls, number=num_runs) / num_runs\n",
    "time_single_get = timeit.timeit(single_get_multiple_args, number=num_runs) / num_runs\n",
    "\n",
    "# Visualize the results\n",
    "plt.figure(figsize=(8, 6))\n",
    "methods = [\n",
    "    \"Multiple get()\\ncalls\",\n",
    "    \"Single get()\\nwith multiple args\",\n",
    "]\n",
    "times = [\n",
    "    time_multiple_get * 1000,\n",
    "    time_single_get * 1000,\n",
    "]\n",
    "colors = [\"#ff7f0e\", \"#1f77b4\"]\n",
    "bars = plt.bar(methods, times, color=colors)\n",
    "plt.ylabel(\"Time (ms)\")\n",
    "plt.title(\"Single get() with Multiple Arguments vs Multiple get() Calls\")\n",
    "plt.ylim(0, max(times) * 1.2)\n",
    "\n",
    "# Add value labels on bars\n",
    "for bar, time in zip(bars, times):\n",
    "    height = bar.get_height()\n",
    "    plt.text(\n",
    "        bar.get_x() + bar.get_width() / 2,\n",
    "        height,\n",
    "        f\"{time:.2f} ms\",\n",
    "        ha=\"center\",\n",
    "        va=\"bottom\",\n",
    "    )\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "## Using collect() to Optimize Multiple get() Calls\n",
    "\n",
    "If you need to call `get()` multiple times, you can improve performance by calling `collect()` first. This materializes the lazy dataframe once, and subsequent `get()` calls operate on the collected data, avoiding repeated lazy evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Benchmark multiple numbers of get() calls\n",
    "num_calls_list = [1, 3, 5, 10, 15, 20]\n",
    "times_multiple_get = []\n",
    "times_collect_then_get = []\n",
    "\n",
    "for num_calls in num_calls_list:\n",
    "    # Method 1: Multiple separate get() calls\n",
    "    def multiple_get_calls():\n",
    "        result = load_data()\n",
    "        for _ in range(num_calls):\n",
    "            _ = result.get(\"Time [s]\")\n",
    "            _ = result.get(\"Current [A]\")\n",
    "            _ = result.get(\"Voltage [V]\")\n",
    "\n",
    "    # Method 2: Single collect() followed by multiple get() calls\n",
    "    def single_collect_then_get():\n",
    "        result = load_data()\n",
    "        result.collect()\n",
    "        for _ in range(num_calls):\n",
    "            _ = result.get(\"Time [s]\")\n",
    "            _ = result.get(\"Current [A]\")\n",
    "            _ = result.get(\"Voltage [V]\")\n",
    "\n",
    "    # Benchmark\n",
    "    num_runs = 10\n",
    "    time_mg = timeit.timeit(multiple_get_calls, number=num_runs) / num_runs\n",
    "    time_cg = timeit.timeit(single_collect_then_get, number=num_runs) / num_runs\n",
    "\n",
    "    times_multiple_get.append(time_mg * 1000)  # Convert to ms\n",
    "    times_collect_then_get.append(time_cg * 1000)  # Convert to ms\n",
    "\n",
    "# Plot the results\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.plot(\n",
    "    num_calls_list,\n",
    "    times_multiple_get,\n",
    "    marker=\"o\",\n",
    "    linewidth=2,\n",
    "    markersize=8,\n",
    "    label=\"Multiple get() calls\",\n",
    "    color=\"#ff7f0e\",\n",
    ")\n",
    "plt.plot(\n",
    "    num_calls_list,\n",
    "    times_collect_then_get,\n",
    "    marker=\"s\",\n",
    "    linewidth=2,\n",
    "    markersize=8,\n",
    "    label=\"Single collect() + get() calls\",\n",
    "    color=\"#2ca02c\",\n",
    ")\n",
    "plt.xlabel(\"Number of get() Call Sets\")\n",
    "plt.ylabel(\"Total Time (ms)\")\n",
    "plt.title(\"Performance: Multiple get() Calls vs collect() + get() Calls\")\n",
    "plt.xticks(num_calls_list)\n",
    "plt.legend()\n",
    "plt.grid(True, alpha=0.3)\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}