diff --git a/Another_copy_of_FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb b/Another_copy_of_FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb new file mode 100644 index 0000000..1d8a4dc --- /dev/null +++ b/Another_copy_of_FinRL_Ensemble_StockTrading_ICAIF_2020.ipynb @@ -0,0 +1,3205 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "view-in-github", + "colab_type": "text" + }, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Lb9q2_QZgdNk" + }, + "source": [ + "\n", + " \"Open\n", + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "gXaoZs2lh1hi" + }, + "source": [ + "# Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading Using Ensemble Strategy\n", + "\n", + "A tutorial on using OpenAI DRL to trade multiple stocks with an ensemble strategy in one Jupyter Notebook | Presented at ICAIF 2020\n", + "\n", + "* This notebook is a reimplementation of our paper, Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy, using FinRL.\n", + "* Check out our Medium blog post for detailed explanations: https://medium.com/@ai4finance/deep-reinforcement-learning-for-automated-stock-trading-f1dad0126a02\n", + "* Please report any issues on our GitHub: https://github.com/AI4Finance-LLC/FinRL-Library/issues\n", + "* **PyTorch Version**\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "lGunVt8oLCVS" + }, + "source": [ + "# Content" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HOzAKQ-SLGX6" + }, + "source": [ + "* [1. Problem Definition](#0)\n", + "* [2. Getting Started - Load Python Packages](#1)\n", + " * [2.1. Install Packages](#1.1) \n", + " * [2.2. Check Additional Packages](#1.2)\n", + " * [2.3. Import Packages](#1.3)\n", + " * [2.4. Create Folders](#1.4)\n", + "* [3. Download Data](#2)\n", + "* [4. Preprocess Data](#3) \n", + " * [4.1. Technical Indicators](#3.1)\n", + " * [4.2. 
Perform Feature Engineering](#3.2)\n", + "* [5. Build Environment](#4) \n", + " * [5.1. Training & Trade Data Split](#4.1)\n", + " * [5.2. User-defined Environment](#4.2) \n", + " * [5.3. Initialize Environment](#4.3) \n", + "* [6. Implement DRL Algorithms](#5) \n", + "* [7. Backtesting Performance](#6) \n", + " * [7.1. BackTestStats](#6.1)\n", + " * [7.2. BackTestPlot](#6.2) \n", + " * [7.3. Baseline Stats](#6.3) \n", + " * [7.4. Compare to Stock Market Index](#6.4) " + ] + }, + { + "cell_type": "code", + "source": [ + "from google.colab import drive\n", + "drive.mount('/content/drive')" + ], + "metadata": { + "id": "X2qJ-7rHBp8W", + "collapsed": true + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sApkDlD9LIZv" + }, + "source": [ + "\n", + "# Part 1. Problem Definition" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HjLD2TZSLKZ-" + }, + "source": [ + "The problem is to design an automated solution for trading multiple stocks. We model the stock trading process as a Markov Decision Process (MDP). We then formulate our trading goal as a maximization problem.\n", + "\n", + "The agent is trained using Deep Reinforcement Learning (DRL) algorithms, and the components of the reinforcement learning environment are:\n", + "\n", + "\n", + "* Action: The action space describes the actions through which the agent interacts with the\n", + "environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent\n", + "selling, holding, and buying one share. An action can also be carried out on multiple shares. We use\n", + "an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the maximum number of shares. For example, \"Buy\n", + "10 shares of AAPL\" or \"Sell 10 shares of AAPL\" are 10 or −10, respectively.\n", + "\n", + "* Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. 
It is the change in portfolio value when action a is taken at state s and the agent arrives at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio\n", + "values at state s′ and s, respectively.\n", + "\n", + "* State: The state space describes the observations that the agent receives from the environment. Just as a human trader needs to analyze various information before executing a trade,\n", + "our trading agent observes many different features to better learn in an interactive environment.\n", + "\n", + "* Environment: Dow 30 constituents\n", + "\n", + "\n", + "The data for the stocks used in this case study is obtained from the Yahoo Finance API. The data contains Open-High-Low-Close prices and volume.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ffsre789LY08" + }, + "source": [ + "\n", + "# Part 2. Getting Started - Load Python Packages" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Uy5_PTmOh1hj" + }, + "source": [ + "\n", + "## 2.1. 
Install all the packages through FinRL library\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 707 + }, + "id": "mPT0ipYE28wL", + "outputId": "8e103cdb-23f8-4190-d7e1-289bd3dd3d54", + "collapsed": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Successfully installed AutoROM.accept-rom-license-0.6.1 Cython-3.0.11 MarkupSafe-2.1.5 PyYAML-6.0.1 SQLAlchemy-2.0.35 absl-py-2.1.0 aiodns-3.2.0 aiohappyeyeballs-2.4.0 aiohttp-3.10.6 aiohttp-cors-0.7.0 aiosignal-1.3.1 ale-py-0.8.1 alpaca-trade-api-3.2.0 annotated-types-0.7.0 asttokens-2.4.1 async-timeout-4.0.3 attrs-24.2.0 autorom-0.6.1 beautifulsoup4-4.12.3 cachetools-5.5.0 ccxt-3.1.60 certifi-2024.8.30 cffi-1.17.1 charset-normalizer-3.3.2 clarabel-0.9.0 click-8.1.7 cloudpickle-3.0.0 colorful-0.5.6 contourpy-1.3.0 cryptography-43.0.1 cvxpy-1.5.3 cycler-0.12.1 decorator-5.1.1 deprecation-2.1.0 distlib-0.3.8 ecos-2.0.14 elegantrl-0.3.10 empyrical-0.5.5 exceptiongroup-1.2.2 exchange-calendars-4.5.6 executing-2.1.0 farama-notifications-0.0.4 filelock-3.16.1 finrl-0.3.6 fonttools-4.54.1 frozendict-2.4.4 frozenlist-1.4.1 fsspec-2024.9.0 google-api-core-2.20.0 google-auth-2.35.0 googleapis-common-protos-1.65.0 greenlet-3.1.1 grpcio-1.66.1 gymnasium-0.29.1 html5lib-1.1 idna-3.10 importlib-resources-6.4.5 ipython-8.27.0 jedi-0.19.1 jinja2-3.1.4 joblib-1.4.2 jqdatasdk-1.9.6 jsonschema-4.23.0 jsonschema-specifications-2023.12.1 kiwisolver-1.4.7 korean-lunar-calendar-0.3.1 linkify-it-py-2.0.3 lxml-5.3.0 markdown-3.7 markdown-it-py-3.0.0 matplotlib-3.9.2 matplotlib-inline-0.1.7 mdit-py-plugins-0.4.2 mdurl-0.1.2 memray-1.14.0 mpmath-1.3.0 msgpack-1.0.3 multidict-6.1.0 multitasking-0.0.11 networkx-3.3 numpy-1.26.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 
nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.68 nvidia-nvtx-cu12-12.1.105 opencensus-0.11.4 opencensus-context-0.1.3 opencv-python-4.10.0.84 osqp-0.6.7.post1 packaging-23.2 pandas-2.2.3 pandas-datareader-0.10.0 parso-0.8.4 peewee-3.17.6 pexpect-4.9.0 pillow-10.4.0 platformdirs-4.3.6 ply-3.11 prometheus-client-0.21.0 prompt-toolkit-3.0.48 proto-plus-1.24.0 protobuf-5.28.2 psutil-6.0.0 psycopg2-binary-2.9.9 ptyprocess-0.7.0 pure-eval-0.2.3 py-spy-0.3.14 pyarrow-17.0.0 pyasn1-0.6.1 pyasn1-modules-0.4.1 pycares-4.4.0 pycparser-2.22 pydantic-2.9.2 pydantic-core-2.23.4 pyfolio-0.9.2 pygame-2.6.0 pygments-2.18.0 pyluach-2.2.0 pymysql-1.1.1 pyparsing-3.1.4 pyportfolioopt-1.5.5 python-dateutil-2.9.0.post0 pytz-2024.2 qdldl-0.1.7.post4 ray-2.37.0 referencing-0.35.1 requests-2.32.3 rich-13.8.1 rpds-py-0.20.0 rsa-4.9 scikit-learn-1.5.2 scipy-1.12.0 scs-3.2.7 seaborn-0.13.2 setuptools-75.1.0 shimmy-1.3.0 six-1.16.0 smart-open-7.0.4 soupsieve-2.6 stable-baselines3-2.4.0a7 stack-data-0.6.3 stockstats-0.5.4 sympy-1.13.3 tensorboard-2.18.0 tensorboard-data-server-0.7.2 tensorboardX-2.6.2.2 textual-0.81.0 threadpoolctl-3.5.0 thriftpy2-0.5.2 toolz-0.12.1 torch-2.4.1 tqdm-4.66.5 traitlets-5.14.3 triton-3.0.0 typing-extensions-4.12.2 tzdata-2024.2 uc-micro-py-1.0.3 urllib3-1.26.20 virtualenv-20.26.5 wcwidth-0.2.13 webencodings-0.5.1 websocket-client-1.8.0 websockets-10.4 werkzeug-3.0.4 wrapt-1.16.0 wrds-3.2.0 yarl-1.12.1 yfinance-0.2.43\n", + "\u001b[33mWARNING: Target directory /content/drive/packages/psycopg2 already exists. Specify --upgrade to force replacement.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Target directory /content/drive/packages/greenlet already exists. Specify --upgrade to force replacement.\u001b[0m\u001b[33m\n", + "\u001b[0m\u001b[33mWARNING: Target directory /content/drive/packages/dateutil already exists. 
Specify --upgrade to force replacement.\u001b[0m\u001b[33m\n", + "\u001b[0m" + ] + }, + { + "output_type": "display_data", + "data": { + "application/vnd.colab-display-data+json": { + "pip_warning": { + "packages": [ + "PIL", + "certifi", + "cffi", + "cycler", + "google", + "kiwisolver", + "matplotlib_inline", + "pexpect", + "pytz", + "six", + "wcwidth" + ] + }, + "id": "867206b47fa44e579f9d1705d740fa15" + } + }, + "metadata": {} + } + ], + "source": [ + "!pip install --target='/content/drive/packages' wrds\n", + "!pip install --target='/content/drive/packages' swig\n", + "!pip install --target='/content/drive/packages' -q condacolab\n", + "import condacolab\n", + "condacolab.install()\n", + "!apt-get update -y -qq && apt-get install -y -qq cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx swig\n", + "!pip install --target='/content/drive/packages' git+https://github.com/AI4Finance-Foundation/FinRL.git" + ] + }, + { + "cell_type": "code", + "source": [ + "# from google.colab import drive\n", + "# drive.mount('/content/drive')\n", + "!cp -r /content/drive/packages/* /content/drive/\n" + ], + "metadata": { + "id": "IA0PE8RWivNh" + }, + "execution_count": 1, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "osBHhVysOEzi" + }, + "source": [ + "\n", + "\n", + "## 2.2. Check that the additional packages are present; if not, install them.\n", + "* Yahoo Finance API\n", + "* pandas\n", + "* numpy\n", + "* matplotlib\n", + "* stockstats\n", + "* Gymnasium (formerly OpenAI Gym)\n", + "* stable-baselines3\n", + "* PyTorch\n", + "* pyfolio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "nGv01K8Sh1hn" + }, + "source": [ + "\n", + "## 2.3. 
Import Packages" + ] + }, + { + "cell_type": "code", + "source": [], + "metadata": { + "id": "TOPYpjid4xoL" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "lPqeTTwoh1hn", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "72b957ff-3f49-41e7-faca-1d989271c100", + "collapsed": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/tensorflow/lite/python/util.py:55: DeprecationWarning: jax.xla_computation is deprecated. Please use the AOT APIs; see https://jax.readthedocs.io/en/latest/aot.html. For example, replace xla_computation(f)(*xs) with jit(f).lower(*xs).compiler_ir('hlo'). See CHANGELOG.md for 0.4.30 for more examples.\n", + " from jax import xla_computation as _xla_computation\n", + "/usr/local/lib/python3.10/dist-packages/pandas_datareader/compat/__init__.py:11: DeprecationWarning: distutils Version classes are deprecated. 
Use packaging.version instead.\n", + " PANDAS_VERSION = LooseVersion(pd.__version__)\n", + "/content/drive/packages/pyfolio/pos.py:26: UserWarning: Module \"zipline.assets\" not found; mutltipliers will not be applied to position notionals.\n", + " warnings.warn(\n" + ] + } + ], + "source": [ + "import pandas as pd\n", + "import numpy as np\n", + "import matplotlib\n", + "import matplotlib.pyplot as plt\n", + "# matplotlib.use('Agg')\n", + "import datetime\n", + "\n", + "import sys\n", + "# Make the packages installed under /content/drive/packages importable.\n", + "# sys.path.append returns None, so its result must not be assigned to `os`.\n", + "sys.path.append(\"/content/drive/packages\")\n", + "\n", + "import itertools\n", + "\n", + "\n", + "%matplotlib inline\n", + "from finrl.config_tickers import DOW_30_TICKER\n", + "from finrl.meta.preprocessor.yahoodownloader import YahooDownloader\n", + "from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split\n", + "from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv\n", + "from finrl.agents.stablebaselines3.models import DRLAgent, DRLEnsembleAgent\n", + "from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline\n", + "\n", + "from pprint import pprint\n", + "\n", + "sys.path.append(\"../FinRL-Library\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "T2owTj985RW4" + }, + "source": [ + "\n", + "## 2.4. 
Create Folders" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "w9A8CN5R5PuZ" + }, + "outputs": [], + "source": [ + "import os\n", + "from finrl.main import check_and_make_directories\n", + "from finrl.config import (\n", + " DATA_SAVE_DIR,\n", + " TRAINED_MODEL_DIR,\n", + " TENSORBOARD_LOG_DIR,\n", + " RESULTS_DIR,\n", + " INDICATORS,\n", + " TRAIN_START_DATE,\n", + " TRAIN_END_DATE,\n", + " TEST_START_DATE,\n", + " TEST_END_DATE,\n", + " TRADE_START_DATE,\n", + " TRADE_END_DATE,\n", + ")\n", + "\n", + "check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "A289rQWMh1hq" + }, + "source": [ + "\n", + "# Part 3. Download Data\n", + "Yahoo Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo Finance is free.\n", + "* FinRL uses a class **YahooDownloader** to fetch data from Yahoo Finance API\n", + "* Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NPeQ7iS-LoMm" + }, + "source": [ + "\n", + "\n", + "-----\n", + "class YahooDownloader:\n", + " Provides methods for retrieving daily stock data from\n", + " Yahoo Finance API\n", + "\n", + " Attributes\n", + " ----------\n", + " start_date : str\n", + " start date of the data (modified from config.py)\n", + " end_date : str\n", + " end date of the data (modified from config.py)\n", + " ticker_list : list\n", + " a list of stock tickers (modified from config.py)\n", + "\n", + " Methods\n", + " -------\n", + " fetch_data()\n", + " Fetches data from yahoo API\n" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "JzqRRTOX6aFu", + "outputId": 
"0c705e10-bc77-4495-c665-e85f1ef3bcba" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "['AXP', 'AMGN', 'AAPL', 'BA', 'CAT', 'CSCO', 'CVX', 'GS', 'HD', 'HON', 'IBM', 'INTC', 'JNJ', 'KO', 'JPM', 'MCD', 'MMM', 'MRK', 'MSFT', 'NKE', 'PG', 'TRV', 'UNH', 'CRM', 'VZ', 'V', 'WBA', 'WMT', 'DIS', 'DOW']\n" + ] + } + ], + "source": [ + "print(DOW_30_TICKER)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "yCKm4om-s9kE", + "outputId": "5953ed90-c8ae-4fc4-cd69-89bf86bf7d4f" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + 
"[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n", + "[*********************100%***********************] 1 of 1 completed\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Shape of DataFrame: (97013, 8)\n" + ] + } + ], + "source": [ + "# TRAIN_START_DATE = '2009-04-01'\n", + "# TRAIN_END_DATE = '2021-01-01'\n", + "# TEST_START_DATE = '2021-01-01'\n", + "# TEST_END_DATE = '2022-06-01'\n", + "\n", + "TRAIN_START_DATE = '2010-01-01'\n", + "TRAIN_END_DATE = '2021-10-01'\n", + "TEST_START_DATE = '2021-10-01'\n", + "TEST_END_DATE = '2023-03-01'\n", + "\n", + "df = YahooDownloader(start_date = TRAIN_START_DATE,\n", + " end_date = TEST_END_DATE,\n", + " ticker_list = DOW_30_TICKER).fetch_data()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "GiRuFOTOtj1Y", + "outputId": "63e8739f-1cb1-447c-cd0b-8421cd7065bc" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " date open high low close volume tic \\\n", + "0 2010-01-04 7.622500 7.660714 7.585000 6.454504 
493729600 AAPL \n", + "1 2010-01-04 56.630001 57.869999 56.560001 40.915897 5277400 AMGN \n", + "2 2010-01-04 40.810001 41.099998 40.389999 32.992157 6894300 AXP \n", + "3 2010-01-04 55.720001 56.389999 54.799999 43.777557 6186700 BA \n", + "4 2010-01-04 57.650002 59.189999 57.509998 40.027199 7325600 CAT \n", + "\n", + " day \n", + "0 0 \n", + "1 0 \n", + "2 0 \n", + "3 0 \n", + "4 0 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "df", + "summary": "{\n \"name\": \"df\",\n \"rows\": 97013,\n \"fields\": [\n {\n \"column\": \"date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 3311,\n \"samples\": [\n \"2010-03-19\",\n \"2012-09-13\",\n \"2015-06-23\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 73.3337048274467,\n \"min\": 6.870357036590576,\n \"max\": 555.0,\n \"num_unique_values\": 38726,\n \"samples\": [\n 153.4499969482422,\n 208.27000427246094,\n 7.67642879486084\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"high\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 74.08851731233877,\n \"min\": 7.0,\n \"max\": 558.0999755859375,\n \"num_unique_values\": 39077,\n \"samples\": [\n 169.03846740722656,\n 27.3700008392334,\n 47.7400016784668\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 72.55584141136859,\n \"min\": 6.794642925262451,\n \"max\": 550.1300048828125,\n \"num_unique_values\": 38859,\n \"samples\": [\n 12.343570709228516,\n 61.5099983215332,\n 131.2100067138672\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 70.17611073368398,\n \"min\": 5.792194366455078,\n \"max\": 538.8892822265625,\n \"num_unique_values\": 92710,\n \"samples\": [\n 42.52291488647461,\n 218.6311492919922,\n 75.48799133300781\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 61733648,\n \"min\": 305400,\n \"max\": 1880998000,\n \"num_unique_values\": 81903,\n \"samples\": [\n 3372002,\n 5786400,\n 37385300\n ],\n \"semantic_type\": 
\"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"tic\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 30,\n \"samples\": [\n \"WBA\",\n \"JPM\",\n \"TRV\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"day\",\n \"properties\": {\n \"dtype\": \"int32\",\n \"num_unique_values\": 5,\n \"samples\": [\n 1,\n 4,\n 2\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 8 + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "DSw4ZEzVtj1Z", + "outputId": "a004e91f-c848-4ce0-f5b4-dac3011b122b" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " date open high low close volume \\\n", + "97008 2023-02-28 482.670013 483.359985 473.920013 463.423553 3902100 \n", + "97009 2023-02-28 220.000000 221.770004 219.500000 217.388611 5385400 \n", + "97010 2023-02-28 38.700001 38.970001 38.549999 34.966160 16685300 \n", + "97011 2023-02-28 35.480000 35.779999 35.320000 31.987909 8847000 \n", + "97012 2023-02-28 47.000000 47.549999 46.983334 46.214355 18054000 \n", + "\n", + " tic day \n", + "97008 UNH 1 \n", + "97009 V 1 \n", + "97010 VZ 1 \n", + "97011 WBA 1 \n", + "97012 WMT 1 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "repr_error": "0" + } + }, + "metadata": {}, + "execution_count": 9 + } + ], + "source": [ + "df.tail()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "CV3HrZHLh1hy", + "outputId": "19a67c12-4b0a-4a76-e238-192f58fa696e" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(97013, 8)" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "4hYkeaPiICHS", + "outputId": "67b0637f-e4ce-43c8-d437-ff491dcfb702" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " date open high low close volume tic \\\n", + "0 2010-01-04 7.622500 7.660714 7.585000 6.454504 493729600 AAPL \n", + "1 2010-01-04 56.630001 57.869999 56.560001 40.915897 5277400 AMGN \n", + "2 2010-01-04 40.810001 41.099998 40.389999 32.992157 6894300 AXP \n", + "3 2010-01-04 55.720001 56.389999 54.799999 43.777557 6186700 BA \n", + "4 2010-01-04 57.650002 59.189999 57.509998 40.027199 7325600 CAT \n", + "\n", + " day \n", + "0 0 \n", + "1 0 \n", + "2 0 \n", + "3 0 \n", + "4 0 " + ], + "text/html": [ + "\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "summary": "{\n \"name\": \"df\",\n \"rows\": 5,\n \"fields\": [\n {\n \"column\": \"date\",\n \"properties\": {\n \"dtype\": \"object\",\n \"num_unique_values\": 1,\n \"samples\": [\n \"2010-01-04\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"open\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.308478892619465,\n \"min\": 7.622499942779541,\n \"max\": 57.650001525878906,\n \"num_unique_values\": 5,\n \"samples\": [\n 56.630001068115234\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"high\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.82086885953129,\n \"min\": 7.660714149475098,\n \"max\": 59.189998626708984,\n \"num_unique_values\": 5,\n \"samples\": [\n 57.869998931884766\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"low\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 21.177861035901508,\n \"min\": 7.585000038146973,\n \"max\": 57.5099983215332,\n \"num_unique_values\": 5,\n \"samples\": [\n 56.560001373291016\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"close\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 15.270260867050599,\n \"min\": 6.454503536224365,\n \"max\": 43.777557373046875,\n \"num_unique_values\": 5,\n \"samples\": [\n 40.915897369384766\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"volume\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 217932410,\n \"min\": 5277400,\n \"max\": 493729600,\n \"num_unique_values\": 5,\n \"samples\": [\n 5277400\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"tic\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 5,\n \"samples\": [\n \"AMGN\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n 
\"column\": \"day\",\n \"properties\": {\n \"dtype\": \"int32\",\n \"num_unique_values\": 1,\n \"samples\": [\n 0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 11 + } + ], + "source": [ + "df.sort_values(['date','tic']).head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "a2vryMsdNL9H", + "outputId": "9aa5465c-4d73-49c2-bcf0-88f1d68e746a" + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "30" + ] + }, + "metadata": {}, + "execution_count": 12 + } + ], + "source": [ + "len(df.tic.unique())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "XcNyXa7RNPrF", + "outputId": "b681106e-013a-4ca9-f253-99b7de0393ff", + "collapsed": true + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tic\n", + "AAPL 3311\n", + "AMGN 3311\n", + "WMT 3311\n", + "WBA 3311\n", + "VZ 3311\n", + "V 3311\n", + "UNH 3311\n", + "TRV 3311\n", + "PG 3311\n", + "NKE 3311\n", + "MSFT 3311\n", + "MRK 3311\n", + "MMM 3311\n", + "MCD 3311\n", + "KO 3311\n", + "JPM 3311\n", + "JNJ 3311\n", + "INTC 3311\n", + "IBM 3311\n", + "HON 3311\n", + "HD 3311\n", + "GS 3311\n", + "DIS 3311\n", + "CVX 3311\n", + "CSCO 3311\n", + "CRM 3311\n", + "CAT 3311\n", + "BA 3311\n", + "AXP 3311\n", + "DOW 994\n", + "Name: count, dtype: int64" + ], + "text/html": [ + "
" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ], + "source": [ + "df.tic.value_counts()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "uqC6c40Zh1iH" + }, + "source": [ + "# Part 4: Preprocess Data\n", + "Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering to convert the data into a model-ready state.\n", + "* Add technical indicators. In practical trading, various information needs to be taken into account, for example historical stock prices, current share holdings, and technical indicators. In this notebook, we use four technical indicators: MACD, RSI, CCI, and DX.\n", + "* Add turbulence index. Risk aversion reflects whether an investor prefers to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control the risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuations."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "kM5bH9uroCeg" + }, + "outputs": [], + "source": [ + " INDICATORS = ['macd',\n", + " 'rsi_30',\n", + " 'cci_30',\n", + " 'dx_30']" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "jgXfBcjxtj1a", + "outputId": "54ad090e-1ea8-4bd4-90be-0880fa5e763c", + "collapsed": true + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Successfully added technical indicators\n", + "Successfully added turbulence index\n" + ] + } + ], + "source": [ + "fe = FeatureEngineer(use_technical_indicator=True,\n", + " tech_indicator_list = INDICATORS,\n", + " use_turbulence=True,\n", + " user_defined_feature = False)\n", + "\n", + "processed = fe.preprocess_data(df)\n", + "processed = processed.copy()\n", + "processed = processed.fillna(0)\n", + "processed = processed.replace(np.inf,0)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "grvhGJJII3Xn", + "outputId": "c84822f8-623f-402b-acb5-5260a75ed07b", + "collapsed": true + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " date open high low close volume \\\n", + "23141 2013-03-06 24.583332 24.709999 24.416668 19.109367 21448800 \n", + "26932 2013-09-11 32.570000 32.930000 32.529999 27.212698 39087500 \n", + "85216 2021-09-03 175.100006 175.220001 173.809998 160.690674 4096900 \n", + "54590 2017-06-26 147.906311 148.900574 147.829834 106.022476 2252561 \n", + "75283 2020-04-27 43.333332 43.436668 42.723331 39.941040 17923800 \n", + "\n", + " tic day macd rsi_30 cci_30 dx_30 turbulence \n", + "23141 WMT 2 0.204328 59.008303 206.577885 41.378700 25.371484 \n", + "26932 MSFT 2 -0.114314 50.723271 44.325067 8.025848 33.204657 \n", + "85216 JNJ 4 0.729121 57.466903 15.587760 
0.764838 7.185696 \n", + "54590 IBM 0 -0.030324 45.774899 123.891858 18.141758 22.193627 \n", + "75283 WMT 0 1.155208 55.933852 80.628321 28.997452 55.566703 " + ], + "text/html": [ + "\n", + "
 \n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "repr_error": "0" + } + }, + "metadata": {}, + "execution_count": 26 + } + ], + "source": [ + "processed.sample(5)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-QsYaY0Dh1iw" + }, + "source": [ + "\n", + "# Part 5. Design Environment\n", + "Considering the stochastic and interactive nature of automated stock trading, we model the task as a **Markov Decision Process (MDP)**. Training involves observing stock price changes, taking an action, and calculating the reward so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards over time.\n", + "\n", + "Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.\n", + "\n", + "The action space describes the actions the agent is allowed to take in the environment. Normally, an action a takes one of three values, {-1, 0, 1}, where -1, 0, and 1 represent selling, holding, and buying one share. An action can also cover multiple shares. We use an action space {-k, …, -1, 0, 1, …, k}, where k denotes the number of shares to buy and -k denotes the number of shares to sell. For example, \"Buy 10 shares of AAPL\" or \"Sell 10 shares of AAPL\" are 10 or -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric."
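The mapping from a normalized action back to a share count can be sketched as follows. This assumes the environment multiplies each action in [-1, 1] by a maximum trade size (the `hmax` value passed to the environment in this notebook) and truncates to whole shares; the helper `actions_to_shares` is hypothetical and only illustrates the idea.

```python
import numpy as np


def actions_to_shares(actions, hmax=100):
    """Scale normalized actions in [-1, 1] to signed whole-share trades."""
    clipped = np.clip(np.asarray(actions, dtype=float), -1.0, 1.0)
    # Positive values are buys, negative values are sells, zero is hold.
    return (clipped * hmax).astype(int)


shares = actions_to_shares([1.0, -0.5, 0.0, 0.257])
print(shares.tolist())  # [100, -50, 0, 25]
```

With `hmax=100`, an action of 1.0 means "buy 100 shares" and -0.5 means "sell 50 shares", so the policy can stay on a symmetric, bounded scale while the environment works in share units.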
 + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Q2zqII8rMIqn", + "outputId": "2e9ec7cc-9d29-4a19-f2d0-8248c252ba14" + }, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Stock Dimension: 29, State Space: 175\n" + ] + } + ], + "source": [ + "stock_dimension = len(processed.tic.unique())\n", + "# State vector: cash balance (1) + price and holding per stock (2 * stock_dimension) + indicators per stock\n", + "state_space = 1 + 2*stock_dimension + len(INDICATORS)*stock_dimension\n", + "print(f\"Stock Dimension: {stock_dimension}, State Space: {state_space}\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "AWyp84Ltto19" + }, + "outputs": [], + "source": [ + "env_kwargs = {\n", + " \"hmax\": 100,\n", + " \"initial_amount\": 1000000,\n", + " \"buy_cost_pct\": 0.001,\n", + " \"sell_cost_pct\": 0.001,\n", + " \"state_space\": state_space,\n", + " \"stock_dim\": stock_dimension,\n", + " \"tech_indicator_list\": INDICATORS,\n", + " \"action_space\": stock_dimension,\n", + " \"reward_scaling\": 1e-4,\n", + " \"print_verbosity\":5\n", + "\n", + "}" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HMNR5nHjh1iz" + }, + "source": [ + "\n", + "# Part 6: Implement DRL Algorithms\n", + "* The implementation of the DRL algorithms is based on **OpenAI Baselines** and **Stable Baselines**. Stable Baselines is a fork of OpenAI Baselines with major structural refactoring and code cleanups.\n", + "* The FinRL library includes fine-tuned standard DRL algorithms, such as DQN, DDPG,\n", + "Multi-Agent DDPG, PPO, SAC, A2C and TD3. 
We also allow users to\n", + "design their own DRL algorithms by adapting these DRL algorithms.\n", + "\n", + "* In this notebook, we train and validate 3 agents (A2C, PPO, DDPG) using the rolling-window ensemble method ([reference code](https://github.com/AI4Finance-LLC/Deep-Reinforcement-Learning-for-Automated-Stock-Trading-Ensemble-Strategy-ICAIF-2020/blob/80415db8fa7b2179df6bd7e81ce4fe8dbf913806/model/models.py#L92))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "v-gthCxMtj1d" + }, + "outputs": [], + "source": [ + "rebalance_window = 63 # rebalance_window is the number of days to retrain the model\n", + "validation_window = 63 # validation_window is the number of days to do validation and trading (e.g. if validation_window=63, then both validation and trading period will be 63 days)\n", + "\n", + "ensemble_agent = DRLEnsembleAgent(df=processed,\n", + " train_period=(TRAIN_START_DATE,TRAIN_END_DATE),\n", + " val_test_period=(TEST_START_DATE,TEST_END_DATE),\n", + " rebalance_window=rebalance_window,\n", + " validation_window=validation_window,\n", + " **env_kwargs)\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KsfEHa_Etj1d", + "scrolled": false + }, + "outputs": [], + "source": [ + "A2C_model_kwargs = {\n", + " 'n_steps': 5,\n", + " 'ent_coef': 0.005,\n", + " 'learning_rate': 0.0007\n", + " }\n", + "\n", + "PPO_model_kwargs = {\n", + " \"ent_coef\":0.01,\n", + " \"n_steps\": 2048,\n", + " \"learning_rate\": 0.00025,\n", + " \"batch_size\": 128\n", + " }\n", + "\n", + "DDPG_model_kwargs = {\n", + " #\"action_noise\":\"ornstein_uhlenbeck\",\n", + " \"buffer_size\": 10_000,\n", + " \"learning_rate\": 0.0005,\n", + " \"batch_size\": 64\n", + " }\n", + "\n", + "# Assumption: the installed FinRL version also expects SAC and TD3 settings,\n", + "# passed after DDPG_model_kwargs. 
The values below are illustrative, not tuned.\n", + "SAC_model_kwargs = {\n", + " \"batch_size\": 64,\n", + " \"buffer_size\": 100_000,\n", + " \"learning_rate\": 0.0001,\n", + " \"learning_starts\": 100,\n", + " \"ent_coef\": \"auto_0.1\"\n", + " }\n", + "\n", + "TD3_model_kwargs = {\n", + " \"batch_size\": 100,\n", + " \"buffer_size\": 1_000_000,\n", + " \"learning_rate\": 0.0001\n", + " }\n", + "\n", + "timesteps_dict = {'a2c' : 10_000,\n", + " 'ppo' : 10_000,\n", + " 'ddpg' : 10_000,\n", + " 'sac' : 10_000,\n", + " 'td3' : 10_000\n", + " }" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": 
"https://localhost:8080/", + "height": 193 + }, + "id": "_1lyCECstj1e", + "outputId": "3dca2625-62e8-4cad-f2bd-f4e4bcb96e84", + "scrolled": true + }, + "outputs": [], + "source": [ + "# Arguments are positional: A2C, PPO, DDPG, SAC (assumed slot), TD3, then timesteps_dict.\n", + "df_summary = ensemble_agent.run_ensemble_strategy(A2C_model_kwargs,\n", + " PPO_model_kwargs,\n", + " DDPG_model_kwargs,\n", + " SAC_model_kwargs,\n", + " TD3_model_kwargs,\n", + " timesteps_dict)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 175 + }, + "id": "-0qd8acMtj1f", + "outputId": "9f0cbf89-5f4b-4691-9e43-daa093ebceae" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
 \n", + " " + ], + "text/plain": [ + " Iter Val Start Val End Model Used A2C Sharpe PPO Sharpe DDPG Sharpe\n", + "0 126 2021-10-04 2022-01-03 DDPG 0.131875 0.223622 0.327114\n", + "1 189 2022-01-03 2022-04-04 A2C -0.14693 -0.253404 -0.238802\n", + "2 252 2022-04-04 2022-07-06 DDPG -0.302721 -0.232303 -0.168003\n", + "3 315 2022-07-06 2022-10-04 DDPG -0.150338 -0.177032 -0.138149" + ] + }, + "execution_count": 51, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "W6vvNSC6h1jZ" + }, + "source": [ + "\n", + "# Part 7: Backtest Our Strategy\n", + "Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and provides a set of individual plots that together give a comprehensive picture of a strategy's performance."
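For intuition, two of the headline backtest statistics can be computed directly with pandas. This sketch assumes 252 trading days per year and a zero risk-free rate; the helper names and the toy account values are hypothetical, and pyfolio's own implementations may differ in detail.

```python
import pandas as pd


def annualized_sharpe(account_value: pd.Series, periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio of daily returns, with a zero risk-free rate."""
    daily_returns = account_value.pct_change().dropna()
    return float((periods_per_year ** 0.5) * daily_returns.mean() / daily_returns.std())


def max_drawdown(account_value: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction of the peak."""
    running_peak = account_value.cummax()
    return float((account_value / running_peak - 1.0).min())


toy_values = pd.Series([100.0, 110.0, 99.0, 120.0])
print(round(max_drawdown(toy_values), 4))  # -0.1
print(annualized_sharpe(toy_values) > 0)   # True
```

Reading the two numbers together is useful: a strategy can have an acceptable Sharpe ratio and still suffer a drawdown that would be hard to hold through in practice.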
 + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "X4JKB--8tj1g" + }, + "outputs": [], + "source": [ + "unique_trade_date = processed[(processed.date > TEST_START_DATE)&(processed.date <= TEST_END_DATE)].date.unique()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "q9mKF7GGtj1g", + "outputId": "99c5e5f8-2e3f-49c3-e5a6-4e66ed92e40a", + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Sharpe Ratio: -0.11910800246459344\n" + ] + } + ], + "source": [ + "df_trade_date = pd.DataFrame({'datadate':unique_trade_date})\n", + "\n", + "df_account_value=pd.DataFrame()\n", + "for i in range(rebalance_window+validation_window, len(unique_trade_date)+1,rebalance_window):\n", + " temp = pd.read_csv('results/account_value_trade_{}_{}.csv'.format('ensemble',i))\n", + " # DataFrame.append was removed in pandas 2.0; pd.concat works across versions\n", + " df_account_value = pd.concat([df_account_value, temp], ignore_index=True)\n", + "sharpe=(252**0.5)*df_account_value.account_value.pct_change(1).mean()/df_account_value.account_value.pct_change(1).std()\n", + "print('Sharpe Ratio: ',sharpe)\n", + "df_account_value=df_account_value.join(df_trade_date[validation_window:].reset_index(drop=True))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 206 + }, + "id": "oyosyW7_tj1g", + "outputId": "0e54f2d5-6057-4a14-c94a-5f2af26ad171" + }, + "outputs": [ + { + "data": { + "text/html": [ + "\n", + "
 \n", + " " + ], + "text/plain": [ + " account_value date daily_return datadate\n", + "0 1000000.000000 2022-01-03 NaN 2022-01-03\n", + "1 999006.177890 2022-01-04 -0.000994 2022-01-04\n", + "2 992190.375143 2022-01-05 -0.006823 2022-01-05\n", + "3 986549.918776 2022-01-06 -0.005685 2022-01-06\n", + "4 984951.522695 2022-01-07 -0.001620 2022-01-07" + ] + }, + "execution_count": 54, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_account_value.head()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 293 + }, + "id": "wLsRdw2Ctj1h", + "outputId": "0e2b0bc2-840c-47fd-87d4-01201d8e4e3d" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 55, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "%matplotlib inline\n", + "df_account_value.account_value.plot()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Lr2zX7ZxNyFQ" + }, + "source": [ + "\n", + "## 7.1 BackTestStats\n", + "Pass in df_account_value; this information is stored in the environment class.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "Nzkr9yv-AdV_", + "outputId": "ab0971b8-10b0-4fb1-a151-71a1de89cdf2", + "scrolled": true + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==============Get Backtest Results===========\n", + "Annual return -0.039389\n", + "Cumulative returns -0.039389\n", + "Annual volatility 0.189054\n", + "Sharpe ratio -0.119108\n", + "Calmar ratio -0.226725\n", + "Stability 0.009971\n", + "Max drawdown -0.173731\n", + "Omega ratio 0.980324\n", + "Sortino ratio -0.165730\n", + "Skew NaN\n", + "Kurtosis NaN\n", + "Tail ratio 0.958268\n", + "Daily value at risk -0.023908\n", + "dtype: float64\n" + ] + } + ], + "source": [ + "print(\"==============Get Backtest Results===========\")\n", + "now = datetime.datetime.now().strftime('%Y%m%d-%Hh%M')\n", + "\n", + "perf_stats_all = backtest_stats(account_value=df_account_value)\n", + "perf_stats_all = pd.DataFrame(perf_stats_all)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "DiHhM1YkoCel", + "outputId": "c233f613-67a3-4882-8710-c1839247590e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "==============Get Baseline Stats===========\n", + "\r[*********************100%***********************] 1 of 1 completed\n", + "Shape of DataFrame: (251, 8)\n", + "Annual return -0.094324\n", + "Cumulative returns -0.093968\n", + "Annual volatility 
0.198502\n", + "Sharpe ratio -0.402058\n", + "Calmar ratio -0.429901\n", + "Stability 0.236972\n", + "Max drawdown -0.219408\n", + "Omega ratio 0.936015\n", + "Sortino ratio -0.559755\n", + "Skew NaN\n", + "Kurtosis NaN\n", + "Tail ratio 1.014390\n", + "Daily value at risk -0.025326\n", + "dtype: float64\n" + ] + } + ], + "source": [ + "#baseline stats\n", + "print(\"==============Get Baseline Stats===========\")\n", + "df_dji_ = get_baseline(\n", + " ticker=\"^DJI\",\n", + " start = df_account_value.loc[0,'date'],\n", + " end = df_account_value.loc[len(df_account_value)-1,'date'])\n", + "\n", + "stats = backtest_stats(df_dji_, value_col_name = 'close')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "RhJ9whD75WTs", + "outputId": "8ae25787-8400-4357-ecc0-af7538689cee" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "df_dji: date dji\n", + "0 2022-01-03 1.000000e+06\n", + "1 2022-01-04 1.005866e+06\n", + "2 2022-01-05 9.951360e+05\n", + "3 2022-01-06 9.904718e+05\n", + "4 2022-01-07 9.903404e+05\n", + ".. ... ...\n", + "247 2022-12-27 9.086102e+05\n", + "248 2022-12-28 8.986103e+05\n", + "249 2022-12-29 9.080428e+05\n", + "250 2022-12-30 9.060324e+05\n", + "251 2023-01-03 NaN\n", + "\n", + "[252 rows x 2 columns]\n", + "df_dji: dji\n", + "date \n", + "2022-01-03 1.000000e+06\n", + "2022-01-04 1.005866e+06\n", + "2022-01-05 9.951360e+05\n", + "2022-01-06 9.904718e+05\n", + "2022-01-07 9.903404e+05\n", + "... 
...\n", + "2022-12-27 9.086102e+05\n", + "2022-12-28 8.986103e+05\n", + "2022-12-29 9.080428e+05\n", + "2022-12-30 9.060324e+05\n", + "2023-01-03 NaN\n", + "\n", + "[252 rows x 1 columns]\n" + ] + } + ], + "source": [ + "df_dji = pd.DataFrame()\n", + "df_dji['date'] = df_account_value['date']\n", + "df_dji['dji'] = df_dji_['close'] / df_dji_['close'][0] * env_kwargs[\"initial_amount\"]\n", + "print(\"df_dji: \", df_dji)\n", + "df_dji.to_csv(\"df_dji.csv\")\n", + "df_dji = df_dji.set_index(df_dji.columns[0])\n", + "print(\"df_dji: \", df_dji)\n", + "df_dji.to_csv(\"df_dji+.csv\")\n", + "\n", + "df_account_value.to_csv('df_account_value.csv')\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9U6Suru3h1jc" + }, + "source": [ + "\n", + "## 7.2 BackTestPlot" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 1000 + }, + "id": "HggausPRoCem", + "outputId": "615e8d79-f3d7-47e9-c886-3cd18e4535f2" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "df_result_ensemble.columns: Index(['ensemble'], dtype='object')\n", + "df_trade_date: datadate\n", + "0 2021-10-04\n", + "1 2021-10-05\n", + "2 2021-10-06\n", + "3 2021-10-07\n", + "4 2021-10-08\n", + ".. ...\n", + "348 2023-02-22\n", + "349 2023-02-23\n", + "350 2023-02-24\n", + "351 2023-02-27\n", + "352 2023-02-28\n", + "\n", + "[353 rows x 1 columns]\n", + "df_result_ensemble: ensemble\n", + "date \n", + "2022-01-03 1000000.000000\n", + "2022-01-04 999006.177890\n", + "2022-01-05 992190.375143\n", + "2022-01-06 986549.918776\n", + "2022-01-07 984951.522695\n", + "... 
...\n", + "2022-12-27 966931.603579\n", + "2022-12-28 956294.904131\n", + "2022-12-29 964607.097342\n", + "2022-12-30 960687.624327\n", + "2023-01-03 960610.826884\n", + "\n", + "[252 rows x 1 columns]\n", + "==============Compare to DJIA===========\n", + "result: ensemble dji\n", + "date \n", + "2022-01-03 1000000.000000 1.000000e+06\n", + "2022-01-04 999006.177890 1.005866e+06\n", + "2022-01-05 992190.375143 9.951360e+05\n", + "2022-01-06 986549.918776 9.904718e+05\n", + "2022-01-07 984951.522695 9.903404e+05\n", + "... ... ...\n", + "2022-12-27 966931.603579 9.086102e+05\n", + "2022-12-28 956294.904131 8.986103e+05\n", + "2022-12-29 964607.097342 9.080428e+05\n", + "2022-12-30 960687.624327 9.060324e+05\n", + "2023-01-03 960610.826884 NaN\n", + "\n", + "[252 rows x 2 columns]\n" + ] + }, + { + "data": { + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": { + "needs_background": "light" + }, + "output_type": "display_data" + } + ], + "source": [ + "\n", + "\n", + "# print(\"==============Compare to DJIA===========\")\n", + "# %matplotlib inline\n", + "# # S&P 500: ^GSPC\n", + "# # Dow Jones Index: ^DJI\n", + "# # NASDAQ 100: ^NDX\n", + "# backtest_plot(df_account_value,\n", + "# baseline_ticker = '^DJI',\n", + "# baseline_start = df_account_value.loc[0,'date'],\n", + "# baseline_end = df_account_value.loc[len(df_account_value)-1,'date'])\n", + "df.to_csv(\"df.csv\")\n", + "df_result_ensemble = pd.DataFrame({'date': df_account_value['date'], 'ensemble': df_account_value['account_value']})\n", + "df_result_ensemble = df_result_ensemble.set_index('date')\n", + "\n", + "print(\"df_result_ensemble.columns: \", df_result_ensemble.columns)\n", + "\n", + "# df_result_ensemble.drop(df_result_ensemble.columns[0], axis = 1)\n", + "print(\"df_trade_date: \", df_trade_date)\n", + "# df_result_ensemble['date'] = df_trade_date['datadate']\n", + "# df_result_ensemble['account_value'] = df_account_value['account_value']\n", + "df_result_ensemble.to_csv(\"df_result_ensemble.csv\")\n", + "print(\"df_result_ensemble: \", df_result_ensemble)\n", + "print(\"==============Compare to DJIA===========\")\n", + "result = pd.DataFrame()\n", + "# result = pd.merge(result, df_result_ensemble, left_index=True, right_index=True)\n", + "# result = pd.merge(result, df_dji, left_index=True, right_index=True)\n", + "result = pd.merge(df_result_ensemble, df_dji, left_index=True, right_index=True)\n", + "print(\"result: \", result)\n", + "result.to_csv(\"result.csv\")\n", + "result.columns = ['ensemble', 'dji']\n", + "\n", + "%matplotlib inline\n", + "plt.rcParams[\"figure.figsize\"] = (15,5)\n", + "plt.figure();\n", + "result.plot();" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "oBQx4bVQFi-a" + }, + "source": [] + } + ], + "metadata": { + "colab": { + "provenance": [], + "include_colab_link": true + }, + 
"kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.10" + }, + "pycharm": { + "stem_cell": { + "cell_type": "raw", + "metadata": { + "collapsed": false + }, + "source": [] + } + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file