Quick start

  1. Clone the repo and proceed to the root:

    git clone https://github.com/########/EBES.git
    cd EBES
    
  2. Download the data. For example, for the X5 Retail Hero dataset, first go to the data page and download the archive. Assuming the archive is downloaded to EBES/data/x5-retail/retailhero-uplift.zip, run:

    cd data/x5-retail
    unzip retailhero-uplift.zip   # creates a data subfolder with CSV files
    mkdir preprocessed            # data in the EBES format will be stored here
    mkdir preprocessed/cat_codes  # categorical features are encoded as integers; the mappings are stored here
    
  3. Build the Docker image and run it:

    docker build -t ebes .
    docker run -it --gpus all --ipc host -v /path/to/EBES:/workspace ebes bash
    
  4. Save the dataset in the EBES format. From this point on, every command should be run inside the Docker container.

    cd /workspace
    python preprocess/x5-retail.py \
        --data-path data/x5-retail/data \
        --save-path data/x5-retail/preprocessed \
        --cat-codes-path data/x5-retail/preprocessed/cat_codes
    
  5. Run an experiment. Most of the experiment configuration is done via YAML config files located in the configs folder of the repository. To run an experiment, choose the dataset, the method to benchmark (e.g. vanilla GRU, CoLES, mTAND), and the experiment type (e.g. a single test run or an HPO launch). You can also decide to patch the default config with the best hyper-parameters found for a particular method on a particular dataset; these patches are located in configs/specify/{dataset}/{method}/best.yaml. For example, to train the best GRU (according to our HPO results), run from the repository root:

    python main.py \
        -d x5         `# dataset config` \
        -m gru        `# method config` \
        -e test       `# experiment config; test = a single train and test run` \
        -s best       `# apply the best config found for gru on x5` \
        -g 'cuda:1'   `# run on GPU 1` \
        --tqdm        `# enable the train loop progress bar`
    

    During training, the folder log/{dataset}/{method}/{experiment} is created with logs, checkpoints and evaluation results.
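
    Both the hyper-parameter patch and the run output can be inspected directly. The paths below follow from the command above, but treat them as a sketch: the exact folder names (e.g. x5 vs x5-retail) depend on the configs in your checkout:

    cat configs/specify/x5/gru/best.yaml  # the patch applied via -s best
    ls log/x5/gru/test                    # logs, checkpoints, evaluation results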

    Available experiments (-e option) are:

    • test — perform a single run.

    • correlation — perform multiple runs in parallel with different random seeds.

    • optuna — perform HPO. You can parallelize the Optuna search by launching several runs in parallel; they will share the same storage (see the Optuna docs and the sketch below).
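
    For example, assuming the Optuna storage is shared between runs as described above, two searches can be launched concurrently on different GPUs (the GPU indices are illustrative):

    python main.py -d x5 -m gru -e optuna -g 'cuda:0' &
    python main.py -d x5 -m gru -e optuna -g 'cuda:1' &
    wait  # both processes add trials to the same Optuna storage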

  6. Analyze the results. The notebook with log analysis is located at notebooks/collect_results.ipynb.
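
    To open the notebook inside the container, you can start a Jupyter server there. This is a sketch: it assumes Jupyter is installed in the image and that the container was started with the notebook port published (e.g. -p 8888:8888 added to the docker run command in step 3):

    jupyter notebook --ip 0.0.0.0 --no-browser --allow-root notebooks/collect_results.ipynb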